IsoMEX - PacBio 10x data from IsoSeq into MEX format

ehutchins · May 5, 2025, 9:23pm

I thought I’d share a GitHub repo I made for processing 10x data with PacBio. It is called IsoMex, and it takes the outputs from the IsoSeq/Pigeon PacBio pipeline and converts them into .mex files compatible with CellRanger. We encountered an issue with duplicate gene IDs that prevented analyzing more than one sample at a time, and this is a workaround I came up with.

Is anyone else working with 10x data with long read technology (PacBio or Oxford Nanopore)? It’s still fairly new and we’ve encountered some issues with the data analysis for both - happy to chat with anyone in a similar space.

gginnan · May 9, 2025, 6:00pm

Really cool that you were able to develop a custom workaround! Is IsoMex’s output written as CellRanger-compatible .mex files to keep the pipeline within the 10x ecosystem, or are .mex files the standard file format for these data? Really, my question is: are you thinking of expanding the output options down the road?

Thanks for sharing your work, @ehutchins ! Tagging some folks interested in transcriptomics to see if there are similar pain points / opportunities to check out IsoMex! @mattia.volta @zarshad @rochet071369 @marekpiatek @amclean @ara8 @juanbot

ehutchins · June 3, 2025, 4:07pm

Thanks @gginnan ! MEX is the standard output format from CellRanger, so the files can be loaded downstream by other tools such as Scanpy and Seurat for analysis.

Did you have any specific output options in mind, @gginnan ?

gginnan · June 3, 2025, 4:46pm

Thanks, @ehutchins! I didn’t have any specific output options in mind, no. I was just wondering (as a non-expert in gene expression matrices) whether there are particular advantages to the .mex format (e.g., maximum compatibility with other tools/software/packages) or whether there are alternatives.

Thanks for the clarification.

ehutchins · June 5, 2025, 10:37pm

Here’s the MEX format description. My TL;DR: It’s used with 10x/single-cell data because it saves space by storing the counts sparsely. One file lists only the nonzero counts, each indexed by row and column into the other two files, so you don’t need a much larger dense matrix with a lot of 0s/NAs. There are 3 files:

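matrix.mtx: sparse count matrix (nonzero counts only, each indexed by gene row and barcode column)
genes.tsv: all annotated genes (named features.tsv in newer CellRanger versions)
barcodes.tsv: all barcodes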
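
To make the three-file layout concrete, here is a minimal sketch that writes a toy dataset in MEX layout and reads it back using only the Python standard library. It is not taken from IsoMex; the helper names (`write_mex`, `read_mex`), gene IDs, and barcodes are all made up for illustration, and it assumes the two-column genes.tsv layout (gene ID, gene name) with uncompressed files.

```python
import tempfile
from pathlib import Path


def write_mex(out_dir, genes, barcodes, counts):
    """Write toy data in MEX layout (hypothetical helper, not part of IsoMex).

    genes:    list of (gene_id, gene_name) tuples
    barcodes: list of barcode strings
    counts:   dict {(gene_index, barcode_index): count}, 0-based, nonzero only
    """
    out = Path(out_dir)
    (out / "genes.tsv").write_text(
        "".join(f"{gene_id}\t{name}\n" for gene_id, name in genes)
    )
    (out / "barcodes.tsv").write_text("".join(f"{bc}\n" for bc in barcodes))
    # Matrix Market coordinate format: a header line, a size line
    # (rows, columns, nonzero entries), then one "row col value" line
    # per nonzero count, with 1-based indices.
    lines = [
        "%%MatrixMarket matrix coordinate integer general",
        f"{len(genes)} {len(barcodes)} {len(counts)}",
    ]
    for (g, b), n in sorted(counts.items()):
        lines.append(f"{g + 1} {b + 1} {n}")
    (out / "matrix.mtx").write_text("\n".join(lines) + "\n")


def read_mex(in_dir):
    """Read the three files back into {(gene_id, barcode): count}."""
    d = Path(in_dir)
    genes = [
        tuple(line.split("\t")[:2])
        for line in (d / "genes.tsv").read_text().splitlines()
    ]
    barcodes = (d / "barcodes.tsv").read_text().split()
    # Skip the header and any comment lines, which start with "%".
    rows = [
        line
        for line in (d / "matrix.mtx").read_text().splitlines()
        if line and not line.startswith("%")
    ]
    counts = {}
    for line in rows[1:]:  # rows[0] is the size line
        g, b, n = map(int, line.split())
        counts[(genes[g - 1][0], barcodes[b - 1])] = n
    return genes, barcodes, counts


# Toy data: 3 genes x 2 barcodes, with only 2 nonzero counts.
genes = [("G1", "GeneA"), ("G2", "GeneB"), ("G3", "GeneC")]
barcodes = ["AAAC-1", "AAAG-1"]
counts = {(0, 0): 5, (2, 1): 2}

with tempfile.TemporaryDirectory() as tmp:
    write_mex(tmp, genes, barcodes, counts)
    mtx_text = (Path(tmp) / "matrix.mtx").read_text()
    _, _, loaded = read_mex(tmp)

print(mtx_text)
print(loaded)
```

Only the two nonzero entries are written to matrix.mtx, not all six cells of the 3 x 2 matrix, which is where the space saving comes from on real data where most counts are 0.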
