IsoMEX - PacBio 10x data from IsoSeq into MEX format

I thought I’d share a GitHub repo I made for processing 10x data with PacBio. It is called IsoMex, and it takes the outputs from the IsoSeq/Pigeon PacBio pipeline and converts them into .mex files compatible with CellRanger. We encountered an issue with duplicate gene IDs that prevented analyzing more than one sample at a time, and this is a workaround I came up with.

Is anyone else working with 10x data with long-read technology (PacBio or Oxford Nanopore)? It’s still fairly new and we’ve encountered some issues with the data analysis for both platforms, so I'm happy to chat with anyone in a similar space.

Really cool that you were able to develop a custom workaround! Does IsoMex write its output as CellRanger-compatible .mex files to keep the pipeline within the 10x ecosystem, or is .mex the standard file format for these data? Really, my question is whether you're thinking of expanding the output options down the road.

Thanks for sharing your work, @ehutchins ! Tagging some folks interested in transcriptomics to see if there are similar pain points / opportunities to check out IsoMex! @mattia.volta @zarshad @rochet071369 @marekpiatek @amclean @ara8 @juanbot

Thanks @gginnan ! The .mex files are the standard output format from CellRanger, so they can be loaded by downstream analysis tools such as Scanpy and Seurat.

Did you have any specific output options in mind, @gginnan ?

Thanks, @ehutchins! I didn’t have any specific output options in mind, no. I was just wondering (as a non-expert in gene expression matrices) whether there are particular advantages to the .mex format (e.g., maximum compatibility with other tools/software/packages) or whether there are alternatives.

Thanks for the clarification :slight_smile:

Here’s the MEX format description. My TL;DR: It’s used with 10x/single-cell data because it saves space. The matrix file stores only the nonzero counts, each one indexed by its position in the other two files, so you don’t need a much larger input file with a lot of 0s/NAs. There are 3 files:

  1. matrix.mtx: sparse count matrix; each line is a gene index, a barcode index, and a count
  2. genes.tsv: all annotated genes
  3. barcodes.tsv: all barcodes
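
In case it helps to see how the three files fit together, here is a minimal sketch in plain Python that writes a tiny MEX directory and reads it back. It is only an illustration, not how IsoMex or CellRanger does it: the function names `write_mex` and `read_mex`, the gene IDs, and the barcodes are all made up, and it assumes uncompressed files named as in the list above (newer CellRanger versions gzip the files and name the gene file features.tsv).

```python
import os
import tempfile


def write_mex(out_dir, genes, barcodes, counts):
    """Write a toy MEX directory.

    genes:    list of (gene_id, gene_name) tuples
    barcodes: list of barcode strings
    counts:   dict mapping (gene_index, barcode_index) -> nonzero count,
              using 0-based indices
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(os.path.join(out_dir, "genes.tsv"), "w") as fh:
        for gene_id, gene_name in genes:
            fh.write(f"{gene_id}\t{gene_name}\n")
    with open(os.path.join(out_dir, "barcodes.tsv"), "w") as fh:
        for barcode in barcodes:
            fh.write(barcode + "\n")
    with open(os.path.join(out_dir, "matrix.mtx"), "w") as fh:
        # Matrix Market coordinate header, then "rows cols nonzero_entries".
        fh.write("%%MatrixMarket matrix coordinate integer general\n")
        fh.write(f"{len(genes)} {len(barcodes)} {len(counts)}\n")
        for (g, b), n in sorted(counts.items()):
            # Matrix Market indices are 1-based; rows are genes, columns are barcodes.
            fh.write(f"{g + 1} {b + 1} {n}\n")


def read_mex(in_dir):
    """Read a toy MEX directory back into (genes, barcodes, counts).

    counts maps (gene_id, barcode) -> count; pairs that are absent are zero.
    """
    with open(os.path.join(in_dir, "genes.tsv")) as fh:
        genes = [line.rstrip("\n").split("\t")[0] for line in fh if line.strip()]
    with open(os.path.join(in_dir, "barcodes.tsv")) as fh:
        barcodes = [line.strip() for line in fh if line.strip()]
    counts = {}
    with open(os.path.join(in_dir, "matrix.mtx")) as fh:
        # Skip the header and comment lines, which start with "%".
        data_lines = (line for line in fh if line.strip() and not line.startswith("%"))
        n_genes, n_barcodes, n_entries = map(int, next(data_lines).split())
        for line in data_lines:
            g, b, n = line.split()
            # Each entry points into genes.tsv (row) and barcodes.tsv (column).
            counts[(genes[int(g) - 1], barcodes[int(b) - 1])] = int(n)
    return genes, barcodes, counts


if __name__ == "__main__":
    # Hypothetical data: 3 genes x 2 cells, only 2 nonzero counts.
    toy_genes = [("ENSG_A", "GeneA"), ("ENSG_B", "GeneB"), ("ENSG_C", "GeneC")]
    toy_barcodes = ["AAAC-1", "AAAG-1"]
    toy_counts = {(0, 0): 5, (2, 1): 2}
    with tempfile.TemporaryDirectory() as tmp:
        write_mex(tmp, toy_genes, toy_barcodes, toy_counts)
        _, _, loaded = read_mex(tmp)
        for key, value in loaded.items():
            print(key, value)
```

The toy matrix has 6 possible gene/cell pairs, but matrix.mtx only stores the 2 nonzero ones, which is where the space saving comes from at real scale. For actual analysis you would normally let Scanpy or Seurat load the directory rather than parsing it by hand.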