Introducing CellMapper: Lightning-Fast Cell Mapping Across Datasets
Transfer labels, embeddings, and expression values between datasets in seconds!
Hey everyone! Bridging the gap between different single-cell datasets has always been challenging. Today I’m excited to unveil CellMapper, a high-performance tool that makes this a bit easier through optimized k-NN transfer. Whether you’re mapping cell types from dissociated to spatial data, transferring embeddings between datasets, or identifying cellular niches, CellMapper makes these complex tasks both simple and blazingly fast. All you need is a joint embedding for your data, which you can get with methods like scVI, scArches, GLUE, scANVI, ENVI, MIDAS and many more, depending on the type of mapping problem.
What’s CellMapper?
CellMapper is a k-NN-based tool that lets you map cells across different representations to transfer:
- Cell type labels
- Embeddings
- Expression values
Performance Optimized for Scale
CellMapper achieves its efficiency through:
- Accelerated neighborhood search using faiss or RAPIDS on GPU
- Sparse matrix multiplications for memory-efficient data transfer
- Modular interface that separates neighborhood calculation from data transfer
This architecture enables CellMapper to handle 1.5 million cells in about 30 seconds on a single RTX 4090 with 60 GB of CPU memory, making it practical for working with modern large-scale datasets.
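To make the split between neighborhood search and data transfer concrete, here is a minimal sketch. It is not CellMapper's code: the helper names `knn_indices` and `mapping_matrix` are made up for illustration, a brute-force NumPy search stands in for faiss or RAPIDS, and uniform weights of 1/k are an assumption. The point is that the mapping matrix has only k non-zeros per row, so it stays small in memory and can be reused for several transfers.

```python
import numpy as np
from scipy import sparse


def knn_indices(query_emb, ref_emb, k):
    """Brute-force k-NN search (illustrative stand-in for faiss or RAPIDS)."""
    # Squared Euclidean distances, shape (n_query, n_ref)
    d2 = (
        (query_emb**2).sum(axis=1)[:, None]
        - 2.0 * query_emb @ ref_emb.T
        + (ref_emb**2).sum(axis=1)[None, :]
    )
    # Indices of the k closest reference cells per query cell (unordered)
    return np.argpartition(d2, k - 1, axis=1)[:, :k]


def mapping_matrix(indices, n_ref):
    """Sparse mapping matrix with k non-zeros per row (assumed uniform weights)."""
    n_query, k = indices.shape
    rows = np.repeat(np.arange(n_query), k)
    data = np.full(n_query * k, 1.0 / k)
    return sparse.csr_matrix((data, (rows, indices.ravel())), shape=(n_query, n_ref))


rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(1000, 10))    # toy joint embedding, reference
query_emb = rng.normal(size=(200, 10))   # toy joint embedding, query

# Step 1: neighborhood search, done once
idx = knn_indices(query_emb, ref_emb, k=15)
M = mapping_matrix(idx, n_ref=ref_emb.shape[0])

# Step 2: reuse the same sparse matrix for several transfers
ref_umap = rng.normal(size=(1000, 2))
ref_expr = sparse.random(1000, 50, density=0.1, format="csr", random_state=0)
query_umap = M @ ref_umap   # dense result
query_expr = M @ ref_expr   # stays sparse
```

Because the expensive step (the search) is decoupled from the cheap step (one sparse multiplication per transferred quantity), swapping in a faster search backend should not require touching the transfer code.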
Cool Things You Can Do With CellMapper
- Transfer cell type labels from dissociated to spatial datasets
- Map embeddings between query and reference datasets
- Calculate presence scores for your cells in reference atlases
- Identify cellular niches in spatial data
- Evaluate your transfers with built-in metrics
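As an illustration of what evaluating a label transfer can look like when the query cells already carry ground-truth labels, here is a small sketch. It is not CellMapper's built-in evaluation; the function `evaluate_label_transfer` and the choice of metrics (overall accuracy and per-class recall) are assumptions made for this example.

```python
import numpy as np


def evaluate_label_transfer(true_labels, predicted_labels):
    """Overall accuracy and per-class recall for transferred labels."""
    true_labels = np.asarray(true_labels)
    predicted_labels = np.asarray(predicted_labels)
    correct = true_labels == predicted_labels
    # Recall per class: fraction of cells of that class that were labeled correctly
    per_class = {
        str(c): float(correct[true_labels == c].mean()) for c in np.unique(true_labels)
    }
    return float(correct.mean()), per_class


# Toy example: one of the two B cells is mislabeled as a T cell
truth = ["B cell", "B cell", "T cell", "T cell", "NK cell"]
pred = ["B cell", "T cell", "T cell", "T cell", "NK cell"]
accuracy, recall = evaluate_label_transfer(truth, pred)
```

Per-class numbers are worth looking at alongside the overall accuracy, since rare cell types can be mapped poorly without moving the global score much.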
The Math Behind It
CellMapper is built on a straightforward but powerful approach: k-nearest neighbor (k-NN) graphs with kernels applied to create mapping matrices. Here’s how it works:
- For each query cell, find its k nearest neighbors in the reference dataset
- Apply a kernel function to turn these neighbor relationships into weights
- Use these weights to transfer information from reference to query cells
The method is expressed by the formula:
$$Y_{\text{query}} = M \cdot Y_{\text{reference}}$$
Where $M$ is our mapping matrix derived from the k-NN graph, and $Y_{\text{reference}}$ can represent:
- Categorical data from `.obs` (automatically one-hot encoded)
- Dense arrays from `.obsm` (like UMAP or PCA embeddings)
- Sparse matrices from `.X` or layers (e.g. gene expression data)
This approach is highly flexible and can be applied to virtually any type of data!
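The three steps and the formula above can be sketched in a few lines of NumPy. This is a toy illustration under stated assumptions, not CellMapper's implementation: the Gaussian kernel with a per-cell bandwidth taken from the k-th neighbor distance is just one possible kernel choice, the matrix is kept dense for readability, and the function names `gaussian_mapping_matrix` and `one_hot` are hypothetical.

```python
import numpy as np


def gaussian_mapping_matrix(query_emb, ref_emb, k=3):
    """Dense (n_query, n_ref) matrix with Gaussian weights on the k nearest neighbors."""
    # Pairwise squared Euclidean distances
    diff = query_emb[:, None, :] - ref_emb[None, :, :]
    d2 = (diff**2).sum(axis=2)
    # Step 1: k nearest reference cells per query cell
    nn = np.argsort(d2, axis=1)[:, :k]
    nn_d2 = np.take_along_axis(d2, nn, axis=1)
    # Step 2: Gaussian kernel, bandwidth set per query cell from its k-th neighbor
    sigma2 = nn_d2[:, -1:] + 1e-12
    w = np.exp(-nn_d2 / sigma2)
    w /= w.sum(axis=1, keepdims=True)  # each row sums to 1
    M = np.zeros_like(d2)
    np.put_along_axis(M, nn, w, axis=1)
    return M


def one_hot(labels):
    """Return (one-hot matrix, category names) for a 1-D label array."""
    categories, codes = np.unique(labels, return_inverse=True)
    return np.eye(len(categories))[codes], categories


# Toy reference: two well-separated clusters in a 2-D joint embedding
ref_emb = np.array(
    [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
)
ref_labels = np.array(["B cell", "B cell", "B cell", "T cell", "T cell", "T cell"])
query_emb = np.array([[0.05, 0.05], [5.05, 5.05]])

M = gaussian_mapping_matrix(query_emb, ref_emb, k=3)
Y_ref, categories = one_hot(ref_labels)

# Step 3: Y_query = M @ Y_reference gives per-class scores for each query cell
Y_query = M @ Y_ref
query_labels = categories[Y_query.argmax(axis=1)]
query_conf = Y_query.max(axis=1)
```

In this toy setup the first query cell sits inside the B cell cluster and the second inside the T cell cluster, so they receive those labels with a score of 1. For dense arrays or expression matrices the same `M` is applied directly, without the one-hot step.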
Getting Started in 3 Lines of Code
from cellmapper import CellMapper
cmap = CellMapper(query, reference).fit(
    use_rep="X_joint", obs_keys="celltype", obsm_keys="X_umap", layer_key="X"
)
That’s it! This will transfer cell types, UMAP embeddings, and expression values from your reference to your query dataset.
Standing on the Shoulders of Giants
The k-NN transfer approach isn’t novel; it’s a common technique used throughout the field. Among others, CellMapper is heavily inspired by:
- Scanpy’s ingest function
- The HNOCA-tools package
What makes CellMapper different is its focus on efficiency, flexibility, and ease of use. It separates the method (k-NN graph with kernels) from the application (mapping across representations), allowing for greater versatility and performance optimization.
Why I Built This
Working with datasets across different modalities or platforms presents specific computational challenges, especially when transferring information between them at scale. Existing tools often struggled with either performance on large datasets or flexibility across data types. CellMapper addresses these challenges by:
- Leveraging high-performance computing libraries for nearest neighbor search
- Providing flexibility across data types and modalities
- Offering a clean API with intuitive defaults
- Integrating tightly with AnnData objects
For more details, tutorials, and examples, check out the documentation or dive into the GitHub repo!
Happy mapping!