๐Ÿš€ Introducing CellMapper: Lightning-Fast Cell Mapping Across Datasets

Transfer labels, embeddings, and expression values between datasets in seconds! โšก

Hey everyone! ๐Ÿ‘‹ Bridging the gap between different single-cell datasets has always been challenging. Today I’m excited to unveil CellMapper, a high-performance tool that makes this a bit easier through optimized k-NN transfer. Whether you’re mapping cell types from dissociated to spatial data, transferring embeddings between datasets, or identifying cellular niches, CellMapper makes these complex tasks both simple and blazingly fast. All you need it a joint embedding for your data, which you can get with methods like scVI, scArches, GLUE, scANVI, ENVI, MIDAS and many more, depending on the type of mapping problem.

What’s CellMapper? ๐Ÿค”

CellMapper is a k-NN-based tool that lets you map cells across different representations to transfer:

  • ๐Ÿท๏ธ Cell type labels
  • ๐Ÿ“Š Embeddings
  • ๐Ÿงฌ Expression values

Performance Optimized for Scale โšก

CellMapper achieves its efficiency through:

  • Accelerated neighborhood search using faiss or RAPIDS on GPU
  • Sparse matrix multiplications for memory-efficient data transfer
  • Modular interface that separates neighborhood calculation from data transfer

This architecture enables CellMapper to handle 1.5 million cells in about 30 seconds on a single RTX 4090 with 60 GB of CPU memoryโ€”making it practical for working with modern large-scale datasets.

Cool Things You Can Do With CellMapper ๐Ÿ”

  • ๐Ÿ”„ Transfer cell type labels from dissociated to spatial datasets
  • ๐Ÿ’ซ Map embeddings between query and reference datasets
  • ๐Ÿ“ Calculate presence scores for your cells in reference atlases
  • ๐Ÿ™๏ธ Identify cellular niches in spatial data
  • ๐Ÿ“ˆ Evaluate your transfers with built-in metrics

The Math Behind It ๐Ÿงฎ

CellMapper is built on a straightforward but powerful approach: k-nearest neighbor (k-NN) graphs with kernels applied to create mapping matrices. Here’s how it works:

  1. For each query cell, find its k nearest neighbors in the reference dataset
  2. Apply a kernel function to turn these neighbor relationships into weights
  3. Use these weights to transfer information from reference to query cells

The method is expressed by the formula:

$$Y_{\text{query}} = M \cdot Y_{\text{reference}}$$

Where $M$ is our mapping matrix derived from the k-NN graph, and $Y_{\text{reference}}$ can represent:

  • Categorical data from .obs (automatically one-hot encoded)
  • Dense arrays from .obsm (like UMAP or PCA embeddings)
  • Sparse matrices from .X or layers (e.g. gene expression data)

This approach is highly flexible and can be applied to virtually any type of data!

Getting Started in 3 Lines of Code ๐Ÿ’ป

from cellmapper import CellMapper

cmap = CellMapper(query, reference).fit(
    use_rep="X_joint", obs_keys="celltype", obsm_keys="X_umap", layer_key="X"
)

That’s it! This will transfer cell types, UMAP embeddings, and expression values from your reference to your query dataset.

Standing on the Shoulders of Giants ๐Ÿ‘จโ€๐Ÿ”ฌ

The k-NN transfer approach isn’t novel โ€” it’s a common technique used throughout the field. Among others, CellMapper is heavily inspired by:

What makes CellMapper different is its focus on efficiency, flexibility, and ease of use. It separates the method (k-NN graph with kernels) from the application (mapping across representations), allowing for greater versatility and performance optimization.

Why I Built This ๐Ÿ’ญ

Working with datasets across different modalities or platforms presents specific computational challenges โ€” especially when transferring information between them at scale. Existing tools often struggled with either performance on large datasets or flexibility across data types. CellMapper addresses these challenges by:

  1. Leveraging high-performance computing libraries for nearest neighbor search
  2. Providing flexibility across data types and modalities
  3. Offering a clean API with intuitive defaults
  4. Integrating tightly with AnnData objects

For more details, tutorials, and examples, check out the documentation or dive into the GitHub repo!

Happy mapping! ๐Ÿ—บ๏ธ

Edit this page

Marius Lange
Marius Lange
Postdoctoral researcher

Interested in single-cell genomics and machine learning.

Related