CellProgramMapper

📖 Documentation: https://zaoqu-liu.github.io/CellProgramMapper/

Overview

CellProgramMapper is a high-performance R package for projecting single-cell transcriptomic data onto reference gene expression programs (GEPs). The package implements non-negative matrix factorization (NMF)-based methods for systematic characterization of cellular transcriptional states.

Methodology

Mathematical Framework

Given a query expression matrix X ∈ ℝ^n×p (n cells × p genes) and a reference spectra matrix H ∈ ℝ^k×p (k programs × p genes), CellProgramMapper estimates the usage matrix W ∈ ℝ^n×k by solving:

$$\min_{W \geq 0} \|X - WH\|_F^2$$

For each cell i, this decomposes into independent Non-Negative Least Squares (NNLS) subproblems:

$$\min_{w_i \geq 0} \|x_i - H^\top w_i\|_2^2$$
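The decomposition follows because the squared Frobenius norm is a sum of squared row norms and the non-negativity constraint applies to each row of W separately. Writing x_i and w_i for the i-th rows of X and W, taken as column vectors:

$$\|X - WH\|_F^2 = \sum_{i=1}^n \|x_i^\top - w_i^\top H\|_2^2 = \sum_{i=1}^n \|x_i - H^\top w_i\|_2^2$$

Each term depends only on w_i, so minimizing the sum is equivalent to minimizing every term independently.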

Implementation

Two NNLS solvers are provided:

Method	Algorithm	Reference
Coordinate Descent	Sequential coordinate-wise optimization	Franc et al. (2005)
Active Set	Lawson-Hanson algorithm	Lawson & Hanson (1974)

The coordinate descent method is generally faster for typical problem sizes, while the active set method is guaranteed to terminate in a finite number of iterations.
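To illustrate the coordinate descent idea, here is a minimal base R sketch of a sequential coordinate-wise NNLS update for a single cell. It is not the package's C++ solver; the function name `nnls_cd` and its arguments `max_iter` and `tol` are hypothetical and chosen only for this example.

```r
# Sketch: sequential coordinate-wise NNLS for one cell,
# solving min_{w >= 0} || x - t(H) %*% w ||^2.
# nnls_cd, max_iter and tol are illustrative names, not package API.
nnls_cd <- function(H, x, max_iter = 1000, tol = 1e-10) {
  G <- tcrossprod(H)          # k x k Gram matrix H %*% t(H)
  b <- as.vector(H %*% x)     # length-k vector H %*% x
  k <- nrow(H)
  w <- numeric(k)             # start from w = 0
  mu <- -b                    # gradient (G %*% w - b) at w = 0
  for (iter in seq_len(max_iter)) {
    max_change <- 0
    for (j in seq_len(k)) {
      if (G[j, j] <= 0) next  # skip all-zero programs
      # Exact minimizer along coordinate j, clipped at zero
      w_new <- max(0, w[j] - mu[j] / G[j, j])
      delta <- w_new - w[j]
      if (delta != 0) {
        mu <- mu + delta * G[, j]   # keep the gradient up to date
        w[j] <- w_new
        max_change <- max(max_change, abs(delta))
      }
    }
    if (max_change < tol) break     # stop once a full sweep changes nothing
  }
  w
}

# Toy example: 3 programs x 20 genes, one noiseless cell
set.seed(1)
H <- matrix(runif(3 * 20), nrow = 3)
w_true <- c(2, 0, 0.5)
x <- as.vector(t(H) %*% w_true)
w_hat <- nnls_cd(H, x)
```

In this noiseless toy case `w_hat` should recover `w_true` up to numerical tolerance, including the exact zero for the second program.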

Preprocessing

Input data undergoes standardization by scaling each gene by its population standard deviation (without centering):

$$x'_j = \frac{x_j}{\sigma_j}, \quad \sigma_j = \sqrt{\frac{1}{n}\sum_{i=1}^n (x_{ij} - \bar{x}_j)^2}$$

This matches the preprocessing in sklearn.preprocessing.scale(X, with_mean=False).
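The scaling above can be sketched in base R as follows. Note that R's `sd()` and `scale()` use the sample standard deviation (denominator n - 1), so the population version has to be computed explicitly. The helper name `scale_by_pop_sd` is hypothetical, and the handling of zero-variance genes is an assumption for this sketch rather than documented package behavior.

```r
# Sketch: scale each gene (column) by its population standard deviation,
# without centering. scale_by_pop_sd is an illustrative name, not package API.
scale_by_pop_sd <- function(X) {
  n <- nrow(X)
  gene_means <- colMeans(X)
  # Population SD: divide by n, not n - 1
  sigma <- sqrt(colSums(sweep(X, 2, gene_means)^2) / n)
  # Assumption: leave constant genes unscaled to avoid division by zero
  sigma[sigma == 0] <- 1
  sweep(X, 2, sigma, "/")
}

# 4 cells x 3 genes; the third gene is constant
X <- matrix(c(1, 2, 3, 4,
              0, 0, 2, 2,
              5, 5, 5, 5), ncol = 3)
X_scaled <- scale_by_pop_sd(X)

# Population SD of the first scaled gene
pop_sd_gene1 <- sqrt(mean((X_scaled[, 1] - mean(X_scaled[, 1]))^2))
```

After scaling, every non-constant gene has population standard deviation 1, while its mean is left wherever the division puts it.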

Key Features

CellProgramMapper provides a complete R-native solution for NMF-based cell annotation, with performance-critical steps implemented in C++:

Feature	Implementation	Details
Core NMF usage fitting	C++ NNLS	Coordinate descent and active set algorithms
Data preprocessing	C++ implementation	Population standard deviation scaling
Score computation	R matrix ops	Weighted sum and threshold-based scores
Cell type prediction	Built-in ML models	Multinomial logistic regression
Reference management	curl download	Auto-caching with version control
NPZ file reading	Native R + reticulate	Full NumPy format support
Consensus reference building	R + C++	Multi-dataset GEP clustering

Built-in Machine Learning Models: The package includes pre-trained model parameters for cell type prediction, enabling accurate predictions without external dependencies.

# List available built-in models
list_builtin_models()

# Get T-cell lineage predictions (matches Python exactly)
labels <- predict_lineage(usage_norm, "TCAT.V1")

# Get probability distribution for each class
probs <- get_lineage_probabilities(usage_norm, "TCAT.V1")
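For readers unfamiliar with multinomial logistic regression, the sketch below shows how class probabilities and labels can be derived from a normalized usage matrix. The coefficient matrix, intercepts, class names, and the function `predict_softmax` are all made up for illustration; they are not the parameters or internals of the built-in models.

```r
# Sketch: multinomial logistic regression prediction via softmax.
# predict_softmax, coef and intercept are hypothetical, not package API.
predict_softmax <- function(usage_norm, coef, intercept) {
  # Linear scores: cells x classes
  logits <- sweep(usage_norm %*% coef, 2, intercept, "+")
  # Subtract each row's maximum for numerical stability
  logits <- logits - apply(logits, 1, max)
  probs <- exp(logits) / rowSums(exp(logits))
  labels <- colnames(probs)[max.col(probs, ties.method = "first")]
  list(probs = probs, labels = labels)
}

# Toy data: 2 cells x 2 programs, 2 made-up classes
usage_toy <- rbind(c(0.9, 0.1),
                   c(0.2, 0.8))
coef_toy <- matrix(c(4, -4, -4, 4), nrow = 2,
                   dimnames = list(NULL, c("classA", "classB")))
intercept_toy <- c(0, 0)

pred <- predict_softmax(usage_toy, coef_toy, intercept_toy)
```

Each row of `pred$probs` sums to 1, and `pred$labels` holds the class with the highest probability for each cell.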

Installation

From R-universe (Recommended)

install.packages("CellProgramMapper", 
                 repos = "https://zaoqu-liu.r-universe.dev")

From GitHub

# install.packages("remotes")
remotes::install_github("Zaoqu-Liu/CellProgramMapper")

Dependencies

Required:

R (≥ 4.0.0)
Rcpp, RcppArmadillo
Matrix, data.table
curl, yaml, rappdirs
future, future.apply

Optional:

Seurat/SeuratObject (for Seurat integration)
hdf5r, anndata (for h5ad file support)
reticulate (for reading NPZ files with object arrays)

Quick Start

library(CellProgramMapper)

# Map cells to reference gene expression programs
result <- CellProgramMapper(
    query = seurat_obj,        # Seurat object, matrix, or file path
    reference = "TCAT.V1",     # Pre-built reference or custom file
    method = "cd",             # "cd" (coordinate descent) or "active_set"
    verbose = TRUE
)

# Access results
usage_matrix <- result$usage_norm   # Normalized usage (rows sum to 1)
scores <- result$scores             # Computed add-on scores

# Integration with Seurat
seurat_obj <- add_results_to_seurat(seurat_obj, result)

Available References

# List pre-built references
available_references()

Building Custom References

Construct consensus GEPs from multiple cNMF analyses:

consensus <- BuildConsensusReference(
    cnmf_paths = c("path/to/cnmf1", "path/to/cnmf2"),
    ks = c(10, 15),
    density_thresholds = c(0.1, 0.1),
    output_dir = "./consensus_output",
    corr_thresh = 0.5
)

Performance

CellProgramMapper is optimized for computational efficiency:

C++ Backend: Core NNLS solvers implemented in C++ via RcppArmadillo
Sparse Matrix Support: Native handling of sparse matrices
Parallel Processing: Optional parallelization via future framework
Batch Processing: Memory-efficient processing of large datasets
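As a rough illustration of the batch processing idea, the sketch below splits the cells of a matrix into fixed-size chunks, processes each chunk independently, and binds the results back together. The helper `process_in_batches` and its arguments are hypothetical and do not reflect how the package implements batching. Because the chunks are independent, the `lapply()` call is the natural place to swap in a parallel equivalent such as `future.apply::future_lapply()`.

```r
# Sketch: process cells (rows) in fixed-size batches to bound memory use.
# process_in_batches is an illustrative helper, not package API.
process_in_batches <- function(X, batch_size, fun) {
  rows <- seq_len(nrow(X))
  # Group row indices into consecutive chunks of at most batch_size
  chunks <- unname(split(rows, ceiling(rows / batch_size)))
  # Each chunk is independent, so this loop could also run in parallel
  pieces <- lapply(chunks, function(idx) fun(X[idx, , drop = FALSE]))
  do.call(rbind, pieces)
}

X <- matrix(seq_len(20), nrow = 10)   # 10 cells x 2 genes
doubled <- process_in_batches(X, batch_size = 3,
                              fun = function(block) block * 2)
```

The result has the same row order as the input, since the chunks are consecutive and recombined in order.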

Output Structure

The CellProgramMapper() function returns a CellProgramMapperResult object containing:

Field	Description
`usage`	Raw usage matrix (cells × programs)
`usage_norm`	Normalized usage matrix (rows sum to 1)
`scores`	Computed add-on scores
`overlap_genes`	Genes used for mapping
`n_cells`	Number of cells processed
`n_programs`	Number of programs
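The relationship between `usage` and `usage_norm` can be sketched as a row-wise normalization. This is a minimal illustration assuming that normalization simply divides each cell's usages by their sum; the helper `normalize_usage` and its treatment of all-zero rows are assumptions, not the package's code.

```r
# Sketch: normalize a usage matrix so that each cell's usages sum to 1.
# normalize_usage is an illustrative helper, not package API.
normalize_usage <- function(usage) {
  totals <- rowSums(usage)
  # Assumption: cells with zero total usage are left as all zeros
  totals[totals == 0] <- 1
  usage / totals   # divides row i by totals[i] via column-major recycling
}

usage_toy <- rbind(c(2, 1, 1),
                   c(0, 3, 1),
                   c(0, 0, 0))
usage_norm_toy <- normalize_usage(usage_toy)
```

Here the first cell becomes (0.5, 0.25, 0.25), the second (0, 0.75, 0.25), and the all-zero third cell stays at zero.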

Documentation

Detailed documentation and tutorials are available at: https://zaoqu-liu.github.io/CellProgramMapper/

References

Lawson CL, Hanson RJ (1974). Solving Least Squares Problems. Prentice-Hall.
Franc V, Hlavac V, Navara M (2005). Sequential Coordinate-Wise Algorithm for the Non-negative Least Squares Problem. CAIP 2005.
Lee DD, Seung HS (1999). Learning the parts of objects by non-negative matrix factorization. Nature 401:788-791.

Citation

If you use CellProgramMapper in your research, please cite:

@software{CellProgramMapper,
  author = {Liu, Zaoqu},
  title = {CellProgramMapper: Projection of Single-Cell Data onto Reference Gene Expression Programs},
  year = {2026},
  url = {https://github.com/Zaoqu-Liu/CellProgramMapper}
}

License

Contact

Author: Zaoqu Liu
Email: liuzaoqu@163.com
GitHub: https://github.com/Zaoqu-Liu/CellProgramMapper
