Train Cell Type-Specific NMF Models for Recovering Spatial EcoTypes
Source: R/NMFGenerateW.R
NMFGenerateWList.Rd
This function trains cell type-specific NMF (Non-Negative Matrix Factorization) models to recover SE-specific cell states from single-cell data, as part of the Spatial EcoTyper analysis workflow. It downsamples cells for training when the dataset size is large, and selects a subset of features with the highest specificity.
Usage
NMFGenerateWList(
  scdata,
  scmeta,
  CellType = "CellType",
  SE = "SE",
  scale = TRUE,
  Sample = NULL,
  balance.sample = TRUE,
  nfeature = 2000,
  nfeature.per.se = 50,
  min.cells = 20,
  downsample = 2500,
  ncores = 1,
  seed = 2024
)
Arguments
- scdata
Numeric matrix containing single-cell expression data.
- scmeta
Data frame containing metadata information associated with single-cell data, including cell types and spatial clusters.
- CellType
Character string specifying the column name in the metadata data frame containing cell type annotations.
- SE
Character string specifying the column name in the metadata data frame containing spatial ecotype annotations.
- scale
Boolean specifying whether to perform unit-variance normalization before training the models (default: TRUE).
- Sample
Character string specifying the column name in the metadata data frame containing sample annotations (default: NULL). If specified, unit-variance normalization is performed within each sample.
- balance.sample
Boolean specifying whether to balance the cells across samples before training the models (default: TRUE).
- nfeature
Integer specifying the number of top variable features to use for training the models (default: 2000).
- nfeature.per.se
Integer specifying the maximum number of features to select for each SE (default: 50).
- min.cells
Integer specifying the minimum number of cells required for each SE cell state (default: 20).
- downsample
Integer specifying the number of cells per cell type (downsampling) for training the NMF models (default: 2500).
- ncores
Integer specifying the number of CPU cores to use for parallel processing (default: 1).
- seed
Integer specifying the seed for random sampling during downsampling (default: 2024).
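Examples
The preprocessing options described above (scale, Sample, downsample, seed) can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy data, the helper names scale_within_sample and downsample_cells, and the choice to divide by the standard deviation without centering (so that values stay non-negative for NMF) are all assumptions made for illustration.

```r
# Illustrative sketch in base R only; NOT the package's implementation.
set.seed(2024)  # mirrors the documented default for `seed`

# Hypothetical toy inputs: 5 genes x 60 cells, plus per-cell metadata.
scdata <- matrix(rexp(5 * 60), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:60)))
scmeta <- data.frame(
  CellType = rep(c("Fibroblast", "Tcell"), times = c(40, 20)),
  Sample   = rep(c("S1", "S2"), length.out = 60),
  row.names = colnames(scdata)
)

# Scale each gene to unit variance within each sample (cf. `scale`, `Sample`).
# Assumption: no centering, so the matrix remains non-negative.
scale_within_sample <- function(mat, sample) {
  for (s in unique(sample)) {
    idx <- which(sample == s)
    sds <- apply(mat[, idx, drop = FALSE], 1, sd)
    sds[is.na(sds) | sds == 0] <- 1  # leave constant genes unchanged
    mat[, idx] <- mat[, idx, drop = FALSE] / sds  # row i divided by sds[i]
  }
  mat
}
scaled <- scale_within_sample(scdata, scmeta$Sample)

# Keep at most `downsample` cells per cell type (cf. `downsample`).
downsample_cells <- function(meta, celltype_col, downsample) {
  groups <- split(rownames(meta), meta[[celltype_col]])
  unlist(lapply(groups, function(cells) {
    if (length(cells) > downsample) sample(cells, downsample) else cells
  }), use.names = FALSE)
}
kept <- downsample_cells(scmeta, "CellType", downsample = 25)

# Cells retained per cell type after downsampling.
table(scmeta[kept, "CellType"])
```

In this toy setup the 40 fibroblasts are reduced to 25 cells, while the 20 T cells fall below the cap and are all kept. Fixing the seed makes the random subset reproducible across runs.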