
Train Cell Type-Specific NMF Models for Recovering Spatial EcoTypes
Source: R/NMFGenerateW.R
NMFGenerateWList.Rd

This function trains cell type-specific NMF (Non-Negative Matrix Factorization) models to recover SE-specific cell states from single-cell data, as part of the Spatial EcoTyper analysis workflow. When the dataset is large, it downsamples cells for training, and it selects the subset of features with the highest specificity.
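The downsampling step described above can be pictured with a small base R sketch. This is an illustration under assumptions, not the package's internal code: the helper `downsample_by_group` and the toy vectors are hypothetical, and the real function may sample differently (for example, when balancing samples).

```r
# Hypothetical helper (not part of the package): cap the number of cells
# kept per cell type by random sampling with a fixed seed.
downsample_by_group <- function(cell_ids, groups, n = 2500, seed = 2024) {
  stopifnot(length(cell_ids) == length(groups))
  set.seed(seed)
  by_group <- split(cell_ids, groups)
  kept <- lapply(by_group, function(ids) {
    # Groups at or below the cap are kept whole
    if (length(ids) > n) sample(ids, n) else ids
  })
  unlist(kept, use.names = FALSE)
}

# Toy metadata: 5 B cells and 12 T cells, capped at 8 cells per type
cells <- paste0("cell", 1:17)
types <- rep(c("B", "T"), times = c(5, 12))
kept <- downsample_by_group(cells, types, n = 8)
table(types[match(kept, cells)])
```

With a cap of 8, all 5 B cells are retained and the T cells are reduced to 8. Fixing the seed makes the selection reproducible across runs.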
Usage
NMFGenerateWList(
  scdata,
  scmeta,
  CellType = "CellType",
  SE = "SE",
  scale = TRUE,
  Sample = NULL,
  balance.sample = TRUE,
  nfeature = 2000,
  nfeature.per.se = 50,
  min.cells = 20,
  downsample = 2500,
  ncores = 1,
  seed = 2024
)
Arguments
- scdata
- Numeric matrix containing single-cell expression data. 
- scmeta
- Data frame containing metadata information associated with single-cell data, including cell types and spatial clusters. 
- CellType
- Character string specifying the column name in the metadata data frame containing cell type annotations. 
- SE
- Character string specifying the column name in the metadata data frame containing spatial ecotype annotations. 
- scale
- Logical value specifying whether to perform unit-variance normalization before training the models (default: TRUE). 
- Sample
- Character string specifying the column name in the metadata data frame containing sample annotations (default: NULL). If specified, the unit-variance normalization is performed within each sample. 
- balance.sample
- Logical value specifying whether to balance the number of cells across samples before training the models (default: TRUE). 
- nfeature
- Integer specifying the number of top variable features used for training the models (default: 2000). 
- nfeature.per.se
- Integer specifying the maximum number of features to select for each SE (default: 50). 
- min.cells
- Integer specifying the minimum number of cells required for each SE cell state (default: 20). 
- downsample
- Integer specifying the number of cells per cell type to retain after downsampling for training the NMF models (default: 2500). 
- ncores
- Integer specifying the number of CPU cores to use for parallel processing (default: 1). 
- seed
- Integer specifying the seed for random sampling during downsampling (default: 2024).
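
Examples

The following sketch builds toy inputs in the shape the arguments describe and shows what a call might look like. Several details are assumptions rather than documented facts: that `scdata` has genes in rows and cells in columns, that the row names of `scmeta` match the column names of `scdata`, and that the result can be stored under the hypothetical name `W_list`. The call itself is wrapped in `if (FALSE)` because it requires the package to be loaded and has not been verified on toy data.

```r
set.seed(1)
n_genes <- 30
n_cells <- 120

# Toy expression matrix (assumed layout: genes x cells)
scdata <- matrix(
  rpois(n_genes * n_cells, lambda = 3),
  nrow = n_genes,
  dimnames = list(paste0("gene", seq_len(n_genes)),
                  paste0("cell", seq_len(n_cells)))
)

# Toy metadata with one row per cell (assumed to be keyed by cell name)
scmeta <- data.frame(
  CellType = rep(c("Fibroblast", "Tcell"), each = n_cells / 2),
  SE       = rep(c("SE1", "SE2", "SE3"), length.out = n_cells),
  Sample   = rep(c("S1", "S2"), length.out = n_cells),
  row.names = colnames(scdata)
)

# Sanity checks before training
stopifnot(identical(rownames(scmeta), colnames(scdata)))
cells_per_state <- table(scmeta$CellType, scmeta$SE)
print(cells_per_state)  # each cell type x SE combination should reach min.cells

if (FALSE) {
  # Not run: requires the package that provides NMFGenerateWList()
  W_list <- NMFGenerateWList(
    scdata, scmeta,
    CellType = "CellType", SE = "SE", Sample = "Sample",
    nfeature = 20, nfeature.per.se = 5,
    min.cells = 20, downsample = 50,
    ncores = 1, seed = 2024
  )
}
```

Checking the cell counts per cell type and SE beforehand is useful because combinations with fewer cells than `min.cells` do not meet the requirement stated for that argument.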