
This function trains cell type-specific NMF (non-negative matrix factorization) models to recover spatial ecotype (SE)-specific cell states from single-cell data, as part of the Spatial EcoTyper analysis workflow. When the dataset is large, it downsamples cells before training, and it restricts training to the subset of features with the highest specificity.

Usage

NMFGenerateWList(
  scdata,
  scmeta,
  CellType = "CellType",
  SE = "SE",
  scale = TRUE,
  Sample = NULL,
  balance.sample = TRUE,
  nfeature = 2000,
  nfeature.per.se = 50,
  min.cells = 20,
  downsample = 2500,
  ncores = 1,
  seed = 2024
)

Arguments

scdata

Numeric matrix containing single-cell expression data.

scmeta

Data frame containing metadata associated with the single-cell data, including cell type and spatial ecotype (SE) annotations.

CellType

Character string specifying the column name in the metadata data frame containing cell type annotations (default: "CellType").

SE

Character string specifying the column name in the metadata data frame containing spatial ecotype annotations (default: "SE").

scale

Boolean specifying whether to perform unit-variance normalization before training the models (default: TRUE).

Sample

Character string specifying the column name in the metadata data frame containing sample annotations (default: NULL). If specified, unit-variance normalization is performed within each sample.
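The effect of combining `scale` with `Sample` can be illustrated with a minimal base R sketch. It is not the package's implementation: the helper `scale_within_sample()` is hypothetical, the genes-by-cells matrix layout is an assumption, and the sketch only divides by the per-sample standard deviation (no centering), which may differ from what the function does internally.

```r
# Toy expression matrix: 3 genes (rows) x 6 cells (columns).
# The genes-by-cells layout is an assumption made for this sketch.
set.seed(1)
expr <- matrix(rexp(18), nrow = 3,
               dimnames = list(paste0("gene", 1:3), paste0("cell", 1:6)))
sample_id <- c("S1", "S1", "S1", "S2", "S2", "S2")

# Hypothetical helper: scale each gene to unit variance within each sample.
scale_within_sample <- function(mat, samples) {
  out <- mat
  for (s in unique(samples)) {
    idx <- which(samples == s)
    # Per-gene standard deviation across the cells of this sample
    sds <- apply(mat[, idx, drop = FALSE], 1, sd)
    sds[is.na(sds) | sds == 0] <- 1  # leave constant genes unchanged
    # Dividing a matrix by a vector of length nrow scales row i by sds[i]
    out[, idx] <- mat[, idx, drop = FALSE] / sds
  }
  out
}

scaled <- scale_within_sample(expr, sample_id)
apply(scaled[, sample_id == "S1"], 1, sd)  # per-gene SD within sample S1
```

Scaling within each sample, rather than across all cells, keeps a sample with globally higher expression from dominating the variance of a gene.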

balance.sample

Boolean specifying whether to balance the number of cells across samples before training the models (default: TRUE).

nfeature

Integer specifying the number of top variable features used for training the models (default: 2000).

nfeature.per.se

Integer specifying the maximum number of features to select for each SE (default: 50).

min.cells

Integer specifying the minimum number of cells required for each SE cell state (default: 20).
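As a rough illustration of a `min.cells`-style filter, the base R sketch below counts cells per cell type and SE combination and keeps only combinations that reach a threshold. Treating an "SE cell state" as a cell type by SE combination is an assumption of this sketch, and the toy metadata and the small threshold are made up for readability.

```r
# Toy metadata with the two annotation columns named as in the defaults.
meta <- data.frame(
  CellType = c(rep("Fibroblast", 5), rep("Tcell", 3)),
  SE       = c("SE1", "SE1", "SE1", "SE2", "SE1", "SE1", "SE2", "SE2")
)

min_cells <- 2  # illustrative threshold; the documented default is 20

# Count cells in every CellType x SE combination
counts <- as.data.frame(table(meta$CellType, meta$SE),
                        stringsAsFactors = FALSE)
names(counts) <- c("CellType", "SE", "n")

# Keep only combinations with enough cells
keep <- counts[counts$n >= min_cells, ]
keep
```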

downsample

Integer specifying the maximum number of cells to keep per cell type when downsampling for NMF training (default: 2500).
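Seeded downsampling per cell type can be sketched in base R as follows. The helper `downsample_cells()` is hypothetical and only illustrates the idea of capping each cell type at a fixed number of cells; how the function itself samples cells, and how this interacts with `balance.sample`, is not shown here.

```r
# Toy metadata: 10 fibroblasts and 4 T cells, with cell IDs as row names.
meta <- data.frame(
  CellType = rep(c("Fibroblast", "Tcell"), times = c(10, 4)),
  row.names = paste0("cell", 1:14)
)

# Hypothetical helper: keep at most `n` cells per group, reproducibly.
downsample_cells <- function(meta, group_col = "CellType", n = 5, seed = 2024) {
  set.seed(seed)  # a fixed seed makes the random subset reproducible
  groups <- split(rownames(meta), meta[[group_col]])
  kept <- lapply(groups, function(ids) {
    # Only sample when the group exceeds the cap; small groups are kept whole
    if (length(ids) > n) sample(ids, n) else ids
  })
  unlist(kept, use.names = FALSE)
}

kept <- downsample_cells(meta, n = 5)
table(meta[kept, "CellType"])  # cells retained per cell type
```

Capping abundant cell types bounds the training cost per model, while rare cell types below the cap are left untouched.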

ncores

Integer specifying the number of CPU cores to use for parallel processing (default: 1).

seed

Integer specifying the seed for random sampling during downsampling (default: 2024).

Value

A list of cell type-specific NMF models, each represented by its corresponding factorization matrix W.
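To make the role of W concrete, the sketch below factorizes a small non-negative matrix V into W and H (V ≈ WH) with standard multiplicative updates, then stores W in a list named by cell type. This is a toy illustration only: the helper `nmf_sketch()`, the number of factors, the solver, the genes-by-cells layout, and the list naming are all assumptions and are not taken from the package.

```r
# Minimal NMF via multiplicative updates (Frobenius-norm objective), base R only.
# Hypothetical helper for illustration; not the solver used by the package.
nmf_sketch <- function(V, k, n_iter = 200, seed = 2024, eps = 1e-9) {
  set.seed(seed)
  # Random non-negative initialization
  W <- matrix(runif(nrow(V) * k), nrow(V), k)
  H <- matrix(runif(k * ncol(V)), k, ncol(V))
  for (i in seq_len(n_iter)) {
    # Element-wise updates keep W and H non-negative; eps avoids division by zero
    H <- H * (t(W) %*% V) / (t(W) %*% W %*% H + eps)
    W <- W * (V %*% t(H)) / (W %*% H %*% t(H) + eps)
  }
  dimnames(W) <- list(rownames(V), paste0("state", seq_len(k)))
  list(W = W, H = H)
}

# Toy non-negative matrix: 8 genes (rows) x 5 cells (columns)
set.seed(1)
V <- matrix(rexp(40), nrow = 8,
            dimnames = list(paste0("gene", 1:8), paste0("cell", 1:5)))

# One W matrix per cell type, collected in a named list
w_list <- list(Fibroblast = nmf_sketch(V, k = 2)$W)
dim(w_list$Fibroblast)  # features x factors
```

In this sketch each column of W holds the feature weights of one factor, which is why a fitted W can later be reused to score new cells on the same features.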