
Trains NMF-based spatial ecotype (SE) recovery models using subsets of single-cell data and evaluates predictions on held-out data. Supports repeated cross-validation and parallel execution.

Usage

LoocvPredict(
  scdata,
  scmeta,
  Sample = "Sample",
  CellType = "CellType",
  SE = "SE",
  repeats = 1,
  ncores = 4,
  scale = TRUE,
  verbose = TRUE,
  ...
)

Arguments

scdata

A gene expression matrix (genes x cells).

scmeta

A data.frame containing metadata for each cell. Row names of `scmeta` should match the column names in `scdata`.

Sample

Character. Column name in `scmeta` specifying sample IDs. If fewer than two unique samples are present, cells are randomly split into training and test sets within each cell type.

CellType

Character. Column name specifying cell type annotations in `scmeta`.

SE

Character. Column name specifying spatial ecotype labels in `scmeta`.

repeats

Integer. Number of cross-validation repeats (default: 1).

ncores

Integer. Number of cores for parallel computation (default: 4).

scale

Logical. Whether to apply unit-variance normalization to the training and validation data (default: TRUE).

verbose

Logical. Whether to print log messages (default: TRUE).

...

Additional arguments passed to NMFGenerateWList.
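
The input requirements above can be checked before calling the function. The following is a minimal sketch on made-up toy data, using only base R. It aligns the rows of `scmeta` to the columns of `scdata`, then illustrates one common reading of the normalization controlled by `scale` (centering and scaling each gene to unit variance across cells). Whether the function reorders metadata internally, and exactly how it normalizes, is not stated here, so both steps are assumptions.

```r
# Toy inputs: 3 genes x 4 cells (all names are made up for illustration).
scdata <- matrix(
  c(1, 0, 3, 2, 5, 1, 0, 4, 2, 2, 1, 7),
  nrow = 3,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)
scmeta <- data.frame(
  Sample    = c("S1", "S1", "S2", "S2"),
  CellType  = c("B", "T", "B", "T"),
  SE        = c("SE1", "SE2", "SE1", "SE2"),
  row.names = paste0("cell", c(2, 1, 4, 3))  # deliberately out of order
)

# Every cell in the matrix must have a metadata row.
missing_cells <- setdiff(colnames(scdata), rownames(scmeta))
stopifnot(length(missing_cells) == 0)

# Reorder metadata rows to follow the matrix columns.
scmeta <- scmeta[colnames(scdata), , drop = FALSE]
aligned <- identical(rownames(scmeta), colnames(scdata))

# Assumed meaning of unit-variance normalization: per-gene z-scoring.
# scale() works column-wise, so transpose, scale, and transpose back.
scaled <- t(scale(t(scdata), center = TRUE, scale = TRUE))
gene_sd <- apply(scaled, 1, sd)
```

A gene with zero variance across cells would produce `NaN` values under this scaling, so such genes are usually filtered out first.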

Value

The input `scmeta` data.frame with an added column `cvPred` containing predicted SE labels for each cell.
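
Because the returned data.frame keeps the original SE column next to `cvPred`, held-out performance can be summarized directly. The sketch below uses a hand-made stand-in for the output (the values are illustrative, not real results) and assumes the SE column is named `SE`.

```r
# Hand-made stand-in for the returned data.frame (illustrative values only).
res <- data.frame(
  CellType = c("B", "B", "T", "T", "T", "B"),
  SE       = c("SE1", "SE1", "SE2", "SE2", "SE1", "SE2"),
  cvPred   = c("SE1", "SE2", "SE2", "SE2", "SE1", "SE2")
)

# Fraction of cells whose held-out prediction matches the reference label.
# na.rm = TRUE is a precaution in case some cells carry no prediction.
accuracy <- mean(res$cvPred == res$SE, na.rm = TRUE)

# Confusion table: rows are reference labels, columns are predictions.
confusion <- table(Reference = res$SE, Predicted = res$cvPred)
print(confusion)
```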

Details

For each repeat:

  • If two or more samples are available, performs leave-one-sample-out cross-validation, holding out each sample once as the test set

  • Otherwise, randomly splits cells into training and test sets within each cell type

  • Trains NMF models using NMFGenerateWList

  • Predicts SE labels using RecoverSE
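
The choice between the two splitting strategies in the list above can be sketched in base R. `make_folds` is a hypothetical helper written for illustration, not a function from the package, and the 50% test fraction used in the fallback split is an assumption, since the actual proportion is not documented here.

```r
# Hypothetical helper: returns a list of folds, each with train/test cell names.
make_folds <- function(meta, sample_col = "Sample", celltype_col = "CellType",
                       test_frac = 0.5) {
  samples <- unique(as.character(meta[[sample_col]]))
  if (length(samples) >= 2) {
    # Leave-one-sample-out: each sample is held out once.
    folds <- lapply(samples, function(s) {
      is_test <- meta[[sample_col]] == s
      list(train = rownames(meta)[!is_test], test = rownames(meta)[is_test])
    })
    names(folds) <- samples
    return(folds)
  }
  # Fallback: random split within each cell type (test_frac is an assumption).
  by_type <- split(rownames(meta), meta[[celltype_col]])
  test <- unlist(lapply(by_type, function(cells) {
    cells[sample.int(length(cells), size = floor(length(cells) * test_frac))]
  }), use.names = FALSE)
  list(split1 = list(train = setdiff(rownames(meta), test), test = test))
}

# Toy metadata: 3 samples, 2 cell types, 12 cells.
meta <- data.frame(
  Sample    = rep(c("S1", "S2", "S3"), each = 4),
  CellType  = rep(c("B", "T"), times = 6),
  row.names = paste0("cell", 1:12)
)
loso <- make_folds(meta)          # three folds, one per sample

meta_one <- meta
meta_one$Sample <- "S1"           # pretend only one sample exists
set.seed(42)
single <- make_folds(meta_one)    # one stratified random split
```

In the real function, each fold's training cells would go to NMFGenerateWList and its test cells to RecoverSE; those calls are omitted here because their signatures are not shown in this page.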

Predictions across repeats are aggregated by majority vote per cell.
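
The majority vote can be illustrated with a small base R sketch. The predictions are invented for the example, and the tie-breaking rule (the label that sorts first wins) is an assumption; how the function resolves ties is not documented here.

```r
# Predictions from three hypothetical repeats (rows = cells, columns = repeats).
preds <- cbind(
  rep1 = c("SE1", "SE2", "SE1", "SE3"),
  rep2 = c("SE1", "SE2", "SE2", "SE3"),
  rep3 = c("SE2", "SE2", "SE1", "SE1")
)
rownames(preds) <- paste0("cell", 1:4)

# Most frequent label per cell; on a tie, the label that sorts first wins
# (a tie-breaking assumption made for this sketch).
majority_vote <- function(labels) {
  counts <- table(labels)
  names(counts)[which.max(counts)]
}

cvPred <- apply(preds, 1, majority_vote)
print(cvPred)
```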