Perform leave-one-out cross-validation (LOOCV) for SE prediction

Trains NMF-based spatial ecotype (SE) recovery models using subsets of single-cell data and evaluates predictions on held-out data. Supports repeated cross-validation and parallel execution.

Usage

LoocvPredict(
  scdata,
  scmeta,
  Sample = "Sample",
  CellType = "CellType",
  SE = "SE",
  repeats = 30,
  ncores = 4,
  scale = TRUE,
  verbose = TRUE,
  ...
)

Arguments

scdata: A gene expression matrix (genes x cells).
scmeta: A data.frame containing metadata for each cell. Row names of `scmeta` should match the column names in `scdata`.
Sample: Character. Column name in `scmeta` specifying sample IDs. If fewer than two unique samples are present, cells are randomly split into training and test sets within each cell type.
CellType: Character. Column name specifying cell type annotations in `scmeta`.
SE: Character. Column name specifying spatial ecotype labels in `scmeta`.
repeats: Integer. Number of cross-validation repeats (default: 30).
ncores: Integer. Number of cores for parallel computation (default: 4).
scale: Logical. Whether to scale the training and validation data to unit variance (default: TRUE).
verbose: Logical. Whether to print log messages (default: TRUE).
...: Additional arguments passed to NMFGenerateWList.
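
As an illustration of the expected input shapes, the sketch below builds toy `scdata` and `scmeta` objects and checks that they are aligned. All values are made up. The per-gene scaling at the end is only one common reading of unit-variance normalization and is an assumption; the function may scale differently internally.

```r
set.seed(1)

# Toy expression matrix: 5 genes x 6 cells (values are made up)
scdata <- matrix(
  rexp(30), nrow = 5,
  dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6))
)

# Toy metadata: one row per cell, row names matching colnames(scdata)
scmeta <- data.frame(
  Sample    = rep(c("S1", "S2"), each = 3),
  CellType  = c("B", "T", "T", "B", "B", "T"),
  SE        = c("SE1", "SE1", "SE2", "SE2", "SE1", "SE2"),
  row.names = colnames(scdata)
)

# Check the alignment that the argument description asks for
stopifnot(identical(rownames(scmeta), colnames(scdata)))

# Assumed form of unit-variance scaling: divide each gene (row) by its
# standard deviation. A matrix divided by a vector of length nrow recycles
# down the columns, so row i is divided by gene_sd[i].
gene_sd <- apply(scdata, 1, sd)
scaled  <- scdata / gene_sd
```

If the row names and column names are in a different order, reordering with `scmeta[colnames(scdata), ]` before the call is one way to align them.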

Value

The input `scmeta` data.frame with an added column `cvPred` containing predicted SE labels for each cell.
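
Because the returned data.frame carries both the annotated labels and `cvPred`, agreement between the two can be summarized directly. The sketch below uses a hypothetical result with made-up labels, and assumes the annotation column kept its default name `SE`.

```r
# Hypothetical result, shaped like the returned data.frame (labels are made up)
res <- data.frame(
  SE     = c("SE1", "SE1", "SE2", "SE2", "SE1", "SE2"),
  cvPred = c("SE1", "SE2", "SE2", "SE2", "SE1", "SE1"),
  stringsAsFactors = FALSE
)

# Overall agreement between annotated and predicted labels;
# na.rm = TRUE skips cells that have no prediction
accuracy <- mean(res$SE == res$cvPred, na.rm = TRUE)

# Confusion table: rows are annotated labels, columns are predictions
confusion <- table(Annotated = res$SE, Predicted = res$cvPred)

print(accuracy)
print(confusion)
```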

Details

For each repeat:

If multiple samples are available, performs leave-one-sample-out CV; otherwise, performs random stratified splitting within cell types.
Trains NMF models on the training cells using NMFGenerateWList.
Predicts SE labels for the held-out cells using RecoverSE.
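
The choice between the two splitting strategies can be sketched in base R as follows. `split_cells` and its `test_frac` argument are hypothetical names introduced for illustration, not part of the package. How the held-out sample is chosen and what fraction of cells is held out in the single-sample case are not stated here, so both are assumptions.

```r
# Hypothetical helper: returns a logical vector, TRUE = held-out (test) cell
split_cells <- function(meta, sample_col = "Sample",
                        celltype_col = "CellType", test_frac = 0.25) {
  samples <- unique(meta[[sample_col]])
  if (length(samples) >= 2) {
    # Leave-one-sample-out: hold out all cells of one randomly chosen sample
    held_out <- sample(samples, 1)
    return(meta[[sample_col]] == held_out)
  }
  # Single sample: hold out a random fraction of cells within each cell type
  is_test <- logical(nrow(meta))
  for (idx in split(seq_len(nrow(meta)), meta[[celltype_col]])) {
    n_test <- max(1, floor(length(idx) * test_frac))
    is_test[idx[sample.int(length(idx), n_test)]] <- TRUE
  }
  is_test
}

set.seed(42)

# Three samples: one whole sample is held out
meta_multi <- data.frame(
  Sample   = rep(c("S1", "S2", "S3"), each = 4),
  CellType = rep(c("B", "T"), 6)
)
test_multi <- split_cells(meta_multi)

# One sample: cells are split within each cell type
meta_single <- data.frame(
  Sample   = "S1",
  CellType = rep(c("B", "T"), each = 8)
)
test_single <- split_cells(meta_single)
```

In this simplified version a cell type with a single cell is held out entirely, which leaves nothing to train on for that type; a real implementation would need to handle that case.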

Predictions across repeats are aggregated by majority vote per cell.
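
A per-cell majority vote can be sketched in base R as below. The prediction matrix is hypothetical, `majority_vote` is an illustrative helper rather than a package function, and the tie-breaking rule shown is an assumption.

```r
# Hypothetical predictions: one row per cell, one column per repeat.
# NA marks repeats in which the cell was not held out.
preds <- rbind(
  cell1 = c("SE1", "SE1", "SE2"),
  cell2 = c("SE2", NA,    "SE2"),
  cell3 = c(NA,    NA,    NA)
)

# Most frequent non-missing label; NA if the cell was never predicted
majority_vote <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA_character_)
  counts <- table(x)
  # which.max returns the first maximum, so ties go to the label that
  # sorts first (assumed rule)
  names(counts)[which.max(counts)]
}

cvPred <- apply(preds, 1, majority_vote)
print(cvPred)
```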