
Perform leave-one-out cross-validation (LOOCV) for SE prediction
Source: R/LoocvPredict.R
LoocvPredict.Rd
Trains NMF-based spatial ecotype (SE) recovery models using subsets of single-cell data and evaluates predictions on held-out data. Supports repeated cross-validation and parallel execution.
Usage
LoocvPredict(
  scdata,
  scmeta,
  Sample = "Sample",
  CellType = "CellType",
  SE = "SE",
  repeats = 1,
  ncores = 4,
  scale = TRUE,
  verbose = TRUE,
  ...
)
Arguments
- scdata
A gene expression matrix (genes x cells).
- scmeta
A data.frame containing metadata for each cell. Row names of `scmeta` should match the column names in `scdata`.
- Sample
Character. Column name in `scmeta` specifying sample IDs. If fewer than two unique samples are present, cells are randomly split into training and test sets within each cell type.
- CellType
Character. Column name specifying cell type annotations in `scmeta`.
- SE
Character. Column name specifying spatial ecotype labels in `scmeta`.
- repeats
Integer. Number of cross-validation repeats.
- ncores
Integer. Number of cores for parallel computation.
- scale
Logical. Whether to apply unit-variance normalization to the training and validation data (default: TRUE).
- verbose
Logical. Whether to print log messages (default: TRUE).
- ...
Additional arguments passed to NMFGenerateWList.
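Because the row names of `scmeta` must match the column names of `scdata`, it can help to verify the inputs before starting a long cross-validation run. The following is a minimal base-R sketch on toy data; `check_inputs` is a hypothetical helper written for illustration and is not part of the package.

```r
# Toy inputs (hypothetical): 5 genes x 6 cells
set.seed(1)
scdata <- matrix(rpois(30, lambda = 5), nrow = 5,
                 dimnames = list(paste0("gene", 1:5), paste0("cell", 1:6)))
scmeta <- data.frame(
  Sample   = rep(c("S1", "S2", "S3"), each = 2),
  CellType = rep(c("Tcell", "Bcell"), times = 3),
  SE       = c("SE1", "SE1", "SE2", "SE2", "SE1", "SE2"),
  row.names = paste0("cell", 6:1)  # deliberately in reverse order
)

# Hypothetical helper: check required columns and cell IDs, then
# reorder the metadata rows to follow the columns of scdata
check_inputs <- function(scdata, scmeta,
                         cols = c("Sample", "CellType", "SE")) {
  missing_cols <- setdiff(cols, colnames(scmeta))
  if (length(missing_cols) > 0) {
    stop("Missing metadata columns: ", paste(missing_cols, collapse = ", "))
  }
  if (!setequal(rownames(scmeta), colnames(scdata))) {
    stop("Row names of scmeta do not match column names of scdata")
  }
  scmeta[colnames(scdata), , drop = FALSE]
}

scmeta <- check_inputs(scdata, scmeta)
```

After this check, `scmeta` has one row per column of `scdata`, in the same order.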
Value
The input `scmeta` data.frame with an added column `cvPred` containing predicted SE labels for each cell.
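Since the returned data.frame holds both the original labels and `cvPred`, agreement between the two can be summarized with base R. The sketch below uses a small hand-made data.frame as a stand-in for the real output; the values are invented for illustration, and the column names `SE` and `CellType` assume the default arguments.

```r
# Toy stand-in for the returned data.frame (hypothetical values)
res <- data.frame(
  CellType = c("Tcell", "Tcell", "Bcell", "Bcell", "Tcell", "Bcell"),
  SE       = c("SE1", "SE1", "SE2", "SE2", "SE1", "SE2"),
  cvPred   = c("SE1", "SE2", "SE2", "SE2", "SE1", "SE1"),
  stringsAsFactors = FALSE
)

# Fraction of cells whose held-out prediction matches the label
overall_acc <- mean(res$SE == res$cvPred, na.rm = TRUE)

# Confusion table of true versus predicted SE labels
confusion <- table(Truth = res$SE, Predicted = res$cvPred)

# Accuracy broken down by cell type
acc_by_type <- tapply(res$SE == res$cvPred, res$CellType, mean)
```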
Details
For each repeat:
- If multiple samples are available, performs leave-one-sample-out CV.
- Otherwise, performs random stratified splitting within cell types.
- Trains NMF models using NMFGenerateWList.
- Predicts SE labels using RecoverSE.
Predictions across repeats are aggregated by majority vote per cell.
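The splitting and aggregation logic described above can be sketched in base R. This is an illustration under stated assumptions, not the package's implementation: `make_loso_folds`, `stratified_split`, and `majority_vote` are hypothetical helpers, the 50% test fraction is an assumed value, and the tie-breaking rule (first label in alphabetical order) is a choice made here because the documentation does not specify one.

```r
# Hypothetical helper: one fold per sample, holding that sample out
make_loso_folds <- function(samples) {
  ids <- unique(samples)
  folds <- lapply(ids, function(s) {
    list(train = which(samples != s), test = which(samples == s))
  })
  names(folds) <- ids
  folds
}

# Hypothetical helper: random split within each cell type
# (test_frac = 0.5 is an assumption, not a documented default)
stratified_split <- function(celltypes, test_frac = 0.5) {
  by_type <- split(seq_along(celltypes), celltypes)
  test <- unlist(lapply(by_type, function(idx) {
    n_test <- max(1, floor(length(idx) * test_frac))
    idx[sample.int(length(idx), n_test)]
  }), use.names = FALSE)
  list(train = setdiff(seq_along(celltypes), test), test = sort(test))
}

# Hypothetical helper: majority vote per cell
# (rows = cells, columns = repeats; ties go to the first label alphabetically)
majority_vote <- function(pred_matrix) {
  apply(pred_matrix, 1, function(x) {
    x <- x[!is.na(x)]
    if (length(x) == 0) return(NA_character_)
    names(which.max(table(x)))
  })
}

samples   <- c("S1", "S1", "S2", "S2", "S3")
celltypes <- c("Tcell", "Tcell", "Bcell", "Bcell", "Tcell")

folds <- make_loso_folds(samples)      # used when there are multiple samples
set.seed(42)
ss <- stratified_split(celltypes)      # fallback when there is a single sample

# Toy predictions from three repeats (invented values)
preds <- cbind(
  rep1 = c("SE1", "SE2", "SE1", "SE2", "SE1"),
  rep2 = c("SE1", "SE2", "SE2", "SE2", "SE1"),
  rep3 = c("SE1", "SE1", "SE2", "SE2", "SE2")
)
cvPred <- majority_vote(preds)
```

In the leave-one-sample-out case every cell is held out exactly once per repeat, so each cell receives one prediction per repeat before voting.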