Integrate Multiple Spatial Transcriptomics Datasets to Identify Conserved Spatial Ecotypes

This function performs SpatialEcoTyper analysis on multiple spatial transcriptomics datasets. It normalizes the input data, performs SpatialEcoTyper analysis on each dataset, and integrates the results across samples.

Usage

MultiSpatialEcoTyper(
  data_list,
  metadata_list,
  outdir = "./",
  normalization.method = "None",
  nmf_ranks = 10,
  nrun.per.rank = 30,
  min.coph = 0.95,
  radius = 50,
  min.cts.per.region = 1,
  nfeatures = 3000,
  min.features = 10,
  Region = NULL,
  subresolution = 30,
  minibatch = 5000,
  ncores = 1,
  seed = 1,
  filter.region.by.celltypes = NULL,
  ...
)

Arguments

data_list: A named list of expression matrices, one per sample. In each matrix, rows correspond to genes and columns to cells. Use sample names as the list names; otherwise, the samples will be named 'Sample1' through 'SampleN'.
metadata_list: A named list of metadata data frames, one per sample, describing the cells in the corresponding expression matrix. Each row should correspond to a column (cell) in that matrix. Each data frame must include at least three columns: X, Y, and CellType.
outdir: Directory where the results will be saved. Defaults to the current directory ("./"), with a subdirectory named "SpatialEcoTyper_results_" followed by the current date.
normalization.method: Method for normalizing the expression data. Options include "None" (default), "SCT", or other methods compatible with Seurat's `NormalizeData` function.
nmf_ranks: An integer or an integer vector specifying the number of clusters (10 by default). When a vector is supplied, the function tests every supplied value and selects the optimal one, which increases run time.
nrun.per.rank: An integer specifying the number of runs per rank for NMF (default: 30).
min.coph: Numeric specifying the minimum cophenetic coefficient required for a rank to be considered optimal (default: 0.95).
radius: Numeric specifying the radius (in the same units as spatial coordinates) for defining spatial neighborhoods around each cell. Default is 50.
min.cts.per.region: Integer specifying the minimum number of cell types required for a microregion.
nfeatures: An integer specifying the maximum number of top variable genes to select for each cell type.
min.features: An integer specifying the minimum number of shared features (genes) required across samples.
Region: Character string specifying the column name in metadata data frames containing region annotations (default: NULL). Pathologist annotation is recommended if available.
subresolution: Numeric specifying the resolution for clustering within each sample.
minibatch: Integer specifying the number of columns to process in each minibatch in the SNF analysis. Default is 5000. This option splits the matrix into smaller chunks (minibatch), thus reducing memory usage.
ncores: Integer specifying the number of cores for parallel processing. Default is 1.
seed: An integer used to seed the random number generator for NMF analysis.
filter.region.by.celltypes: A character vector specifying the cell types to include in the analysis. Only spatial microregions that contain at least one of the specified cell types will be analyzed, while regions lacking these cell types will be excluded from the SE discovery process. If NULL, all spatial microregions will be included, regardless of cell type composition.
...: Additional arguments passed to the `SpatialEcoTyper` function.
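
The input layout described above can be checked before a run. The following is a minimal sketch in base R using small synthetic data. The helpers `make_sample` and `check_inputs`, the toy sample names, and the convention that metadata row names carry cell IDs are assumptions made for illustration; they are not part of the package.

```r
# Toy inputs (synthetic, illustration only): genes in rows, cells in columns.
set.seed(1)
make_sample <- function(n_genes, n_cells, prefix) {
  mat <- matrix(rpois(n_genes * n_cells, lambda = 2), nrow = n_genes,
                dimnames = list(paste0("Gene", seq_len(n_genes)),
                                paste0(prefix, "_cell", seq_len(n_cells))))
  # One metadata row per cell, with the three required columns.
  meta <- data.frame(X = runif(n_cells, 0, 500),
                     Y = runif(n_cells, 0, 500),
                     CellType = sample(c("Tcell", "Bcell", "Fibroblast"),
                                       n_cells, replace = TRUE),
                     row.names = colnames(mat))
  list(expr = mat, meta = meta)
}
s1 <- make_sample(20, 50, "S1")
s2 <- make_sample(20, 40, "S2")
data_list <- list(SampleA = s1$expr, SampleB = s2$expr)
metadata_list <- list(SampleA = s1$meta, SampleB = s2$meta)

# Hypothetical helper: verifies the documented input layout.
check_inputs <- function(data_list, metadata_list) {
  stopifnot(length(data_list) == length(metadata_list))
  # Mirror the documented fallback naming for unnamed lists.
  if (is.null(names(data_list))) {
    names(data_list) <- paste0("Sample", seq_along(data_list))
    names(metadata_list) <- names(data_list)
  }
  for (s in names(data_list)) {
    meta <- metadata_list[[s]]
    stopifnot(all(c("X", "Y", "CellType") %in% colnames(meta)),
              nrow(meta) == ncol(data_list[[s]]))
  }
  invisible(names(data_list))
}
sample_names <- check_inputs(data_list, metadata_list)
```

The check stops with an error if a sample lacks one of the required columns or if the number of metadata rows differs from the number of cells.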

Value

The function saves the results in the specified output directory.

Details

This function takes a list of gene expression matrices and corresponding metadata, normalizes the data if specified, performs SpatialEcoTyper on each sample, and integrates the results across multiple samples to identify conserved spatial ecotypes.
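
Because integration depends on features shared across samples (see `min.features`), it can help to check the gene overlap beforehand. The sketch below uses synthetic gene lists and assumes that genes are matched by name; how the package itself counts shared features may differ.

```r
# Synthetic gene sets for three samples (illustration only).
genes_by_sample <- list(
  Sample1 = c("CD3E", "CD8A", "MS4A1", "COL1A1", "EPCAM"),
  Sample2 = c("CD3E", "CD8A", "MS4A1", "COL1A1", "PECAM1"),
  Sample3 = c("CD3E", "MS4A1", "COL1A1", "EPCAM", "PECAM1")
)

# Genes present in every sample.
shared_genes <- Reduce(intersect, genes_by_sample)

# Compare against a chosen threshold (3 here, purely for this toy example).
min.features <- 3
enough_shared <- length(shared_genes) >= min.features
```

With real inputs, the same overlap could be obtained with `Reduce(intersect, lapply(data_list, rownames))`, assuming gene names are stored as row names.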

Examples

# See https://digitalcytometry.github.io/spatialecotyper/docs/articles/Integration.html
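
As a rough illustration of a call, the sketch below wraps the function with the argument names and default values shown under Usage. It assumes the package is installed under the name `SpatialEcoTyper` and that `data_list` and `metadata_list` follow the layout described under Arguments; the wrapper `run_integration` is hypothetical.

```r
# Sketch only: run_integration is a hypothetical wrapper, not a package function.
run_integration <- function(data_list, metadata_list, outdir = "./") {
  # Assumption: the package namespace is "SpatialEcoTyper".
  if (!requireNamespace("SpatialEcoTyper", quietly = TRUE)) {
    message("SpatialEcoTyper is not installed; skipping.")
    return(invisible(FALSE))
  }
  # Argument names and values are taken from the Usage section.
  SpatialEcoTyper::MultiSpatialEcoTyper(
    data_list, metadata_list,
    outdir = outdir,
    normalization.method = "None",
    nmf_ranks = 10,
    radius = 50,
    ncores = 1,
    seed = 1
  )
  invisible(TRUE)
}
```

Real analyses should follow the linked vignette for data preparation and parameter choices.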