Skip to contents

Overview

In this tutorial, we will illustrate how to recover spatial ecotypes (SEs) from single-cell-scale spatial transcriptomics (ST) data, generated by platforms such as MERSCOPE, Xenium, CosMx, and Visium HD.

First load required packages for this vignette

Data preparation

SE recovery for single-cell ST data requires two inputs:

  • Gene expression matrix: a numeric matrix with genes as rows and cells as columns.
  • Metadata: a data frame containing at least three columns: “X” (x-coordinate), “Y” (y-coordinate), and “CellType” (cell type annotation). Row names must match the column names (cell IDs) of the expression matrix.
Text files as input
# Load metadata
scmeta <- read.table("https://spatialecotyper.stanford.edu/inc/inc.public.vignettes.php?file=CRC2_subset_scmeta.tsv",
                      sep = "\t", header = TRUE, row.names = 1)
head(scmeta[, c("X", "Y", "CellType")])
##                                            X         Y   CellType
## HumanColonCancerPatient2__cell_1085 5057.912 -1721.871 Macrophage
## HumanColonCancerPatient2__cell_1205 5035.339 -1783.352 Macrophage
## HumanColonCancerPatient2__cell_1312 5012.093 -1908.390 Macrophage
## HumanColonCancerPatient2__cell_1625 5073.743 -1861.740 Macrophage
## HumanColonCancerPatient2__cell_1629 5030.235 -1867.441 Macrophage
## HumanColonCancerPatient2__cell_1639 5020.448 -1882.751 Macrophage
# Load expression matrix
scdata <- fread("https://spatialecotyper.stanford.edu/inc/inc.public.vignettes.php?file=CRC2_subset_counts.tsv.gz",
                sep = "\t",header = TRUE, data.table = FALSE)
rownames(scdata) <- scdata[, 1]
scdata <- as.matrix(scdata[, -1])
head(scdata[,1:5])
##         HumanColonCancerPatient2__cell_1085 HumanColonCancerPatient2__cell_1205
## PDK4                                      0                                   0
## CCL26                                     0                                   0
## CX3CL1                                    0                                   0
## PGLYRP1                                   0                                   0
## CD4                                       0                                   0
## SNAI2                                     0                                   0
##         HumanColonCancerPatient2__cell_1312 HumanColonCancerPatient2__cell_1625
## PDK4                                      1                                   0
## CCL26                                     0                                   0
## CX3CL1                                    0                                   0
## PGLYRP1                                   0                                   0
## CD4                                       0                                   0
## SNAI2                                     0                                   0
##         HumanColonCancerPatient2__cell_1629
## PDK4                                      0
## CCL26                                     0
## CX3CL1                                    0
## PGLYRP1                                   0
## CD4                                       0
## SNAI2                                     0
# Ensure matching cell order
scdata <- scdata[, match(rownames(scmeta), colnames(scdata))]

Data normalization

The data should be normalized to a log-transformed CPM (counts per million) or a similar scale. The NormalizeData function is applied at this step to perform this normalization.

normdata = NormalizeData(scdata)

SE recovery

The RecoverSE function will be used to assign single cells into SEs. Users can either use default model to recover predefined SEs or use custom model to recover newly defined SEs.

Note: Cell type annotation (celltypes argument) is required.

Using default models

The default NMF models were trained on MERSCOPE data across carcinomas and melanoma. The model supports the following cell types: B cells, CD4+ T cells, CD8+ T cells, NK cells, plasma cells, macrophages, dendritic cells, fibroblasts, and endothelial cells. All cells in the query data should be grouped into one of “B”, “CD4T”, “CD8T”, “NK”, “Plasma”, “Macrophage”, “DC”, “Fibroblast”, and “Endothelial”, case sensitive. All the other cell types will be assigned to NonSE.

sepreds <- RecoverSE(normdata, 
                     celltypes = scmeta$CellType,
                     ncores = 8)
head(sepreds)
##                                                                     CID
## HumanColonCancerPatient2__cell_1085 HumanColonCancerPatient2__cell_1085
## HumanColonCancerPatient2__cell_1205 HumanColonCancerPatient2__cell_1205
## HumanColonCancerPatient2__cell_1312 HumanColonCancerPatient2__cell_1312
## HumanColonCancerPatient2__cell_1625 HumanColonCancerPatient2__cell_1625
## HumanColonCancerPatient2__cell_1629 HumanColonCancerPatient2__cell_1629
## HumanColonCancerPatient2__cell_1639 HumanColonCancerPatient2__cell_1639
##                                       CellType InitSE    SE PredScore
## HumanColonCancerPatient2__cell_1085 Macrophage   SE11   SE9 0.7437170
## HumanColonCancerPatient2__cell_1205 Macrophage   SE09 NonSE 0.8450224
## HumanColonCancerPatient2__cell_1312 Macrophage   SE11   SE9 0.7613463
## HumanColonCancerPatient2__cell_1625 Macrophage   SE07   SE6 0.6164063
## HumanColonCancerPatient2__cell_1629 Macrophage   SE08 NonSE 0.5823971
## HumanColonCancerPatient2__cell_1639 Macrophage   SE11   SE9 0.8711407
Using custom models

To use custom model, users should first develop a model following the tutorial NMF Model Development for Spatial Ecotype Recovery from Single-Cell and Spatial Transcriptomics Data. The resulting model can be used for SE recovery. An example model is available at SE_Recovery_W_list.rds.

Download a custom model

url <- "https://spatialecotyper.stanford.edu/inc/inc.public.vignettes.php?file=SE_Recovery_W_list.rds"
download.file(url, destfile = "SE_Recovery_W_list.rds", mode = "wb")

Load the custom model

Ws <- readRDS("SE_Recovery_W_list.rds")
names(Ws) ## named list of W matrices
head(Ws[[1]]) ## feature by SE matrix

Using custom model for SE recovery by specifying the Ws argument.

sepreds <- RecoverSE(normdata, celltypes = scmeta$CellType, Ws = Ws)

After prediction, we recommend retaining only assignments corresponding to SE-specific cell states identified in Tutorial 3. Cells not assigned to these SE-specific states should be grouped as “non-SE,” as they cannot be robustly mapped to corresponding SE.

Combine SE recovery results into metadata

# Add SE assignments into the metadata
scmeta$SE = sepreds$SE[match(rownames(scmeta), rownames(sepreds))]

Validation: spatial colocalization of SE cell states

After recovering SE-specific cell states from single-cell ST data, users can validate the SEs by testing whether cell states belonging to the same SE are more spatially colocalized than expected by random chance using the Colocalization function. The spatial coordinates (column X and Y), cell type (CellType) and cell state annotations (State) are required for the analysis.

Note: By default (test = TRUE), the function evaluates the significance of cell state co-localization within each SE and returns a vector of p-values. This analysis requires cell state names to follow the format “SE1_CD4T”, where an underscore separates the SE label from the cell type, enabling extraction of SE groupings from the cell state names.

# Verify the four columns required for the analysis.
head(scmeta[, c("X", "Y", "SE", "CellType")]) 
##                                            X         Y    SE   CellType
## HumanColonCancerPatient2__cell_1085 5057.912 -1721.871   SE9 Macrophage
## HumanColonCancerPatient2__cell_1205 5035.339 -1783.352 NonSE Macrophage
## HumanColonCancerPatient2__cell_1312 5012.093 -1908.390   SE9 Macrophage
## HumanColonCancerPatient2__cell_1625 5073.743 -1861.740   SE6 Macrophage
## HumanColonCancerPatient2__cell_1629 5030.235 -1867.441 NonSE Macrophage
## HumanColonCancerPatient2__cell_1639 5020.448 -1882.751   SE9 Macrophage
## For quick test run, 10 permutations were performed.
coloc_res = Colocalization(scmeta, coords = c("X", "Y"), 
                           SE = "SE", CellType = "CellType",
                           radius = 50, nperm = 10, 
                           test = TRUE, ncores = 8)
head(coloc_res$ColocIndex[, 1:5]) ## Matrix of colocalization indices between all cell states
##                        NonSE_B NonSE_CD4T  NonSE_CD8T  NonSE_DC
## NonSE_B             6.99868028 -0.1088442 -1.26127962 -2.481705
## NonSE_CD4T          0.01198789  1.0876716 -0.27916629 -2.693859
## NonSE_CD8T         -1.13257146 -0.4715074  1.93495806 -3.688940
## NonSE_DC           -2.19144921 -2.5835766 -3.63125229 -1.089487
## NonSE_Endothelial  -2.00235783  2.0257263 -0.06124556  1.092382
## NonSE_Fibroblast  -12.86098291  4.3604439 -2.84485824  4.309132
##                   NonSE_Endothelial
## NonSE_B                  -1.9353359
## NonSE_CD4T                1.9644019
## NonSE_CD8T               -0.2698798
## NonSE_DC                  1.8692459
## NonSE_Endothelial         1.3615706
## NonSE_Fibroblast          3.9068697
coloc_res$Pval ## Vector of p-values for colocalization significance
##        NonSE          SE1          SE2          SE3          SE4          SE5 
## 3.682687e-01 5.522735e-15 4.121595e-03 1.959896e-03 5.406980e-08 2.303532e-02 
##          SE6          SE7          SE8          SE9 
## 1.615148e-02 4.001441e-07 5.090381e-08 9.746007e-02

You can visualize the results using the ColocalizationHeatmapView function.

p = ColocalizationHeatmapView(coloc_res$ColocIndex, coloc_res$Pval)
p = ComplexHeatmap::draw(p)
ses = gsub("_.*", "", rownames(coloc_res$ColocIndex))
drawRectangleAnnotation(p, ses, ses)

Validation: spatial autocorrelation

Users can also validate the SEs by assessing the spatial autocorrelation with Moran’s I. To control for bias, permutation experiments will be performed to normalize Moran’s I into z-scores using the ComputeNormalizedMoranI function.

## For quick test run, test with 10 permutations were performed.
morani_zscore = ComputeNormalizedMoranI(scmeta, coords = c("X", "Y"), 
                                        SE = "SE", CellType = "CellType", 
                                        nperm = 10, ncores = 8)
morani_zscore
##      NonSE        SE1        SE2        SE3        SE4        SE5        SE6 
##  17.707991  83.251639  27.769887  14.100131 128.408053  13.101159  18.912067 
##        SE7        SE8        SE9 
##  16.739382  57.931111   8.569262

Validation: average expression of SE markers

Users can also validate the SEs by visualizing the average expression of SE consensus markers across recovered SEs using the AverageConsensusMarkers function.

## The function is built on Seurat functions, so first create a Seurat object.
obj = CreateSeuratObject(scdata, meta.data = scmeta)
obj = NormalizeData(obj)
## Visualize the SE consensus markers across recovered SEs
res = AverageConsensusMarkers(obj, group.by = "SE")
plot(res$p)

# Underlying data
res$AvgExp
##             SE1        SE2        SE3        SE4         SE5         SE6
## SE1  1.63899149 -0.7067854 -0.4808280  0.8900028 -1.07180027  1.21067361
## SE2 -0.84198747  1.7084204 -0.6388041 -1.6401970 -0.19640606  0.97722401
## SE3  1.23530812 -1.2044554  1.1599992 -0.5802536 -0.84596171 -0.85957318
## SE4  0.06104092 -2.1637758  0.3105004  1.6581537 -0.03363019  0.23035569
## SE5 -0.04421435  1.3939729 -0.2878237 -2.3379710  0.17834575  0.27247684
## SE6  0.01721607  0.3289262 -0.8663956  0.8662647 -0.86950581  1.59766483
## SE7  0.55549363 -0.1465396  0.6349276 -2.1011003 -0.72279879 -0.03098458
## SE8 -1.22153700  0.8819891 -0.6609707 -0.6608298 -0.15405488  0.07337827
## SE9 -1.02337627  1.0094264 -0.3752031 -0.6409218 -1.01265650 -0.37474314
##             SE7        SE8        SE9
## SE1 -0.06272876 -0.4191507 -0.9983748
## SE2 -0.09267733  0.4311746  0.2932529
## SE3  0.75394104 -0.6232041  0.9641996
## SE4 -0.37194620  0.4923575 -0.1830561
## SE5 -0.06336877  0.4210737  0.4675086
## SE6  0.07290969  0.5261576 -1.6732377
## SE7  1.36375460  0.6531338 -0.2058864
## SE8  0.58219283  1.9611643 -0.8013322
## SE9 -0.38983885  1.1548420  1.6524713

The AverageConsensusMarkers function can also be used to visualize average expression of other gene lists by specifying the markers parameter, which should be a list of gene sets.

markers = list(set1 = c("FOS", "EGR1", "DUSP1", "JUNB", "JUN", "FOSB"),
               set2 = c("STAT1", "TAP1", "HLA-A", "HLA-B", "HLA-C", "B2M"),
               set3 = c("GZMA", "GZMB", "PRF1"),
               set4 = c("PDCD1", "CTLA4", "LAG3", " HAVCR2", "TIGIT"))
res = AverageConsensusMarkers(obj, group.by = "SE", markers = markers)
plot(res$p)

Session info

The session info allows users to replicate the exact environment and identify potential discrepancies in package versions or configurations that might be causing problems.

## R version 4.4.1 (2024-06-14)
## Platform: aarch64-apple-darwin20
## Running under: macOS 26.4.1
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/Los_Angeles
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] pals_1.9              SpatialEcoTyper_1.0.2 NMF_0.28             
##  [4] Biobase_2.64.0        BiocGenerics_0.50.0   cluster_2.1.6        
##  [7] rngtools_1.5.2        registry_0.5-1        RANN_2.6.2           
## [10] Matrix_1.7-0          data.table_1.16.0     Seurat_5.1.0         
## [13] SeuratObject_5.0.2    sp_2.1-4              ggplot2_3.5.1        
## [16] dplyr_1.1.4          
## 
## loaded via a namespace (and not attached):
##   [1] RcppAnnoy_0.0.22       splines_4.4.1          later_1.3.2           
##   [4] R.oo_1.26.0            tibble_3.2.1           polyclip_1.10-7       
##   [7] fastDummies_1.7.4      lifecycle_1.0.4        sf_1.1-0              
##  [10] doParallel_1.0.17      globals_0.16.3         lattice_0.22-6        
##  [13] MASS_7.3-60.2          magrittr_2.0.3         plotly_4.10.4         
##  [16] sass_0.4.9             rmarkdown_2.28         jquerylib_0.1.4       
##  [19] yaml_2.3.10            httpuv_1.6.15          sctransform_0.4.1     
##  [22] spam_2.10-0            spatstat.sparse_3.1-0  reticulate_1.39.0     
##  [25] mapproj_1.2.11         cowplot_1.1.3          pbapply_1.7-2         
##  [28] DBI_1.3.0              RColorBrewer_1.1-3     maps_3.4.2            
##  [31] abind_1.4-5            Rtsne_0.17             R.utils_2.12.3        
##  [34] purrr_1.0.2            circlize_0.4.16        IRanges_2.38.1        
##  [37] S4Vectors_0.42.1       ggrepel_0.9.6          irlba_2.3.5.1         
##  [40] listenv_0.9.1          spatstat.utils_3.1-0   units_1.0-1           
##  [43] goftest_1.2-3          RSpectra_0.16-2        spatstat.random_3.3-1 
##  [46] fitdistrplus_1.2-1     parallelly_1.38.0      pkgdown_2.1.0         
##  [49] leiden_0.4.3.1         codetools_0.2-20       tidyselect_1.2.1      
##  [52] shape_1.4.6.1          matrixStats_1.4.1      stats4_4.4.1          
##  [55] spatstat.explore_3.3-2 jsonlite_1.8.8         GetoptLong_1.0.5      
##  [58] e1071_1.7-16           progressr_0.14.0       ggridges_0.5.6        
##  [61] survival_3.6-4         iterators_1.0.14       systemfonts_1.1.0     
##  [64] foreach_1.5.2          tools_4.4.1            ragg_1.3.2            
##  [67] ica_1.0-3              Rcpp_1.0.13            glue_1.7.0            
##  [70] gridExtra_2.3          xfun_0.52              withr_3.0.1           
##  [73] BiocManager_1.30.25    fastmap_1.2.0          boot_1.3-30           
##  [76] fansi_1.0.6            spData_2.3.4           digest_0.6.37         
##  [79] R6_2.5.1               mime_0.12              wk_0.9.5              
##  [82] textshaping_0.4.0      colorspace_2.1-1       scattermore_1.2       
##  [85] tensor_1.5             dichromat_2.0-0.1      spatstat.data_3.1-2   
##  [88] R.methodsS3_1.8.2      utf8_1.2.4             tidyr_1.3.1           
##  [91] generics_0.1.3         class_7.3-22           httr_1.4.7            
##  [94] htmlwidgets_1.6.4      spdep_1.4-2            uwot_0.2.2            
##  [97] pkgconfig_2.0.3        gtable_0.3.5           ComplexHeatmap_2.20.0 
## [100] lmtest_0.9-40          htmltools_0.5.8.1      dotCall64_1.1-1       
## [103] clue_0.3-65            scales_1.3.0           png_0.1-8             
## [106] spatstat.univar_3.0-1  knitr_1.48             rstudioapi_0.16.0     
## [109] reshape2_1.4.4         rjson_0.2.22           nlme_3.1-164          
## [112] proxy_0.4-27           cachem_1.1.0           zoo_1.8-12            
## [115] GlobalOptions_0.1.2    stringr_1.5.1          KernSmooth_2.23-24    
## [118] miniUI_0.1.1.1         s2_1.1.9               desc_1.4.3            
## [121] pillar_1.9.0           grid_4.4.1             vctrs_0.6.5           
## [124] promises_1.3.0         xtable_1.8-4           evaluate_0.24.0       
## [127] magick_2.8.5           cli_3.6.3              compiler_4.4.1        
## [130] rlang_1.1.4            crayon_1.5.3           future.apply_1.11.2   
## [133] classInt_0.4-11        plyr_1.8.9             fs_1.6.4              
## [136] stringi_1.8.4          viridisLite_0.4.2      deldir_2.0-4          
## [139] gridBase_0.4-7         munsell_0.5.1          lazyeval_0.2.2        
## [142] spatstat.geom_3.3-2    RcppHNSW_0.6.0         patchwork_1.2.0       
## [145] future_1.34.0          shiny_1.9.1            highr_0.11            
## [148] ROCR_1.0-11            igraph_2.0.3           bslib_0.8.0