This function preprocesses single-cell spatial transcriptomics data by removing low-quality genes and cells according to user-specified thresholds. Only genes expressed in at least `min.cells` cells, and cells expressing at least `min.features` genes, are retained. It also reformats the metadata to include the spatial coordinates (X and Y).
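The two filtering rules can be sketched in base R as follows. This is a minimal illustration, not the SpatialEcoTyper implementation: the helper `filter_counts` is hypothetical, a gene is treated as "expressed" in a cell when its count is greater than zero, and the gene filter is assumed to run before the cell filter.

```r
# Hypothetical helper sketching the filtering described above; it is not
# taken from SpatialEcoTyper. Assumptions: "expressed" means count > 0, and
# genes are filtered before cells.
filter_counts <- function(counts, min.cells = 3, min.features = 5) {
  # Number of cells with a non-zero count for each gene (row)
  cells_per_gene <- rowSums(counts > 0)
  counts <- counts[cells_per_gene >= min.cells, , drop = FALSE]
  # Number of retained genes with a non-zero count in each cell (column)
  genes_per_cell <- colSums(counts > 0)
  counts[, genes_per_cell >= min.features, drop = FALSE]
}

# Toy counts: 4 genes x 6 cells
toy <- matrix(c(1, 2, 0, 1, 3, 1,   # geneA: expressed in 5 cells
                0, 0, 1, 0, 0, 0,   # geneB: expressed in 1 cell
                2, 1, 1, 0, 1, 0,   # geneC: expressed in 4 cells
                1, 0, 1, 1, 1, 1),  # geneD: expressed in 5 cells
              nrow = 4, byrow = TRUE,
              dimnames = list(paste0("gene", LETTERS[1:4]),
                              paste0("cell", 1:6)))

filtered <- filter_counts(toy, min.cells = 3, min.features = 3)
dim(filtered)
#> [1] 3 2
```

With `min.cells = 3`, geneB is dropped; among the remaining three genes, only cell1 and cell5 have non-zero counts for all three, so they are the only cells that pass `min.features = 3`. Because `rowSums()` and `colSums()` also have methods for sparse `Matrix` objects, the same helper should work on a `dgCMatrix` once the Matrix package is attached.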
Arguments
- expdat
A matrix or data frame representing the gene expression data, where rows correspond to genes and columns correspond to cells.
- metadata
A data frame of cell-level metadata. It must contain the spatial coordinate columns named by the `X` and `Y` arguments and may contain other cell-specific annotations. The row names of `metadata` must match the column names of `expdat`.
- min.cells
An integer specifying the minimum number of cells in which a gene must be expressed to be retained (default is 3).
- min.features
An integer specifying the minimum number of features (genes) a cell must express to be retained (default is 5).
- X
A string specifying the column name in the metadata data frame that represents the X spatial coordinate (default is "X").
- Y
A string specifying the column name in the metadata data frame that represents the Y spatial coordinate (default is "Y").
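Before calling `PreprocessST()`, it can help to confirm that the inputs meet the requirements listed above. The helper below, `check_st_inputs`, is hypothetical and not part of SpatialEcoTyper. It assumes that "match" means the two sets of names are equal (possibly in a different order) and that the coordinate columns should be numeric; it also shows coordinates stored under non-default column names, which is the case the `X` and `Y` arguments exist for.

```r
# Hypothetical pre-flight check (not part of SpatialEcoTyper) for the
# argument requirements: coordinate columns exist and are numeric, and
# metadata row names match expression column names (as sets; assumption).
check_st_inputs <- function(expdat, metadata, X = "X", Y = "Y") {
  missing_cols <- setdiff(c(X, Y), colnames(metadata))
  if (length(missing_cols) > 0) {
    stop("metadata lacks coordinate column(s): ",
         paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(metadata[[X]]) || !is.numeric(metadata[[Y]])) {
    stop("coordinate columns must be numeric")
  }
  if (!setequal(rownames(metadata), colnames(expdat))) {
    stop("rownames(metadata) must match colnames(expdat)")
  }
  invisible(TRUE)
}

# Toy inputs: 2 genes x 2 cells, coordinates stored under non-default names
expr <- matrix(c(0, 1, 2, 0), nrow = 2,
               dimnames = list(c("g1", "g2"), c("c1", "c2")))
meta <- data.frame(x_centroid = c(1.5, 2.0), y_centroid = c(-3.1, -4.2),
                   row.names = c("c2", "c1"))
check_st_inputs(expr, meta, X = "x_centroid", Y = "y_centroid")
```

With the defaults `X = "X"` and `Y = "Y"`, the same call would stop with an error, since `meta` stores its coordinates under other names.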
Value
A list containing two elements:
- expdat
A filtered gene expression matrix, converted to a sparse matrix (a `dgCMatrix` in the example below).
- metadata
A filtered data frame of metadata, aligned with the filtered gene expression data, including reformatted spatial coordinates.
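The alignment between the two returned elements can be illustrated with a small sketch. `align_outputs` is a hypothetical helper, not the package's code: it assumes that "aligned" means the metadata rows are subset and reordered to follow the column order of the filtered matrix, and it uses the Matrix package to convert the expression data to compressed sparse column storage (a `dgCMatrix` for a double-valued input).

```r
library(Matrix)

# Hypothetical helper (not SpatialEcoTyper code) illustrating the Value
# contract: metadata rows follow the matrix columns, and the expression
# matrix is stored in compressed sparse column form.
align_outputs <- function(expdat, metadata) {
  # Keep and reorder metadata rows to match the retained cells
  metadata <- metadata[colnames(expdat), , drop = FALSE]
  # Convert to a column-compressed sparse matrix
  # (returned unchanged if it already is one)
  expdat <- as(expdat, "CsparseMatrix")
  list(expdat = expdat, metadata = metadata)
}

# Toy filtered matrix (cells c3, c1, c2) and metadata with an extra cell c4
expr <- matrix(c(0, 1, 2, 0, 0, 3), nrow = 2,
               dimnames = list(c("g1", "g2"), c("c3", "c1", "c2")))
meta <- data.frame(X = 1:4, Y = 5:8, row.names = c("c1", "c2", "c3", "c4"))
out <- align_outputs(expr, meta)
rownames(out$metadata)
#> [1] "c3" "c1" "c2"
```

Subsetting by `colnames(expdat)` both drops metadata for removed cells (here `c4`) and puts the remaining rows in matrix column order, so row `i` of `metadata` describes column `i` of `expdat`.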
Examples
library(SpatialEcoTyper)
library(data.table)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("1CgUOQKrWY_TG61o5aw7J9LZzE20D6NuI"),
               "HumanMelanomaPatient1_subset_scmeta.tsv", overwrite = TRUE)
#> File downloaded:
#> • HumanMelanomaPatient1_subset_scmeta.tsv
#> <id: 1CgUOQKrWY_TG61o5aw7J9LZzE20D6NuI>
#> Saved locally as:
#> • HumanMelanomaPatient1_subset_scmeta.tsv
drive_download(as_id("1CoQmU3u8MoVC8RbLUvTDQmOuJJ703HHB"),
               "HumanMelanomaPatient1_subset_counts.tsv.gz", overwrite = TRUE)
#> File downloaded:
#> • HumanMelanomaPatient1_subset_counts.tsv.gz
#> <id: 1CoQmU3u8MoVC8RbLUvTDQmOuJJ703HHB>
#> Saved locally as:
#> • HumanMelanomaPatient1_subset_counts.tsv.gz
# Read the counts table (genes x cells); the first column holds gene names
scdata <- fread("HumanMelanomaPatient1_subset_counts.tsv.gz",
                sep = "\t", header = TRUE, data.table = FALSE)
rownames(scdata) <- scdata[, 1]
scdata <- as.matrix(scdata[, -1])
# Read the cell metadata; the first column holds cell IDs
scmeta <- read.table("HumanMelanomaPatient1_subset_scmeta.tsv",
                     sep = "\t", header = TRUE, row.names = 1)
processed <- PreprocessST(expdat = scdata, metadata = scmeta, X = "X", Y = "Y",
                          min.cells = 3, min.features = 5)
#> 2024-11-07 01:13:45.324804 Remove 87 genes expressed in fewer than 3 cells
head(processed$metadata)
#> X Y CellType CellTypeName
#> HumanMelanomaPatient1__cell_3655 1894.706 -6367.766 Fibroblast Fibroblasts
#> HumanMelanomaPatient1__cell_3657 1942.480 -6369.602 Fibroblast Fibroblasts
#> HumanMelanomaPatient1__cell_3658 1963.007 -6374.026 Fibroblast Fibroblasts
#> HumanMelanomaPatient1__cell_3660 1981.600 -6372.266 Fibroblast Fibroblasts
#> HumanMelanomaPatient1__cell_3661 1742.939 -6374.851 Fibroblast Fibroblasts
#> HumanMelanomaPatient1__cell_3663 1921.683 -6383.309 Fibroblast Fibroblasts
#> Region Dist2Interface
#> HumanMelanomaPatient1__cell_3655 Stroma -883.1752
#> HumanMelanomaPatient1__cell_3657 Stroma -894.8463
#> HumanMelanomaPatient1__cell_3658 Stroma -904.1115
#> HumanMelanomaPatient1__cell_3660 Stroma -907.8909
#> HumanMelanomaPatient1__cell_3661 Stroma -874.2712
#> HumanMelanomaPatient1__cell_3663 Stroma -903.6559
head(processed$expdat)
#> 6 x 27907 sparse Matrix of class "dgCMatrix"
#> [[ suppressing 34 column names ‘HumanMelanomaPatient1__cell_3655’, ‘HumanMelanomaPatient1__cell_3657’, ‘HumanMelanomaPatient1__cell_3658’ ... ]]
#>
#> PDK4 . 1 1 . . . . . . . . . . . . 1 . 2 1 . 1 . . . 2 2 . . . . . . . 1
#> TNFRSF17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> ICAM3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> FAP 1 . . . . 1 2 1 . 1 . 1 . . . . . . . . 1 . . . . . 1 1 . 1 . . . 1
#> GZMB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . .
#> TSC2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#>
#> PDK4 ......
#> TNFRSF17 ......
#> ICAM3 ......
#> FAP ......
#> GZMB ......
#> TSC2 ......
#>
#> .....suppressing 27873 columns in show(); maybe adjust options(max.print=, width=)
#> ..............................