
This function preprocesses single-cell spatial transcriptomics data by filtering out low-quality genes and cells based on specified thresholds. It retains only genes expressed in at least `min.cells` cells and cells expressing at least `min.features` features. It also reformats the metadata to include spatial coordinates (X and Y).
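The filtering described above can be sketched on a toy matrix. This is a minimal illustration, not the package's implementation. It assumes that a gene counts as "expressed" in a cell when its count is greater than zero and that genes are filtered before cells. The toy `counts` and `meta` objects, the `x_um`/`y_um` column names, and the lowered `min.features` value are hypothetical.

```r
library(Matrix)

# Hypothetical toy data: 4 genes (rows) x 5 cells (columns)
counts <- matrix(
  c(1, 0, 2, 1, 0,
    0, 0, 0, 1, 0,
    3, 1, 1, 0, 0,
    0, 2, 1, 1, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC", "geneD"), paste0("cell", 1:5))
)
meta <- data.frame(
  x_um = c(10, 20, 30, 40, 50),
  y_um = c(-1, -2, -3, -4, -5),
  row.names = colnames(counts)
)

min.cells <- 3      # same as the documented default
min.features <- 2   # lowered from the default of 5 to suit the toy data

# Assumption: a gene is "expressed" in a cell when its count is > 0.
# Keep genes expressed in at least min.cells cells.
keep_genes <- rowSums(counts > 0) >= min.cells
counts <- counts[keep_genes, , drop = FALSE]

# Keep cells expressing at least min.features of the remaining genes.
keep_cells <- colSums(counts > 0) >= min.features
counts <- counts[, keep_cells, drop = FALSE]

# Align the metadata with the retained cells and expose the
# coordinates under the column names X and Y.
meta <- meta[colnames(counts), , drop = FALSE]
meta$X <- meta$x_um
meta$Y <- meta$y_um

# Return shape mirrors the documented value: a sparse matrix plus metadata.
toy_result <- list(expdat = Matrix(counts, sparse = TRUE), metadata = meta)
dim(toy_result$expdat)
```

In this toy case `geneB` (expressed in one cell) and `cell5` (no expressed genes) are dropped, so three genes and four cells remain.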

Usage

PreprocessST(
  expdat,
  metadata,
  min.cells = 3,
  min.features = 5,
  X = "X",
  Y = "Y"
)

Arguments

expdat

A matrix or data frame representing the gene expression data, where rows correspond to genes and columns correspond to cells.

metadata

A data frame containing metadata associated with each cell. Must include spatial coordinates (e.g., X and Y) as well as other cell-specific annotations. The row names of `metadata` must match the column names of `expdat`.

min.cells

An integer specifying the minimum number of cells in which a gene must be expressed to be retained (default is 3).

min.features

An integer specifying the minimum number of features (genes) a cell must express to be retained (default is 5).

X

A string specifying the column name in the metadata data frame that represents the X spatial coordinate (default is "X").

Y

A string specifying the column name in the metadata data frame that represents the Y spatial coordinate (default is "Y").
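The requirements on the arguments can be checked before calling the function. The sketch below uses hypothetical toy inputs (`expdat`, `metadata`, and the `x_um`/`y_um` column names are made up for illustration). It verifies that the coordinate columns exist and that the cell names agree, then reorders the metadata to follow the expression matrix. Whether `PreprocessST` reorders rows itself is not stated here, so aligning beforehand is only a defensive step.

```r
# Hypothetical toy inputs: 2 genes x 3 cells, metadata rows in a different order
expdat <- matrix(
  0, nrow = 2, ncol = 3,
  dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
)
metadata <- data.frame(
  x_um = c(10, 30, 20),
  y_um = c(5, 7, 6),
  CellType = c("T", "Fibroblast", "B"),
  row.names = c("c3", "c1", "c2")
)

# Column names that would be passed as the X and Y arguments
X <- "x_um"
Y <- "y_um"

# The coordinate columns must be present in the metadata
stopifnot(X %in% colnames(metadata), Y %in% colnames(metadata))

# Every cell in the expression matrix needs a metadata row, and vice versa
stopifnot(setequal(rownames(metadata), colnames(expdat)))

# Reorder metadata rows to follow the column order of the expression matrix
metadata <- metadata[colnames(expdat), , drop = FALSE]
identical(rownames(metadata), colnames(expdat))
```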

Value

A list containing two elements:

expdat

A filtered matrix of gene expression data, converted to a sparse matrix.

metadata

A filtered data frame of metadata, aligned with the filtered gene expression data, including reformatted spatial coordinates.
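A short sketch of how a result of this shape might be handled downstream. The `processed` object here is a hand-built stand-in with the documented structure (a sparse `expdat` and an aligned `metadata`), not real output from `PreprocessST`, and the `nCount` column is a hypothetical addition.

```r
library(Matrix)

# Hand-built stand-in for the returned list (hypothetical values)
processed <- list(
  expdat = sparseMatrix(
    i = c(1, 2, 2), j = c(1, 1, 3), x = c(2, 1, 4),
    dims = c(2, 3),
    dimnames = list(c("g1", "g2"), c("c1", "c2", "c3"))
  ),
  metadata = data.frame(
    X = c(1.5, 2.5, 3.5),
    Y = c(-1, -2, -3),
    row.names = c("c1", "c2", "c3")
  )
)

# The two elements should describe the same cells in the same order
stopifnot(identical(colnames(processed$expdat), rownames(processed$metadata)))

# Inspect a small corner rather than printing every column
processed$expdat[1:2, 1:2]

# Per-cell total counts, computed without densifying the matrix
processed$metadata$nCount <- Matrix::colSums(processed$expdat)
processed$metadata
```

Indexing a corner of the matrix keeps the printed output short when the matrix has many thousands of columns.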

Examples

library(SpatialEcoTyper)
library(data.table)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("1CgUOQKrWY_TG61o5aw7J9LZzE20D6NuI"),
               "HumanMelanomaPatient1_subset_scmeta.tsv", overwrite = TRUE)
#> File downloaded:
#> HumanMelanomaPatient1_subset_scmeta.tsv
#>   <id: 1CgUOQKrWY_TG61o5aw7J9LZzE20D6NuI>
#> Saved locally as:
#> HumanMelanomaPatient1_subset_scmeta.tsv
drive_download(as_id("1CoQmU3u8MoVC8RbLUvTDQmOuJJ703HHB"),
               "HumanMelanomaPatient1_subset_counts.tsv.gz", overwrite = TRUE)
#> File downloaded:
#> HumanMelanomaPatient1_subset_counts.tsv.gz
#>   <id: 1CoQmU3u8MoVC8RbLUvTDQmOuJJ703HHB>
#> Saved locally as:
#> HumanMelanomaPatient1_subset_counts.tsv.gz
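# Read the count table; the first column holds the gene names
scdata <- fread("HumanMelanomaPatient1_subset_counts.tsv.gz",
                sep = "\t", header = TRUE, data.table = FALSE)
rownames(scdata) <- scdata[, 1]
scdata <- as.matrix(scdata[, -1])
# Read the cell metadata; the first column holds the cell names
scmeta <- read.table("HumanMelanomaPatient1_subset_scmeta.tsv",
                     sep = "\t", header = TRUE, row.names = 1)
# Filter genes and cells and format the spatial coordinates
processed <- PreprocessST(expdat = scdata, metadata = scmeta, X = "X", Y = "Y",
                          min.cells = 3, min.features = 5)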
#> 2024-11-07 01:13:45.324804 Remove 87 genes expressed in fewer than 3 cells
head(processed$metadata)
#>                                         X         Y   CellType CellTypeName
#> HumanMelanomaPatient1__cell_3655 1894.706 -6367.766 Fibroblast  Fibroblasts
#> HumanMelanomaPatient1__cell_3657 1942.480 -6369.602 Fibroblast  Fibroblasts
#> HumanMelanomaPatient1__cell_3658 1963.007 -6374.026 Fibroblast  Fibroblasts
#> HumanMelanomaPatient1__cell_3660 1981.600 -6372.266 Fibroblast  Fibroblasts
#> HumanMelanomaPatient1__cell_3661 1742.939 -6374.851 Fibroblast  Fibroblasts
#> HumanMelanomaPatient1__cell_3663 1921.683 -6383.309 Fibroblast  Fibroblasts
#>                                  Region Dist2Interface
#> HumanMelanomaPatient1__cell_3655 Stroma      -883.1752
#> HumanMelanomaPatient1__cell_3657 Stroma      -894.8463
#> HumanMelanomaPatient1__cell_3658 Stroma      -904.1115
#> HumanMelanomaPatient1__cell_3660 Stroma      -907.8909
#> HumanMelanomaPatient1__cell_3661 Stroma      -874.2712
#> HumanMelanomaPatient1__cell_3663 Stroma      -903.6559
head(processed$expdat)
#> 6 x 27907 sparse Matrix of class "dgCMatrix"
#>   [[ suppressing 34 column names ‘HumanMelanomaPatient1__cell_3655’, ‘HumanMelanomaPatient1__cell_3657’, ‘HumanMelanomaPatient1__cell_3658’ ... ]]
#>                                                                             
#> PDK4     . 1 1 . . . . . . . . . . . . 1 . 2 1 . 1 . . . 2 2 . . . . . . . 1
#> TNFRSF17 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> ICAM3    . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#> FAP      1 . . . . 1 2 1 . 1 . 1 . . . . . . . . 1 . . . . . 1 1 . 1 . . . 1
#> GZMB     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 . . . .
#> TSC2     . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
#>                
#> PDK4     ......
#> TNFRSF17 ......
#> ICAM3    ......
#> FAP      ......
#> GZMB     ......
#> TSC2     ......
#> 
#>  .....suppressing 27873 columns in show(); maybe adjust options(max.print=, width=)
#>  ..............................