This function generates pseudobulk samples by aggregating single-cell transcriptomic profiles.
Arguments
- data
A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.
- groups
A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.
- counts
A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided.
- n_mixtures
An integer specifying the number of pseudobulk samples to create. Default is 100.
Value
A list containing two elements:
- Fracs
A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).
- Mixtures
A matrix of pseudobulk gene expression data (genes x pseudobulk samples).
Details
If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.
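As a rough illustration of this counts-per-10,000 scaling, the base-R sketch below applies it to a small hypothetical matrix. The toy `counts` values are invented for illustration, and this is not the package's own code.

```r
# Hypothetical toy counts (genes x cells), invented for illustration only.
counts <- matrix(c(1, 3, 0, 6,
                   2, 2, 4, 0),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("GeneA", "GeneB"), paste0("cell", 1:4)))

# Divide every column (cell) by its total count, then multiply by 10,000.
cp10k <- sweep(counts, 2, colSums(counts), "/") * 1e4

colSums(cp10k)  # each cell now sums to 10,000
```

A cell with zero total counts would yield `NaN` in this sketch; it does not guard against that case.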
If the maximum value in `data` is less than 80, the function assumes that `data` is log2-transformed and converts it back to linear scale.
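One way to express this check in base R is sketched below. The helper name `to_linear_if_log2` is hypothetical, and the inverse transform `2^x - 1` is an assumption (it presumes the data were produced as `log2(x + 1)`); the package may use a different offset. The threshold is plausible because log2-transformed CP10K values cannot exceed log2(10,001) ≈ 13.3, whereas linear-scale normalized data usually contain values well above 80.

```r
# Hypothetical helper: undo a log2 transform when the data look log-scaled.
# ASSUMPTION: data were log2(x + 1)-transformed, so the inverse is 2^x - 1.
to_linear_if_log2 <- function(x, threshold = 80) {
  if (max(x) < threshold) {
    x <- 2^x - 1  # the whole matrix is treated as log2-scaled
  }
  x
}

to_linear_if_log2(matrix(c(0, 1, 3), nrow = 1))  # becomes 0, 1, 7
to_linear_if_log2(matrix(c(0, 100), nrow = 1))   # unchanged (max >= 80)
```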
Cells are matched to their groups through the names of `groups`, which should correspond to the column names of `data`. If `groups` is unnamed, its elements are assumed to be in the same order as the columns of `data`.
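The name-based matching can be sketched as follows. `align_groups` is a hypothetical helper, not a function exported by SpatialEcoTyper; it assigns column names positionally to an unnamed `groups` vector and then reorders `groups` to follow the columns of `data`.

```r
# Hypothetical helper: line up group labels with the columns of `data`.
align_groups <- function(groups, data) {
  if (is.null(names(groups))) {
    # Unnamed labels are assumed to follow the column order of `data`.
    stopifnot(length(groups) == ncol(data))
    names(groups) <- colnames(data)
  }
  # Every cell in `data` needs a label; then reorder to match the columns.
  stopifnot(all(colnames(data) %in% names(groups)))
  groups[colnames(data)]
}

data <- matrix(0, nrow = 2, ncol = 3,
               dimnames = list(c("g1", "g2"), c("c1", "c2", "c3")))
align_groups(c(c3 = "SE2", c1 = "SE1", c2 = "SE1"), data)  # c1 = SE1, c2 = SE1, c3 = SE2
```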
Each pseudobulk sample is created by sampling cells from each group according to a predefined set of group fractions (returned in `Fracs`) and then averaging the expression of each gene across the sampled cells.
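A minimal sketch of this fraction-based mixing is shown below. The function `make_pseudobulks` and its `cells_per_mix` parameter are hypothetical, and several details are assumptions rather than documented package behavior: fractions are drawn uniformly at random and rescaled to sum to 1, each mixture contains about `cells_per_mix` cells, and cells are sampled with replacement within each group. Only the output layout (`Fracs` as mixtures x groups, `Mixtures` as genes x mixtures) follows the Value section.

```r
# Hypothetical sketch of fraction-based pseudobulk mixing (not the package code).
# ASSUMPTIONS: uniform random fractions, ~cells_per_mix cells per mixture,
# sampling with replacement within each group.
make_pseudobulks <- function(data, groups, n_mixtures = 100, cells_per_mix = 100) {
  stopifnot(identical(names(groups), colnames(data)))
  levs <- sort(unique(groups))
  # Random group fractions, one row per mixture, rescaled so each row sums to 1.
  fracs <- matrix(runif(n_mixtures * length(levs)), nrow = n_mixtures,
                  dimnames = list(paste0("Mix", seq_len(n_mixtures)), levs))
  fracs <- fracs / rowSums(fracs)
  mixtures <- sapply(seq_len(n_mixtures), function(i) {
    n_cells <- round(fracs[i, ] * cells_per_mix)  # cells to draw per group
    picked <- unlist(lapply(levs, function(g) {
      pool <- which(groups == g)
      pool[sample.int(length(pool), n_cells[[g]], replace = TRUE)]
    }))
    rowMeans(data[, picked, drop = FALSE])  # average expression per gene
  })
  colnames(mixtures) <- rownames(fracs)
  list(Fracs = fracs, Mixtures = mixtures)
}

set.seed(1)
toy <- matrix(rpois(5 * 30, lambda = 5), nrow = 5,
              dimnames = list(paste0("G", 1:5), paste0("c", 1:30)))
toy_groups <- setNames(rep(c("SE1", "SE2", "SE3"), each = 10), colnames(toy))
res <- make_pseudobulks(toy, toy_groups, n_mixtures = 4)
dim(res$Fracs)     # 4 mixtures x 3 groups
dim(res$Mixtures)  # 5 genes x 4 mixtures
```

Because the per-group cell counts are rounded, the realized composition of each mixture can differ slightly from the corresponding row of `Fracs`.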
The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
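For reference, Seurat's default `NormalizeData` method (`LogNormalize` with `scale.factor = 10000`) divides each column by its total, multiplies by the scale factor and applies a natural-log `log1p` transform. The base-R sketch below reproduces that arithmetic without Seurat; `log_normalize` is a hypothetical name, and whether `CreatePseudobulks` calls `NormalizeData` with these defaults is an assumption.

```r
# Hypothetical base-R equivalent of Seurat's default LogNormalize step:
# log1p(x / column total * scale_factor), computed per column (sample).
log_normalize <- function(x, scale_factor = 1e4) {
  log1p(sweep(x, 2, colSums(x), "/") * scale_factor)
}

m <- matrix(c(1, 1, 2, 2), nrow = 2)  # two samples with totals 2 and 4
log_normalize(m)                      # every entry equals log1p(5000)
```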
Examples
library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
overwrite = TRUE)
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
head(result$Fracs) ## SE fractions in pseudobulks
