This function generates pseudobulk samples by aggregating single-cell transcriptomics data.
Arguments
- data
A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.
- groups
A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.
- counts
A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided.
- n_mixtures
An integer specifying the number of pseudobulk samples to create. Default is 100.
Value
A list containing two elements:
- Fracs
A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).
- Mixtures
A matrix of pseudobulk gene expression data (genes x pseudobulk samples).
Details
If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.
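The normalization described above can be sketched in base R as follows. This is only an illustration on a hypothetical toy matrix, not the package's internal code.

```r
# Toy raw counts: 3 genes x 2 cells (hypothetical values)
counts <- matrix(c(1, 3, 6, 0, 5, 5), nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:2)))

# Divide each column (cell) by its total, then scale to 10,000
data <- sweep(counts, 2, colSums(counts), "/") * 1e4

# Every cell now sums to 10,000
colSums(data)
```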
If the maximum value in `data` is less than 80, the function assumes the data is on a log2 scale and converts it back to a non-log scale.
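A minimal sketch of this check is shown below. The helper name `undo_log2_if_needed` is hypothetical, and the back-transformation `2^x - 1` assumes the values were stored as `log2(x + 1)`; the exact formula used by the function is not stated here.

```r
# Hypothetical helper, not part of the package
undo_log2_if_needed <- function(data, threshold = 80) {
  if (max(data, na.rm = TRUE) < threshold) {
    # Assumption: values were stored as log2(x + 1)
    data <- 2^data - 1
  }
  data
}

# Toy log2-scale matrix (hypothetical values)
log_data <- log2(matrix(c(0, 3, 15, 1023), nrow = 2) + 1)
linear_data <- undo_log2_if_needed(log_data)
```

A matrix whose maximum is 80 or more is treated as already non-log and is returned unchanged.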
The names of the `groups` vector are used to match each cell to its group. If `groups` has no names, its elements are assumed to correspond, in order, to the columns of the `data` matrix.
Pseudobulk samples are created by sampling cells from each group based on predefined fractions, and then calculating the average expression for each gene in the pseudobulk samples.
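The sampling and averaging step can be sketched roughly as below. How the fractions are predefined and how many cells are drawn per pseudobulk are not specified here, so the random fractions and the `n_cells` value are assumptions made for illustration only.

```r
set.seed(1)
# Toy data: 4 genes x 30 cells across 3 hypothetical groups
data <- matrix(runif(4 * 30), nrow = 4,
               dimnames = list(paste0("gene", 1:4), paste0("cell", 1:30)))
groups <- setNames(rep(c("SE1", "SE2", "SE3"), each = 10), colnames(data))

n_mixtures <- 5
n_cells <- 50  # hypothetical number of cells drawn per pseudobulk

# Assumption: target fractions are random and sum to 1 per pseudobulk
fracs <- matrix(runif(n_mixtures * 3), nrow = n_mixtures,
                dimnames = list(paste0("Pseudobulk", 1:n_mixtures),
                                sort(unique(groups))))
fracs <- fracs / rowSums(fracs)

mixtures <- sapply(rownames(fracs), function(pb) {
  # Number of cells to draw per group, proportional to the target fractions
  n_per_group <- round(fracs[pb, ] * n_cells)
  picked <- unlist(lapply(names(n_per_group), function(g) {
    pool <- names(groups)[groups == g]
    sample(pool, n_per_group[[g]], replace = TRUE)
  }))
  # Average expression of the sampled cells for each gene
  rowMeans(data[, picked, drop = FALSE])
})

dim(mixtures)  # genes x pseudobulk samples
```

Here `fracs` plays the role of the `Fracs` element (pseudobulks in rows, groups in columns) and `mixtures` the role of `Mixtures` (genes in rows, pseudobulks in columns).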
The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
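For orientation, Seurat's default `LogNormalize` method divides each column by its total, multiplies by a scale factor (10,000 by default), and applies `log1p`. The base-R sketch below reproduces that default on a hypothetical toy matrix; whether `CreatePseudobulks` calls `NormalizeData` with these defaults is an assumption.

```r
# Toy pseudobulk matrix: 3 genes x 2 pseudobulks (hypothetical values)
mixtures <- matrix(c(2, 3, 5, 10, 0, 30), nrow = 3,
                   dimnames = list(paste0("gene", 1:3),
                                   paste0("Pseudobulk", 1:2)))

# Base-R equivalent of LogNormalize with a scale factor of 10,000
scaled <- sweep(mixtures, 2, colSums(mixtures), "/") * 1e4
normalized <- log1p(scaled)
```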
Examples
library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
overwrite = TRUE)
#> File downloaded:
#> • Melanoma_WU2161_counts.rds <id: 15n9zlXed74oeGaO1pythOOM_iWIfuMn2>
#> Saved locally as:
#> • Melanoma_WU2161_counts.rds
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
#> Pseudobulk1 Pseudobulk2 Pseudobulk3 Pseudobulk4 Pseudobulk5
#> AL627309.1 0.010074937 0.026466048 0.0171018497 0.0171069227 0.0255710058
#> AL627309.5 0.028185407 0.028121530 0.0440359686 0.0470177161 0.0566584344
#> AP006222.2 0.005813013 0.002907824 0.0051710095 0.0088178291 0.0068326888
#> AC114498.1 0.014872736 0.014857974 0.0069979292 0.0069840099 0.0069909626
#> AL669831.2 0.001482140 0.000000000 0.0003703704 0.0007391259 0.0003700004
#> LINC01409 0.038924808 0.030773482 0.0406855932 0.0485869241 0.0419149703
head(result$Fracs) ## SE fractions in pseudobulks
#> SE1 SE10 SE2 SE3 SE4 SE5
#> Pseudobulk1 0.148299599 0.04084198 0.13605238 0.04812145 0.07334537 0.16468016
#> Pseudobulk2 0.180245105 0.03591520 0.03277197 0.16672719 0.17952414 0.11634038
#> Pseudobulk3 0.078572940 0.08653050 0.12737995 0.10409351 0.08368176 0.06809299
#> Pseudobulk4 0.073579654 0.20107738 0.10550911 0.11484678 0.07561444 0.11455294
#> Pseudobulk5 0.003943657 0.15120503 0.09552227 0.05698550 0.14124883 0.10354736
#> Pseudobulk6 0.067123736 0.11070905 0.15959928 0.07370885 0.15368544 0.06388965
#> SE6 SE7 SE8 SE9
#> Pseudobulk1 0.00000000 0.16123196 0.02122383 0.20620327
#> Pseudobulk2 0.03504460 0.11745721 0.00000000 0.13597421
#> Pseudobulk3 0.16320454 0.09041136 0.07594698 0.12208546
#> Pseudobulk4 0.04802129 0.16669192 0.05996442 0.04014206
#> Pseudobulk5 0.07907159 0.13299510 0.08381708 0.15166359
#> Pseudobulk6 0.16595165 0.01566788 0.13811904 0.05154544