Create Pseudo-bulk Mixtures — CreatePseudobulks • SpatialEcoTyper

This function generates pseudobulk samples by aggregating single-cell transcriptomics data.

Usage

CreatePseudobulks(data = NULL, groups, counts = NULL, n_mixtures = 100)

Arguments

data: A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.
groups: A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.
counts: A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided.
n_mixtures: An integer specifying the number of pseudobulk samples to create. Default is 100.

Value

A list containing two elements:

Fracs: A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).
Mixtures: A matrix of pseudobulk gene expression data (genes x pseudobulk samples).
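
The returned structure can be sanity-checked with a few lines of base R. The sketch below is only an illustration: `check_pseudobulk_result` is a hypothetical helper, not part of SpatialEcoTyper, and the `toy` object is a made-up stand-in with the documented layout. The check that each row of `Fracs` sums to 1 is an assumption based on the example output on this page, not a documented guarantee.

```r
# Hypothetical helper (not part of SpatialEcoTyper): checks that a result
# has the documented layout. The "sum to one" check is an assumption.
check_pseudobulk_result <- function(result, tol = 1e-6) {
  stopifnot(is.list(result), all(c("Fracs", "Mixtures") %in% names(result)))
  fracs <- result$Fracs
  mixtures <- result$Mixtures
  c(
    # Fracs is samples x groups, Mixtures is genes x samples
    samples_match    = nrow(fracs) == ncol(mixtures),
    fracs_in_range   = all(fracs >= 0 & fracs <= 1),
    fracs_sum_to_one = all(abs(rowSums(fracs) - 1) < tol)
  )
}

# Stand-in object with the documented structure (made-up numbers).
toy <- list(
  Fracs = matrix(c(0.2, 0.5, 0.8, 0.5), nrow = 2,
                 dimnames = list(c("Pseudobulk1", "Pseudobulk2"),
                                 c("SE1", "SE2"))),
  Mixtures = matrix(1:6, nrow = 3,
                    dimnames = list(paste0("Gene", 1:3),
                                    c("Pseudobulk1", "Pseudobulk2")))
)
checks <- check_pseudobulk_result(toy)
```

With a real result, the same helper could be called as `check_pseudobulk_result(result)` after running `CreatePseudobulks`.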

Details

If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.
If the maximum value in `data` is less than 80, the function assumes the data is on a log2 scale and converts it back to a non-log scale.
The `groups` vector is used to ensure that cells are correctly assigned to their respective groups. If `groups` does not have names, its elements are assumed to be in the same order as the columns of the `data` matrix.
Pseudobulk samples are created by sampling cells from each group based on predefined fractions, and then calculating the average expression for each gene in the pseudobulk samples.
The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
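
The steps above can be sketched in base R on toy data. This is an illustration under stated assumptions, not the SpatialEcoTyper source: the number of cells per pseudobulk (`n_cells`), the way random fractions are drawn, and the `2^x - 1` inverse of the log2 transform are all hypothetical choices, since the documentation does not specify them. The final Seurat `NormalizeData` step is omitted.

```r
# Illustrative sketch only; not the SpatialEcoTyper implementation.
set.seed(1)

# Toy raw counts: 5 genes x 12 cells, with one group label per cell
counts <- matrix(rpois(60, lambda = 5), nrow = 5,
                 dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:12)))
groups <- setNames(rep(c("SE1", "SE2", "SE3"), each = 4), colnames(counts))

# Step 1: divide each column (cell) by its sum and multiply by 10,000
data <- sweep(counts, 2, colSums(counts), "/") * 1e4

# Step 2: log-scale heuristic from the Details (maximum below 80).
# Assumption: 2^x - 1 is used as the inverse; the docs do not say which.
if (max(data) < 80) data <- 2^data - 1

# Step 3: draw group fractions, sample cells per group, average expression
n_mixtures <- 4
n_cells <- 50  # hypothetical number of cells drawn per pseudobulk
group_names <- sort(unique(groups))
fracs <- matrix(runif(n_mixtures * length(group_names)), nrow = n_mixtures,
                dimnames = list(paste0("Pseudobulk", seq_len(n_mixtures)),
                                group_names))
fracs <- fracs / rowSums(fracs)  # each row sums to 1

mixtures <- sapply(rownames(fracs), function(pb) {
  n_per_group <- round(fracs[pb, ] * n_cells)
  cells <- unlist(lapply(group_names, function(g) {
    sample(names(groups)[groups == g], n_per_group[[g]], replace = TRUE)
  }))
  rowMeans(data[, cells, drop = FALSE])  # average expression per gene
})
```

Here `fracs` plays the role of `Fracs` (pseudobulk samples x groups) and `mixtures` the role of `Mixtures` (genes x pseudobulk samples). Sampling with replacement is used so that small groups can still contribute the requested number of cells; whether the package does the same is not stated.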

Examples

library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
               overwrite = TRUE)
#> File downloaded:
#> • Melanoma_WU2161_counts.rds <id: 15n9zlXed74oeGaO1pythOOM_iWIfuMn2>
#> Saved locally as:
#> • Melanoma_WU2161_counts.rds
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
#>            Pseudobulk1 Pseudobulk2  Pseudobulk3  Pseudobulk4  Pseudobulk5
#> AL627309.1 0.010074937 0.026466048 0.0171018497 0.0171069227 0.0255710058
#> AL627309.5 0.028185407 0.028121530 0.0440359686 0.0470177161 0.0566584344
#> AP006222.2 0.005813013 0.002907824 0.0051710095 0.0088178291 0.0068326888
#> AC114498.1 0.014872736 0.014857974 0.0069979292 0.0069840099 0.0069909626
#> AL669831.2 0.001482140 0.000000000 0.0003703704 0.0007391259 0.0003700004
#> LINC01409  0.038924808 0.030773482 0.0406855932 0.0485869241 0.0419149703
head(result$Fracs) ## SE fractions in pseudobulks
#>                     SE1       SE10        SE2        SE3        SE4        SE5
#> Pseudobulk1 0.148299599 0.04084198 0.13605238 0.04812145 0.07334537 0.16468016
#> Pseudobulk2 0.180245105 0.03591520 0.03277197 0.16672719 0.17952414 0.11634038
#> Pseudobulk3 0.078572940 0.08653050 0.12737995 0.10409351 0.08368176 0.06809299
#> Pseudobulk4 0.073579654 0.20107738 0.10550911 0.11484678 0.07561444 0.11455294
#> Pseudobulk5 0.003943657 0.15120503 0.09552227 0.05698550 0.14124883 0.10354736
#> Pseudobulk6 0.067123736 0.11070905 0.15959928 0.07370885 0.15368544 0.06388965
#>                    SE6        SE7        SE8        SE9
#> Pseudobulk1 0.00000000 0.16123196 0.02122383 0.20620327
#> Pseudobulk2 0.03504460 0.11745721 0.00000000 0.13597421
#> Pseudobulk3 0.16320454 0.09041136 0.07594698 0.12208546
#> Pseudobulk4 0.04802129 0.16669192 0.05996442 0.04014206
#> Pseudobulk5 0.07907159 0.13299510 0.08381708 0.15166359
#> Pseudobulk6 0.16595165 0.01566788 0.13811904 0.05154544