This function generates pseudobulk samples by aggregating single-cell transcriptomics data.

Usage

CreatePseudobulks(data = NULL, groups, counts = NULL, n_mixtures = 100)

Arguments

data

A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.

groups

A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.

counts

A matrix of raw count data (genes x cells). Used to generate normalized data if data is not provided.

n_mixtures

An integer specifying the number of pseudobulk samples to create. Default is 100.

Value

A list containing two elements:

Fracs

A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).

Mixtures

A matrix of pseudobulk gene expression data (genes x pseudobulk samples).

Details

  • If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.

  • If the maximum value in `data` is less than 80, the function assumes the data are on a log2 scale and converts them back to linear (non-log) scale.

  • The names of the `groups` vector are used to match each cell to its group. If `groups` has no names, its elements are assumed to follow the column order of the `data` matrix.

  • Pseudobulk samples are created by sampling cells from each group based on predefined fractions, and then calculating the average expression for each gene in the pseudobulk samples.

  • The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
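
The steps listed above can be sketched in base R on toy data. This is a minimal illustration, not the package's implementation: the toy matrix, `n_cells`, the uniform random fractions, sampling with replacement, and the `2^x - 1` inverse transform are all assumptions made for the example, and the final Seurat `NormalizeData` step is left out so the sketch has no dependencies.

```r
# Toy inputs (hypothetical): 50 genes x 200 cells of raw counts, 4 groups.
set.seed(1)
counts <- matrix(rpois(50 * 200, lambda = 2), nrow = 50,
                 dimnames = list(paste0("Gene", 1:50), paste0("Cell", 1:200)))
groups <- sample(paste0("SE", 1:4), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)

# Step 1: divide each column (cell) by its sum and multiply by 10,000.
data <- sweep(counts, 2, colSums(counts), "/") * 1e4

# Step 2: heuristic log-scale check. The inverse 2^x - 1 is an assumption;
# the package may undo the log transform differently.
if (max(data) < 80) data <- 2^data - 1

# Step 3: draw group fractions for each pseudobulk (assumed: uniform draws,
# rescaled so that every row sums to 1).
n_mixtures <- 5
n_cells <- 100  # hypothetical number of cells sampled per pseudobulk
group_ids <- sort(unique(groups))
fracs <- matrix(runif(n_mixtures * length(group_ids)), nrow = n_mixtures,
                dimnames = list(paste0("Pseudobulk", seq_len(n_mixtures)),
                                group_ids))
fracs <- fracs / rowSums(fracs)

# Step 4: sample cells per group according to the fractions, then average
# the expression of the sampled cells gene by gene.
mixtures <- sapply(seq_len(n_mixtures), function(i) {
  n_per_group <- round(fracs[i, ] * n_cells)
  picked <- unlist(lapply(group_ids, function(g) {
    sample(names(groups)[groups == g], n_per_group[[g]], replace = TRUE)
  }))
  rowMeans(data[, picked, drop = FALSE])
})
colnames(mixtures) <- rownames(fracs)

dim(mixtures)  # genes x pseudobulk samples
dim(fracs)     # pseudobulk samples x groups
```

The two objects mirror the layout of the `Mixtures` and `Fracs` elements described under Value: genes in rows for the expression matrix, pseudobulk samples in rows for the fraction matrix.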

Examples

library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
                    overwrite = TRUE)
#> File downloaded:
#> Melanoma_WU2161_counts.rds <id: 15n9zlXed74oeGaO1pythOOM_iWIfuMn2>
#> Saved locally as:
#> Melanoma_WU2161_counts.rds
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
#>            Pseudobulk1 Pseudobulk2  Pseudobulk3  Pseudobulk4  Pseudobulk5
#> AL627309.1 0.010074937 0.026466048 0.0171018497 0.0171069227 0.0255710058
#> AL627309.5 0.028185407 0.028121530 0.0440359686 0.0470177161 0.0566584344
#> AP006222.2 0.005813013 0.002907824 0.0051710095 0.0088178291 0.0068326888
#> AC114498.1 0.014872736 0.014857974 0.0069979292 0.0069840099 0.0069909626
#> AL669831.2 0.001482140 0.000000000 0.0003703704 0.0007391259 0.0003700004
#> LINC01409  0.038924808 0.030773482 0.0406855932 0.0485869241 0.0419149703
head(result$Fracs) ## SE fractions in pseudobulks
#>                     SE1       SE10        SE2        SE3        SE4        SE5
#> Pseudobulk1 0.148299599 0.04084198 0.13605238 0.04812145 0.07334537 0.16468016
#> Pseudobulk2 0.180245105 0.03591520 0.03277197 0.16672719 0.17952414 0.11634038
#> Pseudobulk3 0.078572940 0.08653050 0.12737995 0.10409351 0.08368176 0.06809299
#> Pseudobulk4 0.073579654 0.20107738 0.10550911 0.11484678 0.07561444 0.11455294
#> Pseudobulk5 0.003943657 0.15120503 0.09552227 0.05698550 0.14124883 0.10354736
#> Pseudobulk6 0.067123736 0.11070905 0.15959928 0.07370885 0.15368544 0.06388965
#>                    SE6        SE7        SE8        SE9
#> Pseudobulk1 0.00000000 0.16123196 0.02122383 0.20620327
#> Pseudobulk2 0.03504460 0.11745721 0.00000000 0.13597421
#> Pseudobulk3 0.16320454 0.09041136 0.07594698 0.12208546
#> Pseudobulk4 0.04802129 0.16669192 0.05996442 0.04014206
#> Pseudobulk5 0.07907159 0.13299510 0.08381708 0.15166359
#> Pseudobulk6 0.16595165 0.01566788 0.13811904 0.05154544