
This function generates pseudobulk samples by aggregating single-cell transcriptomics data.

Usage

CreatePseudobulks(data = NULL, groups, counts = NULL, n_mixtures = 100)

Arguments

data

A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.

groups

A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.

counts

A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided.

n_mixtures

An integer specifying the number of pseudobulk samples to create. Default is 100.
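As a minimal sketch of how these arguments fit together, the toy example below builds a small genes x cells count matrix and a matching named `groups` vector. The gene names, cell names, group labels and count values are invented for illustration; only the shapes and the name matching follow the argument descriptions above.

```r
# Toy inputs with hypothetical values; they only illustrate the expected shapes.
set.seed(1)
n_genes <- 5
n_cells <- 12
toy_counts <- matrix(rpois(n_genes * n_cells, lambda = 3),
                     nrow = n_genes, ncol = n_cells,
                     dimnames = list(paste0("Gene", seq_len(n_genes)),
                                     paste0("Cell", seq_len(n_cells))))

# One group label per cell, named by the cell (column) it belongs to.
toy_groups <- rep(c("SE1", "SE2", "SE3"), length.out = n_cells)
names(toy_groups) <- colnames(toy_counts)

# Sanity checks worth running before calling CreatePseudobulks():
stopifnot(length(toy_groups) == ncol(toy_counts))
stopifnot(all(names(toy_groups) %in% colnames(toy_counts)))
```

These objects could then be passed as `counts = toy_counts, groups = toy_groups`.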

Value

A list containing two elements:

Fracs

A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).

Mixtures

A matrix of pseudobulk gene expression data (genes x pseudobulk samples).

Details

  • If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.

  • If the maximum value in `data` is less than 80, the function assumes the data are on a log2 scale and converts them back to a non-log scale.

  • The names of the `groups` vector are used to match each cell to its group. If `groups` has no names, its elements are assumed to be in the same order as the columns of the `data` matrix.

  • Pseudobulk samples are created by sampling cells from each group based on predefined fractions, and then calculating the average expression for each gene in the pseudobulk samples.

  • The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
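The steps above can be sketched in base R as follows. This is an illustrative approximation, not the SpatialEcoTyper implementation: the function name `sketch_pseudobulks`, the `n_cells_per_mix` parameter, the way group fractions are drawn (uniform random numbers rescaled to sum to 1) and the log2 back-transform `2^x - 1` are all assumptions made for the example. The final `NormalizeData` step is left out so that the sketch runs without Seurat.

```r
# Illustrative sketch only; NOT the SpatialEcoTyper implementation.
sketch_pseudobulks <- function(groups, data = NULL, counts = NULL,
                               n_mixtures = 100,
                               n_cells_per_mix = 200) {  # hypothetical parameter
  if (is.null(data)) {
    stopifnot(!is.null(counts))
    # Scale each cell (column) so that its values sum to 10,000.
    data <- sweep(counts, 2, colSums(counts), "/") * 1e4
  } else if (max(data) < 80) {
    # Small maximum: treat the values as log2-scaled.
    # Assumption: the inverse of log2(x + 1); the actual transform is not documented here.
    data <- 2^data - 1
  }

  # Align group labels with the columns of the expression matrix.
  if (is.null(names(groups))) names(groups) <- colnames(data)
  groups <- groups[colnames(data)]
  group_levels <- sort(unique(groups))

  # Assumption: random group fractions, rescaled so each row sums to 1.
  fracs <- matrix(runif(n_mixtures * length(group_levels)),
                  nrow = n_mixtures,
                  dimnames = list(paste0("Pseudobulk", seq_len(n_mixtures)),
                                  group_levels))
  fracs <- fracs / rowSums(fracs)

  # For each pseudobulk, sample cells per group and average their expression.
  mixtures <- sapply(seq_len(n_mixtures), function(i) {
    n_per_group <- round(fracs[i, ] * n_cells_per_mix)
    cells <- unlist(lapply(seq_along(group_levels), function(k) {
      pool <- names(groups)[groups == group_levels[k]]
      sample(pool, n_per_group[k], replace = TRUE)
    }))
    rowMeans(data[, cells, drop = FALSE])
  })
  colnames(mixtures) <- rownames(fracs)

  list(Fracs = fracs, Mixtures = mixtures)
}

# Usage on toy data (hypothetical values).
set.seed(42)
toy_counts <- matrix(rpois(20 * 60, lambda = 5), nrow = 20,
                     dimnames = list(paste0("Gene", 1:20), paste0("Cell", 1:60)))
toy_groups <- setNames(rep(paste0("SE", 1:3), each = 20), colnames(toy_counts))
res <- sketch_pseudobulks(toy_groups, counts = toy_counts, n_mixtures = 4)
dim(res$Mixtures)   # genes x pseudobulk samples
rowSums(res$Fracs)  # group fractions of each pseudobulk sum to 1
```

Sampling with replacement keeps the sketch working when a group has fewer cells than requested; whether the package samples with or without replacement is not stated in this page.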

Examples

library(SpatialEcoTyper)
library(data.table) ## provides fread()
## Download the demo scRNA-seq count matrix (genes x cells)
counts <- fread("https://spatialecotyper.stanford.edu/inc/inc.public.vignettes.php?file=scRNAseq_demo_counts.tsv",
                sep = "\t", header = TRUE, data.table = FALSE)
rownames(counts) <- counts[, 1] ## first column holds the gene names
counts <- as.matrix(counts[, -1])
## Randomly assign each cell to one of 10 groups (for demonstration only)
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
#>            Pseudobulk1  Pseudobulk2  Pseudobulk3 Pseudobulk4  Pseudobulk5
#> AL627309.1 0.022400036 0.0237588981 0.0404695702 0.021582389 0.0332645615
#> AL627309.5 0.037513779 0.0364766312 0.0511169321 0.058034053 0.0488995289
#> AP006222.2 0.002201564 0.0002290762 0.0002827068 0.002476785 0.0006911286
#> AC114498.1 0.014872736 0.0009226095 0.0000000000 0.010640104 0.0000000000
#> AL669831.2 0.000370741 0.0003700004 0.0000000000 0.000000000 0.0000000000
#> LINC01409  0.035601840 0.0329151072 0.0280835293 0.040958856 0.0356076402
head(result$Fracs) ## SE fractions in pseudobulks
#>                     SE1       SE10        SE2        SE3        SE4        SE5
#> Pseudobulk1 0.002385833 0.12812274 0.12133029 0.11022785 0.15096178 0.02909183
#> Pseudobulk2 0.125894503 0.05919119 0.06175861 0.00000000 0.17164106 0.07283431
#> Pseudobulk3 0.026766800 0.15755722 0.05150441 0.08051248 0.07243675 0.07318947
#> Pseudobulk4 0.086545847 0.14256954 0.08898381 0.11858692 0.07273662 0.16087729
#> Pseudobulk5 0.085243128 0.00000000 0.19710081 0.06754208 0.09197440 0.18673578
#> Pseudobulk6 0.144867055 0.10004239 0.09563137 0.06480841 0.13269719 0.21700110
#>                     SE6        SE7        SE8        SE9
#> Pseudobulk1 0.137884470 0.09856729 0.14711607 0.07431184
#> Pseudobulk2 0.103105640 0.13583622 0.14072572 0.12901275
#> Pseudobulk3 0.093491750 0.13653579 0.17400164 0.13400369
#> Pseudobulk4 0.006250445 0.09189345 0.04284254 0.18871354
#> Pseudobulk5 0.151838273 0.00000000 0.08386295 0.13570257
#> Pseudobulk6 0.042467112 0.02007788 0.07038096 0.11202653