This function generates pseudobulk samples by aggregating single-cell transcriptomics data.

Usage

CreatePseudobulks(data = NULL, groups, counts = NULL, n_mixtures = 100)

Arguments

data

A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided.

groups

A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix.

counts

A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided.

n_mixtures

An integer specifying the number of pseudobulk samples to create. Default is 100.
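
To illustrate the expected input shapes, the following minimal sketch builds a toy `counts` matrix and a matching named `groups` vector. The gene and cell names, the matrix size, and the three group labels are made up for the example; only the genes x cells layout and the matching of names to column names come from the argument descriptions above.

```r
set.seed(1)

# Toy raw counts: 5 genes x 12 cells (hypothetical names and sizes)
counts <- matrix(rpois(5 * 12, lambda = 5), nrow = 5,
                 dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:12)))

# One group label per cell, named by the cell (column) names
groups <- rep(paste0("SE", 1:3), each = 4)
names(groups) <- colnames(counts)

# Sanity checks worth running before calling CreatePseudobulks()
stopifnot(length(groups) == ncol(counts),
          identical(names(groups), colnames(counts)))
```

Objects built this way could then be passed as `CreatePseudobulks(counts = counts, groups = groups)`.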

Value

A list containing two elements:

Fracs

A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups).

Mixtures

A matrix of pseudobulk gene expression data (genes x pseudobulk samples).
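
As a small sketch of how the returned list might be checked, the helper below verifies that both elements are present and that they agree on the number of pseudobulk samples. The name `check_pseudobulks` is hypothetical and not part of the package, and the `mock` list only imitates the documented shapes.

```r
# Hypothetical helper; not part of SpatialEcoTyper
check_pseudobulks <- function(result) {
  stopifnot(is.list(result),
            all(c("Fracs", "Mixtures") %in% names(result)))
  # Rows of Fracs and columns of Mixtures both index pseudobulk samples
  stopifnot(nrow(result$Fracs) == ncol(result$Mixtures))
  invisible(TRUE)
}

# Mock result with 3 pseudobulk samples, 2 groups, and 4 genes
mock <- list(Fracs = matrix(0.5, nrow = 3, ncol = 2),
             Mixtures = matrix(1, nrow = 4, ncol = 3))
ok <- check_pseudobulks(mock)
```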

Details

  • If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000.

  • If the maximum value in `data` is less than 80, the function assumes the data are on a log2 scale and converts them back to linear (non-log) scale.

  • The `groups` vector is used to assign each cell to its group. If `groups` has no names, its elements are assumed to correspond, in order, to the columns of the `data` matrix.

  • Pseudobulk samples are created by sampling cells from each group based on predefined fractions, and then calculating the average expression for each gene in the pseudobulk samples.

  • The pseudobulk data is then normalized using Seurat's `NormalizeData` function.
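
The steps above can be sketched in base R as follows. This is not the package's implementation: the function name `sketch_pseudobulks`, the `n_cells` parameter, the uniformly random target fractions, sampling with replacement, and the `2^x - 1` back-transform are all assumptions made for illustration, and the final Seurat `NormalizeData` step is omitted.

```r
# Hypothetical illustration of the documented steps; not the package code
sketch_pseudobulks <- function(data = NULL, groups, counts = NULL,
                               n_mixtures = 100, n_cells = 50) {
  if (is.null(data)) {
    stopifnot(!is.null(counts))
    # Scale each cell (column) to a total of 10,000
    data <- sweep(counts, 2, colSums(counts), "/") * 1e4
  }
  # Small maximum suggests log2 scale (assumption: stored as log2(x + 1))
  if (max(data) < 80) data <- 2^data - 1
  # Unnamed groups are taken to follow the column order of data
  if (is.null(names(groups))) names(groups) <- colnames(data)
  groups <- groups[colnames(data)]

  # Target fractions per mixture (assumption: uniform draws, rows sum to 1)
  grp_levels <- sort(unique(groups))
  fracs <- matrix(runif(n_mixtures * length(grp_levels)), nrow = n_mixtures,
                  dimnames = list(paste0("Mixture", seq_len(n_mixtures)),
                                  grp_levels))
  fracs <- fracs / rowSums(fracs)

  mixtures <- sapply(seq_len(n_mixtures), function(i) {
    # Number of cells to draw from each group, rounded from the target fractions
    n_per_group <- round(fracs[i, ] * n_cells)
    picked <- unlist(lapply(grp_levels, function(g) {
      pool <- names(groups)[groups == g]
      sample(pool, n_per_group[[g]], replace = TRUE)
    }))
    # Average expression of the sampled cells for each gene
    rowMeans(data[, picked, drop = FALSE])
  })
  colnames(mixtures) <- rownames(fracs)
  list(Fracs = fracs, Mixtures = mixtures)
}

# Toy input: 5 genes x 30 cells in 3 groups (made-up names and values)
set.seed(42)
toy_counts <- matrix(rpois(5 * 30, lambda = 5) + 1, nrow = 5,
                     dimnames = list(paste0("Gene", 1:5), paste0("Cell", 1:30)))
toy_groups <- rep(paste0("SE", 1:3), times = 10)
names(toy_groups) <- colnames(toy_counts)

res <- sketch_pseudobulks(counts = toy_counts, groups = toy_groups, n_mixtures = 4)
dim(res$Fracs)     # pseudobulk samples x groups
dim(res$Mixtures)  # genes x pseudobulk samples
```

In this sketch `Fracs` stores the target fractions; the realized cell counts are rounded, so the actual composition of each mixture can differ slightly.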

Examples

library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
                    overwrite = TRUE)
#> Error in map(as_id(id), get_one_file_id):  In index: 1.
#> Caused by error in `.f()`:
#> ! Client error: (404) Not Found
#> File not found: 15n9zlXed74oeGaO1pythOOM_iWIfuMn2.
#>  message: File not found: 15n9zlXed74oeGaO1pythOOM_iWIfuMn2.
#>  domain: global
#>  reason: notFound
#>  location: fileId
#>  locationType: parameter
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
#> Warning: cannot open compressed file 'Melanoma_WU2161_counts.rds', probable reason 'No such file or directory'
#> Error in gzfile(file, "rb"): cannot open the connection
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
#> Error in sample.int(length(x), size, replace, prob): invalid 'size' argument
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 't': argument is not a matrix
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'result' not found
head(result$Fracs) ## SE fractions in pseudobulks
#> Error in h(simpleError(msg, call)): error in evaluating the argument 'x' in selecting a method for function 'head': object 'result' not found