This function generates pseudobulk samples by aggregating single-cell transcriptomics data.
Arguments
- data
- A matrix of normalized gene expression data (genes x cells). If NULL, counts must be provided. 
- groups
- A named vector indicating the group (e.g., spatial ecotype) for each cell. The names should correspond to the column names of the data matrix. 
- counts
- A matrix of raw counts data (genes x cells). Used to generate normalized data if data is not provided. 
- n_mixtures
- An integer specifying the number of pseudobulk samples to create. Default is 100. 
Value
A list containing two elements:
- Fracs
- A matrix of the fractions of each group in the pseudobulk samples (rows represent pseudobulk samples, columns represent groups). 
- Mixtures
- A matrix of pseudobulk gene expression data (genes x pseudobulk samples). 
Details
- If `data` is not provided, the function will normalize the `counts` matrix by dividing each column by its sum and multiplying by 10,000. 
- If the maximum value in `data` is less than 80, the function assumes the data are on a log2 scale and converts them back to the non-log scale. 
- The `groups` vector assigns each cell to its group, with names matched to the column names of the `data` matrix. If `groups` has no names, its elements are assumed to correspond to the columns of the `data` matrix. 
- Pseudobulk samples are created by sampling cells from each group according to predefined fractions, then averaging the expression of each gene across the sampled cells. 
- The pseudobulk data is then normalized using Seurat's `NormalizeData` function. 
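The steps above can be sketched in base R on toy data. This is a minimal illustration, not the package's implementation: the toy matrix, the random draw of fractions, the `cells_per_mixture` value, and the `2^x - 1` inverse transform are all assumptions made for the example, and the final Seurat `NormalizeData` step is left out.

```r
set.seed(1)

# Toy raw counts (hypothetical data): 50 genes x 200 cells
n_genes <- 50
n_cells <- 200
counts <- matrix(rpois(n_genes * n_cells, lambda = 5), nrow = n_genes,
                 dimnames = list(paste0("Gene", seq_len(n_genes)),
                                 paste0("Cell", seq_len(n_cells))))
groups <- sample(paste0("SE", 1:4), n_cells, replace = TRUE)
names(groups) <- colnames(counts)

# Step 1: divide each column by its sum and multiply by 10,000
data <- sweep(counts, 2, colSums(counts), "/") * 1e4

# Step 2: log-scale heuristic; the inverse 2^x - 1 is an assumption
# (it presumes the data were stored as log2(x + 1))
if (max(data) < 80) data <- 2^data - 1

# Step 3: group fractions per pseudobulk (drawn at random here as an
# assumption; the source only says the fractions are predefined)
n_mixtures <- 5
cells_per_mixture <- 100  # hypothetical number of cells per pseudobulk
group_levels <- sort(unique(groups))
fracs <- matrix(runif(n_mixtures * length(group_levels)), nrow = n_mixtures,
                dimnames = list(paste0("Mixture", seq_len(n_mixtures)),
                                group_levels))
fracs <- fracs / rowSums(fracs)  # each row sums to 1

# Step 4: sample cells per group and average expression per gene
mixtures <- sapply(seq_len(n_mixtures), function(i) {
  n_per_group <- round(fracs[i, ] * cells_per_mixture)
  picked <- unlist(lapply(group_levels, function(g) {
    pool <- names(groups)[groups == g]
    sample(pool, n_per_group[[g]], replace = TRUE)
  }))
  rowMeans(data[, picked, drop = FALSE])
})
colnames(mixtures) <- rownames(fracs)

dim(fracs)     # pseudobulk samples x groups
dim(mixtures)  # genes x pseudobulk samples
```

The two objects mirror the layout of the `Fracs` and `Mixtures` elements described under Value: fractions with one row per pseudobulk sample, and expression with one column per pseudobulk sample.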
Examples
library(SpatialEcoTyper)
library(googledrive)
drive_deauth() # no Google sign-in is required
drive_download(as_id("15n9zlXed74oeGaO1pythOOM_iWIfuMn2"), "Melanoma_WU2161_counts.rds",
                    overwrite = TRUE)
counts <- readRDS("Melanoma_WU2161_counts.rds") ## raw counts of scRNA-seq data
groups <- sample(paste0("SE", 1:10), ncol(counts), replace = TRUE)
names(groups) <- colnames(counts)
result <- CreatePseudobulks(counts = counts, groups = groups, n_mixtures = 20)
head(result$Mixtures[, 1:5]) ## Gene expression matrix of pseudobulks
head(result$Fracs) ## SE fractions in pseudobulks
