Seurat

Standard workflow

pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
DimPlot(object = pbmc, reduction = "tsne")

Seurat 2 vs. 3

| Seurat v2.X | Seurat v3.X |
| --- | --- |
| `object@data` | `GetAssayData(object = object)` |
| `object@raw.data` | `GetAssayData(object = object, slot = "counts")` |
| `object@scale.data` | `GetAssayData(object = object, slot = "scale.data")` |
| `object@cell.names` | `colnames(x = object)` |
| `rownames(x = object@data)` | `rownames(x = object)` |
| `object@var.genes` | `VariableFeatures(object = object)` |
| `object@hvg.info` | `HVFInfo(object = object)` |
| `object@assays$assay.name` | `object[["assay.name"]]` |
| `object@dr$pca` | `object[["pca"]]` |
| `GetCellEmbeddings(object = object, reduction.type = "pca")` | `Embeddings(object = object, reduction = "pca")` |
| `GetGeneLoadings(object = object, reduction.type = "pca")` | `Loadings(object = object, reduction = "pca")` |
| `AddMetaData(object = object, metadata = vector, col.name = "name")` | `object$name <- vector` |
| `object@meta.data$name` | `object$name` |
| `object@idents` | `Idents(object = object)` |
| `SetIdent(object = object, ident.use = "new.idents")` | `Idents(object = object) <- "new.idents"` |
| `SetIdent(object = object, cells.use = 1:10, ident.use = "new.idents")` | `Idents(object = object, cells = 1:10) <- "new.idents"` |
| `StashIdent(object = object, save.name = "saved.idents")` | `object$saved.idents <- Idents(object = object)` |
| `levels(x = object@idents)` | `levels(x = object)` |
| `RenameIdent(object = object, old.ident.name = "old.ident", new.ident.name = "new.ident")` | `RenameIdents(object = object, "old.ident" = "new.ident")` |
| `WhichCells(object = object, ident = "ident.keep")` | `WhichCells(object = object, idents = "ident.keep")` |
| `WhichCells(object = object, ident.remove = "ident.remove")` | `WhichCells(object = object, idents = "ident.remove", invert = TRUE)` |
| `WhichCells(object = object, max.cells.per.ident = 500)` | `WhichCells(object = object, downsample = 500)` |
| `WhichCells(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `WhichCells(object = object, expression = name > low & name < high)` |
| `FilterCells(object = object, subset.names = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `SubsetData(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `MergeSeurat(object1 = object1, object2 = object2)` | `merge(x = object1, y = object2)` |

Data

Seurat has 3 data slots (source):

  • counts (raw.data in v2)

    • The raw data slot (object@raw.data) holds the original expression matrix that was supplied when creating the Seurat object, prior to any preprocessing by Seurat. For example, this could be the UMI matrix generated by DropSeqTools or 10X CellRanger, a count matrix from featureCounts, an FPKM matrix produced by Cufflinks, or a TPM matrix produced by RSEM. Row names represent gene names, and column names represent cell names. Either raw counts or normalized values (e.g. FPKM or TPM) are fine, but the input expression matrix should not be log-transformed. Seurat can be used to analyze single cell data produced by any technology, as long as you can create an expression matrix. The Read10X function provides easy importing of datasets produced by the 10X Chromium system. Seurat uses count data when performing gene scaling and differential expression tests based on the negative binomial distribution.
  • data = log-normalized data

    • The data slot stores normalized and log-transformed single cell expression. This maintains the relative abundance levels of all genes, and contains only zeros or positive values. See ?NormalizeData for more information. This data is used for visualizations, such as violin and feature plots, most differential expression tests, finding high-variance genes, and as input to ScaleData (see below).
  • scale.data (= z-score normalized data)

    • The scale.data slot represents a cell’s relative expression of each gene, in comparison to all other cells. Therefore this matrix contains both positive and negative values. See ?ScaleData for more information. If regressing genes against unwanted sources of variation (for example, to remove cell-cycle effects), the scaled residuals from the model are stored here. This data is used as input for dimensional reduction techniques, and is displayed in heatmaps.
> GetAssayData(as_fet_comb, "counts") %>% dim
[1] 0 0
> GetAssayData(as_fet_comb, "scale.data") %>% dim
[1] 1 1
> GetAssayData(as_fet_comb, "data") %>% dim
[1] 1000 1491
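
As a rough base-R sketch of how the three slots relate to each other: the example below assumes the default "LogNormalize" method with a scale factor of 10,000 and plain per-gene z-scoring. The toy matrix and the names `counts`, `norm.data` and `scaled.data` are made up for illustration, and Seurat's own ScaleData does more than this (for instance, it clips extreme values), so treat it as an approximation rather than the package's implementation.

```r
# Toy count matrix: 3 genes (rows) x 4 cells (columns); values are invented.
counts <- matrix(c(0, 2, 8,
                   1, 0, 9,
                   4, 4, 2,
                   0, 6, 4),
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# "data": divide each cell by its total count, multiply by the scale factor,
# then take log(1 + x). The result is zero or positive.
scale.factor <- 1e4
norm.data <- log1p(sweep(counts, 2, colSums(counts), "/") * scale.factor)

# "scale.data": z-score each gene across cells, so every row has mean 0 and
# standard deviation 1, and the matrix contains negative values as well.
scaled.data <- t(scale(t(norm.data)))

range(norm.data)
range(scaled.data)
```
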

Raw data

# Seurat v2 syntax; in v3 the arguments are assay = ... and slot = "counts"
raw.data <- GetAssayData(object = object,
                         assay.type = assay.type,
                         slot = "raw.data")

Normalized data

  • stored in object@data
  • can be set like this (Seurat v2 syntax):
object <- SetAssayData(object = object,
                       assay.type = assay.type,
                       slot = "data",
                       new.data = normalized.data)

If several assays are stored in the same Seurat object, the "active" (default) assay has to be selected manually:

> srt
An object of class Seurat
50120 features across 26335 samples within 3 assays
Active assay: SCT (20844 features)
 2 other assays present: RNA, integrated
 2 dimensional reductions calculated: pca, umap

> srt@active.assay # find out which assay is active
> DefaultAssay(srt) <- "SCT" # set the active assay

Genes

genes.use <- rownames(object@data)

Metadata

# View metadata data frame, stored in object@meta.data
pbmc[[]]

# Retrieve specific values from the metadata
pbmc$nCount_RNA
pbmc[[c("percent.mito", "nFeature_RNA")]]

# Add metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels

Normalization

Results are stored in object@data (Seurat v2).

More interesting accessors afterwards:

object@calc.params$NormalizeData$scale.factor
object@calc.params$NormalizeData$normalization.method

Scaling

Results are stored in object@scale.data.

Relevant excerpts from Seurat:::RegressOutResid:
 
possible.models <- c("linear", "poisson", "negbinom")
 
latent.data <- FetchData(object = object, vars.all = vars.to.regress)

## extracts the log-scaled values
data.use <- object@data[genes.regress, , drop = FALSE]

regression.mat <- cbind(latent.data, data.use[1, ])
colnames(regression.mat) <- reg.mat.colnames

fmla_str = paste0("GENE ", " ~ ", paste(vars.to.regress, collapse = "+"))

qr = lm(as.formula(fmla_str), data = regression.mat, qr = TRUE)$qr
resid <- qr.resid(qr, gene.expr[x, ])     
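
The excerpt above fits a linear model against the latent variables and keeps the residuals. A self-contained base-R sketch of that idea, covering the "linear" model only and using simulated data, might look as follows. The variable `n.umi`, the simulated values, and the name `qr.fit` are hypothetical and not taken from Seurat.

```r
set.seed(1)

# Simulated latent variable (e.g. a per-cell UMI count) and one gene's
# expression that partly depends on it. All values are invented.
n.cells <- 50
regression.mat <- data.frame(n.umi = rnorm(n.cells, mean = 2000, sd = 300))
regression.mat$GENE <- 0.001 * regression.mat$n.umi + rnorm(n.cells, sd = 0.2)

# Fit the model once and keep the QR decomposition of the design matrix
fmla <- as.formula("GENE ~ n.umi")
qr.fit <- lm(fmla, data = regression.mat, qr = TRUE)$qr

# Residuals: the expression values with the linear effect of n.umi removed
resid <- qr.resid(qr.fit, regression.mat$GENE)

# The residuals are uncorrelated with the regressed-out variable
cor(resid, regression.mat$n.umi)
```

Because the QR decomposition depends only on the latent variables and not on the response, the same `qr.fit` can be passed to qr.resid for every gene. That is presumably why the excerpt calls lm once and then qr.resid per gene.
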

Variable Genes

object@var.genes
object@hvg.info$gene.mean
object@hvg.info$gene.dispersion
object@hvg.info$gene.dispersion.scaled

More object interactions

see Seurat website