pbmc.counts <- Read10X(data.dir = "~/Downloads/pbmc3k/filtered_gene_bc_matrices/hg19/")
pbmc <- CreateSeuratObject(counts = pbmc.counts)
pbmc <- NormalizeData(object = pbmc)
pbmc <- FindVariableFeatures(object = pbmc)
pbmc <- ScaleData(object = pbmc)
pbmc <- RunPCA(object = pbmc)
pbmc <- FindNeighbors(object = pbmc)
pbmc <- FindClusters(object = pbmc)
pbmc <- RunTSNE(object = pbmc)
DimPlot(object = pbmc, reduction = "tsne")
| Seurat v2.X | Seurat v3.X |
|---|---|
| `object@data` | `GetAssayData(object = object)` |
| `object@raw.data` | `GetAssayData(object = object, slot = "counts")` |
| `object@scale.data` | `GetAssayData(object = object, slot = "scale.data")` |
| `object@cell.names` | `colnames(x = object)` |
| `rownames(x = object@data)` | `rownames(x = object)` |
| `object@var.genes` | `VariableFeatures(object = object)` |
| `object@hvg.info` | `HVFInfo(object = object)` |
| `object@assays$assay.name` | `object[["assay.name"]]` |
| `object@dr$pca` | `object[["pca"]]` |
| `GetCellEmbeddings(object = object, reduction.type = "pca")` | `Embeddings(object = object, reduction = "pca")` |
| `GetGeneLoadings(object = object, reduction.type = "pca")` | `Loadings(object = object, reduction = "pca")` |
| `AddMetaData(object = object, metadata = vector, col.name = "name")` | `object$name <- vector` |
| `object@meta.data$name` | `object$name` |
| `object@idents` | `Idents(object = object)` |
| `SetIdent(object = object, ident.use = "new.idents")` | `Idents(object = object) <- "new.idents"` |
| `SetIdent(object = object, cells.use = 1:10, ident.use = "new.idents")` | `Idents(object = object, cells = 1:10) <- "new.idents"` |
| `StashIdent(object = object, save.name = "saved.idents")` | `object$saved.idents <- Idents(object = object)` |
| `levels(x = object@idents)` | `levels(x = object)` |
| `RenameIdent(object = object, old.ident.name = "old.ident", new.ident.name = "new.ident")` | `RenameIdents(object = object, "old.ident" = "new.ident")` |
| `WhichCells(object = object, ident = "ident.keep")` | `WhichCells(object = object, idents = "ident.keep")` |
| `WhichCells(object = object, ident.remove = "ident.remove")` | `WhichCells(object = object, idents = "ident.remove", invert = TRUE)` |
| `WhichCells(object = object, max.cells.per.ident = 500)` | `WhichCells(object = object, downsample = 500)` |
| `WhichCells(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `WhichCells(object = object, expression = name > low & name < high)` |
| `FilterCells(object = object, subset.names = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `SubsetData(object = object, subset.name = "name", low.threshold = low, high.threshold = high)` | `subset(x = object, subset = name > low & name < high)` |
| `MergeSeurat(object1 = object1, object2 = object2)` | `merge(x = object1, y = object2)` |
Seurat has 3 data slots (source):
- `counts` (`raw.data` in v2): The raw data slot (`object@raw.data`) represents the original expression matrix, input when creating the Seurat object, and prior to any preprocessing by Seurat. For example, this could represent the UMI matrix generated by DropSeqTools or 10X CellRanger, a count matrix from featureCounts, an FPKM matrix produced by Cufflinks, or a TPM matrix produced by RSEM. Row names represent gene names, and column names represent cell names. Either raw counts or normalized values (e.g. FPKM or TPM) are fine, but the input expression matrix should not be log-transformed. Please note that Seurat can be used to analyze single cell data produced by any technology, as long as you can create an expression matrix. The Read10X function makes it easy to import datasets produced by the 10X Chromium system. Seurat uses count data when performing gene scaling and differential expression tests based on the negative binomial distribution.
- `data` (= log-normalized data): The `data` slot stores normalized and log-transformed single cell expression. This maintains the relative abundance levels of all genes, and contains only zeros or positive values. See ?NormalizeData for more information. This data is used for visualizations, such as violin and feature plots, most differential expression tests, finding high-variance genes, and as input to ScaleData (see below).
- `scale.data` (= z-score normalized data): The `scale.data` slot represents a cell's relative expression of each gene, in comparison to all other cells. Therefore this matrix contains both positive and negative values. See ?ScaleData for more information. If regressing genes against unwanted sources of variation (for example, to remove cell-cycle effects), the scaled residuals from the model are stored here. This data is used as input for dimensional reduction techniques, and is displayed in heatmaps.
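The relationship between the three slots can be sketched in base R on a toy matrix. This is only an illustration, not Seurat's implementation: the matrix values are made up, the scale factor of 10,000 is assumed to match the `NormalizeData` default, and the z-scoring skips the clipping and regression that `ScaleData` can additionally apply.

```r
# Toy counts matrix: 3 genes (rows) x 4 cells (columns); values are made up
counts <- matrix(
  c(10, 0, 5, 2,
     0, 3, 1, 0,
     4, 4, 8, 6),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4))
)

# "data": divide each cell by its total counts, multiply by a scale factor,
# then apply log1p (assumed to mirror the "LogNormalize" method)
scale.factor <- 1e4
lib.size <- colSums(counts)
data <- log1p(sweep(counts, 2, lib.size, "/") * scale.factor)

# "scale.data": z-score each gene across all cells
# (scale() works on columns, hence the double transpose)
scale.data <- t(scale(t(data)))

range(data)        # zeros or positive values only
range(scale.data)  # both negative and positive values
```

Each gene in `scale.data` ends up with mean 0 and standard deviation 1 across cells, which is why heatmaps and PCA are typically run on this slot rather than on `data`.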
> GetAssayData(as_fet_comb, "counts") %>% dim
[1] 0 0
> GetAssayData(as_fet_comb, "scale.data") %>% dim
[1] 1 1
> GetAssayData(as_fet_comb, "data") %>% dim
[1] 1000 1491
- stored in `object@raw.data` (Seurat2)
- can be accessed so:
raw.data <- GetAssayData(object = object,
assay.type = assay.type,
slot = "raw.data")
- stored in `object@data`
- can be added so:
object <- SetAssayData(object = object,
assay.type = assay.type,
slot = "data",
new.data = normalized.data)
If there are multiple assays stored within the same Seurat object, the "active" one has to be selected manually:
> srt
An object of class Seurat
50120 features across 26335 samples within 3 assays
Active assay: SCT (20844 features)
2 other assays present: RNA, integrated
2 dimensional reductions calculated: pca, umap
> srt@active.assay # find out which one's active
> DefaultAssay(srt) <- "SCT" # define another one
genes.use <- rownames(object@data)
- Seurat2:
object@meta.data <- data.frame(nGene, nUMI)
# View metadata data frame, stored in object@meta.data
pbmc[[]]
# Retrieve specific values from the metadata
pbmc$nCount_RNA
pbmc[[c("percent.mito", "nFeature_RNA")]]
# Add metadata, see ?AddMetaData
random_group_labels <- sample(x = c("g1", "g2"), size = ncol(x = pbmc), replace = TRUE)
pbmc$groups <- random_group_labels
results will be stored in object@data
More interesting accessors afterwards:
object@calc.params$NormalizeData$scale.factor
object@calc.params$NormalizeData$normalization.method
will be stored in object@scale.data
Seurat:::RegressOutResid:
possible.models <- c("linear", "poisson", "negbinom")
latent.data <- FetchData(object = object, vars.all = vars.to.regress)
## extracts the log-scaled values
data.use <- object@data[genes.regress, , drop = FALSE]
regression.mat <- cbind(latent.data, data.use[1, ])
colnames(regression.mat) <- reg.mat.colnames
fmla_str = paste0("GENE ", " ~ ", paste(vars.to.regress, collapse = "+"))
qr = lm(as.formula(fmla_str), data = regression.mat, qr = TRUE)$qr
resid <- qr.resid(qr, gene.expr[x, ])
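The excerpt above fits the linear model once and then reuses its QR decomposition for every gene, because the design matrix depends only on the latent variables and not on the gene. A minimal base R sketch of that idea, using simulated data (the `nUMI` values, `geneA` and `geneB` are hypothetical, and this is not the package's actual code path):

```r
set.seed(1)
n.cells <- 50

# Hypothetical latent variable to regress out
latent.data <- data.frame(nUMI = runif(n.cells, 1000, 5000))
vars.to.regress <- "nUMI"

# Two simulated genes (rows) x cells (columns); geneA depends on nUMI
data.use <- rbind(
  geneA = 0.001 * latent.data$nUMI + rnorm(n.cells, sd = 0.1),
  geneB = rnorm(n.cells)
)

# Fit once on the first gene just to obtain the QR decomposition of the design
regression.mat <- cbind(latent.data, GENE = data.use[1, ])
fmla <- as.formula(paste("GENE ~", paste(vars.to.regress, collapse = "+")))
qr <- lm(fmla, data = regression.mat, qr = TRUE)$qr

# Reuse the same QR for every gene: qr.resid() returns the residuals of
# each expression vector after projecting out the latent variables
resid <- t(apply(data.use, 1, function(y) qr.resid(qr, y)))

cor(resid["geneA", ], latent.data$nUMI)  # numerically zero
```

Reusing the decomposition avoids refitting `lm()` per gene, which matters when the same regression is repeated for thousands of genes.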
object@hvg.info
object@hvg.info$gene.mean
object@hvg.info$gene.dispersion
object@hvg.info$gene.dispersion.scaled
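To give a feel for what these three columns contain, here is a rough base R sketch on simulated data. It rests on assumptions about the v2 defaults as I understand them (mean computed in non-log space, dispersion as log variance-to-mean ratio, dispersion z-scored within bins of similar mean); the helper names `exp.mean` and `log.vmr`, the bin count of 5, and the simulated matrix are all illustrative, so check ?FindVariableGenes for the actual behaviour.

```r
set.seed(42)
n.genes <- 200
n.cells <- 30

# Simulated log-transformed expression: each gene has its own Poisson rate
lambda <- runif(n.genes, 1, 10)
data <- matrix(
  log1p(rpois(n.genes * n.cells, lambda = rep(lambda, n.cells))),
  nrow = n.genes,
  dimnames = list(paste0("gene", seq_len(n.genes)),
                  paste0("cell", seq_len(n.cells)))
)

# Assumed definitions: undo the log, summarise, then log again
exp.mean <- function(x) log(mean(expm1(x)) + 1)
log.vmr  <- function(x) log(var(expm1(x)) / mean(expm1(x)))

gene.mean <- apply(data, 1, exp.mean)
gene.dispersion <- apply(data, 1, log.vmr)

# Z-score the dispersion within bins of genes with similar mean expression
bins <- cut(gene.mean, breaks = 5)
gene.dispersion.scaled <- ave(
  gene.dispersion, bins,
  FUN = function(d) (d - mean(d)) / sd(d)
)

head(data.frame(gene.mean, gene.dispersion, gene.dispersion.scaled))
```

Scaling within bins means a gene is flagged as variable relative to genes of comparable expression level, rather than by raw dispersion alone.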