% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/Gibbs_sampler.R
\name{SpaTopic_inference}
\alias{SpaTopic_inference}
\title{'SpaTopic': fast topic inference to identify tissue architecture in multiplexed images}
\usage{
SpaTopic_inference(
  tissue,
  ntopics,
  sigma = 50,
  region_radius = 400,
  kneigh = 5,
  npoints_selected = 1,
  ini_LDA = TRUE,
  ninit = 10,
  niter_init = 100,
  beta = 0.05,
  alpha = 0.01,
  trace = FALSE,
  seed = 123,
  thin = 20,
  burnin = 1000,
  niter = 200,
  display_progress = TRUE,
  z_cellsize = region_radius * 2,
  do.parallel = FALSE,
  n.cores = 1,
  axis = "2D"
)
}
\arguments{
\item{tissue}{(Required). A data frame, or a list of data frames with one per image.
Each row represents a cell. The columns \code{image}, \code{X}, \code{Y}, and \code{type}
give the image ID, the X and Y coordinates on the image, and the cell type.
For 3D tissue images, add either a \code{Z} column (preferred) or a \code{Y2} column
(legacy support) for the third dimension.}

\item{ntopics}{(Required). Number of topics. Each topic is estimated as a distribution
over cell types.}

\item{sigma}{Default is 50. The length scale of the nearest-neighbor exponential kernel.
\code{sigma} controls how quickly the correlation decays with distance in the kernel function.
See the SpaTopic paper for details.
It needs to be adjusted based on the image resolution.}

\item{region_radius}{Default is 400. The radius of each grid square used when
sampling region centers for each image.
It needs to be adjusted based on the image resolution and the complexity of the spatial pattern.}

\item{kneigh}{Default is 5. Number of nearest region centers considered for each cell.
By default, only the 5 closest region centers are used.}

\item{npoints_selected}{Default is 1. Number of points sampled for each grid square 
when sampling region centers for each image. Used with \code{region_radius}.}

\item{ini_LDA}{Default is TRUE. Use a warm-start strategy: run several initializations and
continue from the best one. If \code{FALSE}, the first initialization is used.}

\item{ninit}{Default is 10. Number of initializations.
Only the initialization with the highest log likelihood (equivalently, the lowest perplexity) is retained.}

\item{niter_init}{Default is 100. Number of Gibbs sampling iterations run for each
warm-start initialization.}

\item{beta}{Default is 0.05. A hyperparameter that controls the sparsity of the topic content
(topic-celltype) matrix \code{Beta}. A smaller value induces more sparsity in \code{Beta}.}

\item{alpha}{Default is 0.01. A hyperparameter that controls the sparsity of the document (region) content
(region-topic) matrix \code{Theta}. It is kept very small by default to encourage
sparsity in \code{Theta}.}

\item{trace}{Default is FALSE. Compute and save the log likelihood, \code{Ndk}, and \code{Nwk}
for every posterior sample. This is useful for selecting the number of topics with DIC,
but computing the likelihood for every posterior sample is time-consuming.}

\item{seed}{Default is 123. Random seed.}

\item{thin}{Default is 20. Key parameter in Gibbs sampling.
Thinning interval: one posterior sample is collected every \code{thin} iterations.}

\item{burnin}{Default is 1000. Key parameter in Gibbs sampling.
Number of burn-in iterations. Posterior samples are collected only after the first
\code{burnin} iterations. Consider increasing it for highly complex tissue images.}

\item{niter}{Default is 200. Key parameter in Gibbs sampling. 
Number of posterior samples collected for model inference.}

\item{display_progress}{Default is TRUE. Display the progress bar.}

\item{z_cellsize}{Default is \code{region_radius * 2}. The thickness of each Z slice when
performing 3D stratified sampling. Only used when \code{axis = "3D"}. Controls the
Z-dimension binning resolution for region center selection in 3D tissue images.
It needs to be adjusted based on the tissue thickness and Z resolution.}

\item{do.parallel}{Default is FALSE. Use parallel computing through the R package \code{foreach}.}

\item{n.cores}{Default is 1. Number of cores used in parallel computing.}

\item{axis}{Default is \code{"2D"}. Set to \code{"3D"} for 3D tissue images.
Note that model inference for 3D tissue is still being tested.}
}
\value{
A \code{\link{SpaTopic-class}} object: a list of outputs from the Gibbs sampler.
}
\description{
This is the main function of 'SpaTopic'. It implements a collapsed Gibbs
sampling algorithm to learn topics, which correspond to distinct tissue microenvironments,
across multiple multiplexed tissue images.
The function takes cell labels and coordinates on tissue images as input.
It returns the inferred topic label for every cell, along with the topic contents
(distributions over cell types).
The function recovers spatial tissue architectures across images
and indicates cell-cell interactions within each domain.
}
\examples{

## `tissue` is a data frame containing cellular information from one image,
## or a list of data frames from multiple images.

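## A minimal sketch of the expected input layout. The values below are made up
## for illustration and are not real data. There is one row per cell, with
## columns image, X, Y, and type.
toy <- data.frame(image = "image1",
                  X = c(10, 250, 480, 720),
                  Y = c(35, 410, 90, 600),
                  type = c("Tumor", "CD8 T", "Macrophage", "Tumor"))
str(toy)
## For 3D images, add a Z column (preferred over the legacy Y2 column).
## The Z values here are also made up.
toy3d <- transform(toy, Z = c(0, 5, 10, 15))
names(toy3d)
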
data("lung5")
## NOT RUN, it takes about 90s
library(sf)
#gibbs.res<-SpaTopic_inference(lung5, ntopics = 7,
#                               sigma = 50, region_radius = 400)
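
## Rough arithmetic for the sampling schedule with the default arguments.
## This assumes one sample is kept every `thin` iterations once `burnin`
## iterations have passed. That is an assumption about how the sampler
## combines these settings, and it ignores the warm-start iterations.
burnin <- 1000; thin <- 20; niter <- 200
total_iter <- burnin + thin * niter
kept_iter <- seq(burnin + thin, by = thin, length.out = niter)
total_iter
range(kept_iter)
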
                             
                              
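## A hypothetical sketch of stratified sampling of region centers. This is not
## the package's internal code. It assumes the image is split into squares of
## side 2 * region_radius, and that npoints_selected cells are drawn from each
## non-empty square as region centers. Both choices are assumptions.
sample_region_centers <- function(cells, region_radius = 400,
                                  npoints_selected = 1) {
  side <- 2 * region_radius
  ## Grid square index of every cell along each axis
  gx <- floor((cells$X - min(cells$X)) / side)
  gy <- floor((cells$Y - min(cells$Y)) / side)
  ## Group row indices by grid square
  squares <- split(seq_len(nrow(cells)), paste(gx, gy))
  ## Draw up to npoints_selected cells from every non-empty square
  picked <- unlist(lapply(squares, function(idx) {
    idx[sample.int(length(idx), min(npoints_selected, length(idx)))]
  }), use.names = FALSE)
  cells[picked, c("X", "Y")]
}
set.seed(1)
toy_cells <- data.frame(X = runif(200, 0, 2000), Y = runif(200, 0, 1600))
centers <- sample_region_centers(toy_cells, region_radius = 400)
nrow(centers)
## A larger region_radius gives fewer, coarser regions. A larger
## npoints_selected gives more region centers per square.
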
## Generate a fake second image to illustrate inference across multiple images
## NOT RUN
#lung6<-lung5
#lung6$image<-"image2"  ## The two images must have different image IDs
#gibbs.res<-SpaTopic_inference(list(A = lung5, B = lung6), 
#                 ntopics = 7, sigma = 50, region_radius = 400) 
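
## A hypothetical illustration of how sigma and kneigh interact. For one cell,
## keep its kneigh nearest region centers and weight them with an exponential
## kernel exp(-d / sigma). The kernel form is an assumption made for this
## sketch. See the SpaTopic paper for the kernel actually used.
kernel_weights <- function(cell_xy, centers, sigma = 50, kneigh = 5) {
  ## Euclidean distance from the cell to every region center
  d <- sqrt((centers$X - cell_xy[1])^2 + (centers$Y - cell_xy[2])^2)
  ## Indices of the kneigh closest centers, nearest first
  nearest <- order(d)[seq_len(min(kneigh, length(d)))]
  w <- exp(-d[nearest] / sigma)
  data.frame(center = nearest, distance = d[nearest], weight = w / sum(w))
}
ctr <- data.frame(X = c(0, 100, 300, 50, 600, 20), Y = c(0, 0, 0, 80, 0, 10))
res <- kernel_weights(c(0, 0), ctr, sigma = 50, kneigh = 5)
res
## A smaller sigma concentrates the weight on the closest centers.
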

}
\seealso{
\code{\link{SpaTopic-class}}
}