Defining cohorts and performing differential expression analysis

Differential expression (DE) is a common analysis that determines if the mean gene expression level differ between two groups. Most often we have two groups, normal and disease. But, in the case of TCGA, normal tissue samples are not always available. Nonetheless, samples can be grouped in many different ways, and with the number of samples and the types of data, one can be quite creative with analysis. In this work, I will be creating groups of samples based on somatic mutations, changes in the DNA of tumors, and associating it with differentially expressed genes.

The first question: "what types of mutation data are available?"

library(dplyr)
library(bigrquery)
library(scales)
library(ggplot2)
library(ISBCGCExamples)

q <- "SELECT
        Variant_Classification, COUNT(*) AS n
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
      WHERE
        Study = 'BRCA'
      GROUP BY
        Variant_Classification"

result <- query_exec(q, project)
result

##      Variant_Classification     n
## 1  De_novo_Start_OutOfFrame    26
## 2         Missense_Mutation 75429
## 3           Frame_Shift_Del  4056
## 4           Start_Codon_Ins    13
## 5            Stop_Codon_Ins     4
## 6                    Silent 27612
## 7                     5'UTR  1392
## 8           Frame_Shift_Ins  3034
## 9              In_Frame_Ins   434
## 10           Stop_Codon_Del    16
## 11                      IGR  1641
## 12                   Intron  6617
## 13                  lincRNA   661
## 14        Nonsense_Mutation  6038
## 15             In_Frame_Del  1337
## 16    De_novo_Start_InFrame     7
## 17                      RNA  2644
## 18              Splice_Site  4158
## 19          Start_Codon_SNP    97
## 20                    3'UTR  2027
## 21         Nonstop_Mutation    94
## 22          Start_Codon_Del    25

This query is selecting all the variant classification labels for samples in the BRCA study using the Somatic Mutation table. The GROUP BY portion of the query ensures we only get each category once, similar to the DISTINCT keyword.

Oftentimes, people are interested in specific types of variants, such as exonic variants. Here, I'm interested in whether any type of variant might be associated with differential expression. In order to perform such an analysis, we need to define two groups. For now, let's focus on the IL10 gene, an anti-inflammatory cytokine (signalling molecule) used by the immune system. So let's find out how many individuals have a mutation in IL10, in the BRCA study.

q <- "SELECT
        ParticipantBarcode
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
      WHERE
        Hugo_Symbol = 'IL10'
        AND Study = 'BRCA'
      GROUP BY
        ParticipantBarcode"

query_exec(q, project)

## [1] ParticipantBarcode
## <0 rows> (or 0-length row.names)

It turns out there's no data for that gene in the BRCA study. Maybe we should find out What genes do have somatic mutations in BRCA, and how many samples? That way we can make a more educated choice.

q <- "SELECT
        COUNT (DISTINCT(pb)) AS sampleN,
        hugosymbols AS gene
      FROM (
        SELECT
            ParticipantBarcode AS pb,
            Hugo_Symbol AS hugosymbols
        FROM
            [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
        WHERE
            Study = 'BRCA'
        GROUP BY
            hugosymbols,
            pb)
      GROUP BY
        gene
      ORDER BY
        sampleN DESC"
genes <- query_exec(q, project)
head(genes)

##   sampleN    gene
## 1     627 Unknown
## 2     335  PIK3CA
## 3     334    TP53
## 4     206     TTN
## 5     123    CDH1
## 6     117   GATA3

In this query, first on the inside, we are selecting ParticipantBarcodes and gene symbols, and grouping on those variables. This lets us get one count per gene per sample. Then from that table generated on the interior of the query, we count the number of gene symbols, giving us the number of samples with variants in a particular gene.

OK, so we see that a fair number of variants are not associated with any gene, and PIK3CA ranks first for the number of samples with mutations in that gene. Let's check our results on one gene.

q <- "SELECT count(hugosymbols)
        FROM (
          SELECT
            ParticipantBarcode, Hugo_Symbol as hugosymbols
          FROM
            [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
          WHERE
            Hugo_Symbol = 'GATA3'
            and Study = 'BRCA'
          GROUP BY
            ParticipantBarcode,
            hugosymbols )"
query_exec(q, project)

##   f0_
## 1 117

The inner select statement produces a table with a row for each participant. Then we can count over that table to get the number of participants. This number matches what we found above. You can see that if the aggregation variable is not named, it gets a name like "f0_".

Now we need to define our groups, those with a variant in a particular gene, and those without. Let's build a table of participant barcodes with a variant in gene GATA3.

First let's get all ParticipantBarcode associated with BRCA.

q <- "
  SELECT
    ParticipantBarcode,
  FROM
    [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
  WHERE
    Study = 'BRCA'
  GROUP BY
    ParticipantBarcode
 "
barcodesBRCA <- query_exec(q, project)
head(barcodesBRCA)

##   ParticipantBarcode
## 1       TCGA-A8-A07R
## 2       TCGA-BH-A1F8
## 3       TCGA-AO-A0J8
## 4       TCGA-AN-A0XN
## 5       TCGA-A8-A097
## 6       TCGA-C8-A27B

Then, let's get the barcodes for samples with a mutation in GATA3, since it had a good number of samples according to the gene table we made above (and I'm interested in immune related genes).

q <- "
  SELECT
    ParticipantBarcode,
  FROM
    [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
  WHERE
    Hugo_Symbol = 'GATA3'
    AND Study = 'BRCA'
  GROUP BY
    ParticipantBarcode
 "
barcodesWithMutations <- query_exec(q, project)
head(barcodesWithMutations)

##   ParticipantBarcode
## 1       TCGA-OL-A5D8
## 2       TCGA-D8-A1J8
## 3       TCGA-A2-A0CV
## 4       TCGA-C8-A26V
## 5       TCGA-E2-A15J
## 6       TCGA-B6-A40C

Next, we get the participant barcodes that do not have mutations in GATA3, using the following query.

q <- "
SELECT
  ParticipantBarcode
FROM
  [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
WHERE ParticipantBarcode NOT IN (
    SELECT
      ParticipantBarcode,
    FROM
      [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
    WHERE
      Hugo_Symbol = 'GATA3'
      and Study = 'BRCA'
    GROUP BY
      ParticipantBarcode)
  and Study = 'BRCA'
GROUP BY ParticipantBarcode"
barcodesWithOUTMutations <- query_exec(q, project)
sum(barcodesWithMutations$ParticipantBarcode %in% barcodesWithOUTMutations$ParticipantBarcode)

## [1] 0

This gives us 117 samples with a variant in GATA3, 873 samples without a variant in GATA3, which matches the number of samples that have mutation data in the MAF table (990). Let's take a look at what kind of variants are found in GATA3, in case we would like to eliminate some.

q <- "
  SELECT
    vc,
    count(vc) as num_variants_in_class
  FROM (
    SELECT
      ParticipantBarcode,
      Variant_Classification as vc
    FROM
      [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
    WHERE
      Hugo_Symbol = 'GATA3'
      AND Study = 'BRCA'
    GROUP BY
      ParticipantBarcode,
      vc)
  GROUP BY vc
 "
query_exec(q, project)

##                  vc num_variants_in_class
## 1   Frame_Shift_Ins                    62
## 2 Nonsense_Mutation                     2
## 3       Splice_Site                    24
## 4 Missense_Mutation                    10
## 5    Stop_Codon_Ins                     1
## 6   Frame_Shift_Del                    19
## 7            Silent                     2

So we find that frame shift insertions are the most common type of variant, followed by splice site variants.

Next, we compute the average gene expression over our defined cohorts. To do this, we first select the participants with mutations in GATA3, then in the "exterior" of the query, we select the normalized_count, which is the gene expression level, for gene GATA3, only for tumor samples (SampleTypeLetterCode = 'TP'), and only for participants with mutations in GATA3, as already defined. With the normalized_count, we can use aggregation functions to compute the mean and standard deviation on the LOG2 of expression.

q <- "
SELECT
  AVG(LOG2(normalized_count+1)) as mean_expr,
  STDDEV(LOG2(normalized_count+1)) as sd_expr
FROM
  [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
WHERE
  HGNC_gene_symbol = 'GATA3'
  AND SampleTypeLetterCode = 'TP'
  AND ParticipantBarcode IN (
  SELECT
    ParticipantBarcode
  FROM
    [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
  WHERE
    Hugo_Symbol = 'GATA3'
    and Study = 'BRCA'
  GROUP BY
    ParticipantBarcode )
"
query_exec(q, project)

##   mean_expr   sd_expr
## 1  14.11414 0.6700897

With the ability to compute means and stddevs, we can implement a
T statistic using unpooled standard errors.

$$ \frac{(x - y)}{ \sqrt{ \frac{s_x^2}{n_x} + \frac{s_y^2}{n_y} } } $$

Where x and y are the means, $s_x^2$ is the squared standard error for mean x, and $n_x$ is the number of samples related to x.

q <- "
SELECT
  p.gene AS gene,
  p.study AS study,
  p.x AS x,
  p.sx2 AS sx2,
  p.nx AS nx,
  o.y AS y,
  o.sy2 AS sy2,
  o.ny AS ny,
  (p.x-o.y) / SQRT((p.sx2/p.nx) + (o.sy2/o.ny)) AS T
FROM (
  SELECT
    Study AS study,
    HGNC_gene_symbol AS gene,
    AVG(LOG2(normalized_count+1)) AS x,
    POW(STDDEV(LOG2(normalized_count+1)),2) AS sx2,
    COUNT(ParticipantBarcode) AS nx
  FROM
    [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
  WHERE
    HGNC_gene_symbol = 'GATA3'
    AND SampleTypeLetterCode = 'TP'
    AND Study = 'BRCA'
    AND (ParticipantBarcode IN (
      SELECT
        m.ParticipantBarcode,
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] AS m
      WHERE
        m.Hugo_Symbol = 'GATA3'
        AND m.Study = 'BRCA'
      GROUP BY
        m.ParticipantBarcode )) /* end table of participant */
  GROUP BY
    gene,
    study ) AS p
JOIN (
  SELECT
    Study AS study,
    HGNC_gene_symbol AS gene,
    AVG(LOG2(normalized_count+1)) AS y,
    POW(STDDEV(LOG2(normalized_count+1)),2) AS sy2,
    COUNT(ParticipantBarcode) AS ny
  FROM
    [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
  WHERE
    Study = 'BRCA'
    AND HGNC_gene_symbol = 'GATA3'
    AND SampleTypeLetterCode = 'TP'
    AND (ParticipantBarcode IN (
      SELECT
        ParticipantBarcode
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
      WHERE
        Study = 'BRCA'
        AND (ParticipantBarcode NOT IN (
          SELECT
            m.ParticipantBarcode,
          FROM
            [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] AS m
          WHERE
            m.Hugo_Symbol = 'GATA3'
            AND m.Study = 'BRCA'
          GROUP BY
            m.ParticipantBarcode )) /* end getting table of participants without mutations */
      GROUP BY
        ParticipantBarcode )) /* end getting table of participants */
  GROUP BY
    study,
    gene ) AS o
ON
  p.gene = o.gene
  AND p.study = o.study
GROUP BY
  gene,
  study,
  x,
  sx2,
  nx,
  y,
  sy2,
  ny,
  T
"
result1 <- query_exec(q, project)
result1

##    gene study        x       sx2  nx        y      sy2  ny        T
## 1 GATA3  BRCA 14.11414 0.4490203 116 12.44092 4.574784 872 17.52361

A little about this query: We are selecting the gene expression from our two previously defined groups, which results in two tables that need to be joined. The two tables are named o and p (not great names, I know), which lets us perform aggregation functions on each table separately to get the mean gene expression in each group. Then, it's just a matter of computing the T statistic using the above formula.

In building this query, I found that one cannot give the source tables an alias. It seems to throw off the 'ParticipantBarcode IN ' statement, and give weird errors. Second, make sure the two tables we're joining have different aliases!

Finally, let's just "bang-the-hammer" and query ALL genes!

q <- "
SELECT
  p.gene as gene,
  p.study as study,
  ABS(p.x - o.y) as mean_diff,
  p.x  as x,
  p.sx2 as sx2,
  p.nx as nx,
  o.y as y,
  o.sy2 as sy2,
  o.ny as ny,
  (p.x-o.y) / SQRT((p.sx2/p.nx) + (o.sy2/o.ny)) as T
FROM (
  SELECT
    Study as study,
    HGNC_gene_symbol as gene,
    AVG(LOG2(normalized_count+1)) as x,
    POW(STDDEV(LOG2(normalized_count+1)),2) as sx2,
    COUNT(ParticipantBarcode) as nx
  FROM
    [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
  WHERE
    SampleTypeLetterCode = 'TP'
    AND Study = 'BRCA'
    and (ParticipantBarcode IN (
      SELECT
        m.ParticipantBarcode,
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] as m
      WHERE
        m.Hugo_Symbol = 'GATA3'
        AND m.Study = 'BRCA'
      GROUP BY
        m.ParticipantBarcode
      )) /* end table of participant */
    GROUP BY gene, study
    ) as p
JOIN (
   SELECT
   Study as study,
   HGNC_gene_symbol as gene,
   AVG(LOG2(normalized_count+1)) AS y,
   POW(STDDEV(LOG2(normalized_count+1)),2) AS sy2,
   COUNT(ParticipantBarcode) as ny
  FROM
   [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
  WHERE
   SampleTypeLetterCode = 'TP'
   and Study = 'BRCA'
   AND (ParticipantBarcode IN (
     SELECT
       ParticipantBarcode
     FROM
       [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
     WHERE Study = 'BRCA'
     and (ParticipantBarcode NOT IN (
         SELECT
           m.ParticipantBarcode,
         FROM
           [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] as m
         WHERE
           m.Hugo_Symbol = 'GATA3'
           and m.Study = 'BRCA'
         GROUP BY
           m.ParticipantBarcode
         )) /* end getting table of participants without mutations */
     GROUP BY ParticipantBarcode
   )) /* end getting table of participants */
  GROUP BY
   study, gene
 ) AS o
ON
  p.gene = o.gene and p.study = o.study
GROUP BY gene, study, x, sx2, nx, y, sy2, ny, T, mean_diff
ORDER BY mean_diff DESC
"
system.time(result1 <- query_exec(q, project))

## 
Retrieving data:  2.9s
Retrieving data:  5.2s

##    user  system elapsed 
##   1.280   0.114   9.194

compute_df <- function(d) {
  ((d$sx2/d$nx + d$sy2/d$ny)^2) /
  ((1/(d$nx-1))*(d$sx2/d$nx)^2 + (1/(d$ny-1))*(d$sy2/d$ny)^2)
}

result1$df <- compute_df(result1)
result1$p_value <- sapply(1:nrow(result1), function(i) 2*pt(abs(result1$T[i]), result1$df[i],lower=FALSE))
result1$fdr <- p.adjust(result1$p_value, "fdr")
result1$gene_label <- 1
result1$gene_label[result1$gene == "GATA3"] <- 2

# ordered by difference in means
head(result1)

##     gene study mean_diff         x       sx2  nx        y       sy2  ny
## 1  KCNJ3  BRCA  3.273704  7.485121 14.685614 116 4.211417 15.456478 872
## 2   CPB1  BRCA  3.226143  9.392237 36.573811 116 6.166094 24.273983 872
## 3   AGR3  BRCA  2.605398 11.179273  2.443981 116 8.573875 17.623450 872
## 4    TRH  BRCA  2.504631  5.231328 18.403546 116 2.726697  6.268322 872
## 5  FSIP1  BRCA  2.373198  8.866036  2.245905 116 6.492839  7.712446 872
## 6 CRABP1  BRCA  2.321596  3.410742  7.469231 116 5.732338 13.186676 872
##           T       df      p_value          fdr gene_label
## 1  8.617236 149.0709 9.190821e-15 1.507825e-12          1
## 2  5.507509 136.0631 1.756605e-07 3.090353e-06          1
## 3 12.823558 393.6262 1.069316e-31 3.549239e-28          1
## 4  6.150346 125.6233 9.514207e-09 2.470345e-07          1
## 5 14.130735 237.5220 2.587392e-33 1.288198e-29          1
## 6 -8.233218 174.0920 4.162527e-14 6.050855e-12          1

# ordered by T statistic
head(result1[order(result1$T, decreasing=T), ])

##       gene study mean_diff         x       sx2  nx         y       sy2  ny
## 39   GATA3  BRCA  1.673222 14.114141 0.4490203 116 12.440919  4.574784 872
## 5    FSIP1  BRCA  2.373198  8.866036 2.2459046 116  6.492839  7.712446 872
## 54  TBC1D9  BRCA  1.583615 13.761795 1.0174506 116 12.178180  3.555983 872
## 77   FOXA1  BRCA  1.471983 12.926657 0.2378543 116 11.454674  8.094412 872
## 15 DNAJC12  BRCA  2.087723 10.356336 2.2985499 116  8.268613  5.297170 872
## 3     AGR3  BRCA  2.605398 11.179273 2.4439810 116  8.573875 17.623450 872
##           T       df      p_value          fdr gene_label
## 39 17.52361 513.4457 3.095801e-54 6.165288e-50          2
## 5  14.13074 237.5220 2.587392e-33 1.288198e-29          1
## 54 13.97055 239.9444 7.424160e-33 2.957043e-29          1
## 77 13.82704 947.9646 9.549048e-40 6.338976e-36          1
## 15 12.97503 193.9135 4.033130e-28 8.924421e-25          1
## 3  12.82356 393.6262 1.069316e-31 3.549239e-28          1

# ordered by FDR
head(result1[order(result1$fdr, decreasing=F), ])

##       gene study mean_diff          x       sx2  nx         y       sy2
## 39   GATA3  BRCA  1.673222 14.1141410 0.4490203 116 12.440919  4.574784
## 77   FOXA1  BRCA  1.471983 12.9266565 0.2378543 116 11.454674  8.094412
## 138  OPRK1  BRCA  1.283247  0.3843466 0.3377241 116  1.667593  4.605231
## 5    FSIP1  BRCA  2.373198  8.8660364 2.2459046 116  6.492839  7.712446
## 54  TBC1D9  BRCA  1.583615 13.7617954 1.0174506 116 12.178180  3.555983
## 3     AGR3  BRCA  2.605398 11.1792730 2.4439810 116  8.573875 17.623450
##      ny         T       df      p_value          fdr gene_label
## 39  872  17.52361 513.4457 3.095801e-54 6.165288e-50          2
## 77  872  13.82704 947.9646 9.549048e-40 6.338976e-36          1
## 138 872 -14.17745 634.8217 7.749128e-40 6.338976e-36          1
## 5   872  14.13074 237.5220 2.587392e-33 1.288198e-29          1
## 54  872  13.97055 239.9444 7.424160e-33 2.957043e-29          1
## 3   872  12.82356 393.6262 1.069316e-31 3.549239e-28          1

qplot(data=result1, x=T, y=mean_diff, shape=as.factor(gene_label), col=as.factor(fdr < 0.05), ylab="Difference in Means", xlab="T statistic")

## Warning: Removed 267 rows containing missing values (geom_point).

Wow! So did we do that right? Let's check on a single gene. We'll pull down the actual expression values, and use the R T-test.

q <- "
SELECT HGNC_gene_symbol, ParticipantBarcode, LOG2(normalized_count+1)
FROM [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
WHERE Study = 'BRCA'
and HGNC_gene_symbol = 'GATA3'
and SampleTypeLetterCode = 'TP'
and (ParticipantBarcode IN (
  SELECT
    m.ParticipantBarcode,
  FROM
    [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] as m
  WHERE
    m.Hugo_Symbol = 'GATA3'
    AND m.Study = 'BRCA'
  GROUP BY
    m.ParticipantBarcode
  ))
"
mutExpr <- query_exec(q, project)   # SOME DUPLCIATES

q <- "
SELECT HGNC_gene_symbol, ParticipantBarcode, LOG2(normalized_count+1)
FROM [isb-cgc:tcga_201510_alpha.mRNA_UNC_HiSeq_RSEM]
WHERE Study = 'BRCA'
and HGNC_gene_symbol = 'GATA3'
and SampleTypeLetterCode = 'TP'
and (ParticipantBarcode IN (
  SELECT
    ParticipantBarcode
  FROM
    [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls]
  WHERE Study = 'BRCA'
  and (ParticipantBarcode NOT IN (
      SELECT
        m.ParticipantBarcode,
      FROM
        [isb-cgc:tcga_201510_alpha.Somatic_Mutation_calls] as m
      WHERE
        m.Hugo_Symbol = 'GATA3'
        and m.Study = 'BRCA'
      GROUP BY
        m.ParticipantBarcode
      )) /* end getting table of participants without mutations */
  GROUP BY ParticipantBarcode
)) /* end getting table of participants */
"
wtExpr <- query_exec(q, project)   # SOME DUPLCIATES

t.test(mutExpr$f0_, wtExpr$f0_)

## 
## 	Welch Two Sample t-test
## 
## data:  mutExpr$f0_ and wtExpr$f0_
## t = 17.524, df = 513.45, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  1.485635 1.860810
## sample estimates:
## mean of x mean of y 
##  14.11414  12.44092

boxplot(list(Mutation_In_GATA3=mutExpr$f0_, No_Mutation_In_GATA3=wtExpr$f0_), ylab="GATA3 LOG2 expression")

So, we have found the same gene expression means and standard deviation, degrees of freedom, and T statistic. Looks good! GATA3 has the most signficant result, interesting (or not?) since we split our samples on mutations in GATA3!

This result has been previously seen in: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4303202/

sessionInfo()

## R version 3.2.4 (2016-03-10)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## Running under: OS X 10.11.4 (El Capitan)
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] ISBCGCExamples_0.1.1 ggplot2_2.1.0        scales_0.4.0        
## [4] bigrquery_0.2.0      dplyr_0.4.3         
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_0.12.4      knitr_1.12.3     magrittr_1.5     munsell_0.4.3   
##  [5] colorspace_1.2-6 R6_2.1.2         stringr_1.0.0    httr_1.1.0      
##  [9] plyr_1.8.3       tools_3.2.4      parallel_3.2.4   grid_3.2.4      
## [13] gtable_0.2.0     DBI_0.3.1        digest_0.6.9     openssl_0.9.2   
## [17] assertthat_0.1   formatR_1.3      curl_0.9.6       evaluate_0.8.3  
## [21] mime_0.4         labeling_0.3     stringi_1.0-1    jsonlite_0.9.19 
## [25] markdown_0.7.7

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

GenomicAndExpression_T_test.md

GenomicAndExpression_T_test.md

Defining cohorts and performing differential expression analysis

Files

GenomicAndExpression_T_test.md

Latest commit

History

GenomicAndExpression_T_test.md

File metadata and controls

Defining cohorts and performing differential expression analysis