-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathREADME.Rmd
160 lines (119 loc) · 5.21 KB
/
README.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
---
title: "Heritability of the HLA expression"
output: github_document
---
```{r setup, include=FALSE}
library(knitr)
opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)
```
## Genotypes
### Input data
- 1000 Genomes Project Phase III, lifted over to GRCh38 coordinates
### Subsets
- Dataset 1: All 2,504 individuals, selecting the 445 Geuvadis individuals after all data filtering;
- Dataset 2: 445 individuals in Geuvadis from EUR and YRI populations;
- Dataset 3: 358 individuals in Geuvadis from EUR populations (CEU, FIN, GBR, TSI).
### VCF processing
- Filters:
- MAF threshold: 1%
- Exclude variants with any missing genotype
- Exclude variants with HWE p-value < 5x10^-8
- Pruning:
- Performed with the `SNPRelate::snpgdsLDpruning` function;
- Method: correlation;
- LD threshold: square root of 0.1.
- Total of final variants:
- Dataset 1: 738,574
- Dataset 2: 624,797
- Dataset 3: 498,533
For each subset above, VCF files were processed by the "process_genotypes.sh" scripts in the `./scripts` directory.
### PCA
```{r pca, echo = FALSE}
include_graphics("./plots/pca.png")
```
### GRMs
#### Diagonal values
```{r grm_diag, echo = FALSE}
include_graphics("./plots/grm_diag.png")
```
#### Off-Diagonal values
```{r grm_off, echo = FALSE}
include_graphics("./plots/grm_off.png")
```
For the analyses below, we will use Dataset 3 (Europeans), and the GCTA GRM.
## Phenotypes (HLA-A gene expression)
We used the FASTQ files provided by the Geuvadis Consortium for the 445 individuals in Dataset 2, and we estimated expression with Salmon.
Transcript Per Million (TPM) values are quantile normal transformed by `QTLtools correct`. Here is a comparison between TPM and standardized values **(A)**, and a characterization of the expression values by covariates in the data **(B)**.
```{r expression, echo = FALSE}
include_graphics("./plots/expression_eda.png")
```
## Fitting the Null model
```{r}
library(GENESIS)
library(Biobase)
```
First, our phenotype and covariate data as an “Annotated data frame”, which is a structure provided by `Biobase`.
We will also need the GRM.
```{r annotdf}
annot_df <- readRDS("./data/annotdf.rds")
varMetadata(annot_df)
grm <- readRDS("./data/grm_gcta_dataset3.rds")
```
Then, we fit the null model. It adjusts gene expression for covariates such as laboratory of sequencing and principal components, and also accounts for the relatedness between individuals (GRM).
```{r nullmod}
mod_null <- fitNullModel(annot_df,
outcome = "value",
covars = c("lab", "V1", "V2"),
cov.mat = grm)
```
## Heritability
`GENESIS` includes the `varCompCI` function to compute point estimates and confidence intervals for variance component estimates.
```{r herit}
varCompCI(mod_null)
```
## Test: No chromosome 6
```{r, eval=FALSE}
grm_no6 <- readRDS("./data/grm_gcta_dataset3_noMHC.rds")
mod_null_no6 <- fitNullModel(annot_df,
outcome = "value",
covars = c("lab", "V1", "V2")
cov.mat = grm_no6)
Computing Variance Component Estimates...
Sigma^2_A log-lik RSS
[1] 0.4644353 0.4644353 -445.1076275 0.7539141
[1] 0.7427113 0.1146478 -444.9217459 0.8212868
[1] 0.80922812 0.02620166 -444.88534737 0.84428059
[1] 8.218219e-01 9.081087e-03 -4.448789e+02 8.491762e-01
[1] 8.277458e-01 9.894360e-04 -4.448759e+02 8.515397e-01
[1] 8.284646e-01 0.000000e+00 NaN -1.804345e+09
[1] 3.505836e-01 0.000000e+00 -3.981850e+03 1.277769e+09
[1] 1.086148e+00 0.000000e+00 -4.054588e+03 6.257688e+08
[1] 33.85324 0.00000 -3897.47640 8165656.87151
[1] 8.435884e+01 0.000000e+00 NaN -6.104065e+06
[1] 299.0901 0.0000 NaN -4751550.1222
[1] 115.093 0.000 -4153.028 10389462.097
[1] 3.428079e+01 0.000000e+00 NaN -3.140260e+07
[1] 16.59229 0.00000 -3255.10231 419791.48131
[1] 1.640679e+01 0.000000e+00 -4.347292e+03 2.218296e+08
[1] 1.775072e+00 0.000000e+00 -4.057799e+03 3.903209e+08
[1] 3.738135e-01 0.000000e+00 NaN -2.305464e+09
[1] 5.979946e+00 0.000000e+00 NaN -3.224541e+08
[1] 2.053623e+01 0.000000e+00 NaN -3.286456e+07
[1] 6.488281e+01 0.000000e+00 NaN -1.748925e+07
[1] 458.0558 0.0000 -4015.7292 1187556.0386
[1] 3750.735 0.000 -4069.347 197290.302
[1] 1215.592 0.000 -4035.651 501747.619
[1] 239.5556 0.0000 NaN -836269.0261
[1] 1.189617e+02 0.000000e+00 NaN -1.158615e+07
[1] 344.8503 0.0000 NaN -1285605.3699
[1] 5.450396e+00 0.000000e+00 NaN -4.139080e+08
[1] 8.316068e-01 0.000000e+00 NaN -1.999110e+09
[1] 1.628054e-02 0.000000e+00 -4.052669e+03 4.128692e+10
[1] 5.046517e-03 0.000000e+00 NaN -2.696522e+11
[1] 6.334596e-02 0.000000e+00 NaN -1.107326e+10
[1] 2.331095e-01 0.000000e+00 -4.256268e+03 9.263451e+09
[1] 3.468481e-02 0.000000e+00 NaN -1.978770e+10
Error in chol.default(Sigma) :
the leading minor of order 1 is not positive definite
In addition: There were 32 warnings (use warnings() to see them)
```