% chapter_conclusions.tex
This thesis presents a collection of static and dynamic snapshots of diverse
microbial systems. Although our new methods are in no way specific to the human
gut, part of our focus has been to bridge the gap between computation and
biology, bringing us closer to a future where drug prescriptions, diet, and
recreational activities can be further \textit{personalized} -- through a layer
invisible to the naked eye but perceivable in most aspects of life.
It seems impossible to accurately predict the impact of scientific discoveries.
Historically, we have seen ample evidence of seemingly anachronistic findings
(for example, neural networks \cite{Tem10}) that later, usually once other
technologies catch up, become new fields of research or cornerstones of
consumer applications. Therefore, while acknowledging this is a tough problem,
I present my personal view on future directions that can build on the
contributions in this thesis. I divide these into three sections: diagnostic
methods, treatment, and analysis.
\section{Diagnostic Methods}
In Chapter~\ref{chapter_dogs} and Chapter~\ref{chapter_ibds}, we presented
examples of biomarkers for \gls{ibd} and \gls{cd} from two different analytical
perspectives. First, in a dog cohort, we showed that the dysbiosis index we
originally developed for humans \cite{RN154} can be reformulated to be specific
to a different host\hyp{}species. In both humans and dogs, this log-ratio of
bacterial groups is associated with decreased phylogenetic diversity, and in
humans it is also associated with increased inflammation.
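To make the log-ratio idea concrete, the following is a minimal sketch of such
an index in Python. The taxon names, counts, and group memberships here are
hypothetical illustrations; the actual groups entering the published dysbiosis
index are those defined in \cite{RN154}, and a pseudocount is a common (not
necessarily the original) way to handle zero counts:

```python
import math

def log_ratio_index(counts, numerator_taxa, denominator_taxa, pseudocount=1):
    """Log-ratio of summed counts for two taxon groups.

    A pseudocount keeps the ratio defined when one group has zero counts.
    """
    num = sum(counts.get(t, 0) for t in numerator_taxa) + pseudocount
    den = sum(counts.get(t, 0) for t in denominator_taxa) + pseudocount
    return math.log10(num / den)

# Hypothetical per-sample counts; real group membership comes from the
# published index, not from this example.
sample = {"Faecalibacterium": 120, "Blautia": 80,
          "E. coli": 40, "Streptococcus": 10}
score = log_ratio_index(sample,
                        numerator_taxa=["E. coli", "Streptococcus"],
                        denominator_taxa=["Faecalibacterium", "Blautia"])
```

A negative score indicates the denominator group dominates; shifting the group
definitions is all that is needed to re-target the index to a new host species.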
Next, after noticing the increased volatility in the microbiomes of subjects
with \gls{ibd}, we benchmarked microbial variability in fecal samples as an
effective classifier for disease. By collecting more samples per subject, we
can overcome the low classification accuracy of individual fecal samples.
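One simple way to turn per-subject variability into a classifier feature is the
mean pairwise dissimilarity across a subject's samples. The sketch below uses
Bray-Curtis dissimilarity for illustration only; the thesis chapters define the
actual metrics and benchmarks used:

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two equal-length abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def mean_pairwise_dissimilarity(samples):
    """Average dissimilarity over all pairs of a subject's samples.

    Higher values indicate a more volatile microbiome over time.
    """
    pairs = [(i, j) for i in range(len(samples))
             for j in range(i + 1, len(samples))]
    return sum(bray_curtis(samples[i], samples[j]) for i, j in pairs) / len(pairs)

# Toy longitudinal profiles (rows = time points, columns = taxa).
stable = [[10, 5, 2], [11, 4, 2], [9, 6, 3]]
volatile = [[10, 0, 0], [0, 10, 0], [0, 0, 10]]
# The volatile subject scores strictly higher than the stable one.
```

This single scalar per subject is what makes multiple samples per subject
informative: it summarizes temporal behavior that no single sample can capture.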
Although both approaches are encouraging, translating research into
consumer\hyp{}level applications commonly presents formidable challenges that
might only be solvable by future generations. Take, for example, the \gls{ecg}:
almost 125 years passed between the first observation of
biopotentials\footnote{Credited to Luigi Galvani in 1787.} and the moment when
the first table \gls{ecg}\footnote{Willem Einthoven's electrocardiograph was
manufactured by the Cambridge Scientific Instrument Company of London in 1911.}
became commercially available \cite{ECGZywietz}. Even after this tremendous
progress, proper validation of computer-generated diagnostics only appeared
more than 200 years after the initial discovery \cite{njem_ecg}.
I use the example of the \gls{ecg} because, much like raw heart biopotentials,
(\textit{i}) microbiome data is plagued with noise, and (\textit{ii})
determining the appropriate filters and thresholds depends directly on the
use-case. However, unlike in the early days of the \gls{ecg}, we live in a
digital and connected world. As such, the following areas of focus will
shorten the time between future discoveries and innovation:
\begin{description}
\item[Mechanistic Studies]The novelty of microbiome research has produced a
    large number of descriptive studies. In contrast, mechanistic
    experiments validating these findings have lagged behind (likely due
    to their more expensive requirements). If the goal is to relate the
    presence or abundance of a microbiome feature to a disease state or
    biochemical process, the underlying methods must be informed by
    biological inference in order for these biomarkers to gain
    credibility.
\item[Open Data]Medical research is especially affected by human factors:
    consistently collecting samples from a subject is not always easy or
    deterministic (think of bowel problems and fecal samples). A powerful
    practice for countering underpowered studies is to increase the
    sample size through the reuse of previously published data. This
    approach is only possible through open resources, like
    Qiita\footnote{\url{https://qiita.ucsd.edu/}}, that make data reuse
    seamless. Importantly, the work in this thesis was only possible
    through the reuse of existing datasets (see
    Chapter~\ref{exploratory_chapter} through Chapter~\ref{chapter_fmts}).
    Although making data openly available is an important first step,
    proper standardization of processing protocols will also be key to
    maximizing data reusability. A remarkable example of this practice is
    the \gls{emp} \cite{RN4267}.
\end{description}
\section{Treatment}
Microbiome\hyp{}based treatments generally rely on the transplantation of
microbial communities from a \textit{healthy donor} into an \textit{affected
recipient}. Three spearheading examples are: \glspl{fmt} to treat
\gls{cdi} \cite{RN4129}, skin microbiome transplants to treat atopic
dermatitis \cite{GalloSkin}, and (although still in early stages) a capsule
of microbial spores to treat \gls{uc} by Seres
Therapeutics\footnote{\url{http://www.serestherapeutics.com}}.
Although most \glspl{fmt} succeed at treating \gls{cdi}, what makes a
successful transplant is not yet clear. The number of variables implicated in
answering this question is immense. From a computational complexity standpoint,
determining \textit{a priori} whether or not a new community can colonize
an existing ecosystem is considered a hard computational problem, belonging to
the \textbf{\#P} class \cite{RN4266}. As such, our focus should be on
strengthening our systematic understanding of the transplant, characterizing
not only what makes a successful transplant but also what makes a failed one.
For example, the impact of the work presented in Chapter~\ref{section_fmt}
could have been magnified if we had included subjects for whom the \gls{fmt}
failed. With appropriate sample sizes, we could have applied a number of
techniques to single out the common features (if any) that lead to a failed
transplant. With this knowledge, we could pre-screen \textit{donors} and
\textit{recipients} to ensure a successful treatment and avoid unexpected
side-effects.
\section{Analysis}
Much as our microbial symbionts depend on the host environment (and vice
versa), the development and funding of analytical tools depend on the
ever-growing necessity to unravel complex patterns in a variety of experimental
setups (cross-sectional, longitudinal, multi-'omic, etc.). Thus, these tools
must be flexible, scalable, and, when possible, interactive.
Novel analytical methods and software infrastructure should be built with
scalability as a priority. We have seen a steady increase in our capability to
generate data, and it is likely that this trend will continue. Emperor
(Chapter~\ref{section_emperor}) was partly a response to the limitations of
existing software, and more recently we had to re-architect its underlying
implementation to cope with modern, larger datasets.
Flexibility and compliance with community standards make software available
to a wider audience. Take the count table, which often acts as the core
data structure for metabolomic, transcriptomic, proteomic (and other 'omic)
analyses. If the software producing this data complies with a standard, like
the \gls{biom}-format, the end user is free to select from a variety of
methods instead of being limited to \textit{niche software}. This idea is
taken a step further with \gls{qiime}-2, where a semantic type system
defines the methods and visualizations that can be applied to any given dataset
(regardless of its biological origin).
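The interoperability argument can be sketched with a toy count table. The table
layout and the relative-abundance step below are generic illustrations (any
BIOM-compliant tool could consume the same structure); the sample and feature
names are hypothetical:

```python
def to_relative_abundance(table):
    """Convert a {sample: {feature: count}} table to relative abundances.

    Any downstream method that accepts this shared structure can consume
    the output, regardless of which instrument or pipeline produced it.
    """
    rel = {}
    for sample, counts in table.items():
        total = sum(counts.values())
        rel[sample] = {f: c / total for f, c in counts.items()} if total else {}
    return rel

# Hypothetical sample-by-feature counts, as any standard-compliant tool
# might export them.
table = {"S1": {"OTU_1": 30, "OTU_2": 70},
         "S2": {"OTU_1": 10, "OTU_2": 90}}
rel = to_relative_abundance(table)
```

Because nothing in this transformation depends on whether the features are
OTUs, transcripts, or metabolites, the same code serves every 'omic layer --
which is precisely the appeal of a shared standard.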
Finally, the value of interactively exploring a dataset lies in our ability to
quickly test hypotheses and iteratively develop new ones. Future exploratory
analysis tools should be developed with interactivity and interoperability in
mind. For example, brushing to select a group of samples in one view of the
data might act as a filter for a different representation. Modern web
technologies, and software development frameworks like \gls{qiime}-2, will
likely pioneer these global overviews of microbial diversity.
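The brushing-as-filter idea can be reduced to a small state-sharing pattern.
This is a deliberately minimal, framework-free sketch (the class and method
names are invented for illustration): one view updates a shared selection, and
every linked view renders only the selected samples:

```python
class LinkedViews:
    """Minimal sketch of linked brushing: a selection made in one view
    becomes the filter applied by every other view."""

    def __init__(self, samples):
        self.samples = samples           # sample id -> metadata dict
        self.selection = set(samples)    # start with everything visible

    def brush(self, predicate):
        """Simulate brushing: keep only samples whose metadata matches."""
        self.selection = {s for s, meta in self.samples.items()
                          if predicate(meta)}

    def visible(self):
        """What any linked view (ordination, bar plot, ...) now displays."""
        return {s: self.samples[s] for s in self.selection}

views = LinkedViews({"S1": {"group": "IBD"},
                     "S2": {"group": "healthy"}})
views.brush(lambda meta: meta["group"] == "IBD")
# views.visible() now contains only S1
```

In a real tool the shared selection would live in application state and each
view would re-render reactively, but the contract is the same: brushing writes
the selection, every representation reads it.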