-
Notifications
You must be signed in to change notification settings - Fork 33
/
lec13-theory.Rmd
903 lines (599 loc) · 30.8 KB
/
lec13-theory.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
---
title: "Mathematical models in ecology and evolutionary biology"
author: Amber Gigi Hoi & Zoë Humphries
output: pdf_document
---
# Lesson preamble:
> ### Lesson objectives:
>
> - Introduce students to the world of mathematical modelling -- the art, the mechanics, the possibilities, and the limitations
> - Be comfortable with -- and even appreciate -- math and theory
> - Equip students with tools to construct their first model
> - Work through examples of how to use models to solve real world problems
>
> ### Lesson outline:
> Total lesson time: 3 hours
>
> - Introduction to mathematical models (10 min)
> - Learn about the different moving parts of a model and common things you can do to them and get out of them (20 min)
> - Go through examples of some classic models in EEB (30 min)
> - Working through an ecological model (1 hour)
> - Working through an example of an evolution model (1 hour)
---
# Introduction
## What is a model? Why do we model?
Up until this point in this course, we've been working with statistical models
with the goal of characterizing relationships between variables and their
significance in existing data. Today, we are going to venture into another
branch of applied mathematics and learn about _mathematical models_. A
mathematical model (a.k.a. first-principle models) is a description of a system
using a series of equations. These models are used extensively throughout the
natural and social sciences to help explain and predict the behavior of various
systems of interest.
The world is governed by series of natural laws and "truths" biologists refer
to as first principles. For instance, ocean currents flow in a certain
direction, and some salmon species reproduce just once in their life time.
Mathematical models provide us with an opportunity to study the intricacies of
biological systems under a framework of first principles. More accurately,
models forces us to explicitly lay out our conception of "reality" and to
confront those intuitions. Intuition is perhaps the biggest pitfall in biology
-- everybody loves to tell a good story where everything works out and
everything makes perfect sense, but how rigorously have we actually weeded them
before accepting story as dogma? When you construct a model, you have no choice
but to verify every single last piece of information you plan to include in
your model.
Technically, one can describe any biological phenonemon with math. At the same
time, however, it is rarely the goal to rewrite "reality" in math. The most
elegant models are those able to describe succinctly a system's most essential
attributes, and the best theoreticians are those with the state-of-the-art
knowledge of the system and the clarity of mind to know exactly what those
important features are _given the questions being asked_. Sometimes we ask
grand questions about driviers of macro system change, other times we would
kill to find that devil in the details. Knowing how best to represent your
system and answer your question is a skill that comes with experience (read:
lots of trials and lots and _lots_ of errors).
What models are good for:
* Guiding intuition regarding how processes and mechanisms work.
* Pointing out logical flaws in an argument when the math don't make sense.
* Testing hypotheses that are otherwise untestable due to ethical concerns
and/or logistical constraints.
* Generating new testable hypotheses and predictions and can inform
experimental design.
* Provide a new framework on how to think about a problem.
Limitations of a model:
* Only as interesting as the motivating question.
* Only as robust as the theory that went into building it.
* All models are wrong, some are useful.
* Without data, models are only good for telling us what is possible, but not
what happened or what will happen.
Diversity promotes science and progress. Thus, any good researcher should
strive to have at least a basic understanding of both the empirical and
theoretical sides of science in order to facilitate collaboration and
synthesis, and to push the frontiers of knowledge.
## Setup
Here are the packages we will be using today:
```{r message=FALSE, warning=FALSE}
library(tidyverse)
library(deSolve)
```
# Getting started with theory
## Some simple terminologies
* Variable -- quantity changes over time, a.k.a. the things we are tracking.
* Parameter -- pre-determined values, quantity doesn't change over time, a.k.a.
the things we know or think that we know.
* Dynamics -- fancy way of saying patterns of change over time.
* Equilibria -- the state at which the system stops changing.
* Stability -- whether the system will remain at a particular state when
perturbed.
* Sensitivity -- how readily a system will change when perturbed.
## Some simple classifications
Two basic categories based on how to solve: analytic and numeric
* A model is solved analytically when exact solutions of desired outputs are
obtained.
* Numeric methods allows you to approximate solutions when the system is too
complex to solve.
Two basic categories based on type of dynamics: discrete and continuous
* Discrete time models describe change in discrete time step with recursion
equations. These are useful when modelling semelparous organisms.
* Continous time models allow variables to change at any point in time, and is
described with differential equations. These are useful when modelling
iteroparous organisms.
Two basic categories based on nature of change: deterministic and stochastic
* Deterministic models assumes the future is determined entirely by model
mechanics.
* Stochastic models assumes that there are random events that affect the
biological processes of interest. There is always some element of randomness
in real life. Whether to model that randomness explicitly depends on how big of
a role it plays in determining the dynamics of your system.
## Learning basic model mechanics through classic models in EEB
In this section, we will learn different ways to represent system change from
one time step to another by examining and deriving some classic models in EEB.
### Exponential growth
The first population model we examine is the exponential growth model, which
describes a population that grows without restriction, e.g., bacteria on a
petri dish.
* In discrete time: literally what is $N_{t+1}$ given $N_t$.
$$N_{t+1} = RN_t$$
* In continous time: population growth is described as change in $N$ per change
in $t$. $dN/dt$ is thus the rate of population change, or the _growth rate_
of a population.
$$\frac{dN}{dt}=rN$$
Note that $R$ and $r$ are **not** the same. $R$ is termed the _reproductive
factor_ (or ratio, or number), and it represents the number of replicates each
individual makes of themselves in a subsequent time step. On the other hand,
$r$ describes the per capita change in number of individual from one generation
to the next and is referred to as the _intrinsic growth rate_.
If we want to find out what the population size is at $t=t+1$, we simply plug
the necessary parameters into the recursion equation listed like this:
```{r}
# Set up some master controls such that it is easier to customize model later on
N0 = 10 # Initial population size
R = 1.2 # Reproductive factor
N1 = R*N0; N1
```
Most of the time though, biologists want to track the population for more than
just one time step. In that case, we could continue to do math like this
$$N_{t+5} = R(R(R(R(RN_0))))$$
which simplifies to
$$N_{t+5} = R^5 (N_0)$$
and then further simplifies to
$$N_{t+n} = R^n(N_0)$$
We coould also take a graphical approach and draw out the dynamics of the
system for 50 time steps.
```{r}
N0 <- 10 # Initial population size
R <- 1.2 # Reproductive factor
times <- 50 # Number of time steps
N <- c() # Make empty vector to store population size at each time step
N[1] <- N0 # Initiate vector
# Make for loop to iterate through the model 50 times
# Running (times-1) iterations gets us to generation N[i+1]
for (i in 1:(times-1)) {
N[i+1] = R*N[i]
}
# Plot the newly loaded N vector
plot(seq(1, times, 1), N, xlab="time")
```
As expected, this population is literally spinning out of control! In reality,
however, even bacteria on a petri dish can't grow forever. What are some
factors that may limit population growth?
### Logistic growth
Organisms often find themselves limited by resource availability as their
numbers grow. This causes their per capita growth rate to decreases as the
population increases towards the resource limit known as the _carrying
capacity_, $K$.
* In discrete time:
$$N_{t+1} = N_t+r(\frac{K-N_t}{K})N_t$$
* In continous time:
$$\frac{dN}{dt} = r(\frac{K-N}{K})N$$
When interpreted in plain English, the growth rate of the population is being
discounted depending on how close to $K$ it is. When the population is very
small, $\frac{K-N}{K} \approx 1$ and logistic growth approximates exponential
growth. When the population is close to $K$, $\frac{K-N}{K} \approx 0$ and thus
the population size stops increasing.
We can examine the logistics growth model graphically just like we did before.
```{r}
N0 <- 10 # Initial population size
r <- 0.5 # Intrinsic growth rate
K <- 1000 # Carrying capacity
times <- 50 # Number of time steps
N <- c() # Make empty vector to store population size at each time step
N[1] <- N0 # Initiate vector
for (i in 1:(times-1)) {
N[i+1] = N[i]+r*N[i]*(1-(N[i]/K))
}
plot(seq(1, times, 1), N, xlab="time")
```
Initially ($t=0$ to $t=10$), the population grows rapidly, and the form of the
growth model resembles the exponential growth model. As the population
increases ($t=10$ to $t=20$), the growth rate slows, and eventually plateaus at
the carrying capacity $K=1000$.
### When species are in competition -- the Lotka-Volterra model
This example will show case how to model two interacting populations,
specifically, populations in competition for a limiting resource. Focusing on a
continous time example, we will first write down the logistic growth models of
two populations (differentiated by subscripts), and then add in the effect of
their interaction.
$$\frac{dN_1}{dt} = r(\frac{K-N_1}{K})N_1$$
$$\frac{dN_2}{dt} = r(\frac{K-N_2}{K})N_2$$
What does competition do to organisms? Well, many things, but in terms of
population dynamics, competition hinders the growth of a population, and the
degree of reduction in growth is governed by the _competition coefficient_,
$\alpha_{ij}$. The subscripts $i$ and $j$ correspond to our two populations,
and the order of the subscript tells us who is affecting whom. For example,
$\alpha_{ij}$ represents the effect of $j$ on $i$, whereas $\alpha_{ji}$
represent the effect of $i$ on $j$.
Adding these terms into the population growth equations gives the classic
Lotka-Volterra model for competition: population growth is governed jointly by
how close it is to carrying capacity, but also the effect of the competing
species has on it (per capita strength of effect $\times$ number of individuals
around to exert this effect).
$$\frac{dN_1}{dt} = r(\frac{K- N_1 - \alpha_{12} N_2}{K})N_1$$
$$\frac{dN_2}{dt} = r(\frac{K - N_2 - \alpha_{21} N_1}{K})N_2$$
### Models of evolution
In line with the most basic definition of evolution, evolution models describes
change in allele frequency in a population over time. These models are
sometimes referred to as population genetics (pop gen) models. We will work
with a one-locus haploid model in this example.
Consider a population of individuals each carrying one of two variants of a
certain gene; let's call them alleles _A_ and _a_ because we are creative like
that. Similar to ecologists, evolutionary biologists are often interested in
the reproductive factor (or survivability, denoted $W$) of individuals with
different genotypes. The dynamics of these two sub-populations can be expressed
as
$$N_{A, t+1} = W_A N_{A, t}$$
$$N_{a, t+1} = W_a N_{a, t}$$
Typically, evolutionary biologists are concerned with the _proportion_ of a
certain allele in a population rather than changes to their absolute numbers.
Here, we will use $p$ and $q$ to denote the proportions of the _A_ and _a_
alleles in the population, respectively.
$$p_{t}=\frac{N_{A,t}}{N_{A,t}+ N_{a,t}}$$
$$q_{t}=1-p_{t}$$
Doing so also allow us to leverage the fact that the proportions $p$ and $q$
must add up to 1, and therefore the dynamics of the allele frequencies in a
population can be described with just one equation:
$$p_{t+1}=\frac{W_A N_{A,t}}{W_A N_{A,t}+W_a N_{a,t}}$$
In order to write an equation that tracks change in allele frequency over time,
as promised, we need to transform $N$ (the number of each type) to $p$
(proportion). Through the magic of algebra, we have
$$p_{t+1}=\frac{W_A p_t}{W_A p_t+W_a q_t}$$
Note that $N_A$ is **not** interchangeable with $p$; the algebra just happened
to work out this way.
Lastly, remember our good friend $p+q=1$?
$$p_{t+1}=\frac{W_A p_t}{W_A p_t+W_a (1-p_t)}$$
## How to construct a model
### Step 0: Breathe
Don't panic, and try not to be overwhelmed. Trust us when we say it is not as
daunting as it appears. This doesn't mean that it is easy to write a model
either, but there are lots of resources available to help you achieve your
ambition. The biggest hurdle is often to simply _start_. Know that whatever you
do, these are just numbers and symbols and words on paper and on the screen,
you can't break anything or hurt anything (except maybe R, but just turn it on
and off again and all will be forgiven, life will move on). So don't be afraid
to start typing and playing around with ideas. Know also that you will be
making mistakes and will have to troubleshoot your way through a theory project
(any project really...). We all go through that, even the most experienced
theoreticians make mistakes. Experience comes with time and practice, and you
will learn how to quickly spot mistakes, and how to effectively troubleshoot.
None of that will happen though, if you don't start.
So let's start!
### Step 1: Formulate the question.
* What do you want to know? Start with the simplest, biologically-reasonable
description of the problem.
* Having clear objectives is crucial!
### Step 2: Determine the basic ingredients of the model.
* Decide on which type of model to use (discrete vs. continous time, stochastic
vs. deterministic, etc.).
* Define the variables and parameters you want to keep track of.
* **Think about the assumptions you are making about your system**. Given that
we can't ever full replicate reality (and we don't want to anyway), we are
neccessarily making simplifying assumptions about our system as we write our
model. What are those assumptions? You need to make them clear to both yourself
and your audience. Your model can only be interpreted in the context of these
assumptions.
### Step 3: Qualitatively describe the system.
* Use life-cycle diagrams and flow charts to decribe changes to the
individuals/populations/communities that you're modelling.
* Make a table or list of variables and how they will interact with each other.
* Think about the possible outcomes and types of output you want from your
model.
### Step 4: Quantitatively describe the system.
* Using the diagrams and tables etc as guides, draft out equations that
describe your system.
* Think about the functional forms of interactions in your model and how to
parameterize your model (i.e., where are you going to get data?).
* Perform checks to make sure the structure of the model is logically sound
(e.g., will your model return negative population size?).
### Step 5: Analyze your model
* There are abundant options regarding how to analyze a model. These range from
finding equiliria and performing a stability analysis analytically, to
numerically simulating population dynamics over time.
* Choose the appropriate analysis for your system and question, and make sure
that the output does address the question adequately!
### Step 6: Checks and balances
* Check results against data or known cases.
* Consider alternatives to a simplified model -- how can this model be
extended?
### Step 7: Relate results back to the question
* Interpret the results and describe any new insights obtained in the context
of assumptions made.
* How general are your results? Are the results counterintuitive? Why?
We will be following this recipe (roughly) in our build-a-model workshops.
# Build-a-model workshop 1 -- Modelling malaria transmission dynamics
First, take a _deeeeeeeeeeeeeeep_ breath.
In this workshop, we are going all the way back to the dawn of mathematical
models in biology. We will borrow from the classic malaria model orginally
developed by Ross (1908), and later solidified by Macdonald (1956), which
investigates the required efficacy of mosquito management programs in order to
eradicate malaria.
_Note:_ It is actually very common practice to adapt old models to answer new
questions. In fact, practically _all_ of the malaria models that exist today is
a variation of the Ross-Macdonald model.
The Ross-Macdonald model is a continuous-time, deterministic, model which
tracks the number of **S**usceptible, **I**nfected, and **R**ecovered subjects
over the course of a malaria outbreak. These models are sometimes referred to
as _SIR_ models, and they have been adapted to model numerous diseases beyond
malaria.
Figure 1 shows a flow diagram of the malaria transmission cycle we will be modelling today.
![Flow diagram illustrating a basic epidemiological model of malaria transmission. The blocks represent different host and vector classes within a population. Arrows describe possible movements between host/vector categories. Subscripts _h_ and _m_ represent hosts and mosquitoes, respectively.](image/model.png)
The principle epidemiological compartments (i.e., variables) of interest are
the populations of infectious hosts ($I_h$) and infectious vectors ($I_m$).
These compartments are interdependent via the hosts’ and the vectors’ rate of
exposure to the parasite, given by $\beta I_m$ and $\beta I_h$, respectively,
where $\beta$ represent the per capita biting rate of vectors. Upon contact,
susceptible hosts ($S_h$) and mosquitoes ($S_m$) become infected with
probabilities $\varphi_h$ and $\varphi_m$, respectively. For simplicity, we
will assume that hosts cannot acquire immunity and so the "recovered" class was
not included in this model. Instead, hosts either die of infection at rate
$\alpha$ or recover and return to $S$ class at rate $\gamma$. Vectors were
assumed to be immediately infectious upon successful infection, are unaffected
by the parasites they carry, and remain infected for life (i.e., no $\alpha$
and $\gamma$ for vectors). All host and vector die of natural causes at rate
$\mu_h$ and $\mu_m$, respectively. The host and vector populations were assumed
to be closed and constant over time such that all deaths are immediately
replaced by immigration of individuals into susceptible classes, represented by
the symbol $\tau$.
After all that prefacing, we're finally ready to do math!
Recall that, in contrast to recursion equations where we describe directly the
change in the value of a variable, an ordinary differential equation (ODE)
model describes _how_ the population is changing. We will follow this simple
formula when we translate our flow diagram to our equations:
$$\frac{dN}{dt} = rate\;of\;increase - rate\;of\;decrease$$
Or,
$$\frac{dN}{dt} = flow\;rates\;along\;arrows\;entering\;a\;compartment - flow\;rates\;along\;arrows\;exiting\;a\;compartment$$
Note that the order of the entering and exiting of compartments is irrelevant
because each time step is so small, it doesn't matter which event happens first
within that time step.
The resulting model is thus made up of a system of four ODEs, two for hosts and two for vectors, written as follows:
$$\frac{dS_h}{dt} = \tau_t - (\mu_h +\beta \varphi_h I_m)S_h + \gamma I_h$$
$$\frac{dI_h}{dt} = (\beta \varphi_h I_m)S_h - (\gamma + \mu_h + \alpha_h) I_h$$
$$\frac{dS_m}{dt} = \tau_m - (\mu_m + \beta \varphi_m I_h)S_m$$
$$\frac{dI_m}{dt} = (\beta \varphi_m I_h)S_m - \mu_m I_m$$
The trickiest part of writing these equations is probably to remember that the
rate at which susceptible hosts ($S_h$) become infected ($I_h$) will depend on
how many infected mosquitoes ($I_m$) are around to deliver those infection
($\beta$ is the **per capita** biting rate, which means the number of times
**each** mosquito will bite a host). The same logic applies to the process in
which mosquitoes become infected (going from $S_m$ to $I_m$).
Now that we have our model written out, it is R's turn to do some math for us.
Specifically, R will _solve_ this system for us and save us from a lot of
headache and paper. To solve a system of ODEs means we will integrate the
equations (i.e., turn $\frac{dN}{dt}$ into $N$) to find expressions that
describes the state of each population as a function of the remaining variables
and parameters.
We will code our model like we did before, this time with roughly 100$\times$
the parameters, variables, and equations. First, we will set up a bunch of
master controls for easy model manipulation later on. Then, we will write out
our ODE model as a function that our solver will deal with. Lastly, we will
call the ODE solver _lsoda_ to put all this together.
### Define required parameters for the main program
1. Time vector:
```{r}
times <- 50; # Time is per day
timevec <- seq(0, times, 1)
```
2. Assign values for model parameters:
- These parameter values are mostly informed guesstimates. The reason we're
not using "real" parameter values here is because the model is too simple a
representation of "real life", and real numbers don't make sense in this
horrible rendition of reality. For instance, mosquitoes _infected_ with the
parasite don't actually become _infectious_ right away. There is a 10-14 day
incubation period where mosquitoes do nothing but sit around and accumulate
mortality before they can transmit malaria. Since we do not account for this
incubation period in our model, using the "real" daily mortality rate of
mosquito results in a drastic underestimation of their true mortality risk, and
an over-abundance of mosquitoes in our hypothetical population.
- Since the per capita biting rate ($\beta$) changes with population size, we
decomposed ($\beta$) into its components (i.e., number of bites per
mosquito per day and population size) for easy calculation. In this example,
$\beta$ is actually constant through out because we assumed closed and constant
population size ($N$), but setting up the variable in this manner will allow
for flexibility if one day we wish to relax that assumption.
- Make sure that the $N$ here is the same as the population size we will
assign later!
```{r}
# Units are assumed to be /day
mu.h <- 0.05; mu.m <- 0.1 # Natural mortality
psi.h <- 0.1; psi.m <- 0.1 # Probability of infection upon exposure
b <- 0.25 # number of times a mosquito bites per day
N <- 500 # Total population size (remains constant)
beta <- b/N # Number of times a mossie bite each person in a day
gamma <- 0.03 # Recovery rate (host only)
alpha <- 0.2 # Disease mortality rate (host only)
# String it all up as a vector
parms <- c(mu.h, mu.m, psi.h, psi.m, b, N, beta, gamma, alpha)
```
3. Give the model something to get it going a.k.a. assign initial conditions:
```{r}
s.h0 <- 450; i.h0 <- 50 # Host - make sure this is the same N as above
s.m0 <- 200; i.m0 <- 50 # Vector
# Make into another vector
Y0 <- c(s.h0, i.h0, s.m0, i.m0)
```
4. Define the function that specifies the ODE model called by _lsoda_ a.k.a. type out equations.
```{r}
odeequations <- function(t, y, parms)
{
# Retype parms and initial conditions
# This will link parameters set "outside" of the function to the equations "inside"
# Natural mortality parms
mu.h <- parms[1]; mu.m <- parms[2]
# Probability of infection parms
psi.h <- parms[3]; psi.m <- parms[4]
# Biting rate parms
b <- parms[5]; N <- parms[6]; beta <- parms[7]
#recovery parms
gamma <- parms[8];
#virulence
alpha <- parms[9];
# Initial conditions
s.h <- y[1]; i.h <- y[2]; s.m <- y[3]; i.m <- y[4]
# These are the differential equations
# These should be written in the same order as the initial conditions
dsh = (mu.h*s.h + (mu.h + alpha)*i.h) - (mu.h + beta*i.m*psi.h)*s.h + gamma*i.h
dih = beta*i.m*psi.h*s.h - (gamma + mu.h + alpha)*i.h
dsm = mu.m*(s.m + i.m) - (mu.m + beta*i.h*psi.m)*s.m
dim = beta*i.h*psi.m*s.m - mu.m*i.m
return(list(c(dsh, dih, dsm, dim)));
}
```
5. Call ODE solver _lsoda_ to integrate ODEs.
```{r}
odeoutput1 <- lsoda(Y0, timevec, odeequations, parms)
```
Anddddd that's it! We are done! We can now look at our shiny model outputs.
```{r}
head(odeoutput1)
```
The output dataframe consists of five columns, one for time, and the next four
represent the population size of each of our population classes at those time
points. The columns are organized in the same order we typed out our equations.
Therefore, column 1 represent susceptible hosts, column 2 infected hosts, etc.
To make our lives easier from now on, we'll go ahead and rename these columns.
We're also going to force the output into a data frame so that it is easier to
work with.
```{r}
colnames(odeoutput1)[2:5] <- c("S.h", "I.h", "S.m", "I.m")
odeoutput1 <- as.data.frame(odeoutput1)
```
We can inspect the dynamics of this system by ploting out these trajetories.
```{r}
# Host
ggplot() +
geom_line(data=odeoutput1, aes(x=time, y=S.h), colour="black") +
geom_line(data=odeoutput1, aes(x=time, y=I.h), colour="blue") +
labs(y="Hosts") +
theme_classic()
# Vector
ggplot() +
geom_line(data=odeoutput1, aes(x=time, y=S.m), colour="black") +
geom_line(data=odeoutput1, aes(x=time, y=I.m), colour="blue") +
labs(y="Vectors") +
theme_classic()
```
Feel free to play with the parameter set, and investigate what happens to
disease dynamics as you do so. Is the system more _sensitive_ to some
parameters than others? What are some parameters we can change or add to
imitate various means of disease control?
# Build-a-model workshop 2 -- Modelling forces of evolution
In this section, we will switch gears 720$^\circ$, and work on a discrete time,
stochastic, evolution model! We will use an example of drug resistance
emergence to ask whether random events -- genetic drift -- can counteract the
effect of drug-resistance selection and block the evolution of this trait.
You may recall from evolutionary biology class that the relative strengths of
drift vs. selection depend on population size and mutation rate. Here, we will
use simple population genetic simulations to disentangle these different
factors and get a better understanding of the role of these various forces of
evolution.
We will use the single-locus haploid model we were working on before as the
foundation of this model. Consider a population with $N$ individuals (of
parasites) each carrying a variant of this gene: _A_ is a drug-sensitive
allele, and _a_ confers resistance. We will impose mutation and selection on
this population at every time step, and at the end of each time step, we will
simulate genetic drift by random selecting a subset from the standing
population.
## Model mechanics
### Mutation
At each time step, some fraction of _a_ alleles will mutate to _A_ at rate $m$
and vice versa.
$$p_{t+1} = (1-m)p_t + m(1-p_t)$$
### Selection
Recall this equation:
$$p_{t+1}=\frac{W_A p_t}{W_A p_t+W_a (1-p_t)}$$
Recall also that selection acts on relative fitness, specifically, the
difference in relative fitness between _A_ and _a_. This is known as the
_selection coefficient_, and is commonly denoted $s$. The fitness of _A_ relative
to _a_ can be expressed simply as $\frac{W_A}{W_a}$. Taking $w_a$ as the
reference (i.e., setting $w_a$ to 1), we obtain the selection coefficient
against _A_ (i.e., assuming _A_ is at disadvantage) as follows:
$$s=1-\frac{W_A}{W_a}$$
Through the magic of algebra, we can derive an expression for $p_{t+1}$
consisting only of terms related to our focal allele _A_:
$$p_{t+1}=\frac{p_t (1-s)}{p_t(1-s) + (1-p_t)}$$
### Genetic drift
We will simulate genetic drift by choosing a random sample from the population.
More specifically, we obtain $N_A$ by drawing $N$ individuals from a population
where a fraction $p$ of the population is _A_ using the function _rbinom_.
After this draw, we can retrieve $p$ from $N_A$ with $\frac{N_A}{N}$.
### Putting the model together
First, we'll set up parameters and initial conditions. We will initiate our
population with equal proportions of the _A_ and _a_ alleles, and we will
investigate a scenario where drugs don't exist (i.e., _A_ and _a_ perform
equally well).
```{r}
# Time
times <- 5000
timevec <- seq(1, times, 1)
# Parameters
N <- 10000 # Total population size (assume constant)
m <- 0.001 # Mutation rate
s <- 0 # Selection coefficient
# Intitial condition
p <- 0.5 # proportion of A allele
# Empty vector to store values of p over time
freq <- c()
```
Then, we will write a _for_ loop to help us impose the various forces of
evolution repeatedly through time.
```{r}
for(i in 1:times) {
# Mutation
p <- (1-m)*p + m*(1-p)
# Selection
p <- ( p*(1-s) / (p*(1-s)+(1-p)) )
# Drift
p <- ((rbinom(1,N,p)) / N)
# Store values in vector
freq[i] <- p
}
```
That's it! We just made a stochastic population genetics model!
Let's have a look at the results.
```{r}
ggplot() +
geom_line(aes(x=timevec, y=freq)) +
labs(x="time", y="p") +
theme_classic()
```
Here, we see that in the absence of selection, mutation and drift causes allele
frequency to fluctute randomly, but for the most part $p$ _seems to be_
hovering around its initial values of 0.5.
Next, we can add in some drugs and see what happens to the fate of the
drug-sensitive _A_ alleles.
```{r}
# Update s
s <- 0.005
# Don't forget to reset p and the storage vector
p <- 0.5
freq <- c()
# Run loop again
for(i in 1:times) {
# Mutation
p <- (1-m)*p + m*(1-p)
# Selection
p <- ( p*(1-s) / (p*(1-s)+(1-p)) )
# Drift
p <- ((rbinom(1,N,p)) / N)
# Store values in vector
freq[i] <- p
}
# Make a plot of results
ggplot() +
geom_line(aes(x=timevec, y=freq)) +
labs(x="time", y="p") +
theme_classic()
```
Oh dear! It took no time for the resistant allele (_a_) to take over! The
population of the sensitive strain doesn't drop to zero, though, why is that?
And lastly, feel free to play around with this model to investigate the
relative effects of mutation, selection, drift, and how population size
mediates all of this!
# References
Hartl, D. (2000). A Primer of Population Genetics (3rd Ed). Sinauer Associates
Inc., Sunderland, MA.
Macdonald, G. (1956). Theory of the eradication of malaria. Bulletin of the
World Health Organization, 15(3–5): 369–387.
Otto, S. & Day, T. (2007). A Biologist's Guide to Mathematical Modeling in
Ecology and Evolution. Princeton University Press, Princeton, NJ.
Ross, R. (1908). Report on the prevention of malaria in Mauritius. Waterlow and
Sons Limited, London, UK.