---
title: "Introduction"
author: Neil McRoberts, Robin Choudhury
output: html_document
---


This class will explore a set of topics connected with the analysis of microbial population dynamics. The aim is to touch on a wide range of topics so that by the end of the course you have had some exposure to the foundational literature and a chance to try some modeling and quantitative analysis of your own. By taking an ecological perspective on microbial population dynamics, we're looking at plant pathogen epidemiology as a special case of a wider phenomenon. Recognizing that there are diverse interests in the graduate group, the hope is that this approach will provide a starting point for at least four different types of work that you might want to do in future:

  • Modeling the dynamics of a single species of pathogen (or disease symptoms) on a plant
  • Understanding how competition among pathogens trying to colonize the same plant tissue can be studied
  • Setting up larger models to think about microbial community dynamics
  • Studying how we can use information to make and evaluate decisions about disease management

We will work our way through the following set of topics:

  • Using state diagrams as a general tool for capturing system dynamics
  • Translating state diagrams to mathematical models
  • Examples:
    • (i) SIR models for disease progress;
    • (ii) matrix models for communities;
    • (iii) demographic models for life histories
  • Network models
  • Time series: mixtures of rules and noise. Chaos and long term dynamics
  • Imposing thresholds for management and some surprises
  • Deciding if intervention is a good idea: probability, information, and expected regret

General reading and thoughts on how to approach the material

We'll suggest specific papers from the primary literature to go with each topic, but anyone wanting to develop their capabilities in this general area will find the following books/chapters provide a huge amount of useful material.

  • Caswell, H. 2001. Matrix Population Models: construction, analysis and interpretation (2nd Edition). Sinauer Associates Inc., Sunderland, MA.
  • May, R.M. 2001. Stability and Complexity in Model Ecosystems (Landmark Edition). Princeton University Press, Princeton, NJ.
  • Royama, T. 1996. Analytical Population Dynamics. Chapman & Hall (Springer), London.
  • Turchin, P. 2003. Complex Population Dynamics. Princeton University Press, Princeton, NJ.
  • Garrett, K??

Caswell's book covers a vast amount of material. It provided a lot of the inspiration for the way the first few sessions of this class are organized. It progresses rapidly from very simple visual models of single species to advanced topics in demographics, population genetics and migration. May's book is a classic of quantitative ecology that was first published in 1973. In it May used methods from physics and engineering to look at questions about ecological communities that were topics of hot debate at the time. May's analysis changed the way we think about ecological stability, though his point was misunderstood by many readers at the time. It's worth reading the introduction to the Landmark Edition from 2001, in which May looks back on some of the debate that followed the original publication. May's book and Caswell's go together because they both make heavy use of matrices and the properties of matrices, although they have very different aims.
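To give a flavor of the matrix methods that both books rely on, here is a minimal sketch in R. The three-stage projection matrix and all of its numbers are invented for illustration (they are not taken from Caswell or May). The sketch only shows the general idea: repeated matrix multiplication projects a structured population forward in time, and the dominant eigenvalue of the matrix summarizes the long-run growth rate.

```r
# Hypothetical 3-stage projection matrix (all values invented for illustration).
# Columns are the stage at time t; rows are the stage at time t + 1.
A <- matrix(c(0.0, 1.5, 3.0,   # new stage-1 individuals produced by each stage
              0.5, 0.0, 0.0,   # survival from stage 1 into stage 2
              0.0, 0.8, 0.9),  # survival into, and within, stage 3
            nrow = 3, byrow = TRUE)

n <- c(100, 0, 0)   # starting population: 100 individuals, all in stage 1
n_steps <- 30

for (t in seq_len(n_steps)) {
  n <- as.vector(A %*% n)   # project the population forward one time step
}

# eigen() sorts eigenvalues by decreasing modulus, so the first one is dominant
ev     <- eigen(A)
lambda <- Re(ev$values[1])      # long-run growth rate per time step
w      <- Re(ev$vectors[, 1])
w      <- w / sum(w)            # stable stage distribution (proportions)

lambda        # greater than 1 here, so this hypothetical population grows
n / sum(n)    # projected stage proportions, very close to w
```

Comparing `n / sum(n)` with `w` shows that the simulated population settles into the stage structure predicted by the eigenvector, which is the kind of result that makes matrix properties so useful in both books.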

The books by Royama and Turchin also form a pair, but there's a thematic link between Royama's main question and May's as well. Like May, Royama is interested in what leads to the persistence of populations in the long run, but Royama's focus is on single species (or perhaps simple two-species interactions). Plant disease epidemiology has tended to focus on questions connected with the speed and extent of pathogen increase over the course of single growing seasons, so long term analyses like Royama's haven't received much attention. However, both Royama and Turchin are focused on what leads to population sizes being more or less stable, perhaps fluctuating between an upper limit and a lower limit. Understanding how that can happen would be useful in the context of, say, thinking about how to eradicate an endemic problem, or when to give up investing effort in eradication. Those sorts of analyses might be useful in regulatory plant pathology, or policy analysis. They would also be useful in thinking about microbial communities.

Turchin's approach to studying long term population dynamics is useful because he moves back and forth between logical thought about biology, inspection of data, and construction of models based on translating his conjectures about the biology into equations. This means his work is useful at two levels. It's useful on a technical level because he shows how to get usable results from a fairly small set of building blocks. It's useful at a deeper level because it shows one example of how to think like a theoretician: how to get away from the necessity to start every piece of scientific research with a set of data and ask "what sort of analyses can make sense of these data?" and instead start with a theoretical question such as "what sort of dynamics would a population exhibit if its size was determined by intrinsic self-regulation and weak and variable competition for resources from half a dozen other species?"
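A theoretical question of this kind can be turned into a small numerical experiment. The sketch below assumes a Ricker-type model of self-regulation, with a random term standing in for the variable effects of other species and the environment. The choice of model, the parameter values, and the variable names are illustrative assumptions, not something taken from Royama or Turchin.

```r
set.seed(1)    # make the random noise reproducible

r     <- 1.2   # intrinsic growth rate (assumed value)
K     <- 500   # level around which self-regulation holds the population (assumed)
sigma <- 0.2   # standard deviation of the random perturbation (assumed)
n_gen <- 200   # number of generations to simulate

N    <- numeric(n_gen)
N[1] <- 10     # small founding population

for (t in 1:(n_gen - 1)) {
  noise <- rnorm(1, mean = 0, sd = sigma)
  # Ricker map: per-capita growth declines as N approaches K,
  # and the noise term perturbs the growth rate each generation
  N[t + 1] <- N[t] * exp(r * (1 - N[t] / K) + noise)
}

# After the initial increase, look at the limits the population moves between
range(N[51:n_gen])
```

Running `plot(N, type = "l")` afterwards draws the series, and changing `r` or `sigma` shows how the balance between rules and noise changes the long-term behavior.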

Getting to grips with any of these topics is going to require a bit of engagement with mathematics. Many students in biological sciences find this idea off-putting and discouraging. There isn't really an easy answer to this issue. As with any technical subject, achieving a high level of competence means being good at the technical aspects. On the other hand, if what you want is an ability to understand the basics so that you can have an informed conversation with an expert, or do some basic analyses of your own, the amount of mathematical detail needed isn't that big. One effective alternative to learning a lot of the algebra is to replace the mathematics with numerical examples calculated on a computer. This doesn't remove the problem completely, but it translates it from being about learning algebra to learning how to read and write computer code. We're going to do some of each in the class, but the material will always be rooted in explanations written in (we hope) simple terms in English.
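As a small illustration of swapping algebra for computation, the sketch below answers the same question in both ways: how long does a population take to double if it grows by a fixed proportion each time step? The growth rate and starting size are arbitrary values chosen for the example.

```r
r  <- 0.15   # proportional growth per time step (assumed value)
N0 <- 10     # starting population size (assumed value)

# Numerical approach: step the model forward until the population has doubled
N     <- N0
steps <- 0
while (N < 2 * N0) {
  N     <- N * (1 + r)   # N[t + 1] = (1 + r) * N[t]
  steps <- steps + 1
}
steps

# Algebraic approach: solve (1 + r)^t = 2 for t
t_double <- log(2) / log(1 + r)
t_double
```

The loop reports the first whole time step at which the population has at least doubled, so it equals the algebraic answer rounded up. The algebra gives a general formula, while the loop needs no algebra at all and would still work if the growth rule were too complicated to solve by hand.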

Computational aspects and requirements

The class is going to make heavy use of the R programming language. One of the transferable skills that we hope the class will help you develop is some familiarity with R and with the idea of writing code to explore your data and to turn your theories into models. We'll also use some of the document generating capabilities of R to produce the lecture notes, which will give anyone who's not already familiar with these topics some idea of the potential that exists to use R as a way to keep electronic lab books and notebooks for your research. These are secondary objectives, so we won't be doing much more than sharing the documents and pointing out how they do what they do.

To get the most out of the class you are going to need a laptop computer with R Studio and R installed. Both R and R Studio are free and Open Source. Versions are available for all of the widely used operating systems, including Windows, Mac and Linux. You can use R without R Studio, but R Studio makes R much easier to use, especially if you are not familiar with writing computer code. One of the easiest ways to get both programs is to visit the R Studio web site and follow their download instructions, here.

You'll find once you've completed that process and opened R Studio that you're looking at a program interface that is similar to many others that you already use on your computer, but perhaps divided into a few more panels than you're used to. We'll be spending quite a bit of time during the first week of the class getting familiar with the interface. For obvious reasons this is something that is much better done as a participation exercise in class as opposed to reading notes so we're not going to say too much more about it here.

One of the things you'll see when you open R Studio is that it's possible to open many different types of file. You can see a list of the main file types either by clicking on the "File" menu item and then pointing the cursor at "New" or by clicking the "plus" button directly under the "File" menu item. The most commonly used types of file are closest to the top of the list and the first three are "R Script", "R Notebook", and "R Markdown". In the vast majority of cases where you use R Studio for statistical analysis, quantitative modeling, or document generation, it will be one of these three types of file that you will need.

R Script

An R Script file is the simplest type of R program file. Script files can be written in the built-in editor in R Studio, but you can also write them in any text editing software (such as Notepad on Windows), word processor, or note-taking software, as long as you save the file as a plain text file and you don't include any formatting such as bold or italics. Before the invention of Markdown and its inclusion in R Studio, Script files were the only way to make a program in R that could process multiple lines of code at the same time. They are still a really useful format for working on quantitative problems and offer a stripped back environment that means you focus on the problem and don't worry about presentation.

One really important habit to get into if you find you like working with script files is to include lots of comments. This is really important so that other people can understand your work, but also so that future you can go back and understand what your code is doing. We can tell you from personal experience that future you will not remember how your code works or what it's supposed to do if you don't put in lots of comments. Comments can either take up whole lines or they can be added to the ends of lines. You can also have whole paragraphs (or pages) of comments as long as each line of comment starts with the comment marker, which is a single hash symbol (#). The same marker is used to add a comment to the end of a line. If a statement runs over more than one line, you can add a comment at the end of the first line and the R interpreter will still ignore it even though the code continues onto the next line. We've never seen R code that has too many comments, but many many examples of code that doesn't have enough (including nearly all of our own).
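The different ways of commenting described above look like this in practice. The variable names and numbers are made up for the example.

```r
# A whole-line comment: R ignores everything after the hash symbol.

# Comments can run over several lines (or pages),
# as long as each line starts with its own hash symbol.

lesion_counts <- c(3, 7, 12, 20)   # a comment at the end of a line of code

# A statement can continue over more than one line. A comment at the end
# of the first line is still ignored, and the code carries on below it.
mean_count <- sum(lesion_counts) /   # total number of lesions...
  length(lesion_counts)              # ...divided by the number of leaves

mean_count
```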

R Markdown document

The obvious question here, for those who don't already know, is "What's Markdown?". In simple terms, it's a set of notations that you can add to plain text files to indicate how the text should be formatted when it's prepared for publication. The name comes from the history of publishing, back in the days when printed pages were typeset mechanically based on typed text. If the final printed version needed to have formatting added to it, the copy editor would go over the typed version and "mark it up" in pencil to indicate formatting. Over time a shorthand notation for indicating what formatting was needed evolved by common agreement into a kind of "markup language". When the World Wide Web was invented, the tags that were developed to make web browsers render formatted text adopted the name, and the standard language for writing formatted web pages was called HyperText Markup Language (HTML). HTML is effective but it can be cumbersome, and in the early 2000s Aaron Swartz and John Gruber collaborated to produce a lightweight markup language that could be used in any text editor to do standard text formatting tasks (with a somewhat techy bias about what counts as standard) using a minimal set of symbols; hence they decided to call it Markdown to convey the idea that it was supposed to be stripped back.

Markdown quickly gained popularity in programming and technology-focused disciplines where people spend a lot of time writing text, but often don't have much use for a traditional word processor. There are a number of different "flavors" of Markdown in use, but a couple of the more widely used ones have become more or less standard. The multi-user version control site GitHub developed its own particular flavor, and because of the popularity of GitHub that flavor has become almost standard. Another early flavor was used by the document conversion software Pandoc. R Studio incorporates Pandoc, so R Markdown documents are written in Pandoc's flavor, which shares its everyday formatting syntax with the GitHub flavor. For the kinds of formatting used in this class the differences between flavors rarely matter.

In an R Markdown document you can write blocks of text that include Markdown formatting tags, interspersed with blocks of code (called "chunks") that run just like the code in regular R Script files. Depending on the setup on your computer, the whole Markdown document can then be output in any one of several different formats including web pages, pdf files or Word files. Because the basic Markdown file is just a plain text file it is usually very small compared with the final pdf or Word file, and it's also readable on any computer and can be edited with any text editor, so you can even collaborate with someone who doesn't use R (if there are such strange people these days...).
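To make the structure concrete, the sketch below uses ordinary R code to write a very small R Markdown file to a temporary folder. The file name and contents are made up for the example. It shows the three ingredients: a header, some text with Markdown formatting, and one code chunk. The final rendering step is left as a comment because it assumes the rmarkdown package and Pandoc are installed.

```r
fence <- strrep("`", 3)   # three backticks mark the start and end of a chunk

# The text of a minimal R Markdown file, one element per line
rmd_lines <- c(
  "---",
  'title: "A minimal example"',
  "output: html_document",
  "---",
  "",
  "Some text with *italic* and **bold** Markdown formatting.",
  "",
  paste0(fence, "{r}"),   # start of a code chunk
  "x <- c(2, 4, 6)",
  "mean(x)",
  fence                   # end of the code chunk
)

rmd_file <- file.path(tempdir(), "minimal_example.Rmd")
writeLines(rmd_lines, rmd_file)

# The source is a small plain text file, readable in any text editor
file.size(rmd_file)
readLines(rmd_file)

# To produce the formatted output, open the file in R Studio and "knit" it,
# or run the line below (assumes rmarkdown and Pandoc are installed):
# rmarkdown::render(rmd_file)
```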

Markdown is used by lots of blogging and online writing programs (e.g. if you use Slack you've used Markdown). Aside from the advantages of being able to use it within R Studio to write documents, it might be a good option for those who are easily distracted while writing and want a distraction-free writing environment. There are numerous Markdown editors available, either as downloadable applications, or as web-based editors. Many of these will sync with cloud file storage systems such as Google Drive, Box, Dropbox, or GitHub. This document, for example, was written using a cloud-based app called StackEdit (there's a link at the very end of the document). Of course, since R Studio has Markdown built in, you may decide to just use it for all of your writing in future.

One of the main reasons why Markdown has become popular in the science community is that it integrates really effectively with LaTeX, particularly the mathematics rendering aspects of the TeX language. Have you ever read a paper with equations in it, and thought that they just look nicer somehow than what comes out of the equation editor in Word? Do you think that using the equation editor in Word (or the one in Google Docs) is like some form of torture? You're not alone. The chances are that the equations you see in published papers were produced using TeX; the most widely implemented version is LaTeX - pronounced Lay-tech. Now, TeX is itself a whole document preparation and formatting system that does everything Markdown does (and more) and, like Markdown, it works by adding codes (markup) to plain text files. For our purposes the important piece of TeX is its ability to turn strings of text into beautifully rendered equations and mathematics typefaces. To give you an idea of what we're talking about (and also to get some epidemiology in), here are the equations for the basic $SIR$ model of Kermack & McKendrick (1927) rendered by the TeX engine built into Markdown. We alert Markdown that we want to write one or more equations on a new line by typing double dollar signs "$$" (without the quotation marks), then we write the equations in the syntax used by TeX. In their text form, as you would type them in your document, they look like this:

\frac{dS}{dt}=-\frac{\beta IS}{N}, \frac{dI}{dt}=\frac{\beta IS}{N}-\gamma I, \frac{dR}{dt}=\gamma I

Each equation is closed with another set of double dollar signs, and TeX then renders them as

$$\frac{dS}{dt}=-\frac{\beta IS}{N},$$
$$\frac{dI}{dt}=\frac{\beta IS}{N}-\gamma I,$$
$$\frac{dR}{dt}=\gamma I.$$

Here $S$, $I$ and $R$ are the numbers of susceptible, infected and removed individuals, $N$ is the total population size, $\beta$ is the transmission rate and $\gamma$ is the removal rate.

If you want to write a mathematics expression within a line of text you use a single dollar sign to start the TeX maths and end with another single dollar sign, like this general expression for the statistical model for a simple linear function: $y=a_{0}+a_{1}x+\epsilon$. We'll spend some time in class on using TeX for mathematics typesetting, but in the meantime here are a couple of useful webpages with LaTeX syntax for a variety of common maths formatting issues: (a) basic, and (b) advanced.
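Since the $SIR$ equations will come up again in class, here is a rough sketch of how they can be explored numerically in base R. It uses simple Euler steps (a small fixed time step) rather than a proper differential equation solver, and the parameter values are arbitrary assumptions chosen only to produce a clear epidemic curve.

```r
beta    <- 0.5    # transmission rate (assumed value)
gamma   <- 0.1    # removal rate (assumed value)
N       <- 1000   # total population size, held constant
dt      <- 0.1    # length of each Euler time step
n_steps <- 1000   # 1000 steps of 0.1 gives 100 time units

S <- I <- R <- numeric(n_steps + 1)
S[1] <- N - 1     # everyone susceptible except...
I[1] <- 1         # ...one infected individual
R[1] <- 0

for (k in seq_len(n_steps)) {
  new_infections <- beta * I[k] * S[k] / N   # the beta*I*S/N term
  removals       <- gamma * I[k]             # the gamma*I term
  S[k + 1] <- S[k] - dt * new_infections
  I[k + 1] <- I[k] + dt * (new_infections - removals)
  R[k + 1] <- R[k] + dt * removals
}

range(S + I + R)   # the three classes always add up to N
max(I)             # size of the epidemic peak
```

Each line inside the loop corresponds to one of the three equations, which is a useful check that the code and the mathematics say the same thing. Running `plot(I, type = "l")` afterwards shows the rise and fall of the infected class.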

R notebook

An R notebook is a special kind of R Markdown document. One of the main purposes of making notebooks is that they are live documents, so any changes to data or analyses immediately lead to results and output being updated. In a notebook, as opposed to a script or standard Markdown document, graphics are embedded in the same page(s) as the text and the code. This can actually make notebooks somewhat cumbersome to use while you are developing your analyses or exploring your data, but they provide a flexible and powerful format for producing reports, web pages and working papers for sharing with colleagues or your PI.

Written with StackEdit.