
% Template for PLoS
% Version 1.0 January 2009
%
% To compile to pdf, run:
% latex plos.template
% bibtex plos.template
% latex plos.template
% latex plos.template
% dvipdf plos.template

\documentclass[10pt]{article}

% amsmath package, useful for mathematical formulas
\usepackage{amsmath}
% amssymb package, useful for mathematical symbols
\usepackage{amssymb}

% graphicx package, useful for including eps and pdf graphics
% include graphics with the command \includegraphics
\usepackage{graphicx}

% cite package, to clean up citations in the main text. Do not remove.
\usepackage{cite}

\usepackage{color} 

% Use doublespacing - comment out for single spacing
%\usepackage{setspace} 
%\doublespacing


% Text layout
\topmargin 0.0cm
\oddsidemargin 0.5cm
\evensidemargin 0.5cm
\textwidth 16cm 
\textheight 21cm

% Bold the 'Figure #' in the caption and separate it with a period
% Captions will be left justified
\usepackage[labelfont=bf,labelsep=period,justification=raggedright]{caption}

% Use the PLoS provided bibtex style
\bibliographystyle{plos2009}

% Remove brackets from numbering in List of References
\makeatletter
\renewcommand{\@biblabel}[1]{\quad#1.}
\makeatother


% Leave date blank
\date{}

\pagestyle{myheadings}
%% ** EDIT HERE **


%% ** EDIT HERE **
%% PLEASE INCLUDE ALL MACROS BELOW

%% END MACROS SECTION

\begin{document}

% Title must be 150 characters or less
\begin{flushleft}
{\Large
\textbf{openSNP -- Crowdsourcing Genome-Wide Association Studies}
}
% Insert Author names, affiliations and corresponding author email.
\\
Bastian Greshake$^{1,\ast}$, 
Philipp Bayer$^{2}$, 
Fabian Zimmer$^{3}$,
Julia Reda$^{4}$
\\
\textbf{1} Bastian Greshake, Frankfurt am Main, Germany
\\
\textbf{2} Philipp Bayer, somewhere, Australia
\\
\textbf{3} Fabian Zimmer, M\"unster, Germany
\\
\textbf{4} Julia Reda, Mainz, Germany
\\
$\ast$ E-mail: info@opensnp.org
\end{flushleft}

% Please keep the abstract between 250 and 300 words
\section*{Abstract}

% Please keep the Author Summary between 150 and 200 words
% Use first person. PLoS ONE authors please skip this step. 
% Author Summary not valid for PLoS ONE submissions.   
\section*{Author Summary}

\section*{Introduction}
Genome-Wide Association Studies (GWAS) are a relatively easy and inexpensive way to find Single Nucleotide Polymorphisms (SNPs) that may be medically relevant. SNPs found through GWAS can be used to identify candidate genes for closer inspection or to predict disease risks. GWAS use statistical tests to compare the alleles of patients with those of healthy controls. Because of this design, the method cannot reveal causal relationships, only correlations. The first GWAS was published in 2005 and compared patients with age-related macular degeneration to a healthy control group (doi:10.1126/science.1109557). Since then, the number of participants in these studies has been rising, over 1200 GWAS have been performed (doi:10.1186/1471-2350-10-6) and over 5000 SNPs have been linked to different diseases and traits in these studies. %(http://www.genome.gov/page.cfm?pageid=26525384&clearquery=1#result_table).
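As an illustration of such a comparison, one common single-SNP analysis (given here as a sketch; the studies cited may use other tests, for example trend tests on genotypes) counts the two alleles $A$ and $a$ of a biallelic SNP in cases (index 1) and controls (index 0). With allele counts $n_{1A}, n_{1a}, n_{0A}, n_{0a}$, row totals $n_{1\cdot} = n_{1A} + n_{1a}$ and $n_{0\cdot} = n_{0A} + n_{0a}$, column totals $n_{\cdot A} = n_{1A} + n_{0A}$ and $n_{\cdot a} = n_{1a} + n_{0a}$, and $N$ the total number of alleles, the allelic odds ratio and the Pearson $\chi^2$ statistic with one degree of freedom are
\begin{align*}
\mathrm{OR} &= \frac{n_{1A}\, n_{0a}}{n_{1a}\, n_{0A}}, &
\chi^2 &= \frac{N \left( n_{1A}\, n_{0a} - n_{1a}\, n_{0A} \right)^2}{n_{1\cdot}\, n_{0\cdot}\, n_{\cdot A}\, n_{\cdot a}}.
\end{align*}
A large $\chi^2$ indicates that the allele frequencies differ between cases and controls, which is an association and not evidence of causation.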

Since 2006, companies such as 23andMe, deCODEme and FamilyTreeDNA have offered Direct-To-Consumer (DTC) genetic testing. These companies use DNA microarrays to genotype around 1 million SNPs spread across the human genome. In return, customers receive an analysis of their results as well as a raw data file that lists the SNP identifiers together with the customer's genotype at each SNP. In 2011, 23andMe alone had over 100,000 customers (http://spittoon.23andme.com/2011/06/15/23andme-2011-state-of-the-database-address/), and the company recognized the potential of this amount of data for GWAS. It offers surveys to its customers that ask about traits and genetic diseases; with the customers' consent, these data are then used for association studies. 23andMe published 3(?) articles in 2011 in which they replicated known findings and also found new associations for Parkinson's disease. By engaging their customer base, they enrolled over 30,000 individuals in these association studies.

Although companies like 23andMe are willing to contribute to science, it is not easy for individual scientists to obtain their data, mainly because of the customers' privacy concerns. Nevertheless, some individual customers willingly share their data, most of them by uploading it to a personal website or to software repositories such as GitHub. While this makes the data usable, keeping track of all newly published genotyping data requires a lot of work. Projects like SNPedia (doi:10.1093/nar/gkr798) try to keep track of these files, but this still does not allow GWAS to be performed, as the phenotypic information is not attached to the genetic information. Projects that do attach the phenotype to the genetic information, like the Personal Genome Project, still do not allow easy re-use of the data.

A possible solution is a community-driven platform that aggregates the genetic and phenotypic information of people who have given their informed consent to share their data with the general public. In this study we investigated whether there is interest in such a crowdsourcing platform and how many people would be willing to share their genetic and phenotypic information with the public, and we built such a platform.

% Results and Discussion can be combined.
\section*{Results}

\subsection*{Survey on Sharing Genetic Information}
229 people participated in the survey, 180 with a self-reported chromosomal sex of XY and 56 with a self-reported chromosomal sex of XX. The mean age of the participants is 33 years (SD = 11.29) and 81.7 \% reported their ethnicity as Caucasian. 39.7 \% of the participants are already customers of at least one DTC genetic testing company, a further 30.1 \% plan to become customers in the future and 29.7 \% do not plan to become DTC customers. There is no significant difference in the use of DTC companies between the chromosomal sexes (Somers' d).

67.7 \% of all participants would share their data with their DTC company without any constraints and 25.8 \% would do so if the company does not share the data with third parties; 6.6 \% of the participants would not share their data. There is no significant difference in sharing habits between the chromosomal sexes (Somers' d). Those who are customers of a DTC company or are planning to become one are more likely to share their results than those who do not plan to get genotyped (Somers' d).
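For reference, Somers' $d$ is an asymmetric measure of association between ordinal variables. For a dependent variable $Y$ and an independent variable $X$, it considers all pairs of participants that differ in $X$:
\begin{equation*}
d_{YX} = \frac{n_c - n_d}{n_c + n_d + T_Y},
\end{equation*}
where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs and $T_Y$ is the number of pairs tied on $Y$ but not on $X$; pairs tied on $X$ are excluded. $d_{YX}$ ranges from $-1$ to $1$, with $0$ indicating no ordinal association. In the last comparison above, for example, $Y$ would be the willingness to share and $X$ the DTC status (an assumption about how the analysis was set up).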

There are significant differences between people who are already genotyped and those who do not plan to get genotyped (Tukey HSD post-hoc tests, which were used for all comparisons reported in this and the following paragraphs). The first group agrees more strongly with sharing their information because they want to help scientists (mean difference = 0.465, SE = 0.128, p = 0.001), because of possible personal benefits (mean difference = 0.448, SE = 0.183, p = 0.04) and out of curiosity (mean difference = 1.159, SE = 0.193, p < 0.001).

Conversely, people who are not planning to get genotyped agree more strongly with not sharing their data because they fear discrimination (mean difference = 1.060, SE = 0.195, p < 0.001), because they feel it is a breach of their privacy (mean difference = 0.821, SE = 0.225, p = 0.001), because they fear negative consequences for their family (mean difference = 0.733, SE = 0.21, p = 0.002) or because they fear personalized advertising (mean difference = 0.848, SE = 0.208, p < 0.001).

Similarly, people who would share their data with their DTC provider agree more strongly with sharing the data because they want to help scientists (mean difference = 1.57, SE = 0.199, p < 0.001), because of possible personal benefits (mean difference = 0.951, SE = 0.308, p = 0.006) and out of curiosity (mean difference = 1.99, SE = 0.321, p < 0.001).

Participants who are not planning to get genotyped agree more strongly with not sharing their data because they fear discrimination (mean difference = 1.52, SE = 0.322, p < 0.001), because they feel it is a breach of their privacy (mean difference = 1.871, SE = 0.324, p < 0.001), because they fear consequences for their family (mean difference = 1.146, SE = 0.32, p = 0.001) and because they fear personalized advertising (mean difference = 1.112, SE = 0.357, p = 0.006).
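Each comparison above reports a mean difference on the 1 to 5 agreement scale together with its standard error. As a sketch of the underlying test, assuming the Tukey--Kramer form of the HSD test (which allows unequal group sizes) and a standard error computed from the one-way ANOVA, let groups $i$ and $j$ have mean ratings $\bar{y}_i, \bar{y}_j$ and sizes $n_i, n_j$, let $\mathit{MS}_W$ be the within-group mean square, $k$ the number of groups and $N$ the number of participants. Then
\begin{align*}
\mathit{SE}_{ij} &= \sqrt{\mathit{MS}_W \left( \frac{1}{n_i} + \frac{1}{n_j} \right)}, &
q_{ij} &= \sqrt{2}\, \frac{\left| \bar{y}_i - \bar{y}_j \right|}{\mathit{SE}_{ij}},
\end{align*}
and the difference is significant at level $\alpha$ if $q_{ij}$ exceeds the upper $\alpha$ quantile $q_{\alpha;\,k,\,N-k}$ of the studentized range distribution. The factor $\sqrt{2}$ appears because the studentized range is scaled by the standard error of a single mean rather than of a difference of two means.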
\subsection*{openSNP platform}


\section*{Discussion}

% You may title this section "Methods" or "Models". 
% "Models" is not a valid title for PLoS ONE authors. However, PLoS ONE
% authors may use "Analysis" 
\section*{Materials and Methods}
\subsection*{Survey on Sharing Genetic Information}
The survey was conducted with Google Docs and included questions on the age, chromosomal sex and ethnicity of the participants. It also asked whether participants were already customers of a DTC company, were planning to become one or did not plan to become one. Participants who were already customers were additionally asked whether they already shared their genetic and phenotypic data. All participants were asked whether they would share their genetic or phenotypic information with their DTC company; possible answers were ``Yes'', ``Yes, but only if they did not share my medical information with anybody else'' and ``No''.

The survey also included scale questions measuring how strongly participants agreed or disagreed with different reasons to share or not to share their information with third parties. The scale ranged from 1 (strongly disagree) through 3 (neutral) to 5 (strongly agree). The motivations given for sharing data were ``because you want to help scientists with their research'', ``because of possible personal benefits (e.g.\ getting treatments for a disease you have, possibility of new medication, etc.)'', ``because it may deliver advertising that is relevant to me'' and ``out of curiosity''. The motivations given for not sharing data were ``because advertisers could use the information for targeted campaigns'', ``because of possible negative consequences for closely related persons'', ``because of the breach of your privacy'' and ``because of the fear of discrimination (e.g.\ by the employer, the state, some insurance company)''. Participants could also give their own reasons to share or not to share their data.

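The results were analyzed with SPSS 19, using Somers' d for associations between the categorical survey answers and Tukey HSD post-hoc tests for differences in the scale ratings between groups.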

\subsection*{Technical realization of the platform}
The main platform is implemented with the web framework Ruby on Rails, with PostgreSQL as the main database backend. The database stores genotyping results, the users' phenotypic information, literature results from Mendeley and the Public Library of Science, and SNP summaries from SNPedia. Mendeley's literature database is queried through its REST API, which returns results in JSON; the literature database of the Public Library of Science is queried through its own REST API, which returns results in XML. SNP summaries are retrieved from SNPedia through the MediaWiki API.

SNPs are catalogued by their unique identifier, which consists of a prefix (mostly \textit{rs}) followed by a unique number. This format is used by the NCBI dbSNP database, is widely used in the literature and is easily parsed from different literature sources.
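Because an identifier of this form is simply the prefix followed by digits, candidate SNPs can be extracted from free text with a regular expression. The Python sketch below only illustrates this idea; it is not part of openSNP (which is written in Ruby), and the name \texttt{extract\_rs\_ids} is hypothetical. It assumes that only \textit{rs}-prefixed identifiers are wanted, so other prefixes (such as the \textit{i}-prefixed internal identifiers found in some 23andMe raw files) are ignored.

\begin{verbatim}
import re

# Hypothetical helper for illustration (not openSNP code).
# An rs identifier is the prefix "rs" followed by digits; the word
# boundaries stop matches inside longer tokens such as "hrs123".
RS_ID = re.compile(r"\brs(\d+)\b", re.IGNORECASE)


def extract_rs_ids(text):
    """Return unique rs identifiers in order of first appearance."""
    # Normalise the prefix to lower case; dict.fromkeys keeps
    # first-seen order while dropping duplicates.
    ids = ("rs" + m.group(1) for m in RS_ID.finditer(text))
    return list(dict.fromkeys(ids))


if __name__ == "__main__":
    text = "Variants rs53576 and RS1815739; rs53576 replicated."
    print(extract_rs_ids(text))  # ['rs53576', 'rs1815739']
\end{verbatim}

Matching is case-insensitive and the results are normalized to the lower-case \textit{rs} prefix, so \texttt{RS1815739} and \texttt{rs1815739} count as the same SNP.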

Processes with longer runtimes, such as parsing the genotyping results, creating archives of results that are mailed to users and querying external resources, are handled by the Ruby gem Resque, backed by a Redis server. Search features on the platform itself are implemented with Solr and the Ruby gem Sunspot. Additionally, data can be requested from openSNP through the Distributed Annotation System (DAS). The data for this are stored in a MySQL database and delivered by ProServer, which was modified by Gel et al. for use in easyDAS.
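The logic of the parsing step can be illustrated with a short Python sketch. It is not openSNP's Ruby/Resque implementation; it assumes the tab-separated layout of 23andMe raw data files (comment lines starting with \#, then the columns rsid, chromosome, position and genotype, with \texttt{-{}-} marking a no-call), and the function name \texttt{parse\_raw\_genotypes} is hypothetical. Other providers use different layouts and would need their own parsers.

\begin{verbatim}
import io


def parse_raw_genotypes(lines):
    """Parse 23andMe-style raw data lines (hypothetical helper).

    Assumes '#'-prefixed comment lines and tab-separated columns
    rsid, chromosome, position, genotype. Returns a dict mapping each
    SNP identifier to (chromosome, position, genotype); no-calls
    ('--') and malformed rows are skipped.
    """
    genotypes = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank line or header/comment line
        fields = line.split("\t")
        if len(fields) != 4 or not fields[2].isdigit():
            continue  # malformed row: skip it instead of failing the file
        snp_id, chromosome, position, genotype = fields
        if genotype == "--":
            continue  # no call for this SNP
        genotypes[snp_id] = (chromosome, int(position), genotype)
    return genotypes


if __name__ == "__main__":
    # Made-up example rows in the assumed layout.
    sample = io.StringIO(
        "# rsid\tchromosome\tposition\tgenotype\n"
        "rs4477212\t1\t82154\tAA\n"
        "rs3094315\t1\t752566\tAG\n"
        "i3000001\t1\t799463\t--\n"
    )
    print(parse_raw_genotypes(sample))
\end{verbatim}

Skipping malformed rows rather than aborting keeps a single bad line from discarding an entire upload, which matters when parsing runs unattended as a background job.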

The source code of openSNP is published under Creative Commons BY-SA 3.0 and can be downloaded at http://github.com/gedankenstuecke/snpr. The genetic and phenotypic data are licensed under Creative Commons Zero.


% Do NOT remove this, even if you are not including acknowledgments
\section*{Acknowledgments}


%\section*{References}
% The bibtex filename
\bibliography{template}

\section*{Figure Legends}
%\begin{figure}[!ht]
%\begin{center}
%%\includegraphics[width=4in]{figure_name.2.eps}
%\end{center}
%\caption{
%{\bf Bold the first sentence.}  Rest of figure 2  caption.  Caption 
%should be left justified, as specified by the options to the caption 
%package.
%}
%\label{Figure_label}
%\end{figure}


\section*{Tables}
%\begin{table}[!ht]
%\caption{
%\bf{Table title}}
%\begin{tabular}{|c|c|c|}
%table information
%\end{tabular}
%\begin{flushleft}Table caption
%\end{flushleft}
%\label{tab:label}
% \end{table}

\end{document}