forked from jtextor/dagitty
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmanual-2.x.tex
1014 lines (838 loc) · 43.7 KB
/
manual-2.x.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
\documentclass[a4paper,10pt]{article}
\usepackage{ucs}
\usepackage[utf8x]{inputenc}
\newcommand{\pname}{{\sc DAG}itty\xspace}
\newcommand{\action}[1]{\emph{``#1''}}
%opening
\title{Drawing and Analyzing Causal DAGs with \pname}
\author{Johannes Textor}
\date{August 19th, 2015}
\usepackage{atbegshi}
\usepackage{tikz}
\usepackage{a4wide}
\usepackage{xspace}
\usepackage{hyperref}
\usepackage{pxfonts}
\usepackage[nottoc,numbib]{tocbibind}
% \usepackage{natbib}
\newcommand{\typesetbiblink}[1]{#1}
\bibliographystyle{plain}
\begin{document}
\maketitle
\begin{abstract}
\pname is a software for drawing and analyzing causal
diagrams, also known as directed acyclic graphs (DAGs).
Functions include identification of minimal sufficient
adjustment sets for estimating causal effects,
diagnosis of insufficient or invalid adjustment via the identification
of biasing paths, identification of instrumental variables,
and derivation of testable implications.
\pname is provided in the hope that it is useful for researchers
and students in Epidemiology, Sociology, Psychology, and
other empirical disciplines. The software should run in any modern
web browser that supports JavaScript, HTML, and SVG.
This is the user manual for \pname version 2.3. The manual is updated with every
release of a new stable version. \pname is available at
\href{http://dagitty.net}{\tt dagitty.net}.
\end{abstract}
\tableofcontents
\section{Introduction}
\pname is a web-based software for analyzing causal diagrams. It contains some of the
fastest algorithms available for this purpose.
This manual describes how causal diagrams can be created
(Section~\ref{sec:diagramcreation})
and manipulated (Section~\ref{sec:diagramediting}) using \pname. In
Section~\ref{sec:diagramanalysis}, \pname's capabilities to analyze causal diagrams are
described. A brief introduction to causal diagrams is given
in Section~\ref{sec:dagintro}.
\subsection{Citing \pname}
Developing and maintaining \pname requires a substantial
amount of work; thus, if you publish research results obtained
with the help of \pname, please
consider giving us credit by citing our work.
Depending on the context, you could cite the
letter in {\em Epidemiology} where \pname has first been announced \cite{Textor2011},
or the research papers describing the specific algorithms used
to identify biasing paths \cite{TextorL2011},
adjustment sets \cite{textor14_uai},
and instrumental variables \cite{zander15_instrument}.
\subsection{Running \pname online}
There are two ways to run \pname: either from the internet or from your
own computer. To run \pname online, visit the URL
\href{http://www.dagitty.net}{\tt dagitty.net}.
\pname should run in every modern browser. Specifically, I expect it
to work well on recent versions of Firefox, Chrome, Opera, and Safari
as well as on Internet Explorer (IE) version 9.0 or later, which all
support scalable vector graphics (SVG). IE
versions prior to 9.0 do not support SVG. These should be able to perform
all diagnosis functions but cannot display the graphics as well as modern
browsers can\footnote{While this would be redeemable, I'd much rather invest my
time in improving \pname for modern browsers than fixing it for old IE versions.
If you absolutely need to run \pname on older IEs and encounter severe problems,
please contact me.}. If you encounter any problems, please send
me an e-mail so I can fix them (my contact information is at the end of this manual).
Keep in mind that \pname is often used by hundreds of people per day
from all over the world -- these people all benefit if the problem you found is
fixed so please do consider investing
the time to notify me if you encounter any bugs.
\subsection{Installing \pname on your own computer}
\pname can be ``installed'' on your computer for use without an internet
connection. To do this, download the file
\href{http://www.dagitty.net/dagitty.zip}{\tt dagitty.net/dagitty.zip}
which is a ZIP archive containing \pname's source. Unpack this ZIP file anywhere
in your file system. To run \pname, just open the file \verb|dags.html| in the
unpacked folder.
Some features of \pname will not work in the offline version, because they
are actually implemented on the web server. Currently, these features are:
\begin{itemize}
\item Exporting model drawings as PDF, JPEG or PNG files.
\item Publishing models on-line.
\end{itemize}
\subsection{Migrating from earlier versions of \pname}
The following two issues are important for users of older \pname versions. New users
can skip this section.
\begin{itemize}
\item It is now possible to have more than one exposure and/or outcome variable.
This means that the old model code convention where the variable in the first line is the
exposure and the variable in the second line is the outcome no longer works. Hence,
if you open a model created with an earlier version in \pname 2.0,
exposure and outcome will appear like normal variables. To fix this,
simply set exposure and outcome again
using the \action{e} and \action{o} keys and save the new model code.
\item Spaces in variables are now finally reliably supported. The way this works
is that any variable name containing spaces or other special symbols is stored
using ``URL encoding'' -- e.g. ``patient sex'' will turn into ``patient\%20sex''
(of course \pname will do this automatically for you).
This may look strange but ensures that \pname models can be safely e-mailed, posted
on websites, stored in word documents and so forth without having to worry about line
breaks messing up variable names.
If you have an older \pname model containing spaces in variable names,
\pname 2.0 or higher should open this model correctly and perform the conversion itself. If
it does not, consider sending me your model so I can investigate.
\end{itemize}
\section{A brief introduction to causal diagrams}
\label{sec:dagintro}
In this section, we will briefly review what causal diagrams are and how they can be
applied in empirical sciences. For a more detailed account, we recommend
the book \emph{Causality} by Judea Pearl \cite{Pearl2009}, or the
chapter \emph{Causal Diagrams} in the Epidemiology
textbook of Rothman, Greenland, and Lash \cite{RothmanGL2008}.
Also take a look at the web page
\href{http://www.dagitty.net/learn/}{\tt dagitty.net/learn/},
where I am collecting several tutorials (some of them interactive) on specific DAG-related
topics.
In Epidemiology, causal diagrams are also frequently
called \emph{DAGs}.\footnote{The term ``DAG'' is
somewhat confusing to computer scientists and mathematicians,
for whom a DAG is simply an abstract mathematical structure without specific semantics
attached to it.} In a nutshell, a DAG is a graphic model that depicts a set of hypotheses
about the causal process that generates a set of variables of interest.
An arrow $X \to Y$ is drawn if there is a direct
causal effect of $X$ on $Y$. Intuitively, this means that
the natural process determining $Y$ is directly influenced by
the status of $X$, and that altering $X$ via external intervention
would also alter $Y$. However, an arrow $X \to Y$ only represents that part
of the causal effect which is \emph{not} mediated by any of the other variables
in the diagram. If one is certain that $X$ does not
have a direct causal influence on $Y$, then the arrow is omitted.
This has two important implications: (1) arrows should follow time order,
or else the diagram contradicts the basic principle that
causes must precede their effects; (2) the omission of an arrow is a stronger
claim than the inclusion of an arrow -- the presence of an arrow depicts
merely the ``causal null hypothesis'' that $X$ \emph{might} have an effect
on $Y$.
Mathematically, the semantics of an arrow $X \to Y$ can be defined as
follows. Given a DAG $G$ and a variable $Y$ in $G$, let $X_1,\ldots,X_n$
be all variables in $G$ that have direct arrows $X_i \to Y$ (also called the
\emph{parents} of $Y$). Then $G$
claims that the causal process determining the value of $Y$ can be
modelled as a mathematical function $Y := f(X_1,\ldots,X_n,\epsilon_Y)$,
where $\epsilon_Y$ (the ``causal residual'') is a random variable that
is jointly independent of all $X_i$.
For example, the sentence ``smoking causes lung cancer'' could be translated
into the following simple causal diagram:
\begin{center}
\begin{tikzpicture}
\node (sm) {smoking};
\node (c) [below of=sm] {lung cancer};
\draw [->] (sm) -- (c);
\end{tikzpicture}
\end{center}
We would interpret this diagram as
follows: (1) The variable ``smoking'' refers to a person's smoking habit
prior to a later assessment of cancer in that same person;
(2) the natural process by which a person develops
cancer might be influenced by the smoking habits of that person; (3) there
exist no other variables that have a direct influence on both smoking
habits and cancer. A slightly
more complex version of this diagram might look as follows:
\begin{center}
\begin{tikzpicture}
\node (sm) {smoking};
\node [below of=sm] (t) {tar deposit in lungs};
\node (c) [below of=t] {lung cancer};
\draw [->] (sm) -- (t);
\draw [->] (t) -- (c);
\end{tikzpicture}
\end{center}
This diagram is about a person's smoking habits at a time $t_1$,
the tar deposit in her lungs at a later time $t_2$, and finally the
development of lung cancer at an even later time $t_3$. We claim that
(1) the natural process which determines the amount of tar in the
lungs is affected by smoking;
(2) the natural process by which lung cancer develops is affected by
the amount of tar in the lung; (3) the natural
process by which lung cancer develops is not affected by the person's
smoking other than indirectly via the tar deposit;
and finally (4) no variables having
relevant direct influence on more than one variable
of the diagram were omitted.
In an epidemiological context, we are often interested in the putative
effect of a set of variables, called \emph{exposures}, on another set of variables
called \emph{outcomes}. A key question in Epidemiology (and many other empirical sciences)
is: how can we infer the causal effect of an exposure on an outcome of interest from
an observational study? Typically, a simple regression will not suffice due to the
presence of \emph{confounding factors}.
If the assumptions encoded in a given the diagram hold,
then we can infer from this diagram sets of variables
for which to adjust in an observational study to
minimize such confounding bias. For example,
consider the following diagram:
% \begin{figure}[h!]
\begin{center}
\begin{tikzpicture}
\node (sm) at (0,0) {smoking};
\node [anchor=east] (m) at (-1,-1) {carry matches};
\node [anchor=west] (c) at (1,-1) {cancer};
\draw [->,very thick] (sm) -- (c);
\draw [->,very thick] (sm) -- (m);
\draw [->] (m) -- node[midway,above] {?} (c);
\end{tikzpicture}
\end{center}
% \caption{A classical confounding triangle.}
% \label{fig:dag2}
% \end{figure}
If we were to perform an association
study on the relationship between carrying matches
in one's pocket and developing lung cancer, we would probably find a
correlation between these two variables. However, as the above diagram
indicates, this correlation would not imply that carrying matches in
your pocket causes lung cancer: Smokers are more likely to carry matches
in their pockets, and also more likely to develop lung cancer. This is an
example of a \emph{confounded} association between two variables,
which is mediated via the \emph{biasing path} (bold).
In this example, let us assume with a leap of faith that the simplistic diagram
above is accurate. Under this assumption, would we adjust for smoking,
e.g. by averaging separate effect estimates for
smokers and non-smokers, we would
no longer find a correlation between carrying matches and lung cancer.
In other words, adjustment for smoking would \emph{close the biasing path}.
Adjustment sets will be explained in more detail in
Section~\ref{sec:adjustment}.
The purpose of \pname is to aid study design through the identification of
suitable, small sufficient adjustment sets in complex causal diagrams and,
more generally, through the identification of causal and biasing paths as
well as testable implications in a given diagram.
\section{Loading, saving and sharing diagrams}
\label{sec:diagramcreation}
This section covers the three basic steps of working with \pname:
(1) loading a diagram; (2) manipulating the graphical layout of the diagram; and
(3) saving the diagram.
% The first step for working with \pname is to create a causal network.
First of all, any causal diagram consists of vertices (variables) and arrows
(direct causal effects).
You can either create the diagram directly using \pname's graphical user
interface (explained in the next section),
or prepare a textual diagram description in a word processor
and then import this
description into \pname. In addition, \pname
contains some pre-defined examples that you can use to become familiar
with the program and with DAGs in general.
To do so, just select one of the pre-defined examples from
the \action{Examples} menu.
\subsection{\pname's textual syntax for causal diagrams}
\pname's textual syntax for causal diagrams is based on the one used
by the DAG program by Sven Kn\"{u}ppel \cite{KnueppelS2010}. A diagram description
(\emph{model code}, somewhat clumsily
called \emph{model text data} in older \pname versions)
consists of two parts:
\begin{enumerate}
\item A list of the variables in the diagram
\item A list of arrows between the variables
\end{enumerate}
The list of variables consists of one variable per line.
After each variable name follows
a character that indicates the status of the variable, which can
be one of ``1'' (normal variable), ``A'' (adjusted for), ``U'' (latent/unobserved),
``E'' (exposure), or ``O'' (outcome). If you prepare your diagram description
in a word processor rather than constructing the diagram in \pname itself,
you may encounter problems when you use spaces or other special symbols
in variable names (e.g. instead of ``patient sex'' you should write
``patient\_sex''). This restriction does not apply when you construct the
diagram using \pname's graphical user interface.
The list of arrows consists of several lines each starting
with a start variable name, followed by one or more other target variables
that the start variable is connected to. Figure~\ref{fig:syntaxexample}
contains a worked example of a textual model description. When you modify
a diagram within \pname, the vertex labels will be augmented by additional information
to help \pname remember the layout of the vertices and for other purposes (see
rightmost column in Figure~\ref{fig:syntaxexample}).
\begin{figure}
\centering
\begin{minipage}[t]{3cm}
{\bf (a)} \emph{vertex labels}\\
E E \\
D O \\
A 1 \\
B 1 \\
Z 1
\end{minipage}
\begin{minipage}[t]{3cm}
{\bf (b)} \emph{adjacency list}\\
E D \\
A E Z \\
B D Z \\
Z E D
\end{minipage}
\begin{minipage}[t]{3cm}
{\bf (c)} \emph{resulting graph}\\
\begin{tikzpicture}
\node (e) at (-1,0) {E};
\node (d) at (1,0) {D};
\node (z) at (0,1) {Z};
\node (a) at (-1,2) {A};
\node (b) at (1,2) {B};
\draw [->] (a) -- (z); \draw [->] (b) -- (z);
\draw [->] (a) -- (e); \draw [->] (b) -- (d);
\draw [->] (z) -- (e); \draw [->] (z) -- (d);
\draw [->] (e) -- (d);
\end{tikzpicture}
\end{minipage}
\hskip .5cm
\begin{minipage}[t]{4cm}
{\bf (d)} \emph{augmented vertex labels}\\
E E @-2.2,1.6 \\
D O @1.4,1.6 \\
A 1 @-2.2,-1.5 \\
B 1 @1.4,-1.5 \\
Z 1 @-0.3,-0.1
\end{minipage}
\caption{Example for a textual model definition with \pname (a,b: model code;
c: resulting diagram). When the diagram is edited within
\pname, the vertex labels and adjustment status are augmented with
additional information that \pname uses to layout
the vertices on the canvas (d): the
layout coordinates of each variable are indicated behind the @ sign.
}
\label{fig:syntaxexample}
\end{figure}
\subsection{Loading a textually defined diagram into \pname}
To load a textually defined diagram into \pname, simply copy\&paste the
variable list, followed by a blank line,
followed by the list of arrows into the \action{Model code} text
box. Then click on \action{Update DAG}.
\pname will now generate a preliminary graphical layout for your diagram on
the canvas, which may not yet look the way you intended, but can be freely
modified.
\subsection{Modifying the graphical layout of a diagram}
To layout the vertices and arrows of your diagram more clearly than \pname
did, simply drag the vertices with your mouse on the canvas. You may notice
that \pname modifies the information in the \action{Model code} field
on the fly, and augments it with additional position information for each
vertex. In general, all changes you make to your diagram within \pname
are immediately reflected in the model code.
\subsection{Saving the diagram}
To save your diagram locally, just copy\&paste the contents of the \action{Model code}
field to a text file,
and save that file locally to your computer\footnote{This is most easily done
by clicking in the text field, pressing \action{CTRL + A} to select the entire content
of the text field, then pressing \action{CTRL + C} to copy the content. You
can then paste the content in another program using \action{CTRL + V}.}.
When you wish to continue working on the
diagram, copy the model code back into \pname as explained above.
\subsection{Exporting the diagram}
\pname can export the diagram as a PDF or SVG vector graphic (publication quality)
or a JPEG or PNG bitmap graphic (e.g. for inclusion in Powerpoint). Select the
corresponding function from the \action{Model} menu. If you want to edit the graphical
layout of the diagram or annotate it, it is recommended to export the diagram as an
SVG file and open that in a vector graphics program such as Inkscape.
\subsection{Publishing diagrams online}
Part of the appeal of using DAGs is that the assumptions underlying
one's research are made explicit, and the conclusions drawn from
the data can be later re-checked if some of the assumptions are found
to not hold. Of course, this requires to make the DAG available together
with the data and interpretation. I have however seen many articles
where people report having used DAGs but do not actually show them.
If researchers, reviewers or editors deem it inappropriate to include the
DAG (or its model code) in the manuscript itself, here's another option:
Store the DAG on the \pname website and get a short URL under which
this DAG will be accessible. Then include this URL in the manuscript,
or its supporting information. For example, one of the \pname examples
is stored at the URL \href{http://dagitty.net/mvcFQ}{\tt dagitty.net/mvcFQ}.
Here's how it works: Draw your DAG to full satisfaction, then choose
\action{Publish on dagitty.net} from the \action{Model} menu. You have
two options how to publish your DAG: anonymously, or linking it to an e-mail
address. If you store the DAG anonymously, you will later on not be able
to edit it or delete it from the server.
After choosing \action{Publish on dagitty.net} from the \action{Model} menu,
a small form will appear where you can enter some metadata on the DAG,
and provide your e-mail address if you so wish. Upon clicking \action{Publish},
the DAG will be sent to the dagitty.net server, and you will receive a URL under
which the DAG is now available. If you provided your e-mail address, you
will also receive a message requesting you to confirm your ownership of the DAG.
This is simply done by clicking on a confirmation link. Only then will the DAG
be linked to your e-mail address, and you will receive a password to use when
deleting or modifying the published DAG.
If you did link your DAG to your e-mail address, you can delete it by choosing
\action{Delete on dagitty.net} from the \action{Model} menu, which will prompt you
to enter the DAG's URL and the password. If the URL and password match,
the DAG will be deleted.
Similarly, you can update a stored DAG using the \action{Load from dagitty.net} function
from the \action{Model} menu, modifying it, and saving it again.
You can view published DAGs (if you know their URL)
by just putting the URL into your address bar of course, but you can also do
so using the \action{Load from dagitty.net} function.
Please note that all DAGs stored on dagitty.net are meant to be public information.
Do not store any data that you consider private or in any way secret.
Once stored on dagitty.net, every person in the world who knows your DAG's URL
can view it (but not your e-mail address if you provided one). Also note that there is
no guarantee that dagitty.net will keep running forever. Storing your DAGs is done at
your own risk. Still, you may find this feature useful, for instance to e-mail your DAGs
to colleagues or to include links to DAGs in papers under review. For archival
purposes, it may be more appropriate to include the DAG or the model code in
the paper itself or its supporting information.
\section{Editing diagrams using the graphical user interface}
\label{sec:diagramediting}
You are free to make changes directly to the textual description of your
diagram, which will be reflected on the canvas next time you click on \action{Update DAG}.
However, you can also create, modify, and delete vertices and arrows graphically
using the mouse.
\subsection{Creating a new diagram}
To create a new diagram, select \action{New Model} from the \action{Model} menu.
You will be asked
for the names of the exposure and the outcome variable, and an initial model containing
just those variables and an arrow between them will be drawn. Then you can
add variables and arrows to the model as explained below.
\subsection{Adding new variables}
To add a new variable to the model, double-click on a free space in the canvas
(i.e., not on an existing variable) or press the \action{n} key. A dialog will
pop up asking you for the name of the new variable. Enter the name into the dialog
and press the enter key or click \action{OK}. If you click \action{Cancel}, no new
variable will be created.
\subsection{Renaming variables}
To rename an existing variable, move the
mouse pointer over that variable and hit the \action{r} key. A dialog
will pop up allowing you to change the variable name.
\subsection{Setting the status of a variable}
Variables can have one of the following
statuses:
\begin{itemize}
\item Exposure
\item Outcome
\item Unobserved (latent)
\item Adjusted
\item Other
\end{itemize}
To turn a variable into an exposure, move the
mouse pointer over that variable and hit the \action{e} key;
for an outcome, hit the \action{o} key instead.
To toggle whether a variable is observed or unobserved,
hit the \action{u} key; to toggle whether it is
adjusted, hit the \action{a} key.
Changing the status of variables may change the
colors of the diagram vertices to reflect the new structure
and information flow in the diagram (see below).
At present, these statuses are mutually exclusive -- e.g.,
a variable cannot be both unobserved and adjusted
or both exposure and unobserved. This could change
in future versions of \pname.
\subsection{Adding new arrows}
To add a new arrow, double-click first on the source vertex
(which will become highlighted) and then on the target vertex.
The arrow will be inserted. If an arrow existed before
in the opposite direction, that arrow will be deleted, because otherwise
there would now be a cycle in the model.
Instead of double-clicking on a vertex, you can also move the mouse pointer
over the vertex and press the key \action{c}. Arrows are by default
drawn using a straight line,
but you can change that moving the mouse pointer to the line,
pressing and holding down the left mouse button, and ``bending''
the line by dragging as appropriate.
\subsection{Deleting variables}
To delete a variable, move the mouse pointer over that variable
and hit the \action{del} key on your keyboard, or alternatively
the \action{d} key (the latter comes in handy if you're on a Mac, which
has no real delete key).
All arrows to that variable will be deleted along
with the variable. In contrast to \pname versions prior to 2.0,
all variables can now be deleted including exposure and outcome.
\subsection{Deleting arrows}
An arrow is deleted just like it has been inserted,
i.e., by double-clicking first on the start
variable and then on the target variable. An arrow
is also deleted automatically if a new one is
inserted in the opposite direction (see above).
\subsection{Choosing the style of display}
At present, you can choose between two DAG diagram styles: ``classic'', where nodes and
their labels are separate from each other, and SEM-like, where labels are inside nodes.
Both have their advantages and disadvantages. By the way, ``SEM'' refers to structural
equation modeling.
\section{Analyzing diagrams}
\label{sec:diagramanalysis}
\subsection{Paths}
Causal diagrams contain two different kinds of paths between exposure and outcome
variables.
\begin{itemize}
\item \emph{Causal paths} start at the exposure, contain
only arrows pointing away from the exposure, and end at
the outcome. That is, they have the form
$e \rightarrow x_1 \rightarrow \ldots \rightarrow x_k \rightarrow o$.
\item \emph{Biasing paths} are all other paths from exposure to outcome. For
example, such paths can have the form
$e \leftarrow x_1 \rightarrow \ldots \rightarrow x_k \rightarrow o$.
\end{itemize}
With respect to a set $\mathbf{Z}$ of conditioning variables (that
can also be empty if we are not conditioning on anything),
paths can be either \emph{open} or \emph{closed}
(also called d-separated \cite{Pearl2009}). A path
is \emph{closed} by $\mathbf{Z}$ if one or both of the following holds:
\begin{itemize}
\item The path $p$ contains a chain $x \rightarrow m \rightarrow y$
or a fork $x \leftarrow m \rightarrow y$
such that $m$ is in $\mathbf{Z}$.
\item The path $p$ contains a collider $x \rightarrow c \leftarrow y$
such that $c$ is not in $\mathbf{Z}$ and
furthermore, $\mathbf{Z}$ does not contain
any successor of $c$ in the graph.
\end{itemize}
Otherwise, the path is \emph{open}.
The above criteria imply that paths consisting of only one
arrow are always open, no matter the content of $\mathbf{Z}$.
Also it is possible that a path is closed with respect
to the empty set $\mathbf{Z}=\{\}$.
\subsection{Coloring}
It is not easy to verify by hand which paths are open and
which paths are closed, especially in larger diagrams.
\pname highlights all arrows lying on open
biasing paths in red and all arrows lying on open
causal paths in green. This highlighting is optional
and is controlled via
the \action{highlight causal paths} and \action{highlight biasing paths}
checkboxes.
\subsection{Effect analysis}
As mentioned above, arrows in DAGs represent \emph{direct effects}. That is, in a DAG
with three variables $X$, $M$, and $Y$, an arrow $X \to Y$ means that there is a causal
effect of $X$ on $Y$ that is \emph{not} mediated through the variable $M$.
Often when building DAGs, people tend to forget this aspect and think only about whether
any kind of causal effect exists, without paying attention to how it is mediated. This
may result in DAGs with too many arrows.
To aid users with this, George Ellison (Leeds University) suggested to implement a
function that identifies arrows for which also a corresponding indirect pathway exists.
After drawing an initial DAG, one might reconsider these arrows and judge whether they
are really necessary given the indirect pathways already present in the diagram.
For example, suppose after thinking about the pairwise causal relationships between
our variables $X$, $M$, $Y$ we came up with this DAG:
\begin{tikzpicture}
\node (x) at (0,0) {$X$};
\node (m) at (1,0) {$M$};
\node (y) at (2,0) {$Y$};
\draw [very thick,->] (x) -- (m);
\draw [very thick,->] (m) -- (y);
\draw [bend left=40,->] (x) edge (y);
\end{tikzpicture}
For the arrows drawn in bold, there is no corresponding indirect path -- removing
one of these arrows from the diagram means that there will no longer be any causal
effect between the corresponding variables. These arrows are called \emph{atomic direct
effects} in \pname, and they can be highlighted -- like in the above DAG -- by ticking
the checkbox with that name.
On the other hand, for the thin arrow
$X \to Y$, there is also the indirect pathway $X \to M \to Y$. One may therefore reconsider
whether the arrow $X \to Y$ is truly necessary --
perhaps the causal effect from $X$ to $Y$ is entirely mediated through $M$.
\subsection{View mode}
There are several ways to transform a given DAG such that it becomes better suited for
a particular purpose. We call such a transformed DAG a \emph{derived graph}. Currently
\pname can display two kinds of derived graphs: correlation graphs, and moral graphs.
These derived graphs can be shown by clicking on the respective radio button in the
\action{View mode} field on the left-hand side of the screen.
\subsubsection{The correlation graph}
The correlation graph is not a DAG, but a simple graph with lines instead of arrows.
It connects each pair of variables that, according to the diagram, could be statistically
dependent. In other words, variables not connected by a line in the correlation graph
must be statistically independent. These pairwise independencies are also listed in the
\action{Testable implications} field on the right-hand side of the screen, and so the
correlation graph could be seen as encoding a subset of those implications.
Although this is not implemented in \pname yet, it is also possible to take a given
correlation graph (which can be obtained e.g. by thresholding a covariance matrix) and
list all the DAGs that are ``compatible'' with it in the sense that they entail exactly
the given correlation graph \cite{textor15_uai}.
\subsubsection{The moral graph}
To identify minimal sufficient adjustment sets, \pname uses the so-called ``moral graph'',
which results from a transformation of the model to an undirected graph.
This procedure is also highly recommended if you wish to verify the calculation by hand.
See the nice explanation by Shrier and Platt \cite{ShrierP2008} for details on this
procedure.
In \pname, you can switch between display of the model and its moral graph
choosing ``moral graph'' in the``view mode'' section on the left-hand
side of the page.
\subsection{Causal effect identification}
Some of the most important features of \pname are concerned with the question: how
can causal effects be estimated from observational data? Currently, two types of causal
effect identification are supported: adjustment sets, and instrumental variables.
\subsubsection{Adjustment sets}
\label{sec:adjustment}
Finding sufficient adjustment sets is one main purpose
of \pname. In a nutshell, a sufficient adjustment
set $\mathbf{Z}$ is a set of covariates such that adjustment,
stratification, or selection (e.g. by restriction or
matching) will minimize bias when estimating the causal
effect of the exposure on the outcome (assuming that
the causal assumptions encoded in the diagram hold).
You can read more about controlling bias and confounding in Pearl's textbook,
chapter 3.3 and epilogue \cite{Pearl2009}.
Moreover, Shrier and Platt \cite{ShrierP2008}
give a nice step-by-step tutorial on how to test if a set of covariates
is a sufficient adjustment set.
To identify adjustment sets, the diagram must contain at least
one exposure and at least one outcome.
\paragraph{Total and direct effects.}
One can understand adjustment sets graphically by viewing an adjustment set
as a set $\mathbf{Z}$ that closes all all biasing
paths while keeping desired causal paths open
(see previous section).
\pname considers two kinds of adjustment sets:
\begin{itemize}
\item Adjustment sets for the \emph{total effect} are sets that
close all biasing paths and leave all causal paths
open.
In the literature, if the effect is not mentioned
(e.g. \cite{ShrierP2008,KnueppelS2010}), then usually
this kind of adjustment set is meant.
\item Adjustment sets for the \emph{direct effect} are sets that
close all biasing paths and all causal paths, and leave only
the direct arrow from exposure $X$ to outcome $Y$ (i.e., the path
$X \rightarrow Y$, if it exists) open.
\end{itemize}
In a diagram where the only causal path between exposure
and outcome is the path $X \rightarrow Y$, the total effect
and the direct effect are equal. This is true e.g. for the
diagram in Figure~\ref{fig:syntaxexample}. An example diagram where the
direct and total effects are not equal is shown in
Figure~\ref{fig:effects}.
\begin{figure}
\begin{center}
\begin{tikzpicture} [scale=2]
\tikzstyle{vertex}=[rectangle]
\node at(-1,.5)(C1) [vertex]{$C_{1}$};
\node at(1,-.5)(C2)[vertex]{$C_{2}$};
\node at(2,0) (Y) [vertex]{$Y$}
edge [<-] (C2);
\node at(1,.5) (M) [vertex]{$M$}
edge [->,thick,densely dashed] (Y)
edge [<-] (C1);
\node at(0,0) (X) [vertex]{$X$}
edge [->,thick,densely dashed] (M)
edge [<-] (C1)
edge [<-] (C2);
\draw (X) edge [->,thick] (Y);
\end{tikzpicture}
\end{center}
\caption{A causal diagram where the total and direct
effects of exposure $X$ on outcome $Y$ are not equal. The total effect is the effect
mediated only via the thick (both dashed and solid) arrows,
while the direct effect is the effect mediated only
via the thick arrow.}
\label{fig:effects}
\end{figure}
As proved by Lauritzen et al. \cite{Lauritzen1990}
(see also Tian et al. \cite{TianPP1998}),
it suffices to restrict our attention to the part of the model that consists of exposure,
outcome, and their ancestors for identifying sufficient adjustment sets. This is indicated
by \pname by coloring irrelevant nodes in gray. The relevant variables are colored according
to which node they are ancestors of (exposure, outcome, or both) -- see the legend on the
left-hand side of the screen. The highlighting may be turned on and off by toggling the
\action{highlight ancestors} checkbox.
\paragraph{Minimal sufficient adjustment sets.}
A \emph{minimal} sufficient adjustment set is a sufficient
adjustment set of which no proper subset is itself sufficient. For example,
consider again the causal diagram in Figure~\ref{fig:syntaxexample}.
The following three sets are sufficient adjustment sets
for the total and direct effects, which are equal in this case:
$$ \{A,B,Z\} $$
$$ \{A,Z\} $$
$$ \{B,Z\} $$
Each of these sets is sufficient because it closes all biasing paths and leaves
the causal path open. The sets $\{A,Z\}$ and $\{B,Z\}$ are minimal sufficient adjustment
sets while the set $\{A,B,Z\}$ is sufficient, but not minimal.
In contrast, the set $\{Z\}$ is \emph{not} sufficient, since this would
open
the path $E \leftarrow A \rightarrow Z \leftarrow B \leftarrow D$:
Because both $E$ and
$D$ depend on $Z$, adjusting for $Z$ will induce
additional correlation between $E$ and $D$.
\paragraph{Finding minimal sufficient adjustment sets.}
To find minimal sufficient adjustment sets, select the
option \action{Adjustment (total effect)}
or \action{Adjustment (direct effect)} in the
\action{Causal effect identification} field.
\pname
will then calculate all minimal sufficient adjustment sets and display
them in that field. Any changes made to the diagram will be instantly
reflected in the list of adjustment sets.
\paragraph{Forcing adjustment for specific covariates.}
You can also tell \pname that you wish a specific covariate
to be included into every adjustment set. To do this, move the
mouse over the vertex of that covariate and press the \emph{a}
key. \pname will then update the list of minimal sufficient
adjustment sets accordingly -- every set displayed
is now minimal in the sense that removing any variable
\emph{except those you specified} will render
that set insufficient. However, when you adjust for an intermediate
or another descendant of the exposure, \pname will tell you that it is no longer
possible to find a valid adjustment set.
\paragraph{Avoiding adjustment for unobserved covariates.}
You can tell \pname that a certain variable is unobserved
(e.g. not measured at present, or not measurable because it is a
latent variable) by moving the mouse
over that covariate and pressing the \emph{u} key.
\pname will only calculate adjustment sets that do
not contain unobserved variables. However, if too many or
some important variables are unobserved, then it may
be impossible to close all biasing paths.
\subsubsection{Instrumental variables}
Sometimes it is not possible to estimate a causal effect by simple covariate adjustment.
For example, this is the case whenever there is an unobserved confounder that directly
effects the exposure and outcome variables. However, this does not necessarily
mean that it is impossible to estimate the causal effect at all.
\emph{Instrumental variable regression} is a technique that is
often used in situations with unobserved confounders.
Note that this technique depends on linearity assumptions. For further information
on instrumental variables, please refer to the literature \cite{AngristIR96,imbens14}.
\pname can find instrumental variables in DAGs, as explained below.
The validity of an instrumental variable $I$ depends on two causal conditions --
exogeneity and exclusion restriction. These two conditions can be expressed
in the language of DAGs and paths as follows: (1) there must be an open path between
$I$ and the exposure $X$; and (2) all paths between $I$ and the outcome $Y$ must be
closed in a modified graph where all edges out of $X$ are removed.
A variable that fulfills these two conditions is called an \emph{instrumental variable}
or simply an \emph{instrument}.
Instrumental variables can also be generalized such that the two conditions are required
to hold \emph{conditional on a set of covariates} $\mathbf{Z}$ \cite{BritoPearlUAI02}.
The two conditions then read as follows:
(1) there must be a path between
$I$ and $X$ that is opened by $\mathbf{Z}$; and (2)
all paths between $I$ and $Y$ must be closed by $\mathbf{Z}$
in a modified graph where all edges out of $X$ are removed.
A variable that fulfills these two conditions is called a
\emph{conditional instrument}.
\pname will find both ``classic'' and conditional instruments when the option
\action{Instrumental Variable} is selected under the \action{Causal effect identification}
field. Note that \pname will not always list \emph{all} possible instruments;
instead, it will restrict itself to a certain well-defined subset that we call
``ancestral instruments''. However, whenever any instrument or conditional instrument
exists at all, then \pname is guaranteed to find one. Note also that if there are several
instruments available, then it is best to choose the one that is most strongly
correlated with $X$ (conditional on $\mathbf{Z}$ in the case of a conditional instrument).
For details regarding ancestral instruments and how \pname computes them, please
refer to the research paper where we describe these methods \cite{zander15_instrument}.
\subsection{Testable implications}
Any implications that are obtained from a causal diagram,
such as possible adjustment sets or instrumental variables,
are of course dependent on the
assumptions encoded in the diagram.
To some extent, these assumptions can be tested via the
(conditional) independences implied by the diagram: If two
variables $X$ and $Y$ are $d$-separated by a set $\mathbf{Z}$, then
$X$ and $Y$ should be conditionally independent
given $\mathbf{Z}$. The converse is not true: Two variables
$X$ and $Y$ can be independent given a set $\mathbf{Z}$ even though they
are not $d$-separated in the diagram. Furthermore, two
variables can also be $d$-separated by the empty set $\mathbf{Z}=\emptyset$.
In that case, the diagram implies that $X$ and $Y$ are
\emph{unconditionally} independent.
\pname displays all minimal testable implications
in the \action{Testable implications} text field. Only such
implications will be displayed that are in fact testable,
i.e., that do not involve any unobserved variables. Note
that the set of testable implications displayed by \pname
does not constitute a ``basis set'' \cite{Pearl2009}.
Future versions will allow choosing between different basis sets.
In general, the less arrows a diagram contains, the more testable
predictions it implies. For this reason, ``simpler'' models with
fewer arrows are in general easier to falsify (Occam's razor).
\section{Acknowledgements}
I would like to thank my collaborators Maciej Li\'skiewicz
and Benito van der Zander (both at the Institute for Theoretical Computer Science,
University of L\"ubeck, Germany) for our collaborations on developing efficient algorithms
to analyze causal diagrams.
I also thank Michael Elberfeld, Juliane Hardt,
Sven Kn\"{u}ppel, Keith Marcus, Judea Pearl, Sabine Schipf,
and Felix Thoemmes (in alphabetical order) for enlightening discussions
(either in person, per e-mail, or on the SEMnet discussion list)
about DAGs that made this program possible. Furthermore, I thank
Robert Balshaw, George Ellison,
Marlene Egger, Angelo Franchini, Ulrike F\"{o}rster, Mark Gilthorpe,
Dirk van Kampen,
Jeff Martin, Jillian Martin, Karl Micha\"{e}lsson,
David Tritchler, Eric Vittinghof,
and other users for sending feedback and bug reports that greatly
helped to improve \pname.
The development of DAGitty was sponsored by funding from the
Institute of Genetics, Health and Therapeutics at Leeds
University, UK. I thank George Ellison for arranging this
generous support.
\section{Legal notice}
Use of \pname is (and will always be) freely permitted and free of charge.
You may download a copy of \pname's source code from its website at \url{www.dagitty.net}.
The source code is available under the GNU General Public License (GPL),
either version 2.0, or any later version, at the licensee's choice;
see the file \verb|LICENSE.txt| in the download archive for details.
In particular, the GPL permits you to modify and redistribute the source
as you please as long as the result remains itself under the GPL.
\section{Bundled libraries}
\pname ships along with the following JavaScript libraries:
\begin{itemize}
\item \emph{Prototype.js}, a framework that makes life with JavaScript much easier. Only some
parts of Prototype (mainly those focusing on data structures) are included to keep the code small. Developed
by the Prototype Core Team and licensed under the MIT license \cite{Prototype2010}.
\end{itemize}
Furthermore, \pname uses some modified code from the \emph{Dracula Graph Library} by
Philipp Strathausen, which is also licensed under the MIT license \cite{Dracula2010}.
Versions of \pname prior to 2.0 used the \emph{Rapha\"el} library for smooth
cross-browser vector graphics in SVG and VML, developed by Dmitry Baranovskiy
\cite{Raphael2010}. However, the dependency on \emph{Rapha\"el} was removed starting
from version 2.0, and only SVG-capable browsers will be supported in the future.
I am grateful to the authors of these libraries for their valuable work.
\section{Bundled examples}
\pname contains some builtin examples for didactic and illustrative purposes.
Some of these examples are taken from published papers or talks given at
scientific meetings. These are, in inverse chronological order:
\begin{itemize}
\item van Kampen 2014 \cite{Kampen2014}
\item Polzer et al., 2012 \cite{Polzer2012}
\item Schipf et al., 2010 \cite{Schipf2010}
\item Shrier \& Pratt, 2008 \cite{ShrierP2008}
\item Sebastiani et al.\footnote{The example actually shows only a small part of their DAG.},
2005 \cite{Sebastiani2005}
\item Acid \& de Campos, 1996 \cite{Acid1996}
\end{itemize}
Another example was provided by Felix Thoemmes
via personal communication (2013).
\section{Author contact}
I would be glad to receive feedback from those
who use \pname for research or educational purposes.
Also, you are welcome to send me your
suggestions or requests for features that
you miss in \pname.
\bigskip
\noindent