-
Notifications
You must be signed in to change notification settings - Fork 1
/
BioLinux_Class_head.txt
173 lines (114 loc) · 7.52 KB
/
BioLinux_Class_head.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
= BioLinux: a gentle introduction to Bioinformatics on UCI's HPC Linux cluster
Instructors: Jenny Wu <[email protected]> and Harry Mangalam <[email protected]>
//== BioLinux: Bioinformatics without Prozac in 2 days.
Presented by:
- http://ghtf.biochem.uci.edu/[The Genomics High Throughput Facility]
- http://www.igb.uci.edu/[Institute of Genomics & Bioinformatics]
- http://www.oit.uci.edu[Office of Information Technology]
// export fileroot="/home/hjm/nacs/BioLinux_Class_head"; asciidoc -a icons -a toc2 -b html5 -a numbered ${fileroot}.txt; scp ${fileroot}.html moo:~/public_html/biolinux/index.html; scp ${fileroot}.html [email protected]:/data/hpc/www/biolinux/index.html;
== Current Status
We will be presenting the class on Tue, Jan 20 (Linux) and Thurs, Jan 22nd (Bioinformatics).
If you have specific questions about the Bioinformatics-related parts of the course,
please contact Jenny Wu <[email protected]> for more information.
If it's Linux/Programming related, please contact Harry Mangalam <[email protected]>.
== General Information
These are full-day classes that teach the basics of Bioinformatics on UCI's
http://hpc.oit.uci.edu[HPC compute cluster] which runs the Linux Operating System.
If you are only interested in Linux, you can skip the Bioinformatics session.
Conversely, if you already know your way around Linux and the HPC cluster, you
may wish to skip the Linux parts. You can tell if you are up to speed by browsing
the Linux Lecture slides and Tutorial scripts below.
== Class Lecture Slides and Tutorial Notes
You can determine if you're interested in taking the class by reviewing the
class slides and tutorial scripts below.
// http://moo.nac.uci.edu/%7ehjm/biolinux/Intro_to_BioLinux_R_bash_Perl.pdf
=== Linux
- http://moo.nac.uci.edu/%7ehjm/biolinux/Intro_to_BioLinux_Linux.pdf[Class slides (PDF)
for the Linux part]
- http://moo.nac.uci.edu/%7ehjm/biolinux/Linux_Tutorial_12.html[Consolidated
Tutorial Script for the Linux part]
As preparation for the Linux part, we strongly suggest viewing the
http://software-carpentry.org/4_0/shell/index.html[Software Carpentry introduction
to the shell] videos
=== Bioinformatics
- http://hpc.oit.uci.edu/biolinux/workshop13_day1.pptx[Class slides for the BioInformatics part]
- http://hpc.oit.uci.edu/biolinux/handson.docx[Tutorial Script for the BioInformatics part]
=== Tutorial Data
Input and example data files for the tutorials http://hpc.oit.uci.edu/biolinux/data[are
stored here], which is a browsable directory. If there is a file called
http://hpc.oit.uci.edu/biolinux/data/MANIFEST[MANIFEST], please read it for
descriptions of the files you find there.
== Target Audience
This class was designed for faculty, postdocs and graduate students who are
working on genomics and other large analytical projects and want a quick introduction
to cluster computing with Linux and Bioinformatics. It assumes that the participants
will be naive Linux users with some idea of the analysis they want to achieve.
This time the course has been split to into a Linux part (which may be of interest
to non-BioSci students as well) and a Bioinformatics part.
Mixing metaphors, this is not a Computer Science course. We will 'not' be teaching
you how the engine works; we will be teaching you how to drive.
== Class Calendar & Location
=== When
The 2 day course will be held 9a-5p:
- Tuesday, Jan 20th (Linux) - Rm 3011, Bren Hall
- Thursday, Jan 22nd (Bioinformatics) - Rm 4011, Bren Hall
=== Where
Bren Hall 3011 Bldg 314 ('H8' on the
http://communications.uci.edu//documents/pdf/UCI_13_map_campus_core.pdf[Campus Map]),
or https://maps.google.com/?q=33.64341,-117.84190[here on Google Maps]
Look for signs at the main doors.
//or https://www.google.com/maps/preview/@33.64341,-117.84190,19z
== Availability, Cost, & Deadlines
The class will consist of a morning lecture followed by an extended tutorial
that will last thru the afternoon. Both sessions are available to the entire
UCI community, but YOU MUST BE REGISTERED TO ATTEND. The Linux session is free to
the attendees (thanks to the Data Sciences program). The Bioinformatics session
costs $50 so if your interest is in learning Linux on the HPC system, but not Bioinformatics,
you can attend only the Linux part.
Both sessions include coffee, mid-session snacks and lunch.
Sign up for both sessions and pay if needed
https://conference.ics.uci.edu/register/?conf_id=qwwl2N[at this link]
by Jan 16th to assure your place in the class.
For the tutorial, you must also have an http://hpc.oit.uci.edu[HPC cluster] account,
which is free to all UCI researchers. Send email to <[email protected]> to obtain one.
== Day 1 - Linux and the HPC cluster
=== Lecture
Introduction to Linux and why you should use it. Overview of commands and getting around.
What is/isn't a cluster, logging in with ssh, setting up your environment, text editors,
quotas, data management, graphics, useful bash shell commands, environment
variables, pattern matching and regular expressions, programs: how to find
them, find out about them, run them, simple debugging.
=== Tutorial
Logging in with ssh, commandline editing, setting your prompt, transferring data in, editing, de/compressing, unpacking, basic bash and utility commands, cluster status commands, software modules, Grid Engine commands. Introduction to simple data manipulation and scripting/ programming with bash, Perl, and R.
Bring your specific problems to discuss with the Instructor & TAs.
== Day 2 - Bioinformatics
=== Lecture
NGS workflow, general data analysis pipeline, data format, short read
mapping, alignment software, general workflow for DNA-seq, RNA-seq and
ChIP-seq and corresponding software resources. Data visualization
tools.
RNA-seq data analysis: data normalization, spliced mapping,
transcriptome assembly and abundance estimation. Novel transcripts.
Tophat/cufflinks/cuffdiff parameter details. Open Source RNA-seq
software with graphical user interface. ChIP-seq work flow and
software.
=== Tutorial
Getting reference genome sequences to HPC, getting reference
annotation. FASTA and FASTQ format of data. Prepareing reference
sequence for alignment. Using tophat/bowtie to align short reads.
Use the alignment obtained and run cuffmerge, cufflinks and cuffdiff.
Examine cuffdiff output. Depending on the
time and level of students, cummeRbund can also be covered.
ChIP-seq with Galaxy if needed.
== Tutorial Data Sets
The data set directory http://hpc.oit.uci.edu/biolinux/data/[is browsable here]
== Prerequisites for the tutorial:
- a Mac, PC, or Linux laptop with wifi pre-registered with the http://www.oit.uci.edu/mobile/[UCI Mobile network].
- If Mac: +
http://cyberduck.ch[CyberDuck] or other graphical file transfer program (GFTP) + http://code.x2go.org/releases/binary-macosx/x2goclient/releases/3.99.2.1/x2goclient-3.99.2.1.dmg[the Mac x2go client]. We have had problems with the 4.0 release; please use the *3.99.2.1* release linked above. Also, to get it working correctly with your X11 software, please start the X11 software first, THEN start x2go.
- If Windows: +
http://goo.gl/XbTF[the putty terminal program] +
http://cyberduck.ch[CyberDuck], http://winscp.net[WinSCP] or other GFTP client. +
Optionally, the http://code.x2go.org/releases/binary-win32/x2goclient/releases/4.0.0.3/x2goclient-4.0.0.3-setup.exe[the Windows x2go client], altho we have discovered that there are some applications that refuse to work with it.
- If Linux: +
The http://wiki.x2go.org/doku.php/doc:installation:x2goclient[x2go client] allows you to view graphical output from the HPC cluster.