From f6ce90023bf58e165a3026096e56766407a9dca0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dan=20Ros=C3=A9n?=
Your task is to write a Python program that will extract a disease-causing transcript from the CFTR gene, translate the gene sequence to its corresponding amino-acid sequence, and, based on the reference amino-acid sequence, determine whether any of the five given individuals is affected.
-Download the lecture slides from here.
+Download the lecture slides from here.
@@ -52,7 +52,7 @@ Human reference annotation file (`GTF` format):
 If you are not familiar with the file formats, read up online on how the files are structured. For example, here you can find a short description of the different (tab-delimited) fields of a GTF file.
-Some of the tasks involve outputting long sequences. To make sure they are correct, use the`utils.check_answers` package (from the downloads folder from the course topics website). You can import it that way:
+Some of the tasks involve outputting long sequences. To make sure they are correct, use the `utils.check_answers` package (from the downloads folder from the course topics website). You can import it that way:
from utils import check_answers
More detailed instructions are given with each task that uses the package.
@@ -196,7 +196,7 @@ In the annotation file (the GTF file), the CFTR gene has the id `ENSG00000001626
-6. Translate the above sequence of all exons into amino acids, using an implementation of the translation table from the `utils.rna` package (from the downloads folder from the course topics website).
+6. Translate the above sequence of all exons into amino acids, using an implementation of the translation table from the `utils.rna` package (from the downloads folder from the course topics website).
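A minimal sketch of the translation step, for orientation only. The interface of `utils.rna` is not shown in this patch, so the example uses a hypothetical five-codon table (`DEMO_CODON_TABLE`) and a hypothetical helper (`translate`). It also assumes that the input is DNA and that translation stops at the first stop codon, neither of which is confirmed here.

```python
# Partial codon table for illustration only (hypothetical name); the course's
# utils.rna package is assumed to supply the complete 64-codon table.
DEMO_CODON_TABLE = {"AUG": "M", "AUC": "I", "UUU": "F", "GGU": "G", "UAA": "*"}


def translate(dna, table=DEMO_CODON_TABLE):
    """Translate a DNA string codon by codon, stopping at the first stop codon."""
    rna = dna.upper().replace("T", "U")  # transcribe DNA letters to RNA letters
    protein = []
    # Ignore an incomplete trailing codon, if any.
    for start in range(0, len(rna) - len(rna) % 3, 3):
        amino_acid = table[rna[start:start + 3]]
        if amino_acid == "*":  # stop codon: end of the protein (assumption)
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGATCTTTGGTTAA"))  # MIFG
```

With the full table from the package, the same loop would apply to the concatenated exon sequence from the previous task.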
Open the reference fasta file and read it line by line.
+
+ The commands below are for Mac and Linux and should also work on Windows Subsystem for Linux.
+ You can get help in the terminal by writing the command name followed by `--help`, such as `cd --help`.
+ Naturally you can also search the web!
+
+ `cd` to change directory and `pwd` to print the working (current) directory.
+ `mkdir` or a file explorer. Check that the files are there by listing the directory's contents with `ls`.
+ `mv`. Again, use `ls` to see that the files end up where you want them to be.
+ `.gz` indicates gzip compression, which can be decompressed using `gunzip`. The command for `.zip` files is `unzip`.
+ `cat`, `head` and `tail`.
Open the reference fasta file and read it line by line. Study the example in the lecture!
In a loop, ignore the first line and get the length of each following line.
Don't forget to remove the trailing newline character from each line.
Sum up all the lengths you found.
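The steps above can be sketched as follows. This is one possible reading, not the lecture's example: the function name `fasta_sequence_length` and the in-memory demo data are made up, and the sketch assumes the file contains a single header line followed only by sequence lines.

```python
import io


def fasta_sequence_length(handle):
    """Sum the lengths of all sequence lines, skipping the first (header) line."""
    total = 0
    for line_number, line in enumerate(handle):
        if line_number == 0:
            continue  # the first line is the FASTA header, not sequence
        total += len(line.rstrip("\r\n"))  # drop the trailing newline before counting
    return total


# Tiny in-memory stand-in for the reference file (made-up data).
example = io.StringIO(">chr_demo description\nACGTAC\nGGT\n")
print(fasta_sequence_length(example))  # 9
```

For the real task, the same function could be called on a file opened with `open(...)` using the path of the downloaded reference fasta file.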
@@ -78,7 +110,7 @@ More detailed instructions are given with each task that uses the package.