fix various typos
hmehlan committed Nov 7, 2018
1 parent aee62ae commit 3d17cc2
Showing 30 changed files with 95 additions and 95 deletions.
2 changes: 1 addition & 1 deletion augustus-training/dynamic/about.html
@@ -35,7 +35,7 @@ <h3>Gene prediction with AUGUSTUS</h3>
<p align="center"><a href="images/AUG.cDNA.gif"><img src="images/AUG.cDNA.gif" alt="with cDNA" width="70%"></a><br><i>Click on image to enlarge!</i></p>
</li>

<li> AUGUSTUS ususally belongs to the most accurate programs for the species it is trained
<li> AUGUSTUS usually belongs to the most accurate programs for the species it is trained
for. Often it is the most accurate <i>ab initio</i> program. For example, at the independent gene finder
assessment (EGASP) on the human ENCODE regions AUGUSTUS was the most accurate gene finder among the
tested <i>ab initio</i> programs. At the more recent nGASP (worm), it was among the best in the <i>ab initio</i>
2 changes: 1 addition & 1 deletion auxprogs/bam2hints/bam2hints.cc
@@ -1069,7 +1069,7 @@ int main(int argc, char* argv[])
// TASK: else store alignment pointer into map
// TASK: set new or reuse "pal"

} // end while (parsing through bam aligment lines)
} // end while (parsing through bam alignment lines)


// newline after display of the line count
4 changes: 2 additions & 2 deletions config/model/constraints_shadow_partial_utr.txt
@@ -63,10 +63,10 @@
32
36
# --------------------------------------------------------------------
# This section is for seting constraints between transition probabilities,
# This section is for setting constraints between transition probabilities,
# such as suggested by symmetry such as strand symmetry (or by treating transitions in
# all reading frames the same). Theoretically, i.e. with infinite and
# representative traininig data, this should not be necessary. However, in the real finite
# representative training data, this should not be necessary. However, in the real finite
# world this is a little safeguard against overfitting.
#
[BINDINGS]
4 changes: 2 additions & 2 deletions docs/CDS.sp.README
@@ -5,6 +5,6 @@ for T in 0 1 2 3 4 5 6 7; do
done

The graph shows the posterior probability reported in the 6th column of the augustus output gff file versus the actual fraction of exons (CDS) that
match the reference annotation (RefSeq). The esimated curves for t=5,6,7 are not reliable for high posterior probabilities as they occurr too rarely.
match the reference annotation (RefSeq). The estimated curves for t=5,6,7 are not reliable for high posterior probabilities as they occur too rarely.
For large temperatures exons with high posterior probability become increasingly rare. We recommend t=3 as it comes relatively close to the ideal curve,
the diagonal, for the most relavant range of posterior probabilities.
the diagonal, for the most relevant range of posterior probabilities.
4 changes: 2 additions & 2 deletions docs/espoca/README
@@ -2,7 +2,7 @@ ESPOCA - Estimate Selective Pressure on Codon Alignments

This directory contains example input files:
example.fa codon alignment file in multi fasta format
tree.nwk phylogenetik tree file in newick format with branch length
tree.nwk phylogenetic tree file in newick format with branch length

Example Command:

@@ -21,7 +21,7 @@ Output:
# 4. Pr(w>1) probability of omega > 1 at alipos (*: Pr(w>1) > 0.90, **: Pr(w>1) > 0.95)
# 5. post_mean posterior mean estimate of omega at ali_pos
# 6. SE_for_w standard deviation of omega at ali_pos
# 7. num_subst number of subsitution calculated by the Fitch algorithm
# 7. num_subst number of substitution calculated by the Fitch algorithm

ali_pos ref_pos AS_ref Pr(w>1) post_mean +- SE_for_w num_subst
0 0 M 0.249188 0.814206 +- 0.362779 0
2 changes: 1 addition & 1 deletion docs/tutorial/scipio.html
@@ -39,7 +39,7 @@ <h1>Using Scipio to create a (training) set of gene structures</h1>
the proteins encoded by the genome (e.g. >90% similarity). One use case is the migration of gene structures from one assembly to another assembly. Another use case
is the identification of exon coordinates for protein of the same species as the genome when the gene structures (GFF) is not available (anymore). A third use case
is the mapping of protein sequences from a related known species to a new species, e.g. from human to Orangutan. Scipio is well-suited for draft assemblies with short
contigs, as it is assemblying alignments of proteins where different fragments match different contigs.
contigs, as it is assembling alignments of proteins where different fragments match different contigs.
</div><br>

<h2>1. Run Scipio</h2>
2 changes: 1 addition & 1 deletion docs/tutorial/training.html
@@ -375,7 +375,7 @@ <h2 id="optimize">4. RUN THE SCRIPT <tt>optimize_augustus.pl</tt></h2>
in the case of <i>Tetrahymena</i>, where <tt>taa</tt> and <tt>tag</tt> are coding for glutamine (Q).

<p>
Choose the translation table number accoding to this table. translation_table=1 is
Choose the translation table number according to this table. translation_table=1 is
the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
</p>
2 changes: 1 addition & 1 deletion docs/tutorial2015/index.html
@@ -66,7 +66,7 @@ <h3>Exercise 1: <span class="assignment">Compile a Training Set</span></h3>
to estimate the parameters of gene finders. We will here go through option 6.
We assume that we have RNA-Seq data only and no substantial homology data. We will reuse an existing parameter set for AUGUSTUS.<br>
<ol>
<li><span class="assignment">Follow the tutorial on <a href="ittrain.html">"Iteratative Training Set Construction"</a></span>
<li><span class="assignment">Follow the tutorial on <a href="ittrain.html">"Iterative Training Set Construction"</a></span>
and create a training set <tt>genes.gb</tt>.
<li><span class="assignment">Partition <tt>genes.gb</tt></span> into a training set and a holdout test setas described in <a href="training.html#split">1.2 Split gene structure set...</a>.
</ol>
2 changes: 1 addition & 1 deletion docs/tutorial2015/ittrain.html
@@ -268,7 +268,7 @@ <h2>5. Create a training set</h2>
and to include genes in the "intergenic" region.
The choice of the parameter <tt>max-size-of-gene-flanking-DNA</tt> is important for several reasons.
<ul>
<li> The flanking region should be large enough to allow a prepresentative estimate of non coding regions.
<li> The flanking region should be large enough to allow a representative estimate of non coding regions.
When the GFF file only contains CDS, then part of the flanking regions are UTR and may not be representative of
intergenic region (e.g. CpG islands in vertebrates).</li>
<li>Usually the gff file is not complete and genes are missing from it. In that case the flanking regions may contain genic regions
2 changes: 1 addition & 1 deletion docs/tutorial2015/prediction.html
@@ -186,7 +186,7 @@ <h3 id="hest">3.1 From ESTs</h2>
<pre class="code">cat est.psl | filterPSL.pl --best --minCover=80 > est.f.psl</pre>

<span class="result"><tt>est.f.psl</tt></span> now only contains for each query
the best alginment(s) and that only if it covers at least 80% of the query length.
the best alignment(s) and that only if it covers at least 80% of the query length.
This reduces the number of alignments:
<pre class="code">
wc -l est.psl est.f.psl
2 changes: 1 addition & 1 deletion docs/tutorial2015/training.html
@@ -376,7 +376,7 @@ <h2 id="optimize">4. RUN THE SCRIPT <tt>optimize_augustus.pl</tt></h2>
in the case of <i>Tetrahymena</i>, where <tt>taa</tt> and <tt>tag</tt> are coding for glutamine (Q).

<p>
Choose the translation table number accoding to this table. translation_table=1 is
Choose the translation table number according to this table. translation_table=1 is
the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
</p>
2 changes: 1 addition & 1 deletion docs/tutorial2018/README_augustus.TXT
@@ -721,7 +721,7 @@ HS04636 blat2hints exonpart 500 599 . + . priority=2; source=E
HS04636 blat2hints intron 550 650 . + . priority=5; source=mRNA

When two hints or hint groups contradict each other then the hints with the lower priority number
are ignored. This is especially usefull if for a genome several sources of hints are available,
are ignored. This is especially useful if for a genome several sources of hints are available,
where one source should be trusted when in doubt. For example, the rhesus macaque currently has few native ESTs
but human ESTs often also align to rhesus. Giving the hints from native ESTs a higher priority means
that AUGUSTUS uses only them for genes with support by native ESTs and uses the alien EST alignments
8 changes: 4 additions & 4 deletions docs/tutorial2018/index.html
@@ -12,7 +12,7 @@ <h1>Using BRAKER2 and AUGUSTUS</h1>
<h2>General remarks</h2>

<ul>
<li> This tutorial is designed in a way that persons with no exprience in Linux should be able to follow. In case someone gets bored, he or she can try to get the code of this tutorial on his own laptop running.
<li> This tutorial is designed in a way that persons with no experience in Linux should be able to follow. In case someone gets bored, he or she can try to get the code of this tutorial on his own laptop running.
<li> Here some manuals:
<ul>
<li> <a href="README_augustus.TXT">AUGUSTUS readme</a>
@@ -164,7 +164,7 @@ <h2>1. Repeat-mask the genome </h2>
RepeatScout -sequence data/genome.fa -output masking/genome.repseq.fa -freq masking/genome.freq # takes ~30s
</pre>

The file <tt>masking/genome.freq</tt> contains a list of ostensibe repeat sequences.
The file <tt>masking/genome.freq</tt> contains a list of ostensible repeat sequences.

<pre class="code">
head -n 100 masking/genome.freq
@@ -360,7 +360,7 @@ <h2>3. Gene prediction </h2>
The options in the BRAKE call have the follownig meanings:
<ul>
<li> <tt>--species</tt>: The name of the species on which we carry out our gene prediction. <br>
<i>The parameters AUGUSTUS infers from the training data and uses to parametrize its internal model for the gene prediciton is located in AUGUSTUS_CONFIG_PATH/species/[value of --species]. The term "species" is used in this option although AUGUSTUS can also be used for gene predictions on different strains of the same species.</i>
<i>The parameters AUGUSTUS infers from the training data and uses to parametrize its internal model for the gene prediction is located in AUGUSTUS_CONFIG_PATH/species/[value of --species]. The term "species" is used in this option although AUGUSTUS can also be used for gene predictions on different strains of the same species.</i>
<li> <tt>--genome</tt>: The genomic data on which the gene prediction is carried out.
<li> <tt>--bam</tt>: The BAM file containign the aligned RNA-seq reads used for inference of the training genes by GeneMark and as hints for the gene prediction by AUGUSTUS
<li> <tt>--softmasking</tt>: Flag indicating that the genome is softmasked. <br>
@@ -369,7 +369,7 @@ <h2>3. Gene prediction </h2>
<i>One should not use the parameters AUGUSTUS infers without optimization to generate a gene set used in a scientific project. We skip optimization here because it is very time-consuming</i>
<li> <tt>--cores</tt>: The number of cores used by BRAKER and AUGUSTUS. <br>
<i> To reduce running time, once can increase the number of cores. Nevertheless, for this session this is probably not advisable as the genotoul server only has 32 cores. </i>
<li> <tt>--AUGUSTUS_SCRIPTS_PATH</tt>, <tt>--AUGUSTUS_BIN_PATH</tt>, <tt>--AUGUSTUS_CONFIG_PATH</tt>: These pathes specify where to find various files and executables. <br>
<li> <tt>--AUGUSTUS_SCRIPTS_PATH</tt>, <tt>--AUGUSTUS_BIN_PATH</tt>, <tt>--AUGUSTUS_CONFIG_PATH</tt>: These paths specify where to find various files and executables. <br>
<i>When you install BRAKER and AUGUSTUS on a computer for which you have administrator rights, you very probably will not need to set these paths.</i>
</ul>
<br>
16 changes: 8 additions & 8 deletions include/consensus.hh
@@ -52,10 +52,10 @@ class consensus{
* no_iterations specifies the number of consensus patterns we need to store for the output
* delta is used to calculate whether a certain string is relevant or not
* p_value is the threshold used to identify the significant strings
* powers is the array used to store the powers 4 for caculating the character to int conversion and vice versa
* tpm is the transition probabiliry matrix
* powers is the array used to store the powers 4 for calculating the character to int conversion and vice versa
* tpm is the transition probability matrix
* var is the vector of structure sequence containing all the input sequences
* t_values is a flag array used to store the actuall frequencies of the patterns
* t_values is a flag array used to store the actual frequencies of the patterns
* f_values stores the actual frequencies of the patterns
* back_probs store the background probabilities
* significant_strings and relevant_strings store the significant and relevant strings respectively
@@ -72,7 +72,7 @@ class consensus{
string analyse_pattern;

/* consensus_data stores the consensus patterns as their corresponding integers
* final_list is a vector of struccture histogram_data
* final_list is a vector of structure histogram_data
* max_string_length is the maximum length of the input string sequences
* max_freq is the maximum frequency of occurrence at a particular position and is used to scale the histogram
*/
@@ -93,7 +93,7 @@ public:
/* takes the file name to store the sequences
*/
void set_file_name(string filename);
/* adds a particular string to consider for caculation
/* adds a particular string to consider for calculation
*/
void add_string(string fileline);
/* takes as input the pattern that we want to analyse and also the required parameters and does all the necessary
@@ -116,7 +116,7 @@ public:
/* plots the histogram
*/
void plot_histogram();
/* returns the consensus pattern and other reltaed information
/* returns the consensus pattern and other related information
*/
vector<histogram_data> get_consensus();
/* prints the consensus pattern and other related information
@@ -128,7 +128,7 @@ public:
/* returns n choose r
*/
int nCr(int n, int r);
/* converts a character from A C G T to coresponding number value 0 1 2 3 respectively
/* converts a character from A C G T to corresponding number value 0 1 2 3 respectively
*/
int char2num(char m);
/* converts the numbers to corresponding character
@@ -139,7 +139,7 @@ public:
* performing the calculations and the vector where the new neighbours is to be stored
*/
void k_mismatch(int new_m, int old_m, int k, int length,Seq2Int s2i,vector<int> &neighbours_index);
/* returns a vector containing k mismatch neighbours of pattern with interger value m and length as specified
/* returns a vector containing k mismatch neighbours of pattern with integer value m and length as specified
*/
vector<int> find_neighbours(int m,int k, int length);
};
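
Note on the `consensus.hh` comments above: they describe mapping A, C, G, T to 0, 1, 2, 3 and using powers of 4 to convert between patterns and integers. A minimal sketch of such a base-4 encoding follows. The names `num2char`, `encode` and `decode`, the digit order (first character most significant) and the `-1` error value are assumptions made for illustration; the header itself does not show them.

```cpp
#include <string>

// Maps A, C, G, T to 0, 1, 2, 3 as described in the header comment.
// Returning -1 for any other character is an assumption of this sketch.
int char2num(char m) {
    switch (m) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;
    }
}

// Hypothetical inverse of char2num; 'N' marks an out-of-range digit.
char num2char(int n) {
    static const char alphabet[] = "ACGT";
    return (n >= 0 && n < 4) ? alphabet[n] : 'N';
}

// Hypothetical: reads a pattern as a base-4 number, first character
// most significant. Returns -1 if the pattern contains a non-ACGT character.
int encode(const std::string& pattern) {
    int value = 0;
    for (char c : pattern) {
        int digit = char2num(c);
        if (digit < 0) return -1;
        value = value * 4 + digit;  // same as summing digit * 4^position
    }
    return value;
}

// Hypothetical: rebuilds a pattern of the given length from its integer code.
std::string decode(int value, int length) {
    std::string pattern(length, 'A');
    for (int i = length - 1; i >= 0; --i) {
        pattern[i] = num2char(value % 4);
        value /= 4;
    }
    return pattern;
}
```

Under these assumptions `encode("ACGT")` is 0·64 + 1·16 + 2·4 + 3 = 27, and `decode(27, 4)` gives back `"ACGT"`. The length has to be passed to `decode` because leading `A` characters are zeros and would otherwise be lost.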
12 changes: 6 additions & 6 deletions include/exon_seg.hh
@@ -29,7 +29,7 @@ const long int minus_infinity=LONG_MIN;
const int maxcov =10000;
#define NUMSTATES 4
/* EXONP denotes positive exon EXONM denotes the negative exon INTRONJ and INTRONI denote the
* J and I lables of introns Together these four are teh states that we are considering
* J and I labels of introns Together these four are the states that we are considering
* in our HMM
*/
const int EXONP =0;
@@ -41,18 +41,18 @@ const int INTRONJ =3;
const int STRANDP =0;
const int STRANDM =1;
const int STRANDB =2;
/* pott_gamma denotes the gamma value that we will be using in calulating the
/* pott_gamma denotes the gamma value that we will be using in calculating the
* Pott's functionals
*/
const int pott_gamma =1500;
/* moving_window stores the window size that we will be using while finding the
* splice sites using the template matching technique
*/
const int moving_window=70;
/* The convergence limit that we use for calculating the labda values using the iterative technique
/* The convergence limit that we use for calculating the lambda values using the iterative technique
*/
const double convergence_limit=0.01;
/* L stores the threshold betweeen introns and exons which we use when we try to estimate the
/* L stores the threshold between introns and exons which we use when we try to estimate the
* the distribution using the train_function which calculates the distribution using the exons
* predicted from a gff file
*/
@@ -134,7 +134,7 @@ public:
/* reads the file and stores it in the the object of the class dataset
*/
void read_file(dataset &coverage_info,string filename);
/* it takes as input pointer to a 2-D vecctor,
/* it takes as input pointer to a 2-D vector,
* the input dataset for a chromosome and
* stores the emission probability values in it
*/
@@ -150,7 +150,7 @@ public:
/* function to convert the state sequence into segments and also to store the avg coverage depth
*/
vector< fragment > segment( vector<int> state_seq,vector< vector< vector<int> > > &input_set,int include_intron,int no_of_tracks );
/* This function takes input the averrage coverage depth and transforms them
/* This function takes input the average coverage depth and transforms them
* into some usable form in the pott's functional
*/
double pott_convert(double d);
6 changes: 3 additions & 3 deletions include/extrinsicinfo.hh
@@ -55,7 +55,7 @@ public:
};

/**
* @brief Plan of individiual prediction steps with each a range and a set of hint groups turned on/off
* @brief Plan of individual prediction steps with each a range and a set of hint groups turned on/off
*
* @author Mario Stanke
*/
@@ -235,10 +235,10 @@ public:
int seqlen;
int K; // block size for firstEnd and lastStart
// firstEnd[t][k] holds an iterator to featureList[t] to the first element f in the list such that
// f->end >= k*K, where k>=0 and k ist such that k*K <= seqlen
// f->end >= k*K, where k>=0 and k is such that k*K <= seqlen
list<Feature>::iterator **firstEnd;
// lastStart[t][k] holds an iterator to featureList[t] to the list such that all following list elements f have
// f->start > k*K, where k>=0 and k ist such that k*K <= seqlen
// f->start > k*K, where k>=0 and k is such that k*K <= seqlen
list<Feature>::iterator **lastStart;
vector<int> cumCovUTRpartPlus; // cumulative number of positions not covered by UTRpart hints on the plus strand
vector<int> cumCovUTRpartMinus; // cumulative number of positions not covered by UTRpart hints on the minus strand
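
Note on the `firstEnd` comment in `extrinsicinfo.hh` above: it describes a block index where, for each block `k` with `k*K <= seqlen`, an entry points at the first feature whose end is at least `k*K`. The idea can be sketched as below. This is not the AUGUSTUS implementation: the `Feature` struct is a minimal stand-in, `buildFirstEnd` is a hypothetical name, indices replace list iterators, and the features are assumed to be sorted by ascending end position.

```cpp
#include <cstddef>
#include <vector>

// Minimal stand-in for a feature; only the coordinates matter here.
struct Feature {
    int start;
    int end;
};

// Hypothetical helper: for every block k with k*K <= seqlen, stores the index
// of the first feature f with f.end >= k*K (features.size() if none exists).
// Assumes K > 0 and that features are sorted by ascending end position.
std::vector<std::size_t> buildFirstEnd(const std::vector<Feature>& features,
                                       int seqlen, int K) {
    std::vector<std::size_t> firstEnd;
    std::size_t i = 0;
    for (int k = 0; k * K <= seqlen; ++k) {
        // Skip features that end before the start of block k.
        while (i < features.size() && features[i].end < k * K) {
            ++i;
        }
        firstEnd.push_back(i);
    }
    return firstEnd;
}
```

A lookup for features ending at or after position `p` can then start scanning at index `firstEnd[p / K]` instead of at the head of the list, which is presumably the purpose of the block size `K`.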
2 changes: 1 addition & 1 deletion retraining.html
@@ -178,7 +178,7 @@
in the case of <i>Tetrahymena</i>, where taa and tag are coding for glutamine (Q).


Choose the translation table number accoding to this table. translation_table=1 is
Choose the translation table number according to this table. translation_table=1 is
the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
<br><br>