fix various typos

Gaius-Augustus · Nov 7, 2018 · 3d17cc2
1 parent aee62ae
Showing 30 changed files with 95 additions and 95 deletions.
diff --git a/augustus-training/dynamic/about.html b/augustus-training/dynamic/about.html
@@ -35,7 +35,7 @@ <h3>Gene prediction with AUGUSTUS</h3>
 <p align="center"><a href="images/AUG.cDNA.gif"><img src="images/AUG.cDNA.gif"  alt="with cDNA" width="70%"></a><br><i>Click on image to enlarge!</i></p>
 </li>
 
-<li> AUGUSTUS ususally belongs to the most accurate programs for the species it is trained
+<li> AUGUSTUS usually belongs to the most accurate programs for the species it is trained
 for. Often it is the most accurate <i>ab initio</i> program. For example, at the independent gene finder
 assessment (EGASP) on the human ENCODE regions AUGUSTUS was the most accurate gene finder among the
 tested <i>ab initio</i> programs. At the more recent nGASP (worm), it was among the best in the <i>ab initio</i>

diff --git a/auxprogs/bam2hints/bam2hints.cc b/auxprogs/bam2hints/bam2hints.cc
@@ -1069,7 +1069,7 @@ int main(int argc, char* argv[])
     // TASK: else store alignment pointer into map
     // TASK: set new or reuse "pal"
 
-  } // end while (parsing through bam aligment lines)
+  } // end while (parsing through bam alignment lines)
 
 
   // newline after display of the line count

diff --git a/config/model/constraints_shadow_partial_utr.txt b/config/model/constraints_shadow_partial_utr.txt
@@ -63,10 +63,10 @@
 32
 36
 # --------------------------------------------------------------------
-# This section is for seting constraints between transition probabilities,
+# This section is for setting constraints between transition probabilities,
 # such as suggested by symmetry such as strand symmetry (or by treating transitions in
 # all reading frames the same). Theoretically, i.e. with infinite and
-# representative traininig data, this should not be necessary. However, in the real finite
+# representative training data, this should not be necessary. However, in the real finite
 # world this is a little safeguard against overfitting.
 #
 [BINDINGS]

diff --git a/docs/CDS.sp.README b/docs/CDS.sp.README
@@ -5,6 +5,6 @@ for T in 0 1 2 3 4 5 6 7; do
 done
 
 The graph shows the posterior probability reported in the 6th column of the augustus output gff file versus the actual fraction of exons (CDS) that
-match the reference annotation (RefSeq). The esimated curves for t=5,6,7 are not reliable for high posterior probabilities as they occurr too rarely.
+match the reference annotation (RefSeq). The estimated curves for t=5,6,7 are not reliable for high posterior probabilities as they occur too rarely.
 For large temperatures exons with high posterior probability become increasingly rare. We recommend t=3 as it comes relatively close to the ideal curve,
-the diagonal, for the most relavant range of posterior probabilities.
+the diagonal, for the most relevant range of posterior probabilities.
diff --git a/docs/espoca/README b/docs/espoca/README
@@ -2,7 +2,7 @@ ESPOCA - Estimate Selective Pressure on Codon Alignments
 
 This directory contains example input files:
 example.fa    codon alignment file in multi fasta format
-tree.nwk      phylogenetik tree file in newick format with branch length
+tree.nwk      phylogenetic tree file in newick format with branch length
 
 Example Command:
 
@@ -21,7 +21,7 @@ Output:
 # 4. Pr(w>1)     probability of omega > 1 at alipos (*: Pr(w>1) > 0.90, **: Pr(w>1) > 0.95)
 # 5. post_mean   posterior mean estimate of omega at ali_pos
 # 6. SE_for_w    standard deviation of omega at ali_pos
-# 7. num_subst   number of subsitution calculated by the Fitch algorithm
+# 7. num_subst   number of substitution calculated by the Fitch algorithm
 
    ali_pos   ref_pos    AS_ref   Pr(w>1) post_mean  +-  SE_for_w num_subst
          0         0         M  0.249188  0.814206  +-  0.362779         0

diff --git a/docs/tutorial/scipio.html b/docs/tutorial/scipio.html
@@ -39,7 +39,7 @@ <h1>Using Scipio to create a (training) set of gene structures</h1>
 the proteins encoded by the genome (e.g. >90% similarity). One use case is the migration of gene structures from one assembly to another assembly. Another use case
 is the identification of exon coordinates for protein of the same species as the genome when the gene structures (GFF) is not available (anymore). A third use case
 is the mapping of protein sequences from a related known species to a new species, e.g. from human to Orangutan. Scipio is well-suited for draft assemblies with short
-contigs, as it is assemblying alignments of proteins where different fragments match different contigs.
+contigs, as it is assembling alignments of proteins where different fragments match different contigs.
 </div><br>
 
 <h2>1. Run Scipio</h2>

diff --git a/docs/tutorial/training.html b/docs/tutorial/training.html
@@ -375,7 +375,7 @@ <h2 id="optimize">4. RUN THE SCRIPT <tt>optimize_augustus.pl</tt></h2>
 in the case of <i>Tetrahymena</i>, where <tt>taa</tt> and <tt>tag</tt> are coding for glutamine (Q).
 
 <p>
-Choose the translation table number accoding to this table. translation_table=1 is 
+Choose the translation table number according to this table. translation_table=1 is 
 the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
 In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
 </p>

diff --git a/docs/tutorial2015/index.html b/docs/tutorial2015/index.html
@@ -66,7 +66,7 @@ <h3>Exercise 1: <span class="assignment">Compile a Training Set</span></h3>
 to estimate the parameters of gene finders. We will here go through option 6.
 We assume that we have RNA-Seq data only and no substantial homology data. We will reuse an existing parameter set for AUGUSTUS.<br>
 <ol>
-<li><span class="assignment">Follow the tutorial on <a href="ittrain.html">"Iteratative Training Set Construction"</a></span>
+<li><span class="assignment">Follow the tutorial on <a href="ittrain.html">"Iterative Training Set Construction"</a></span>
 and create a training set <tt>genes.gb</tt>.
 <li><span class="assignment">Partition <tt>genes.gb</tt></span> into a training set and a holdout test setas described in <a href="training.html#split">1.2 Split gene structure set...</a>.
 </ol>

diff --git a/docs/tutorial2015/ittrain.html b/docs/tutorial2015/ittrain.html
@@ -268,7 +268,7 @@ <h2>5. Create a training set</h2>
 and to include genes in the "intergenic" region.
 The choice of the parameter <tt>max-size-of-gene-flanking-DNA</tt> is important for several reasons.
 <ul>
-<li> The flanking region should be large enough to allow a prepresentative estimate of non coding regions.
+<li> The flanking region should be large enough to allow a representative estimate of non coding regions.
 When the GFF file only contains CDS, then part of the flanking regions are UTR and may not be representative of
 intergenic region (e.g. CpG islands in vertebrates).</li>
 <li>Usually the gff file is not complete and genes are missing from it. In that case the flanking regions may contain genic regions

diff --git a/docs/tutorial2015/prediction.html b/docs/tutorial2015/prediction.html
@@ -186,7 +186,7 @@ <h3 id="hest">3.1 From ESTs</h2>
 <pre class="code">cat est.psl | filterPSL.pl --best --minCover=80 > est.f.psl</pre>
 
 <span class="result"><tt>est.f.psl</tt></span> now only contains for each query
-the best alginment(s) and that only if it covers at least 80% of the query length.
+the best alignment(s) and that only if it covers at least 80% of the query length.
 This reduces the number of alignments:
 <pre class="code">
 wc -l est.psl est.f.psl

diff --git a/docs/tutorial2015/training.html b/docs/tutorial2015/training.html
@@ -376,7 +376,7 @@ <h2 id="optimize">4. RUN THE SCRIPT <tt>optimize_augustus.pl</tt></h2>
 in the case of <i>Tetrahymena</i>, where <tt>taa</tt> and <tt>tag</tt> are coding for glutamine (Q).
 
 <p>
-Choose the translation table number accoding to this table. translation_table=1 is 
+Choose the translation table number according to this table. translation_table=1 is 
 the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
 In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
 </p>

diff --git a/docs/tutorial2018/README_augustus.TXT b/docs/tutorial2018/README_augustus.TXT
@@ -721,7 +721,7 @@ HS04636	blat2hints	exonpart	500	599	.	+	.	priority=2; source=E
 HS04636	blat2hints	intron		550	650	.	+	.	priority=5; source=mRNA
 
 When two hints or hint groups contradict each other then the hints with the lower priority number
-are ignored. This is especially usefull if for a genome several sources of hints are available,
+are ignored. This is especially useful if for a genome several sources of hints are available,
 where one source should be trusted when in doubt. For example, the rhesus macaque currently has few native ESTs
 but human ESTs often also align to rhesus. Giving the hints from native ESTs a higher priority means
 that AUGUSTUS uses only them for genes with support by native ESTs and uses the alien EST alignments

diff --git a/docs/tutorial2018/index.html b/docs/tutorial2018/index.html
@@ -12,7 +12,7 @@ <h1>Using BRAKER2 and AUGUSTUS</h1>
 <h2>General remarks</h2>
 
 <ul>
-  <li> This tutorial is designed in a way that persons with no exprience in Linux should be able to follow. In case someone gets bored, he or she can try to get the code of this tutorial on his own laptop running.
+  <li> This tutorial is designed in a way that persons with no experience in Linux should be able to follow. In case someone gets bored, he or she can try to get the code of this tutorial on his own laptop running.
   <li> Here some manuals:
   <ul>
     <li> <a href="README_augustus.TXT">AUGUSTUS readme</a>
@@ -164,7 +164,7 @@ <h2>1. Repeat-mask the genome </h2>
 RepeatScout -sequence data/genome.fa -output masking/genome.repseq.fa -freq masking/genome.freq   # takes ~30s
 </pre>
 
-The file <tt>masking/genome.freq</tt> contains a list of ostensibe repeat sequences.
+The file <tt>masking/genome.freq</tt> contains a list of ostensible repeat sequences.
 
 <pre class="code">
 head -n 100 masking/genome.freq
@@ -360,7 +360,7 @@ <h2>3. Gene prediction </h2>
 The options in the BRAKE call have the follownig meanings:
 <ul>
   <li> <tt>--species</tt>: The name of the species on which we carry out our gene prediction. <br> 
-  <i>The parameters AUGUSTUS infers from the training data and uses to parametrize its internal model for the gene prediciton is located in AUGUSTUS_CONFIG_PATH/species/[value of --species]. The term "species" is used in this option although AUGUSTUS can also be used for gene predictions on different strains of the same species.</i>
+  <i>The parameters AUGUSTUS infers from the training data and uses to parametrize its internal model for the gene prediction is located in AUGUSTUS_CONFIG_PATH/species/[value of --species]. The term "species" is used in this option although AUGUSTUS can also be used for gene predictions on different strains of the same species.</i>
   <li> <tt>--genome</tt>: The genomic data on which the gene prediction is carried out.
   <li> <tt>--bam</tt>: The BAM file containign the aligned RNA-seq reads used for inference of the training genes by GeneMark and as hints for the gene prediction by AUGUSTUS
   <li> <tt>--softmasking</tt>: Flag indicating that the genome is softmasked. <br>
@@ -369,7 +369,7 @@ <h2>3. Gene prediction </h2>
   <i>One should not use the parameters AUGUSTUS infers without optimization to generate a gene set used in a scientific project. We skip optimization here because it is very time-consuming</i>
   <li> <tt>--cores</tt>: The number of cores used by BRAKER and AUGUSTUS. <br>
   <i> To reduce running time, once can increase the number of cores. Nevertheless, for this session this is probably not advisable as the genotoul server only has 32 cores. </i>
-  <li> <tt>--AUGUSTUS_SCRIPTS_PATH</tt>, <tt>--AUGUSTUS_BIN_PATH</tt>, <tt>--AUGUSTUS_CONFIG_PATH</tt>: These pathes specify where to find various files and executables. <br>
+  <li> <tt>--AUGUSTUS_SCRIPTS_PATH</tt>, <tt>--AUGUSTUS_BIN_PATH</tt>, <tt>--AUGUSTUS_CONFIG_PATH</tt>: These paths specify where to find various files and executables. <br>
   <i>When you install BRAKER and AUGUSTUS on a computer for which you have administrator rights, you very probably will not need to set these paths.</i>
 </ul>
 <br>

diff --git a/include/consensus.hh b/include/consensus.hh
@@ -52,10 +52,10 @@ class consensus{
    * no_iterations specifies the number of consensus patterns we need to store for the output
    * delta is used to calculate whether a certain string is relevant or not
    * p_value is the threshold used to identify the significant strings
-   * powers is the array used to store the powers 4 for caculating the character to int conversion and vice versa
-   * tpm is the transition probabiliry matrix
+   * powers is the array used to store the powers 4 for calculating the character to int conversion and vice versa
+   * tpm is the transition probability matrix
    * var is the vector of structure sequence containing all the input sequences
-   * t_values is a flag array used to store the actuall frequencies of the patterns
+   * t_values is a flag array used to store the actual frequencies of the patterns
    * f_values stores the actual frequencies of the patterns
    * back_probs store the background probabilities
    * significant_strings and relevant_strings store the significant and relevant strings respectively
@@ -72,7 +72,7 @@ class consensus{
   string analyse_pattern;
 
   /* consensus_data stores the consensus patterns as their corresponding integers
-   * final_list is a vector of struccture histogram_data
+   * final_list is a vector of structure histogram_data
    * max_string_length is the maximum length of the input string sequences
    * max_freq is the maximum frequency of occurrence at a particular position and is used to scale the histogram
    */
@@ -93,7 +93,7 @@ public:
   /* takes the file name to store the sequences
    */
   void set_file_name(string filename);
-  /* adds a particular string to consider for caculation
+  /* adds a particular string to consider for calculation
    */
   void add_string(string fileline);
   /* takes as input the pattern that we want to analyse and also the required parameters and does all the necessary 
@@ -116,7 +116,7 @@ public:
   /* plots the histogram
    */
   void plot_histogram();
-  /* returns the consensus pattern and other reltaed information
+  /* returns the consensus pattern and other related information
    */
   vector<histogram_data> get_consensus();
   /* prints the consensus pattern and other related information
@@ -128,7 +128,7 @@ public:
   /* returns n choose r
    */
   int nCr(int n, int r);
-  /* converts a character from A C G T to coresponding number value 0 1 2 3 respectively
+  /* converts a character from A C G T to corresponding number value 0 1 2 3 respectively
    */
   int char2num(char m);
   /* converts the numbers to corresponding character
@@ -139,7 +139,7 @@ public:
    * performing the calculations and the vector where the new neighbours is to be stored
    */
   void k_mismatch(int new_m, int old_m, int k, int length,Seq2Int s2i,vector<int> &neighbours_index);
-  /* returns a vector containing k mismatch neighbours of pattern with interger value m and length as specified
+  /* returns a vector containing k mismatch neighbours of pattern with integer value m and length as specified
    */
   vector<int> find_neighbours(int m,int k, int length);
 };

diff --git a/include/exon_seg.hh b/include/exon_seg.hh
@@ -29,7 +29,7 @@ const long int minus_infinity=LONG_MIN;
 const int maxcov =10000;
 #define NUMSTATES 4
 /* EXONP denotes positive exon EXONM denotes the negative exon INTRONJ and INTRONI denote the 
- * J and I lables of introns Together these four are teh states that we are considering
+ * J and I labels of introns Together these four are the states that we are considering
  * in our HMM
  */
 const int EXONP =0;
@@ -41,18 +41,18 @@ const int INTRONJ =3;
 const int STRANDP =0;
 const int STRANDM =1;
 const int STRANDB =2;
-/* pott_gamma denotes the gamma value that we will be using in calulating the
+/* pott_gamma denotes the gamma value that we will be using in calculating the
  * Pott's functionals
  */
 const int pott_gamma =1500;
 /* moving_window stores the window size that we will be using while finding the 
  * splice sites using the template matching technique
  */
 const int moving_window=70;
-/* The convergence limit that we use for calculating the labda values using the iterative technique
+/* The convergence limit that we use for calculating the lambda values using the iterative technique
  */
 const double convergence_limit=0.01;
-/* L stores the threshold betweeen introns and exons which we use when we try to estimate the
+/* L stores the threshold between introns and exons which we use when we try to estimate the
  * the distribution using the train_function which calculates the distribution using the exons
  * predicted from a gff file
  */
@@ -134,7 +134,7 @@ public:
   /* reads the file and stores it in the the object of the class dataset
    */
   void read_file(dataset &coverage_info,string filename);
-  /* it takes as input pointer to a 2-D vecctor, 
+  /* it takes as input pointer to a 2-D vector, 
    * the input dataset for a chromosome and
    * stores the emission probability values in it
    */
@@ -150,7 +150,7 @@ public:
   /* function to convert the state sequence into segments and also to store the avg coverage depth
    */
   vector< fragment > segment( vector<int> state_seq,vector< vector< vector<int> > > &input_set,int include_intron,int no_of_tracks );
-  /* This function takes input the averrage coverage depth and transforms them
+  /* This function takes input the average coverage depth and transforms them
    * into some usable form in the pott's functional
    */
   double pott_convert(double d);

diff --git a/include/extrinsicinfo.hh b/include/extrinsicinfo.hh
@@ -55,7 +55,7 @@ public:
 };
 
 /**
- * @brief Plan of individiual prediction steps with each a range and a set of hint groups turned on/off
+ * @brief Plan of individual prediction steps with each a range and a set of hint groups turned on/off
  * 
  * @author Mario Stanke
  */
@@ -235,10 +235,10 @@ public:
     int              seqlen;
     int              K; // block size for firstEnd and lastStart
     // firstEnd[t][k] holds an iterator to featureList[t] to the first element f in the list such that
-    // f->end >= k*K, where k>=0 and k ist such that k*K <= seqlen
+    // f->end >= k*K, where k>=0 and k is such that k*K <= seqlen
     list<Feature>::iterator **firstEnd;
     // lastStart[t][k] holds an iterator to featureList[t] to the list such that all following list elements f have
-    // f->start > k*K, where k>=0 and k ist such that k*K <= seqlen 
+    // f->start > k*K, where k>=0 and k is such that k*K <= seqlen 
     list<Feature>::iterator **lastStart;
     vector<int> cumCovUTRpartPlus; // cumulative number of positions not covered by UTRpart hints on the plus strand
     vector<int> cumCovUTRpartMinus; // cumulative number of positions not covered by UTRpart hints on the minus strand

diff --git a/retraining.html b/retraining.html
@@ -178,7 +178,7 @@
 in the case of <i>Tetrahymena</i>, where taa and tag are coding for glutamine (Q).
 
 
-Choose the translation table number accoding to this table. translation_table=1 is 
+Choose the translation table number according to this table. translation_table=1 is 
 the default value and the standard with stop codons taa, tga, tag. If you have a species with the standard genetic code you don't have to do anything.
 In case your species' code is not covered by this table send us a note with the string of 64 one-letter amino acid codes in the codon order below.
 <br><br>