Automated Docs Update
github-actions committed Feb 3, 2022
1 parent 2ae8fe9 commit f0c0f71
Showing 11 changed files with 50 additions and 42 deletions.
Binary file modified docs/README.doctree
Binary file not shown.
17 changes: 9 additions & 8 deletions docs/README.html
@@ -152,13 +152,13 @@ <h3>Currently Implemented Functionalities<a class="headerlink" href="#currently-
Step 7: Define a score which maximizes the value in Step 5 and minimizes the value in Step 6.
Step 8: Iterate Steps 1 – 7 to select the featurization scheme and similarity measure to maximize the result of Step 7.</p></li>
<li><p>See Property Variation with Similarity: Visualize the correlation in the QoI between nearest neighbor molecules (most similar pairs in the molecule set) and between the furthest neighbor molecules (most dissimilar pairs in the molecule set). This is used to verify that the chosen measure is appropriate for the task.</p></li>
<li><p>Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set.</p></li>
<li><p>Compare Target Molecule to Molecule Set&lt;: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.</p></li>
<li><p>Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set. Embed the molecule set in 2D space using using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].</p></li>
<li><p>Compare Target Molecule to Molecule Set: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.</p></li>
<li><p>Cluster Data: Cluster the molecule set. The following algorithms are implemented:</p></li>
</ol>
<p>For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[4] (hierarchical clustering).</p>
<p>For binary fingerprints: Complete, single and average linkage hierarchical clustering[4].</p>
<p>The clustered data is plotted in two dimensions using multi-dimensional scaling[5].</p>
<p>For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[6] (hierarchical clustering).</p>
<p>For binary fingerprints: Complete, single and average linkage hierarchical clustering[5].</p>
<p>The clustered data is plotted in two dimensions using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].</p>
<ol class="arabic simple">
<li><p>Outlier Detection: Using an isolation forest, check for which molecules are potentially novel or are outliers according to the selected descriptor. Output can be directly to the command line by specifiying <code class="docutils literal notranslate"><span class="pre">output</span></code> to be <code class="docutils literal notranslate"><span class="pre">terminal</span></code> or to a text file by instead providing a filename.</p></li>
</ol>
@@ -206,9 +206,10 @@ <h2>License<a class="headerlink" href="#license" title="Permalink to this headli
<h2>Works Cited<a class="headerlink" href="#works-cited" title="Permalink to this headline"></a></h2>
<p>[1] Collins, K. and Glorius, F., A robustness screen for the rapid assessment of chemical reactions. Nature Chem 5, 597–601 (2013). <a class="reference external" href="https://doi.org/10.1038/nchem.1669">https://doi.org/10.1038/nchem.1669</a></p>
<p>[2] Chen, Y., Murray, P.R.D., Davies, A.T., and Willis M.C., J. Am. Chem. Soc. 140 (28), 8781-8787 (2018). <a class="reference external" href="https://doi.org/10.1021/jacs.8b04532">https://doi.org/10.1021/jacs.8b04532</a></p>
<p>[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. (Springer Series in Statistics). 2009.</p>
<p>[4] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). <a class="reference external" href="https://doi.org/10.1002/widm.53">https://doi.org/10.1002/widm.53</a></p>
<p>[5] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). 2005.</p>
<p>[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).</p>
<p>[4] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications, Springer Series in Statistics (2005).</p>
<p>[5] van der Maaten, L.J.P. and Hinton, G.E., Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579-2605 (2008).</p>
<p>[6] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). <a class="reference external" href="https://doi.org/10.1002/widm.53">https://doi.org/10.1002/widm.53</a></p>
</section>
</section>

18 changes: 10 additions & 8 deletions docs/README.rst
@@ -102,19 +102,19 @@ Currently Implemented Functionalities
See Property Variation with Similarity: Visualize the correlation in the QoI between nearest neighbor molecules (most similar pairs in the molecule set) and between the furthest neighbor molecules (most dissimilar pairs in the molecule set). This is used to verify that the chosen measure is appropriate for the task.

#.
Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set.
Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set. Embed the molecule set in 2D space using using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].

#.
Compare Target Molecule to Molecule Set<: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.
Compare Target Molecule to Molecule Set: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.

#.
Cluster Data: Cluster the molecule set. The following algorithms are implemented:

For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[4] (hierarchical clustering).
For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[6] (hierarchical clustering).

For binary fingerprints: Complete, single and average linkage hierarchical clustering[4].
For binary fingerprints: Complete, single and average linkage hierarchical clustering[5].

The clustered data is plotted in two dimensions using multi-dimensional scaling[5].
The clustered data is plotted in two dimensions using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].


#. Outlier Detection: Using an isolation forest, check for which molecules are potentially novel or are outliers according to the selected descriptor. Output can be directly to the command line by specifiying ``output`` to be ``terminal`` or to a text file by instead providing a filename.
@@ -181,8 +181,10 @@ Works Cited

[2] Chen, Y., Murray, P.R.D., Davies, A.T., and Willis M.C., J. Am. Chem. Soc. 140 (28), 8781-8787 (2018). https://doi.org/10.1021/jacs.8b04532

[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. (Springer Series in Statistics). 2009.
[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).

[4] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
[4] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications, Springer Series in Statistics (2005).

[5] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). 2005.
[5] van der Maaten, L.J.P. and Hinton, G.E., Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579-2605 (2008).

[6] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
Binary file modified docs/_build/doctrees/README.doctree
Binary file not shown.
Binary file modified docs/_build/doctrees/environment.pickle
Binary file not shown.
17 changes: 9 additions & 8 deletions docs/_build/html/README.html
@@ -152,13 +152,13 @@ <h3>Currently Implemented Functionalities<a class="headerlink" href="#currently-
Step 7: Define a score which maximizes the value in Step 5 and minimizes the value in Step 6.
Step 8: Iterate Steps 1 – 7 to select the featurization scheme and similarity measure to maximize the result of Step 7.</p></li>
<li><p>See Property Variation with Similarity: Visualize the correlation in the QoI between nearest neighbor molecules (most similar pairs in the molecule set) and between the furthest neighbor molecules (most dissimilar pairs in the molecule set). This is used to verify that the chosen measure is appropriate for the task.</p></li>
<li><p>Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set.</p></li>
<li><p>Compare Target Molecule to Molecule Set&lt;: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.</p></li>
<li><p>Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set. Embed the molecule set in 2D space using using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].</p></li>
<li><p>Compare Target Molecule to Molecule Set: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.</p></li>
<li><p>Cluster Data: Cluster the molecule set. The following algorithms are implemented:</p></li>
</ol>
<p>For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[4] (hierarchical clustering).</p>
<p>For binary fingerprints: Complete, single and average linkage hierarchical clustering[4].</p>
<p>The clustered data is plotted in two dimensions using multi-dimensional scaling[5].</p>
<p>For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[6] (hierarchical clustering).</p>
<p>For binary fingerprints: Complete, single and average linkage hierarchical clustering[5].</p>
<p>The clustered data is plotted in two dimensions using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].</p>
<ol class="arabic simple">
<li><p>Outlier Detection: Using an isolation forest, check for which molecules are potentially novel or are outliers according to the selected descriptor. Output can be directly to the command line by specifiying <code class="docutils literal notranslate"><span class="pre">output</span></code> to be <code class="docutils literal notranslate"><span class="pre">terminal</span></code> or to a text file by instead providing a filename.</p></li>
</ol>
@@ -206,9 +206,10 @@ <h2>License<a class="headerlink" href="#license" title="Permalink to this headli
<h2>Works Cited<a class="headerlink" href="#works-cited" title="Permalink to this headline"></a></h2>
<p>[1] Collins, K. and Glorius, F., A robustness screen for the rapid assessment of chemical reactions. Nature Chem 5, 597–601 (2013). <a class="reference external" href="https://doi.org/10.1038/nchem.1669">https://doi.org/10.1038/nchem.1669</a></p>
<p>[2] Chen, Y., Murray, P.R.D., Davies, A.T., and Willis M.C., J. Am. Chem. Soc. 140 (28), 8781-8787 (2018). <a class="reference external" href="https://doi.org/10.1021/jacs.8b04532">https://doi.org/10.1021/jacs.8b04532</a></p>
<p>[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. (Springer Series in Statistics). 2009.</p>
<p>[4] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). <a class="reference external" href="https://doi.org/10.1002/widm.53">https://doi.org/10.1002/widm.53</a></p>
<p>[5] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). 2005.</p>
<p>[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).</p>
<p>[4] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications, Springer Series in Statistics (2005).</p>
<p>[5] van der Maaten, L.J.P. and Hinton, G.E., Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579-2605 (2008).</p>
<p>[6] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). <a class="reference external" href="https://doi.org/10.1002/widm.53">https://doi.org/10.1002/widm.53</a></p>
</section>
</section>

18 changes: 10 additions & 8 deletions docs/_build/html/_sources/README.rst.txt
@@ -102,19 +102,19 @@ Currently Implemented Functionalities
See Property Variation with Similarity: Visualize the correlation in the QoI between nearest neighbor molecules (most similar pairs in the molecule set) and between the furthest neighbor molecules (most dissimilar pairs in the molecule set). This is used to verify that the chosen measure is appropriate for the task.

#.
Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set.
Visualize Dataset: Visualize the diversity of the molecule set in the form of a pairwise similarity density and a similarity heatmap of the molecule set. Embed the molecule set in 2D space using using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].

#.
Compare Target Molecule to Molecule Set<: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.
Compare Target Molecule to Molecule Set: Run a similarity search of a molecule against a database of molecules (molecule set). This task can be used to identify the most similar (useful in virtual screening operations) or most dissimilar (useful in application that require high diversity such as training set design for machine learning models) molecules.

#.
Cluster Data: Cluster the molecule set. The following algorithms are implemented:

For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[4] (hierarchical clustering).
For arbitrary molecular features or similarity metrics with defined Euclidean distances: K-Medoids[3] and Ward[6] (hierarchical clustering).

For binary fingerprints: Complete, single and average linkage hierarchical clustering[4].
For binary fingerprints: Complete, single and average linkage hierarchical clustering[5].

The clustered data is plotted in two dimensions using multi-dimensional scaling[5].
The clustered data is plotted in two dimensions using principal component analysis (PCA)[3], multi-dimensional scaling[4], or TSNE[5].


#. Outlier Detection: Using an isolation forest, check for which molecules are potentially novel or are outliers according to the selected descriptor. Output can be directly to the command line by specifiying ``output`` to be ``terminal`` or to a text file by instead providing a filename.
@@ -181,8 +181,10 @@ Works Cited

[2] Chen, Y., Murray, P.R.D., Davies, A.T., and Willis M.C., J. Am. Chem. Soc. 140 (28), 8781-8787 (2018). https://doi.org/10.1021/jacs.8b04532

[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed. (Springer Series in Statistics). 2009.
[3] Hastie, T., Tibshirani R. and Friedman J., The Elements of statistical Learning: Data Mining, Inference, and Prediction, 2nd Ed., Springer Series in Statistics (2009).

[4] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
[4] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications, Springer Series in Statistics (2005).

[5] Borg, I. and Groenen, P.J.F., Modern Multidimensional Scaling: Theory and Applications (Springer Series in Statistics). 2005.
[5] van der Maaten, L.J.P. and Hinton, G.E., Visualizing High-Dimensional Data Using t-SNE. Journal of Machine Learning Research 9:2579-2605 (2008).

[6] Murtagh, F. and Contreras, P., Algorithms for hierarchical clustering: an overview. WIREs Data Mining Knowl Discov (2011). https://doi.org/10.1002/widm.53
2 changes: 1 addition & 1 deletion docs/_build/html/searchindex.js

Large diffs are not rendered by default.
