Can you provide a link to what these descriptors actually are? #91

Open
ErikCVik opened this issue Nov 20, 2020 · 2 comments

Comments

@ErikCVik

For example, what are the matrix aggregating methods, and could you describe each one in more detail?
Are these descriptors similar to chemical fingerprints, or not at all?

@plkx

plkx commented Jan 22, 2021

Start here:

https://www.rdkit.org/docs/GettingStartedInPython.html#list-of-available-descriptors

In essence, they are not fingerprints. Many molecular structures can share identical descriptor values (in some cases, infinitely many molecules), which is known as descriptor degeneracy, whereas fingerprints are designed with the intent of avoiding degeneracy.
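
To make degeneracy concrete, here is a minimal sketch in plain Python. It does not use RDKit or Mordred; the helper names `simple_descriptors` and `degree_sequence` are hypothetical, and molecules are assumed to be given as hydrogen-suppressed bond lists with atoms numbered from 0. Two different structures (n-butane and isobutane) get the same values for two very simple descriptors, while a slightly richer description tells them apart.

```python
def simple_descriptors(n_atoms, bonds):
    """Return (heavy-atom count, bond count) for a hydrogen-suppressed graph."""
    return (n_atoms, len(bonds))


def degree_sequence(n_atoms, bonds):
    """Return the sorted list of how many neighbours each atom has."""
    degrees = [0] * n_atoms
    for i, j in bonds:
        degrees[i] += 1
        degrees[j] += 1
    return sorted(degrees)


# Carbon skeletons only (H atoms ignored), atoms numbered from 0.
n_butane = (4, [(0, 1), (1, 2), (2, 3)])   # C-C-C-C chain
isobutane = (4, [(0, 1), (0, 2), (0, 3)])  # central C with three neighbours

# Degenerate: both structures give the same descriptor values.
print(simple_descriptors(*n_butane), simple_descriptors(*isobutane))

# A richer description separates the two structures.
print(degree_sequence(*n_butane), degree_sequence(*isobutane))
```

The atom and bond counts cannot distinguish the two isomers, which is the degeneracy described above; the degree sequence can, but it would in turn be degenerate for other pairs of molecules.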

As for "matrix aggregating":
Many molecular descriptors are invariant properties determined from structure-derived graphs, or from other representations in the form of matrices. Matrices facilitate the derivation of invariant properties. As a rough illustration:

Consider any hydrocarbon, CnHm. Ignore the H atoms. Create a table (matrix) whose rows and columns are both numbered 1 through n, one per carbon atom, so the matrix is square with dimensions n x n. Different systems (rules) are used to populate the table, e.g. entry (i, j) = 1 when Ci is bonded to Cj, and 0 otherwise (this is the adjacency matrix, A). Subtract an unknown x from each diagonal entry and set the determinant, which is a scalar value, to zero: det(A - xI) = 0. Expanding the determinant gives a polynomial equation of degree n in x (the characteristic equation). Its n roots are the eigenvalues of the matrix, and they are invariant: they do not depend on how the atoms were numbered. One well-known rule system (Hückel theory) populates the matrix so that the eigenvalues correspond to the molecular orbital energies, and the associated eigenvectors give the atomic orbital coefficients of each molecular orbital as a linear combination of atomic orbitals (LCAO).
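
The procedure can be sketched in plain Python as follows. This is an illustration under stated assumptions, not how RDKit or Mordred compute anything: the function names (`adjacency_matrix`, `mat_mul`, `char_poly`) are hypothetical, atoms are numbered from 0, and the determinant is expanded with the Faddeev-LeVerrier recurrence, chosen here only because it needs no external libraries. The example uses the carbon skeleton of butadiene and shows that renumbering the atoms leaves the characteristic polynomial, and therefore its roots (the eigenvalues), unchanged.

```python
from fractions import Fraction


def adjacency_matrix(n_atoms, bonds):
    """Build the n x n matrix with 1 where two carbons are bonded, else 0."""
    a = [[0] * n_atoms for _ in range(n_atoms)]
    for i, j in bonds:
        a[i][j] = a[j][i] = 1
    return a


def mat_mul(x, y):
    """Multiply two square matrices of the same size."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def char_poly(a):
    """Coefficients of det(xI - A), highest power of x first.

    Uses the Faddeev-LeVerrier recurrence with exact fractions.
    Assumes an integer matrix, so the coefficients are integers.
    """
    n = len(a)
    coeffs = [Fraction(1)]                       # leading coefficient of x^n
    m = [[Fraction(0)] * n for _ in range(n)]    # M_0 = zero matrix
    for k in range(1, n + 1):
        m = mat_mul(a, m)                        # M_k = A * M_(k-1) + c * I
        for i in range(n):
            m[i][i] += coeffs[-1]
        am = mat_mul(a, m)
        trace = sum(am[i][i] for i in range(n))
        coeffs.append(-trace / k)                # next coefficient
    return [int(c) for c in coeffs]


# Butadiene carbon skeleton, C0-C1-C2-C3 (H atoms ignored).
butadiene = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3)])

# The same chain with the atoms numbered differently: C2-C0-C3-C1.
renumbered = adjacency_matrix(4, [(2, 0), (0, 3), (3, 1)])

print(char_poly(butadiene))   # coefficients of x^4, x^3, x^2, x^1, x^0
print(char_poly(renumbered))  # identical, despite the different numbering
```

Both calls give the coefficients of x^4 - 3x^2 + 1. The two matrices differ entry by entry, but the polynomial does not, which is what "invariant" means here.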

Fundamentally, this is the procedure used for generating descriptors from matrices. One matrix may beget many varieties of additional matrices, transformations may be applied, relationships between the matrices may be evaluated, …
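
As a hedged sketch of "one matrix begets another", the plain-Python example below derives a topological distance matrix (number of bonds on the shortest path between each pair of atoms) from the same bond list that defines the adjacency matrix, and then aggregates it into a single number by summing over all atom pairs. That particular sum is known as the Wiener index; it is used here purely as an illustration and was not named in the discussion above. The function names are hypothetical, atoms are numbered from 0, and the molecule is assumed to be a single connected fragment.

```python
from collections import deque


def distance_matrix(n_atoms, bonds):
    """Shortest-path bond counts between all atom pairs (breadth-first search).

    Assumes the graph is connected.
    """
    neighbors = [[] for _ in range(n_atoms)]
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    dist = [[0] * n_atoms for _ in range(n_atoms)]
    for start in range(n_atoms):
        seen = {start}
        queue = deque([(start, 0)])
        while queue:
            atom, d = queue.popleft()
            dist[start][atom] = d
            for nb in neighbors[atom]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append((nb, d + 1))
    return dist


def wiener_index(dist):
    """Aggregate a distance matrix into one scalar: sum over atom pairs."""
    n = len(dist)
    return sum(dist[i][j] for i in range(n) for j in range(i + 1, n))


n_butane = distance_matrix(4, [(0, 1), (1, 2), (2, 3)])
isobutane = distance_matrix(4, [(0, 1), (0, 2), (0, 3)])

print(wiener_index(n_butane))   # linear chain
print(wiener_index(isobutane))  # branched isomer
```

Summing is only one possible aggregation. Taking eigenvalues, maxima, or other functions of the same matrix gives further descriptors, each with its own pattern of degeneracy.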

Such descriptors may seem abstract or ethereal because of the apparent lack of physical meaning in relation to chemical properties. However, such a perspective is no more valid than the belief that eigenvalues we call quantum numbers reflect physical phenomena such as spin, or angular momentum.

Their value is not in some physical relation to a physical property or activity, but in their incorporation into models with sufficient prediction quality to make them useful.

All models are wrong, but some are useful.

Regards,

plkx

@DocMinus

DocMinus commented Sep 1, 2021

A bit late perhaps, but here is the actual Mordred descriptor overview:
http://mordred-descriptor.github.io/documentation/master/descriptors.html
