Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix segmentation fault on invalid SMARTS #19

Open
wants to merge 9 commits into
base: main
Choose a base branch
from

Conversation

aleebberg
Copy link

Fixes #18

@@ -48,11 +48,17 @@ impl RWMol {
ROMol { ptr }
}

pub fn from_smarts(smarts: &str) -> Result<Self, Box<dyn std::error::Error>> {
pub fn from_smarts(smarts: &str) -> Result<Self, &str> {
if smarts.is_empty(){ return Err("empty SMARTS"); }
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I do not study chemistry myself, but informatics, I am not sure, if this is the wanted behavior or if you'd prefer a wildcard here or if maybe an empty string is fine.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably don't want to allow an empty smarts, so this works for me!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for contributing this PR! It's close but I think a little guidance would help. First off, it's good to reiterate the structure of this git repository. It contains two rust crates: rdkit and rdkit-sys. The sys crate is where pointers are built and where rust meets the C++ code of rdkit. You want all exception handling to happen in the sys crate. The rdkit crate is for high-level convenience. Every pointer type from rdkit-sys is wrapped in a struct -- with hidden pointers, as you've seen, we wan't to create a "nanny state" where our users can't do unsafe things. I believe you want to take a look at the sys crate as part of your bug fix.

Now for my feedback on this code snippet:

Using the Result type is absolutely the right way to go -- we need to discriminate between good values and error values. My only objection is the type on the error path, &str is not very Rust-y. We want to avoid "stringly-typed" data and instead use more enums.

See

pub fn from_smiles(smiles: &str) -> Result<Self, ROMolError> {
let_cxx_string!(smiles_cxx_string = smiles);
let ptr = ro_mol_ffi::smiles_to_mol(&smiles_cxx_string);
match ptr {
Ok(ptr) => {
if ptr.is_null() {
Err(ROMolError::UnknownConversionError)
} else {
Ok(ROMol { ptr })
}
}
Err(e) => Err(ROMolError::ConversionException(e.to_string())),
}
}
for an example of how we build enum errors when working with SMILES. I believe you want to make a RWMolError enum and instantiate it in your error path.

My next concern is performing condition checking in rw_mol.rs itself -- in general I want rdkit (c++) to drive what happens. It's very possible in the future our friends at the RDKit project could change how empty strings behave (although unlikely) and I don't want the Rust logic to be at odds with whatever they decide. Instead, check if the returned pointer is null and if so, build the appropriate enum.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for the detailed feedback! You are totally right about the enums and I am going to adapt it to the handling of errors in ro_mol.rs.
Concerning the empty SMARTS, I checked what happens in Python and they return a valid Mol object that doesn't have any contents (so not a nullpointer, but empty). So maybe we should go for that, too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer to model what the C++ library does -- it does mean that python folks are going to have to rethink how they use rdkit. The rdkit-rs implementation puts errors upfront and in your face. Clear error propagation is important for our users of the the cheminee search engine, for example. We don't want too much "magic".

So think of this project as C++ flavored RDKit 😄. Hopefully you are familiar with the autogenerated rdkit docs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had a deeper look at the rdkit-Documentation, though I must say I have never dealt with C++. I still found an interesting comment in the actual C++ code stating: // empty strings produce empty molecules. So maybe having an error in Rust at this part would not really fit?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, please address my last comment and I'm happy to merge this and push it up to crates.io. Thanks for your perseverance and patience!

@JJ-Pineda JJ-Pineda self-requested a review December 6, 2023 15:24
@JJ-Pineda
Copy link
Contributor

Thank you @aleebberg for finding this! We have been fixing pointer issues lately and looks like we missed this one. @xrl I'm going to merge this as it seems to be a good stop-gap measure, but we probably want to look into what's happening to the pointer in rdkit.

@@ -48,11 +48,17 @@ impl RWMol {
ROMol { ptr }
}

pub fn from_smarts(smarts: &str) -> Result<Self, Box<dyn std::error::Error>> {
pub fn from_smarts(smarts: &str) -> Result<Self, &str> {
if smarts.is_empty(){ return Err("empty SMARTS"); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we probably don't want to allow an empty smarts, so this works for me!

@JJ-Pineda
Copy link
Contributor

@aleebberg Clippy is mainly complaining about a space before an "=" sign. Would you mind adding that?

Also, the failed test looks to be related to an issue we introduced on our side (i.e. making a mutable romol). Feel free to add that "mut" if you like.

@@ -48,11 +48,17 @@ impl RWMol {
ROMol { ptr }
}

pub fn from_smarts(smarts: &str) -> Result<Self, Box<dyn std::error::Error>> {
pub fn from_smarts(smarts: &str) -> Result<Self, &str> {
if smarts.is_empty(){ return Err("empty SMARTS"); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you so much for contributing this PR! It's close but I think a little guidance would help. First off, it's good to reiterate the structure of this git repository. It contains two rust crates: rdkit and rdkit-sys. The sys crate is where pointers are built and where rust meets the C++ code of rdkit. You want all exception handling to happen in the sys crate. The rdkit crate is for high-level convenience. Every pointer type from rdkit-sys is wrapped in a struct -- with hidden pointers, as you've seen, we wan't to create a "nanny state" where our users can't do unsafe things. I believe you want to take a look at the sys crate as part of your bug fix.

Now for my feedback on this code snippet:

Using the Result type is absolutely the right way to go -- we need to discriminate between good values and error values. My only objection is the type on the error path, &str is not very Rust-y. We want to avoid "stringly-typed" data and instead use more enums.

See

pub fn from_smiles(smiles: &str) -> Result<Self, ROMolError> {
let_cxx_string!(smiles_cxx_string = smiles);
let ptr = ro_mol_ffi::smiles_to_mol(&smiles_cxx_string);
match ptr {
Ok(ptr) => {
if ptr.is_null() {
Err(ROMolError::UnknownConversionError)
} else {
Ok(ROMol { ptr })
}
}
Err(e) => Err(ROMolError::ConversionException(e.to_string())),
}
}
for an example of how we build enum errors when working with SMILES. I believe you want to make a RWMolError enum and instantiate it in your error path.

My next concern is performing condition checking in rw_mol.rs itself -- in general I want rdkit (c++) to drive what happens. It's very possible in the future our friends at the RDKit project could change how empty strings behave (although unlikely) and I don't want the Rust logic to be at odds with whatever they decide. Instead, check if the returned pointer is null and if so, build the appropriate enum.

@xrl
Copy link
Contributor

xrl commented Dec 7, 2023

I enabled github actions for this PR, now you can see the formatting errors here: https://github.com/rdkit-rs/rdkit/actions/runs/7128018658/job/19420335878?pr=19#step:7:1211 .

src/graphmol/rw_mol.rs Outdated Show resolved Hide resolved
@aleebberg aleebberg requested a review from xrl December 27, 2023 10:22
@aleebberg
Copy link
Author

Hi,
@xrl I was just wondering if you already had a second look at my PR. I think I might need your permission for github actions as well. It would be great to merge it since I need the PR for benchmarking the code I implemented for my master thesis. :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Segmentation fault on invalid SMARTS in RWMol::from_smarts()
3 participants