Missing human annotations #2

Open
mehradans92 opened this issue Oct 15, 2023 · 4 comments

@mehradans92

Hi @ardunn. I am trying to use the datasets you created, and it looks like many of the human annotations are missing from data/general_results. For example, many "completion" keys in the JSONL files have empty list values. How did you handle these samples when evaluating model performance? Thanks!

https://github.com/lbnlp/NERRE/tree/main/general_and_mofs/data/general_results/run_0.jsonl
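
For reference, here is a minimal sketch of how one can count these empty annotations (assuming each line of run_0.jsonl is a JSON object whose "completion" value is a serialized JSON list followed by an "END" stop sequence, as in the examples below):

```python
import json

# Count records whose human annotation ("completion") is an empty list.
# Assumes each line of run_0.jsonl is a JSON object whose "completion"
# value is a serialized JSON list followed by an "END" stop sequence.
empty, total = 0, 0
with open("run_0.jsonl") as f:
    for line in f:
        record = json.loads(line)
        annotation = json.loads(record["completion"].replace("END", "").strip())
        total += 1
        if not annotation:
            empty += 1

print(f"{empty}/{total} records have an empty human annotation")
```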


ardunn commented Oct 16, 2023

Hi @mehradans92

For the general materials task, the annotations contain no extracted entries for texts that do not mention a material (the root entity around which the other entities are linked). Take, for example, this line in run_2:

{"prompt": "Electrodynamics in superconductors explained by Proca equations\nA fully consistent model to study electrodynamics for superconductors in the stationary and non-stationary regimes has been developed based on Proca equations and a massive photon. In particular, this approach has been applied to study the electric field penetration depth in superconductors. The model shows a deviation from the charge contribution to an internal electric field compared to previous approaches.\n\n###\n\n", "completion": " []\n\nEND\n\n", "gpt3_completion": " []"}

There is no clear root material entity to which other recognizable materials-related terms (e.g., "superconductors") could be linked. We do not consider "superconductors" a material because no specific material entity is present. This is a design choice we made quite early on in developing the General-JSON models.

However, there may be errors: for example, a material clearly mentioned in the text with no corresponding root material entry in the annotation. That would be an annotation error. If you find any, please point them out and we can correct them!
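
As a rough sketch (an illustrative heuristic, not our evaluation procedure), one way to surface candidate records is to compare the human "completion" against the "gpt3_completion" in the same JSONL line and flag cases where the model extracted entities but the human annotation is empty:

```python
import json

def parse_completion(raw: str):
    # Strip the "END" stop sequence and parse the remaining JSON list,
    # e.g. " []\n\nEND\n\n" -> [].
    return json.loads(raw.replace("END", "").strip())

# Flag records where GPT-3 extracted entities but the human annotation
# is empty -- candidates for a missed root material entity.
with open("run_0.jsonl") as f:
    for lineno, line in enumerate(f, start=1):
        record = json.loads(line)
        human = parse_completion(record["completion"])
        model = parse_completion(record["gpt3_completion"])
        if model and not human:
            print(f"line {lineno}: model output non-empty, human annotation empty")
            print(f"  prompt starts with: {record['prompt'][:80]!r}")
```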

@mehradans92

Thanks for the explanation, @ardunn. I have another question, about the MOF dataset: I have noticed that MOF names and their chemical formulas are sometimes used interchangeably in the human annotations and sometimes not. Was there a reason for this?

Also, Andrew mentioned that there have been a few updates to your paper, so I was wondering if there's a way to get access to the latest version. Thanks a lot :)


ardunn commented Oct 23, 2023

Hi @mehradans92, everything should be updated, but do you have some specific examples where this is the case? I am tagging @Andrew-S-Rosen here as well.


Andrew-S-Rosen commented Oct 24, 2023

Additional details would be helpful.

That said, it's worth emphasizing that the notion of what constitutes a "name" vs. a "formula" can be rather unclear for MOFs. For instance, take HKUST-1: that's clearly a name, and the equivalent Cu3(btc)2 is clearly a formula. But what about Cu-BTC? I would argue that's a name, but one could potentially argue it's a formula. So there is certainly room for ambiguity here, although any clear errors should of course be addressed.
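
To make that boundary concrete, here is a toy heuristic (purely illustrative; this is not how the dataset was annotated): it labels a string a formula only when a stoichiometric digit directly follows an element or linker token, which catches Cu3(btc)2 but leaves both HKUST-1 and Cu-BTC looking like names:

```python
import re

def looks_like_formula(label: str) -> bool:
    # Call a label a formula if an element symbol or linker token
    # is immediately followed by a stoichiometric digit, e.g. "Cu3"
    # or "(btc)2". A digit after a hyphen (as in "HKUST-1") does
    # not count.
    return bool(re.search(r"[A-Za-z)]\d", label))

for mof in ["HKUST-1", "Cu3(btc)2", "Cu-BTC"]:
    print(f"{mof}: {'formula' if looks_like_formula(mof) else 'name'}")
```

Under this rule Cu-BTC falls on the "name" side, which matches my reading, but a different rule could just as easily put it on the formula side.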
