-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
LLM scraper #971
LLM scraper #971
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some initial comments. Didn't have a lot of time last week to look at this due to the ELIXIR All-Hands meeting
add_reference :events, :llm_object, foreign_key: true | ||
add_column :events, :open_science, :string, array: true, default: [] | ||
add_column :materials, :llm_processed, :bool, default: false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are these fields needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed materials for now.
lib/modules/llm_service.rb
Outdated
def post_process_func(event) | ||
response = process(event) | ||
event = unload_json(event, response) | ||
event.llm_object_attributes = llm_object_attributes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This overwrites the data from the "scrape" LLM interaction, right? Is there ever a need to retain the data for both?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently not I think. I'd like to still have data for the scrape in case we ever decide not to do post processing for whatever reason
Will try and review this next week. Had a busy week due to various projects wrapping up at the end of June |
test/test_helper.rb
Outdated
@@ -233,22 +235,22 @@ | |||
|
|||
def mock_biotools | |||
biotools_file = File.read("#{Rails.root}/test/fixtures/files/annotation.json") | |||
WebMock.stub_request(:get, /data.bioontology.org/). | |||
to_return(:status => 200, :headers => {}, :body => biotools_file) | |||
WebMock.stub_request(:get, /data.bioontology.org/) |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames High test
test/test_helper.rb
Outdated
|
||
WebMock.stub_request(:get, /nominatim.openstreetmap.org/). | ||
to_return(:status => 200, :headers => {}, :body => nominatim_file) | ||
WebMock.stub_request(:get, /nominatim.openstreetmap.org/) |
Check failure
Code scanning / CodeQL
Incomplete regular expression for hostnames High test
Summary of changes
Added functionality for using automated LLM scraping and metadata processing
Motivation and context
Robustness, metadata parsing, (potential) deduplication, collection management
Checklist
to license it to the TeSS codebase under the
BSD license.