Commit
Merge pull request #971 from DaanVanVugt/feature/gpt_scraper
LLM scraper
Showing 31 changed files with 1,772 additions and 89 deletions.
@@ -0,0 +1,9 @@

```ruby
class LlmInteraction < ApplicationRecord
  belongs_to :event
  validates :scrape_or_process, presence: true
  validates :model, presence: true
  validates :prompt, presence: true
  validates :input, presence: true
  validates :output, presence: true
  # validates :needs_processing, presence: true
end
```
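The commented-out `needs_processing` validation would not work as written: Rails' `presence` validator checks `blank?`, and `false.blank?` is true, so a row holding the column's default of `false` would be rejected. Rails documents `validates :needs_processing, inclusion: { in: [true, false] }` for boolean fields instead. The plain-Ruby sketch that follows (no Rails; `LlmRecord`, `REQUIRED`, `blank_value?` and `missing_fields` are hypothetical names) approximates which attributes the model requires and shows why `false` counts as blank.

```ruby
# Plain-Ruby sketch, no Rails: which LlmInteraction attributes must be non-blank.
LlmRecord = Struct.new(:scrape_or_process, :model, :prompt, :input, :output,
                       :needs_processing, keyword_init: true)

# The five attributes the model validates with presence: true.
REQUIRED = %i[scrape_or_process model prompt input output].freeze

# Approximates ActiveSupport's blank? for the values used here:
# nil, false, and empty or whitespace-only strings are blank.
def blank_value?(value)
  value.nil? || value == false || (value.respond_to?(:strip) && value.strip.empty?)
end

# Lists the required attributes that would fail a presence check.
def missing_fields(record)
  REQUIRED.select { |field| blank_value?(record[field]) }
end

record = LlmRecord.new(scrape_or_process: 'scrape', model: 'example-model',
                       prompt: 'Extract the event details', input: '<html>...</html>',
                       output: '', needs_processing: false)
missing_fields(record)                # => [:output]
blank_value?(record.needs_processing) # => true, so a presence check would reject false
```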
@@ -0,0 +1,17 @@

```ruby
class AddLlmCheck < ActiveRecord::Migration[7.0]
  def change
    create_table :llm_interactions do |t|
      t.belongs_to :event, foreign_key: true
      t.datetime :created_at
      t.datetime :updated_at
      t.string :scrape_or_process
      t.string :model
      t.string :prompt
      t.string :input
      t.string :output
      t.boolean :needs_processing, default: false
    end
    add_reference :events, :llm_interaction, foreign_key: true
    add_column :events, :open_science, :string, array: true, default: []
  end
end
```
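The migration stores `needs_processing` as a boolean defaulting to `false`, which reads as a flag for interactions still waiting on a processing step. How the PR consumes the flag is not shown in this part of the diff, so the plain-Ruby sketch below is an assumption about that use; `Interaction`, `build_interaction` and `pending` are hypothetical names. In ActiveRecord the same selection would be `LlmInteraction.where(needs_processing: true)`.

```ruby
# Hypothetical stand-in for an llm_interactions row (no Rails required).
Interaction = Struct.new(:id, :event_id, :scrape_or_process, :needs_processing,
                         keyword_init: true)

# Mirrors the migration's `default: false` for needs_processing.
def build_interaction(id:, event_id:, scrape_or_process:, needs_processing: false)
  Interaction.new(id: id, event_id: event_id,
                  scrape_or_process: scrape_or_process,
                  needs_processing: needs_processing)
end

# Keeps only the rows flagged for processing, roughly what
# LlmInteraction.where(needs_processing: true) would return.
def pending(interactions)
  interactions.select(&:needs_processing)
end

rows = [
  build_interaction(id: 1, event_id: 10, scrape_or_process: 'scrape'),
  build_interaction(id: 2, event_id: 11, scrape_or_process: 'scrape', needs_processing: true)
]
pending(rows).map(&:id) # => [2]
```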
@@ -0,0 +1,28 @@

```ruby
require 'open-uri'
require 'csv'
require 'nokogiri'

module Ingestors
  class FourtuLlmIngestor < LlmIngestor
    def self.config
      {
        key: '4tu_llm_event',
        title: '4TU LLM Events API',
        category: :events
      }
    end

    private

    def process_llm(_url)
      url = 'https://www.4tu.nl/en/agenda/'
      event_page = Nokogiri::HTML5.parse(open_url(url, raise: true)).css('.searchresults')[0].css('a.searchresult')
      event_page.each do |event_data|
        new_url = event_data['href']
        sleep(1) unless Rails.env.test? and File.exist?('test/vcr_cassettes/ingestors/4tu_llm.yml')
        new_event_page = Nokogiri::HTML5.parse(open_url(new_url, raise: true)).css('body').css('main, .page-header__content')
        get_event_from_css(new_url, new_event_page)
      end
    end
  end
end
```
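`process_llm` ignores its argument, collects the `a.searchresult` links from the 4TU agenda page, and fetches each one, sleeping one second per request unless the test suite is replaying the recorded VCR cassette. The plain-Ruby sketch below isolates that loop without Nokogiri or network access. Two parts are assumptions rather than what the PR does: `URI.join` resolves each link against the agenda URL in case an `href` is relative (the PR passes `href` to `open_url` unchanged), and the pause is skipped before the first request. `AGENDA_URL`, `event_urls` and `each_throttled` are hypothetical names, and the example hrefs are made up.

```ruby
require 'uri'

AGENDA_URL = 'https://www.4tu.nl/en/agenda/'

# Hypothetical helper: resolves each href against the agenda page.
# Absolute hrefs come back unchanged; relative ones gain the 4TU host and path.
def event_urls(hrefs, base: AGENDA_URL)
  hrefs.map { |href| URI.join(base, href).to_s }
end

# Hypothetical helper: yields items one by one, sleeping `delay` seconds
# between them. Pass throttle: false when replaying recorded responses,
# as the PR does for its VCR cassette in the test environment.
def each_throttled(items, delay: 1, throttle: true)
  items.each_with_index do |item, index|
    sleep(delay) if throttle && index.positive?
    yield item
  end
end

urls = event_urls(['/en/agenda/example-event/', 'https://example.org/other'])
each_throttled(urls, throttle: false) { |url| puts url }
```

Skipping the first pause keeps the total delay at one second per gap between requests rather than one per request; either choice keeps the scraper from hitting the site in a tight loop.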