API v2 support with new fields #231

Merged 168 commits on Sep 26, 2024
Changes from 1 commit
Commits
419a8fd
moves code out of bin elasticsearch files and into module
jduss4 Jan 9, 2020
e452e3b
removes unnecessary dtd for french 17
jduss4 Jan 9, 2020
b01397a
combines some parameter gathering files
jduss4 Jan 9, 2020
4d02bee
updates gems and fixes test suite
jduss4 Jan 9, 2020
3308c75
in progress working on validator for es fields)
jduss4 Jan 10, 2020
8e995b4
creates validator for elasticsearch postings
jduss4 Jan 15, 2020
9f7eebf
whoops missed one
jduss4 Jan 17, 2020
44c3b80
refactored validator to handle nested field specific mapping
jduss4 Jan 17, 2020
2deb3ca
get rid of unnecessary variable definitions
jduss4 Jan 17, 2020
c863b32
put data methods in index class
wkdewey May 20, 2022
282dabf
require_relative so that tests can be run from base directory
wkdewey May 20, 2022
d081aa0
move puts from bin methods into es classes
wkdewey May 20, 2022
ebbb8d2
change get_schema to return rather than puts
wkdewey May 20, 2022
a91a352
simplify regex
wkdewey May 20, 2022
1c4b4eb
drop unnecessary conditional
wkdewey May 20, 2022
3728424
return early if invalid nested field found
wkdewey May 20, 2022
4d2c277
change coverage-spatial to spatial
wkdewey May 23, 2022
1b3feeb
Update CHANGELOG.md
wkdewey May 23, 2022
70f98a1
moves code out of bin elasticsearch files and into module
jduss4 Jan 9, 2020
ab2a750
in progress working on validator for es fields)
jduss4 Jan 10, 2020
8dc6ce8
creates validator for elasticsearch postings
jduss4 Jan 15, 2020
2d88276
add byebug and update gems
wkdewey Nov 3, 2021
6d20c53
specify proper api_version, add xsl file for ead
wkdewey Nov 3, 2021
6b63319
add ead to format_to_class
wkdewey Nov 3, 2021
6e281f6
add date helpers from newer Datura version
wkdewey Nov 3, 2021
37a7f07
add byebug to gemspec
wkdewey Nov 3, 2021
5bc84bb
add gem
wkdewey Nov 3, 2021
53bfcc3
add file_ead class
wkdewey Nov 3, 2021
1f4361a
add EadToES class
wkdewey Nov 3, 2021
52b38a0
add files for EadToEs
wkdewey Nov 3, 2021
12b0897
add EadToEsItems class and associated files
wkdewey Nov 3, 2021
21a55dc
add xsl file for ead (not functional yet)
wkdewey Nov 3, 2021
c39668c
remove gem doc that is messing things up
wkdewey Nov 4, 2021
78fa867
print full error message, not just something went wrong
wkdewey Nov 4, 2021
0df2257
fix xpath
wkdewey Nov 4, 2021
3ce120d
add all require fields, including unfilled ones
wkdewey Nov 4, 2021
812322d
fix xpaths hash
wkdewey Nov 4, 2021
1f20314
make EadToEsItems a separate class
wkdewey Nov 4, 2021
9f01cd0
add abstract field and fix bad xpaths
wkdewey Nov 4, 2021
f05e535
add a backtrace to error handling
wkdewey Nov 8, 2021
1805a2d
grab 'items' at any nesting of the EAD
wkdewey Nov 8, 2021
8f74d38
add xpaths and fields, and make sure eadtoesitems inherits from eadtoes
wkdewey Nov 8, 2021
fbcf656
change order of get id to fix bug
wkdewey Nov 8, 2021
79e89e7
add documentation for adding new format
wkdewey Nov 8, 2021
44a3790
adjust and add fields for items
wkdewey Nov 9, 2021
e2ab095
add items to repository xpaths
wkdewey Nov 9, 2021
aa9a9cb
fix image_url xpath
wkdewey Nov 9, 2021
7a9cbeb
add puts statements for debugging
wkdewey Nov 9, 2021
21b1adc
try another way to debug
wkdewey Nov 9, 2021
0511b9f
test for nil specifically
wkdewey Nov 9, 2021
52aba16
add debugging statements to get_schema
wkdewey Nov 9, 2021
2bbdab1
try debugging with byebug
wkdewey Nov 9, 2021
4358081
remove debugging info
wkdewey Nov 9, 2021
5ae1fc0
add alternative field
wkdewey Nov 10, 2021
73089c0
add relation field
wkdewey Nov 10, 2021
428ea4c
add spatial field
wkdewey Nov 10, 2021
c5cb629
fix a get_text method
wkdewey Nov 10, 2021
4ab0542
change post_es to match jessica's changes
wkdewey Nov 10, 2021
193d69e
change CommonXML to Datura helpers
wkdewey Nov 11, 2021
344381c
change xpaths to be less specific to Walt Whitman
wkdewey Nov 11, 2021
df7e896
refactor title fields and xpaths
wkdewey Nov 11, 2021
bf65724
add creator override for items so it is an array
wkdewey Dec 10, 2021
c367987
change creators to creator
wkdewey Dec 21, 2021
480f95a
update gems in preparation for release
wkdewey May 25, 2022
51d85b2
add rdf schema
wkdewey Jun 16, 2022
45cb1a1
update schemas to include rdf fields
wkdewey Jun 20, 2022
43c5c41
add rdf to default fields
wkdewey Jun 20, 2022
6198a7a
add spatial.title field
wkdewey Jun 21, 2022
c145ad8
require byebug so it is in scope for posting etc.
wkdewey Jul 18, 2022
f089f42
remove inserted byebug
wkdewey Aug 24, 2022
d0bfc36
require byebug so it is in scope for posting etc.
wkdewey Jul 18, 2022
0bd3c58
include full error message with backtrace
wkdewey Aug 8, 2022
cd639b4
updates gems and fixes test suite
jduss4 Jan 9, 2020
b40c22c
update gems in preparation for release
wkdewey May 25, 2022
7efc8fc
start adding new api fields
wkdewey Aug 9, 2022
685a563
update schema to match spreadsheet with new field names
wkdewey Aug 11, 2022
5b52ab1
assemble json based on api version
wkdewey Aug 15, 2022
3af4d47
add overrides for 2.0 fields
wkdewey Aug 15, 2022
8a3634a
change next and previous fields
wkdewey Aug 15, 2022
8e5f888
add fig_location
wkdewey Aug 15, 2022
6397179
add abstract
wkdewey Aug 15, 2022
2f80693
remove split-out assemble_text methods
wkdewey Aug 15, 2022
df573e1
update gems and get rid of merge conflicts
wkdewey Aug 24, 2022
ee079ae
add new fields
wkdewey Aug 24, 2022
c268f8c
correct field name
wkdewey Aug 24, 2022
0eb14bc
add fields to ead overrides
wkdewey Aug 24, 2022
ccdede7
populate new fields in json
wkdewey Aug 24, 2022
dfa420c
resolve merge conflict
wkdewey Aug 24, 2022
e856812
add new fields
wkdewey Aug 25, 2022
b252e66
update fields for related items, dates, order integers
wkdewey Aug 25, 2022
215da79
correct syntax errors
wkdewey Aug 25, 2022
db217d9
correct another syntax error
wkdewey Aug 25, 2022
8065e3f
change keywords1 to plain keywords
wkdewey Aug 25, 2022
05b5561
add more specific message to es validation
wkdewey Aug 30, 2022
fe9fb2b
remove extra byebug require
wkdewey Sep 7, 2022
59013d2
remove byebug, change error message
wkdewey Sep 7, 2022
3889306
update schema under citations
wkdewey Sep 7, 2022
f0c19d8
require fileutils to avoid errors in setup
wkdewey Sep 16, 2022
c0b734d
skip title_sort if title is nil
wkdewey Sep 20, 2022
1ec9559
return nil instead of empty string, addresses https://github.com/whit…
wkdewey Sep 23, 2022
3eddf9a
add more nil checks for results of xpath methods
wkdewey Sep 26, 2022
a1413ba
check the correct xpath fields
wkdewey Sep 26, 2022
b71d028
make sure input is in UTF-8
wkdewey Sep 26, 2022
520bbaa
make changes for new api schema and revised xpath methods
wkdewey Sep 26, 2022
dd30d2f
add a nil check for creators
wkdewey Sep 26, 2022
0ad2053
Revert "make sure input is in UTF-8"
wkdewey Oct 3, 2022
dc727d6
change error handling to avoid method that isn't present
wkdewey Oct 17, 2022
2df4dde
make sure person is an array
wkdewey Oct 20, 2022
436deed
make sure settings hash is what elasticsearch expects
wkdewey Oct 21, 2022
8a35aa2
change where mappings are posted for es upgrade
wkdewey Oct 21, 2022
2c7fa5c
add headers to ES requests for authorization
wkdewey Oct 25, 2022
25f0a2b
add method to construct basic auth header from options
wkdewey Oct 25, 2022
0ec2665
remove debugging code
wkdewey Oct 25, 2022
7b90a09
update conditional logic for status code, dynamic_templates key
wkdewey Oct 25, 2022
f982173
change endpoint for delete_by_query for ES8 compatibility
wkdewey Jan 26, 2023
8a99138
Merge pull request #210 from CDRH/elasticsearch_upgrade
techgique Mar 23, 2023
602f6be
upgrade to Ruby 3.0.4
wkdewey Oct 27, 2022
c6e687d
make keyword arguments compatible with Ruby 3
wkdewey Oct 27, 2022
dbe6aaa
go up to ruby 3.1.2
wkdewey Oct 27, 2022
6b1e468
add output if nested field is invalid
wkdewey Nov 8, 2022
de02891
don't use array method on person to avoid errors
wkdewey Nov 8, 2022
ba60f4c
update changelog for new version
wkdewey Nov 9, 2022
0ddcc33
update reference to ruby version
wkdewey Nov 9, 2022
a614c6d
make changes related to ES and API upgrade
wkdewey Nov 10, 2022
fbc0897
add links to more detailed documentation
wkdewey Nov 10, 2022
3cf2237
add link to elasticsearch documentation
wkdewey Nov 10, 2022
3cadd26
add conditional to creator for nil checking
wkdewey Nov 18, 2022
3498ebe
Create schema_v2.md
karindalziel Nov 10, 2022
5b0f54f
make sure webs_to_es fields can handle nil values
wkdewey Jan 25, 2023
f286ab4
Merge pull request #211 from CDRH/ruby3upgrade
techgique Mar 23, 2023
594e57f
add new fields to 2.0 schema
wkdewey May 4, 2023
e928e7e
fix errors
wkdewey May 4, 2023
aa0770b
create methods and overrides to transform pdf files to elasticsearch
wkdewey Mar 2, 2023
16a421b
add pdf option for command line options
wkdewey Mar 2, 2023
d3f87be
fix variable names
wkdewey Mar 2, 2023
ab921b7
adjust and clarify documentation for adding new formats
wkdewey Mar 6, 2023
d0ba3f0
fix comments
wkdewey Mar 13, 2023
13ba8f7
fix html transformation method
wkdewey Mar 20, 2023
68842a9
clarify that this is not yet implement
wkdewey Mar 20, 2023
1f612a7
add keywords5 to pdftoes fields
wkdewey Jun 13, 2023
43a5821
add has_relation to pdftoes fields
wkdewey Jun 13, 2023
9fe353c
add has_source to pdftoes fields
wkdewey Jun 13, 2023
caed7ba
add xpath for notes
wkdewey Aug 23, 2023
91dcfc3
add note xpath to text fields
wkdewey Aug 23, 2023
867cad7
truncate pdf text so it doesn't exceed ES limit
wkdewey Aug 25, 2023
4dfd67b
Update solr_clear_index to use renamed method in parser code
techgique Aug 30, 2023
5cc6c87
Merge pull request #217 from CDRH/pdftoes
techgique Oct 25, 2023
b680e70
Output schema in human readable format
techgique Dec 21, 2023
4a1c478
make sure uri_html is blank so html is not loaded in rails
wkdewey Apr 15, 2024
e0bb62e
don't hardcode collection name for ead_to_es_items
wkdewey Apr 15, 2024
0bb4c07
fix nested fields in vra that were making the schema not validate
wkdewey Jun 5, 2024
0d39379
add keyword5 for default htmltoes overrides
wkdewey Jul 22, 2024
7ae1583
check for nil values when variables depende on nokogiri methods
wkdewey Aug 7, 2024
5088c14
add character limit and truncate text fields
wkdewey Aug 8, 2024
10e45d0
update changelog
wkdewey Aug 21, 2024
4fcb329
add ead to list of possible formats
wkdewey Aug 21, 2024
22b2d32
fix incorrect field label and remove redundancy
wkdewey Aug 28, 2024
40393d8
revert accidental commenting of subcategory override
wkdewey Aug 28, 2024
67eedaa
Merge pull request #226 from CDRH/whitman-fixes
techgique Aug 28, 2024
dd21a6d
change version back to unreleased
wkdewey Sep 16, 2024
8ba660c
replace dead link and link to private repo
wkdewey Sep 16, 2024
20799c5
add pdf-reader to gemspec
wkdewey Sep 16, 2024
544b59d
change link from private docs to ES site
wkdewey Sep 18, 2024
18e1aad
remove file with redundant methods
wkdewey Sep 18, 2024
2c9b4b4
remove uri_html definition which is blanked later
wkdewey Sep 18, 2024
19e1bfe
move byebug into gemspec, don't restrict to dev only
wkdewey Sep 19, 2024
d0bac71
remove reference to info that was moved
wkdewey Sep 23, 2024
93ba036
Merge branch 'dev' into new_fields
wkdewey Sep 23, 2024
14 changes: 7 additions & 7 deletions CHANGELOG.md
@@ -38,13 +38,9 @@ Versioning](https://semver.org/spec/v2.0.0.html).
 ```
 api_version: '2.0'
 ```
-See new schema (2.0) documentation [here](https://github.com/CDRH/datura/docs/schema_v2.md)
+See new schema (2.0) documentation [here](https://github.com/CDRH/datura/blob/main/docs/schema_v2.md)
 - schema validation with API version 2.0: invalidly constructed documents will not post
-- authentication with Elasticesarch 8.5; add the following to `public.yml` or `private.yml` in the data repo:
-```
-es_user: username
-es_password: ********
-```
+- authentication with Elasticesarch 8.5
 - field overrides for new fields in the new API schema
 - functionality to transform EAD files and post them to elasticsearch
 - functionality to transform PDF files (including text and metadata) and post them to elasticsearch
@@ -65,7 +61,11 @@ See new schema (2.0) documentation [here](https://github.com/CDRH/datura/docs/sc

 ### Migration
 - check to make sure "text" xpath is doing desired behavior
-- use Elasticsearch 8.5 or higher and add authentication as described above if security is enabled. See [dev docs instructions](https://github.com/CDRH/cdrh_dev_docs/blob/update_elasticsearch_documentation/publishing/2_basic_requirements.md#downloading-elasticsearch).
+- use Elasticsearch 8.5 or higher and add authentication as described above if security is enabled. Add the following to `public.yml` or `private.yml` in the data repo:

Review comment (PR author): Please revise the text here to reflect that one doesn't need to look above for the auth info now.

+```
+es_user: username
+es_password: ********
+```
 - upgrade data repos to Ruby 3.1.2
 -
 - add api version to config as described above
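The Migration notes ask for `es_user` and `es_password` in `public.yml` or `private.yml`, and the commit list mentions a method that constructs a basic auth header from options. The following is only a rough sketch of how such a header could be derived from those two config keys. The helper name `es_auth_headers` and the sample password are hypothetical and are not taken from Datura's code.

```ruby
require "yaml"

# Hypothetical helper (not Datura's actual API): builds request headers for
# Elasticsearch from an options hash like one loaded from public.yml/private.yml.
def es_auth_headers(options)
  headers = { "Content-Type" => "application/json" }
  user = options["es_user"]
  password = options["es_password"]

  # Add HTTP Basic credentials only when both values are configured.
  if user && password
    # pack("m0") Base64-encodes without line breaks.
    token = ["#{user}:#{password}"].pack("m0")
    headers["Authorization"] = "Basic #{token}"
  end

  headers
end

# Example config, assumed to mirror the keys named in the changelog.
options = YAML.safe_load(<<~CONFIG)
  es_user: username
  es_password: secret
CONFIG

puts es_auth_headers(options)["Authorization"]
```

When security is disabled and neither key is set, the sketch returns only the content type header, so the same call works for authenticated and unauthenticated clusters.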