Release History
Effective v32.0.0, version numbers follow a semantic versioning scheme. Major numbers will be incremented whenever effort is required for upgrading, such as changing a configuration file, upgrading dependencies, and so on. Minor numbers will be incremented in most cases, for new features or combinations of features and bugfixes, and patchlevel numbers will be increased when the only changes are bugfixes.
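The numbering rule above can be sketched as a small function. This is only an illustration of the scheme as described, not code from the project; the name `bump_version` and the three change labels are hypothetical.

```python
def bump_version(version, change):
    """Return the next version string for a given kind of change.

    `change` is one of these (hypothetical) labels:
      "upgrade-effort" - upgrading needs manual work (config file, dependencies)
      "feature"        - new features, or features combined with bugfixes
      "bugfix"         - the only changes are bugfixes
    """
    major, minor, patch = (int(part) for part in version.lstrip("v").split("."))
    if change == "upgrade-effort":
        return "v%d.0.0" % (major + 1)
    if change == "feature":
        return "v%d.%d.0" % (major, minor + 1)
    if change == "bugfix":
        return "v%d.%d.%d" % (major, minor, patch + 1)
    raise ValueError("unknown change type: %r" % change)


print(bump_version("v33.5.0", "bugfix"))          # v33.5.1
print(bump_version("v33.4.0", "feature"))         # v33.5.0
print(bump_version("v32.2.0", "upgrade-effort"))  # v33.0.0
```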
v33.5.1 - July 24, 2015
- Fix source of NYPL rights
v33.5.0 - July 20, 2015
- Add Artstor sets
- Log content for OAI parse error
- Update Harvard mapping and sets
- Remove problematic USC sets that have single-page records
- Add traceback printing to couch module
v33.4.0 - June 9, 2015
- Remove set from MWDL's exclusions in profile
v33.3.0 - June 2, 2015
- Add MDL and CDL providers
v33.2.3 - April 13, 2015
- Fix move_date_values to properly iterify incoming values
- Add Indiana profile
- Harvard: Properly iterate over incoming descriptions
- NYPLMapper: fixup publisher and description setters
- MARCMapper: improve handling in set_begin_end_dates()
- Added test for pre-1900 dates in enrich_date (#7562)
- Fix enrich_date to handle dates < 1900 (#7562)
- Hotfix: Changed location of dict evaluation in ia_fetcher.request_records
- Added "moving image" to type conversion keywords.
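Several entries in this release concern "iterifying" incoming values, that is, normalizing a field that may arrive as a single value or as a list so that later code can always loop over it. A minimal sketch of that idea follows; the `iterify` function here is a hypothetical stand-in, and the project's own helper may behave differently (for example, in how it treats None).

```python
def iterify(value):
    """Return `value` as a list so callers can always iterate over it.

    Assumption for this sketch: None becomes an empty list, lists and
    tuples are copied into a list, and anything else (including a plain
    string) is wrapped in a one-element list.
    """
    if value is None:
        return []
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


print(iterify("1850"))            # ['1850']
print(iterify(["1850", "1851"]))  # ['1850', '1851']
print(iterify(None))              # []
```

Wrapping strings explicitly matters because a string is itself iterable, and looping over it directly would yield single characters rather than one value.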
v33.2.2 - April 3, 2015
- Update ARTstor, GPO, Hathitrust, Internet Archive, and Smithsonian profiles
- Fix exception in AbsoluteURLFetcher
- Map MWDL intermediateProvider
- Change field for Getty collection
v33.2.1 - November 17, 2014
- IAMapper: remove extraneous print function
- Fix sitemap generation and use pyrax to sync to Rackspace
v33.2.0 - October 31, 2014
- IA: generate sourceResource.title correctly (fixes #7708)
- IA: Add YIVO Institute Library (fixes #7735)
- Remove old SCDL-specific code (refs #7647)
v33.1.6 - October 1, 2014
- Add Missouri Hub profile #7617
v33.1.4 - September 25, 2014
- SCDL single-source profile #7647
- Add strip_html pipeline module to all profiles #7636
- Geocoding bug fixes: (refs #7705)
  - Add retries and logging to Geonames requests
  - Fix Geonames webservice URI
  - Fix critical bug in Place.merge_related()
- MAPv3.1/temporal-related indexing fixes for MWDL/NYPL #7704
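The "retries and logging" item above describes a common pattern for calls to an external web service. The sketch below shows the general shape under stated assumptions: `with_retries` and its parameters are hypothetical names, the request is passed in as a plain callable, and nothing here reflects how the geocoding module actually talks to Geonames.

```python
import logging
import time

logger = logging.getLogger("retry_sketch")


def with_retries(request, attempts=3, delay=0.0):
    """Call `request()` until it succeeds or `attempts` is exhausted.

    Each failure is logged; the last failure is re-raised to the caller.
    """
    for attempt in range(1, attempts + 1):
        try:
            return request()
        except Exception as exc:
            logger.warning("attempt %d of %d failed: %s", attempt, attempts, exc)
            if attempt == attempts:
                raise
            time.sleep(delay)


# Usage: a fake request that fails twice, then succeeds.
calls = {"count": 0}

def flaky_request():
    calls["count"] += 1
    if calls["count"] < 3:
        raise IOError("temporary failure")
    return "ok"

print(with_retries(flaky_request))  # ok
```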
v33.1.1 - September 24, 2014
- DigitalNC: Override DigitalNCMapper super's map_format (#29, refs #7656)
v33.1.0 - September 23, 2014
- Geocoding refactor (#7664, #7643, #7640):
  - Skip 'USA' place names when Geocoding
  - Refactor geocode to use a Place class
  - Update enrichments to use Geonames first, before falling back on
v33.0.2 - September 22, 2014
- Fix UTC timestamp regression #7699
- Fix dedupe value exception to handle non-strings #7700
- Allow "physical object" to type mapping and update validation #7663
- Parallelize tests to isolate those connecting externally
Run pip install -r requirements.txt in addition to the usual steps, before restarting Akara.
v33.0.1 - September 15, 2014
- IA: Add Montana State Library collection
v33.0.0 - September 11, 2014
- Set log level for couch.py (fixes #7688)
- Maintain Primo URL/params better (ref #7654)
- Add module to strip HTML tags (#7636)
- Refactor Couch object for sync_qa_views property (ref #7545)
Run pip install -r requirements.txt in addition to the usual steps, before restarting Akara.
Optional: set the new LogLevel property in akara.ini, per README.md.
v32.2.0 - September 8, 2014
- Have save_records.py use sync_qa_views (fixes #7680)
- GPO: Set URI for object property (refs #7519)
- Cleanup unused/out of date code (fixes #7645, #7677)
v32.1.0 - September 4, 2014
- Complete GPO profile (#7519)
- Fix authority_condition in OAIMODSMapper.map_format() (#7673)
- Hathi: fix removal of </collection> when fetching (#7678)
- Enable/disable building of QA views (refs #7545)
- Add docstrings to explain shred module (#7613)
Add the following line to your akara.ini file in the [CouchDb] section:
SyncQAViews=<Either True or False; if missing QA views will be synced by default>
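The "synced by default" behaviour of a missing SyncQAViews setting can be illustrated with Python 3's standard configparser. This is a sketch only: the function name `sync_qa_views_enabled` is hypothetical, and reading akara.ini this way is an assumption, not a description of how the ingestion code loads its configuration.

```python
import configparser


def sync_qa_views_enabled(ini_text):
    """Return the SyncQAViews setting, defaulting to True when it is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    # fallback=True mirrors "if missing QA views will be synced by default"
    return parser.getboolean("CouchDb", "SyncQAViews", fallback=True)


print(sync_qa_views_enabled("[CouchDb]\nSyncQAViews=False\n"))  # False
print(sync_qa_views_enabled("[CouchDb]\n"))                     # True
```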
v32.0.0 - August 29, 2014
- Remove old Digital Commonwealth profile (#7633)
- Have BPL enrichment remove finding aids (#7639)
- Update geocode module to set 'name' property and add unit tests (#7607)
- Add SCDLCharlestonMapper for object mapping (#7651)
- UTC timestamps everywhere; fix sitemaps (#7657)
Add the following line to your akara.ini file in the [Sitemap] section:
SitemapPath=<Path to a writable local directory for sitemap files>
v31.2 - August 15, 2014
- Fixed stripping of brackets in displayDate (#7598)
- Updated date and temporal mapping (#7612)
- Refactored views to eliminate separate *_count views (#7545)
Upgrading: This is being released alongside an update to the platform (API) application, which changes the QA interface's generation of reports. The sync_couch_views.py script should be run against the 'dpla' database, ideally before the views are actually needed.
v31.1 - August 7, 2014
- Removed "nonSort" from BPL title mapping (#7570)
- Removed cleanup and lookup language enrichments from Getty profile
v31.0 - August 7, 2014
- Added Boston Library Consortium to IA intermediateProviders (#7595)
- Changed JSON Schema for validation (#7590)
- Fixed uiuc_book URL in test, and re-enabled test
- Re-enabled fetcher tests
- Refactored copy_prop module (#7130)
- Removed uses_network attributes from tests that use local files
- Refactored enrich_language module (#7618)
- Fixed tests based on enrich_language refactoring
- Refactored Primo and added Getty (#7603, #7604)
- Fixed MWDL fetcher test
v30.0 - July 28, 2014
- Added parsing of intermediateProvider, added MHL (#7588)
- Changed NCDHC ("digitalnc") endpoint_url (#7552)
v29.2 - July 17, 2014
- Updated mapv3 schema for validation
- Updated fetcher, mapper, profile for MWDL upgrade (#6610)
- Include all "content" type note values in description for NCDHC (#7594)
v29.1 - July 14, 2014
v29.0 - July 8, 2014
v28.1 - June 30, 2014
- Improved exception handling in couch module (#7486)
- Refactor to-dpla modules into mapper classes (#6563)
- Added test for pre-1900 dates in enrich_date (#7562)
- Fixed problems with pipeline lacking "/" character (#7565)
- Fix JSON-LD @context issues (#6919)
- Fix enrich_date to handle dates < 1900 (#7562)
- In couch.Couch._get_bulk_download_doc(), handle empty result or missing view
- Fixed in hathi_fetcher: parsed_docs should always be a list (#7559)
- Remove global views (refs #7493)
- #7289, #7319: Add validation against MAPv3 JSON Schema and supporting views
- #3062: Add views to report missing properties
- #7493: Add global count views
- #4586: remove operations on aggregatedCHO in Smithsonian profile
- #7520: Added parsing capabilities to enrich_earliest_date
- Reinstate human-readable byte count to bulk-file database record
Upgrading:
Run pip to install the new packages for JSON Schema support:
$ pip install -r requirements.txt
Update the "dpla" CouchDB database's views:
$ python scripts/sync_couch_views.py dpla
- Changed database export's use of views (#7155)
- Have pip use DPLA mirror of Zepheira zenpub
Upgrading: Update the "dpla" CouchDB database's views:
$ python scripts/sync_couch_views.py dpla
- Skip geocoding for generic "United States" (#6794)
- Fix ARTStor format field (#4251)
- Initial mapping for GPO and geocoding fixes
- Add automation of bulk data upload (#4776)
Upgrading:
- Create the "bulk_download" CouchDB database. For example,
$ curl -X PUT "http://user:pw@couchserver:5984/bulk_download"
- Load that database's views:
$ python scripts/sync_couch_views.py bulk_download
- Exception handling, OAI fetch; fixed MODS key error (#7524)
- Print error when posting to the database fails (#7509)
- Fixed geocode tests, because Bing response has changed
- Added uses_network attribute to tests
- Fixed OAI-PMH fetcher memory leak; added fetcher tests (#5483)
- Added "photographer" to NYPL creator terms (#7512)
- Fixed call to couch.sync_couch_views() in sync_couch_views.py
- Added movingimage (w/out space) to type config
- Updated Artstor profile to fix dataProvider, spatial
- Changed artstor_identify_object recognition of thumbnail images
- Fixed potentially undeclared variable in poll_storage
- Added exclusion of IA "collection"-type records (#6731)
- Added Cambridge Public Library to IA profile (#7502)
- Fixed test for list in dpla-list-records.listrecords.
- Lowered MWDL bulkSize parameter to 500, from 1000 (#7486)
- Modified test_all_oai_verb_fetchers to fetch only a few profiles
- Changed automated tests to retain Akara directory
- Changes for new NYPL API, incl. multithreaded fetcher (#7773)
- Changed primo-to-dpla (MWDL) to use facets/rsrctype for type (#7441)
- Fixed tests
- Added missing check_counts to poll_storage.py
- Changed view syncing to speed up save, delete, and backup (#7487)
- Changed Akara MaxServers and MaxRequestsPerServer
- Added "moving image" to type conversion keywords.
- Partial fix for 7491 ("sound recording" vs "sound")
- Fix 7491: Delete sourceResource/type if type not found and default is None
- Add enrich-type unit tests
Changes for new BPL, Digital Commonwealth sites; Geocoding fixes
- Resolve #7330: BPL spatial data, "United States" generic value
- Other geocoding improvements
- Fix #7488: BPL dataProvider: add option to pull from recordInfo/recordContentSource
- Fix #7330 and #6097: Created subject_and_spatial_transform_bpl to join spatial values on double-hyphen and add the coordinates
- Fix #7329: Iterify OAI-PMH/ListRecords/record, fix location_transform_bpl
- Added threading to enrich_records (#7282)
- Added catch for incomplete read in database export (#7155)
- Added catch for undefined sourceResource in enrich-type.
- Fixed Google thumbnail image URL in test for hathi_identify_object
Upgrading:
Add the following lines to your akara.ini file:
[Enrichment]
QueueSize=4
ThreadCount=4
These are suggested numbers for a start. You can experiment with increasing them.
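To show how settings like these usually interact, here is a minimal sketch of a bounded work queue served by a fixed pool of threads. It assumes that QueueSize caps the number of records waiting to be processed and that ThreadCount sets the number of workers; that reading of the two parameters is an assumption, and `enrich_all` and `enrich` are hypothetical names rather than the project's enrich_records code.

```python
import queue
import threading


def enrich_all(records, enrich, queue_size=4, thread_count=4):
    """Apply `enrich` to each record using worker threads and a bounded queue.

    Returns (results, errors). Result order is not guaranteed.
    """
    work = queue.Queue(maxsize=queue_size)
    results, errors = [], []
    lock = threading.Lock()

    def worker():
        while True:
            record = work.get()
            if record is None:          # sentinel: no more work for this thread
                return
            try:
                enriched = enrich(record)
                with lock:
                    results.append(enriched)
            except Exception as exc:    # keep the worker alive on bad records
                with lock:
                    errors.append((record, exc))

    threads = [threading.Thread(target=worker) for _ in range(thread_count)]
    for thread in threads:
        thread.start()
    for record in records:
        work.put(record)                # blocks while the queue is full
    for _ in threads:
        work.put(None)                  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results, errors


results, errors = enrich_all(
    [{"id": i} for i in range(10)],
    lambda record: dict(record, enriched=True),
)
print(len(results), len(errors))  # 10 0
```

A larger queue lets the producer run further ahead of the workers, while more threads help mainly when each enrichment spends its time waiting on network calls.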
- Fixed #7329 - Iterify OAI-PMH/ListRecords/record & fix location_transform_bpl
- Fixed #7256 - Smithsonian missing titles
- Resolved #7315 - Print counts and error if alert email fails
- Set blacklist for BPL metasets; lower threshold for old DC endpoint
- Removed obsolete scripts
- Added speed.py to report on elapsed time from access log
- Fixed #7197 - Improved type mapping for enrich-type module
- Refs #7169 - add regionaldigitzationmass to Internet Archive profile
- Fix an error with the ia_fetcher
- Fixed 7183 - Create an alert if an ingestion adds, deletes, or changes a certain number of records
Notes:
- Add an Alert section to the akara.ini file with To and From parameters:
[Alert]
To=<Comma-separated email addresses to receive alert emails>
From=<Email address of alert email sender, ie no-reply@dpla>
- Fixed 7226 - Change Akara logging to produce smaller logs by default
- Fixed 6623 - SCDL - Update ingest process according to crosswalk rationalization
- Fixed 6949 - Create instructions for setting up an ingestion server from scratch
- Fixed part of #7155 - Added ret. val. checking, db export per-source file generation
Upgrading:
- Add this line to the Akara section of your akara.ini file:
LogLevel=<level>
where <level> is DEBUG or INFO (or another logging level, but those are the recommended ones)
- Re-install Akara so that it uses the DPLA version. Do pip uninstall akara and then pip install -r requirements.txt.
- Run the usual python setup.py install
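A LogLevel value such as DEBUG or INFO has to be turned into one of the numeric levels that Python's logging module uses. The sketch below shows one way to do that; `parse_log_level` is a hypothetical helper, and falling back to INFO for unrecognized values is an assumption made for this example.

```python
import logging

# Standard level names from the logging module.
LEVELS = {
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARNING": logging.WARNING,
    "ERROR": logging.ERROR,
    "CRITICAL": logging.CRITICAL,
}


def parse_log_level(value, default=logging.INFO):
    """Map a LogLevel string to a logging constant, ignoring case and spaces."""
    return LEVELS.get(value.strip().upper(), default)


logging.basicConfig(level=parse_log_level("DEBUG"))
print(parse_log_level("debug") == logging.DEBUG)  # True
print(parse_log_level("bogus") == logging.INFO)   # True
```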
- Fixed 7254 - Fixed class name of instantiated HathiFetcher in create_fetcher
- Fixed 6624 - UIUC (OAI_DC) - Update ingest process according to crosswalk rationalization
- Fixed 6622 - MDL - Update ingest process according to crosswalk rationalization
- Fixed 6966 - Run sitemap creation process after Week 4 ingest
- Fixed 7218 - HathiTrust: tweak to ingest script to remove double commas
- Fixed 7231 - Add database-export view to views synced by "couch" ingestion module
- Fixed 7219 - Ingestion: Handle XML parsing exception and output bad lines
- Fixed 6944 - IA - Fetch process hangs
Notes:
- Create a Sitemap section in akara.ini
[Sitemap]
SitemapURI=http://sitemaps.dp.la
- Issue 4265 - Added determination of Smithsonian type for physical format
- Fixed edan_to_dpla.transform_date exception for non-iterable dates
- Fixed 7122 - Digital Commonwealth contributing institution values gone
- Fixed 5976 - Install sitemap to allow search engines to crawl individual item pages
Notes: In the Rackspace section of akara.ini, rename the ContainerName parameter to DPLAContainer (keep its existing value) and add a SitemapContainer parameter with the value Sitemap.
- Uncomment QA views to allow building during sync method
- Fixed 6686 - ARTstor: "from" and "until" dates in set_params need to be changed
- Fixed 6208 - Rollback process should remove dashboard documents
- Fixed 5789 - Restrict ingested data to our schema to stop current bloating
- Fixed 6786 - Enrichment errors from last Smithsonian ingest
- Fixed 4675 - Digital Commonwealth - "Finding Aid" in title
- Fixed 6267 - HATHI: Permanently suppress ALL MDL records
- Fixed 6880 - MODS to DPLA and OAI/MODS to DPLA set hasView.@id incorrectly
- Fixed 6752 - Geocode module fails if wrong version of geopy is used
- Fixed 6758 - SCDL: Fix geocoding of forcibly set coordinates for South Carolina regions
- Fixed README formatting
- Fixed 6625 - IA - Update ingest process according to crosswalk rationalization
- Crosswalk fixes for NARA, Hathi, ARTstor, UIUC MARC, and PTH
- Added handling of error tag in OAI-PMH response
- Fixed Hathi thumbnail URL for HVD
- Crosswalk fixes for Smithsonian, USC, BHL, KDL and MWDL
- Hardcoded provider for Hathi
- Hardcoded general CouchDB settings
- Updated README
- Changed URL scheme to https in requirements.txt
- Fixed poll_storage
- More compare_with_schema module fixes
- Fixed compare_with_schema module
- Added stateLocatedIn enrichment for Smithsonian
- Fixed scdl_enrich_location
- Updated poll_storage to handle pipe method return tuple
- Fixed 6491 - Digital Commonwealth - Crosswalk update
- Fixed 6209 - Dashboard database should only contain item-level documents for the last 3 ingestion sequences
- Fixed 5651 - Hathi data fix post production release
- Added handling of non-string values in move_date_values module
- Fix test_parse_profiles
- Fixed 6515 - Don't send deleted records through enrichment pipeline
- Blacklisted all non-partner collections for PTH
- Fixed 6510 - NARA - Records without collections should still be ingested
- Fixed 5085 - NARA - new data set
- Fixed 6223 - IA - URLs which timeout or return a 404 response should be retried at the end of the fetch process
- Fixed 6129 - DPLA: Fix Portal to Texas History ID mapping
- Fixed 6490 - Log number of records that failed ingest
- Added retry logic for MWDL
Notes: Add LogLevel=ERROR in the CouchDb section of the akara.ini file.
- Fixed poll_storage script
- Updated set_prop to handle dictionary values
- Fixed setting of provider field in ia_to_dpla
- Added --no-backup option to save_records
- Fixed Smithsonian stateLocatedIn mapping
- Fixed 6209: Dashboard database should only contain item-level documents for the last 3 ingestion sequences
- Fixed 5789: Restrict ingested data to our schema to stop current bloating
- Fixed startkey and endkey doc._id in iterview queries
- Fixed 6225: Regenerate views on update
- Fixed 5769: Change Hathi/ UIUC mapping
- Fixed 5856: Refactor Hathi fetcher for new poll_profiles
- Fixed 5651: Hathi data fix post production release
- Fixed 5459: Reingest NYPL with new crosswalk
- Fixed 6224: Smithsonian/Global - Update fetchers to return only records and not collection information
- Updated all post/queries to CouchDB to use batch_size
Notes: Rename IterviewBatch to BatchSize in the CouchDb section of the akara.ini file and set its value to 500.
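A batch size of 500 means that documents are sent or queried in groups of at most 500 rather than all at once. The sketch below shows that chunking step in isolation; `batches` is a hypothetical helper and does not reproduce the couch module's code.

```python
from itertools import islice


def batches(docs, batch_size=500):
    """Yield successive lists of at most `batch_size` items from `docs`."""
    iterator = iter(docs)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


sizes = [len(batch) for batch in batches(range(1200))]
print(sizes)  # [500, 500, 200]
```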
- Fixed 5855: Ingest North Carolina Digital Heritage
- Fixed USC dataProvider
- Fixed 5933: ArtStor - New Collection (SSDPLAWashington)
- Updated IAFetcher to reset page on new collection
- Fixed date transform in oai_untl_to_dpla module
- Updated couch module to not save unchanged collection documents
- Added collection statelibrarynorthcarolina to IA profile
- Fixed stateLocatedIn for various providers
- Fixed 5970: Slight change to DLG ingest script
- Fixed UVA stateLocatedIn field
- Added profile for Portal to Texas History
- Added handling of OAI-PMH/error in ListRecords response
- Added oai_untl_to_dpla module
- Updated couch.py to not iterate over all DPLA database ids
- Restored UVA fetcher unit tests
- Fixed request_more in MWDLFetcher
- Compressed fetch and enrich directories in the enrich and save scripts
- Fixed geocode unit tests
- Commented out UIUC book profile from test_all_oai_verb_fetchers test
- Fixed expected_records in IAFetcher
- Updated IAFetcher to use fetch_url method from internet_archive.py
- Fixed UVA spatial transform
- Minimized CouchDB post requests in the save_records script
- Fixed 5084: UVA - ingest uva-lib:628506
- Fixed 5909: Investigate missing UVA records
- Excluded uva1 fetcher test from Travis
- Fixed BPL description transform to handle lists with both string and dictionary values
- Updated enrich_location module to handle lists with both string and dictionary values
- Added a BPL specific description transform in oai_mods_to_dpla.py
- Updated EDANFetcher to parse Smithsonian XML files in parts
- Fixed error_msg in fetch_records script
- Updated oai.py to handle dictionary resumption_token
- Fixed ingestion document ID references in ingestion scripts
- Updated export_database script and view to use provider.name
- Added BPL profile and updated oai_mods_to_dpla to map BPL
- Fixed UIUC books dataProvider
- Updated export_database script
- Updated couch module to post records with iterview_batch as limit
- Removed log_json references in contentdm_identify_object module
- Appended to UIUC book dataProvider
- Fixed ingestion doc id references in scripts
- Fixed marc_to_dpla UIUC book provider
- Fixed overriding of OAI fetcher metadataPrefix
- Fixed UIUC book profile name field
- Fixed 4789: UIUC - Book collections ingest
- Fixed ISBN extraction in hathi_identify_object module
- Fixed 4644: Refactor poll_profiles
- Fixed 5314: Split out "Fetch" process from poll_profiles
- Fixed 5315: Split out "Enrich" process from poll_profiles
- Fixed 5316: Split out "Save" process from poll_profiles
- Fixed 5683: Hathi: Add thumbnails
- Fixed Hathi sourceResource.spatial field
- Added unit test for getting last ingestion document
- Fixed 5567: Couch module function _get_last_ingestion_doc_for does not return last ingestion document
- Fixed 5545: Update test/server_support to support Geonames token
- Fixed 4890: Ingest HathiTrust data into a dev environment
- Fixed simple service URL for geocode module
- Fixed 5507: USC: Fix Geonames URL
- Fixed 5497: USC: Add latitude check
- Fixed 5488: Hathi: Fix parsing of XML files so as not to use so much memory
- Fixed 5475: USC: Fix location coordinates
- Fixed 5465: USC: Fix dates
- Fixed 5391: USC: Better geocoding
- Fixed 5180: Change UIUC endpoint
- Fixed 5390: USC - Update endpoint URL and remove geocode from pipeline
- Fixed 5366: Update poll_storage to use couch module
- Fixed 5365: USC: Extract thumbnails
- Fixed 5382: Create script to delete all of a provider's documents
- Fixed 4890 - Ingest HathiTrust data into a dev environment
- Fixed 5009 - Ingest USC data
- Changed Virginia books profile "name" field from "virginia" to "virginia_books"
- Fixed 4807: UVA - Ingest book collection
- Fixed 4737: UVA - Ingest additional collection
- Fixed 4656: UVA - Use different domain name for ingestion
- Fixed 4723: UIUC - Add additional sets for ingestion
- Fixed 5055: Refactor the mods_to_dpla module
- Fixed 5278: David Rumsey profile should use production URL for ingest
- Fixed 5277: Artstor thumbnail parsing fails
- Fixed 5250: DLG - Thumbnail extraction fails if underscores are embedded in the item identifier part of the identifier
- Fixed 4250: Ingestions are bloating ElasticSearch schemas
- Fixed 3779: SCDL - Inconsistency in case/pluralization for object formats