All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and this project adheres to Semantic Versioning.
- Restructure package management, moving dependencies to built-in packages (#442)
- Relay prep requirements (#529)
- Add GitHub SearchSource (#468), Unpaywall SearchSource (#469), SpringerLink SearchSource (#466), OSF SearchSource (#471)
- Refactor other SearchSources
- Replace dacite by pydantic
- Stop Docker containers
- CLI: option to add packages interactively
- Testing and bugfixes in built-in packages (paper-md, files_dir, aisel)
- Update docs (add asciinema demonstration)
- Extend documentation (package development, package summaries, asciinema demo)
- Bugfixes and codebase improvements (e.g., package management and discovery, closing sqlite connections)
- Reduce dependencies (e.g., levenshtein, PyPDF2, pdfminer, daff, psutil)
- Refactor colrev.bibliography_export (add writers)
- Extend tests: cover MacOS and Python 3.12
- Remove unnecessary options (e.g., init --local_pdf_collection)
- Add and test support for GitHub codespaces
- Update CoLRev packages (including interfaces, development docs etc.)
- Refactoring (local-index)
- Implement json-loader
- Make ui_web (dash, blinker) optional to prevent errors in WSL
- Bugfixes
- Refactor and test (dataset, records, provenance, local_index)
- Extract package_manager into a separate internal package
- Use bib-dedupe for matching (instead of simple similarities)
- Update docs
- Add linter `colrev_records_variable_naming_convention`
- Test coverage increased from 71% to 80%
- Split `records`, `dataset`, created `records` package.
- Extracted `process` as a separate package.
- Implemented loaders as a separate package, created a standard interface. SearchSources now create the specific mapping of IDs, entrytypes and fields.
- Moved field standardization from `load` to SearchSources.
- Extended use of constants
- SearchSourceInterface: renamed `run_search` to `search`, prefer `prep_link_md` over `get_masterdata`
- Renamed and refactored `GeneralOriginFeed` to `SearchAPIFeed`
- Pass record objects instead of dicts (in `local_index` in particular)
- Replaced unnecessary keyword arguments by positional arguments
- Moved `zotero_translation_service` to `bibliography_export` package
- Consolidated code for reference parsing in `tei_parser`
- Upgraded Grobid to 0.8.0
- Removed dead code
- Dropped `INCONSISTENT_WITH_DOI_METADATA`
- `transitions` dependency
- Do not require review_manager for `colrev env -i`
- Fixed `status_stats`, including special cases.
- Repository registration: resolve() and absolute() path
- Separate PDF quality model (#268)
- `download_from_website` pdf-get package
- Separate loader utilities for nbib, ris, bib
- SearchSources: SemanticScholar (#288), Arxiv (#203)
- Constants module for Fields, ENTRYTYPES, etc.
- CEP003 for SearchSources
- New default dedupe package based on bib-dedupe
- Colrev pandas for Jupyter notebooks
- GitHub actions: pip-install test, make documentation
- Integrated `colrev.resolve_crossrefs` into `load_utils_bib.py`
- Defect codes can be ignored based on the `IGNORE:` prefix (#269)
- Documentation for setup (VM, MacOS, WSL)
- Revised interfaces for SearchSources
- Integrated: pdf_dir + video_dir > files_dir
- poetry extras
- Backward search: export of parameters and expected sample sizes
- Replace thefuzz with rapidfuzz
- Package based on dedupe-io, including incompatible dependencies
- Crossref resolution package (integrated in bib-loader)
- Removed unstable test case
- GitHub actions for CoLRev updates now install with Poetry because the fixed dependencies are more stable compared to pip installation
- paper_md: export BibTeX file and replace keys containing `.` to prevent pandoc error
- SearchTypes: API, TOC, MD are added, PDFS is replaced by FILES.
- SearchTypes are explained in the docs.
- Package documentation is imported to docs.
- colrev.pdfs_dir and colrev.video_dir are integrated into colrev.files_dir.
- SearchSources: SYNERGY datasets, OpenAlex, ERIC, IEEEXplore, ArXiv
- JournalRankings: index, prep, and prescreen
- CoLRev shell via cli-repl (`colrev shell`)
- prep operation: pause and resume
- Dashboard overview of the sample and project status
- Extended tests, updated documentation (especially for extension development)
- GitHub workflows to update dependencies (poetry update)
- Ruff linter
- Load: ris/csv/... files are loaded directly (without creating intermediate BibTeX file)
- Introduced namespaced fields (e.g., `colrev.pubmed.pubmedid` instead of `pubmedid`)
- Extracted quality checks to separate Quality Model
- Docs: instructions for development setup
- Code quality improvements (codacy)
- colrev-asreview: extracted to separate package
- watchdog-based service
- Introduced namespaced fields (e.g., `colrev.pubmed.pubmedid` instead of `pubmedid`).
- Updated colrev-asreview dependency (PyPI instead of GitHub)
- Integrated `load` into `SearchSource`. Removed `load_conversion` endpoint: `settings.json`, `packages`, `interface` etc.
- The `quality_model` was created to check for quality defects
- The `auto_upgrade` flag allows users to enable/disable automated upgrades
- All-contributors bot to acknowledge contributions to CoLRev
- Implemented OpenLibrary as a SearchSource
- Pylint check for direct assignment of colrev_status
- Test battery for built-in SearchSources (heuristics, load, prep)
- Backward-search comparison with OpenCitations data
- Refactored `language_service`
- Refactored the tests (`conftest.py` now provides the `base_repo_review_manager` fixture)
- Changed pdf-hash (pdf to image) from poppler to mupdf for cross-platform compatibility (`cpid1` -> `cpid2`)
- Local settings changed from yaml to json
- Quality defects (colrev_masterdata_provenance notes) change
- The `colrev.global_ids_consistency_check` prep-endpoint is removed (integrated into the quality model)
- Individual quality checks can be disabled through the `prep/defects_to_ignore` settings
- Update the GitHub action workflows in CoLRev repositories
- `timeout-decorator` dependency (for better compatibility with MacOS)
- Docker image `pdf-hash-service` (replaced by mupdf)
- Redundant fields for the backward search are removed (`cited_by_file` and `cited_by_id`)
- Documentation: typos and inconsistencies
- Codacy issues and refactored complex files
- Windows paths in `iter_commit` (git history)
- Implemented new quality model
- Quality defects (colrev_masterdata_provenance notes) change
- The `colrev.global_ids_consistency_check` prep-endpoint is removed (integrated into the quality model)
- Individual quality checks can be disabled through the `prep/defects_to_ignore` settings
- Redundant fields for the backward search are removed (`cited_by_file` and `cited_by_id`)
- CoLRev pdf IDs are now based on the mupdf library
- Fix InvalidGitRepositoryError (raised upon status in empty directories)
- Update the GitHub action workflows in CoLRev repositories
- Add auto-upgrade flag to settings
- Unit tests: increased test coverage to 70%, added GitHub actions matrix tests across OS and Python versions
- Completed OpenSSF Best Practices checks
- Added forward and backward searches based on OpenCitations
- Moved documentation to readthedocs and revised documentation
- Added dependabot and pre-commit.ci: automated code and security checks
- Added support for GitHub actions, distinguishing packages that are supported in ci-environments (`ci_supported` flag)
- Added Pubmed API searches and metadata preparation support
- Option to initialize and run CoLRev repositories without requiring Docker
- Overview video presented at ESMARConf2023
- CITATION.cff and Zenodo
- API-searches for the AIS eLibrary
- Numerous modifications based on the user tests
- Replaced OpenSearch with sqlite
- SearchSource interface: `run_search` and `add_package` are now mandatory
- Documentation review, including detailed information on development status
- Consistent setup of GitHub actions (test, publish to PyPI)
- Built-in packages renamed from `colrev_built_in` to `colrev`
- Data package `manuscript` renamed to `paper_md`
- Simplified upgrade operation and activated upgrades per default
- Extracted and refactored language-service
- Several bugfixes
- Changed package prefix from `colrev_built_in` to `colrev`
- Add retrieve and pdfs as high-level operations
- Metadata preparation can add records to separate origin feeds
- Initial package manager functionality (registering packages and displaying them in the docs)
- Search: update of records and propagation of changes
- Several SearchSources (including SearchSource query validation)
- Revisions of CLI (verbose mode, user feedback)
- Colrev merge (reconciliation coding when merging git branches)
- dedupe --merge/--unmerge
- Integrated colrev pre-commit hooks
- PRISMA diagram (data endpoint)
- Obsidian (data endpoint)
- Preparation: not-in-toc exception/warning
- Setup of pytests
- Curated records are now explicitly identified through curation_IDs
- Revise colrev validate (commits, users, properties)
- Detailed advisor (using get_advice() for data endpoints)
- Performance improvements and simplification of status (cli)
- Moved correction functionality to SearchSources (refactored correction path)
- Preparation: simplified preparation rounds (default settings)
- Retrieve TEIs through local_index (if available) instead of recreating it
- Replace pathos by Threadpool
- Revise the documentation
- Revise and extend exceptions
- Remove persistent colrev-ids
- Remove realtime review
- Dependencies ansiwrap and p-tqdm
- `**kwargs` calls in ReviewManager
- Indexing of non-curated records
- Address special cases in dedupe (active learning)
- Web-based editor for project settings
- Comprehensive architecture refactoring
- Conformance with pylint, mypy, flake8
- Introduced packages
- Updated file and directory structure
- Documentation of modules, classes, and methods
- GitHub Pages as a data package_endpoint
- Renamed from colrev_core to colrev (integrated cli)
- Switch to poetry for dependency management
- Renamed scripts to package_endpoints
- PDF-hash generation based on Docker to avoid platform dependency issues
- Switch to Jinja templates (instead of concatenating multiple strings)
- Concurrent request session handling
- StatusStats calculations
- Push/pull (including corrections), sync, validate, service operations
- Data provenance model (colrev_data_provenance, colrev_masterdata_provenance)
- Extensible endpoints (search, prep, prescreen, pdf-get, pdf-prep, screen, data)
- Prescreen scope
- Improvements: prep, dedupe operations
- Performance improvements (e.g., status, bibtexparser > pybtex)
- Extended Record class (e.g., merge and fuse_best_fields)
- LocalIndex: Elasticsearch to Opensearch
- Dedupe: testing and parameter optimization (option to prevent same-source merges)
- Settings.json and validation
- Updated documentation
- Testing and refactoring (e.g., for Windows, prefer keyword arguments in functions, python package type information)
- Extract functionality: ReviewDataset, Process
- Developed LocalIndex, EnvironmentManager, OpenSearch
- Curation model, including Resource installation and a "correction path"
- Search operation (reintegrating paper_feed and local_paper_index)
- Prep exclusion based on languages
- Object-oriented refactoring of the whole codebase
- Use Zotero translators (instead of bibutils) for imports
- Duplicate identification (add FP safeguards based on LocalIndex, add a procedure for small samples)
- Consistent PDF path handling
- Structured data extraction based on csv
- Loggers
- Performance issues in prep and status
- Introduced ReviewManager and integrated hooks/checks
- Fetch metadata from Open Library
- Required fields for misc
- Information on needs_manual_preparation (man_prep_hints)
- Activated mypy hooks
- Introduced custom load scripts
- Documentation
- LocalIndex: hash-table implementation for indexing and retrieval
- Dedupe: based on active learning (dedupe-io)
- Improved batches
- Pass records instead of BibDatabase
- PDF prep and longer pdf hashes
- CLI: now in separate colrev repository
- Initializing repositories
- Backward search adds two entries to search_details
- Logging (reinitialize after batches/commits)
- Status model (rev_status, md_status, pdf_status)
- Implemented cli interface
- Import formats (bib, ris, endn, pdf, text list of references)
- Docker services for import, ocr, building the paper etc.
- Metadata repositories for record preparation (crossref, dblp, semantic scholar)
- PDF preparation (OCR, metadata validation)
- Commit message reporting
- Check and validation of iteration completeness
- Support for building papers based on pandoc
- Integrated review process status (including prescreen, screen inclusion vs exclusion) in the references.bib
- Renamed scripts and cli entrypoints
- Refactored code
- Tracing from hash_id to origin links
- Extended and refactored pre-commit hooks
- R scripts for sample statistics (the goal is to implement them in Python)
- hash_id function, trace_entry, trace_hash_id
- Bugs in `analysis/combine_individual_search_results.py` and in `analysis/acquire_pdfs.py`
- Catch exceptions and check bad responses in `analysis/acquire_pdfs.py`
- Bug in git modification check for `references.bib` in `analysis/utils.py`
- Exception in `analysis/screen_2.py` (IndexError)
- Global constant conflict with `analysis/entry_hash_function.py` (nameparser.config/CONSTANTS)
- First version of the pipeline, including `status`, `reformat_bibliography`, `trace_entry`, `trace_hash_id`, `combine_individual_search_results`, `cleanse_records`, `screen_sheet`, `screen_1`, `acquire_pdfs`, `screen_2`, `data_sheet` and `data_pages`
- Environment setup including `Dockerfile` and `Makefiles`