An explainable agentic workflow that serves as an analysis copilot for LHCb OpenData and disseminates best-practice methods for the appropriate evaluation of systematic uncertainties.
- Prerequisites (install pipx):
python3 -m pip install --user pipx
python3 -m pipx ensurepath
source ~/.bashrc # or the startup file of your default shell, e.g. ~/.zshrc
- Install Poetry (package manager):
pipx install poetry
- Clone the repository and install dependencies
git clone https://github.com/reallyblaised/beauty-in-stats.git
cd beauty-in-stats
poetry install
- Verify the correct installation of the Poetry environment
poetry env list
>>> beautyinstats-U3Bi8mYg-py3.10 (Activated)
- Install the shell plugin and start a Poetry shell to activate the project environment
poetry self add poetry-plugin-shell
poetry shell
# Get all papers
build-corpus
# Get a specific number of papers
build-corpus --max-papers 10
# Get papers from a date range
build-corpus --start-date 2020-01-01 --end-date 2023-12-31
# Show additional logging
build-corpus --verbose
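When several corpus builds need to be scripted, the options above can be assembled programmatically. The following is a minimal sketch, assuming `build-corpus` is available on the PATH inside the Poetry environment. The helper name `build_corpus_command` is hypothetical and not part of the repository, and whether the flags can be freely combined is also an assumption.

```python
from typing import List, Optional


def build_corpus_command(
    max_papers: Optional[int] = None,
    start_date: Optional[str] = None,
    end_date: Optional[str] = None,
    verbose: bool = False,
) -> List[str]:
    """Assemble a build-corpus argument list (hypothetical helper).

    Only the flags documented in this README are used.
    """
    cmd = ["build-corpus"]
    if max_papers is not None:
        cmd += ["--max-papers", str(max_papers)]
    if start_date is not None:
        cmd += ["--start-date", start_date]  # expected as YYYY-MM-DD
    if end_date is not None:
        cmd += ["--end-date", end_date]  # expected as YYYY-MM-DD
    if verbose:
        cmd.append("--verbose")
    return cmd
```

The returned list can be passed to `subprocess.run(cmd, check=True)` from within the activated environment.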
Downloaded files are organized in the data/ directory:
data/pdfs/: PDF versions of papers
data/source/: LaTeX source files
data/expanded_tex/: Expanded LaTeX files
data/abstracts/: Paper abstracts
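To check what a download run produced, the layout above can be inspected with a few lines of Python. This is a minimal sketch, not part of the repository: the function name `count_corpus_files` is hypothetical, and the recursive search assumes files may sit in nested folders (for example one folder per paper).

```python
from pathlib import Path
from typing import Dict

# Subdirectories of data/ as listed in this README.
SUBDIRS = ("pdfs", "source", "expanded_tex", "abstracts")


def count_corpus_files(data_dir: str = "data") -> Dict[str, int]:
    """Count regular files in each corpus subdirectory (0 if it is missing)."""
    root = Path(data_dir)
    counts = {}
    for name in SUBDIRS:
        sub = root / name
        if sub.is_dir():
            # rglob also descends into nested folders (layout is an assumption).
            counts[name] = sum(1 for p in sub.rglob("*") if p.is_file())
        else:
            counts[name] = 0
    return counts


if __name__ == "__main__":
    for name, n in count_corpus_files().items():
        print(f"data/{name}/: {n} files")
```

A count of zero for a subdirectory after a completed run suggests that the corresponding step produced no output.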
Requirements:
- Python ≥ 3.9
- latexpand (for processing LaTeX sources)
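Both requirements can be checked before running the pipeline. The snippet below is a minimal sketch under the assumption that `latexpand` is expected as an executable on the PATH. The function name `check_requirements` is hypothetical and not part of the repository.

```python
import shutil
import sys
from typing import List, Tuple


def check_requirements(min_python: Tuple[int, int] = (3, 9)) -> List[str]:
    """Return a list of human-readable problems; an empty list means all is well."""
    problems = []
    # Compare the running interpreter against the minimum version.
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        wanted = ".".join(str(part) for part in min_python)
        problems.append(f"Python >= {wanted} required, found {found}")
    # shutil.which returns None when the executable is not on the PATH.
    if shutil.which("latexpand") is None:
        problems.append("latexpand not found on PATH")
    return problems


if __name__ == "__main__":
    issues = check_requirements()
    print("\n".join(issues) if issues else "All requirements satisfied")
```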