Holistic Evaluation of Language Models

This is a fork of https://github.com/stanford-crfm/helm which we used for the 2023 NeurIPS LLM efficiency competition https://llm-efficiency-challenge.github.io/

It was private because the tasks we were testing on had to be undisclosed to the final participants and included

If you're interested in using these tasks in your own work please feel free to copy paste

Name		Name	Last commit message	Last commit date
Latest commit History 4,559 Commits
.github/workflows		.github/workflows
configs		configs
docs		docs
neurIPS_eval_scripts		neurIPS_eval_scripts
scripts		scripts
src		src
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
.readthedocs.yaml		.readthedocs.yaml
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
MANIFEST.in		MANIFEST.in
README.md		README.md
build_open_spec.py		build_open_spec.py
build_secret_run_spec.py		build_secret_run_spec.py
demo.py		demo.py
do-run.sh		do-run.sh
install-dev.sh		install-dev.sh
install-helm-local.sh		install-helm-local.sh
json-urls-root.js		json-urls-root.js
json-urls.js		json-urls.js
mkdocs.yml		mkdocs.yml
pre-commit.sh		pre-commit.sh
private_run_specs.conf		private_run_specs.conf
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt
setup.cfg		setup.cfg

Provide feedback