GuideLLM v0.2.0 - CI/CD Finalization, Documentation Expansion, and Backend Support
This release of GuideLLM focuses on finalizing the CI/CD pipelines for full automation, expanding documentation for easier access, enhancing user experience through a web-based report UI, and adding backend support for benchmarking across different hardware configurations.
Key Features
- CI/CD Finalization: Finalize and expand GitHub Actions CI/CD pipelines to enable automated builds, releases, testing, and quality assurance.
- Documentation Expansion: Expand the documentation and host it on a dedicated webpage, covering CLI, examples, and architecture for easy discovery.
- GuideLLM HTML Report UI: Include an HTML report UI for easier visualization and consumption of benchmark results.
- vLLM Python Backend Integration: Integrate the vLLM backend for direct benchmarking, including system hardware reporting.
- Standard Dataset Profiles: Standardized dataset profiles make it easy to run inference performance benchmarks for your expected token input/output profiles across key LLM use cases.
- Transformers/Compressed Tensors Backend Support: Add benchmarking support for transformers and compressed tensors with detailed hardware reporting.
- vLLM OpenAI Server Expansion: Expand OpenAI server hardware querying to surface system hardware and model specifications.
- CLI Output Format Enhancements: Expand and simplify CLI output format options, with support for CSV and more consistent reporting.
- Dataset Analysis Pathways: Enable detailed analysis of one or more datasets within the GuideLLM framework.
- Model Analysis Pathways: Add support for analyzing one or more models, including accuracy evaluations and detailed reports.
- Accuracy Evaluation Enablement: Add infrastructure to support common accuracy evaluation pathways, initially targeting the LM Eval harness.
- Single-Loop Dataset Benchmarks: Add support for looping through a dataset exactly once for a given benchmark.
- Benchmark Warmup: Add support for warmup runs that execute before measured benchmark runs, so warmup requests are excluded from reported results.
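The single-loop and warmup behaviors described above can be sketched as follows. This is a minimal illustration, not GuideLLM's implementation; `run_benchmark` and `send_request` are hypothetical names chosen for the example.

```python
import time

def run_benchmark(requests, send_request, warmup_passes=1):
    """Run warmup passes over the dataset, then time a single measured pass.

    Hypothetical helper for illustration: `send_request` stands in for
    whatever callable issues one inference request to the backend.
    """
    # Warmup: execute the dataset without recording latencies.
    for _ in range(warmup_passes):
        for request in requests:
            send_request(request)

    # Measured pass: loop through the dataset exactly once.
    latencies = []
    for request in requests:
        start = time.perf_counter()
        send_request(request)
        latencies.append(time.perf_counter() - start)
    return latencies
```

With `warmup_passes=0` this degenerates to a plain single-loop benchmark; only the final pass contributes to the recorded latencies either way.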
Expected Improvements
- End-to-End Testing Expansion: Enable and expand end-to-end testing for benchmarking workflows.
- Integration Testing: Expand integration tests to ensure seamless performance across various backend and CLI pathways.
Expected Bug Fixes
Milestones & Timeline
- Development: Sept 01, 2024 - Sept 30, 2024
- QA: Sept 30, 2024 - TBD
- Feature Freeze: ~Sept 30, 2024
- Documentation Finalization: ~Sept 30, 2024
- Release: ~Sept 30, 2024
Testing Requirements
- Unit Tests: All newly implemented features must have accompanying unit tests that ensure full coverage. Code coverage should remain at 85% or higher.
- Integration Tests: Ensure all integrations with vLLM, DeepSparse, and other backends are fully tested, covering edge cases and normal workflows.
- End-to-End (E2E) Tests: Run complete e2e tests for all CLI workflows, including benchmarks, report generation, dataset/model analysis, and output formats.
- Manual Testing: QA must manually test all core features, including the new HTML report UI and dataset analysis workflows, to verify usability and functionality.
Documentation Requirements
- Docs site released.
- Supporting docs/guides for new features including model analysis, dataset analysis, accuracy evals, output formats, HTML report, CI/CD flows.
- Docs expansion with CLI guide, examples guide, architecture documentation, API docs.