
GuideLLM v0.2.0 - CI/CD Finalization, Documentation Expansion, and Backend Support


This release of GuideLLM focuses on finalizing the CI/CD pipelines for full automation, expanding documentation for easier access, enhancing user experience through a web-based report UI, and adding backend support for benchmarking across different hardware configurations.

Key Features

  • CI/CD Finalization: Finalize and expand GitHub Actions CI/CD pipelines to enable automated builds, releases, testing, and quality assurance.
  • Documentation Expansion: Expand the documentation and host it on a dedicated webpage, covering CLI, examples, and architecture for easy discovery.
  • GuideLLM HTML Report UI: Include an HTML report UI for easier visualization and consumption of benchmark results.
  • vLLM Python Backend Integration: Integrate the vLLM backend for direct benchmarking, including system hardware reporting.
  • Standard Dataset Profiles: Standardized dataset profiles make it easy to run inference performance benchmarks against your expected token input/output profiles across key LLM use cases.
  • Transformers/Compressed Tensors Backend Support: Add benchmarking support for transformers and compressed tensors with detailed hardware reporting.
  • vLLM OpenAI Server Expansion: Expand OpenAI server hardware querying to surface system hardware and model specifications.
  • CLI Output Format Enhancements: Expand and simplify CLI output format options, with support for CSV and more consistent reporting.
  • Dataset Analysis Pathways: Enable detailed analysis of one or more datasets within the GuideLLM framework.
  • Model Analysis Pathways: Add support for analyzing one or more models, including accuracy evaluations and detailed reports.
  • Accuracy Evaluation Enablement: Add infrastructure to support common accuracy evaluation pathways, initially targeting the LM Eval harness.
  • Single-Loop Dataset Benchmarks: Add support for looping through a dataset exactly once for a given benchmark.
  • Benchmark Warmup: Add support for warmup runs that execute before performance measurements are counted.
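The warmup and single-loop behaviors above can be sketched generically. This is an illustrative Python sketch, not the GuideLLM API; `benchmark_single_loop` and `run_request` are hypothetical names standing in for whatever the framework exposes:

```python
import time

def benchmark_single_loop(dataset, run_request, warmup=5):
    """Run a fixed number of warmup requests, then time exactly one
    pass over the dataset, returning per-request latencies in seconds."""
    # Warmup phase: exercise the backend but discard the measurements.
    for sample in dataset[:warmup]:
        run_request(sample)

    latencies = []
    # Measured phase: loop through the dataset exactly once.
    for sample in dataset:
        start = time.perf_counter()
        run_request(sample)
        latencies.append(time.perf_counter() - start)
    return latencies
```

Separating warmup from measurement avoids counting one-time costs (model load, cache fills, JIT compilation) against steady-state performance, which is the rationale for both bullets.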

Expected Improvements

  • End-to-End Testing Expansion: Enable and expand end-to-end testing for benchmarking workflows.
  • Integration Testing: Expand integration tests to ensure seamless performance across various backend and CLI pathways.

Expected Bug Fixes

Milestones & Timeline

  • Development: Sept 01, 2024 - Sept 30, 2024
  • QA: Sept 30, 2024 - TBD
  • Feature Freeze: ~Sept 30, 2024
  • Documentation Finalization: ~Sept 30, 2024
  • Release: ~Sept 30, 2024

Testing Requirements

  • Unit Tests: All newly implemented features must have accompanying unit tests that ensure full coverage. Code coverage should remain at 85% or higher.
  • Integration Tests: Ensure all integrations with vLLM, DeepSparse, and other backends are fully tested, covering edge cases and normal workflows.
  • End-to-End (E2E) Tests: Run complete e2e tests for all CLI workflows, including benchmarks, report generation, dataset/model analysis, and output formats.
  • Manual Testing: QA must conduct manual testing on all core features, including the new HTML report UI and dataset analysis workflows, ensuring usability and functionality.
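A coverage floor like the 85% requirement above is typically enforced by the test runner itself so CI fails automatically when coverage regresses. An illustrative configuration using pytest with the pytest-cov plugin (not necessarily GuideLLM's actual setup):

```toml
[tool.pytest.ini_options]
# Fail the test run if total coverage drops below 85%
addopts = "--cov=guidellm --cov-fail-under=85"
```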

Documentation Requirements

  • Docs site released.
  • Supporting docs/guides for new features, including model analysis, dataset analysis, accuracy evals, output formats, the HTML report, and CI/CD flows.
  • Docs expansion with a CLI guide, an examples guide, architecture documentation, and API docs.