
Releases: EvolvingLMMs-Lab/lmms-eval

v0.2.4 add `generate_until_multi_round` to support interactive and multi-round evaluations; add models and fix glitches

03 Oct 15:33
af395ae

What's Changed

  • [Fix] Fix bugs in returning result dict and bring back anls metric by @kcz358 in #221
  • fix: fix wrong args in wandb logger by @Luodian in #226
  • [feat] Add check for existence of accelerator before waiting by @Luodian in #227
  • add more language tasks and fix fewshot evaluation bugs by @Luodian in #228
  • Remove unnecessary LM object removal in evaluator by @Luodian in #229
  • [fix] Shallow copy issue by @pufanyi in #231
  • [Minor] Fix max_new_tokens in video llava by @kcz358 in #237
  • Update LMMS evaluation tasks for various subjects by @Luodian in #240
  • [Fix] Fix async append result in different order issue by @kcz358 in #244
  • Update the version requirement for transformers by @zhijian-liu in #235
  • Add new LMMS evaluation task for wild vision benchmark by @Luodian in #247
  • Add raw score to wildvision bench by @Luodian in #250
  • [Fix] Strict video to be single processing by @kcz358 in #246
  • Refactor wild_vision_aggregation_raw_scores to calculate average score by @Luodian in #252
  • [Fix] Bring back process result pbar by @kcz358 in #251
  • [Minor] Update utils.py by @YangYangGirl in #249
  • Refactor distributed gathering of logged samples and metrics by @Luodian in #253
  • Refactor caching module and fix serialization issue by @Luodian in #255
  • [Minor] Bring back fix for metadata by @kcz358 in #258
  • [Model] support minimonkey model by @white2018 in #257
  • [Feat] add regression test and change saving logic related to output_path by @Luodian in #259
  • [Feat] Add support for llava_hf video, better loading logic for llava_hf ckpt by @kcz358 in #260
  • [Model] support cogvlm2 model by @white2018 in #261
  • [Docs] Update and sort current_tasks.md by @pbcong in #262
  • fix error name with infovqa task by @ZhaoyangLi-nju in #265
  • [Task] Add MMT and MMT_MI (Multiple Image) Task by @ngquangtrung57 in #270
  • mme-realworld by @yfzhang114 in #266
  • [Model] support Qwen2 VL by @abzb1 in #268
  • Support new task mmworld by @jkooy in #269
  • Update current tasks.md by @pbcong in #272
  • [feat] support video evaluation for qwen2-vl and add mix-evals-video2text by @Luodian in #275
  • [Feat][Task] Add multi-round evaluation in llava-onevision; Add MMSearch Benchmark by @CaraJ7 in #277
  • [Fix] Model name None in Task manager, mix eval model specific kwargs, claude retrying fix by @kcz358 in #278
  • [Feat] Add support for evaluation of Oryx models by @dongyh20 in #276
  • [Fix] Fix the error when running models caused by generate_until_multi_round by @pufanyi in #281
  • [fix] Refactor GeminiAPI class to add video pooling and freeing by @pufanyi in #287
  • add jmmmu by @AtsuMiyai in #286
  • [Feat] Add support for evaluation of InternVideo2-Chat && Fix evaluation for mvbench by @yinanhe in #280
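For context on the headline feature, here is a minimal sketch of what a multi-round generation hook enables: the model is queried once per round and sees the accumulated conversation, so later rounds can condition on earlier answers (as multi-round benchmarks such as MMSearch require). The class, method signature, and prompt format below are illustrative assumptions, not the library's actual `generate_until_multi_round` API.

```python
# Illustrative sketch only -- the real lmms-eval model API may differ.
# Assumption: the task supplies one prompt per round and the model sees the
# running conversation, so round N can condition on rounds 1..N-1.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Conversation:
    turns: List[str] = field(default_factory=list)  # alternating USER/ASSISTANT lines

    def render(self) -> str:
        return "\n".join(self.turns)


class ToyMultiRoundModel:
    """Stand-in for a model backend that supports multi-round generation."""

    def generate_once(self, context: str) -> str:
        # A real model would run inference on `context` here; this toy just
        # reports which round it thinks it is in, based on prior replies.
        n_prev = context.count("ASSISTANT:")
        return f"answer for round {n_prev + 1}"

    def generate_until_multi_round(self, round_prompts: List[str]) -> List[str]:
        # Hypothetical signature: one prompt per round, replies returned in order.
        conv, responses = Conversation(), []
        for prompt in round_prompts:
            conv.turns.append(f"USER: {prompt}")
            reply = self.generate_once(conv.render())
            conv.turns.append(f"ASSISTANT: {reply}")
            responses.append(reply)
        return responses


if __name__ == "__main__":
    model = ToyMultiRoundModel()
    print(model.generate_until_multi_round(["describe the image", "refine your answer"]))
```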


Full Changelog: v0.2.3...v0.2.4

v0.2.3.post1

04 Sep 15:16
9d00bfa

What's Changed

  • [Fix] Fix bugs in returning result dict and bring back anls metric by @kcz358 in #221
  • fix: fix wrong args in wandb logger by @Luodian in #226

Full Changelog: v0.2.3...v0.2.3.post1

v0.2.3 add language evaluations and remove registration to speed up loading tasks and models

01 Sep 11:21
30a0745

What's Changed

  • Update the blog link by @pufanyi in #196
  • Bring back PR#52 by @kcz358 in #198
  • fix: update from previous model_specific_prompt to current lmms_eval_kwargs to avoid warnings by @Luodian in #206
  • [Feat] SGLang SRT commands in one go, async input for openai server by @kcz358 in #212
  • [Minor] Add kill sglang process by @kcz358 in #213
  • Support text-only inference for LLaVA-OneVision by @CaraJ7 in #215
  • Fix videomme evaluation by @zhijian-liu in #209
  • [feat] remove registration logic and add language evaluation tasks by @Luodian in #218
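The registration change in #218 points at a common pattern: instead of importing every model and task module up front so decorators can register them, keep a name-to-module table and import on first use. The sketch below illustrates that idea; `MODEL_REGISTRY`, `load_model`, and the module paths are hypothetical names for illustration, not the project's actual loader.

```python
# Illustrative sketch of lazy loading vs. eager decorator registration.
# MODEL_REGISTRY and load_model are hypothetical; module paths/class names
# are placeholders, not guaranteed to match lmms-eval's layout.
import importlib

# Map a model name to "module_path:ClassName" so nothing heavy is imported
# until that specific model is actually requested on the command line.
MODEL_REGISTRY = {
    "llava_hf": "lmms_eval.models.llava_hf:LlavaHf",
    "qwen2_vl": "lmms_eval.models.qwen2_vl:Qwen2_VL",
}


def load_model(name: str):
    """Import and return the class for `name` only when it is first needed."""
    try:
        module_path, class_name = MODEL_REGISTRY[name].split(":")
    except KeyError as exc:
        raise ValueError(f"unknown model: {name}") from exc
    module = importlib.import_module(module_path)  # deferred, per-model import
    return getattr(module, class_name)
```

Because nothing is imported until `load_model` is called, startup no longer pays for every backend's heavy dependencies, which is the speedup the release title refers to.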


Full Changelog: v0.2.2...v0.2.3

v0.2.2: add llava-onevision/mantis/llava-interleave/VILA and new tasks.

09 Aug 14:34
3f89773


Full Changelog: v0.2.0...v0.2.2

v0.2.0.post1

23 Jun 06:02
8f9d620


Full Changelog: v0.2.0...v0.2.0.post1

v0.2.0

12 Jun 19:15
ed88068


Full Changelog: v0.1.0...v0.2.0

LMMs-Eval 0.1.0.dev

12 Mar 07:17
2dd12d1

[Enhancement & Fix] Add tensor parallelism and fix issues in LLaVA-W and MMBench.

LMMs-Eval 0.1.0 Release

08 Mar 07:35
39d1dad

Currently supports 40+ evaluation datasets with 60+ subsets/variants and 5 commonly used LMMs.