Releases: aws/fmeval
v1.2.2
What's Changed
- build: add fmeval support for python 3.11 and 3.12 by @Satish615 in #339
- build: bump up fmeval version to 1.2.2 by @Satish615 in #339
Full Changelog: v1.2.1...v1.2.2
v1.2.1
What's Changed
- build(deps) bump nltk from 3.8.1 to 3.9.1 by @danielezhu in #327
- build: bump fmeval version to 1.2.1 by @shrestha-bikash in #329
Full Changelog: v1.2.0...v1.2.1
v1.2.0
What's Changed
- refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently by @kirupang-code in #317
- feat: Add BERT_SCORE to QAAccuracy and update unit/integration tests by @kirupang-code in #314
- build(deps): pinned nltk version to address build failure by @kirupang-code in #323
- feat: add BERT_SCORE to QAAccuracySemanticRobustness by @kirupang-code in #315
- [Fix] Updated notebook rendering as suggested in issue 316 by @polaschwoebel in #320
- chore: update urllib3 version by @kirupang-code in #318
- build(deps): bump zipp from 3.19.0 to 3.19.1 by @dependabot in #304
- build(deps): bump aiohttp from 3.9.5 to 3.10.2 by @dependabot in #321
- doc: delete ROUGE/METEOR score from QAAccuracy documentation by @kirupang-code in #325
- build: bump fmeval version to 1.2.0 by @kirupang-code in #326
Full Changelog: v1.1.0...v1.2.0
v1.1.0
What's Changed
- chore: add support for proprietary models by @shrestha-bikash in #276
- feat: allow placeholder dict for prompt composer by @xiaoyi-cheng in #273
- feat: add target_context to dataset columns by @oyangz in #266
- feat: add SaveStrategy to allow flexibility in saving localized evaluation outputs by @keerthanvasist in #281
- feat: modify GeneratePrompt transform to take placeholder_dict by @xiaoyi-cheng in #288
- Support multiple data configs in evaluate by @athewsey in #283
- Fix comparisons for string enumerations by @athewsey in #282
- feat: support embedding model runner by @xiaoyi-cheng in #293
- build(deps): bump tornado from 6.4 to 6.4.1 by @dependabot in #287
- build(deps): bump urllib3 from 1.26.18 to 1.26.19 by @dependabot in #292
- feat: add error field in EvalScore by @xiaoyi-cheng in #297
- build(deps-dev): bump pdoc from 14.5.0 to 14.5.1 by @dependabot in #296
- feat: add validate_prompt_template util by @xiaoyi-cheng in #300
- fix: register placeholder_to_record_key in GeneratePrompt transform by @xiaoyi-cheng in #301
- build(deps): bump certifi from 2024.2.2 to 2024.7.4 by @dependabot in #303
- feat: add quasi_exact_inclusion metric to factual knowledge; change factual_knowledge score name to exact_inclusion by @kirupang-code in #302
- fix: update how default payloads get extracted from model spec by @danielezhu in #309
- feat: update context to take lists and rename context field by @oyangz in #305
- feat: add configurable param logical_operator (OR/AND) to factual knowledge by @kirupang-code in #307
- feat: update s3 data source for us-isof partition by @oyangz in #311
- chore: rename factual knowledge scores by @danielezhu in #312
- build: bump fmeval version to 1.1.0 by @danielezhu in #313
New Contributors
- @shrestha-bikash made their first contribution in #276
- @athewsey made their first contribution in #283
- @kirupang-code made their first contribution in #302
Full Changelog: v1.0.3...v1.1.0
v1.0.3
What's Changed
- docs: update telemetry-related info in README and docstrings by @danielezhu in #265
- feat: fetch log probability jmespath from JS metadata by @keerthanvasist in #267
- build(deps): bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #272
- build(deps): bump tqdm from 4.66.2 to 4.66.3 by @dependabot in #271
- fix: update pinned sagemaker python sdk version and get_user_agent_extra util function by @danielezhu in #274
- build: bump fmeval version to 1.0.3 by @danielezhu in #275
Full Changelog: v1.0.2...v1.0.3
v1.0.2
What's Changed
- chore: simplify botocore/boto3-related util code by @danielezhu in #256
- docs: create Github Actions workflow for generating docs via pdoc by @danielezhu in #260
- test: update matplotlib version and figure cell init test by @oyangz in #259
- chore: update lib versions based on dependabot recommendation by @keerthanvasist in #258
- docs: add syntax highlighting by @connorads in #261
- feat: add fmeval-specific user agent header to botocore config for telemetry purposes by @danielezhu in #262
- fix: fix patching in unit tests by @danielezhu in #264
- build: bump fmeval version to 1.0.2 by @danielezhu in #263
New Contributors
- @connorads made their first contribution in #261
Full Changelog: v1.0.1...v1.0.2
v1.0.1
v1.0.0
What's Changed
- chore: Update readme with installation tips by @danielezhu in #181
- fix: readme by @polaschwoebel in #196
- docs: add troubleshooting item for OOM errors by @keerthanvasist in #198
- fix: add data for example notebook by @polaschwoebel in #203
- fix: update terminology in README and source code by @danielezhu in #208
- feat: implement Transform and TransformPipeline classes for modular redesign by @danielezhu in #209
- feat: implement helper models used by evaluation algorithms by @danielezhu in #210
- feat: implement transforms for summarization accuracy metrics by @danielezhu in #211
- docs: update README to include information about Windows support by @danielezhu in #213
- fix: update the default prompt templates for the built-in datasets by @jmikko in #212
- feat: update implementation of SummarizationAccuracy to use Transform-based approach by @danielezhu in #214
- feat: implement transforms for semantic perturbations by @danielezhu in #215
- refactor: update Transform API by @danielezhu in #216
- feat: add prompt template to report by @oyangz in #217
- feat: update various transforms to accept multiple input keys by @danielezhu in #218
- chore: change PromptComposer.PLACEHOLDER from "feature" to "model_input" by @danielezhu in #219
- feat: update GetModelResponse transform to support multiple model invocations on the same input by @danielezhu in #220
- feat: update implementation of GeneralSemanticRobustness to use Transform-based approach by @danielezhu in #222
- fix: update GetModelResponse transform to work with any ModelRunner by @danielezhu in #228
- fix: restore semantic perturbation constants to their original values by @danielezhu in #229
- feat: example notebook for comparative plotting by @polaschwoebel in #223
- refactor: move repeated code in evaluate method into util functions and simplify the EvalAlgorithmInterface method signatures by @danielezhu in #224
- feat: updated docstrings by @polaschwoebel in #225
- chore: restore evaluate_sample and evaluate signatures in EvalAlgorithmInterface by @danielezhu in #231
- refactor: update evaluate_dataset to take in a dataset instead of dataset config by @danielezhu in #232
- feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #233
- feat: update implementation of QAAccuracy to use Transform-based approach by @danielezhu in #234
- feat: update implementation of QAAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #235
- feat: update implementation of ClassificationAccuracy to use Transform-based approach by @danielezhu in #236
- feat: update implementation of ClassificationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #237
- Updating third party attributions by @malhotra18 in #239
- feat: update implementation of FactualKnowledge to use Transform-based approach by @danielezhu in #238
- feat: update implementation of PromptStereotyping to use Transform-based approach by @danielezhu in #240
- fix: set default region for boto3 client to access built-in datasets by @oyangz in #242
- feat: update implementation of Toxicity to use Transform-based approach by @danielezhu in #241
- build: bump fmeval version to 1.0.0 by @danielezhu in #243
Full Changelog: v0.4.0...v1.0.0
v0.4.0
What's Changed
- feat: make sm/br runners easier to subclass by @franluca in #159
- chore: update example notebooks to pip install the fmeval package by @danielezhu in #158
- fix(pre-launch science review): correcting categories for toxicity da… by @franluca in #136
- fix: replace add_column with map in _generate_prompt_column by @danielezhu in #161
- Update f1 score in QA accuracy eval by @bilalaws in #166
- feat: added the precision and recall metrics for QA accuracy by @bilalaws in #157
- Strip text when computing precision and recall. by @bilalaws in #172
- fix: create single source of truth for dataset column names by @danielezhu in #171
- fix: update Ray to version 2.9.0 by @danielezhu in #173
- chore: update devtool all to install first, lint after by @keerthanvasist in #174
- feat: stringify dataset column contents during data loading by @danielezhu in #168
- fix: unblock release pipeline by @xiaoyi-cheng in #176
- fix: update scores description by @xiaoyi-cheng in #177
- fix: split text by any newline and spaces by @franluca in #178
- fix: load detoxify model from state dict and upgrade transformers version by @oyangz in #180
- fix: Fix example notebook unit tests by @danielezhu in #188
- chore: Update Ray to 2.9.1 by @danielezhu in #189
- chore: remove xsum dataset and update gigaword description by @xiaoyi-cheng in #191
- chore: remove XSUM dataset from example notebook and integration tests by @danielezhu in #192
- feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity by @bilalaws in #184
- fix: add bert_score_dissimilarity description by @oyangz in #193
- fix: Toxicity evaluate_sample error message by @xiaoyi-cheng in #185
- build(deps): bump aiohttp to fix vulnerability by @xiaoyi-cheng in #194
- build: bump fmeval version to 0.4.0 by @xiaoyi-cheng in #195
Full Changelog: v0.3.0...v0.4.0
v0.3.0
What's Changed
- fix: add proper capitalization for 'SageMaker' by @oyangz in #153
- fix: fix f1_score by @xiaoyi-cheng in #152
- fix: remove s3fs from data-loading code and use boto3 instead by @danielezhu in #155
- feat: support inference component by @xiaoyi-cheng in #156
Full Changelog: v0.2.1...v0.3.0