Releases: aws/fmeval

v1.2.2

14 Jan 18:57
44c51d4

Full Changelog: v1.2.1...v1.2.2

v1.2.1

17 Oct 18:29
8d1897e

Full Changelog: v1.2.0...v1.2.1

v1.2.0

23 Aug 18:09
417bdbb

Full Changelog: v1.1.0...v1.2.0

v1.1.0

18 Jul 23:23
2c6a402

Full Changelog: v1.0.3...v1.1.0

v1.0.3

10 May 22:57
693ca6e

What's Changed

  • docs: update telemetry-related info in README and docstrings by @danielezhu in #265
  • feat: fetch log probability jmespath from JS metadata by @keerthanvasist in #267
  • build(deps): bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #272
  • build(deps): bump tqdm from 4.66.2 to 4.66.3 by @dependabot in #271
  • fix: update pinned sagemaker python sdk version and get_user_agent_extra util function by @danielezhu in #274
  • build: bump fmeval version to 1.0.3 by @danielezhu in #275

Full Changelog: v1.0.2...v1.0.3

v1.0.2

25 Apr 18:53
c76b923

What's Changed

  • chore: simplify botocore/boto3-related util code by @danielezhu in #256
  • docs: create GitHub Actions workflow for generating docs via pdoc by @danielezhu in #260
  • test: update matplotlib version and figure cell init test by @oyangz in #259
  • chore: update lib versions based on dependabot recommendation by @keerthanvasist in #258
  • docs: add syntax highlighting by @connorads in #261
  • feat: add fmeval-specific user agent header to botocore config for telemetry purposes by @danielezhu in #262
  • fix: fix patching in unit tests by @danielezhu in #264
  • build: bump fmeval version to 1.0.2 by @danielezhu in #263

Full Changelog: v1.0.1...v1.0.2

v1.0.1

17 Apr 17:44
91e675b

What's Changed

  • fix: fix s3 uri validation for built-in datasets
  • docs: update README with details on contributing new eval algos
  • fix: update output record key validation logic in validate_call

v1.0.0

29 Mar 00:01
e893f70

What's Changed

  • chore: Update readme with installation tips by @danielezhu in #181
  • fix: readme by @polaschwoebel in #196
  • docs: add troubleshooting item for OOM errors by @keerthanvasist in #198
  • fix: add data for example notebook by @polaschwoebel in #203
  • fix: update terminology in README and source code by @danielezhu in #208
  • feat: implement Transform and TransformPipeline classes for modular redesign by @danielezhu in #209
  • feat: implement helper models used by evaluation algorithms by @danielezhu in #210
  • feat: implement transforms for summarization accuracy metrics by @danielezhu in #211
  • docs: update README to include information about Windows support by @danielezhu in #213
  • fix: update the default prompt templates for the built-in datasets by @jmikko in #212
  • feat: update implementation of SummarizationAccuracy to use Transform-based approach by @danielezhu in #214
  • feat: implement transforms for semantic perturbations by @danielezhu in #215
  • refactor: update Transform API by @danielezhu in #216
  • feat: add prompt template to report by @oyangz in #217
  • feat: update various transforms to accept multiple input keys by @danielezhu in #218
  • chore: change PromptComposer.PLACEHOLDER from "feature" to "model_input" by @danielezhu in #219
  • feat: update GetModelResponse transform to support multiple model invocations on the same input by @danielezhu in #220
  • feat: update implementation of GeneralSemanticRobustness to use Transform-based approach by @danielezhu in #222
  • fix: update GetModelResponse transform to work with any ModelRunner by @danielezhu in #228
  • fix: restore semantic perturbation constants to their original values by @danielezhu in #229
  • feat: example notebook for comparative plotting by @polaschwoebel in #223
  • refactor: move repeated code in evaluate method into util functions and simplify the EvalAlgorithmInterface method signatures by @danielezhu in #224
  • feat: updated docstrings by @polaschwoebel in #225
  • chore: restore evaluate_sample and evaluate signatures in EvalAlgorithmInterface by @danielezhu in #231
  • refactor: update evaluate_dataset to take in a dataset instead of dataset config by @danielezhu in #232
  • feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #233
  • feat: update implementation of QAAccuracy to use Transform-based approach by @danielezhu in #234
  • feat: update implementation of QAAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #235
  • feat: update implementation of ClassificationAccuracy to use Transform-based approach by @danielezhu in #236
  • feat: update implementation of ClassificationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #237
  • Updating third party attributions by @malhotra18 in #239
  • feat: update implementation of FactualKnowledge to use Transform-based approach by @danielezhu in #238
  • feat: update implementation of PromptStereotyping to use Transform-based approach by @danielezhu in #240
  • fix: set default region for boto3 client to access built-in datasets by @oyangz in #242
  • feat: update implementation of Toxicity to use Transform-based approach by @danielezhu in #241
  • build: bump fmeval version to 1.0.0 by @danielezhu in #243
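
The placeholder rename in #219 (from "feature" to "model_input") affects any prompt template written against the old name. The sketch below only illustrates what such a substitution could look like using Python's standard `string.Template`; it is not fmeval's `PromptComposer`. The helper name `compose_prompt` and the `$model_input` template syntax are assumptions made for illustration.

```python
from string import Template

# Placeholder name taken from the changelog entry for #219.
PLACEHOLDER = "model_input"


def compose_prompt(template: str, model_input: str) -> str:
    """Hypothetical helper: fill the placeholder in a prompt template.

    Raises KeyError if the template references a placeholder other than
    PLACEHOLDER (for example, the pre-rename name "feature").
    """
    return Template(template).substitute({PLACEHOLDER: model_input})


prompt = compose_prompt("Summarize the following text: $model_input", "Some article text.")
print(prompt)
```

Under these assumptions, a template that still uses `$feature` fails with a `KeyError` instead of being silently left unfilled, which is one way a rename like this can surface in user code.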

Full Changelog: v0.4.0...v1.0.0

v0.4.0

21 Feb 00:02
52ad34d

What's Changed

  • feat: make sm/br runners easier to subclass by @franluca in #159
  • chore: update example notebooks to pip install the fmeval package by @danielezhu in #158
  • fix(pre-launch science review): correcting categories for toxicity da… by @franluca in #136
  • fix: replace add_column with map in _generate_prompt_column by @danielezhu in #161
  • Update f1 score in QA accuracy eval by @bilalaws in #166
  • feat: added the precision and recall metrics for QA accuracy by @bilalaws in #157
  • Strip text when computing precision and recall. by @bilalaws in #172
  • fix: create single source of truth for dataset column names by @danielezhu in #171
  • fix: update Ray to version 2.9.0 by @danielezhu in #173
  • chore: update devtool all to install first, lint after by @keerthanvasist in #174
  • feat: stringify dataset column contents during data loading by @danielezhu in #168
  • fix: unblock release pipeline by @xiaoyi-cheng in #176
  • fix: update scores description by @xiaoyi-cheng in #177
  • fix: split text by any newline and spaces by @franluca in #178
  • fix: load detoxify model from state dict and upgrade transformers version by @oyangz in #180
  • fix: Fix example notebook unit tests by @danielezhu in #188
  • chore: Update Ray to 2.9.1 by @danielezhu in #189
  • chore: remove xsum dataset and update gigaword description by @xiaoyi-cheng in #191
  • chore: remove XSUM dataset from example notebook and integration tests by @danielezhu in #192
  • feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity by @bilalaws in #184
  • fix: add bert_score_dissimilarity description by @oyangz in #193
  • fix: Toxicity evaluate_sample error message by @xiaoyi-cheng in #185
  • build(deps): bump aiohttp to fix vulnerability by @xiaoyi-cheng in #194
  • build: bump fmeval version to 0.4.0 by @xiaoyi-cheng in #195
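
Several entries above concern QA accuracy scoring (F1 in #166, precision and recall in #157, text stripping in #172, splitting on newlines and spaces in #178). As a rough illustration of what token-level precision, recall, and F1 mean, here is a minimal sketch of the common bag-of-tokens overlap formulation. It is not fmeval's implementation; the function name `token_prf` and the exact tokenization are assumptions.

```python
from collections import Counter


def token_prf(prediction: str, target: str) -> tuple[float, float, float]:
    """Return (precision, recall, f1) over whitespace-separated tokens.

    Assumption: tokens are obtained by stripping the text and splitting on
    any run of whitespace (spaces, tabs, newlines).
    """
    pred_tokens = prediction.strip().split()
    target_tokens = target.strip().split()
    if not pred_tokens or not target_tokens:
        return 0.0, 0.0, 0.0

    # Multiset intersection counts each shared token at most as often
    # as it appears in both sequences.
    overlap = sum((Counter(pred_tokens) & Counter(target_tokens)).values())
    if overlap == 0:
        return 0.0, 0.0, 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(target_tokens)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


precision, recall, f1 = token_prf("  the cat\nsat ", "the cat")
print(precision, recall, f1)
```

In this example the prediction has three tokens and the target two, with two tokens shared, so precision is 2/3 and recall is 1.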

Full Changelog: v0.3.0...v0.4.0

v0.3.0

13 Dec 17:41
29fb223

Full Changelog: v0.2.1...v0.3.0