Releases · aws/fmeval

14 Jan 18:57

Satish615

v1.2.2

44c51d4

v1.2.2 Latest

What's Changed

build: add fmeval support for python 3.11 and 3.12 by @Satish615 in #339
build: bump up fmeval version to 1.2.2 by @Satish615 in #339

Full Changelog: v1.2.1...v1.2.2

Contributors

Satish615

17 Oct 18:29

shrestha-bikash

v1.2.1

8d1897e

v1.2.1

What's Changed

build(deps): bump nltk from 3.8.1 to 3.9.1 by @danielezhu in #327
build: bump fmeval version to 1.2.1 by @shrestha-bikash in #329

Full Changelog: v1.2.0...v1.2.1

Contributors

shrestha-bikash and danielezhu

23 Aug 18:09

xiaoyi-cheng

v1.2.0

417bdbb

v1.2.0

What's Changed

refactor: SummarizationAccuracyMetrics transform to handle multiple target outputs more efficiently by @kirupang-code in #317
feat: Add BERT_SCORE to QAAccuracy and update unit/integration tests by @kirupang-code in #314
build(deps): pinned nltk version to address build failure by @kirupang-code in #323
feat: add BERT_SCORE to QAAccuracySemanticRobustness by @kirupang-code in #315
[Fix] Updated notebook rendering as suggested in issue 316 by @polaschwoebel in #320
chore: update urllib3 version by @kirupang-code in #318
build(deps): bump zipp from 3.19.0 to 3.19.1 by @dependabot in #304
build(deps): bump aiohttp from 3.9.5 to 3.10.2 by @dependabot in #321
doc: delete ROUGE/METEOR score from QAAccuracy documentation by @kirupang-code in #325
build: bump fmeval version to 1.2.0 by @kirupang-code in #326

Full Changelog: v1.1.0...v1.2.0

Contributors

polaschwoebel, dependabot, and kirupang-code

18 Jul 23:23

danielezhu

v1.1.0

2c6a402

v1.1.0

What's Changed

chore: add support for proprietary models by @shrestha-bikash in #276
feat: allow placeholder dict for prompt composer by @xiaoyi-cheng in #273
feat: add target_context to dataset columns by @oyangz in #266
feat: add SaveStrategy to allow flexibility in saving localized evaluation outputs by @keerthanvasist in #281
feat: modify GeneratePrompt transform to take placeholder_dict by @xiaoyi-cheng in #288
Support multiple data configs in evaluate by @athewsey in #283
Fix comparisons for string enumerations by @athewsey in #282
feat: support embedding model runner by @xiaoyi-cheng in #293
build(deps): bump tornado from 6.4 to 6.4.1 by @dependabot in #287
build(deps): bump urllib3 from 1.26.18 to 1.26.19 by @dependabot in #292
feat: add error field in EvalScore by @xiaoyi-cheng in #297
build(deps-dev): bump pdoc from 14.5.0 to 14.5.1 by @dependabot in #296
feat: add validate_prompt_template util by @xiaoyi-cheng in #300
fix: register placeholder_to_record_key in GeneratePrompt transform by @xiaoyi-cheng in #301
build(deps): bump certifi from 2024.2.2 to 2024.7.4 by @dependabot in #303
feat: add quasi_exact_inclusion metric to factual knowledge; change factual_knowledge score name to exact_inclusion by @kirupang-code in #302
fix: update how default payloads get extracted from model spec by @danielezhu in #309
feat: update context to take lists and rename context field by @oyangz in #305
feat: add configurable param logical_operator (OR/AND) to factual knowledge by @kirupang-code in #307
feat: update s3 data source for us-isof partition by @oyangz in #311
chore: rename factual knowledge scores by @danielezhu in #312
build: bump fmeval version to 1.1.0 by @danielezhu in #313
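
The factual knowledge entries above (#302, #307) mention an exact_inclusion score, a quasi_exact_inclusion score, and a logical_operator parameter (OR/AND). As a rough illustration of how such scores could behave, here is a minimal sketch. It is not fmeval's implementation: the function names `normalize` and `inclusion_score`, the list-of-targets input, and the normalization rules (lowercasing, dropping punctuation and English articles) are all assumptions made for this example.

```python
import re
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def inclusion_score(
    model_output: str,
    targets: list[str],
    logical_operator: str = "OR",
    quasi: bool = False,
) -> int:
    """Return 1 if the targets are included in the model output, else 0.

    "OR" requires at least one target to appear; "AND" requires all of them.
    With quasi=True, both sides are normalized before the substring check.
    """
    if logical_operator not in ("OR", "AND"):
        raise ValueError(f"unsupported logical_operator: {logical_operator!r}")
    if quasi:
        output = normalize(model_output)
        candidates = [normalize(t) for t in targets]
    else:
        output = model_output.lower()
        candidates = [t.lower() for t in targets]
    hits = [candidate in output for candidate in candidates]
    combine = any if logical_operator == "OR" else all
    return int(combine(hits))
```

Under these assumptions, `inclusion_score("The capital is London.", ["London", "Paris"])` scores 1 with OR and 0 with AND, since only one of the two targets appears in the output.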

New Contributors

@shrestha-bikash made their first contribution in #276
@athewsey made their first contribution in #283
@kirupang-code made their first contribution in #302

Full Changelog: v1.0.3...v1.1.0

Contributors

keerthanvasist, shrestha-bikash, and 6 other contributors

10 May 22:57

danielezhu

v1.0.3

693ca6e

v1.0.3

What's Changed

docs: update telemetry-related info in README and docstrings by @danielezhu in #265
feat: fetch log probability jmespath from JS metadata by @keerthanvasist in #267
build(deps): bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #272
build(deps): bump tqdm from 4.66.2 to 4.66.3 by @dependabot in #271
fix: update pinned sagemaker python sdk version and get_user_agent_extra util function by @danielezhu in #274
build: bump fmeval version to 1.0.3 by @danielezhu in #275

Full Changelog: v1.0.2...v1.0.3

Contributors

keerthanvasist, dependabot, and danielezhu

25 Apr 18:53

danielezhu

v1.0.2

c76b923

v1.0.2

What's Changed

chore: simplify botocore/boto3-related util code by @danielezhu in #256
docs: create Github Actions workflow for generating docs via pdoc by @danielezhu in #260
test: update matplotlib version and figure cell init test by @oyangz in #259
chore: update lib versions based on dependabot recommendation by @keerthanvasist in #258
docs: add syntax highlighting by @connorads in #261
feat: add fmeval-specific user agent header to botocore config for telemetry purposes by @danielezhu in #262
fix: fix patching in unit tests by @danielezhu in #264
build: bump fmeval version to 1.0.2 by @danielezhu in #263

New Contributors

@connorads made their first contribution in #261

Full Changelog: v1.0.1...v1.0.2

Contributors

keerthanvasist, connorads, and 2 other contributors

17 Apr 17:44

oyangz

v1.0.1

91e675b

v1.0.1

fix: fix s3 uri validation for built-in datasets
docs: update README with details on contributing new eval algos
fix: update output record key validation logic in validate_call

29 Mar 00:01

danielezhu

v1.0.0

e893f70

v1.0.0

What's Changed

chore: Update readme with installation tips by @danielezhu in #181
fix: readme by @polaschwoebel in #196
docs: add troubleshooting item for OOM errors by @keerthanvasist in #198
fix: add data for example notebook by @polaschwoebel in #203
fix: update terminology in README and source code by @danielezhu in #208
feat: implement Transform and TransformPipeline classes for modular redesign by @danielezhu in #209
feat: implement helper models used by evaluation algorithms by @danielezhu in #210
feat: implement transforms for summarization accuracy metrics by @danielezhu in #211
docs: update README to include information about Windows support by @danielezhu in #213
fix: update the default prompt templates for the built-in datasets by @jmikko in #212
feat: update implementation of SummarizationAccuracy to use Transform-based approach by @danielezhu in #214
feat: implement transforms for semantic perturbations by @danielezhu in #215
refactor: update Transform API by @danielezhu in #216
feat: add prompt template to report by @oyangz in #217
feat: update various transforms to accept multiple input keys by @danielezhu in #218
chore: change PromptComposer.PLACEHOLDER from "feature" to "model_input" by @danielezhu in #219
feat: update GetModelResponse transform to support multiple model invocations on the same input by @danielezhu in #220
feat: update implementation of GeneralSemanticRobustness to use Transform-based approach by @danielezhu in #222
fix: update GetModelResponse transform to work with any ModelRunner by @danielezhu in #228
fix: restore semantic perturbation constants to their original values by @danielezhu in #229
feat: example notebook for comparative plotting by @polaschwoebel in #223
refactor: move repeated code in evaluate method into util functions and simplify the EvalAlgorithmInterface method signatures by @danielezhu in #224
feat: updated docstrings by @polaschwoebel in #225
chore: restore evaluate_sample and evaluate signatures in EvalAlgorithmInterface by @danielezhu in #231
refactor: update evaluate_dataset to take in a dataset instead of dataset config by @danielezhu in #232
feat: update implementation of SummarizationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #233
feat: update implementation of QAAccuracy to use Transform-based approach by @danielezhu in #234
feat: update implementation of QAAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #235
feat: update implementation of ClassificationAccuracy to use Transform-based approach by @danielezhu in #236
feat: update implementation of ClassificationAccuracySemanticRobustness to use Transform-based approach by @danielezhu in #237
Updating third party attributions by @malhotra18 in #239
feat: update implementation of FactualKnowledge to use Transform-based approach by @danielezhu in #238
feat: update implementation of PromptStereotyping to use Transform-based approach by @danielezhu in #240
fix: set default region for boto3 client to access built-in datasets by @oyangz in #242
feat: update implementation of Toxicity to use Transform-based approach by @danielezhu in #241
build: bump fmeval version to 1.0.0 by @danielezhu in #243
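
Entry #219 above renames the prompt placeholder from "feature" to "model_input". To show what that means for a prompt template, here is a small sketch built on Python's standard `string.Template`. The helper name `compose_prompt` is hypothetical, and the `$name` placeholder syntax is an assumption for illustration rather than a statement of how fmeval's PromptComposer is implemented.

```python
from string import Template


def compose_prompt(template: str, placeholder_values: dict[str, str]) -> str:
    """Fill $name placeholders in a template (hypothetical helper).

    Raises ValueError if the template references a placeholder that has
    no value, e.g. a template still using the old "$feature" name.
    """
    try:
        return Template(template).substitute(placeholder_values)
    except KeyError as err:
        raise ValueError(f"template placeholder {err} has no value") from err


prompt = compose_prompt("Summarize: $model_input", {"model_input": "hello"})
```

In this sketch, a template that still used `$feature` would fail fast with a `ValueError` instead of silently producing an unfilled prompt.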

New Contributors

@jmikko made their first contribution in #212

Full Changelog: v0.4.0...v1.0.0

Contributors

keerthanvasist, jmikko, and 4 other contributors

21 Feb 00:02

danielezhu

v0.4.0

52ad34d

v0.4.0

What's Changed

feat: make sm/br runners easier to subclass by @franluca in #159
chore: update example notebooks to pip install the fmeval package by @danielezhu in #158
fix(pre-launch science review): correcting categories for toxicity da… by @franluca in #136
fix: replace add_column with map in _generate_prompt_column by @danielezhu in #161
Update f1 score in QA accuracy eval by @bilalaws in #166
feat: added the precision and recall metrics for QA accuracy by @bilalaws in #157
Strip text when computing precision and recall. by @bilalaws in #172
fix: create single source of truth for dataset column names by @danielezhu in #171
fix: update Ray to version 2.9.0 by @danielezhu in #173
chore: update devtool all to install first, lint after by @keerthanvasist in #174
feat: stringify dataset column contents during data loading by @danielezhu in #168
fix: unblock release pipeline by @xiaoyi-cheng in #176
fix: update scores description by @xiaoyi-cheng in #177
fix: split text by any newline and spaces by @franluca in #178
fix: load detoxify model from state dict and upgrade transformers version by @oyangz in #180
fix: Fix example notebook unit tests by @danielezhu in #188
chore: Update Ray to 2.9.1 by @danielezhu in #189
chore: remove xsum dataset and update gigaword description by @xiaoyi-cheng in #191
chore: remove XSUM dataset from example notebook and integration tests by @danielezhu in #192
feat: add support for non-deterministic models in GeneralSemanticRobustness and add BERTScore Dissimilarity by @bilalaws in #184
fix: add bert_score_dissimilarity description by @oyangz in #193
fix: Toxicity evaluate_sample error message by @xiaoyi-cheng in #185
build(deps): bump aiohttp to fix vulnerability by @xiaoyi-cheng in #194
build: bump fmeval version to 0.4.0 by @xiaoyi-cheng in #195
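
Several entries above concern the QA accuracy scores: precision and recall were added (#157), the F1 score was updated (#166), text is stripped before scoring (#172), and text is split on any newline or space (#178). The following is a generic sketch of token-overlap precision, recall, and F1 under those ideas. The function name `token_scores`, the lowercasing step, and the whitespace tokenization are assumptions for this example, not fmeval's actual code.

```python
from collections import Counter


def token_scores(model_output: str, target_output: str) -> dict[str, float]:
    """Token-overlap precision, recall, and F1 between two strings."""
    # Strip surrounding whitespace, then split on any run of spaces or newlines.
    predicted = model_output.strip().lower().split()
    expected = target_output.strip().lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it occurs in both strings.
    overlap = sum((Counter(predicted) & Counter(expected)).values())
    if overlap == 0:
        return {"precision": 0.0, "recall": 0.0, "f1": 0.0}
    precision = overlap / len(predicted)
    recall = overlap / len(expected)
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


scores = token_scores("  Paris France\n", "Paris")
```

Here the model output has two tokens and one of them matches the single target token, so precision is 0.5 and recall is 1.0.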

New Contributors

@bilalaws made their first contribution in #166

Full Changelog: v0.3.0...v0.4.0

Contributors

keerthanvasist, danielezhu, and 4 other contributors

13 Dec 17:41

xiaoyi-cheng

v0.3.0

29fb223

v0.3.0

What's Changed

fix: add proper capitalization for 'SageMaker' by @oyangz in #153
fix: fix f1_score by @xiaoyi-cheng in #152
fix: remove s3fs from data-loading code and use boto3 instead by @danielezhu in #155
feat: support inference component by @xiaoyi-cheng in #156

Full Changelog: v0.2.1...v0.3.0

Contributors

danielezhu, xiaoyi-cheng, and oyangz
