Release v0.7 · ludwig-ai/ludwig

Key Highlights

Pretrained Vision Models: We’ve added 20 TorchVision pretrained models as image encoders, including AlexNet, EfficientNet, MobileNet v3, and GoogLeNet.
Image Augmentation: Ludwig v0.7 also introduces image augmentation, artificially increasing the size of the training dataset by applying a randomized set of transformations to each batch of images during training.
50x Faster Fine-Tuning via Automatic Mixed Precision (AMP) Training, Cached Encoder Embeddings, Approximate Training Set evaluation, and automatic batch sizing by default to maximize throughput.
New Distributed Training Strategies: Distributed Data Parallel (DDP) and Fully Sharded Data Parallel (FSDP)
Ray 2.0, 2.1, 2.2 and 2.3 support
A new Ludwig profiler for benchmarking various CPU/GPU performance metrics, as well as comparing different Ludwig model runs.
Revamped Ludwig datasets API with an even larger number of datasets out of the box.
API annotations within Ludwig for contributors and Python users
Schemification of the entire Ludwig Config object for better validation and checks upfront.
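
To make the image augmentation highlight concrete, here is a minimal, library-free sketch of per-batch randomized augmentation. It is not Ludwig's implementation: the function names (`random_horizontal_flip`, `augment_batch`), the flip probability, and the representation of images as nested lists of pixel rows are all illustrative assumptions.

```python
import random


def random_horizontal_flip(image, rng, p=0.5):
    """Mirror each pixel row left-to-right with probability p (assumed default)."""
    if rng.random() < p:
        return [row[::-1] for row in image]
    return image


def augment_batch(batch, rng, transforms):
    """Apply every transform, in order, to each image in the batch.

    A fresh random decision is made per image, so the same source image
    can look different each time it is drawn during training.
    """
    out = []
    for image in batch:
        for transform in transforms:
            image = transform(image, rng)
        out.append(image)
    return out


rng = random.Random(0)  # seeded only so the sketch is reproducible
batch = [[[1, 2, 3], [4, 5, 6]], [[7, 8, 9], [10, 11, 12]]]
augmented = augment_batch(batch, rng, [random_horizontal_flip])
```

Because the transformations are sampled again for every batch, the model rarely sees exactly the same pixels twice, which is what makes the training set behave as if it were larger.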

What's Changed

Fix ray nightly import by @jppgks in #2196
Restructured split config and added datetime splitting by @tgaddair in #2132
enh: Implements InferenceModule as a pipelined module with separate preprocessor, predictor, and postprocessor modules by @brightsparc in #2105
Explicitly pass data credentials when reading binary files from a RayBackend by @jeffreyftang in #2198
MlflowCallback: do not end run on_trainer_train_teardown by @jppgks in #2201
Fail hyperopt with full import error when Ray not installed by @tgaddair in #2203
Make convert_predictions() backend-aware by @hungcs in #2200
feat: MVP for explanations using Integrated Gradients from captum by @jppgks in #2205
[Torchscript] Adds GPU-enabled input types for Vector and Timeseries by @geoffreyangus in #2197
feat: Added model type GBM (LightGBM tree learner), as an alternative to ECD by @jppgks in #2027
[Torchscript] Parallelized Text/Sequence Preprocessing by @geoffreyangus in #2206
feat: Adding feature type shared parameter capability for hyperopt by @arnavgarg1 in #2133
Bump up version to 0.6.dev. by @justinxzhao in #2209
Define FloatOrAuto and IntegerOrAuto schema fields, and use them. by @justinxzhao in #2219
Define a dataclass for parameter metadata. by @justinxzhao in #2218
Add explicit handling for zero-length image byte buffers to avoid cryptic errors by @jeffreyftang in #2210
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2231
Create dataset util to form repeatable train/vali/test split by @amholler in #2159
Bug fix: Use safe rename which works across filesystems when writing checkpoints by @dantreiman in #2225
Add parameter metadata to the trainer schema. by @justinxzhao in #2224
Add an explicit call to merge_with_defaults() when loading a config from a model directory. by @justinxzhao in #2226
Fixes flaky test test_datetime_split[dask] by @dantreiman in #2232
Fixes prediction saving for models with Set output by @geoffreyangus in #2211
Make ExpectedImpact JSON serializable by @hungcs in #2233
standardised quotation marks, added missing word by @Marvjowa in #2236
Add boolean postprocessing to dataset type inference for automl by @magdyksaleh in #2193
Update get_repeatable_train_val_test_split to handle non-stratified split w/ no existing split by @amholler in #2237
Update R2 score to handle single sample computation by @arnavgarg1 in #2235
Input/Output Feature Schema Refactor by @connor-mccorm in #2147
Fix nan in entmax loss and flaky sparsemax/entmax loss tests by @dantreiman in #2238
Fix preprocessing dataset split API backwards compatibility upgrade bug. by @justinxzhao in #2239
Removing duplicates in constants from recent PRs by @arnavgarg1 in #2240
Add attention scores of the vit encoder as an additional return value by @Dennis-Rall in #2192
Unnest Audio Feature Preprocessing Config by @connor-mccorm in #2242
Fixed handling of invalid number values to treat as missing values by @tgaddair in #2247
Support saving numpy predictions to remote FS by @hungcs in #2245
Use global constant for description.json by @hungcs in #2246
Removed import warnings when LightGBM and Ray not requested by @tgaddair in #2249
Adds ability to read images from numpy files and numpy arrays by @geoffreyangus in #2212
Hyperopt steps per epoch not being computed correctly by @arnavgarg1 in #2175
Fixed splitting when providing pre-split inputs by @tgaddair in #2248
Added Backwards Compatibility for Audio Feature Preprocessing by @connor-mccorm in #2254
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2256
Fix: Don't skip saving the model if the save path already exists. by @justinxzhao in #2264
Load best weights outside of finally block, since load may throw an exception by @dantreiman in #2268
Reduce number of distributed tests. by @justinxzhao in #2270
[WIP] Adds inference_utils.py by @geoffreyangus in #2213
Run github checks for pushes and merges to *-stable. by @justinxzhao in #2266
Add ludwig logo and version to CLI help text. by @justinxzhao in #2258
Add hyperopt_statistics.json constant by @hungcs in #2276
fix: Make BaseTrainerConfig an abstract class by @ksbrar in #2273
[Torchscript] Adds --device argument to export_torchscript CLI command by @geoffreyangus in #2275
Use pytest tmpdir fixture wherever temporary directories are used in tests. by @justinxzhao in #2274
adding configs used in benchmarking by @abidwael in #2263
Fixes #2279 by @noahlh in #2284
adding hardware usage and software packages tracker by @abidwael in #2195
benchmarking utils by @abidwael in #2260
dataclasses for summarizing benchmarking results by @abidwael in #2261
Benchmarking core by @abidwael in #2262
Fixed default eval_batch_size when setting batch_size=auto by @tgaddair in #2286
Remove obsolete postprocess_inference_graph function. by @justinxzhao in #2267
[Torchscript] Adds BERT tokenizer + partial HF tokenizer support by @geoffreyangus in #2272
Support passing ground_truth as df for visualizations by @hungcs in #2281
catching urllib3 exception by @abidwael in #2294
Run pytest workflow on release branches. by @justinxzhao in #2291
Save checkpoint if train_steps is smaller than batcher's steps_per_epoch by @dantreiman in #2298
Fix typo in amazon review datasets: s/review_tile/review_title by @dantreiman in #2300
Refactor non-distributed automl utils into a separate directory. by @justinxzhao in #2296
Don't skip normalization in TabNet during inference on a single row. by @dantreiman in #2299
Fix error in postproc_predictions calculation in model.evaluate() by @arnavgarg1 in #2304
Test for parameter updates in Ludwig components by @jimthompson5802 in #2194
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2311
Use warnings to suppress repeated logs for failed image reads by @arnavgarg1 in #2312
Use ray dataset and drop type casting in binary_feature prediction post processing for speedup by @magdyksaleh in #2293
Add size_bytes to DatasetInfo and DataSource by @jeffreyftang in #2306
Fixes TensorDtype TypeError in Ray nightly by @geoffreyangus in #2320
Add configuration section for global feature parameters by @arnavgarg1 in #2208
Ensures unit tests are deleting artifacts during teardown by @geoffreyangus in #2310
Fixes unit test that had empty Dask partitions after splitting by @geoffreyangus in #2313
Serve json numpy encoding by @jeffkinnison in #2316
fix: Mlflow config being injected in hyperopt config by @hungcs in #2321
Update tests that use preprocessing to match new defaults config structure by @arnavgarg1 in #2323
Bump test timeout to 60 minutes by @tgaddair in #2325
Set a default value for size_bytes in DatasetInfo by @jeffreyftang in #2331
Pin nightly versions to fix CI by @geoffreyangus in #2327
Log number of failed image reads by @arnavgarg1 in #2317
Add test with encoder dependencies for global defaults by @arnavgarg1 in #2342
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2334
Add wine quality notebook to demonstrate using config defaults by @arnavgarg1 in #2333
fix: GBM tests failing after new release from upstream dependency by @jppgks in #2347
fix: restore overwrite of eval_batch_size on GBM schema by @jppgks in #2345
Removes empty partitions after dropping rows and splitting datasets by @geoffreyangus in #2328
fix: Properly serialize ParameterMetadata to JSON by @ksbrar in #2348
Test for parameter updates in Ludwig Components - Part 2 by @jimthompson5802 in #2252
refactor: Replace bespoke marshmallow fields that accept multiple types with a new 'combinatorial' OneOfField that accepts other fields as arguments. by @ksbrar in #2285
Use Ray Datasets to read binary files in parallel by @tgaddair in #2241
typos: Update README.md by @andife in #2358
Respect the resource requests in RayPredictor by @magdyksaleh in #2359
Resource tracker threading by @abidwael in #2352
Allow writing init_config results to remote filesystems by @tgaddair in #2364
Fixed export_mlflow command to not assume an existing registered_model_name by @tgaddair in #2369
fix: Fixes to serialization, and update to allow set repo location. by @brightsparc in #2367
Add amazon employee access challenge kaggle dataset by @justinxzhao in #2349
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2362
Wrap read of cached training set metadata in try/except for robustness by @jeffreyftang in #2373
Reduce dropout prob in test_conv1d_stack by @dantreiman in #2380
fever: change broken download links by @jppgks in #2381
Add default split config by @hungcs in #2379
Fix CI: Skip failing ray GBM tests by @justinxzhao in #2391
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2389
Triton ensemble export by @abidwael in #2251
Fix: Random dataset splitting with 0.0 probability for optional validation or test sets. by @justinxzhao in #2382
Print final training report as tabulated text. by @justinxzhao in #2383
Add Ray 2.0 to CI by @tgaddair in #2337
add GBM configs to benchmarking by @jppgks in #2395
Optional artifact logging for MLFlow by @ShreyaR in #2255
Simplify ludwig.benchmarking.benchmark API and add ludwig benchmark CLI by @abidwael in #2394
rename kaggle_api_key to kaggle_key by @jppgks in #2384
use new URL for yosemite dataset by @jppgks in #2385
Encoder refactor V2 by @dantreiman in #2370
re-enable GBM tests after new lightgbm-ray release by @jppgks in #2393
Added option to log artifact location while creating mlflow experiment by @ShreyaR in #2397
Treat dataset columns as object dtype during first pass of handle_missing_values by @jeffreyftang in #2398
fix: ParameterMetadata JSON serialization bug by @ksbrar in #2399
Adds registry to organize backward compatibility updates around versions and config sections by @dantreiman in #2335
Include split column in explanation df by @connor-mccorm in #2405
Fix AimCallback to use model_name as Run.name by @alberttorosyan in #2413
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2410
Hotfix: features eligible for shared params hyperopt by @arnavgarg1 in #2417
Nest FC Params in Decoder by @connor-mccorm in #2400
Hyperopt Backwards Compatibility by @connor-mccorm in #2419
Investigating test_resnet_block_layer intermittent test failure by @dantreiman in #2414
fix: Remove duplicate option from cell_type field schema by @ksbrar in #2428
Test for parameter updates in Ludwig Combiners - Part 3 by @jimthompson5802 in #2332
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2430
Hotfix: Proc column missing in output feature schema by @arnavgarg1 in #2435
Nest hyperopt parameters into decoder object by @arnavgarg1 in #2436
Fix: Make the twitter bots modeling example runnable by @justinxzhao in #2433
Add MLG-ULB creditcard fraud dataset by @jppgks in #2425
Bugfix: non-number inputs to GBM by @jppgks in #2418
GBM: log intermediate progress by @jppgks in #2421
Fix: Upgrade ludwig config before schema validation by @connor-mccorm in #2441
Log warning for calibration if validation set is trivially small by @dantreiman in #2440
Fixes calibration and adds example scripts by @dantreiman in #2431
Add medical no-show appointments dataset by @jppgks in #2387
Added conditional check for UNK token insertion into category feature vocab by @arnavgarg1 in #2429
Ensure synthetic dataset unit tests clean up extra files. by @justinxzhao in #2442
Added feature specific parameter test for hyperopt by @arnavgarg1 in #2329
Fixed version transformation to accept user configs without ludwig_version by @tgaddair in #2424
Fix multiple partition predict by @magdyksaleh in #2422
Cache jsonschema validator to reduce memory pressure by @tgaddair in #2444
[tests] Added more explicit lifecycle management to Ray clusters during tests by @tgaddair in #2447
Fix: explicit keyword args for seaborn plot fn by @jppgks in #2454
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2453
Extended hyperopt to support nested configuration block parameters by @tgaddair in #2445
Consolidate missing value strategy to only include bfill and ffill by @arnavgarg1 in #2457
fix: Switched Learning Rate to NonNegativeFloat Field by @connor-mccorm in #2446
Support GitHub Codespaces by @jppgks in #2463
Enh: quality-of-life improvements for export_torchscript by @geoffreyangus in #2459
Disables batch_size: auto for CPU-only training by @geoffreyangus in #2455
bugfix: triton model version as a string by @abidwael in #2461
Updating images to Ray 2.0.0 and CUDA 11.3 by @abidwael in #2390
Loss, Split, and Defaults Schema Additions by @connor-mccorm in #2439
More precise resource usage tracking by @abidwael in #2363
Summarizing performance metrics and resource usage results by @abidwael in #2372
Better gbm defaults based on benchmarking results by @jppgks in #2466
Infer single distinct value columns as category instead of binary by @arnavgarg1 in #2467
fix: Add explicit schema in to_parquet() during saving predictions by @hungcs in #2420
Publish docker images from release branches by @tgaddair in #2470
Add backwards-compatibility logic for model progress tracker by @jeffreyftang in #2468
Backwards compatibility for class_weights by @connor-mccorm in #2469
Test for parameter updates in Ludwig Decoders - Part 4 by @jimthompson5802 in #2354
Fixed backwards compatibility for training_set_metadata and bfill by @tgaddair in #2472
Fixed backwards compatibility for models with level metadata in saved configs by @tgaddair in #2475
Fix profiler: account for missing values when running in docker by @jppgks in #2477
Add L-BFGS optimizer by @jppgks in #2478
fix: Automatically assign title to OneOfOptionsField by @ksbrar in #2480
fix: handle 'numerical' entries in preprocessing config during backwards compatibility upgrade by @jeffreyftang in #2484
fix: mark update_class_weights_in_features transformation for version 0.6 by @jeffreyftang in #2481
Fixed usage of checkpoints for AutoML in Ray 2.0 by @tgaddair in #2485
[fix flaky test] Relax loss constraint for unit tests for lbfgs optimizer. by @justinxzhao in #2486
Fixed stratified splitting with Dask by @tgaddair in #1883
Replace custom Union marshmallow fields with Oneof fields, and default allow_none=True everywhere. by @justinxzhao in #2482
Resource isolation for dataset preprocessing on ray backends by @magdyksaleh in #2404
Pin transformers < 4.22 until issues resolved by @tgaddair in #2495
Fix flaky ray nightly image test by @arnavgarg1 in #2493
Added workflow to auto cherry-pick into release branches by @tgaddair in #2500
Enable hyperopt to be launched from a ray client by @ShreyaR in #2501
GBM: support hyperopt by @jppgks in #2490
Fixes saved_weights_in_checkpoint docstring, mark as internal only by @dantreiman in #2506
Fix test length of predictions by @tgaddair in #2507
Fixed support for distributed datasets in create_auto_config by @tgaddair in #2508
Config-first Datasets API (ludwig.datasets refactor) by @dantreiman in #2479
Add in-memory dataset size calculation to dataset statistics by @arnavgarg1 in #2509
Surfacing dataset statistics in hyperopt by @arnavgarg1 in #2515
Adds multimodal benchmark datasets from AutoGluon paper by @dantreiman in #2512
Adds goodbooks dataset by @dantreiman in #2514
GBM: correctly compute early stopping by @jppgks in #2517
Fixes mnist dataset image files not exporting by @dantreiman in #2520
Fix get_best_model in hyperopt for Ray 1.12 by @arnavgarg1 in #2527
Populate Parameter Metadata by @connor-mccorm in #2503
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2532
Update README to be consistent with ludwig.ai home page. by @justinxzhao in #2530
Add missing declarative ML image in README by @arnavgarg1 in #2533
fix: Add missing titles/descriptions to various schemas by @ksbrar in #2516
Cleanup: move to per-module loggers instead of the global logging object. by @justinxzhao in #2531
Updated schedule logic for placement groups for ray backend by @magdyksaleh in #2523
Nit: Parameter update tests grammar. by @justinxzhao in #2537
Hyperopt: Log warning with num_extra_trials if all grid search parameters and num_samples > 1 by @arnavgarg1 in #2535
Adds model configs to ludwig.datasets by @dantreiman in #2540
ZScore Normalization Failure When Using Constant Value Number Feature by @arnavgarg1 in #2543
Adds class names to calibration plot title, reformats Brier scores as grouped bar chart by @dantreiman in #2545
Pin ray nightly version to avoid new test failures by @arnavgarg1 in #2548
Added tests for init_config and render_config CLI commands by @tgaddair in #2551
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2554
Ensure bfill/ffill leave no residual NaNs in the dataset during preprocessing by @arnavgarg1 in #2553
Comprehensive configs: Explicitly list and save all parameter values for input and output features in configs. by @justinxzhao in #2460
Fixing SettingWithCopyWarning when using get_repeatable_train_val_test_split by @abidwael in #2562
Replace numerical with number in dataset zoo configs. by @justinxzhao in #2558
Benchmarking toolkit wrap up by @abidwael in #2462
Migrate to Raincloud plots for hyperopt report by @arnavgarg1 in #2561
Remove global torchtext version-specific tokenizer availability warnings. by @justinxzhao in #2547
Only create hyperopt pair plots when there is more than 1 parameter by @arnavgarg1 in #2560
fix: Limit frequency array to top_n_classes in F1 viz by @hungcs in #2565
int: unpin Dask version by @geoffreyangus in #2550
Fixed typehint and removed unused utility function by @magdyksaleh in #2570
AutoML: stratify imbalanced datasets by @jppgks in #2525
Use Ray Air Checkpoint to sync files between trial workers by @tgaddair in #2577
GBM bugfix: matching predictions LightGBM, hummingbird by @jppgks in #2574
specify seed in RayDataset shuffling by @abidwael in #2566
update logging message when early_stop: -1 by @abidwael in #2585
update docker with torch wheel by @abidwael in #2584
Refactors test_ray.py to minimize duplicate training jobs by @geoffreyangus in #2573
Explanation API and feature importance for GBM by @jppgks in #2564
Remove duplicate option by @connor-mccorm in #2593
Quick fix: Don't show calibration validation set warnings unless calibration is actually enabled by @dantreiman in #2595
Fixed issue when uploading output directory artifacts to remote filesystems by @tgaddair in #2598
Add API Annotations to Ludwig by @arnavgarg1 in #2596
Tweaks to the README (forward-ported from release-0.6) by @justinxzhao in #2603
Extend test coverage for non-conventional booleans by @jppgks in #2601
Fix assertions in training_determinism tests by @arnavgarg1 in #2606
Ensure no ghost ray instances are running in tests by @arnavgarg1 in #2607
Allow explicitly plumbing through nics by @tgaddair in #2605
bug: fix relative import in optimizers.py by @ksbrar in #2600
GBM: increase boosting_rounds_per_checkpoint to reduce evaluation overhead by @jppgks in #2612
regression tests: add GBM model trained on v0.6.1 by @jppgks in #2611
Relax test constraint to reduce flakiness in test_ray by @arnavgarg1 in #2610
Add splitter that deterministically splits on an ID column by @tgaddair in #2615
fix(explain): missing columns for fixed split by @jppgks in #2616
Fixed hyperopt trial syncing to remote filesystems for Ray 2.0 by @tgaddair in #2617
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2622
feat: adds max_batch_size to auto batch size functionality by @geoffreyangus in #2579
Set commonly used parameters by @connor-mccorm in #2619
Factor out defaults mixin change by @connor-mccorm in #2628
Add type to custom combiner by @connor-mccorm in #2627
Remove hyperopt from config when running train through cli by @arnavgarg1 in #2631
Ensure resource availability for ray datasets workloads when running on cpu clusters by @arnavgarg1 in #2524
Speed up horovod hyperopt tests and solve OOMs by @arnavgarg1 in #2599
[explain] add API annotations by @jppgks in #2635
Added storage backend API to allow injecting dynamic credentials by @tgaddair in #2630
Update version to 0.7.dev by @justinxzhao in #2625
Unpin Ray nightly in CI by @tgaddair in #2614
Skip Horovod 0.26 installation, add packaging to requirements.txt by @arnavgarg1 in #2642
[Annotations] Callbacks by @arnavgarg1 in #2641
Fix automl by @connor-mccorm in #2639
accepting dictionary as input to benchmarking.benchmark by @abidwael in #2626
Fixed automl APIs to work with remote filesystems by @tgaddair in #2650
Adds minimum split size, ensures random split is never smaller than minimum for local backend by @dantreiman in #2623
Categorical passthrough encoder training failure fix by @abidwael in #2649
Changes learning_curves to use "step" or "epoch" as x-axis label. by @dantreiman in #2578
Remove Trainer type Param by @connor-mccorm in #2647
Model performance in GitHub actions by @abidwael in #2568
Fixed race condition in schema validation by @tgaddair in #2653
Fixed --gpu_memory_limit in CLI to interpret as fraction of GPU memory by @tgaddair in #2658
Stopgap solution for test_training_determinism by @connor-mccorm in #2665
Added min and max to sample ratio by @connor-mccorm in #2655
Set internal only flags by @connor-mccorm in #2659
Add support for running pytest github action locally with act by @dantreiman in #2661
Enforcing a 1 to 1 matching in names between Ludwig datasets and AutoGluon paper by @abidwael in #2666
Added default arg to get_schema by @connor-mccorm in #2667
remove duplicate news_popularity dataset by @abidwael in #2668
Switch defaults to use mixins and improve test by @connor-mccorm in #2669
Documents running local tests with act by @dantreiman in #2672
Config Object by @connor-mccorm in #2426
Unpin protobuf by @justinxzhao in #2673
Check vocab size of category features, error out if only one category. Also adds error.py for custom error types. by @dantreiman in #2670
Ordered Schema by @connor-mccorm in #2671
Fix Regression Test Configs by @connor-mccorm in #2678
Testing always() inside expansion in condition by @dantreiman in #2681
Add protos to the Ludwig project: DatasetProfile messages and Whylogs messages. by @justinxzhao in #2674
Allow Ray Tune callbacks to be passed into hyperopt and log model config by @jeffkinnison in #2640
Check for nans before testing equality in test_training_determinism by @dantreiman in #2687
Set saved_weights_in_checkpoint on encoder, not input feature by @dantreiman in #2690
Use fully rendered config dictionary when accessing model.config by @tgaddair in #2685
bug: Set additionalProperties to True for preprocessing schemas. by @ksbrar in #2620
Bump support for torch 1.11.0 by @justinxzhao in #2691
Fix validator for reduce_learning_rate_on_plateau by @carlogrisetti in #2692
Use TensorArray to speed up writing predictions with Ray by @tgaddair in #2684
Dataset size checks in preprocess_for_training by @dantreiman in #2688
Remove Duplicate Schema Fields by @connor-mccorm in #2679
Speed up tune_batch_size by using synthetic batches by @tgaddair in #2680
Add bucketing_field Param to Trainer by @connor-mccorm in #2694
Fix InputDataError to be serializable by @tgaddair in #2695
Adds PublicAPI annotation to api.py by @dantreiman in #2698
Cleanup: move to per-module loggers instead of the global logging object. (2) by @justinxzhao in #2699
Adds Ray implementation of IntegratedGradientsExplainer that distributes across cluster resources by @tgaddair in #2697
Fixed bug with non-category outputs in RayIntegratedGradientsExplainer by @tgaddair in #2702
Fix example values for max_batch_size in trainer parameter metadata by @connor-mccorm in #2705
Fix incorrect internal_only flags on audio feature metadata by @connor-mccorm in #2704
add customer churn datasets by @abidwael in #2703
Add Kaggle test splits by @abidwael in #2675
Fix ComparatorCombiner by @jppgks in #2689
Actually print the torchinfo summary in print_model_summary() by @justinxzhao in #2696
Add H&M fashion recommendation dataset by @jppgks in #2708
Fix GBM ray nightly test by @jppgks in #2676
Adds DeveloperAPI and PublicAPI annotations to AutoML by @dantreiman in #2701
Remove obsolete v0 whylogs callback. by @justinxzhao in #2713
fill_value / computed_fill_value fix by @connor-mccorm in #2714
Add path to RayDataset by @tgaddair in #2716
Fixed Horovod to be an optional import when doing Hyperopt by @tgaddair in #2717
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2722
Adds annotation to download_one method in benchmarks by @dantreiman in #2712
fix: Prevent shared parameter_metadata instances between defaults and _features. by @ksbrar in #2715
Added ngram tokenizer by @tgaddair in #2723
Revert "Add H&M fashion recommendation dataset (#2708)" by @jppgks in #2724
Optimize search space for hyperopt tests to decrease test durations by @arnavgarg1 in #2730
Add custom to_dask() to infer Dask metadata from Datasets schema. by @arnavgarg1 in #2728
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2735
Bump Ludwig to Ray 2.0 by @arnavgarg1 in #2729
Parameter Metadata Updates by @connor-mccorm in #2736
Removes some vestigial code and replaces Tensorflow with PyTorch in comments by @dantreiman in #2731
@DeveloperAPI annotations for backend module by @dantreiman in #2707
int: Refactor test_ray.py to limit number of full train jobs by @geoffreyangus in #2637
BaseTrainer: add empty barrier() by @jppgks in #2734
Use whylogs to generate dataset profiles for pandas and dask dataframes. by @justinxzhao in #2710
Add IntegerOptions marshmallow field by @ksbrar in #2739
Downgrade to Ray 2.0 in CI to get green Ludwig CIs again. by @justinxzhao in #2742
Adds @DeveloperAPI annotations to combiner classes by @dantreiman in #2744
Use clearer error messages in ludwig serving, and enable serving to work with configs that have stratified splitting on target columns. by @justinxzhao in #2740
Update Ray GPU Docker image to CUDA 11.6 by @tgaddair in #2747
Fix #1735 by @herrmann in #2746
Enable dataset window autosizing by @jeffkinnison in #2721
Downgrade to PyTorch 1.12.1 in Docker due to NCCL + CUDA compatibility by @tgaddair in #2750
Replicate ludwig type inference, using the whylogs dataset profile. by @justinxzhao in #2743
fix: Encountered unknown symbol 'foo' warning in Category feature preprocessing by @geoffreyangus in #2662
Expand ~ in dataset download paths by @dantreiman in #2754
Updates twitter bots example to new datasets API by @dantreiman in #2753
fix: refactor IntegerOptions field by @ksbrar in #2755
Added ray datasets repartitioning in cases of multiple train workers by @ShreyaR in #2756
fix: Fix metadata object-to-JSON serialization for oneOf fields and add full schema serialization test. by @ksbrar in #2758
refactor: Add ProtectedString field (alias of StringOptions that only allows one string) by @ksbrar in #2757
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2761
Updates ludwig docker readme by @dantreiman in #2760
Annotates ludwig.datasets API by @dantreiman in #2751
Annotate MLFlow callback, and utility functions by @dantreiman in #2749
Drishi sarcasmdataset 1 by @drishi in #2725
Add local_rank to BaseTrainer by @tgaddair in #2766
Public datasets by @connor-mccorm in #2752
Fix typo by @connor-mccorm in #2767
Correctly infer bool and object types in autoML by @arnavgarg1 in #2765
feat: Hyperopt schema v0, part 1: Move output feature metrics from feature classes to feature configs. by @ksbrar in #2759
Fix by @connor-mccorm in #2769
Add ray version to runners by @arnavgarg1 in #2771
Annotate Ludwig encoders and decoders by @arnavgarg1 in #2773
Move preprocess callbacks inside model.preprocess by @jeffreyftang in #2772
Fix benchmark tests, update latest metrics, and use the local backend for GBM benchmark tests by @abidwael in #2748
Ensure correct output reduction for text encoders like MT5 and add warning messages when not supported by @arnavgarg1 in #2774
CVE-2007-4559 Patch by @TrellixVulnTeam in #2770
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2776
Fix double counting of training loss by @arnavgarg1 in #2775
feat: Hyperopt schema v0, part 2: Make BaseMarshmallowConfig abstract by @ksbrar in #2779
feat: Hyperopt schema v0, part 3: Enable optional min/max support for FloatTupleMarshmallowField fields by @ksbrar in #2780
feat: Hyperopt schema v0, part 4: Add and use new hyperopt registry, search algorithm instantiation by @ksbrar in #2781
Added exponential retry for mlflow, remote dataset loading by @ShreyaR in #2738
Add synthetic test data integration test utils, and use them for loss value decrease tests. by @justinxzhao in #2789
feat: Hyperopt schema v0, part 5: Add basic search algorithm, scheduler, executor, and hyperopt schemas. by @ksbrar in #2784
Add benchmark as a pytest marker to avoid warnings. by @justinxzhao in #2786
feat: Hyperopt schema v0, part 6: Enable new hyperopt schema by @ksbrar in #2785
Add sentencepiece as a requirement, which is necessary for some hf models like mt5. by @justinxzhao in #2782
[Annotations] Ludwig data modules by @arnavgarg1 in #2793
[Annotations] Add DeveloperAPI annotations to Ludwig utils - Part 1 by @arnavgarg1 in #2794
[Annotations] Annotations for Ludwig's utils - Part 2 by @arnavgarg1 in #2797
[Annotations] Add annotations for schema module (part 1) - Model Config, Split, Trainer, Optimizers, Utils by @arnavgarg1 in #2798
[Annotations] Annotate Schema Part 2: decoders, encoders, defaults, combiners, loss and preprocessing by @arnavgarg1 in #2799
Add new data utility functions for buffers and files, and rename registry by @arnavgarg1 in #2796
[Annotations] Ludwig Schema - Part 3: Features, Hyperopt and Metadata by @arnavgarg1 in #2800
[Annotations] Add annotations for Ludwig's data utils (file readers) by @arnavgarg1 in #2795
Proceed with model training even if saving preprocessed data fails. by @justinxzhao in #2783
Improve warnings about backwards compatibility and dataset splitting. by @justinxzhao in #2788
Generate structural change warnings and log_once functionality by @arnavgarg1 in #2801
Broadcast progress tracker dict to all workers by @arnavgarg1 in #2804
Start fresh training run if files for resuming training are missing by @arnavgarg1 in #2787
LightGBMRayTrainer repartition datasets with fewer blocks than Ray actors by @jeffkinnison in #2806
Add InterQuartileTransformer normalization strategy for Number Features by @arnavgarg1 in #2805
Add negative sampling to ludwig.data by @jppgks in #2711
Rectify output features in dataset config by @abidwael in #2768
int: Add JSON markup to support unique input feature names. by @ksbrar in #2792
int: Replace StringOptions usage with ProtectedString in split schemas by @ksbrar in #2808
int: Replace StringOptions with ProtectedString for combiner schema type fields by @ksbrar in #2809
refactor: Replace StringOptions with ProtectedString for encoder/decoder schema type fields by @ksbrar in #2810
Upload Datasets to Remote Location by @connor-mccorm in #2764
[Annotations] Annotate AutoML utils by @arnavgarg1 in #2812
[Annotations] Ludwig Visualizations by @arnavgarg1 in #2813
[Annotations] Logging Level Registry by @arnavgarg1 in #2814
refactor: Replace StringOptions with ProtectedString for loss/hyperopt schema type fields by @ksbrar in #2816
Define custom Ludwig types and replace Dict[str, Any] type hints with them. by @justinxzhao in #2556
Config Object Bug Fix by @connor-mccorm in #2817
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2803
AutoML libraries that use DatasetProfile instead of DatasetInfo by @justinxzhao in #2802
Remove Sentencepiece by @connor-mccorm in #2821
fix: account for max_batch_size config param in batch size tuning on cpu by @geoffreyangus in #2693
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2823
refactor: Add filtering based on model_type for feature, combiner, and model type schemas by @ksbrar in #2815
[TorchScript] Add user-defined HF Bert tokenizers by @geoffreyangus in #2733
[Annotations] Move feature registries into accessor functions by @arnavgarg1 in #2818
[Annotations] Encoder and Decoder Registries by @arnavgarg1 in #2819
Speed Up Ray Image Tests by @geoffreyangus in #2828
fix: Restrict allowed top-level config keys by @ksbrar in #2826
Moves image decoding out of Ray Datasets to Dask Dataframe by @geoffreyangus in #2737
Improve type hints and remove dead code for DatasetLoader module by @arnavgarg1 in #2833
Update stratified split with a more specific exception for underpopulated classes by @jeffkinnison in #2831
Add Ludwig contributors to README by @arnavgarg1 in #2835
Fix key error in AutoML model select by @ShreyaR in #2824
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2836
Drop incomplete batches for Ray and Pandas to prevent Batchnorm computation errors by @arnavgarg1 in #2778
Catch and surface Runtime exceptions during preprocessing by @arnavgarg1 in #2839
fix: Mark width and height as internal_only for image encoders by @ksbrar in #2842
Select best batch size to maximize training throughput by @tgaddair in #2843
Make batch_size=auto more consistent by using median of 5 steps by @tgaddair in #2846
Make trainable=False default for all pretrained models by @tgaddair in #2844
fix: Add back missing split fields by @ksbrar in #2848
Pin scikit-learn<1.2.0 by @tgaddair in #2850
text_encoder: RoBERTa max_sequence_length by @rudolfolah in #2852
Fix TorchText version in tokenizers ahead of torch 1.13.0 upgrade by @geoffreyangus in #2838
Fix trainable=False to freeze all params for HF encoders by @tgaddair in #2855
Add support for automatic mixed precision (AMP) training by @tgaddair in #2857
Evaluate training set in the training loop by @tgaddair in #2856
Extend parameter guidance documentation for regularization, and add explicit maxes to Non-Negative floats by @justinxzhao in #2849
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2860
Fixes for the roberta encoder: explicitly set max sequence length, and fix output shape computation by @justinxzhao in #2861
Enables Set output feature on Ray by @geoffreyangus in #2791
Add go module for dataset profile protos. by @justinxzhao in #2834
fix: Upgrade expected_impact for trainable to MEDIUM on all encoders. by @ksbrar in #2865
support stratified split with low cardinality features by @abidwael in #2863
fix: load spacy model for lemmatization in EnglishLemmatizeFilterTokenizer to work by @abidwael in #2868
Token-level explanations by @jppgks in #2864
Replace learning rate: auto with feature type and encoder-based heuristics by @abidwael in #2854
Set RayBackend Config to use single worker for tests by @arnavgarg1 in #2853
Remove _to_tensors_fn from Ray Datasets by @geoffreyangus in #2866
Remove ludwig-dev Dockerfile by @arnavgarg1 in #2873
Support Ray GPU image with Torch 1.13 and CUDA 11.6 by @arnavgarg1 in #2869
Use native LightGBM for intermittent eval during training by @jeffkinnison in #2829
Set default validation metrics based on the output feature type. by @justinxzhao in #2820
Auto resize images for ViTEncoder when use_pretrained is True or False by @arnavgarg1 in #2862
TLE Backwards Compatibility Fixes by @jppgks in #2875
Do not drop batch size dimension for single inputs by @jppgks in #2878
Save GBM after training if not previously saved by @jeffkinnison in #2880
Fix TLE - Pt. 2 by @connor-mccorm in #2881
TLE fix by @connor-mccorm in #2883
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2885
Convert schema metadata to YAML by @tgaddair in #2884
Automatically infer vector_size for vector features when not provided by @tgaddair in #2888
Support MLFlowCallback logging to an existing run by @tgaddair in #2892
Fix dataset synthesizer by @connor-mccorm in #2894
Add a clear error message about invalid column names in GBM datasets by @jeffkinnison in #2879
Explicitly track all metrics related to the best evaluation in the progress tracker. by @justinxzhao in #2827
Added DistributedStrategy interface with support for DDP by @tgaddair in #2890
Adopt PyTorch official LRScheduler API by @tgaddair in #2877
Annotate Confusion Matrix with updated cmap by @arnavgarg1 in #2899
Dynamically resize confusion matrix and f1 plots by @arnavgarg1 in #2900
Update backward compatibility tests for LR progress tracker changes made in #2877. by @justinxzhao in #2904
fix: Fix vague initializer JSON schema titles. by @ksbrar in #2909
Support Distributed Training And Ray Tune with Ray 2.1 by @arnavgarg1 in #2709
Expand vision models to support pre-trained models by @jimthompson5802 in #2408
Add ECD Descriptions by @connor-mccorm in #2897
Simplify titanic example to read config in-line, and skip saving processed input. by @justinxzhao in #2912
Adds quick fix for pretrained models not loading by modifying state_dict keys on load. by @dantreiman in #2911
fix: Schema split conditions should pass in [TYPE] and not string by @hungcs in #2917
Refactor metrics and metric tables and support adding more in-training metrics. by @justinxzhao in #2901
Updated AutoML configs for latest schema and added validation tests by @tgaddair in #2921
Adds backwards compatibility for legacy image encoders by @tgaddair in #2916
Pin Torch to >=1.13.0 by @connor-mccorm in #2914
Hyperopt invalid GBM config by @jppgks in #2926
Store mlflow tracking URI to ensure consistency across processes by @tgaddair in #2927
Update automl heuristics for fine-tuning and multi-modal tasks by @tgaddair in #2922
Bump torch version for benchmark tests by @connor-mccorm in #2929
Fix signing key by @tgaddair in #2928
Adds safe_move_directory to fs_utils by @arnavgarg1 in #2931
Added separate AutoML APIs for feature inference and config generation by @tgaddair in #2932
Dynamic resizing for Confusion Matrix, Brier, F1 Plot, etc. by @arnavgarg1 in #2936
Raise RuntimeError only for category output features with vocab size 1 by @arnavgarg1 in #2923
Bump min python to 3.8 by @tgaddair in #2930
Evaluate training set in the training loop (GBM) by @jppgks in #2907
[automl] Exclude text fields with low avg words by @tgaddair in #2941
[pre-commit.ci] pre-commit suggestions by @pre-commit-ci in #2944
Fix pre-commit by removing manually specified blacken-docs dep. by @justinxzhao in #2949
Rotate Brier Plot X-axis labels to 45 degree angle by @arnavgarg1 in #2948
Retry HuggingFace pretrained model download on failure by @jeffkinnison in #2951
Disable AUROC for CATEGORY features. by @justinxzhao in #2950
Deactivate GBM random forest boosting type by @jeffkinnison in #2954
Make batch_size=auto the default by @tgaddair in #2845
Twitter bots test small improvements by @dantreiman in #2955
Disable bagging when using GOSS GBM boosting type by @jeffkinnison in #2956
Add missing standardize_image key to metadata by @jppgks in #2946
Integrated Gradients: reset sample_ratio to 1.0 if set by @jppgks in #2945
Increase CI pytest time out to 75 minutes by @jimthompson5802 in #2958
Add sacremoses as a dependency for transformer_xl encoder by @arnavgarg1 in #2961
Move all config validation to its own standalone module, config_validation. by @justinxzhao in #2959
Fixes longformer encoder by passing in pretrained_kwargs correctly by @arnavgarg1 in #2963
Expected Impact Calibration by @connor-mccorm in #2960
Update Camembert by @geoffreyangus in #2966
fix: Fix epochs suggested range by @ksbrar in #2965
fix: enable binary dense encoder by @abidwael in #2957
GBM DART boosting type incompatible with early stopping by @jeffkinnison in #2964
Improving metadata config descriptions by @w4nderlust in #2933
Fix ludwig-gpu image by @tgaddair in #2974
Skip test_ray_outputs by @arnavgarg1 in #2935
Enable custom HF BERT models with default tokenizer config by @geoffreyangus in #2973
Update CamemBERT in schema by @geoffreyangus in #2975
Set reduce_output to sum for XLM encoder by @arnavgarg1 in #2972
Skipped mercedes_benz_greener.ecd.yaml benchmark test by @tgaddair in #2980
Add sentencepiece as a requirement for MT5 text encoder by @arnavgarg1 in #2967
Disable CTRL Encoder by @connor-mccorm in #2976
MT5 reduce_output can't be cls_pooled - set to sum by default by @arnavgarg1 in #2981
Populate hyperopt defaults using schema by @arnavgarg1 in #2968
Revert "Add sentencepiece as a requirement for MT5 text encoder (#2967)" and disable MT5 Encoder by @arnavgarg1 in #2982
Change default reduce_output strategy to sum for CamemBERT by @arnavgarg1 in #2984
Set max_failures for Tuner to 0 by @geoffreyangus in #2987
Fix TLE OOM for BERT-like models by @jppgks in #2990
Reorder Advanced Parameters by @connor-mccorm in #2979
[Hyperopt] Modify _get_best_model_path to grab it from the Checkpoint object with ExperimentAnalysis by @arnavgarg1 in #2985
GBM: disable goss boosting type by @jppgks in #2986
Adds HuggingFace pretrained encoder unit tests by @geoffreyangus in #2962
[Hyperopt] Set default num_samples based on parameter space by @arnavgarg1 in #2997
LR Scheduler Adjustments by @connor-mccorm in #2996
fix: Force populate combiner registry inside of get_schema function. by @ksbrar in #2970
fix: Fix validation and serialization for Boolean and OneOfOptionsField fields by @ksbrar in #2992
Ray 2.2 compatibility by @arnavgarg1 in #2910
Compute fixed text embeddings (e.g., BERT) during preprocessing by @tgaddair in #2867
Use iloc to fetch first audio value. by @justinxzhao in #3006
Fix Internal Only Param by @connor-mccorm in #3008
Ludwig Dataclass by @connor-mccorm in #3005
Cap batch_size=auto at 128 for CPU training by @tgaddair in #3007
Added ghost batch norm option for concat combiner by @tgaddair in #3001
Refactored norm layer and added additional norm at the start of the FCStack by @tgaddair in #3011
Fix assignment that undoes tensor move to CPU by @jeffkinnison in #3012
[Explain] Detach inputs before numpy processing by @jppgks in #3014
Handle CUDA OOMs in explanations with retry and batch size halving by @tgaddair in #3015
fix: Remove ecd_ray_legacy model type alias. by @ksbrar in #3013
Explain fixes by @jppgks in #3016
Remove null GBM trainer config options by @jeffkinnison in #2989
Disable reuse_actors in hyperopt by @arnavgarg1 in #3017
Skip Sarcos dataset during benchmark tests by @arnavgarg1 in #3020
Explain: improve docstring about IntegratedGradient baseline for number features by @jppgks in #3018
Upgrade isort to fix pre-commit. by @justinxzhao in #3027
Limit batch size tuning to ≤20% of dataset size by @geoffreyangus in #3003
[schema] Mark skip internal only by @jppgks in #3022
Add specificity metric for binary features by @jppgks in #3025
Added FSDP distributed strategy by @tgaddair in #3026
Move on_batch_end callback to omit eval from batch duration during benchmarking by @geoffreyangus in #2898
Set 0.7.beta by @justinxzhao in #3028
Added missing file for fsdp by @tgaddair in #3033
Cleaning up seed / random_seed usage discrepancy by @w4nderlust in #3021
Filter Competitions by @connor-mccorm in #3032
Hyperopt Quick Fix by @connor-mccorm in #3034
Expected Impact and Ordering for GBM Params by @connor-mccorm in #3038
Transformer Encoder - Representation Parameter Fix by @connor-mccorm in #2999
Enables a new GitHub Action for slow tests by @geoffreyangus in #3029
Skip BOHB test when using hyperopt with ray + horovod by @arnavgarg1 in #3036
Fix gradient clipping typo by @geoffreyangus in #3039
Fix checkpoint loading for HuggingFace encoders by @geoffreyangus in #3010
Schema Polishing by @connor-mccorm in #3041
Address some warnings when running hyperopt tests by @arnavgarg1 in #3040
Removed log spam from distributed loader by @tgaddair in #3042
[Hyperopt] Fix get_best_model_path by @arnavgarg1 in #3043
Bump Ludwig images to Ray 2.2.0 by @geoffreyangus in #3023
Revert "Bump Ludwig images to Ray 2.2.0 (#3023)" by @geoffreyangus in #3044
add httpx as required by starlette>=0.21.0 by @jppgks in #3047
Raise exceptions from async batch producer thread on the main thread by @tgaddair in #3050
Set zscore normalization as the default normalization strategy for number features by @arnavgarg1 in #3051
Fix TLE: safe divide by zero + normalize at sequence level by @jppgks in #3046
Fix fill_with_mode when using Dask by @arnavgarg1 in #3054
Refactored ModelConfig object into a Marshmallow schema by @tgaddair in #2906
Fix LR reduce on plateau interaction with base LR decay by @tgaddair in #3056
Update schema to correctly reflect supported missing value strategies for different feature types by @arnavgarg1 in #3053
Improve observability when using cached datasets during preprocessing by @arnavgarg1 in #3058
Quick fix for cached logging by @arnavgarg1 in #3060
Remove passthrough encoder from sequence and text features encoder registry by @jeffkinnison in #3061
Remove RNN invalid cell types from the schema by @jeffkinnison in #3062
Round confusion matrix numbers to 3 decimal places by @arnavgarg1 in #3065
Fixed handling of {} hyperopt config section by @tgaddair in #3064
Deflake test_tune_batch_size_lr_cpu by @justinxzhao in #3067
Feature: Data Augmentation for Image Input Features by @jimthompson5802 in #2925
Update transformer hidden_size / num_heads error message by @jeffkinnison in #3066
Add MPS device support by @tgaddair in #3072
Require env var LUDWIG_USE_MPS to enable MPS by @tgaddair in #3074
Generate proc_column only after all preprocessing parameters are merged in to prevent incorrect cached dataset reads by @arnavgarg1 in #3069
Remove duplicate validation field validation. by @justinxzhao in #3070
Remove previous ModelConfig implementation and refactor to use __post_init__ by @tgaddair in #3083
Set RunConfig verbosity to 0 by @tgaddair in #3085
Fix default image on image read failure by @geoffreyangus in #3073
Add Precision Recall curves to Ludwig by @arnavgarg1 in #3084
Log number of rows dropped by DROP_ROWS strategy by @geoffreyangus in #3087
Allow providing a Ludwig dataset as a URI of the form ludwig://<dataset> by @tgaddair in #3082
Fixed augmentation schema check by @tgaddair in #3090
XLNet: disable "uni" attention type by @jppgks in #3097
Only show drop row logging if rows are dropped by @arnavgarg1 in #3094
Upgrade torchmetrics to 0.11.1. Add ROC metrics for category features. Add sequence accuracy, char error rate, and perplexity metrics for text features. by @justinxzhao in #3035
Disallow certain config parameters from accepting null as a value by @abidwael in #3079
Deflake the lbfgs optimizer test by @justinxzhao in #3100
Use window_size_bytes: auto to specify automatic windowing by @jeffkinnison in #3076
Use proc col hash for checksum computation by @arnavgarg1 in #3095
Fixed ethos_binary dataset to threshold the label at 0.5 by @tgaddair in #3102
Add -1 as a valid negative class for binary type inference by @arnavgarg1 in #3101
GBM: remove distributed=False from RayDMatrix by @jppgks in #3099
Switch combiner num_fc_layers to expected impact 3 by @connor-mccorm in #3103
Add a registry of additional config checks to check inter-parameter incompatibilities. by @justinxzhao in #3024
Adds config parameters to replace outliers via a missing_value_strategy by @tgaddair in #3080
Fixed serialization and deserialization of augmentation configuration by @jimthompson5802 in #3096
Unregister CTRL and MT5 encoders since they have tensor placement and sentencepiece segfault issues by @arnavgarg1 in #3106
Pin torch nightly to Feb 13, 2023 by @arnavgarg1 in #3110
Resize confusion matrix properly by @arnavgarg1 in #3109
Fold all validation into ModelConfig. by @justinxzhao in #3104
fix: add pytest to hashfiles to be more selective about caching by @abidwael in #3113
Disable XLM Text Encoder because of host memory pressure issues by @arnavgarg1 in #3108
Bump ludwig docker image ray220 by @geoffreyangus in #3111
Remove XLM encoder from slow encoders test by @geoffreyangus in #3114
Added additional dropout to concat by @tgaddair in #3116
Update field descriptions for ludwig-docs by @tgaddair in #3123
Transformer divisibility error validation by @jeffkinnison in #3105
fix: Updated Learning Rate decay_rate to use corresponding Metadata. by @martindavis in #3128
refactor: Use TypeSelection to power optimizer field by @ksbrar in #3071
refactor: Add separate ECD and GBM defaults schemas by @ksbrar in #3124
Skip batch norm when chunk size is 1 in GhostBatchNorm by @arnavgarg1 in #3119
Fix tests to be compliant with latest version of whylogs by @justinxzhao in #3131
Make positive class weight a float. by @justinxzhao in #3133
feat: Raise deprecation warnings for unknown parameters by @ksbrar in #3118
Align sequence encoder descriptions with ludwig-docs by @tgaddair in #3135
[Explain] always return global and row-level explanations by @jppgks in #3132
Update benchmark tests by @abidwael in #3115
[Hyperopt] Load checkpoints directly from the object store in Ray 2.2 by @arnavgarg1 in #3037
Decouple loss schema from implementation by @tgaddair in #3141
Adds non-slow HF unit test to validate constant value by @geoffreyangus in #3130
Updated decoder and loss schemas, removed dep from schema -> loss_modules by @tgaddair in #3140
Init minimal config by @tgaddair in #3143
Updated HF long descriptions by @tgaddair in #3144
Disallow GPU tensors in GBM training and eval by @jeffkinnison in #3139
Not default to adding None as accepted tuple for FloatRangeTupleDataclassField by @abidwael in #3146
Ray 2.3 Compatibility by @arnavgarg1 in #3009
Unpin pyarrow by @arnavgarg1 in #3149
Update Ludwig version to 0.7 by @arnavgarg1 in #3148

New Contributors

@Marvjowa made their first contribution in #2236
@Dennis-Rall made their first contribution in #2192
@abidwael made their first contribution in #2263
@noahlh made their first contribution in #2284
@jeffkinnison made their first contribution in #2316
@andife made their first contribution in #2358
@alberttorosyan made their first contribution in #2413
@herrmann made their first contribution in #2746
@drishi made their first contribution in #2725
@TrellixVulnTeam made their first contribution in #2770
@rudolfolah made their first contribution in #2852
@martindavis made their first contribution in #3128

Full Changelog: v0.6.4...v0.7
