From 26d707b119d31110359ab4ed62604d23979e8e39 Mon Sep 17 00:00:00 2001
From: Justin Zhao
Date: Tue, 20 Sep 2022 18:21:50 -0400
Subject: [PATCH] Update README to be consistent with ludwig.ai home page.

---
 README.md | 225 ++++++++++++++++++++++++++++++++----------------------
 1 file changed, 132 insertions(+), 93 deletions(-)

diff --git a/README.md b/README.md
index 8f4f1ffe937..6f37d02bcd0 100644
--- a/README.md
+++ b/README.md
@@ -14,59 +14,75 @@
-Translated in [🇰🇷Korean](README_KR.md)
+[Full official documentation](https://ludwig.ai)

 # What is Ludwig?

-Ludwig is an open-source, [declarative machine learning framework](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/what_is_ludwig/#why-declarative-machine-learning-systems)
-that makes it easy to define deep learning pipelines with a simple and flexible data-driven configuration system.
-Ludwig is suitable for a wide variety of AI tasks, and is hosted by the [Linux Foundation AI & Data](https://lfaidata.foundation/).
+Ludwig is a [declarative machine learning framework](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/what_is_ludwig/#why-declarative-machine-learning-systems)
+that makes it easy to define machine learning pipelines using a simple and
+flexible data-driven configuration system. Ludwig is suitable for a wide variety
+of AI tasks, and is hosted by the
+[Linux Foundation AI & Data](https://lfaidata.foundation/).

-Ludwig allows users to define their deep learning pipeline by simply providing a configuration file, which lists the
-inputs and outputs, and their respective data types. Ludwig will then assemble and train a deep learning model and based
-on the configuration file, determine how inputs and outputs are preprocessed, encoded, decoded and which metrics and
-loss criterion to use.
+The configuration declares the input and output features, with their respective
+data types.
Users can also specify additional parameters to preprocess, encode,
+and decode features, load from pre-trained models, compose the internal model
+architecture, set training parameters, or run hyperparameter optimization.

 ![img](https://raw.githubusercontent.com/ludwig-ai/ludwig-docs/master/docs/images/ludwig_legos_unanimated.gif)

-Writing a configuration file for Ludwig is easy. The configuration file flexibility allows for full control of every
-aspect of the end-to-end pipeline. This includes exploring state-of-the-art model architectures, running a
-hyperparameter search, scaling up to larger than available memory datasets and multi-node clusters, and finally serving
-the best model in production. All of this is achieved through simple configuration file changes.
+Ludwig will build an end-to-end machine learning pipeline automatically, using
+whatever is explicitly specified in the configuration, while falling back to
+smart defaults for any parameters that are not.

-Finally, the use of abstract interfaces throughout the codebase makes it easy for users to extend Ludwig by adding new
-models, metrics, losses, preprocessing functions and register them to make them available immediately in the
-configuration system.
+# Declarative Machine Learning
+
+Ludwig’s declarative approach to machine learning empowers you to have full
+control of the components of the machine learning pipeline that you care about,
+while leaving it up to Ludwig to make reasonable decisions for the rest.
+
+![img](images/why_declarative.png)
+
+Analysts, scientists, engineers, and researchers use Ludwig to explore
+state-of-the-art model architectures, run hyperparameter search, scale up to
+larger than available memory datasets and multi-node clusters, and finally
+serve the best model in production.
+
+Finally, the use of abstract interfaces throughout the codebase makes it easy
+for users to extend Ludwig by adding new models, metrics, losses, and
+preprocessing functions that can be registered to make them immediately useable
+in the same unified configuration system.

 # Main Features

-- [Data-Driven configuration system](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/how_ludwig_works)
+- **[Data-Driven configuration system](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/how_ludwig_works)**

- A config YAML file that describes the schema of your data (input features, output features, and their types) is all
- you need to start training deep learning models. Ludwig uses declared features to compose a deep learning model
- accordingly.
+ A config YAML file that describes the schema of your data (input features,
+ output features, and their types) is all you need to start training deep
+ learning models. Ludwig uses declared features to compose a deep learning
+ model accordingly.

 ```yaml
 input_features:
- - name: data_column_1
- type: number
- - name: data_column_2
- type: category
- - name: data_column_3
- type: text
- - name: data_column_4
- type: image
- ...
+ - name: data_column_1
+ type: number
+ - name: data_column_2
+ type: category
+ - name: data_column_3
+ type: text
+ - name: data_column_4
+ type: image
+ ...

 output_features:
- - name: data_column_5
- type: number
- - name: data_column_6
- type: category
- ...
+ - name: data_column_5
+ type: number
+ - name: data_column_6
+ type: category
+ ...
 ```

-- [Training, prediction, and evaluation from the command line](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/command_line_interface)
+- **[Training, prediction, and evaluation from the command line](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/command_line_interface)**

 Simple commands can be used to train models and predict new data.

@@ -76,9 +92,10 @@ configuration system.
 ludwig eval --model_path results/experiment_run/model --dataset test.csv
 ```

-- [Programmatic API](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/api/LudwigModel)
+- **[Programmatic API](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/api/LudwigModel)**

- Ludwig also provides a simple programmatic API for all of the functionality described above and more.
+ Ludwig also provides a simple programmatic API for all of the functionality
+ described above and more.

 ```python
 from ludwig.api import LudwigModel
@@ -99,12 +116,13 @@ configuration system.
 predictions = model.predict(data)
 ```

-- [Distributed training](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/distributed_training)
+- **[Distributed training](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/distributed_training)**

- Train models in a distributed setting using [Horovod](https://github.com/horovod/horovod), which allows training on a
- single machine with multiple GPUs or multiple machines with multiple GPUs.
+ Train models in a distributed setting using [Horovod](https://github.com/horovod/horovod),
+ which allows training on a single machine with multiple GPUs or multiple
+ machines with multiple GPUs.

-- [Serving](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/serving)
+- **[Serving](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/serving)**

 Serve models using FastAPI.

@@ -113,7 +131,7 @@ configuration system.
 curl http://0.0.0.0:8000/predict -X POST -F "movie_title=Friends With Money" -F "content_rating=R" -F "genres=Art House & International, Comedy, Drama" -F "runtime=88.0" -F "top_critic=TRUE" -F "review_content=The cast is terrific, the movie isn't."
 ```

-- [Hyperparameter optimization](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/hyperopt)
+- **[Hyperparameter optimization](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/hyperopt)**

 Run hyperparameter optimization locally or using [Ray Tune](https://docs.ray.io/en/latest/tune/index.html).

@@ -121,22 +139,26 @@ configuration system.
 ludwig hyperopt --config config.yaml --dataset data.csv
 ```

-- [AutoML](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/automl)
+- **[AutoML](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/automl)**

- Ludwig AutoML takes a dataset, the target column, and a time budget, and returns a trained Ludwig model.
+ Ludwig AutoML takes a dataset, the target column, and a time budget, and
+ returns a trained Ludwig model.

-- [Third-Party integrations](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/integrations)
+- **[Third-Party integrations](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/integrations)**

- Ludwig provides an extendable interface to integrate with third-party systems for tracking experiments. Third-party
- integrations exist for Comet ML, Weights & Biases, WhyLabs and MLFlow.
+ Ludwig provides an extendable interface to integrate with third-party
+ systems for tracking experiments. Third-party integrations exist for Comet
+ ML, Weights & Biases, WhyLabs, and MLFlow.

-- [Extensibility](https://ludwig-ai.github.io/ludwig-docs/latest/developer_guide)
+- **[Extensibility](https://ludwig-ai.github.io/ludwig-docs/latest/developer_guide)**

- Ludwig is built from the ground up with extensibility in mind. It is easy to add new data types by implementing clear,
- well-documented abstract classes that define functions to preprocess, encode, and decode data.
+ Ludwig is built from the ground up with extensibility in mind. It is easy to
+ add new data types by implementing clear, well-documented abstract classes
+ that define functions to preprocess, encode, and decode data.
- Furthermore, new `torch nn.Module` models can be easily added by them to a registry. This encourages reuse and sharing
- new models with the community. Refer to the [Developer Guide](https://ludwig-ai.github.io/ludwig-docs/latest/developer_guide)
+ Furthermore, new `torch.nn.Module` models can be easily added by registering them in a
+ registry. This encourages reuse and sharing new models with the community.
+ Refer to the [Developer Guide](https://ludwig-ai.github.io/ludwig-docs/latest/developer_guide)
 for further details.

 # Quick Start
@@ -256,76 +278,93 @@ ludwig visualize --visualization compare_performance --test_statistics path/to/t
 For the full set of visualization see the [Visualization Guide](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/visualizations).

-## Step 6: Happy modeling!
+## Step 6: Happy modeling

 Try applying Ludwig to your data. [Reach out](https://join.slack.com/t/ludwig-ai/shared_invite/zt-mrxo87w6-DlX5~73T2B4v_g6jj0pJcQ) if you have any questions.

 # Advantages

-Ludwig is a profound utility for research scientists, data scientists, and machine learning engineers.
-
-## Minimal machine learning boilerplate
-
-Ludwig takes care of the engineering complexity of deep learning out of the box, enabling research scientists to focus on building models at the highest level of abstraction.
+- **Minimal machine learning boilerplate**

-Data preprocessing, hyperparameter optimization, device management, and distributed training for newly registered `torch.nn.Module` models come completely free.
+ Ludwig takes care of the engineering complexity of machine learning out of
+ the box, enabling research scientists to focus on building models at the
+ highest level of abstraction. Data preprocessing, hyperparameter
+ optimization, device management, and distributed training for
+ `torch.nn.Module` models come completely free.
-## Easily build your benchmarks
+- **Easily build your benchmarks**

-Creating a state-of-the-art baseline and comparing it with a new model is a simple config change.
+ Creating a state-of-the-art baseline and comparing it with a new model is a
+ simple config change.

-## Easily apply new architectures to multiple problems and datasets
+- **Easily apply new architectures to multiple problems and datasets**

-Apply new models across the extensive set of tasks and datasets that Ludwig supports. Ludwig includes a [full benchmarking toolkit](https://arxiv.org/abs/2111.04260) accessible to any user, for running experiments with multiple models across multiple datasets with just a simple configuration.
+ Apply new models across the extensive set of tasks and datasets that Ludwig
+ supports. Ludwig includes a
+ [full benchmarking toolkit](https://arxiv.org/abs/2111.04260) accessible to
+ any user, for running experiments with multiple models across multiple
+ datasets with just a simple configuration.

-## Highly configurable data preprocessing, modeling, and metrics
+- **Highly configurable data preprocessing, modeling, and metrics**

-Any and all aspects of the model architecture, training loop, hyperparameter search, and backend infrastructure can be modified as additional fields in the declarative configuration to customize the pipeline to meet your requirements.
+ Any and all aspects of the model architecture, training loop, hyperparameter
+ search, and backend infrastructure can be modified as additional fields in
+ the declarative configuration to customize the pipeline to meet your
+ requirements. For details on what can be configured, check out
+ [Ludwig Configuration](https://ludwig-ai.github.io/ludwig-docs/latest/configuration/)
+ docs.

-For details on what can be configured, check out [Ludwig Configuration](https://ludwig-ai.github.io/ludwig-docs/latest/configuration/) docs.
+- **Multi-modal, multi-task learning out-of-the-box**

-## Multi-modal, multi-task learning out-of-the-box
+ Mix and match tabular data, text, images, and even audio into complex model
+ configurations without writing code.

-Mix and match tabular data, text, images, and even audio into complex model configurations without writing code.
+- **Rich model exporting and tracking**

-## Rich model exporting and tracking
+ Automatically track all trials and metrics with tools like Tensorboard,
+ Comet ML, Weights & Biases, MLFlow, and Aim Stack.

-Automatically track all trials and metrics with tools like Tensorboard, Comet ML, Weights & Biases, and MLflow.
+- **Automatically scale training to multi-GPU, multi-node clusters**

-## Automatically scale training to multi-GPU, multi-node clusters
+ Go from training on your local machine to the cloud without code changes.

-Go from training on your local machine to the cloud without code changes.
+- **Low-code interface for state-of-the-art models, including pre-trained Huggingface Transformers**

-## Low-code interface for state-of-the-art models, including pre-trained Huggingface Transformers
+ Ludwig also natively integrates with pre-trained models, such as the ones
+ available in [Huggingface Transformers](https://huggingface.co/docs/transformers/index).
+ Users can choose from a vast collection of state-of-the-art pre-trained
+ PyTorch models to use without needing to write any code at all. For example,
+ training a BERT-based sentiment analysis model with Ludwig is as simple as:

-Ludwig also natively integrates with pre-trained models, such as the ones available in [Huggingface Transformers](https://huggingface.co/docs/transformers/index). Users can choose from a vast collection of state-of-the-art pre-trained PyTorch models to use without needing to write any code at all.
For example, training a BERT-based sentiment analysis model with Ludwig is as simple as:
-
-```shell
-ludwig train --dataset sst5 --config_str “{input_features: [{name: sentence, type: text, encoder: bert}], output_features: [{name: label, type: category}]}”
-```
+ ```shell
+ ludwig train --dataset sst5 --config_str "{input_features: [{name: sentence, type: text, encoder: bert}], output_features: [{name: label, type: category}]}"
+ ```

-## Low-code interface for AutoML
+- **Low-code interface for AutoML**

-[Ludwig AutoML](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/automl/) allows users to obtain trained models by providing just a dataset, the target column, and a time budget.
+ [Ludwig AutoML](https://ludwig-ai.github.io/ludwig-docs/latest/user_guide/automl/)
+ allows users to obtain trained models by providing just a dataset, the
+ target column, and a time budget.

-```python
-auto_train_results = ludwig.automl.auto_train(dataset=my_dataset_df, target=target_column_name, time_limit_s=7200)
-```
+ ```python
+ auto_train_results = ludwig.automl.auto_train(dataset=my_dataset_df, target=target_column_name, time_limit_s=7200)
+ ```

-## Easy productionisation
+- **Easy productionisation**

-Ludwig makes it easy to serve deep learning models, including on GPUs. Launch a REST API for your trained Ludwig model.
+ Ludwig makes it easy to serve deep learning models, including on GPUs.
+ Launch a REST API for your trained Ludwig model.

-```shell
-ludwig serve --model_path=/path/to/model
-```
+ ```shell
+ ludwig serve --model_path=/path/to/model
+ ```

-Ludwig supports exporting models to efficient Torschscript bundles.
+ Ludwig supports exporting models to efficient TorchScript bundles.
-```shell
-ludwig export_torchscript -–model_path=/path/to/model
-```
+ ```shell
+ ludwig export_torchscript --model_path=/path/to/model
+ ```

 # Tutorials

@@ -356,7 +395,7 @@ ludwig export_torchscript -–model_path=/path/to/model

 # More Information

-[Full official documentation](https://ludwig-ai.github.io/ludwig-docs/).
+[Full official documentation](https://ludwig.ai).

 Read our publications on [Ludwig](https://arxiv.org/pdf/1909.07930.pdf), [declarative ML](https://arxiv.org/pdf/2107.08148.pdf), and [Ludwig’s SoTA benchmarks](https://openreview.net/pdf?id=hwjnu6qW7E4).

@@ -370,4 +409,4 @@ know, please consider [joining the Ludwig Slack](https://join.slack.com/t/ludwig
 - [Slack](https://join.slack.com/t/ludwig-ai/shared_invite/zt-mrxo87w6-DlX5~73T2B4v_g6jj0pJcQ)
 - [Twitter](https://twitter.com/ludwig_ai)
 - [Medium](https://medium.com/ludwig-ai)
-- [GitHub Issues](https://github.com/ludwig-ai/ludwig/issues)
+- [GitHub Issues](https://github.com/ludwig-ai/ludwig/issues)
\ No newline at end of file