Merge pull request #18 from basf/develop

publish version 0.1.3
basf · Jun 3, 2024 · 0912d15
2 parents ce4793d + 36a8bce
commit 0912d15
Showing 45 changed files with 2,946 additions and 1,337 deletions.
diff --git a/.github/workflows/ISSUE_TEMPLATE/bug_report.md b/.github/workflows/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,28 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: "[BUG]"
+labels: bug
+assignees: ''
+
+---
+
+**Describe the bug**
+A clear and concise description of what the bug is.
+
+**To Reproduce**
+Steps to reproduce the behavior:
+
+**Expected behavior**
+A clear and concise description of what you expected to happen.
+
+**Screenshots**
+If applicable, add screenshots to help explain your problem.
+
+**Desktop (please complete the following information):**
+ - OS: [e.g. Ubuntu]
+ - Python version [e.g. 3.8]
+ - Mambular Version [e.g. 0.1.2]
+
+**Additional context**
+Add any other context about the problem here.
diff --git a/.github/workflows/ISSUE_TEMPLATE/config.yml b/.github/workflows/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
+blank_issues_enabled: false
diff --git a/.github/workflows/ISSUE_TEMPLATE/doc_request.md b/.github/workflows/ISSUE_TEMPLATE/doc_request.md
@@ -0,0 +1,11 @@
+---
+name: Doc request
+about: Create a documentation request to help us improve
+title: "[DOC]"
+labels: docs
+assignees: ''
+
+---
+
+**Description of the question**
+A clear and concise description of what should be documented.
diff --git a/.github/workflows/ISSUE_TEMPLATE/feature_request.md b/.github/workflows/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: "[FEATURE]"
+labels: enhancement
+assignees: ''
+
+---
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. eg. I would like to include preprocessing for [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context or screenshots about the feature request here.
diff --git a/.github/workflows/ISSUE_TEMPLATE/question.md b/.github/workflows/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,17 @@
+---
+name: Question
+about: Ask a question about how to use the software
+title: '[FAQ]'
+labels: question
+assignees: ''
+
+---
+
+**Context**
+Gives some context if needed (environment, system, hardware).
+
+**Describe the task you are trying to achieve.**
+A clear and concise description of what the task is.
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
diff --git a/.github/workflows/publish.yml b/.github/workflows/publish.yml
@@ -0,0 +1,35 @@
+name: Publish Package
+
+on:
+  push:
+    branches:
+      - master
+
+jobs:
+  publish:
+    runs-on: ubuntu-latest
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v2
+
+      - name: Set up Python
+        uses: actions/setup-python@v2
+        with:
+          python-version: "3.8"
+
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install setuptools wheel twine
+
+      - name: Build package
+        run: |
+          python setup.py sdist bdist_wheel
+
+      - name: Publish package to PyPI
+        env:
+          TWINE_USERNAME: __token__
+          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
+        run: |
+          twine upload dist/*
diff --git a/LICENSE b/LICENSE
@@ -0,0 +1,21 @@
+MIT License
+
+Copyright (c) 2024 BASF
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
diff --git a/README.md b/README.md
@@ -1,10 +1,23 @@
-# Mambular: Tabular Deep Learning with Mamba Architectures
+<div align="center">
+  <img src="./docs/images/logo/mamba_tabular.jpg" width="400"/>
+
+
+[![PyPI](https://img.shields.io/pypi/v/mambular)](https://pypi.org/project/mambular)
+![PyPI - Downloads](https://img.shields.io/pypi/dw/mambular)
+[![docs build](https://readthedocs.org/projects/mambular/badge/?version=latest)](https://mambular.readthedocs.io/en/latest/?badge=latest)
+[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mambular.readthedocs.io/en/latest/)
+[![open issues](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/basf/mamba-tabular/issues)
 
-Mambular is a Python package that brings the power of Mamba architectures to tabular data, offering a suite of deep learning models for regression, classification, and distributional regression tasks. Designed with ease of use in mind, Mambular models adhere to scikit-learn's `BaseEstimator` interface, making them highly compatible with the familiar scikit-learn ecosystem. This means you can fit, predict, and transform using Mambular models just as you would with any traditional scikit-learn model, but with the added performance and flexibility of deep learning.
 
-<img src="https://github.com/basf/mamba-tabular/blob/master/docs/images/logo/mamba_tabular.jpg" alt="Mamba Tabular" width="400"/>
+[📘Documentation](https://mambular.readthedocs.io/en/latest/index.html) |
+[🛠️Installation](https://mambular.readthedocs.io/en/latest/installation.html) |
+[Models](https://mambular.readthedocs.io/en/latest/api/models/index.html) |
+[🤔Report Issues](https://github.com/basf/mamba-tabular/issues)
+</div>
+
+# Mambular: Tabular Deep Learning with Mamba Architectures
 
-<!-- ![Logo](./docs/images/logo/mamba_tabular.jpg) -->
+Mambular is a Python package that brings the power of Mamba architectures to tabular data, offering a suite of deep learning models for regression, classification, and distributional regression tasks. Designed with ease of use in mind, Mambular models adhere to scikit-learn's `BaseEstimator` interface, making them highly compatible with the familiar scikit-learn ecosystem. This means you can fit, predict, and evaluate using Mambular models just as you would with any traditional scikit-learn model, but with the added performance and flexibility of deep learning.
 
 ## Features
 
@@ -28,29 +41,25 @@ pip install mambular
 
 ## Preprocessing
 
-Mambular elevates the preprocessing stage of model development, employing a sophisticated suite of techniques to ensure your data is in the best shape for the Mamba architectures. Our preprocessing module is designed to be both powerful and intuitive, offering a range of options to transform your tabular data efficiently.
+Mambular simplifies the preprocessing stage of model development with a comprehensive set of techniques to prepare your data for Mamba architectures. Our preprocessing module is designed to be both powerful and easy to use, offering a variety of options to efficiently transform your tabular data.
 
 ### Data Type Detection and Transformation
 
-Mambular automatically identifies the type of each feature in your dataset, applying the most suitable transformations to numerical and categorical variables. This includes:
-
+Mambular automatically identifies the type of each feature in your dataset and applies the most appropriate transformations for numerical and categorical variables. This includes:
 - **Ordinal Encoding**: Categorical features are seamlessly transformed into numerical values, preserving their inherent order and making them model-ready.
 - **One-Hot Encoding**: For nominal data, Mambular employs one-hot encoding to capture the presence or absence of categories without imposing ordinality.
 - **Binning**: Numerical features can be discretized into bins, a useful technique for handling continuous variables in certain modeling contexts.
 - **Decision Tree Binning**: Optionally, Mambular can use decision trees to find the optimal binning strategy for numerical features, enhancing model interpretability and performance.
 - **Normalization**: Mambular can easily handle numerical features without specifically turning them into categorical features. Standard preprocessing steps such as normalization per feature are possible
 - **Standardization**: Similarly, Standardization instead of Normalization can be used.
+- **PLE**: Periodic Linear Encodings for numerical features can enhance performance for tabular DL methods.
 
 
 ### Handling Missing Values
 
-Our preprocessing pipeline gracefully handles missing data, employing strategies like mean imputation for numerical features and mode imputation for categorical ones, ensuring that your models receive complete data inputs without manual intervention.
-
-### Flexible and Customizable
-
-While Mambular excels in automating the preprocessing workflow, it also offers flexibility. You can customize the preprocessing steps to fit the unique needs of your dataset, ensuring that you're not locked into a one-size-fits-all approach.
+Our preprocessing pipeline effectively handles missing data by using mean imputation for numerical features and mode imputation for categorical features. This ensures that your models receive complete data inputs without needing manual intervention.
+Additionally, Mambular can manage unknown categorical values during inference by incorporating classical <UNK> tokens in categorical preprocessing.
 
-By integrating Mambular's preprocessing module into your workflow, you're not just preparing your data for deep learning; you're optimizing it for excellence. This commitment to data quality is what sets Mambular apart, making it an indispensable tool in your machine learning arsenal.
 
 ## Fit a Model
 Fitting a model in mambular is as simple as it gets. All models in mambular are sklearn BaseEstimators. Thus the `.fit` method is implemented for all of them. Additionally, this allows for using all other sklearn inherent methods such as their built in hyperparameter optimization tools.
@@ -59,14 +68,14 @@ Fitting a model in mambular is as simple as it gets. All models in mambular are
 from mambular.models import MambularClassifier
 # Initialize and fit your model
 model = MambularClassifier(
-    dropout=0.01,
-    d_model=128,
-    n_layers=6,
-    numerical_preprocessing="normalization",
+    d_model=64,
+    n_layers=8,
+    numerical_preprocessing="ple",
+    n_bins=50
 )
 
 # X can be a dataframe or something that can be easily transformed into a pd.DataFrame as a np.array
-model.fit(X, y, max_epochs=500, lr=1e-03, patience=25)
+model.fit(X, y, max_epochs=150, lr=1e-04)
 ```
 
 Predictions are also easily obtained:
@@ -108,12 +117,6 @@ Mambular introduces a cutting-edge approach to distributional regression through
 These distribution classes allow `MambularLSS` to flexibly model a wide variety of data types and distributions, providing users with the tools needed to capture the full complexity of their data.
 
 
-### Use Cases for MambularLSS:
-
-- **Risk Assessment**: In finance or insurance, understanding the range and likelihood of potential losses is as important as predicting average outcomes.
-- **Demand Forecasting**: For inventory management, capturing the variability in product demand helps in optimizing stock levels.
-- **Personalized Medicine**: In healthcare, distributional regression can predict a range of possible patient responses to a treatment, aiding in personalized therapy planning.
-
 ### Getting Started with MambularLSS:
 
 To integrate distributional regression into your workflow with `MambularLSS`, start by initializing the model with your desired configuration, similar to other Mambular models:
@@ -124,17 +127,17 @@ from mambular.models import MambularLSS
 # Initialize the MambularLSS model
 model = MambularLSS(
     dropout=0.2,
-    d_model=256,
-    n_layers=4,
+    d_model=64,
+    n_layers=8,
 
 )
 
 # Fit the model to your data
 model.fit(
     X, 
     y, 
-    max_epochs=300, 
-    lr=1e-03, 
+    max_epochs=150, 
+    lr=1e-04, 
     patience=10,     
     family="normal" # define your distribution
     )
@@ -147,7 +150,7 @@ If you find this project useful in your research, please consider cite:
 ```BibTeX
 @misc{2024,
     title={Mambular: Tabular Deep Learning with Mamba Architectures},
-    author={Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken'},
+    author={Anton Frederik Thielmann, Manish Kumar, Christoph Weisser, Benjamin Saefken, Soheila Samiee},
     howpublished = {\url{https://github.com/basf/mamba-tabular}},
     year={2024}
 }

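Note on the `numerical_preprocessing="ple"` and `n_bins=50` options introduced in the README hunks above: the README expands PLE as "Periodic Linear Encodings", but in the tabular deep learning literature the abbreviation usually denotes a piecewise linear encoding, where each numerical value is described by how far it has progressed through a sequence of bins. The sketch below illustrates that idea under this assumption. It is not taken from Mambular's source; the function names `quantile_edges` and `piecewise_linear_encode` are hypothetical, and the variant shown clips every component to [0, 1] for simplicity.

```python
def quantile_edges(values, n_bins):
    """Return bin edges taken at evenly spaced ranks of the sorted values.

    Hypothetical helper for illustration only, not part of Mambular.
    """
    ordered = sorted(values)
    last = len(ordered) - 1
    edges = [ordered[round(i * last / n_bins)] for i in range(n_bins + 1)]
    # Drop duplicate edges so that every bin has positive width.
    return sorted(set(edges))


def piecewise_linear_encode(x, edges):
    """Encode a scalar as one value in [0, 1] per bin.

    Bins entirely below x map to 1.0, bins entirely above x map to 0.0,
    and the bin containing x maps to the fraction of the bin covered.
    """
    encoded = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        fraction = (x - lo) / (hi - lo)
        encoded.append(min(1.0, max(0.0, fraction)))
    return encoded


edges = [0.0, 10.0, 20.0, 40.0]
print(piecewise_linear_encode(25.0, edges))  # [1.0, 1.0, 0.25]
```

With 50 bins, as in the README example, each numerical feature would become a 50-dimensional vector of this kind before it is embedded. Whether Mambular derives its edges from quantiles, decision trees, or another rule is not visible in this diff.
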
diff --git a/docs/api/base_models/BaseModels.rst b/docs/api/base_models/BaseModels.rst
@@ -1,5 +1,5 @@
-Base Models
-===========
+mambular.base_models
+====================
 
 .. autoclass:: mambular.base_models.BaseMambularClassifier
     :members:

diff --git a/docs/api/models/Models.rst b/docs/api/models/Models.rst
@@ -1,5 +1,5 @@
-Models
-======
+mambular.models
+===============
 
 .. autoclass:: mambular.models.MambularClassifier
     :members:

diff --git a/docs/api/utils/Preprocessor.rst b/docs/api/utils/Preprocessor.rst
@@ -1,5 +1,5 @@
-Preprocessing
-=============
+mambular.utils
+==============
 
 .. autoclass:: mambular.utils.Preprocessor
     :members:
diff --git a/docs/conf.py b/docs/conf.py
@@ -16,7 +16,7 @@
 
 project = 'mambular'
 copyright = '2024, Christoph Weisser'
-author = 'Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken'
+author = 'Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken, Manish Kumar'
 
 VERSION_PATH = "../mambular/__version__.py"
 with open(VERSION_PATH) as f:

diff --git a/docs/development.md b/docs/development.md
@@ -1,17 +1,8 @@
 
-# Contribute
+## Contribute
 
 Thank you for considering contributing to our Python package! We appreciate your time and effort in helping us improve our project. Please take a moment to review the following guidelines to ensure a smooth and efficient contribution process.
 
-## Table of Contents
-
-- Code of Conduct
-- Setting Up Development Environment
-- How to Contribute
-- Submitting Contributions
-- Issue Tracker
-- License
-
 ### Code of Conduct
 
 We kindly request all contributors to adhere to our Code of Conduct when participating in this project. It outlines our expectations for respectful and inclusive behavior within the community.
@@ -43,7 +34,7 @@ pip install -r docs/requirements_docs.txt
 
 1. Create a new branch from the `develop` branch for your contributions. Please use descriptive and concise branch names.
 2. Make your desired changes or additions to the codebase.
-3. Ensure that your code adheres to our coding style guidelines.
+3. Ensure that your code adheres to [PEP8](https://peps.python.org/pep-0008/) coding style guidelines.
 4. Write appropriate tests for your changes, ensuring that they pass.
     - `make test`
 5. Update the documentation and examples, if necessary.