Merge pull request #18 from basf/develop
publish version 0.1.3
AnFreTh authored Jun 3, 2024
2 parents ce4793d + 36a8bce commit 0912d15
Showing 45 changed files with 2,946 additions and 1,337 deletions.
28 changes: 28 additions & 0 deletions .github/workflows/ISSUE_TEMPLATE/bug_report.md
@@ -0,0 +1,28 @@
---
name: Bug report
about: Create a report to help us improve
title: "[BUG]"
labels: bug
assignees: ''

---

**Describe the bug**
A clear and concise description of what the bug is.

**To Reproduce**
Steps to reproduce the behavior:

**Expected behavior**
A clear and concise description of what you expected to happen.

**Screenshots**
If applicable, add screenshots to help explain your problem.

**Desktop (please complete the following information):**
- OS: [e.g. Ubuntu]
- Python version [e.g. 3.8]
- Mambular Version [e.g. 0.1.2]

**Additional context**
Add any other context about the problem here.
1 change: 1 addition & 0 deletions .github/workflows/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1 @@
blank_issues_enabled: false
11 changes: 11 additions & 0 deletions .github/workflows/ISSUE_TEMPLATE/doc_request.md
@@ -0,0 +1,11 @@
---
name: Doc request
about: Create a documentation request to help us improve
title: "[DOC]"
labels: docs
assignees: ''

---

**Description of the question**
A clear and concise description of what should be documented.
20 changes: 20 additions & 0 deletions .github/workflows/ISSUE_TEMPLATE/feature_request.md
@@ -0,0 +1,20 @@
---
name: Feature request
about: Suggest an idea for this project
title: "[FEATURE]"
labels: enhancement
assignees: ''

---

**Is your feature request related to a problem? Please describe.**
A clear and concise description of what the problem is, e.g. I would like to include preprocessing for [...]

**Describe the solution you'd like**
A clear and concise description of what you want to happen.

**Describe alternatives you've considered**
A clear and concise description of any alternative solutions or features you've considered.

**Additional context**
Add any other context or screenshots about the feature request here.
17 changes: 17 additions & 0 deletions .github/workflows/ISSUE_TEMPLATE/question.md
@@ -0,0 +1,17 @@
---
name: Question
about: Ask a question about how to use the software
title: '[FAQ]'
labels: question
assignees: ''

---

**Context**
Give some context if needed (environment, system, hardware).

**Describe the task you are trying to achieve.**
A clear and concise description of what the task is.

**Describe the solution you'd like**
A clear and concise description of what you want to happen.
35 changes: 35 additions & 0 deletions .github/workflows/publish.yml
@@ -0,0 +1,35 @@
name: Publish Package

on:
  push:
    branches:
      - master

jobs:
  publish:
    runs-on: ubuntu-latest

    steps:
      - name: Checkout code
        uses: actions/checkout@v2

      - name: Set up Python
        uses: actions/setup-python@v2
        with:
          python-version: "3.8"

      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          pip install setuptools wheel twine
      - name: Build package
        run: |
          python setup.py sdist bdist_wheel
      - name: Publish package to PyPI
        env:
          TWINE_USERNAME: __token__
          TWINE_PASSWORD: ${{ secrets.PYPI_TOKEN }}
        run: |
          twine upload dist/*
21 changes: 21 additions & 0 deletions LICENSE
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2024 BASF

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
61 changes: 32 additions & 29 deletions README.md
@@ -1,10 +1,23 @@
# Mambular: Tabular Deep Learning with Mamba Architectures
<div align="center">
<img src="./docs/images/logo/mamba_tabular.jpg" width="400"/>


[![PyPI](https://img.shields.io/pypi/v/mambular)](https://pypi.org/project/mambular)
![PyPI - Downloads](https://img.shields.io/pypi/dw/mambular)
[![docs build](https://readthedocs.org/projects/mambular/badge/?version=latest)](https://mambular.readthedocs.io/en/latest/?badge=latest)
[![docs](https://img.shields.io/badge/docs-latest-blue)](https://mambular.readthedocs.io/en/latest/)
[![open issues](https://img.shields.io/badge/contributions-welcome-brightgreen.svg?style=flat)](https://github.com/basf/mamba-tabular/issues)

Mambular is a Python package that brings the power of Mamba architectures to tabular data, offering a suite of deep learning models for regression, classification, and distributional regression tasks. Designed with ease of use in mind, Mambular models adhere to scikit-learn's `BaseEstimator` interface, making them highly compatible with the familiar scikit-learn ecosystem. This means you can fit, predict, and transform using Mambular models just as you would with any traditional scikit-learn model, but with the added performance and flexibility of deep learning.

<img src="https://github.com/basf/mamba-tabular/blob/master/docs/images/logo/mamba_tabular.jpg" alt="Mamba Tabular" width="400"/>
[📘Documentation](https://mambular.readthedocs.io/en/latest/index.html) |
[🛠️Installation](https://mambular.readthedocs.io/en/latest/installation.html) |
[Models](https://mambular.readthedocs.io/en/latest/api/models/index.html) |
[🤔Report Issues](https://github.com/basf/mamba-tabular/issues)
</div>

# Mambular: Tabular Deep Learning with Mamba Architectures

<!-- ![Logo](./docs/images/logo/mamba_tabular.jpg) -->
Mambular is a Python package that brings the power of Mamba architectures to tabular data, offering a suite of deep learning models for regression, classification, and distributional regression tasks. Designed with ease of use in mind, Mambular models adhere to scikit-learn's `BaseEstimator` interface, making them highly compatible with the familiar scikit-learn ecosystem. This means you can fit, predict, and evaluate using Mambular models just as you would with any traditional scikit-learn model, but with the added performance and flexibility of deep learning.

## Features

@@ -28,29 +41,25 @@ pip install mambular

## Preprocessing

Mambular elevates the preprocessing stage of model development, employing a sophisticated suite of techniques to ensure your data is in the best shape for the Mamba architectures. Our preprocessing module is designed to be both powerful and intuitive, offering a range of options to transform your tabular data efficiently.
Mambular simplifies the preprocessing stage of model development with a comprehensive set of techniques to prepare your data for Mamba architectures. Our preprocessing module is designed to be both powerful and easy to use, offering a variety of options to efficiently transform your tabular data.

### Data Type Detection and Transformation

Mambular automatically identifies the type of each feature in your dataset, applying the most suitable transformations to numerical and categorical variables. This includes:

Mambular automatically identifies the type of each feature in your dataset and applies the most appropriate transformations for numerical and categorical variables. This includes:
- **Ordinal Encoding**: Categorical features are seamlessly transformed into numerical values, preserving their inherent order and making them model-ready.
- **One-Hot Encoding**: For nominal data, Mambular employs one-hot encoding to capture the presence or absence of categories without imposing ordinality.
- **Binning**: Numerical features can be discretized into bins, a useful technique for handling continuous variables in certain modeling contexts.
- **Decision Tree Binning**: Optionally, Mambular can use decision trees to find the optimal binning strategy for numerical features, enhancing model interpretability and performance.
- **Normalization**: Numerical features can also be used directly, without converting them into categorical features. Standard preprocessing steps such as per-feature normalization are available.
- **Standardization**: Standardization can be used in place of normalization.
- **PLE**: Periodic Linear Encodings for numerical features can enhance performance for tabular DL methods.
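
The numerical options in the list above can be illustrated with a small, dependency-free sketch. This is not Mambular's implementation: the function names are hypothetical, and the last function assumes that PLE refers to the piecewise linear encoding used in the tabular deep learning literature, where each bin contributes 0 (value below the bin), 1 (value above the bin), or the fractional position inside the bin.

```python
from statistics import fmean, pstdev


def min_max_normalize(values):
    """Rescale one feature column to [0, 1] (assumes at least two distinct values)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values):
    """Shift one feature column to zero mean and unit (population) variance."""
    mu, sigma = fmean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def equal_width_bin(x, lo, hi, n_bins):
    """Return the index of the equal-width bin on [lo, hi] that contains x."""
    width = (hi - lo) / n_bins
    index = int((x - lo) / width)
    return min(max(index, 0), n_bins - 1)  # clamp values on or outside the borders


def piecewise_linear_encode(x, edges):
    """Encode scalar x against sorted bin edges; returns len(edges) - 1 components."""
    n_bins = len(edges) - 1
    encoded = []
    for t in range(n_bins):
        lo, hi = edges[t], edges[t + 1]
        if x < lo and t > 0:
            encoded.append(0.0)  # x lies entirely below this bin
        elif x >= hi and t < n_bins - 1:
            encoded.append(1.0)  # x lies entirely above this bin
        else:
            encoded.append((x - lo) / (hi - lo))  # fractional position in the bin
    return encoded


# Hypothetical bin edges, chosen only for illustration.
print(piecewise_linear_encode(1.5, [0.0, 1.0, 2.0, 4.0]))  # [1.0, 0.5, 0.0]
```

Unlike plain binning, the piecewise linear encoding keeps the position of a value inside its bin, so nearby values receive nearby encodings.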


### Handling Missing Values

Our preprocessing pipeline gracefully handles missing data, employing strategies like mean imputation for numerical features and mode imputation for categorical ones, ensuring that your models receive complete data inputs without manual intervention.

### Flexible and Customizable

While Mambular excels in automating the preprocessing workflow, it also offers flexibility. You can customize the preprocessing steps to fit the unique needs of your dataset, ensuring that you're not locked into a one-size-fits-all approach.
Our preprocessing pipeline effectively handles missing data by using mean imputation for numerical features and mode imputation for categorical features. This ensures that your models receive complete data inputs without needing manual intervention.
Additionally, Mambular can manage unknown categorical values during inference by incorporating classical `<UNK>` tokens in categorical preprocessing.
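
As a rough illustration of the strategies described above (mean imputation, mode imputation, and an unknown-category token), the following sketch uses only the standard library. The helper names and the choice of index 0 for the unknown token are assumptions made for this example, not Mambular's API.

```python
from collections import Counter
from statistics import fmean

UNK = "<UNK>"  # placeholder for categories not seen during fitting


def impute_mean(values):
    """Replace missing numerical entries (None) with the mean of the observed ones."""
    fill = fmean(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace missing categorical entries (None) with the most frequent category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def fit_vocab(train_values):
    """Map each training category to an integer, reserving index 0 for UNK."""
    vocab = {UNK: 0}
    for category in sorted(set(train_values)):
        vocab[category] = len(vocab)
    return vocab


def encode(values, vocab):
    """Encode categories as integers; unseen categories fall back to the UNK index."""
    return [vocab.get(v, vocab[UNK]) for v in values]


vocab = fit_vocab(["red", "blue", "red"])
print(encode(["red", "green"], vocab))  # [2, 0] since "green" was never seen
```

Reserving a dedicated index means a category that first appears at inference time still maps to a valid embedding row instead of raising an error.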

By integrating Mambular's preprocessing module into your workflow, you're not just preparing your data for deep learning; you're optimizing it for excellence. This commitment to data quality is what sets Mambular apart, making it an indispensable tool in your machine learning arsenal.

## Fit a Model
Fitting a model in mambular is as simple as it gets. All models in mambular are sklearn BaseEstimators. Thus the `.fit` method is implemented for all of them. Additionally, this allows for using all other sklearn inherent methods such as their built in hyperparameter optimization tools.
@@ -59,14 +68,14 @@ Fitting a model in mambular is as simple as it gets. All models in mambular are
from mambular.models import MambularClassifier
# Initialize and fit your model
model = MambularClassifier(
    dropout=0.01,
    d_model=128,
    n_layers=6,
    numerical_preprocessing="normalization",
    d_model=64,
    n_layers=8,
    numerical_preprocessing="ple",
    n_bins=50
)

# X can be a pd.DataFrame or anything that is easily converted into one, such as an np.array
model.fit(X, y, max_epochs=500, lr=1e-03, patience=25)
model.fit(X, y, max_epochs=150, lr=1e-04)
```

Predictions are also easily obtained:
@@ -108,12 +117,6 @@ Mambular introduces a cutting-edge approach to distributional regression through
These distribution classes allow `MambularLSS` to flexibly model a wide variety of data types and distributions, providing users with the tools needed to capture the full complexity of their data.
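
To make the idea of distributional regression concrete, here is a minimal sketch of the objective such a model could minimize for a normal family: the negative log-likelihood of each target under a predicted mean and scale. How `MambularLSS` parameterizes its outputs is not shown in this diff, so the softplus link for the scale and all function names below are assumptions.

```python
import math


def softplus(z):
    """Map an unconstrained value to a positive one (numerically stable form)."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def normal_nll(y, mean, raw_scale):
    """Negative log-likelihood of y under Normal(mean, softplus(raw_scale))."""
    sigma = softplus(raw_scale)
    return (
        0.5 * math.log(2.0 * math.pi)
        + math.log(sigma)
        + 0.5 * ((y - mean) / sigma) ** 2
    )


def mean_normal_nll(ys, means, raw_scales):
    """Average the per-observation loss over a batch of predictions."""
    losses = [normal_nll(y, m, s) for y, m, s in zip(ys, means, raw_scales)]
    return sum(losses) / len(losses)


# A prediction centred on the target scores better than one that is off by 2.
print(normal_nll(1.0, 1.0, 0.0) < normal_nll(1.0, 3.0, 0.0))  # True
```

Because the scale is predicted per observation, the loss rewards the model for widening its predictive distribution where the targets are noisy and narrowing it where they are not.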


### Use Cases for MambularLSS:

- **Risk Assessment**: In finance or insurance, understanding the range and likelihood of potential losses is as important as predicting average outcomes.
- **Demand Forecasting**: For inventory management, capturing the variability in product demand helps in optimizing stock levels.
- **Personalized Medicine**: In healthcare, distributional regression can predict a range of possible patient responses to a treatment, aiding in personalized therapy planning.

### Getting Started with MambularLSS:

To integrate distributional regression into your workflow with `MambularLSS`, start by initializing the model with your desired configuration, similar to other Mambular models:
@@ -124,17 +127,17 @@ from mambular.models import MambularLSS
# Initialize the MambularLSS model
model = MambularLSS(
    dropout=0.2,
    d_model=256,
    n_layers=4,
    d_model=64,
    n_layers=8,

)

# Fit the model to your data
model.fit(
    X,
    y,
    max_epochs=300,
    lr=1e-03,
    max_epochs=150,
    lr=1e-04,
    patience=10,
    family="normal"  # define your distribution
)
@@ -147,7 +150,7 @@ If you find this project useful in your research, please consider cite:
```BibTeX
@misc{2024,
title={Mambular: Tabular Deep Learning with Mamba Architectures},
author={Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken'},
author={Anton Frederik Thielmann, Manish Kumar, Christoph Weisser, Benjamin Saefken, Soheila Samiee},
howpublished = {\url{https://github.com/basf/mamba-tabular}},
year={2024}
}
4 changes: 2 additions & 2 deletions docs/api/base_models/BaseModels.rst
@@ -1,5 +1,5 @@
Base Models
===========
mambular.base_models
====================

.. autoclass:: mambular.base_models.BaseMambularClassifier
   :members:
4 changes: 2 additions & 2 deletions docs/api/models/Models.rst
@@ -1,5 +1,5 @@
Models
======
mambular.models
===============

.. autoclass:: mambular.models.MambularClassifier
   :members:
4 changes: 2 additions & 2 deletions docs/api/utils/Preprocessor.rst
@@ -1,5 +1,5 @@
Preprocessing
=============
mambular.utils
==============

.. autoclass:: mambular.utils.Preprocessor
   :members:
2 changes: 1 addition & 1 deletion docs/conf.py
@@ -16,7 +16,7 @@

project = 'mambular'
copyright = '2024, Christoph Weisser'
author = 'Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken'
author = 'Anton Frederik Thielmann, Soheila Samiee, Christoph Weisser, Benjamin Saefken, Manish Kumar'

VERSION_PATH = "../mambular/__version__.py"
with open(VERSION_PATH) as f:
13 changes: 2 additions & 11 deletions docs/development.md
@@ -1,17 +1,8 @@

# Contribute
## Contribute

Thank you for considering contributing to our Python package! We appreciate your time and effort in helping us improve our project. Please take a moment to review the following guidelines to ensure a smooth and efficient contribution process.

## Table of Contents

- Code of Conduct
- Setting Up Development Environment
- How to Contribute
- Submitting Contributions
- Issue Tracker
- License

### Code of Conduct

We kindly request all contributors to adhere to our Code of Conduct when participating in this project. It outlines our expectations for respectful and inclusive behavior within the community.
@@ -43,7 +34,7 @@ pip install -r docs/requirements_docs.txt

1. Create a new branch from the `develop` branch for your contributions. Please use descriptive and concise branch names.
2. Make your desired changes or additions to the codebase.
3. Ensure that your code adheres to our coding style guidelines.
3. Ensure that your code adheres to [PEP8](https://peps.python.org/pep-0008/) coding style guidelines.
4. Write appropriate tests for your changes, ensuring that they pass.
   - `make test`
5. Update the documentation and examples, if necessary.