Fix missing load_specific_dataset(), update testing_daily workflow, release v0.7.1 #481

Merged · 4 commits · Jul 27, 2024
2 changes: 1 addition & 1 deletion .github/workflows/publish_to_PyPI.yml
@@ -48,7 +48,7 @@ jobs:

- name: Build package
run: |
python -m build --no-isolation
python -m build

- name: Publish the new package to PyPI
uses: pypa/[email protected]
42 changes: 20 additions & 22 deletions .github/workflows/testing_daily.yml
@@ -4,6 +4,9 @@ on:
schedule:
# https://crontab.guru. Run everyday at 0:00AM UTC, i.e. 08:00AM Beijing, i.e. 08:00PM Montreal (summer time)
- cron: "0 0 * * *"
push:
branches:
- temp_test_branch # if in need, create such a temporary branch to test some functions

jobs:
Daily-testing:
@@ -14,49 +17,44 @@ jobs:
strategy:
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest, macOS-13]
python-version: ["3.7", "3.11"]
os: [ ubuntu-latest, windows-latest, macOS-latest ]
python-version: [ "3.8" ]
pytorch-version: ["2.3.0"]

steps:
- name: Check out the repo code
uses: actions/checkout@v3

- name: Determine the PyTorch version
uses: haya14busa/action-cond@v1
id: determine_pytorch_ver
with:
cond: ${{ matrix.python-version == 3.7 }}
if_true: "1.13.1"
if_false: "2.1.0"

- name: Set up Conda
uses: conda-incubator/setup-miniconda@v2
with:
activate-environment: pypots-test
miniconda-version: "latest"
activate-environment: pypots
python-version: ${{ matrix.python-version }}
environment-file: tests/environment_for_conda_test.yml
environment-file: requirements/conda_env.yml
auto-activate-base: false

- name: Fetch the test environment details
run: |
# many libs not compatible with numpy 2.0. Note 3.12 requests for numpy>=2.0. fix pandas version to avoid installing pandas 2.0, the same reason with numpy
conda install -c conda-forge numpy=1.24 pandas=1.5
which python
conda info
conda list

- name: Replace with the latest PyPOTS code for testing
run: |
python_site_path=`python -c "import site; print(site.getsitepackages()[0])"`
echo "python site-packages path: $python_site_path"
rm -rf $python_site_path/pypots
python -c "import shutil;import site;shutil.copytree('pypots',site.getsitepackages()[0]+'/pypots')"

- name: Test with pytest
run: |
# run tests separately here due to Segmentation Fault in test_clustering when run all in
# one command with `pytest` on MacOS. Bugs not caught, so this is a trade-off to avoid SF.
python tests/global_test_config.py
rm -rf testing_results && rm -rf tests/__pycache__ && rm -rf tests/*/__pycache__
python -m pytest -rA tests/classification/* -s -n auto --cov=pypots --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/imputation/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/clustering/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/forecasting/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/optim/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/data/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/utils/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python -m pytest -rA tests/cli/* -s -n auto --cov=pypots --cov-append --dist=loadgroup --cov-config=.coveragerc
python tests/global_test_config.py
python -m pytest -rA tests/*/* -s -n auto --cov=pypots --dist=loadgroup --cov-config=.coveragerc

- name: Generate the LCOV report
run: |
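The "Replace with the latest PyPOTS code for testing" step in the workflow above boils down to removing the installed copy of the package from `site-packages` and copying the checked-out source over it. A standalone sketch of that step, with a temporary directory standing in for the real `site-packages` path (the helper name and demo paths here are illustrative, not part of the PR):

```python
import pathlib
import shutil
import tempfile


def replace_installed_package(src_dir: str, site_packages: str, pkg_name: str) -> str:
    """Replace the installed copy of pkg_name with the source tree in src_dir."""
    target = pathlib.Path(site_packages) / pkg_name
    # Equivalent of `rm -rf $python_site_path/pypots` in the workflow.
    shutil.rmtree(target, ignore_errors=True)
    # Equivalent of the workflow's shutil.copytree one-liner.
    shutil.copytree(src_dir, target)
    return str(target)


# Demo: a fake source checkout and a fake site-packages directory.
src = tempfile.mkdtemp()
pathlib.Path(src, "__init__.py").write_text("__version__ = '0.7.1'\n")
fake_site = tempfile.mkdtemp()

installed = replace_installed_package(src, fake_site, "pypots")
print(pathlib.Path(installed, "__init__.py").read_text())
```

In the real workflow the target path comes from `site.getsitepackages()[0]`, so the tests that follow import the freshly checked-out code rather than the released wheel.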
2 changes: 2 additions & 0 deletions pypots/data/__init__.py
@@ -23,6 +23,7 @@
sliding_window,
inverse_sliding_window,
)
from .load_specific_datasets import load_specific_dataset

__all__ = [
# base dataset classes
@@ -33,6 +34,7 @@
"gene_complete_random_walk_for_anomaly_detection",
"gene_complete_random_walk_for_classification",
"gene_random_walk",
"load_specific_dataset",
# utils
"parse_delta",
"sliding_window",
73 changes: 73 additions & 0 deletions pypots/data/load_specific_datasets.py
@@ -0,0 +1,73 @@
"""
Functions to load supported open-source time-series datasets.
"""

# Created by Wenjie Du <[email protected]>
# License: BSD-3-Clause


from benchpots.datasets import preprocess_physionet2012

from ..utils.logging import logger

# currently supported datasets
SUPPORTED_DATASETS = [
"physionet_2012",
]

# preprocessing functions of the supported datasets
PREPROCESSING_FUNC = {
"physionet_2012": preprocess_physionet2012,
}


def list_supported_datasets() -> list:
"""Return the datasets natively supported by PyPOTS so far.

Returns
-------
SUPPORTED_DATASETS :
A list including all supported datasets.

"""
return SUPPORTED_DATASETS


def load_specific_dataset(dataset_name: str, use_cache: bool = True) -> dict:
"""Load specific datasets supported by PyPOTS.
Different from tsdb.load_dataset(), which only produces merely raw data,
load_specific_dataset here does some preprocessing operations,
like truncating time series to generate samples with the same length.

Parameters
----------
dataset_name :
The name of the dataset to be loaded, which should be supported, i.e. in SUPPORTED_DATASETS.

use_cache :
Whether to use cache. This is an argument of tsdb.load_dataset().

Returns
-------
data :
A dict contains the preprocessed dataset.
Users only need to continue the preprocessing steps to generate the data they want,
e.g. standardizing and splitting.

"""
logger.info(
f"Loading the dataset {dataset_name} with TSDB (https://github.com/WenjieDu/Time_Series_Data_Beans)..."
)
assert dataset_name in SUPPORTED_DATASETS, (
f"Dataset {dataset_name} is not supported. "
f"If you believe this dataset is valuable to be supported by PyPOTS, "
f"please create an issue on GitHub "
f"https://github.com/WenjieDu/PyPOTS/issues"
)
logger.info(f"Starting preprocessing {dataset_name}...")
data = PREPROCESSING_FUNC[dataset_name]("all", 0.1)
logger.warning(
"⚠️ load_specific_dataset() will be deprecated in the near future. Data preprocessing functions "
"are moved to BenchPOTS, which now supports processing 170+ public time-series datasets."
)
return data
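The file above is a thin dispatch layer: a name-to-function mapping plus a guard on unsupported names. A minimal self-contained sketch of that pattern, with a stand-in preprocessing function since `benchpots` and the real PhysioNet-2012 download are assumed unavailable here (the stand-in's return values are illustrative only):

```python
from typing import Callable, Dict


def _fake_preprocess_physionet2012(subset: str, rate: float) -> dict:
    # Stand-in for benchpots.datasets.preprocess_physionet2012, which
    # downloads and preprocesses the real PhysioNet-2012 dataset.
    return {"subset": subset, "missing_rate": rate}


PREPROCESSING_FUNC: Dict[str, Callable[..., dict]] = {
    "physionet_2012": _fake_preprocess_physionet2012,
}
SUPPORTED_DATASETS = list(PREPROCESSING_FUNC)


def load_specific_dataset(dataset_name: str) -> dict:
    # Mirror the assert-style guard in the PR: fail fast on unknown names.
    if dataset_name not in SUPPORTED_DATASETS:
        raise ValueError(
            f"Dataset {dataset_name} is not supported. "
            f"Supported datasets: {SUPPORTED_DATASETS}"
        )
    # The PR calls the preprocessing function with the "all" subset
    # and a 0.1 artificial missing rate.
    return PREPROCESSING_FUNC[dataset_name]("all", 0.1)


data = load_specific_dataset("physionet_2012")
print(data["missing_rate"])  # 0.1
```

Adding support for another dataset then only requires a new entry in `PREPROCESSING_FUNC`, which is also why the deprecation warning points users to BenchPOTS, where these mappings now live.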
2 changes: 1 addition & 1 deletion pypots/version.py
@@ -22,4 +22,4 @@
#
# Dev branch marker is: 'X.Y.dev' or 'X.Y.devN' where N is an integer.
# 'X.Y.dev0' is the canonical version of 'X.Y.dev'
__version__ = "0.7"
__version__ = "0.7.1"
3 changes: 2 additions & 1 deletion requirements/conda_env.yml
@@ -8,12 +8,12 @@ channels:

dependencies:
# basic
- conda-forge::python >=3.8
- conda-forge::pip
- conda-forge::h5py
- conda-forge::numpy
- conda-forge::scipy
- conda-forge::sympy
- conda-forge::python
- conda-forge::einops
- conda-forge::pandas
- conda-forge::seaborn
@@ -46,6 +46,7 @@ dependencies:
# dev
- conda-forge::black
- conda-forge::flake8
- conda-forge::flake8-pyproject
- conda-forge::pre-commit
- conda-forge::jupyterlab

51 changes: 0 additions & 51 deletions requirements/environment_for_conda_test.yml

This file was deleted.
