Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Suggestions for project #6

Closed
wants to merge 49 commits into from

Conversation

DanielAtKrypton
Copy link

The following measures were taken:

  • Add datasets folder with a README.md to ease data update.
  • Add to ignore list datasets/*.csv
  • Create make_npz.py script to create dataset_CAPT_v7.npz file
  • Lint
  • Sorted imports
  • Removed unused imports
  • Formatted code with autopep8
  • training.ipynb now has a strategy to size train, validation and test data keeping the same previous proportions.
  • tst folder renamed to test, so that vscode icons automatically detects the correct icon for tests.
  • Fix search.py import bug

learning_curve.py Outdated Show resolved Hide resolved
.gitignore Outdated Show resolved Hide resolved
README.md Outdated Show resolved Hide resolved
export_doc.py Outdated Show resolved Hide resolved
learning_curve.py Outdated Show resolved Hide resolved
src/utils/csv2npz.py Outdated Show resolved Hide resolved
src/visualization/__init__.py Outdated Show resolved Hide resolved
test/decoder.py Outdated Show resolved Hide resolved
@maxjcohen
Copy link
Owner

Hi, first of all thanks for contributing to this project. I'm writing a few reviews for these commits and I'll merge once we settle on a good pull request.

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would actually like to keep the output of the training notebook, to give an example of training and visualization upon loading the repo.

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes this should be linted.

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would like to keep the benchmark repo and this transformer repo independent. I am currently working on a slightly different dataset than the one used for the benchmark, this is why this file is different. I may consider creating a "datachallenge" branch on this repo with the original labels.json file.

Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All benchmark models can be found in the benchmark.py file. You fill find the LSTM among them.

@DanielAtKrypton DanielAtKrypton marked this pull request as draft May 3, 2020 18:37
@DanielAtKrypton DanielAtKrypton marked this pull request as ready for review May 4, 2020 22:23
Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To sum up the main modifications that are still needed:

  • Removing empty docstring
  • Removing in-code pylint settings
  • Reverting changes to the tst package, unless there is a substancial modification

benchmark.ipynb Outdated Show resolved Hide resolved
cross_validation.py Outdated Show resolved Hide resolved
cross_validation.py Show resolved Hide resolved
datasets/README.md Outdated Show resolved Hide resolved
export_doc.py Outdated Show resolved Hide resolved
export_doc.py Outdated Show resolved Hide resolved
learning_curve.py Outdated Show resolved Hide resolved
src/utils/search.py Outdated Show resolved Hide resolved
tst/decoder.py Outdated Show resolved Hide resolved
@DanielAtKrypton
Copy link
Author

Here is an example that interacts with the main changes I am proposing:

import numpy as np
from matplotlib import pyplot as plt

from src.transformer_tsp import TransformerTimeSeriesPredictor
from tests.flights_dataset import FlightsDataset

if __name__ == "__main__":
    plot_config = {}
    plot_config['training progress'] = False
    plot_config['prediction on training data'] = False
    plot_config['forecast'] = True

    forecast_config = {}
    forecast_config['include history'] = True
    forecast_config['months ahead'] = 24

    predictor_config = {}

    config = {}
    config['plot'] = plot_config
    config['forecast'] = forecast_config
    config['predictor'] = predictor_config
    config['predict enabled'] = False
    config['forecast enabled'] = True

    tsp = TransformerTimeSeriesPredictor()

    hist_loss = tsp.fit(FlightsDataset())
    # training_dataframe = tsp.get_training_dataframe()

    if config['plot']['training progress']:
        plt.figure()
        plt.plot(hist_loss, 'o-', label='train')
        plt.show()

    if config['predict enabled']:
        # Select training example
        idx = np.random.randint(0, len(tsp.dataloader.dataset))
        x, y = tsp.dataloader.dataset[idx]

        # Run predictions
        netout = tsp.predict(x)

        if config['plot']['prediction on training data']:
            plt.figure()
        d_output = netout.shape[2]
        for idx_output_var in range(d_output):
            # Select real passengers data
            y_true = y[:, idx_output_var]

            y_pred = netout[0, :, idx_output_var]

            if config['plot']['prediction on training data']:
                plt.subplot(d_output, 1, idx_output_var+1)

                plt.plot(y_true, label="Truth")
                plt.plot(y_pred, label="Prediction")
                plt.title(tsp.dataloader.dataset.labels['y'][idx_output_var])
                plt.legend()
        if config['plot']['prediction on training data']:
            plt.show()

    # Run forecast
    if config['forecast enabled']:
        netout = tsp.forecast(config['forecast']['months ahead'],
                              include_history=config['forecast']['include history'])

        if config['plot']['forecast']:
            plt.figure()
        d_output = netout.shape[2]
        # Select any training example just for comparison
        idx = np.random.randint(0, len(tsp.dataloader.dataset))
        x, y = tsp.dataloader.dataset[idx]
        for idx_output_var in range(d_output):
            # Select real passengers data
            y_true = y[:, idx_output_var]

            y_pred = netout[0, :, idx_output_var]

            if config['plot']['forecast']:
                plt.subplot(d_output, 1, idx_output_var+1)

                if config['forecast']['include history']:
                    plot_args = [y_pred]
                else:
                    plot_args = [
                        [i+tsp.dataset.get_x_shape()[1]+1 for i in range(len(y_pred))], y_pred]
                plt.plot(*plot_args, label="Prediction")
                plt.plot(y_true, label="Truth")
                plt.title(tsp.dataloader.dataset.labels['y'][idx_output_var])
                plt.legend()
        if config['plot']['forecast']:
            plt.show()

@DanielAtKrypton
Copy link
Author

I created a package than can reduce the complexity. I'll be working on integrating it in the pull request.

@DanielAtKrypton DanielAtKrypton marked this pull request as draft June 3, 2020 06:40
Copy link
Owner

@maxjcohen maxjcohen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I made some remarks about a few lines bothering me, but the main point is the change of the package name: I'd like to keep it as is ! This way, I can enjoy my commit history not being shambled. The fact that the name tst is taken is not really a limitation, I never had intention to upload it to pip. Instead, the tst package can be installed via pip install git+https://github.com/maxjcohen/transformer/.

Thanks for you contributions !

.bumpversion.cfg Show resolved Hide resolved
Comment on lines +1 to +15
language: python
cache:
pip: true
directories:
# - datasets
before_install:
- python --version
- pip install -U pip
- pip install codecov
install:
- pip install -e .[test] # install package + test dependencies
script:
- pytest --cov=time_series_transformer # run tests
after_success:
- codecov # submit coverage
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rather avoid using Travis here, this is not so much a development project that it requires CI/CD.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good practice to whenever there is a code change, to run a checklist of all the desired functionality covering all corners of the code base. If you look at codecov with the preliminary tests I came up with 65% of the code was already covered. Now that number can reach 100% with the addition of a few more tests, so that you can see it is always working as intended when there is a code change. And if the build breaks for any reason you are immediatelly informed by email. Then you can go there and fix it fast. Sometimes a dependency breaks the code base, as was the case with numpy 1.19.4 in Windows. The only solution as of now was to stick with a previous numpy version.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes I understand, but we currently don't have any kind of test, and dependency problems are left to the user by not mentioning the precise version of each package in the requirements.

If at some point we develop some meaningful tests, we could consider using Travis, but wright now it's just a waste of CPU time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I developed two small fast tests for it. One with the GPU if it is available and another with the CPU. See here. There is a notebook that shows the test cenario here. This file was also removed in the latest pull request draft.

@@ -1,67 +1,34 @@
# Transformers for Time Series
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we removing such big chunks of the Readme ?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The purpose was to simplify the package documentation so that only essential info is there for SW developers.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes but this content is important, it show how and why I developed that repo. The README is not the documentation, I think we can get away with having a detailed file here.

requirements-lock.txt Show resolved Hide resolved
scripts/deploy.ps1 Show resolved Hide resolved
time_series_transformer/decoder.py Outdated Show resolved Hide resolved
time_series_transformer/decoder.py Show resolved Hide resolved
time_series_transformer/transformer.py Show resolved Hide resolved
@DanielAtKrypton DanielAtKrypton changed the title General improvements Suggestions for project Dec 21, 2020
@DanielAtKrypton
Copy link
Author

I secured time_series_transformer package name, I can hand that over to you. Just send me an email at daniel(at)kryptonunite(dot)com. We can better discuss development and academic cooperation by email if you will.

@DanielAtKrypton
Copy link
Author

In order to better understand how TimeSeriesPredictor works, please take a look at this and this. I think specially students get most benefit here since it simplifies the data science cycle. No need to manually scale input and outputs and many options are available.

@DanielAtKrypton DanielAtKrypton mentioned this pull request Jan 5, 2021
@DanielAtKrypton
Copy link
Author

Closed in favor of #44

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants