Suggestions for project #6

DanielAtKrypton · 2020-05-03T06:23:55Z

The following measures were taken:

Add datasets folder with a README.md to ease data update.
Add to ignore list datasets/*.csv
Create make_npz.py script to create dataset_CAPT_v7.npz file
Lint
Sorted imports
Removed unused imports
Formatted code with autopep8
training.ipynb now has a strategy to size train, validation and test data keeping the same previous proportions.
tst folder renamed to test, so that vscode icons automatically detect the correct icon for tests.
Fix search.py import bug
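
For readers who want to picture the sizing strategy mentioned for training.ipynb, here is a minimal sketch of how split sizes can be derived from fixed proportions. It is not the notebook's actual code: the function name `split_sizes` and the 70/15/15 weights are assumptions chosen for illustration, since the thread does not state the real proportions.

```python
from typing import List, Sequence


def split_sizes(n_samples: int, weights: Sequence[float]) -> List[int]:
    """Return integer split sizes that follow `weights` and sum to n_samples.

    Hypothetical helper, not taken from the repository.
    """
    total = sum(weights)
    exact = [n_samples * w / total for w in weights]
    # Start from the floor of each exact size.
    sizes = [int(e) for e in exact]
    # Hand the remaining samples to the splits with the largest fractional part.
    leftover = n_samples - sum(sizes)
    by_fraction = sorted(
        range(len(exact)), key=lambda i: exact[i] - sizes[i], reverse=True
    )
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


# Assumed weights for train, validation and test (illustrative only).
n_train, n_val, n_test = split_sizes(1000, (70, 15, 15))
```

Because the sizes are recomputed from the weights, a dataset that grows or shrinks keeps the same proportions without any hard-coded sample counts.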

learning_curve.py

.gitignore

README.md

export_doc.py

learning_curve.py

src/utils/csv2npz.py

src/visualization/__init__.py

test/decoder.py

maxjcohen · 2020-05-03T08:22:08Z

Hi, first of all thanks for contributing to this project. I'm writing a few reviews for these commits and I'll merge once we settle on a good pull request.

maxjcohen

I would actually like to keep the output of the training notebook, to give an example of training and visualization upon loading the repo.

maxjcohen

Yes this should be linted.

maxjcohen

I would like to keep the benchmark repo and this transformer repo independent. I am currently working on a slightly different dataset than the one used for the benchmark, this is why this file is different. I may consider creating a "datachallenge" branch on this repo with the original labels.json file.

maxjcohen

All benchmark models can be found in the benchmark.py file. You will find the LSTM among them.

maxjcohen

To sum up the main modifications that are still needed:

Removing empty docstring
Removing in-code pylint settings
Reverting changes to the tst package, unless there is a substantial modification

benchmark.ipynb

cross_validation.py

datasets/README.md

export_doc.py

learning_curve.py

src/utils/search.py

tst/decoder.py

DanielAtKrypton · 2020-06-02T00:54:32Z

Here is an example that interacts with the main changes I am proposing:

import numpy as np
from matplotlib import pyplot as plt

from src.transformer_tsp import TransformerTimeSeriesPredictor
from tests.flights_dataset import FlightsDataset

if __name__ == "__main__":
    plot_config = {}
    plot_config['training progress'] = False
    plot_config['prediction on training data'] = False
    plot_config['forecast'] = True

    forecast_config = {}
    forecast_config['include history'] = True
    forecast_config['months ahead'] = 24

    predictor_config = {}

    config = {}
    config['plot'] = plot_config
    config['forecast'] = forecast_config
    config['predictor'] = predictor_config
    config['predict enabled'] = False
    config['forecast enabled'] = True

    tsp = TransformerTimeSeriesPredictor()

    hist_loss = tsp.fit(FlightsDataset())
    # training_dataframe = tsp.get_training_dataframe()

    if config['plot']['training progress']:
        plt.figure()
        plt.plot(hist_loss, 'o-', label='train')
        plt.show()

    if config['predict enabled']:
        # Select training example
        idx = np.random.randint(0, len(tsp.dataloader.dataset))
        x, y = tsp.dataloader.dataset[idx]

        # Run predictions
        netout = tsp.predict(x)

        if config['plot']['prediction on training data']:
            plt.figure()
        d_output = netout.shape[2]
        for idx_output_var in range(d_output):
            # Select real passengers data
            y_true = y[:, idx_output_var]

            y_pred = netout[0, :, idx_output_var]

            if config['plot']['prediction on training data']:
                plt.subplot(d_output, 1, idx_output_var+1)

                plt.plot(y_true, label="Truth")
                plt.plot(y_pred, label="Prediction")
                plt.title(tsp.dataloader.dataset.labels['y'][idx_output_var])
                plt.legend()
        if config['plot']['prediction on training data']:
            plt.show()

    # Run forecast
    if config['forecast enabled']:
        netout = tsp.forecast(config['forecast']['months ahead'],
                              include_history=config['forecast']['include history'])

        if config['plot']['forecast']:
            plt.figure()
        d_output = netout.shape[2]
        # Select any training example just for comparison
        idx = np.random.randint(0, len(tsp.dataloader.dataset))
        x, y = tsp.dataloader.dataset[idx]
        for idx_output_var in range(d_output):
            # Select real passengers data
            y_true = y[:, idx_output_var]

            y_pred = netout[0, :, idx_output_var]

            if config['plot']['forecast']:
                plt.subplot(d_output, 1, idx_output_var+1)

                if config['forecast']['include history']:
                    plot_args = [y_pred]
                else:
                    plot_args = [
                        [i+tsp.dataset.get_x_shape()[1]+1 for i in range(len(y_pred))], y_pred]
                plt.plot(*plot_args, label="Prediction")
                plt.plot(y_true, label="Truth")
                plt.title(tsp.dataloader.dataset.labels['y'][idx_output_var])
                plt.legend()
        if config['plot']['forecast']:
            plt.show()

DanielAtKrypton · 2020-06-03T06:40:39Z

I created a package that can reduce the complexity. I'll be working on integrating it into the pull request.

maxjcohen

Hi, I made some remarks about a few lines that bother me, but the main point is the change of the package name: I'd like to keep it as is! This way, my commit history stays intact. The fact that the name tst is taken is not really a limitation, as I never intended to upload it to pip. Instead, the tst package can be installed via pip install git+https://github.com/maxjcohen/transformer/.

Thanks for your contributions!

.bumpversion.cfg

maxjcohen · 2020-12-21T08:39:15Z

.travis.yml

+language: python
+cache:
+  pip: true
+  directories:
+  # - datasets
+before_install:
+  - python --version
+  - pip install -U pip
+  - pip install codecov
+install:
+    - pip install -e .[test] # install package + test dependencies
+script:
+    - pytest --cov=time_series_transformer # run tests
+after_success:
+  - codecov # submit coverage 


I would rather avoid using Travis here; this is not enough of a development project to require CI/CD.

It is good practice, whenever the code changes, to run a checklist of all the desired functionality covering every corner of the code base. If you look at codecov, the preliminary tests I came up with already cover 65% of the code. That number can reach 100% with a few more tests, so you can see that everything still works as intended after each change. And if the build breaks for any reason, you are immediately informed by email, so you can go and fix it fast. Sometimes a dependency breaks the code base, as was the case with numpy 1.19.4 on Windows. The only solution so far has been to stick with a previous numpy version.

Yes I understand, but we currently don't have any kind of test, and dependency problems are left to the user by not mentioning the precise version of each package in the requirements.

If at some point we develop some meaningful tests, we could consider using Travis, but right now it's just a waste of CPU time.

I developed two small, fast tests for it: one that runs on the GPU if it is available, and another that runs on the CPU. See here. There is a notebook that shows the test scenario here. This file was also removed in the latest pull request draft.
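
As a rough illustration of the "CPU always, GPU if available" idea described above, the sketch below picks the devices to test at run time. It is not the test file from the pull request: `available_devices` and `run_on_each_device` are hypothetical names, and the only PyTorch call assumed is `torch.cuda.is_available()`.

```python
import importlib.util
from typing import Callable, Dict, List


def available_devices() -> List[str]:
    """List the devices a test should run on: always CPU, CUDA when usable."""
    devices = ["cpu"]
    # Only touch torch if it is installed, so the helper works without it.
    if importlib.util.find_spec("torch") is not None:
        import torch

        if torch.cuda.is_available():
            devices.append("cuda")
    return devices


def run_on_each_device(check: Callable[[str], object]) -> Dict[str, object]:
    """Call `check(device)` once per available device and collect the results."""
    return {device: check(device) for device in available_devices()}


# Placeholder check; a real test would fit a small model on `device`.
results = run_on_each_device(lambda device: device.upper())
```

On a machine without a GPU this runs the CPU check only, which keeps the same test file usable on a CI runner and on a workstation.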

maxjcohen · 2020-12-21T08:40:00Z

README.md

@@ -1,67 +1,34 @@
 # Transformers for Time Series


Why are we removing such big chunks of the Readme ?

The purpose was to simplify the package documentation so that only essential info is there for SW developers.

Yes, but this content is important: it shows how and why I developed that repo. The README is not the documentation, so I think we can get away with having a detailed file here.

requirements-lock.txt

scripts/deploy.ps1

time_series_transformer/decoder.py

time_series_transformer/transformer.py

DanielAtKrypton · 2020-12-21T16:16:00Z

I secured the time_series_transformer package name and can hand it over to you. Just send me an email at daniel(at)kryptonunite(dot)com. We can better discuss development and academic cooperation by email if you wish.

DanielAtKrypton · 2020-12-21T16:32:21Z

To better understand how TimeSeriesPredictor works, please take a look at this and this. I think students in particular benefit most here, since it simplifies the data science cycle: there is no need to manually scale inputs and outputs, and many options are available.
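
To make the "no need to manually scale" point concrete, this is roughly the kind of manual step a user would otherwise write. It is a sketch under assumptions, not TimeSeriesPredictor's implementation: `MinMaxScaler1D` is a hypothetical helper and the sample values are made up.

```python
from typing import List, Sequence


class MinMaxScaler1D:
    """Hypothetical helper that maps a 1-D series to [0, 1] and back."""

    def fit(self, values: Sequence[float]) -> "MinMaxScaler1D":
        self.low = min(values)
        # Fall back to a span of 1.0 so a constant series does not divide by zero.
        self.span = (max(values) - self.low) or 1.0
        return self

    def transform(self, values: Sequence[float]) -> List[float]:
        return [(v - self.low) / self.span for v in values]

    def inverse_transform(self, scaled: Sequence[float]) -> List[float]:
        return [s * self.span + self.low for s in scaled]


# Made-up series, scaled before training and unscaled after prediction.
series = [100.0, 150.0, 200.0]
scaler = MinMaxScaler1D().fit(series)
scaled = scaler.transform(series)
restored = scaler.inverse_transform(scaled)
```

A predictor that handles this internally has to remember the fitted bounds and apply the inverse transform to its outputs, which is the bookkeeping the comment above says users no longer do by hand.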

DanielAtKrypton · 2021-01-05T11:40:45Z

Closed in favor of #44

Daniel Kaminski de Souza added 5 commits May 3, 2020 02:47

🎨 Lint, format, remove not used imports, sort imports.

8d87d82

🐛 Fix search.py import bug.

1d9f542

👕 Clear training.ipynb output.

4ece89a

👕 lint make_npz.py

71c9edb

👕 lint datasets/README.md

bce27ba

DanielAtKrypton mentioned this pull request May 3, 2020

Where can I download the dataset? #2

Closed

Daniel Kaminski de Souza added 2 commits May 3, 2020 03:52

👕 lint test/multiHeadAttention.py

f1fd606

🚧 Update labels.json to reflect the challenge.

8ab9035

maxjcohen reviewed May 3, 2020

🚧 Add model.py.

de9274c

maxjcohen reviewed May 3, 2020

🚧 Add dataset2.py.

6c35365

maxjcohen reviewed May 3, 2020

Daniel Kaminski de Souza added 3 commits May 3, 2020 15:09

🚧 Improve pull request accordingly to feedback.

b1fb712

👕 dopout should actually be dropout.

c780755

Restore training.ipynb

2a8ba6d

DanielAtKrypton marked this pull request as draft May 3, 2020 18:37

Daniel Kaminski de Souza added 5 commits May 3, 2020 15:58

➖ Remove csv2npz and make_npz.py

de92c1a

➕ Add pylint to requirements.txt.

5a7fd4e

👕 lint src/utils/search.py

6359785

👕 lint src/utils/search.py

6e86025

🚧 Add to ignore list *.csv

6de0f16

DanielAtKrypton marked this pull request as ready for review May 4, 2020 22:23

maxjcohen requested changes May 6, 2020

Daniel Kaminski de Souza added 3 commits May 6, 2020 17:50

🚧 Development path

2fe0b18

🚧 Development path.

d4fe40c

🚧 Development path.

5285fd4

➕ Implement forecast functionality.

ad0479c

🐛 Fix make_future_dataframe method of TransformerTimeSeriesPredictor …

e032fbb

…class.

DanielAtKrypton marked this pull request as draft June 3, 2020 06:40

Daniel Kaminski de Souza added 14 commits December 20, 2020 23:31

🚧 Development path.

2d42ffe

🔥 Clean code base.

e096b00

🔥 Clean code base.

2a6d5aa

🚧 Update version number in bumpversion.cfg

0aac45e

🚧 Update version number in deploy.ps1

339fe18

🔥 Remove doc conf leftover in bumpversion.cfg

d3417a1

Bump version: 0.3.0 → 0.4.0

8e88a81

Add to git ignore list build and dist folders.

7db1943

Bump version: 0.4.0 → 0.4.1

f6188f7

➕ Add travis.

5cd8133

📝 Add badges to README.md.

4e6f58e

🐛 Should fix codecov.

a3d4795

Bump version: 0.4.1 → 0.4.2

c8de445

✅ Update main_test.

78e00c4

maxjcohen requested changes Dec 21, 2020

DanielAtKrypton changed the title ~~General improvements~~ Suggestions for project Dec 21, 2020

Daniel Kaminski de Souza added 4 commits January 4, 2021 03:02

🚧 Development path.

9e7f016

🚀 Speed up fitting.

9837bbc

Bump version: 0.4.2 → 0.4.3

50e7ca9

🚧 Development path in order to progressively improve allignment with …

2df8280

…Max.

DanielAtKrypton mentioned this pull request Jan 5, 2021

Suggestions PR #44

Draft

DanielAtKrypton closed this Jan 5, 2021
