Updated notebook to run on TensorFlow 2.18.0 #19

Open · wants to merge 45 commits into master

Commits
5a4feaa
updated package name from sklearn to scikit-learn
seyedrezamirkhani Nov 20, 2024
7c89888
working by splitting the numeric features
seyedrezamirkhani Nov 21, 2024
90406cf
all the code working
seyedrezamirkhani Nov 22, 2024
1bc2355
initial commit
seyedrezamirkhani Nov 22, 2024
ad1224f
TODO: merge input
seyedrezamirkhani Nov 22, 2024
8adbd96
using only numeric features as independent inputs
seyedrezamirkhani Nov 23, 2024
d3061a1
using only numeric features as merged input
seyedrezamirkhani Nov 23, 2024
d29d073
initial commit; only works with mse loss, need to fix the issue with …
seyedrezamirkhani Nov 23, 2024
21d3999
working with initial model function by changing the order in which th…
seyedrezamirkhani Nov 23, 2024
bb240ec
working with dnn_split returning dict instead of lists, like the orig…
seyedrezamirkhani Nov 23, 2024
3969ae2
removed unused cells
seyedrezamirkhani Nov 24, 2024
01454c3
made to work with zero_inflated_lognormal_loss (zlin). the problem wa…
seyedrezamirkhani Nov 24, 2024
16752ba
updated with code from regression_merged
seyedrezamirkhani Nov 25, 2024
20fa9b7
updated bash block to 1- Used DATA_FOLDER variable. 2- only download …
seyedrezamirkhani Nov 26, 2024
c1ef756
refactored code; added comments
seyedrezamirkhani Nov 26, 2024
380ce4c
initial commit
seyedrezamirkhani Nov 27, 2024
154ee59
added kdd_cup_98 tmp/result/image files
seyedrezamirkhani Nov 27, 2024
c93750f
initial commit
seyedrezamirkhani Nov 27, 2024
80a107b
removed commented cells
seyedrezamirkhani Nov 27, 2024
256b746
removed unused paths
seyedrezamirkhani Nov 27, 2024
b9389b3
unified paths
seyedrezamirkhani Nov 27, 2024
b186363
removed unused files
seyedrezamirkhani Nov 27, 2024
2e81949
initial commit
seyedrezamirkhani Nov 27, 2024
4a4917e
removing extra files
seyedrezamirkhani Nov 27, 2024
c312fa8
prepared for pull request
seyedrezamirkhani Nov 27, 2024
a5a0e68
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
83fcd01
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
acf6291
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
37cae76
using my own repo
seyedrezamirkhani Nov 27, 2024
f29d927
made DATA_FOLDER variable to reduce hard-coding of paths; changed %%s…
seyedrezamirkhani Nov 29, 2024
2600ff3
fixed error in company query
seyedrezamirkhani Nov 29, 2024
4a78a2e
updated to work the TF 2.18; fixed error in query for company
seyedrezamirkhani Nov 29, 2024
0c92da4
removed commented code for Embedding
seyedrezamirkhani Nov 29, 2024
6e20750
with output
seyedrezamirkhani Nov 29, 2024
7aa53df
added kaggle_acquire_valued_shoppers_challenge/tmp/
seyedrezamirkhani Dec 1, 2024
7a9d740
updated linear_model function to build a sequential model now that th…
seyedrezamirkhani Dec 1, 2024
5ab43d5
saved with zlin and dnn parameters without any output
seyedrezamirkhani Dec 1, 2024
d0b162a
updated to work with TF 2.18
seyedrezamirkhani Dec 1, 2024
0695f39
updated to work with TF 2.18
seyedrezamirkhani Dec 1, 2024
63a6f83
cleared notebook
seyedrezamirkhani Dec 1, 2024
d184b95
corrected path for notebooks/kaggle_acquire_valued_shoppers_challenge…
seyedrezamirkhani Dec 1, 2024
c0e72d3
update pip install url for the package in this project
seyedrezamirkhani Dec 2, 2024
9f6624d
initial commit
seyedrezamirkhani Dec 2, 2024
a5b840f
added changes made in this repo to support TensorFlow 2.18
seyedrezamirkhani Dec 2, 2024
ab96c0e
Added TLDR to explain how the kaggle notebooks work
seyedrezamirkhani Dec 11, 2024
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.ipynb_checkpoints
__pycache__

notebooks/kdd_cup_98/tmp/
notebooks/kaggle_acquire_valued_shoppers_challenge/tmp/
80 changes: 77 additions & 3 deletions README.md
@@ -1,3 +1,77 @@
# Updating the project to work with TensorFlow 2.18

This repo is a fork of Google's [Customer Lifetime Value](https://github.com/google/lifetime_value) project. A pull request for these changes has been created. The changes in this repo enable the use of TensorFlow 2.18.

The code has been tested on Ubuntu 24 with Python 3.12 and an NVIDIA RTX A5000 graphics card.

## List of changes

- updated the package name from sklearn to scikit-learn in Setup.py

- updated the notebooks in the notebooks folder to:

  - use a DATA_FOLDER variable for the location of input and output files.

  - replace `%%script` blocks with `%%bash`, since `%%script` is no longer supported.

  - add an extra dimension to y_train and y_eval in the *fit* call to make them
    2-dimensional arrays. Without this, the zlin loss (ltv.zero_inflated_lognormal_loss)
    fails, because it checks that the target variable is two-dimensional.

  - remove the quote characters around the company variable in calls to the pandas *query*
    function in the notebooks of the *kaggle_acquire_valued_shoppers_challenge* folder. Quoting may
    have worked in previous versions of pandas, but it now silently returns an empty dataframe when
    a string is used as a query value against a numeric column.

  - replace references to LinearModel with a Sequential linear model, as this class is no longer supported.

  - move the numeric input field in kdd_cup_98/regression.ipynb to the last parameter.
    Due to the shape of this parameter (21,) and the presence of other features,
    TensorFlow throws an error during the call to the *fit* method if this parameter is not last.

- added environment.yml to record the packages, including the NVIDIA libraries, of the conda environment used to build this project.

- added a requirements.txt file.
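
The pandas *query* pitfall with the company variable is easy to reproduce. A minimal sketch — the company ids and amounts below are invented for illustration:

```python
import pandas as pd

# Toy frame with a numeric company column; the ids are made up.
df = pd.DataFrame({"company": [104460040, 104460040, 10000],
                   "amount": [1.0, 2.0, 3.0]})

# Quoted value against a numeric column: no error, just an empty result.
empty = df.query("company == '104460040'")

# Unquoted numeric literal: matches as intended.
matched = df.query("company == 104460040")
```

Because an equality comparison between an int64 column and a string is elementwise False rather than an error, the bug only surfaces downstream as empty feature files.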

## TLDR

### kaggle acquire valued shoppers challenge

There are three notebooks in the folder *notebooks/kaggle_acquire_valued_shoppers_challenge*:

- **preprocess_data.ipynb** contains the code for processing the raw transaction file
  into company-specific feature files. This code is repeated in the other two
  notebooks. The preprocessing involves:

  - Filtering the records. Only transactions with positive values are selected; this excludes
    all returns, which have negative values, so neither the label nor the calibration value reflects returns.

  - Generating the calibration value: the total purchase amount on the customer's first day of shopping.

  - Generating the calibration attributes: the 'chain', 'dept', 'category', 'brand' and
    'productmeasure' values of the most expensive transaction on the first day of shopping.
    All other transactions are ignored, and any null values for these attributes are replaced by UNKNOWN.

  - Generating the label/holdout value: the total amount purchased by the customer over one year.
The zero_inflated_lognormal_loss function, used by both the regression and classification notebooks, requires three inputs, which are produced as the three output nodes of these models.

- **regression.ipynb**

To predict using the regression model, first call the model's *predict* function, then pass all three output node values to the *zero_inflated_lognormal_pred* function, e.g.

```
logits = model.predict(x=x_eval, batch_size=1024)
y_pred = ltv.zero_inflated_lognormal_pred(logits).numpy().flatten()
```
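
Under the hood, *zero_inflated_lognormal_pred* combines the three logits into an expected value: a purchase probability from the first logit multiplied by the mean of a lognormal parameterised by the other two, with the scale passed through softplus. A rough scalar sketch of that formula, assuming this reading of the library; `ziln_pred` is a hypothetical helper, not part of the package:

```python
import math

def softplus(x):
    # Numerically naive softplus; fine for a sketch.
    return math.log1p(math.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ziln_pred(p_logit, loc, scale_logit):
    """Expected LTV: sigmoid(p_logit) * exp(loc + softplus(scale_logit)**2 / 2)."""
    scale = softplus(scale_logit)
    return sigmoid(p_logit) * math.exp(loc + 0.5 * scale * scale)
```

A large first logit pushes the prediction toward the full lognormal mean; a very negative one pushes it toward zero, which is what lets a single head model both the purchase decision and the spend amount.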

- **classification.ipynb**

To predict using the classification model, first call the model's *predict* function, then apply the sigmoid function to the first of the three output node values, e.g.

```
logits = model.predict(x=x_eval, batch_size=1024)
y_pred = K.sigmoid(logits[..., :1]).numpy().flatten()
```

# Lifetime Value

Accurate predictions of customers’ lifetime value (LTV) given their attributes
Expand Down Expand Up @@ -32,20 +106,20 @@ A Deep Probabilistic Model for Customer Lifetime Value Prediction.
The easiest way is probably using pip:

```
pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value
```

If you are using a machine without admin rights, you can do:

```
pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value --user
```

If you are using [Google Colab](https://colab.research.google.com/), just add
"!" to the beginning:

```
!pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value
```

Package works for python 3 only.
184 changes: 184 additions & 0 deletions environment.yml
@@ -0,0 +1,184 @@
name: clv-google
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h5eee18b_6
- ca-certificates=2024.9.24=h06a4308_0
- expat=2.6.3=h6a678d5_0
- ld_impl_linux-64=2.40=h12ee557_0
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.15=h5eee18b_0
- pip=24.2=py312h06a4308_0
- python=3.12.7=h5148396_0
- readline=8.2=h5eee18b_0
- setuptools=75.1.0=py312h06a4308_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.44.0=py312h06a4308_0
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- pip:
- absl-py==2.1.0
- anyio==4.6.2.post1
- argon2-cffi==23.1.0
- argon2-cffi-bindings==21.2.0
- arrow==1.3.0
- asttokens==2.4.1
- astunparse==1.6.3
- async-lru==2.0.4
- attrs==24.2.0
- babel==2.16.0
- beautifulsoup4==4.12.3
- bleach==6.2.0
- certifi==2024.8.30
- cffi==1.17.1
- charset-normalizer==3.4.0
- cloudpickle==3.1.0
- comm==0.2.2
- contourpy==1.3.1
- cycler==0.12.1
- debugpy==1.8.8
- decorator==5.1.1
- defusedxml==0.7.1
- dm-tree==0.1.8
- executing==2.1.0
- fastjsonschema==2.20.0
- flatbuffers==24.3.25
- fonttools==4.55.0
- fqdn==1.5.1
- gast==0.6.0
- google-pasta==0.2.0
- grpcio==1.68.0
- h11==0.14.0
- h5py==3.12.1
- httpcore==1.0.7
- httpx==0.27.2
- idna==3.10
- ipykernel==6.29.5
- ipython==8.29.0
- ipywidgets==8.1.5
- isoduration==20.11.0
- jedi==0.19.2
- jinja2==3.1.4
- joblib==1.4.2
- json5==0.9.28
- jsonpointer==3.0.0
- jsonschema==4.23.0
- jsonschema-specifications==2024.10.1
- jupyter==1.1.1
- jupyter-client==8.6.3
- jupyter-console==6.6.3
- jupyter-core==5.7.2
- jupyter-events==0.10.0
- jupyter-lsp==2.2.5
- jupyter-server==2.14.2
- jupyter-server-terminals==0.5.3
- jupyterlab==4.2.6
- jupyterlab-pygments==0.3.0
- jupyterlab-server==2.27.3
- jupyterlab-widgets==3.0.13
- kaggle==1.6.17
- keras==3.6.0
- kiwisolver==1.4.7
- libclang==18.1.1
- lifetime-value==0.1
- markdown==3.7
- markdown-it-py==3.0.0
- markupsafe==3.0.2
- matplotlib==3.9.2
- matplotlib-inline==0.1.7
- mdurl==0.1.2
- mistune==3.0.2
- ml-dtypes==0.4.1
- namex==0.0.8
- nbclient==0.10.0
- nbconvert==7.16.4
- nbformat==5.10.4
- nest-asyncio==1.6.0
- notebook==7.2.2
- notebook-shim==0.2.4
- numpy==2.0.2
- nvidia-cublas-cu12==12.5.3.2
- nvidia-cuda-cupti-cu12==12.5.82
- nvidia-cuda-nvcc-cu12==12.5.82
- nvidia-cuda-nvrtc-cu12==12.5.82
- nvidia-cuda-runtime-cu12==12.5.82
- nvidia-cudnn-cu12==9.3.0.75
- nvidia-cufft-cu12==11.2.3.61
- nvidia-curand-cu12==10.3.6.82
- nvidia-cusolver-cu12==11.6.3.83
- nvidia-cusparse-cu12==12.5.1.3
- nvidia-nccl-cu12==2.21.5
- nvidia-nvjitlink-cu12==12.5.82
- opt-einsum==3.4.0
- optree==0.13.1
- overrides==7.7.0
- packaging==24.2
- pandas==2.2.3
- pandocfilters==1.5.1
- parso==0.8.4
- pexpect==4.9.0
- pillow==11.0.0
- platformdirs==4.3.6
- prometheus-client==0.21.0
- prompt-toolkit==3.0.48
- protobuf==5.28.3
- psutil==6.1.0
- ptyprocess==0.7.0
- pure-eval==0.2.3
- pycparser==2.22
- pydot==3.0.2
- pygments==2.18.0
- pyparsing==3.2.0
- python-dateutil==2.9.0.post0
- python-json-logger==2.0.7
- python-slugify==8.0.4
- pytz==2024.2
- pyyaml==6.0.2
- pyzmq==26.2.0
- referencing==0.35.1
- requests==2.32.3
- rfc3339-validator==0.1.4
- rfc3986-validator==0.1.1
- rich==13.9.4
- rpds-py==0.21.0
- scikit-learn==1.5.2
- scipy==1.14.1
- seaborn==0.13.2
- send2trash==1.8.3
- six==1.16.0
- sniffio==1.3.1
- soupsieve==2.6
- stack-data==0.6.3
- tensorboard==2.18.0
- tensorboard-data-server==0.7.2
- tensorflow==2.18.0
- tensorflow-probability==0.25.0
- termcolor==2.5.0
- terminado==0.18.1
- text-unidecode==1.3
- tf-keras==2.18.0
- threadpoolctl==3.5.0
- tinycss2==1.4.0
- tornado==6.4.1
- tqdm==4.67.0
- traitlets==5.14.3
- types-python-dateutil==2.9.0.20241003
- typing-extensions==4.12.2
- tzdata==2024.2
- uri-template==1.3.0
- urllib3==2.2.3
- wcwidth==0.2.13
- webcolors==24.11.1
- webencodings==0.5.1
- websocket-client==1.8.0
- werkzeug==3.1.3
- widgetsnbextension==4.0.13
- wrapt==1.16.0