Updated notebook to run on TensorFlow 2.18.0 #19

Open · wants to merge 45 commits into master

Commits
5a4feaa
updated package name from sklearn to scikit-learn
seyedrezamirkhani Nov 20, 2024
7c89888
working by splitting the numeric features
seyedrezamirkhani Nov 21, 2024
90406cf
all the code working
seyedrezamirkhani Nov 22, 2024
1bc2355
initial commit
seyedrezamirkhani Nov 22, 2024
ad1224f
TODO: merge input
seyedrezamirkhani Nov 22, 2024
8adbd96
using only numeric features as independent inputs
seyedrezamirkhani Nov 23, 2024
d3061a1
using only numeric features as merged input
seyedrezamirkhani Nov 23, 2024
d29d073
initial commit; only works with mse loss, need to fix the issue with …
seyedrezamirkhani Nov 23, 2024
21d3999
working with initial model function by changing the order in which th…
seyedrezamirkhani Nov 23, 2024
bb240ec
working with dnn_split returning dict instead of lists, like the orig…
seyedrezamirkhani Nov 23, 2024
3969ae2
removed unused cells
seyedrezamirkhani Nov 24, 2024
01454c3
made to work with zero_inflated_lognormal_loss (zlin). the problem wa…
seyedrezamirkhani Nov 24, 2024
16752ba
updated with code from regression_merged
seyedrezamirkhani Nov 25, 2024
20fa9b7
updated bash block to 1- Used DATA_FOLDER variable. 2- only download …
seyedrezamirkhani Nov 26, 2024
c1ef756
refactored code; added comments
seyedrezamirkhani Nov 26, 2024
380ce4c
initial commit
seyedrezamirkhani Nov 27, 2024
154ee59
added kdd_cup_98 tmp/result/image files
seyedrezamirkhani Nov 27, 2024
c93750f
initial commit
seyedrezamirkhani Nov 27, 2024
80a107b
removed commented cells
seyedrezamirkhani Nov 27, 2024
256b746
removed unused paths
seyedrezamirkhani Nov 27, 2024
b9389b3
unified paths
seyedrezamirkhani Nov 27, 2024
b186363
removed unused files
seyedrezamirkhani Nov 27, 2024
2e81949
initial commit
seyedrezamirkhani Nov 27, 2024
4a4917e
removing extra files
seyedrezamirkhani Nov 27, 2024
c312fa8
prepared for pull request
seyedrezamirkhani Nov 27, 2024
a5a0e68
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
83fcd01
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
acf6291
tidy up for merge request
seyedrezamirkhani Nov 27, 2024
37cae76
using my own repo
seyedrezamirkhani Nov 27, 2024
f29d927
made DATA_FOLDER variable to reduce hard-coding of paths; changed %%s…
seyedrezamirkhani Nov 29, 2024
2600ff3
fixed error in company query
seyedrezamirkhani Nov 29, 2024
4a78a2e
updated to work the TF 2.18; fixed error in query for company
seyedrezamirkhani Nov 29, 2024
0c92da4
removed commented code for Embedding
seyedrezamirkhani Nov 29, 2024
6e20750
with output
seyedrezamirkhani Nov 29, 2024
7aa53df
added kaggle_acquire_valued_shoppers_challenge/tmp/
seyedrezamirkhani Dec 1, 2024
7a9d740
updated linear_model function to build a sequential model now that th…
seyedrezamirkhani Dec 1, 2024
5ab43d5
saved with zlin and dnn parameters without any output
seyedrezamirkhani Dec 1, 2024
d0b162a
updated to work with TF 2.18
seyedrezamirkhani Dec 1, 2024
0695f39
updated to work with TF 2.18
seyedrezamirkhani Dec 1, 2024
63a6f83
cleared notebook
seyedrezamirkhani Dec 1, 2024
d184b95
corrected path for notebooks/kaggle_acquire_valued_shoppers_challenge…
seyedrezamirkhani Dec 1, 2024
c0e72d3
update pip install url for the package in this project
seyedrezamirkhani Dec 2, 2024
9f6624d
initial commit
seyedrezamirkhani Dec 2, 2024
a5b840f
added changes made in this repo to support TensorFlow 2.18
seyedrezamirkhani Dec 2, 2024
ab96c0e
Added TLDR to explain how the kaggle notebooks work
seyedrezamirkhani Dec 11, 2024
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
.ipynb_checkpoints
__pycache__

notebooks/kdd_cup_98/tmp/
notebooks/kaggle_acquire_valued_shoppers_challenge/tmp/
80 changes: 77 additions & 3 deletions README.md
@@ -1,3 +1,77 @@
# Updating the project to work with TensorFlow 2.18

This repo is a fork of Google's [Customer Lifetime Value](https://github.com/google/lifetime_value) project. A pull request for these changes has been created. The changes in this repo enable the use of TensorFlow 2.18.

The code has been tested on Ubuntu 24 with Python 3.12 and an NVIDIA RTX A5000 graphics card.

## List of changes

- updated the package name from sklearn to scikit-learn in Setup.py

- updated the notebooks in the notebooks folder to:

  - use a DATA_FOLDER variable for the location of input and output files.

  - replace `%%script` blocks with `%%bash`, since `%%script` is no longer supported.

  - add an extra dimension to y_train and y_eval in the *fit* call to make them
    2-dimensional arrays. Without this, the zlin loss (ltv.zero_inflated_lognormal_loss)
    fails, because it checks that the target variable is two-dimensional.

  - remove the quote characters around the company variable in calls to the pandas *query*
    function in the notebooks of the *kaggle_acquire_valued_shoppers_challenge* folder. Quoting may
    have worked in previous versions of pandas, but it now silently returns an empty dataframe when
    a string is used as a query value against a numeric column.

  - replace references to LinearModel with a Sequential linear model, as this class is no longer supported.

  - move the numeric input field in kdd_cup_98/regression.ipynb to the last parameter.
    Due to the shape of this parameter (21,) and the presence of other features,
    TensorFlow throws an error during the call to the *fit* method if this parameter is not last.

- added environment.yml to record the packages, including the NVIDIA libraries, of the conda environment used to build this project.

- added a requirements.txt file.
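
The pandas *query* pitfall with the company variable is easy to reproduce. A minimal sketch — the company ids and amounts below are invented for illustration:

```python
import pandas as pd

# Toy frame with a numeric company column; the ids are made up.
df = pd.DataFrame({"company": [104460040, 104460040, 10000],
                   "amount": [1.0, 2.0, 3.0]})

# Quoted value against a numeric column: no error, just an empty result.
empty = df.query("company == '104460040'")

# Unquoted numeric literal: matches as intended.
matched = df.query("company == 104460040")
```

Because an equality comparison between an int64 column and a string is elementwise False rather than an error, the bug only surfaces downstream as empty feature files.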

## TLDR

### kaggle acquire valued shoppers challenge

There are three notebooks in the folder *notebooks/kaggle_acquire_valued_shoppers_challenge*:

- **preprocess_data.ipynb** contains the code for processing the raw transaction file
  into company-specific feature files. This code is repeated in the other two
  notebooks. The preprocessing involves:

  - Filtering the records. Only transactions with positive values are selected; this excludes
    all returns, which have negative values, so neither the label nor the calibration value reflects returns.

  - Generating the calibration value: the total purchase amount on the customer's first day of shopping.

  - Generating the calibration attributes: the 'chain', 'dept', 'category', 'brand' and
    'productmeasure' values of the most expensive transaction on the first day of shopping.
    All other transactions are ignored, and any null values for these attributes are replaced by UNKNOWN.

  - Generating the label/holdout value: the total amount purchased by the customer over one year.
The zero_inflated_lognormal_loss function, used by both the regression and classification notebooks, requires three inputs, which are produced as the three output nodes of these models.

- **regression.ipynb**

To predict using the regression model, first call the model's *predict* function, then pass all three output node values to the *zero_inflated_lognormal_pred* function, e.g.

```
logits = model.predict(x=x_eval, batch_size=1024)
y_pred = ltv.zero_inflated_lognormal_pred(logits).numpy().flatten()
```
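
Under the hood, *zero_inflated_lognormal_pred* combines the three logits into an expected value: a purchase probability from the first logit multiplied by the mean of a lognormal parameterised by the other two, with the scale passed through softplus. A rough scalar sketch of that formula, assuming this reading of the library; `ziln_pred` is a hypothetical helper, not part of the package:

```python
import math

def softplus(x):
    # Numerically naive softplus; fine for a sketch.
    return math.log1p(math.exp(x))

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def ziln_pred(p_logit, loc, scale_logit):
    """Expected LTV: sigmoid(p_logit) * exp(loc + softplus(scale_logit)**2 / 2)."""
    scale = softplus(scale_logit)
    return sigmoid(p_logit) * math.exp(loc + 0.5 * scale * scale)
```

A large first logit pushes the prediction toward the full lognormal mean; a very negative one pushes it toward zero, which is what lets a single head model both the purchase decision and the spend amount.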

- **classification.ipynb**

To predict using the classification model, first call the model's *predict* function, then apply the sigmoid function to the first of the three output node values, e.g.

```
logits = model.predict(x=x_eval, batch_size=1024)
y_pred = K.sigmoid(logits[..., :1]).numpy().flatten()
```

# Lifetime Value

Accurate predictions of customers’ lifetime value (LTV) given their attributes
Expand Down Expand Up @@ -32,20 +106,20 @@ A Deep Probabilistic Model for Customer Lifetime Value Prediction.
The easiest way is probably using pip:

```
pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value
```

If you are using a machine without admin rights, you can do:

```
pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value --user
```

If you are using [Google Colab](https://colab.research.google.com/), just add
"!" to the beginning:

```
!pip install -q git+https://github.com/seyedrezamirkhani/lifetime_value
```

Package works for python 3 only.
184 changes: 184 additions & 0 deletions environment.yml
@@ -0,0 +1,184 @@
name: clv-google
channels:
- defaults
dependencies:
- _libgcc_mutex=0.1=main
- _openmp_mutex=5.1=1_gnu
- bzip2=1.0.8=h5eee18b_6
- ca-certificates=2024.9.24=h06a4308_0
- expat=2.6.3=h6a678d5_0
- ld_impl_linux-64=2.40=h12ee557_0
- libffi=3.4.4=h6a678d5_1
- libgcc-ng=11.2.0=h1234567_1
- libgomp=11.2.0=h1234567_1
- libstdcxx-ng=11.2.0=h1234567_1
- libuuid=1.41.5=h5eee18b_0
- ncurses=6.4=h6a678d5_0
- openssl=3.0.15=h5eee18b_0
- pip=24.2=py312h06a4308_0
- python=3.12.7=h5148396_0
- readline=8.2=h5eee18b_0
- setuptools=75.1.0=py312h06a4308_0
- sqlite=3.45.3=h5eee18b_0
- tk=8.6.14=h39e8969_0
- wheel=0.44.0=py312h06a4308_0
- xz=5.4.6=h5eee18b_1
- zlib=1.2.13=h5eee18b_1
- pip:
- absl-py==2.1.0
- anyio==4.6.2.post1
- argon2-cffi==23.1.0
- argon2-cffi-bindings==21.2.0
- arrow==1.3.0
- asttokens==2.4.1
- astunparse==1.6.3
- async-lru==2.0.4
- attrs==24.2.0
- babel==2.16.0
- beautifulsoup4==4.12.3
- bleach==6.2.0
- certifi==2024.8.30
- cffi==1.17.1
- charset-normalizer==3.4.0
- cloudpickle==3.1.0
- comm==0.2.2
- contourpy==1.3.1
- cycler==0.12.1
- debugpy==1.8.8
- decorator==5.1.1
- defusedxml==0.7.1
- dm-tree==0.1.8
- executing==2.1.0
- fastjsonschema==2.20.0
- flatbuffers==24.3.25
- fonttools==4.55.0
- fqdn==1.5.1
- gast==0.6.0
- google-pasta==0.2.0
- grpcio==1.68.0
- h11==0.14.0
- h5py==3.12.1
- httpcore==1.0.7
- httpx==0.27.2
- idna==3.10
- ipykernel==6.29.5
- ipython==8.29.0
- ipywidgets==8.1.5
- isoduration==20.11.0
- jedi==0.19.2
- jinja2==3.1.4
- joblib==1.4.2
- json5==0.9.28
- jsonpointer==3.0.0
- jsonschema==4.23.0
- jsonschema-specifications==2024.10.1
- jupyter==1.1.1
- jupyter-client==8.6.3
- jupyter-console==6.6.3
- jupyter-core==5.7.2
- jupyter-events==0.10.0
- jupyter-lsp==2.2.5
- jupyter-server==2.14.2
- jupyter-server-terminals==0.5.3
- jupyterlab==4.2.6
- jupyterlab-pygments==0.3.0
- jupyterlab-server==2.27.3
- jupyterlab-widgets==3.0.13
- kaggle==1.6.17
- keras==3.6.0
- kiwisolver==1.4.7
- libclang==18.1.1
- lifetime-value==0.1
- markdown==3.7
- markdown-it-py==3.0.0
- markupsafe==3.0.2
- matplotlib==3.9.2
- matplotlib-inline==0.1.7
- mdurl==0.1.2
- mistune==3.0.2
- ml-dtypes==0.4.1
- namex==0.0.8
- nbclient==0.10.0
- nbconvert==7.16.4
- nbformat==5.10.4
- nest-asyncio==1.6.0
- notebook==7.2.2
- notebook-shim==0.2.4
- numpy==2.0.2
- nvidia-cublas-cu12==12.5.3.2
- nvidia-cuda-cupti-cu12==12.5.82
- nvidia-cuda-nvcc-cu12==12.5.82
- nvidia-cuda-nvrtc-cu12==12.5.82
- nvidia-cuda-runtime-cu12==12.5.82
- nvidia-cudnn-cu12==9.3.0.75
- nvidia-cufft-cu12==11.2.3.61
- nvidia-curand-cu12==10.3.6.82
- nvidia-cusolver-cu12==11.6.3.83
- nvidia-cusparse-cu12==12.5.1.3
- nvidia-nccl-cu12==2.21.5
- nvidia-nvjitlink-cu12==12.5.82
- opt-einsum==3.4.0
- optree==0.13.1
- overrides==7.7.0
- packaging==24.2
- pandas==2.2.3
- pandocfilters==1.5.1
- parso==0.8.4
- pexpect==4.9.0
- pillow==11.0.0
- platformdirs==4.3.6
- prometheus-client==0.21.0
- prompt-toolkit==3.0.48
- protobuf==5.28.3
- psutil==6.1.0
- ptyprocess==0.7.0
- pure-eval==0.2.3
- pycparser==2.22
- pydot==3.0.2
- pygments==2.18.0
- pyparsing==3.2.0
- python-dateutil==2.9.0.post0
- python-json-logger==2.0.7
- python-slugify==8.0.4
- pytz==2024.2
- pyyaml==6.0.2
- pyzmq==26.2.0
- referencing==0.35.1
- requests==2.32.3
- rfc3339-validator==0.1.4
- rfc3986-validator==0.1.1
- rich==13.9.4
- rpds-py==0.21.0
- scikit-learn==1.5.2
- scipy==1.14.1
- seaborn==0.13.2
- send2trash==1.8.3
- six==1.16.0
- sniffio==1.3.1
- soupsieve==2.6
- stack-data==0.6.3
- tensorboard==2.18.0
- tensorboard-data-server==0.7.2
- tensorflow==2.18.0
- tensorflow-probability==0.25.0
- termcolor==2.5.0
- terminado==0.18.1
- text-unidecode==1.3
- tf-keras==2.18.0
- threadpoolctl==3.5.0
- tinycss2==1.4.0
- tornado==6.4.1
- tqdm==4.67.0
- traitlets==5.14.3
- types-python-dateutil==2.9.0.20241003
- typing-extensions==4.12.2
- tzdata==2024.2
- uri-template==1.3.0
- urllib3==2.2.3
- wcwidth==0.2.13
- webcolors==24.11.1
- webencodings==0.5.1
- websocket-client==1.8.0
- werkzeug==3.1.3
- widgetsnbextension==4.0.13
- wrapt==1.16.0