Click-through rate (CTR) prediction is one of the fundamental tasks for online advertising and recommendation. Inspired by factorization machines, in this paper, we propose FINAL, a factorized interaction layer that extends the widely-used linear layer and is capable of learning quadratic feature interactions. Similar to MLPs, multiple FINAL layers can be stacked into a FINAL block, yielding feature interactions with an exponential degree growth. We unify feature interactions and MLPs into a single FINAL block and empirically show its effectiveness as a replacement for the MLP block.
Jieming Zhu, Qinglin Jia, Guohao Cai, Quanyu Dai, Jingjie Li, Zhenhua Dong, Ruiming Tang, Rui Zhang. FINAL: Factorized Interaction Layer for CTR Prediction. In SIGIR 2023.
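In brief, a factorized interaction layer replaces the linear map $y = Wx$ with an elementwise (Hadamard) product of two linear projections. As a rough sketch in our own notation (see the paper for the exact formulation):

$$
y = (W_1 x + b_1) \odot (W_2 x + b_2),
$$

so every coordinate of $y$ is a degree-2 polynomial in $x$; composing $L$ such layers yields feature interactions of degree up to $2^L$, which is the exponential degree growth mentioned above.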
Please install the requirements as follows:

```bash
pip install -r requirements.txt
```
The `dataset_config.yaml` file contains all the dataset settings as follows (an illustrative example is given after the table).
Params | Type | Default | Description |
---|---|---|---|
data_root | str | | the root directory to load and save data |
data_format | str | | input data format; "h5", "csv", or "tfrecord" are supported |
train_data | str | None | training data path |
valid_data | str | None | validation data path |
test_data | str | None | test data path |
min_categr_count | int | 1 | min count threshold to filter rare categorical features |
feature_cols | list | | a list of feature definitions with the following dict keys |
feature_cols::name | str\|list | | feature column name in csv. A list is allowed, in which case the listed features share the same feature type and will be expanded accordingly. |
feature_cols::active | bool | | whether to use the feature |
feature_cols::dtype | str | | the input data dtype, "int"\|"str" |
feature_cols::type | str | | feature type: "numeric"\|"categorical"\|"sequence"\|"meta" |
label_col | dict | | specifies the label column |
label_col::name | str | | label column name in csv |
label_col::dtype | str | | label data dtype |
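For illustration, a minimal `dataset_config.yaml` entry might look like the sketch below. This is a hedged example rather than a shipped config: the dataset id `tiny_csv`, the file paths, and the feature names are hypothetical placeholders; only the keys and value types follow the table above.

```yaml
tiny_csv:                          # hypothetical dataset_id, referenced from model_config.yaml
    data_root: ./data/             # root directory to load and save data
    data_format: csv
    train_data: ./data/tiny_csv/train.csv
    valid_data: ./data/tiny_csv/valid.csv
    test_data: ./data/tiny_csv/test.csv
    min_categr_count: 1
    feature_cols:
        # a list of names sharing one definition: both columns are expanded as categorical features
        - {name: [user_id, item_id], active: true, dtype: str, type: categorical}
        - {name: click_history, active: true, dtype: str, type: sequence}
    label_col: {name: label, dtype: float}
```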
The `model_config.yaml` file contains all the model hyper-parameters as follows (an illustrative example is given after the table).
Params | Type | Default | Description |
---|---|---|---|
model | str | "FinalNet" | model name, which should be the same as the model class name |
dataset_id | str | "TBD" | dataset_id, which should match an entry defined in dataset_config.yaml ("TBD" means to be determined) |
loss | str | "binary_crossentropy" | loss function |
metrics | list | ['logloss', 'AUC'] | a list of metrics for evaluation |
task | str | "binary_classification" | task type; "regression"\|"binary_classification" are supported |
optimizer | str | "adam" | optimizer used for training |
learning_rate | float | 1.0e-3 | learning rate |
embedding_regularizer | float\|str | 0 | regularization weight for the embedding matrix; L2 regularization is applied by default. Other optional examples: "l2(1.e-3)", "l1(1.e-3)", "l1_l2(1.e-3, 1.e-3)". |
net_regularizer | float\|str | 0 | regularization weight for network parameters; L2 regularization is applied by default. Other optional examples: "l2(1.e-3)", "l1(1.e-3)", "l1_l2(1.e-3, 1.e-3)". |
batch_size | int | 10000 | batch size, usually a large number for CTR prediction task |
embedding_dim | int | 32 | embedding dimension of features. Note that field-wise embedding_dim can be specified in feature_specs . |
block1_hidden_units | list | [64, 64, 64] | hidden units of block1 |
block1_hidden_activations | str\|list | "relu" | activation function in block1. Layer-wise activations can be specified as a list, e.g., ["relu", "leakyrelu", "sigmoid"] |
block2_hidden_units | list | [64, 64, 64] | hidden units of block2 |
block2_hidden_activations | str\|list | "relu" | activation function in block2. Layer-wise activations can be specified as a list, e.g., ["relu", "leakyrelu", "sigmoid"] |
block1_dropout | float | 0 | dropout rate in block1 |
block2_dropout | float | 0 | dropout rate in block2 |
block_type | str | "2B" | block type: "1B" , "2B" |
batch_norm | bool | True | whether to apply batch normalization |
use_feature_gating | bool | True | whether to use feature gating |
residual_type | str | "concat" | residual type in FINAL: "concat" , "sum" |
epochs | int | 100 | the max number of training epochs; training may stop earlier via the monitor metric |
shuffle | bool | True | whether to shuffle the data samples at each training epoch |
seed | int | 2021 | the random seed used for reproducibility |
monitor | str\|dict | 'AUC' | the monitor metric for early stopping. A single metric is supported, e.g., "AUC". Multiple metrics can be combined via a dict, e.g., {"AUC": 2, "logloss": -1} means 2*AUC - logloss. |
monitor_mode | str | 'max' | "max" means the higher the better, while "min" means the lower the better |
model_root | str | './checkpoints/' | the dir to save model checkpoints and running logs |
early_stop_patience | int | 2 | training stops when the monitor metric fails to improve for early_stop_patience (=2) consecutive evaluation intervals |
save_best_only | bool | True | whether to save the best model checkpoint only |
eval_steps | int\|None | None | evaluate the model on the validation data every eval_steps training steps. By default, None means evaluation once per epoch. |
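Putting the table together, a model config entry for FinalNet could look like the following sketch. The experiment id `FinalNet_test` and the `dataset_id` value are hypothetical placeholders; the keys and example values are taken from the table above.

```yaml
FinalNet_test:                        # hypothetical experiment id
    model: FinalNet
    dataset_id: tiny_csv              # should match an entry in dataset_config.yaml
    loss: binary_crossentropy
    metrics: [logloss, AUC]
    task: binary_classification
    optimizer: adam
    learning_rate: 1.0e-3
    embedding_regularizer: 0          # or e.g. "l2(1.e-5)"
    net_regularizer: 0
    batch_size: 10000
    embedding_dim: 32
    block_type: 2B                    # "1B" uses a single FINAL block
    block1_hidden_units: [64, 64, 64]
    block1_hidden_activations: relu
    block1_dropout: 0
    block2_hidden_units: [64, 64, 64]
    block2_hidden_activations: relu
    block2_dropout: 0
    batch_norm: True
    use_feature_gating: True
    residual_type: concat
    epochs: 100
    shuffle: True
    seed: 2021
    monitor: AUC                      # or {AUC: 2, logloss: -1} for 2*AUC - logloss
    monitor_mode: max
    model_root: ./checkpoints/
    early_stop_patience: 2
    save_best_only: True
    eval_steps: null                  # None: evaluate once per epoch
```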
The evaluation results in terms of AUC (%):
Model | Criteo | Avazu | MovieLens | Frappe |
---|---|---|---|---|
DCN | 81.39 | 76.47 | 96.87 | 98.39 |
AutoInt+ | 81.39 | 76.45 | 96.92 | 98.48 |
AFN+ | 81.43 | 76.48 | 96.42 | 98.26 |
MaskNet | 81.39 | 76.49 | 96.87 | 98.43 |
DCNv2 | 81.42 | 76.54 | 96.91 | 98.45 |
EDCN | 81.47 | 76.52 | 96.71 | 98.50 |
FinalMLP | 81.49 | 76.66 | 97.20 | 98.61 |
FinalNet_1B | 81.44 | 76.60 | 97.06 | 98.52 |
FinalNet_2B | 81.54 | 76.71 | 97.25 | 98.57 |
- Get the dataset in `data/Criteo/Criteo_x1` and then run the models:

  ```bash
  python run_expid.py --config ./config/FinalNet_1B_criteo_x1_tuner_config_04 --expid FinalNet_criteo_x1_007_b4783421 --gpu 0
  python run_expid.py --config ./config/FinalNet_2B_criteo_x1_tuner_config_01 --expid FinalNet_criteo_x1_028_b6c861e4 --gpu 0
  ```

  Note that we ran the experiments in the following environment: GPU: Tesla V100 32G, CUDA: 10.2, Python: 3.7.10, PyTorch: 1.11.0.

- See the running logs:
- Get the dataset in `data/Avazu/Avazu_x1` and then run the models:

  ```bash
  python run_expid.py --config ./config/FinalNet_1B_avazu_x1_tuner_config_01 --expid FinalNet_avazu_x1_030_3ad8933e --gpu 0
  python run_expid.py --config ./config/FinalNet_2B_avazu_x1_tuner_config_04 --expid FinalNet_avazu_x1_023_9d3d37bd --gpu 0
  ```

  Note that we ran the experiments in the following environment: GPU: Tesla V100 32G, CUDA: 10.2, Python: 3.7.10, PyTorch: 1.11.0.

- See the running logs:
- Get the dataset in `data/Movielens/MovielensLatest_x1` and then run the models:

  ```bash
  python run_expid.py --config ./config/FinalNet_1B_movielenslatest_x1_tuner_config_02 --expid FinalNet_movielenslatest_x1_008_f8e20ee7 --gpu 0
  python run_expid.py --config ./config/FinalNet_2B_movielenslatest_x1_tuner_config_03 --expid FinalNet_movielenslatest_x1_014_50fc5f7f --gpu 0
  ```

  Note that we ran the experiments in the following environment: GPU: Tesla P100 16G, CUDA: 10.0, Python: 3.6.5, PyTorch: 1.0.1.post2.

- See the running logs:
- Get the dataset in `data/Frappe/Frappe_x1` and then run the models:

  ```bash
  python run_expid.py --config ./config/FinalNet_1B_frappe_x1 --expid FinalNet_frappe_x1_006_75be0578 --gpu 0
  python run_expid.py --config ./config/FinalNet_2B_frappe_x1 --expid FinalNet_frappe_x1_003_c3722b71 --gpu 0
  ```

  Note that we ran the experiments in the following environment: GPU: Tesla P100 16G, CUDA: 10.0, Python: 3.6.5, PyTorch: 1.0.1.post2.

- See the running logs:
To reproduce our baselines, please refer to the BARS benchmark at https://github.com/reczoo/BARS/tree/main/ranking/ctr