
all parameters

bugface edited this page Aug 2, 2021 · 2 revisions

"model_type": "deberta" => what type of transformer architecture you will use e.g., bert, roberta, xlnet "data_format_mode": 0, => 0 is for sep mode - [CLS]S1[SEP]S2[SEP]; 1 is for uni mode - [CLS]S1S2[SEP], we recommend 0 "classification_scheme": 2, => which tokens will be used for classification, 0 will only use [CLS]; 1 will use [CLS], [S1], and [S2]; 2 will use [CLS], [S1], [S2], [E1], [E2]; 3 will use [S1], [S2] "pretrained_model": "microsoft/deberta-base", => actual model pretrained weights, you can use models from huggingface repo or our mimic pretrained models "data_dir": "../sample_data", => the directory for you data, should have train.tsv, test.tsv, dev.tsv (optional) "new_model_dir": "../deberta_re_model", => where to save your fined-tuned checkpoints "predict_output_file": "../deberta_re_predict.txt", "overwrite_model_dir": true, "seed": 1234, "max_seq_length": 128, "cache_data": false, "data_file_header": true, "do_train": true, "do_eval": false, => if set do_eval, you need to provide dev.tsv, and model selection will be based on performances on dev.tsv "do_predict": true, "do_lower_case": true, "train_batch_size": 2, "eval_batch_size": 32, "learning_rate": 1e-05, "num_train_epochs": 5, "gradient_accumulation_steps": 1, "do_warmup": true, "warmup_ratio": 0.1, "weight_decay": 0.0, "adam_epsilon": 1e-08, "max_grad_norm": 1.0, "max_num_checkpoints": 0, => the max number of checkpoints can be saved, if more than the max number, the oldest will be deleted "log_file": null, => where to save the log information, if none print loggings to console only "log_lvl": "I", => log level; i for info, w for warn, e for error, d for debug "log_step": 2, "num_core": 4, => how many CPU cores will be used for data processing (tokenization and covert to transformer compatible codes) "non_relation_label": "nonRel", "progress_bar": false, "fp16": false, "fp16_opt_level": "O1", "use_focal_loss": false, => whether use focal loss function, default loss function is cross entropy 
"focal_loss_gamma": 2, "use_binary_classification_mode": false, => where use binary classification loss function, if yes, your labels must have only two categories "balance_sample_weights": false => we will add the sample weights into the loss function
