Commit 1363448

1649759610 and tianxin authored
[SKEP P1] pass labels to forward function in examples/applications (PaddlePaddle#5067)
* initial commit
* refine readme
* refine codestyle
* refine readme
* refine readme
* fix model saving bug
* initial commit
* initial commit
* initial commit
* use common metric instead of eval_metrics.py and remove unuseful code
* mv stage project to ASO_analysis
* add unified sentiment analysis
* refine readme
* refine readme
* refnie readme
* add unified sentiment analysis
* refine readme
* initial commit
* initial commit
* refine readme
* add taskflow for sentiment analysis with UIE
* refine Readme
* refine readme.md
* support sentiment analysis (UIE) with inputing by file format
* refine readme
* delete predict scripts
* refine readme
* delete unuseful files
* add pipeline for sentiment_analysis
* merging with the newest code
* fix to convert data without synonyms
* add senta pipeline
* refine readme
* drop functions: inputting file and saving results
* add UIE-seta-[base, medium, mini, micro, nano]
* modify .gitignore to trace deploy code
* add deploy with SimpleServer
* add debug mode
* fix debug mode
* update the loading method of UIE
* refine readme
* fix bug caused by version updating
* fix hard coding for model name.
* refine codestyle
* modify readme according the way of 'step by step'
* refine codestyl
* change saving txt to json files
* download font automatically when not input font_path
* change readme in the way 'step by step'
* add model prediction by batch
* add uie-senta-x to support_schema_list
* update sentiment analysis in taskflow
* add prediction with saved offline model
* change the exception exposure way
* add description for visual schema
* delete comments
* remove comments
* remove unused code and comments
* convert uie-senta-x model params to fit ernie/uie
* refine readme for sentiment analysis
* add running time
* refine readme for senta pipeline
* change uie-base to uie-senta-base
* load uie-senta-x with auto module
* add deploy with SimpleServer
* refine codestyle
* refine readme
* add uie-senta-x to support_schema_list
* fix hard coding for mdoel anme
* refine codestyle
* refine codestyl
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* refine codestyle
* fix senta response
* refine codestyle
* remove lambda expressions
* add link of senta pipeline
* refine codestyle
* remove local path
* fix typos
* refine readme
* load uie-senta-x with automodel
* remove commented code
* restore auto
* add link of hotel dataset to readme.
* add link for downloading test_hotel.txt
* fix url problem for server and client
* refine readme
* fix for senta_examples.py
* update visualization function
* update visualization function
* refine readme and update visualization description
* update visualization function
* refine readme and update visualization function
* change logger in PaddleNLP to log information
* fix running time for skep and uie
* fix bug to solve tokenizer updating problem
* refine label-studio readme
* refine label-studio readme
* refine label-studio readme
* optimize example construction for a, o, as, ao extraction task
* add the labeling method for ext task: a, as, ao and so on.
* add note for visual_analysis.py
* change link for downloading data and refine log output
* refine log output
* refine readme
* expose options interface
* refine readme
* modify typos
* expose options for customing sentiment analysis
* README.md
* fix bug for param is_shuffle in label_studio.py
* [BugFix] Fix the param is_shuffle problem
* [BugFix] Fix the bool param is_shuffle problem
* [BugFix] Fix the bool param is_shuffle problem
* [Model Update] add configuration for skep model.
* [Transformer Update] update skep and add related unittest
* [Skep Update] fix examples and taskflow
* [Transformer Update] update skep examples for sentiment analysis
* [Transformer Update] update skep examples for sentiment analysis
* [Transformer Update] CodeStyle for examples/skep/*
* [Transformer Update] examples/skep done
* [Transformer Update] refine readme in examples/skep
* [Transformer Update] initial skep tests done.
* [Skep Upgrade] add skep in taskflow tests
* [Skep Upgrade] add tests for skep in taskflow
* [Skep Upgrade] remove yapf in examples/skep
* [Skep P0] remove print
* [SKEP P0] fix the param ckpt_dir in predict scripts
* [SKEP P0] fix the param ckpt_dir for skep examples.
* [SKEP P0] fix ci_case
* [SKEP P0] remove tiny_random_bert taskflow/test_sentiment_analysis
* [SKEP P0] add tiny-random-skep as code comment
* [SKEP P0] using tiny_random_skep for tests in case of OOM at test machine.
* [SKEP P0] add taskflow prediction for examples/skep/sentence
* [SKEP P0] add __internal_testing__/tiny-random-skep as taskflow model
* [SKEP P1] fix return parameters for SkepCrfForTokenClassification
* [SKEP P1] pass labels to forward for examples
* [SKEP P1] pass labels to forward for applications

---------

Co-authored-by: tianxin <[email protected]>
1 parent 3b6db41 commit 1363448

4 files changed: +7 -17 lines changed

applications/sentiment_analysis/ASO_analysis/classification/train.py

Lines changed: 1 addition & 3 deletions
@@ -20,7 +20,6 @@
 
 import numpy as np
 import paddle
-import paddle.nn.functional as F
 from data import convert_example_to_feature, load_dict
 from datasets import load_dataset
 from evaluate import evaluate
@@ -100,8 +99,7 @@ def train():
                 batch_data["token_type_ids"],
                 batch_data["labels"],
             )
-            logits = model(input_ids, token_type_ids=token_type_ids)
-            loss = F.cross_entropy(logits, labels)
+            loss, logits = model(input_ids, token_type_ids=token_type_ids, labels=labels)
 
             loss.backward()
             lr_scheduler.step()
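
The change above moves the loss computation from the training script into the model's forward call. A minimal sketch of that calling convention follows, in plain Python so it runs without Paddle. `ToyClassifier` and `cross_entropy` are hypothetical stand-ins, not the SKEP or Paddle API; the sketch only assumes, as the diff suggests, that forward returns logits alone when no labels are given and a `(loss, logits)` pair when they are.

```python
import math


def cross_entropy(logits, labels):
    """Mean softmax cross-entropy over a batch of logit rows."""
    total = 0.0
    for row, label in zip(logits, labels):
        # log-sum-exp with the row maximum subtracted for numerical stability
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[label]
    return total / len(labels)


class ToyClassifier:
    """Hypothetical stand-in for a sequence classifier; not the SKEP API."""

    def __init__(self, weights):
        self.weights = weights  # one weight vector per class

    def forward(self, features, labels=None):
        logits = [
            [sum(w * x for w, x in zip(class_weights, feature)) for class_weights in self.weights]
            for feature in features
        ]
        if labels is None:
            return logits
        # With labels, the loss is computed inside forward and returned first.
        return cross_entropy(logits, labels), logits


model = ToyClassifier(weights=[[1.0, 0.0], [0.0, 1.0]])
features = [[2.0, 0.5], [0.1, 1.5]]
labels = [0, 1]

# Old style: fetch logits, then compute the loss outside the model.
logits_old = model.forward(features)
loss_old = cross_entropy(logits_old, labels)

# New style: pass labels and unpack (loss, logits) from a single call.
loss_new, logits_new = model.forward(features, labels=labels)
```

In this toy setting both styles yield the same loss and logits; whether that holds for a real model depends on the loss it applies internally.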

applications/sentiment_analysis/ASO_analysis/extraction/train.py

Lines changed: 1 addition & 3 deletions
@@ -21,7 +21,6 @@
 
 import numpy as np
 import paddle
-import paddle.nn.functional as F
 from data import convert_example_to_feature, load_dict
 from datasets import load_dataset
 from evaluate import evaluate
@@ -100,8 +99,7 @@ def train():
                 batch_data["token_type_ids"],
                 batch_data["labels"],
             )
-            logits = model(input_ids, token_type_ids=token_type_ids)
-            loss = F.cross_entropy(logits.reshape([-1, len(label2id)]), labels.reshape([-1]), ignore_index=-1)
+            loss, logits = model(input_ids, token_type_ids=token_type_ids, labels=labels)
 
             loss.backward()
             lr_scheduler.step()
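
The removed line flattened the token-level logits and labels and skipped positions labelled `-1`. The sketch below illustrates that computation in plain Python. It assumes the mean is taken over non-ignored positions only, and `token_cross_entropy` is a hypothetical helper, not a Paddle function. Whether the model's internal loss treats ignored labels the same way is not shown in this diff.

```python
import math

IGNORE_INDEX = -1  # same sentinel value as ignore_index in the removed call


def token_cross_entropy(logits, labels, ignore_index=IGNORE_INDEX):
    """Mean cross-entropy over tokens, skipping positions labelled ignore_index.

    logits: nested lists shaped [batch][seq_len][num_labels]
    labels: nested lists shaped [batch][seq_len]
    """
    # Flatten batch and sequence axes, mirroring reshape([-1, num_labels]) and reshape([-1]).
    flat_logits = [token for sequence in logits for token in sequence]
    flat_labels = [label for sequence in labels for label in sequence]

    total, count = 0.0, 0
    for row, label in zip(flat_logits, flat_labels):
        if label == ignore_index:
            continue  # padded or otherwise unlabelled position
        peak = max(row)
        log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_z - row[label]
        count += 1
    return total / count if count else 0.0


# One sequence of three tokens; the middle token is ignored.
logits = [[[2.0, 0.0], [5.0, -5.0], [0.0, 1.0]]]
labels = [[0, -1, 1]]
loss = token_cross_entropy(logits, labels)
```

Changing the logits at the ignored position leaves `loss` unchanged, which is the property `ignore_index` is meant to provide.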

examples/sentiment_analysis/skep/train_aspect.py

Lines changed: 1 addition & 3 deletions
@@ -147,7 +147,6 @@ def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, tra
         weight_decay=args.weight_decay,
         apply_decay_param_fun=lambda x: x in decay_params,
     )
-    criterion = paddle.nn.loss.CrossEntropyLoss()
     metric = paddle.metric.Accuracy()
 
     global_step = 0
@@ -156,8 +155,7 @@ def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, tra
     for epoch in range(1, args.epochs + 1):
         for step, batch in enumerate(train_data_loader, start=1):
             input_ids, token_type_ids, labels = batch["input_ids"], batch["token_type_ids"], batch["labels"]
-            logits = model(input_ids, token_type_ids)
-            loss = criterion(logits, labels)
+            loss, logits = model(input_ids, token_type_ids, labels=labels)
             probs = F.softmax(logits, axis=1)
             correct = metric.compute(probs, labels)
             metric.update(correct)
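
The training loop still applies softmax to the returned logits before computing accuracy. Softmax preserves the ordering of scores within a row, so a top-1 accuracy gives the same result on logits as on probabilities. The sketch below checks this in plain Python; `softmax` and `top1_accuracy` are hypothetical helpers written for illustration, not the `paddle.metric.Accuracy` implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def top1_accuracy(scores, labels):
    """Fraction of rows whose highest-scoring class matches the label."""
    hits = sum(1 for row, label in zip(scores, labels) if row.index(max(row)) == label)
    return hits / len(labels)


logits = [[2.0, -1.0, 0.5], [0.1, 0.2, 3.0], [1.0, 1.5, -2.0]]
labels = [0, 2, 0]
probs = [softmax(row) for row in logits]

acc_from_logits = top1_accuracy(logits, labels)
acc_from_probs = top1_accuracy(probs, labels)
```

Here the first two rows are classified correctly and the third is not, so both accuracies come out to two thirds.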

examples/sentiment_analysis/skep/train_sentence.py

Lines changed: 4 additions & 8 deletions
@@ -68,13 +68,12 @@ def set_seed(seed):
 
 
 @paddle.no_grad()
-def evaluate(model, criterion, metric, data_loader):
+def evaluate(model, metric, data_loader):
     """
     Given a dataset, it evals model and computes the metric.
 
     Args:
         model(obj:`paddle.nn.Layer`): A model to classify texts.
-        criterion(obj:`paddle.nn.Layer`): It can compute the loss.
         metric(obj:`paddle.metric.Metric`): The evaluation metric.
         data_loader(obj:`paddle.io.DataLoader`): The dataset loader which generates batches.
     """
@@ -83,8 +82,7 @@ def evaluate(model, criterion, metric, data_loader):
     losses = []
     for batch in data_loader:
         input_ids, token_type_ids, labels = batch["input_ids"], batch["token_type_ids"], batch["labels"]
-        logits = model(input_ids, token_type_ids)
-        loss = criterion(logits, labels)
+        loss, logits = model(input_ids, token_type_ids, labels=labels)
         losses.append(loss.numpy())
         correct = metric.compute(logits, labels)
         metric.update(correct)
@@ -196,7 +194,6 @@ def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, tra
         weight_decay=args.weight_decay,
         apply_decay_param_fun=lambda x: x in decay_params,
     )
-    criterion = paddle.nn.loss.CrossEntropyLoss()
     metric = paddle.metric.Accuracy()
 
     # start to train model
@@ -206,8 +203,7 @@ def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, tra
     for epoch in range(1, args.epochs + 1):
         for step, batch in enumerate(train_data_loader, start=1):
             input_ids, token_type_ids, labels = batch["input_ids"], batch["token_type_ids"], batch["labels"]
-            logits = model(input_ids, token_type_ids)
-            loss = criterion(logits, labels)
+            loss, logits = model(input_ids, token_type_ids, labels=labels)
             probs = F.softmax(logits, axis=1)
             correct = metric.compute(probs, labels)
             metric.update(correct)
@@ -227,7 +223,7 @@ def create_dataloader(dataset, mode="train", batch_size=1, batchify_fn=None, tra
                 save_dir = os.path.join(args.save_dir, "model_%d" % global_step)
                 if not os.path.exists(save_dir):
                     os.makedirs(save_dir)
-                evaluate(model, criterion, metric, dev_data_loader)
+                evaluate(model, metric, dev_data_loader)
                 # Need better way to get inner model of DataParallel
                 model._layers.save_pretrained(save_dir)
                 tokenizer.save_pretrained(save_dir)
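
With the loss returned by the model, `evaluate` no longer needs a `criterion` argument. The sketch below shows the shape of such an evaluation loop in plain Python. `ToyModel` and `ToyAccuracy` are hypothetical stand-ins, not Paddle classes, and the batches are plain tuples rather than the dictionaries used in the script.

```python
import math


class ToyModel:
    """Hypothetical model: returns logits, or (loss, logits) when labels are given."""

    def __call__(self, inputs, labels=None):
        logits = [[x, -x] for x in inputs]  # two-class scores from one scalar feature
        if labels is None:
            return logits
        losses = []
        for row, label in zip(logits, labels):
            peak = max(row)
            log_z = peak + math.log(sum(math.exp(v - peak) for v in row))
            losses.append(log_z - row[label])
        return sum(losses) / len(losses), logits


class ToyAccuracy:
    """Hypothetical running top-1 accuracy, loosely shaped like a metric object."""

    def __init__(self):
        self.correct = 0
        self.total = 0

    def update(self, logits, labels):
        for row, label in zip(logits, labels):
            self.correct += int(row.index(max(row)) == label)
            self.total += 1

    def accumulate(self):
        return self.correct / self.total


def evaluate(model, metric, data_loader):
    """Evaluation loop with no separate criterion argument."""
    losses = []
    for inputs, labels in data_loader:
        # The model supplies the loss, so no loss function is passed in.
        loss, logits = model(inputs, labels=labels)
        losses.append(loss)
        metric.update(logits, labels)
    return sum(losses) / len(losses), metric.accumulate()


batches = [([1.0, -2.0], [0, 1]), ([0.5, 3.0], [1, 0])]
mean_loss, accuracy = evaluate(ToyModel(), ToyAccuracy(), batches)
```

One consequence of this design is that the caller cannot swap in a different loss at evaluation time without changing the model or recomputing the loss from the returned logits.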
