【PaddleNLP No.5】Update simcse to apply PIR #10396

hanlintang · 2025-04-11T12:56:48Z

PR types

Function optimization

PR changes

Models

Description

Updated slm/applications/neural_search/recall/simcse/README.md to include instructions for downloading and unzipping the required data.
Modified deploy.sh so that it can be run directly from slm/applications/neural_search/recall/simcse/, consistent with how the other scripts are run.
Updated slm/applications/neural_search/recall/simcse/deploy/python/predict.py to support PIR.

issue: #9763
@DrownFish19

Other issues

Under Paddle 3.0.0, the simcse model cannot be converted to a static model. Running the export script produces the following output:

aistudio@jupyter-227232-8957468:~/PaddleNLP/slm/applications/neural_search/recall/simcse$ python export_model.py --params_path checkpoints/model_7000/model_state.pdparams                        --model_name_or_path rocketqa-zh-base-query-encoder                        --output_path=./output
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:768: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

  warnings.warn(
/home/aistudio/.local/lib/python3.8/site-packages/_distutils_hack/__init__.py:26: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")
[2025-04-11 12:00:06,888] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.modeling.ErnieModel'> to load 'rocketqa-zh-base-query-encoder'.
[2025-04-11 12:00:06,890] [    INFO] - Loading weights file from cache at /home/aistudio/.paddlenlp/models/rocketqa-zh-base-query-encoder/model_state.pdparams
[2025-04-11 12:00:07,779] [    INFO] - Loaded weights file from disk, setting weights to model.
W0411 12:00:10.467890 54098 gpu_resources.cc:119] Please NOTE: device: 0, GPU Compute Capability: 7.0, Driver API Version: 12.0, Runtime API Version: 12.6
W0411 12:00:10.472265 54098 gpu_resources.cc:164] device: 0, cuDNN Version: 9.5.
W0411 12:00:10.472301 54098 gpu_resources.cc:196] WARNING: device: 0. The installed Paddle is compiled with CUDA 12.6, but CUDA runtime version in your machine is 12.0, which may cause serious incompatible bug. Please recompile or reinstall Paddle with compatible CUDA version.
[2025-04-11 12:00:11,455] [ WARNING] - Some weights of the model checkpoint at rocketqa-zh-base-query-encoder were not used when initializing ErnieModel: ['classifier.bias', 'classifier.weight']
- This IS expected if you are initializing ErnieModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ErnieModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
[2025-04-11 12:00:11,455] [    INFO] - All the weights of ErnieModel were initialized from the model checkpoint at rocketqa-zh-base-query-encoder.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ErnieModel for predictions without further training.
[2025-04-11 12:00:11,479] [    INFO] - We are using <class 'paddlenlp.transformers.ernie.tokenizer.ErnieTokenizer'> to load 'rocketqa-zh-base-query-encoder'.
[2025-04-11 12:00:11,502] [    INFO] - tokenizer config file saved in /home/aistudio/.paddlenlp/models/rocketqa-zh-base-query-encoder/tokenizer_config.json
[2025-04-11 12:00:11,502] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/rocketqa-zh-base-query-encoder/special_tokens_map.json
Loaded parameters from checkpoints/model_7000/model_state.pdparams
/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py:768: UserWarning: full_graph=False don't support input_spec arguments. It will not produce any effect.
You can set full_graph=True, then you can assign input spec.

  warnings.warn(

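As a result, the exported model returns empty results when run with deploy/python/predict.py. I tried explicitly setting full_graph=True in export_model.py, but it had no effect; according to PaddlePaddle/Paddle#69119, it should already be enabled by default. Not passing input_spec at all still raises an error: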

[2025-04-11 12:13:07,143] [    INFO] - Special tokens file saved in /home/aistudio/.paddlenlp/models/rocketqa-zh-base-query-encoder/special_tokens_map.json
Loaded parameters from checkpoints/model_7000/model_state.pdparams
Traceback (most recent call last):
  File "export_model.py", line 58, in <module>
    paddle.jit.save(model, save_path)
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/api.py", line 901, in wrapper
    func(layer, path, input_spec, **configs)
  File "/usr/local/lib/python3.8/dist-packages/decorator.py", line 232, in fun
    return caller(func, *(extras + args), **kw)
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/base/wrapped_decorator.py", line 40, in __impl__
    return wrapped_func(*args, **kwargs)
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/base/dygraph/base.py", line 101, in __impl__
    return func(*args, **kwargs)
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/api.py", line 1215, in save
    static_func.concrete_program_specify_input_spec(
  File "/home/aistudio/.local/lib/python3.8/site-packages/paddle/jit/dy2static/program_translator.py", line 1079, in concrete_program_specify_input_spec
    raise ValueError(
ValueError: No valid transformed program for function: forward(query_input_ids,title_input_ids,query_token_type_ids,query_position_ids,query_attention_mask,title_token_type_ids,title_position_ids,title_attention_mask), input_spec: None.
            Please specific `input_spec` in `@paddle.jit.to_static` or feed input tensor to call the decorated function at once.

paddle-bot · 2025-04-11T12:56:55Z

Thanks for your contribution!

codecov · 2025-04-11T13:31:51Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 49.09%. Comparing base (e3ed3a3) to head (44d3d0c).
Report is 124 commits behind head on develop.

❌ Your project status has failed because the head coverage (49.09%) is below the target coverage (58.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files

@@           Coverage Diff            @@
##           develop   #10396   +/-   ##
========================================
  Coverage    49.09%   49.09%           
========================================
  Files          763      763           
  Lines       125659   125659           
========================================
  Hits         61688    61688           
  Misses       63971    63971


DrownFish19 · 2025-04-16T01:31:52Z

Noted the related problem; I'll follow up with a fix.

Amourtani · 2025-04-17T10:31:01Z

I'm hitting the same error when exporting the model.

[PIR] Update simcse to apply pir

44d3d0c

paddle-bot bot added the contributor label Apr 11, 2025

paddle-bot bot assigned wawltor Apr 11, 2025

DrownFish19 added the HappyOpenSource 快乐开源活动issue与PR (Happy Open Source event issues and PRs) label Apr 12, 2025

luotao1 mentioned this pull request Apr 14, 2025

PaddleNLP 快乐开源活动 (Happy Open Source Event) (2025 H1) 🎉 #9763

Open

DrownFish19 assigned DrownFish19 and unassigned wawltor Apr 16, 2025

luotao1 self-assigned this Apr 22, 2025

ZHUI merged commit af16c95 into PaddlePaddle:develop May 21, 2025
9 of 12 checks passed

hanlintang deleted the simcse branch May 21, 2025 03:48
