[BUG] ID Mismatch Error in VectorDB During Evaluation #1033
Comments
@e7217 Did you use passage augmenter & validation?
@vkehfdl1 Umm... this is my current configuration, and I do not include the passage augmenter in any of my stages:

```yaml
vectordb:
  - name: autorag_test
    db_type: milvus
    embedding_model: openai
    collection_name: autorag_test
    uri: http://192.xxx.xxx.xxx:19530
    embedding_batch: 50
    similarity_metric: l2
    # index_type: hnsw
    params:
      nlist: 16384

node_lines:
  - node_line_name: retrieve_node_line
    nodes:
      - node_type: retrieval
        strategy:
          metrics: [retrieval_f1, retrieval_ndcg, retrieval_map]
        top_k: 3
        modules:
          - module_type: vectordb
            vectordb: autorag_test
  - node_line_name: post_retrieve_node_line
    nodes:
      - node_type: prompt_maker
        strategy:
          metrics:
            - metric_name: rouge
            - metric_name: sem_score
              embedding_model: openai
          generator_modules:
            - module_type: openai_llm
              llm: gpt-4o-mini
        modules:
          - module_type: fstring
            prompt:
              - |
                Read the passage and answer the question. \n Question: {query} \n Passage: {retrieved_contents} \n Answer:
              - |
                Read the passage and answer the question. When answering, think it through slowly, step by step. Be sure to base your answer on the passage and do not say anything untrue. \n Question: {query} \n Passage: {retrieved_contents} \n Answer:
      - node_type: generator
        strategy:
          metrics: # bert_score and g_eval are also recommended; they are excluded here for faster runs.
            - metric_name: rouge
            - metric_name: sem_score
              embedding_model: openai
            - metric_name: bert_score
              lang: ko
            # - metric_name: g_eval
            #   metrics: ["consistency"]
            #   model: ['gpt-4o-mini']
        modules:
          - module_type: vllm
            llm: [
              Qwen/Qwen2.5-7B-Instruct,
              meta-llama/Llama-3.1-8B-instruct,
              meta-llama/Llama-3.2-3B-instruct,
              meta-llama/Llama-3.2-1B-instruct,
            ]
            # llm: meta-llama/Llama-3.2-3B-instruct
            temperature: [
              0,
              0.1,
              # 0.5,
            ]
            # temperature: 0.1
            # tensor_parallel_size: 2  # If you have two GPUs.
            max_tokens: 128
            max_model_len: 800
            gpu_memory_utilization: 0.8
```
@e7217 Thanks for providing the configuration, sir!
Hi @e7217, did you resolve the issue?
@vkehfdl1 The issue has not been resolved yet. I've written some temporary code to work around the issue, but it's not ideal for general use.
I haven't found a proper solution yet, but I will give it another try.
@e7217 Oh, on a closer look at the YAML file, it looks like you may have used different vectordb names in the configuration and in the vectordb module. Can you double-check the YAML file?
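For reference, a minimal sketch of such a check, assuming the YAML above is saved as `config.yaml` (the file name, and `"default"` as the fallback vectordb name, are assumptions):

```python
import yaml

# Hypothetical path to the configuration shown above.
with open("config.yaml") as f:
    cfg = yaml.safe_load(f)

# Names defined in the top-level `vectordb:` section.
defined = {entry["name"] for entry in cfg.get("vectordb", [])}

# Names referenced by vectordb modules anywhere in node_lines.
referenced = set()
for line in cfg.get("node_lines", []):
    for node in line.get("nodes", []):
        for module in node.get("modules", []):
            if module.get("module_type") == "vectordb":
                # "default" as the implicit name is an assumption.
                referenced.add(module.get("vectordb", "default"))

missing = referenced - defined
if missing:
    print(f"vectordb names referenced but not defined: {missing}")
else:
    print("All vectordb references match the configuration.")
```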
@vkehfdl1 I encountered an error when I changed […]
In the […]
I've thought about the potential cause of the issue. The error occurs when the […]
Describe the bug
Hello,
I have a question regarding the error described below.
It appears that the error occurs because the id values in the corpus data located in the benchmark/data folder do not match the id values retrieved from the VectorDB (see the sketch below).
corpus data:
vector db:
code:
Full logs
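A minimal sketch of the kind of mismatch check described above, comparing corpus ids with what is stored in Milvus (the corpus path, the `doc_id` column, and the primary-key field name `id` are assumptions about the schema):

```python
import pandas as pd
from pymilvus import connections, Collection

# Assumptions: the corpus path and column name follow AutoRAG's benchmark
# layout, and the Milvus URI/collection come from the configuration above.
corpus = pd.read_parquet("benchmark/data/corpus.parquet")
corpus_ids = set(corpus["doc_id"])

connections.connect(uri="http://192.xxx.xxx.xxx:19530")
collection = Collection("autorag_test")
collection.load()

# A plain count comparison already reveals accumulation across runs.
print(f"corpus rows: {len(corpus_ids)}, entities in Milvus: {collection.num_entities}")

# Pulling stored ids with an empty expr plus a limit requires a recent
# Milvus; the primary-key field name "id" is an assumption.
stored = collection.query(expr="", output_fields=["id"], limit=16384)
stored_ids = {row["id"] for row in stored}
print(f"ids in Milvus but not in the corpus: {len(stored_ids - corpus_ids)}")
```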
When performing the `evaluate` step repeatedly, items seem to accumulate in the VectorDB. Should the number of items in the VectorDB collection always be reset to zero before performing the `evaluate` step? It would be very helpful to understand the intention behind implementing this flow, as it will assist me in using this package more effectively.
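For illustration, a minimal sketch of one way to reset the collection between runs, assuming pymilvus and the URI/collection name from the configuration above (whether such a reset is intended or necessary is exactly the question being asked):

```python
from pymilvus import connections, utility

# URI and collection name taken from the configuration above.
connections.connect(uri="http://192.xxx.xxx.xxx:19530")

# Drop the collection so the next evaluate run starts from an empty store.
# Assumption: the ingestion step recreates the collection when it is missing.
if utility.has_collection("autorag_test"):
    utility.drop_collection("autorag_test")
```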
Thank you for your assistance!