geemi725/ChemLit-QA

ChemLit-QA

ChemLit-QA is a comprehensive, expert-validated dataset of over 1,000 entries designed for the field of chemistry. It was curated in two stages: a Question-Answer-Context (QAC) dataset was first generated and filtered with an automated framework based on GPT-4, and the result was then rigorously evaluated by chemistry experts. We also provide two supplementary datasets: ChemLit-QA-neg, focused on negative data, and ChemLit-QA-multi, focused on multi-hop reasoning tasks for LLMs. Both extend the resources available for advanced scientific research.

We provide the full ChemLit-QA dataset and its variants in this repository, as well as the exact train-test split of ChemLit-QA that was used in the fine-tuning task.

Overview

[Figure: abstract-fig]

Dataset description

| Field | Description |
| --- | --- |
| chunk | The text chunk from which the Question-Answer-Context (QAC) triple is generated. |
| Reasoning_type | Expert-corrected reasoning type. Includes 7 categories: Explanatory, Comparative, Causal, Conditional, Analogical, Evaluative, Predictive. |
| Question | LLM-generated question. |
| Answer | Expert-corrected answer. |
| Difficulty | Expert-assigned difficulty. Includes 3 categories: Easy, Medium, Hard. |
| Context | Expert-corrected context. Contains the full sentences that support the answer. |
| A_start_end | The start-end indices of the answer (most similar sentences) in the chunk. |
| similar_chunks | The top 6 most similar chunks to the given chunk in terms of cosine similarity. |
| Cluster_labels | 2-level hierarchical label describing the topic of this chunk. |
| ID | Identifier of the entry. |
| Answer Relevancy Scores_gpt-4o | How relevant the answer is to the question, assessed by GPT-4o. |
| Faithfulness Scores_gpt-4o | How faithful the answer is to the context, assessed by GPT-4o. |
| Hallucination Scores_gpt-4o | How much information in the answer is not mentioned in the context, assessed by GPT-4o. |
| Question Faithfulness Scores_gpt-4o | How faithful the question is to the context, assessed by GPT-4o. |
| SE_penalized | Penalized Semantic Entropy of the question. |
| Keywords | Keywords of the question. |
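
The categorical fields above lend themselves to simple validation and filtering. The following is a minimal sketch, assuming each entry has been loaded as a Python dictionary keyed by the field names in the table. The on-disk file format is not described here, so loading is left out, and the toy records, their ID values, and the helper names `validate` and `select` are hypothetical.

```python
# Category values are copied from the field descriptions above.
REASONING_TYPES = {
    "Explanatory", "Comparative", "Causal", "Conditional",
    "Analogical", "Evaluative", "Predictive",
}
DIFFICULTIES = {"Easy", "Medium", "Hard"}


def validate(record):
    """Return a list of problems with the categorical fields (empty if none)."""
    problems = []
    if record.get("Reasoning_type") not in REASONING_TYPES:
        problems.append(f"unknown Reasoning_type: {record.get('Reasoning_type')!r}")
    if record.get("Difficulty") not in DIFFICULTIES:
        problems.append(f"unknown Difficulty: {record.get('Difficulty')!r}")
    return problems


def select(records, difficulty=None, reasoning_type=None):
    """Keep records matching the given criteria; None means 'do not filter'."""
    return [
        r for r in records
        if (difficulty is None or r.get("Difficulty") == difficulty)
        and (reasoning_type is None or r.get("Reasoning_type") == reasoning_type)
    ]


# Toy records, invented for illustration only (not taken from ChemLit-QA).
records = [
    {"ID": "toy-1", "Reasoning_type": "Causal", "Difficulty": "Hard"},
    {"ID": "toy-2", "Reasoning_type": "Comparative", "Difficulty": "Easy"},
    {"ID": "toy-3", "Reasoning_type": "Causal", "Difficulty": "Tricky"},
]

hard_causal = select(records, difficulty="Hard", reasoning_type="Causal")
invalid_ids = [r["ID"] for r in records if validate(r)]
```

Here `hard_causal` holds only the `toy-1` record, and `invalid_ids` flags `toy-3` because its difficulty is not one of the three listed categories.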

Dataset quality as per LLM-based metrics

| Metric | Mean ± std. dev. |
| --- | --- |
| Answer Relevancy Score (GPT-4o) | 0.99 ± 0.02 |
| Faithfulness Score (GPT-4o) | 0.99 ± 0.01 |
| Hallucination Score (GPT-4o) | 0.0 ± 0.0 |
| Question Faithfulness Score (GPT-4o) | 0.93 ± 0.10 |
| Penalized semantic entropy (GPT-4o) | 0.20 ± 0.44 |
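
A summary of this kind can be recomputed from the per-entry score fields. The sketch below assumes the entries are available as dictionaries keyed by the field names from the dataset description, and it uses the population standard deviation. Whether the reported figures use the population or the sample formula is not stated, so that choice is an assumption, as are the helper name `summarize` and the toy scores.

```python
from statistics import mean, pstdev


def summarize(records, field):
    """Return (mean, population std. dev.) of a numeric field across records."""
    values = [r[field] for r in records]
    return mean(values), pstdev(values)


# Toy scores, invented for illustration only (not taken from ChemLit-QA).
toy_records = [
    {"Faithfulness Scores_gpt-4o": 1.0},
    {"Faithfulness Scores_gpt-4o": 0.5},
    {"Faithfulness Scores_gpt-4o": 1.0},
    {"Faithfulness Scores_gpt-4o": 0.5},
]

m, s = summarize(toy_records, "Faithfulness Scores_gpt-4o")
summary = f"{m:.2f} ± {s:.2f}"  # "0.75 ± 0.25" for the toy scores
```

Swapping `pstdev` for `statistics.stdev` gives the sample standard deviation instead, which differs slightly for small sets of scores.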
