
The dataset of E-QGen: Educational Lecture Abstract-based Question Generation System


NYCU-NLP-Lab/E-QGen


E-QGen Dataset

This is the dataset of E-QGen: Educational Lecture Abstract-based Question Generation System

Introduction

This dataset was constructed using the method described in the paper E-QGen: Educational Lecture Abstract-based Question Generation System. We collected course transcripts from online courses on YouTube and matched them with the corresponding questions in the comment sections. The dataset focuses mainly on lectures related to computer science. In total, we collected 356 golden pairs, 4,434 silver pairs, and 4,829 platinum pairs. Please see the paper for a more detailed description of the collection procedure and the dataset.

In this repo, we provide direct access to the dataset, which consists of lecture paragraph and question pairs.
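The CSV files can be read with Python's standard `csv` module. The sketch below assumes each file has a header row with columns named `paragraph` and `question`; those column names are hypothetical, so check the actual header and pass the real names if they differ.

```python
import csv


def read_pairs(lines, paragraph_col="paragraph", question_col="question"):
    """Turn CSV text lines into (paragraph, question) tuples.

    The default column names are hypothetical placeholders; check the header
    row of the actual E-QGen file and pass the real names if they differ.
    """
    reader = csv.DictReader(lines)
    return [(row[paragraph_col], row[question_col]) for row in reader]


def load_pairs(path, **columns):
    """Open one dataset file and return its (paragraph, question) tuples."""
    # newline="" lets the csv module handle quoted fields that span lines.
    with open(path, newline="", encoding="utf-8") as f:
        return read_pairs(f, **columns)
```

Under that assumption, `load_pairs("silver_pairs.csv")` would return a list of tuples; if the real columns are named differently, call it as `load_pairs("silver_pairs.csv", paragraph_col=..., question_col=...)`.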

Usage

Golden Pairs

Golden pairs are constructed by matching the timestamps mentioned in comments back to the corresponding parts of the lecture transcripts.
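Timestamp matching of this kind can be sketched in two steps: extract `m:ss` or `h:mm:ss` timestamps from a comment, then locate the transcript paragraph that was playing at that time. This is only an illustration, not the paper's procedure; the representation of a transcript as a sorted list of paragraph start times (in seconds) is a hypothetical layout chosen for the sketch.

```python
import bisect
import re

# Matches "m:ss" or "h:mm:ss" timestamps such as "3:05" or "1:02:30".
TIMESTAMP = re.compile(r"\b(?:(\d+):)?(\d{1,2}):(\d{2})\b")


def parse_timestamps(comment):
    """Return every timestamp found in a comment, converted to seconds."""
    seconds = []
    for h, m, s in TIMESTAMP.findall(comment):
        # The hour group is empty for "m:ss" timestamps.
        seconds.append(int(h or 0) * 3600 + int(m) * 60 + int(s))
    return seconds


def find_paragraph(starts, t):
    """Index of the paragraph whose start time is the last one <= t.

    `starts` is a sorted list of paragraph start times in seconds
    (a hypothetical transcript layout used only for this sketch).
    """
    i = bisect.bisect_right(starts, t) - 1
    return max(i, 0)
```

Binary search via `bisect` keeps the lookup logarithmic in the number of paragraphs, which matters little for a single lecture but keeps the matching step cheap across many videos.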

  • golden_pair_3agree.csv, golden_pair_2agree.csv
    • The suffix in the file name indicates how many LLMs agreed when filtering questions out of the comments.
  • golden_pair_3agree_notime_gpt4.csv, golden_pair_2agree_notime_notime_gpt4.csv
    • The suffix _notime_gpt4 means that the timestamps have been removed from the questions. Because removing timestamps can leave a sentence malformed, we used GPT-4 to refine the question comments.
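The two file variants can be illustrated with a small sketch. It assumes that the `3agree`/`2agree` suffix counts how many LLM judges labeled a comment as a question (whether `2agree` means exactly or at least two is not stated here; the sketch uses "at least"), and that each comment's votes are available as booleans, which is a hypothetical layout since the CSV files do not ship the raw labels. The `strip_timestamps` helper shows why naive timestamp removal can leave awkward text, which is the problem the GPT-4 refinement step addresses.

```python
import re

# Matches "m:ss" or "h:mm:ss" timestamps such as "3:05" or "1:02:30".
TIMESTAMP = re.compile(r"\b(?:\d+:)?\d{1,2}:\d{2}\b")


def filter_by_agreement(votes, min_agree):
    """Keep comments that at least `min_agree` LLM judges labeled as questions.

    `votes` maps a comment to a list of booleans, one per LLM judge
    (a hypothetical layout used only for this sketch).
    """
    return [c for c, labels in votes.items() if sum(labels) >= min_agree]


def strip_timestamps(comment):
    """Remove timestamps and collapse the leftover whitespace."""
    return " ".join(TIMESTAMP.sub("", comment).split())
```

For example, `strip_timestamps("At 3:05 why is it O(n)?")` yields `"At why is it O(n)?"`, a grammatically broken question of the kind that motivates a rewriting step.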

Silver Pairs

  • silver_pairs.csv is collected by matching comments without timestamps to lecture paragraphs. We compute cosine similarity using PaLM, PaLM embeddings, and Sentence Transformer embeddings.
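Similarity-based matching can be sketched as follows: embed the comment and every lecture paragraph, compute cosine similarity, and keep the best-scoring paragraph if it clears a threshold. The embedding vectors are taken as given (they would come from a model such as a Sentence Transformer), and the 0.5 threshold is a hypothetical value; the paper's actual scoring, threshold, and the way scores from different models are combined are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_match(comment_vec, paragraph_vecs, threshold=0.5):
    """Return (index, score) of the most similar paragraph, or None.

    `paragraph_vecs` must be non-empty. The 0.5 threshold is a hypothetical
    value chosen for illustration, not the one used to build the dataset.
    """
    scores = [cosine(comment_vec, p) for p in paragraph_vecs]
    i = max(range(len(scores)), key=scores.__getitem__)
    return (i, scores[i]) if scores[i] >= threshold else None
```

Returning `None` below the threshold reflects that many comments (praise, off-topic remarks) should not be paired with any paragraph at all.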

Platinum Pairs

  • platinum_pairs.csv contains pairs generated by OpenAI's GPT-4 model, which we asked to generate 20 questions for each lecture paragraph.

Data Sources
