This repository contains scripts that attempt to build synthetic question-answer pairs from scientific papers.
Some of the data is used in DocReLM: Mastering Document Retrieval with Language Model (arXiv:2405.11461).
The evidence/factor data come from veya2ztn/uparxive on GitHub, an LLM-friendly dataset built from the full arXiv .tex source.