diff --git a/README.md b/README.md
index 85df3ae3..2b0060bd 100644
--- a/README.md
+++ b/README.md
@@ -4,7 +4,7 @@
 ### Question Answering Systems with Descriptions
-The [QA Systems Table](systems.md) contains links to publications, demo/APIs (if available) and short descriptions of ca. 100 QA systems.
+The [QA Systems Table](systems.md) contains links to publications, demo/APIs (if available), and short descriptions of ca. 100 QA systems.
 ### DBpedia
@@ -86,7 +86,7 @@ For adding a new dataset or task, you can also follow the steps above. Alternati
 1. If your dataset is completely new, create a new file and link to it in the table of contents above.
 2. Briefly describe the dataset and include relevant references.
 3. Describe the evaluation setting and evaluation metric.
-4. Show how an annotated example of the dataset looks like.
+4. Show what an annotated example of the dataset looks like.
 5. Add a download link if available.
 6. Copy the below table and fill in at least two results (including the state-of-the-art) for your dataset (change Metric1/Metric2/Metric3 to the metric of your dataset).
 7. Submit your change as a pull request.
@@ -104,11 +104,15 @@ Instructions for building the website locally using Jekyll can be found [here](j
 Please cite the following:
-```Perevalov, A., Yan, X., Kovriguina, L., Jiang, L., Both, A., & Usbeck, R. (2022). Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. arXiv preprint arXiv:2201.08174.```
+```Perevalov, A., Yan, X., Kovriguina, L., Jiang, L., Both, A., & Usbeck, R. (2022, June). Knowledge Graph Question Answering Leaderboard: A Community Resource to Prevent a Replication Crisis. In Proceedings of the Thirteenth Language Resources and Evaluation Conference (pp. 2998-3007).```
+
+The full paper is available [here](https://aclanthology.org/2022.lrec-1.321/) (including BibTeX code).
+
 ### Acknowledgement
 This site is based on https://nlpprogress.com/ and thus, a great thanks goes to Sebastian Ruder.
-## Instruction on adding new dataset and leadeboard.
+## Instruction on adding new dataset and leaderboard.
+
 Please check this video:
diff --git a/dbpedia/qald.md b/dbpedia/qald.md
index 188fd3eb..b7cdecfd 100644
--- a/dbpedia/qald.md
+++ b/dbpedia/qald.md
@@ -59,57 +59,58 @@ Please see the original [paper](http://ceur-ws.org/Vol-2241/paper-06.pdf) for de
 ### Leaderboard
-| Model / System | Year | Precision | Recall | F1 |Language| Reported by |
-|:--------------------------:|:----:|:---------:|:------:|:-----:|:------:|:--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
-| SGPT_Q,K | 2022 | - | - | 67.82 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
-| SGPT_Q | 2022 | - | - | 60.22 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
-| Stage I No Noise [2] | 2022 | 80.40 | 42.10 | 55.30 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
-| LingTeQA [1] | 2020 | 52.60 | 64.20 | 53.50 | EN | [P. Nhuan et al](https://ieeexplore.ieee.org/abstract/document/9282949) |
-| qaSQP | 2019 | 45.80 | 47.10 | 46.30 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| chatGPT | 2023 | - | - | 45.71 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
-| GPT-3.5v3 | 2023 | - | - | 46.19 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
-| NSpM | 2022 | - | - | 45.34 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
-| GPT-3.5v2 | 2023 | - | - | 44.95 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
-| KGQAn | 2023 | 49.81 | 39.39 | 43.99 | EN | [Omar et al.](https://arxiv.org/pdf/2303.00595.pdf) |
-| Ensemble BR framework | 2023 | 42.40 | 47.60 | 43.00 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
-| KGQAn | 2021 | 50.61 | 34.67 | 41.15 | EN | [Omar et al.](http://ceur-ws.org/Vol-2980/paper312.pdf) |
-| Light-QAWizard | 2022 | 39.80 | 42.60 | 40.60 | EN | [Chen et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9893129) |
-| Stage-I Part Noise [7] | 2022 | 63.90 | 28.70 | 39.60 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
-| GPT-3 | 2023 | - | - | 38.54 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
-| Stage-II w/o type [5] | 2022 | 59.40 | 26.10 | 36.20 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
-| Stage-II w/ type [6] | 2022 | 59.40 | 26.10 | 36.20 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
-| Stage-I Full Noise [8] | 2022 | 82.60 | 23 | 36.00 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
-| QAWizard | 2022 | 31.10 | 46.90 | 33.00 | - | EN | [Chen et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9893129) |
-| QAmp | 2019 | 25 | 50 | 33 | EN |[Vakulenko et. al.](https://dl.acm.org/doi/pdf/10.1145/3357384.3358026?casa_token=X_2SYFDIrd8AAAAA:Z9FcBHNuARtktnurgKswRUvVZx7E1eSdRsXWqVIZej6fJDVTcUGVQ-aqazqiStuQKqAd362eKw3CzQ)|
-| QAwizard | 2023 | 31.10 | 46.90 | 33 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
-| WDAqua-core0 | 2021 | - | - | 32 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
-| NSQA | 2021 | 31.89 | 32.05 | 31.26 | EN | [P.Kapanipathi et alf](https://aclanthology.org/2021.findings-acl.339.pdf) |
-| DTQA | 2021 | 31.41 | 32.16 | 30.88 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
-| NSQA | 2021 | 31.40 | 32.10 | 30.80 | EN | [ M. Borroto et al](https://arxiv.org/pdf/2111.03000.pdf) |
-| sparql-qa | 2021 | 31 | 32.48 | 30.60 | EN | [ M. Borroto et al](https://arxiv.org/pdf/2111.03000.pdf) |
-| FLAN-T5 | 2023 | - | - | 30.17 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
-| DTQA | 2023 | 31.40 | 32.20 | 30.10 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
-| gAnswer | 2021 | - | - | 30 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
-| gAnswer | 2021 | 29.34 | 32.68 | 29.81 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
-| gAnswer [3] | 2021 | 29.30 | 32.70 | 29.80 | EN | [Purkayastha et al.](https://arxiv.org/pdf/2109.09475.pdf) |
-| gAnswer2 | 2019 | 29.30 | 32.70 | 29.80 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| gAnswer2 | 2023 | 29.30 | 32.70 | 29.80 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
-| gAnswer | 2021 | 60.70 | 31.60 | 29.60 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
-| TeBaQA | 2022 | - | - | 28.81 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
-| WDAqua-core1 | 2019 | 22 | 38 | 28 | EN |[Vakulenko et. al.](https://dl.acm.org/doi/pdf/10.1145/3357384.3358026?casa_token=X_2SYFDIrd8AAAAA:Z9FcBHNuARtktnurgKswRUvVZx7E1eSdRsXWqVIZej6fJDVTcUGVQ-aqazqiStuQKqAd362eKw3CzQ)|
-| SQG | 2022 | - | - | 27.85 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
-| WDAqua-core1 | 2019 | 26.10 | 26.70 | 25 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| WDAqua | 2023 | 26.10 | 26.70 | 25 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
-| WDAqua-core1 | 2021 | 26.09 | 26.70 | 24.99 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
-| qaSearch | 2019 | 23.60 | 24.10 | 23.70 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| QAnswer | 2021 | 45.90 | 22.20 | 19.70 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
-| QASparql | 2021 | - | - | 19 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
-| TeBaQA | 2021 | 64.40 | 14.10 | 13.90 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
-| TeBaQA | 2019 | 12.90 | 13.40 | 13 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| QASystem | 2019 | 9.70 | 11.60 | 9.80 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
-| AskNow | 2021 | - | - | 8 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
-| Qanary(TM+DP+QB) | 2021 | - | - | 7 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
-| Elon | 2021 | 4.90 | 5.30 | 5 | EN | [Steinmetz et al.](https://link.springer.com/article/10.1007/s13740-021-00128-9) |
+| Model / System | Year | Precision | Recall | F1 |Language| Reported by |
+|:----------------------:|:----:|:---------:|:------:|:-----:|:------:|:----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------:|
+| SGPT_Q,K | 2022 | - | - | 67.82 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
+| SPARQLGEN | 2023 | - | - | 67.07 | EN | [Kovriguina et al.](https://ceur-ws.org/Vol-3526/paper-08.pdf) |
+| SGPT_Q | 2022 | - | - | 60.22 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
+| Stage I No Noise [2] | 2022 | 80.40 | 42.10 | 55.30 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
+| LingTeQA [1] | 2020 | 52.60 | 64.20 | 53.50 | EN | [P. Nhuan et al](https://ieeexplore.ieee.org/abstract/document/9282949) |
+| qaSQP | 2019 | 45.80 | 47.10 | 46.30 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| chatGPT | 2023 | - | - | 45.71 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
+| GPT-3.5v3 | 2023 | - | - | 46.19 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
+| NSpM | 2022 | - | - | 45.34 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
+| GPT-3.5v2 | 2023 | - | - | 44.95 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
+| KGQAn | 2023 | 49.81 | 39.39 | 43.99 | EN | [Omar et al.](https://arxiv.org/pdf/2303.00595.pdf) |
+| Ensemble BR framework | 2023 | 42.40 | 47.60 | 43.00 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
+| KGQAn | 2021 | 50.61 | 34.67 | 41.15 | EN | [Omar et al.](http://ceur-ws.org/Vol-2980/paper312.pdf) |
+| Light-QAWizard | 2022 | 39.80 | 42.60 | 40.60 | EN | [Chen et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9893129) |
+| Stage-I Part Noise [7] | 2022 | 63.90 | 28.70 | 39.60 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
+| GPT-3 | 2023 | - | - | 38.54 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
+| Stage-II w/o type [5] | 2022 | 59.40 | 26.10 | 36.20 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
+| Stage-II w/ type [6] | 2022 | 59.40 | 26.10 | 36.20 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
+| Stage-I Full Noise [8] | 2022 | 82.60 | 23 | 36.00 | EN | [Purkayastha et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9892263) |
+| QAWizard | 2022 | 31.10 | 46.90 | 33.00 | - | EN | [Chen et al.](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9893129) |
+| QAmp | 2019 | 25 | 50 | 33 | EN | [Vakulenko et. al.](https://dl.acm.org/doi/pdf/10.1145/3357384.3358026?casa_token=X_2SYFDIrd8AAAAA:Z9FcBHNuARtktnurgKswRUvVZx7E1eSdRsXWqVIZej6fJDVTcUGVQ-aqazqiStuQKqAd362eKw3CzQ) |
+| QAwizard | 2023 | 31.10 | 46.90 | 33 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
+| WDAqua-core0 | 2021 | - | - | 32 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
+| NSQA | 2021 | 31.89 | 32.05 | 31.26 | EN | [P.Kapanipathi et alf](https://aclanthology.org/2021.findings-acl.339.pdf) |
+| DTQA | 2021 | 31.41 | 32.16 | 30.88 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
+| NSQA | 2021 | 31.40 | 32.10 | 30.80 | EN | [ M. Borroto et al](https://arxiv.org/pdf/2111.03000.pdf) |
+| sparql-qa | 2021 | 31 | 32.48 | 30.60 | EN | [ M. Borroto et al](https://arxiv.org/pdf/2111.03000.pdf) |
+| FLAN-T5 | 2023 | - | - | 30.17 | EN | [Tan et al.](https://arxiv.org/pdf/2303.07992.pdf) |
+| DTQA | 2023 | 31.40 | 32.20 | 30.10 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
+| gAnswer | 2021 | - | - | 30 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
+| gAnswer | 2021 | 29.34 | 32.68 | 29.81 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
+| gAnswer [3] | 2021 | 29.30 | 32.70 | 29.80 | EN | [Purkayastha et al.](https://arxiv.org/pdf/2109.09475.pdf) |
+| gAnswer2 | 2019 | 29.30 | 32.70 | 29.80 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| gAnswer2 | 2023 | 29.30 | 32.70 | 29.80 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
+| gAnswer | 2021 | 60.70 | 31.60 | 29.60 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
+| TeBaQA | 2022 | - | - | 28.81 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
+| WDAqua-core1 | 2019 | 22 | 38 | 28 | EN | [Vakulenko et. al.](https://dl.acm.org/doi/pdf/10.1145/3357384.3358026?casa_token=X_2SYFDIrd8AAAAA:Z9FcBHNuARtktnurgKswRUvVZx7E1eSdRsXWqVIZej6fJDVTcUGVQ-aqazqiStuQKqAd362eKw3CzQ) |
+| SQG | 2022 | - | - | 27.85 | EN | [Al Hasan Rony et al](https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=9815253) |
+| WDAqua-core1 | 2019 | 26.10 | 26.70 | 25 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| WDAqua | 2023 | 26.10 | 26.70 | 25 | EN | [Chen et al.](https://assets.researchsquare.com/files/rs-2676239/v1_covered.pdf?c=1680800823) |
+| WDAqua-core1 | 2021 | 26.09 | 26.70 | 24.99 | EN | [Abdelaziz et al.](https://ojs.aaai.org/index.php/AAAI/article/view/17988) |
+| qaSearch | 2019 | 23.60 | 24.10 | 23.70 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| QAnswer | 2021 | 45.90 | 22.20 | 19.70 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
+| QASparql | 2021 | - | - | 19 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
+| TeBaQA | 2021 | 64.40 | 14.10 | 13.90 | EN | [ L Siciliani et al.](http://www.semantic-web-journal.net/system/files/swj2701.pdf) |
+| TeBaQA | 2019 | 12.90 | 13.40 | 13 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| QASystem | 2019 | 9.70 | 11.60 | 9.80 | EN | [Zheng et. al.](https://arxiv.org/pdf/1910.09760.pdf) |
+| AskNow | 2021 | - | - | 8 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
+| Qanary(TM+DP+QB) | 2021 | - | - | 7 | EN | [Orogat et al.](https://arxiv.org/pdf/2105.00811.pdf) |
+| Elon | 2021 | 4.90 | 5.30 | 5 | EN | [Steinmetz et al.](https://link.springer.com/article/10.1007/s13740-021-00128-9) |
 * [1] DBpedia 2016-10.
 * [2] DBpedia 2016-10.