Description
Code and data set for the paper "Artificial Intelligence Trend Analysis in German Business and Politics - A Web Mining Approach" by Philipp Dumbach, Leo Schwinn, Tim Löhr, Tassilo Elsberger and Björn M. Eskofier, which has been accepted as a research article by the International Journal of Data Science and Analytics (https://www.springer.com/journal/41060) and is currently in the publication process.
This repository contains the code for the crawlers of the two political data sources from the German parliament, the Bundestag. The data set contains two types of political documents: the Drucksachen (DRS), which include, for example, law drafts, inquiries or surveys prepared for individual members or by parliamentary parties, and the Plenarprotokolle (PP), which are verbatim transcripts of the individual plenary sessions. Documents were collected starting with the 14th election period, which began in 1998.
The majority of the political data (election periods 14 to 18) was available in a condensed, structured XML format and could therefore be downloaded manually. For election period 19 (which ended in 2021), the web page structure changed, so the scraper was implemented in the same way as the scrapers for the magazines' static HTML pages.
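The two acquisition routes described above (ready-made XML files for periods 14 to 18, HTML scraping for period 19) can be illustrated with a minimal Python sketch. It is not the repository's crawler code. The XML element names (`WAHLPERIODE`, `DOKUMENTART`, `NR`, `DATUM`, `TEXT`) are an assumption modeled on the Bundestag open-data files, and the functions `parse_xml_document` and `extract_document_links` and the class `DocumentLinkParser` are hypothetical helpers. Only the standard library is used, and no network requests are made.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from urllib.parse import urljoin


# ASSUMPTION: element names mirror the Bundestag open-data XML files;
# adjust them to the structure of the files you actually download.
def parse_xml_document(xml_text: str) -> dict:
    """Extract metadata and full text from one condensed XML document (periods 14-18)."""
    root = ET.fromstring(xml_text)
    return {
        "period": int(root.findtext("WAHLPERIODE", default="0")),
        "doc_type": root.findtext("DOKUMENTART", default=""),
        "number": root.findtext("NR", default=""),
        "date": root.findtext("DATUM", default=""),
        "text": root.findtext("TEXT", default="").strip(),
    }


class DocumentLinkParser(HTMLParser):
    """Hypothetical helper: collect links to document files from a static HTML page (period 19)."""

    def __init__(self, base_url: str, suffixes=(".pdf", ".xml")):
        super().__init__()
        self.base_url = base_url
        self.suffixes = suffixes
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Keep only links that point to document files (case-insensitive suffix match).
        if href and href.lower().endswith(self.suffixes):
            # Resolve relative links against the URL of the listing page.
            self.links.append(urljoin(self.base_url, href))


def extract_document_links(html_text: str, base_url: str) -> list[str]:
    """Return the absolute URLs of all document links found in html_text."""
    parser = DocumentLinkParser(base_url)
    parser.feed(html_text)
    parser.close()
    return parser.links
```

In practice, the XML route only needs parsing of files that were already downloaded, while the HTML route first fetches a listing page and then downloads each extracted link. Keeping the link extraction separate from the downloading makes it easy to test against saved HTML pages.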
German Politics Data Set
The raw data from the political sources can be accessed via the following OSF project. It includes the DRS and PP files from legislative periods 14 to 19 of the German parliament.
https://osf.io/j29vw/?view_only=9da84a0d25a54c459ce5790c9b4a76d7
Further Information
For more information on the data crawling procedure for the five business data sources, please contact the corresponding author.
Affiliation
Machine Learning and Data Analytics Lab
Friedrich-Alexander-Universität Erlangen-Nürnberg
Department Artificial Intelligence in Biomedical Engineering
Carl-Thiersch-Str. 2b
91054 Erlangen
Mail (corresponding author): [email protected]