Latest update: 18 September 2024
This repo creates datestamped HTML reports with corresponding Excel files listing all Wikipedia articles (in all languages) in which (one or more) images/media from a given category tree on Wikimedia Commons are used.
Here is a quick example of such an HTML report and its corresponding Excel file for images from the collection of the KB, national library of the Netherlands. It is datestamped 04-09-2024.
The KB uses the 'classical' GLAMorous tool to measure the use of KB media files (as stored in Wikimedia Commons) in Wikipedia articles. This tool reports 4 things:
- 1 - The total number of KB media files in Category:Media contributed by Koninklijke Bibliotheek (Category "Media contributed by Koninklijke Bibliotheek" has XXXX files.)
- 2 - The number of Wikipedia language versions in which KB media files are used (length of the table, omitting non-language Wikipedias, such as 'outreach.wikipedia', 'simple.wikipedia' or 'incubator.wikipedia')
- 3 - The total number of times that these images show up in Wikipedia articles, in all language versions. (Total image usages).
- 4 - The number of unique KB media files that are used in Wikipedia articles in all those languages. (Distinct images used)
Please note: 'Total image usages' does NOT equal the number of unique Wikipedia articles! A single image can illustrate multiple articles and, conversely, a single article can contain multiple distinct images. In other words: images and articles have a many-to-many relationship.
What was still missing was functionality to provide:
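To make the difference between these key figures concrete, here is a minimal Python sketch. The file names, article titles and the `key_figures` helper are made up for illustration and are not part of GLAMorous or this repo. It also assumes that an article is identified by its project plus its title, and that only the language count omits non-language Wikipedias.

```python
# Hypothetical usage records: (Commons file, Wikipedia project, article title).
usages = [
    ("Atlas_page_1.jpg", "nl.wikipedia", "Atlas"),
    ("Atlas_page_1.jpg", "en.wikipedia", "Atlas"),
    ("Atlas_page_1.jpg", "nl.wikipedia", "Cartografie"),
    ("Portrait_A.jpg", "nl.wikipedia", "Atlas"),
    ("Portrait_A.jpg", "outreach.wikipedia", "GLAM"),
]

# Non-language Wikipedias left out of the language count (examples from the text above).
NON_LANGUAGE = {"outreach.wikipedia", "simple.wikipedia", "incubator.wikipedia"}


def key_figures(records):
    """Compute the key figures for a list of (image, project, title) records."""
    return {
        # Every (image, article) pair counts once.
        "total_image_usages": len(records),
        # Each image counts once, however many articles use it.
        "distinct_images_used": len({image for image, _, _ in records}),
        # Each article counts once, however many images it contains.
        "unique_articles": len({(project, title) for _, project, title in records}),
        # Assumption: only the language count skips non-language Wikipedias.
        "language_versions": len(
            {project for _, project, _ in records if project not in NON_LANGUAGE}
        ),
    }


figures = key_figures(usages)
print(figures)
```

In this toy data set there are 5 image usages but only 4 unique articles, because the Dutch article "Atlas" contains two of the images.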
- 5 - The number of unique Wikipedia articles in which KB media files are used,
- 6 - A manifest overview of those articles, grouped per Wikipedia language version,
- 7 - A structured output format that can be easily processed by tools, such as CSV or Excel files.
Bulk/group functionalities:
- 8 - A method to generate these reports in bulk, so for multiple Commons category trees at once (with one report per category tree).
- 9 - Aggregated data and key figure statistics for sets of reports, e.g. for grouped reports from a specific country.
That is why we developed the GLAMorousToHTML tool. It takes the XML output of the GLAMorous tool and processes that data into HTML reports and Excel files.
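As a rough illustration of this kind of processing, the sketch below parses a small XML snippet, groups the unique articles per Wikipedia language version, and renders the result as CSV and as an HTML fragment. It is not the actual GLAMorousToHTML code. The XML sample is made up and simplified, and its element and attribute names (`image`, `project`, `page`, `name`, `title`) are assumptions that may differ from the real GLAMorous output. The function names are hypothetical too. CSV is used because it only needs the standard library; writing real Excel files would require a third-party package.

```python
import csv
import html
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# ASSUMPTION: made-up, simplified XML. The real GLAMorous output may use
# different element and attribute names.
SAMPLE_XML = """
<results>
  <image name="Atlas_page_1.jpg">
    <project name="nl.wikipedia">
      <page title="Atlas"/>
      <page title="Cartografie"/>
    </project>
    <project name="en.wikipedia">
      <page title="Atlas"/>
    </project>
  </image>
  <image name="Portrait_A.jpg">
    <project name="nl.wikipedia">
      <page title="Atlas"/>
    </project>
  </image>
</results>
"""


def articles_per_project(xml_text):
    """Map each project to the sorted, de-duplicated article titles that use any image."""
    grouped = defaultdict(set)
    root = ET.fromstring(xml_text.strip())
    for image in root.iter("image"):
        for project in image.iter("project"):
            for page in project.iter("page"):
                grouped[project.get("name")].add(page.get("title"))
    return {name: sorted(titles) for name, titles in sorted(grouped.items())}


def to_csv(grouped):
    """Render the grouping as CSV text with one row per (project, article)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["project", "article"])
    for project, titles in grouped.items():
        for title in titles:
            writer.writerow([project, title])
    return buffer.getvalue()


def to_html(grouped):
    """Render the grouping as an HTML fragment with one list per project."""
    parts = []
    for project, titles in grouped.items():
        items = "".join(f"<li>{html.escape(title)}</li>" for title in titles)
        parts.append(f"<h2>{html.escape(project)} ({len(titles)})</h2><ul>{items}</ul>")
    return "\n".join(parts)


grouped = articles_per_project(SAMPLE_XML)
csv_text = to_csv(grouped)
html_text = to_html(grouped)
print(csv_text)
print(html_text)
```

Collecting titles in a set per project is what turns image usages into unique articles: the Dutch article "Atlas" uses two images in the sample but appears only once in the output.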
The GLAMorousToHTML tool has so far produced GLAM reports for the following heritage institutions, countries and regions:
- KB, national library of the Netherlands
- The Netherlands
- Nordic European countries
- United States of America
- Australia and New Zealand
When interpreting these reports, take note of
- the structure of the reports and Excel files,
- who contributed the images,
- the accuracy of category trees and
- image thumbnails & template contamination.
- A first article about the NDE reports will be published a.s.a.p. (September 2024)
- Public outreach and reuse of KB images via Wikipedia, 2014-2022 (December 2022). This article is also available as a PDF.
The technical notes give more info about
- The structure of this repo, its files and folders
- Short description of their functions
- How to run this repo yourself
- Change log
- Features to be added
Please note that this page is still under construction and is therefore messy and incomplete.
All original materials in this repo, except for the flags, logos and publications, are released under the CC0 1.0 Universal license, effectively donating all original content to the public domain.
For the publications listed above, see each article for its exact licensing conditions.
This tool is developed and maintained by Olaf Janssen, Wikimedia coordinator @KB, national library of the Netherlands. You can find his contact details on his KB expert page or via his Wikimedia user page.
If you are interested in getting reports for your own GLAM institution, please send me a message.