photomedia/citationDataEnrichTransform

 
 


Data Pipeline Description

This set of scripts enriches citation data with links to DBpedia and Wikidata, then transforms the enriched data into a dynamic GML graph.

Data Source

The citation data for these scripts was downloaded from https://sites.google.com/site/vispubdata/home

It contains information on IEEE Visualization (IEEE VIS) publications from 1990 to 2020.

Data Citation: Petra Isenberg, Florian Heimerl, Steffen Koch, Tobias Isenberg, Panpan Xu, Chad Stolper, Michael Sedlmair, Jian Chen, Torsten Möller, and John Stasko. vispubdata.org: A Metadata Collection about IEEE Visualization (VIS) Publications. IEEE Transactions on Visualization and Computer Graphics, 23(9):2199–2206, September 2017. (doi: 10.1109/TVCG.2016.2615308)

Data Pipeline Summary

The data pipeline is as follows:

Download from https://sites.google.com/site/vispubdata/home

publications.csv --> Transform to JSON using CSVtoJSON.py

publications.json --> Enrich with DBpedia and Wikidata links using get-concepts.py

enriched-publications.json --> Transform JSON to XML using JSONtoXML.py

↓ enriched-publications.xml (Intermediate file generated by JSONtoXML.py script) ↓

enriched-publications-eprints-model.xml --> Transform to a dynamic co-concept graph in GML format using Pig Latin script eprints-items-publications-date-merged-edges.pig

OUTPUT/merged-file-co_node-dynamic-gml-with_edge_labels-withheader.gml --> Open directly with Gephi, apply layout and visual mappings, save and export renders
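The two format-conversion steps above (CSV to JSON, then JSON to XML) can be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the actual code of CSVtoJSON.py or JSONtoXML.py. The column names (`Title`, `Year`), the sample rows, and the XML element names (`publications`, `publication`, `field`) are all hypothetical and do not reflect the real vispubdata columns or the eprints model.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(csv_text):
    """Parse CSV text into a list of dicts, one per publication row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def records_to_xml(records):
    """Build an XML tree with one <publication> element per record.

    Each column becomes <field name="...">, which avoids problems with
    column names that are not valid XML tag names.
    """
    root = ET.Element("publications")
    for record in records:
        pub = ET.SubElement(root, "publication")
        for key, value in record.items():
            field = ET.SubElement(pub, "field", name=key)
            field.text = str(value)
    return root


# Hypothetical sample input standing in for publications.csv.
SAMPLE_CSV = "Title,Year\nA Hypothetical Paper,1995\nAnother Paper,2001\n"

records = csv_to_records(SAMPLE_CSV)
json_text = json.dumps(records, indent=2)  # stands in for publications.json
xml_text = ET.tostring(records_to_xml(json.loads(json_text)), encoding="unicode")
print(xml_text)
```

The enrichment step would sit between the JSON and XML stages, adding concept links to each record before serialization.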

Interactive visualization

The sigma export produces interactive visualizations that allow online interaction with the graph (search, community display, zoom/pan, etc.). The results are here:

Renders Folder

The renders folder contains exported PNG files of visualizations of the GML graph:
https://github.com/photomedia/citationDataEnrichTransform/tree/main/renders

  • Giant component, node size mapped to betweenness centrality (BC) on a spline
  • Giant component, node size mapped to BC on a spline, filter nodes with BC greater than 0.01
  • Giant component, node size mapped to BC on a spline, filter nodes with degree greater than 10
  • Giant component, node size mapped to BC on a spline, filter only concepts that are related by publications from more than one conference
  • Temporal filters
    • Filter leaving only concept relations that span 25 years or longer
    • Filter by time: 1990-2000, 2000-2010, 2010-2020
    • Filter by time (1990-2000, 2000-2010, 2010-2020) and duration of concept relations less than 10 years
    • Filter by time (2015-2020) and duration of concept relations less than 5 years
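The temporal filters above depend on how long each concept relation lasts. The sketch below shows one way to derive that duration from enriched publications and filter on it. It is an illustration only, not the logic of the Pig Latin script or of Gephi's filters. It assumes that a relation's duration is the last co-occurrence year minus the first, and the record keys (`year`, `concepts`) and sample concepts are hypothetical.

```python
from itertools import combinations


def co_concept_spans(publications):
    """Map each unordered concept pair to the (first, last) year it co-occurs."""
    spans = {}
    for pub in publications:
        year = pub["year"]
        # Sorting makes ("A", "B") and ("B", "A") the same key.
        for pair in combinations(sorted(set(pub["concepts"])), 2):
            first, last = spans.get(pair, (year, year))
            spans[pair] = (min(first, year), max(last, year))
    return spans


def filter_by_duration(spans, min_years=None, max_years=None):
    """Keep pairs whose duration is >= min_years and < max_years.

    Duration is assumed to be last year minus first year.
    """
    kept = {}
    for pair, (first, last) in spans.items():
        duration = last - first
        if min_years is not None and duration < min_years:
            continue
        if max_years is not None and duration >= max_years:
            continue
        kept[pair] = (first, last)
    return kept


# Hypothetical enriched records.
publications = [
    {"year": 1990, "concepts": ["Volume rendering", "Isosurface"]},
    {"year": 2016, "concepts": ["Isosurface", "Volume rendering", "Graph drawing"]},
    {"year": 2018, "concepts": ["Graph drawing", "Isosurface"]},
]

spans = co_concept_spans(publications)
long_lived = filter_by_duration(spans, min_years=25)   # 25 years or longer
short_lived = filter_by_duration(spans, max_years=5)   # less than 5 years
print(long_lived)
print(sorted(short_lived))
```

Under these assumptions, only the Isosurface / Volume rendering relation (1990 to 2016) survives the 25-year filter, while the two relations involving Graph drawing pass the under-5-years filter.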

About

Enriches documents with wikidata entities


Languages

  • JavaScript 77.0%
  • Python 9.1%
  • CSS 7.6%
  • HTML 3.6%
  • PigLatin 2.7%