Data Glossary

Source: “Beyond Data Literacy: Reinventing Community Engagement and Empowerment in the Age of Data.” Data-Pop Alliance

Algorithms: In mathematics and computer science, an algorithm is a series of predefined instructions or rules – often written in a programming language intended for use by a computer – designed to define how to sequentially solve a recurrent problem through calculations and data processing. The use of algorithms for decision-making has grown in several sectors and services such as policing and banking.
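As a small illustration of "predefined instructions" applied step by step, the sketch below uses Euclid's algorithm for the greatest common divisor. The choice of example and the function name are ours, not the source's; it is meant only to show a fixed rule being repeated until the problem is solved.

```python
def greatest_common_divisor(a: int, b: int) -> int:
    """Euclid's algorithm: apply one fixed rule until the remainder is zero."""
    while b != 0:
        # Replace the pair (a, b) with (b, remainder of a divided by b).
        a, b = b, a % b
    return abs(a)


print(greatest_common_divisor(48, 18))  # 6
```

The same instructions work for any pair of integers, which is what makes the procedure reusable for a recurrent problem.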

Big Data: The ecosystem created by the concomitant emergence of ‘the 3 Cs of Big Data’:

• Digital Crumbs: pieces of data passively emitted and/or collected by digital devices, which constitute very large data sets and streams and contain unique insights about people’s behaviors and beliefs;

• Big Data Capacities: what has also been referred to as Big Data Analytics, that is, the set of tools and methods, hardware and software, know-how and skills necessary to process and analyse these new kinds of data, including visualization techniques, statistical machine learning and algorithms;

• Big Data Communities: the various actors involved in the Big Data ecosystem, from the generators of data to their analysts and end-users, i.e. potentially the whole population.

Civic technology: A type of technology that enables citizen engagement or makes government more accessible, effective, and efficient for the economic and social good of society. This specific type of technology helps to connect people to resources, ideas, and other people needed to improve their societies or communities.

Data: An object, variable, or piece of information that has the perceived capacity to be collected, stored, and identified. It comes largely in two forms: structured and unstructured.

Structured data are essentially answers to questions asked by the collector of data. They are generally easy to organize and identify, and have a strict hierarchy that is not easily manipulated (e.g. responses to a survey organized in a table format, or information about people’s years of education and income in a chart).

Unstructured data (such as photos, videos, tweets) are not readily amenable to automated analysis, are often used in ways that differ from the purpose intended when they were collected, and do not need to follow a hierarchical method of identification.
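The contrast between the two forms can be sketched in a few lines of Python. All values, field names and function names below are hypothetical and chosen only for illustration: the survey rows can be summarized directly because every record has the same fields, while the free-text posts have to be broken into words before anything can be counted.

```python
import re
from collections import defaultdict

# Structured data: every record answers the same questions (hypothetical values).
survey_rows = [
    {"respondent": 1, "years_of_education": 12, "income": 30000},
    {"respondent": 2, "years_of_education": 16, "income": 52000},
    {"respondent": 3, "years_of_education": 12, "income": 34000},
]


def average_income_by_education(rows):
    """Group incomes by years of education and average each group."""
    incomes = defaultdict(list)
    for row in rows:
        incomes[row["years_of_education"]].append(row["income"])
    return {years: sum(values) / len(values) for years, values in incomes.items()}


# Unstructured data: free text has no fixed fields (hypothetical posts).
posts = ["The new bus route is great!", "Bus was late again... so late."]


def word_counts(texts):
    """Split each text into lowercase words and count occurrences."""
    counts = defaultdict(int)
    for text in texts:
        for word in re.findall(r"[a-z']+", text.lower()):
            counts[word] += 1
    return dict(counts)


print(average_income_by_education(survey_rows))
print(word_counts(posts))
```

Counting words is only one of many possible ways to impose structure on text; the point is that some such step is needed before analysis.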

Data is also used as a policy concept and social phenomenon (e.g. “data is changing the world”), or as a shortcut for data ecosystems, Big Data, etc.

Data ecosystems: Complex adaptive systems that include data infrastructure, tools, media, producers, consumers, curators, and sharers. They are complex organizations of dynamic social relationships through which data/information moves and transforms in flows.

Data exhaust: Data that are passively emitted from cell phones, sensors, social media and other platforms as digital translations of human actions and interactions.

Data inclusion: The universal ability of people to create, control, access and use data.

Data journalism: A new form of journalism stimulated by the open data movement, in which stories are presented or supplemented through graphics or visualizations of analyzed datasets. These static or interactive graphics include databases, maps, diagrams, grids, charts and many other forms of illustrations that have transformed the look of mainstream news media.

Data literacy: The desire and ability to engage constructively in society through and with data.

Data modeling: Using existing datasets to infer current conditions or predict future outcomes. The process involves resolving complex relationships among datasets in order to understand what data means and how the elements relate.
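One simple instance of predicting a future outcome from an existing dataset is fitting a straight line by ordinary least squares and extending it forward. The sketch below assumes a roughly linear trend; the yearly figures and all names are hypothetical, and real data modeling usually involves far more careful choices than this.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x, and how x and y vary together.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical yearly observations (e.g. subscriptions per 100 people).
years = [2010, 2011, 2012, 2013]
subscriptions = [40.0, 45.0, 50.0, 55.0]

slope, intercept = fit_line(years, subscriptions)
predicted_2015 = slope * 2015 + intercept
print(slope, predicted_2015)
```

The prediction is only as good as the assumption that the past trend continues, which is a modeling decision rather than something the data can guarantee.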

Data Revolution: A term that has become mainstream in the policy and development discourse since the High-Level Panel of Eminent Persons on the Post-2015 Development Agenda called for a “Data Revolution” to “strengthen data and statistics for accountability and decision-making purposes”. It refers to the applications and implications of data as a social phenomenon. The term “Industrial Revolution of Data” was coined by computer scientist Joseph Hellerstein in 2008.

Data science: A field of research and practice that focuses on solving real-world problems using large amounts of data by combining skills from often distinct areas of expertise: math, computer science (hacking and coding), statistics, social science, and even storytelling or art.

Digital divide: The differential access and ability to use information and communications technologies between individuals, communities and countries — and the resulting socioeconomic and political inequalities.

Literacy: As defined by UNESCO, "the ability to identify, understand, interpret, create, communicate and compute, using printed and written materials associated with varying contexts. Literacy involves a continuum of learning in enabling individuals to achieve their goals, to develop their knowledge and potential, and to participate fully in their community and wider society."

Literacy in the age of data: See Literacy in a post-2015 world.

Open data: Data that is easily accessible, machine-readable, available for free or at negligible cost, and subject to minimal limitations on its use, transformation, and distribution.

Popular data: The practice of engaging, empowering and participatory approaches to data-driven presentation and decision-making (R. Bhargava).

Small data: Explicitly collected data, that is, data collected in the open, with notice, and on purpose. Small Data can be analyzed by interested laymen. Small Data doesn’t depend on technology-assisted analysis, but can engage it as appropriate (R. Bhargava).

(Statistical) Machine learning: A subset of data science, falling at the intersection of traditional statistics and machine learning. Machine learning refers to the construction and study of computer algorithms (step-by-step procedures used for calculations and classification) that can ‘learn’ when exposed to new data. This enables better predictions and decisions to be made based on what was experienced in the past, as with filtering spam emails. The addition of “statistical” reflects the emphasis on statistical analysis and methodology, which is the main approach to modern machine learning.
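The spam-filtering example can be sketched with a very small naive Bayes classifier, one common statistical approach among many. Everything below is an illustrative assumption: the class name, the training messages, and the add-one smoothing that reserves a slot for unseen words. It is a toy, not a production filter.

```python
import math
from collections import Counter


class NaiveBayesSpamFilter:
    """Toy word-count classifier that updates its counts with each new message."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = Counter()

    def learn(self, text, label):
        """Record one labeled message ('spam' or 'ham')."""
        self.message_counts[label] += 1
        self.word_counts[label].update(text.lower().split())

    def classify(self, text):
        """Return the label with the higher smoothed log-probability."""
        vocabulary = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            # Smoothed prior: how common this label has been so far.
            score = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in text.lower().split():
                count = self.word_counts[label][word]
                # Add-one smoothing, with one extra slot for unseen words.
                score += math.log((count + 1) / (total_words + len(vocabulary) + 1))
            scores[label] = score
        return max(scores, key=scores.get)


spam_filter = NaiveBayesSpamFilter()
spam_filter.learn("win free money now", "spam")
spam_filter.learn("free prize claim now", "spam")
spam_filter.learn("meeting at noon tomorrow", "ham")
spam_filter.learn("lunch at noon", "ham")

print(spam_filter.classify("free money prize"))
print(spam_filter.classify("lunch at noon tomorrow"))
```

Each call to `learn` changes the stored counts, so later classifications reflect what the filter has been exposed to, which is the sense of "learning" used in the definition.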