Grand Data Challenge
June 29, 2016, UCDC Center, Washington DC, USA

(as part of the SBP-BRiMS2016 conference - June 28-July 1, 2016)

Challenge Award Winner will be Announced on July 1, right before the lunch break.

Overview

Open Source data, such as news, public data sets, and social media, are often used to understand human socio-cultural and political-economic behavior. The vast volume of open source information poses great opportunities, giving ever-increasing access to the real-time behavior and beliefs of the world's citizenry, but also requires a fundamental shift in how this information is understood and utilized. The concept of quantifying human behavior into discrete “occurrences”, “moments”, or “networks of relations” underlies much of the social and policy sciences; however, today, the sheer volume of data that can be analyzed in such forms has escalated.

Fundamental research problems exist in how to fuse data, how to identify the relevant portions of the data, how to assess change in the data, how to sample the data, and how to visualize the data. These issues must be met to advance social theorizing and improve policy analysis. This year’s SBP-BRiMS challenge problem invites you to take part in addressing one or more of these challenges.

Using at least one of four political event datasets (GDELT, KEDS, ICEWS, Phoenix) and one other data set (which may be a second one of these event datasets, or any other relevant dataset), this year’s challenge problem asks participants to address any issue of interest to them or their team that involves events and their distribution over time or space. All entries must have both a strong social theory, political theory, or policy perspective and a strong methodology perspective.

Contact Information

Please email sbp-2016-grand-challenge-group@googlegroups.com with any questions or inquiries.

Rules

  • Participants may work individually or in teams.
  • Participants must use at least one of these four data sets - GDELT, KEDS, ICEWS, or Phoenix.
  • Participants should address a social science theoretical or policy relevant issue and should employ methodologies appropriate for the empirical assessment of big data (e.g., computational algorithms, machine learning, computer simulation, social network analysis, text mining).
  • Each participating team may prepare only one entry.
  • Entries must represent original work that has not been previously published or submitted to other challenges.
  • Each participating team must send at least one member to the SBP-BRiMS 2016 conference to present a poster describing their entry.
  • Participants are encouraged to use a second data set which may be one of GDELT, KEDS, ICEWS, or Phoenix, or a data set collected by or made available to the team. Participants are not required to make these data public; however, they are encouraged to do so if possible.
  • At the conference, all entries will be judged by the community using a participant voting system.
  • The individual or group that submits the winning entry will be invited to present a talk at the conference in 2017, and will contribute a full-length paper to an SBP-BRiMS special issue of the journal Computational and Mathematical Organization Theory.

Guidelines

A strong entry will have one or more of these components:

  • Employ multiple data sets.
  • Include a high-quality visualization. Note that participants will be allowed to display dynamic visualizations via some form of electronic media; however, tables will not be provided.
  • Account for biases in the data across time, space, topics and sources.
  • Provide a new metric or algorithmic development, such as:
    • New spatial, temporal, and network analytic methodologies and algorithms that can cope with the vast scale of open source data (e.g. GDELT contains >1B location mentions) and support answering a key social or policy issue.
    • New spatial analytic methodologies that can better take into account change over time and non-spatial distances (such as co-occurrences and semantic similarity between locations).
    • New network methodologies that better incorporate the diversity of actor and relationship types in the data or spatio-temporal information, or that construct edges from the data and distribute actor and edge attributes onto the graph.
  • Generate a new empirical finding that challenges or provides novel support for existing social or political theory.

In addition, a strong entry should be well-written and provide some level of creativity in its use of or combination of data.
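
As one illustration of the "non-spatial distances" idea in the guidelines above, the following minimal Python sketch derives a distance between locations from how often they are mentioned in the same documents. It is only a sketch under stated assumptions: the choice of Jaccard distance, the function names, and the sample (document id, location) pairs are all hypothetical and are not prescribed by the challenge or taken from any of the datasets.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Return 1 - |intersection| / |union|; 0.0 when both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def location_documents(mentions):
    """Map each location to the set of document ids that mention it."""
    docs = {}
    for doc_id, location in mentions:
        docs.setdefault(location, set()).add(doc_id)
    return docs


# Made-up (document id, location) pairs, for illustration only.
mentions = [
    (1, "X"), (1, "Y"),
    (2, "X"), (2, "Y"),
    (3, "X"), (3, "Z"),
]

docs = location_documents(mentions)

# Pairwise co-occurrence distance between every pair of locations.
distances = {
    (p, q): jaccard_distance(docs[p], docs[q])
    for p, q in combinations(sorted(docs), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

Locations that are always mentioned together get a distance near 0, and locations that never share a document get a distance of 1, regardless of how far apart they are geographically.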

Submitting an Entry

What to Submit

A 2-page abstract describing the project. This should define:

  • What social/policy question was asked or challenge addressed?
  • Why is this question important or the challenge critical?
  • What data sets were used?
  • What is the novel contribution?
  • What is the key methodology or methodologies used?
  • Who is the team? Provide names, email and institution.

When to Submit

This abstract is due on May 30, 2016 (extended from May 1, 2016).

How to Submit

All abstracts should be submitted through the EasyChair link for the challenge on the SBP-BRiMS 2016 website (https://easychair.org/conferences/?conf=sbpbrims2016). Please be sure to select the Challenge track.

What to Present

Each entry must send at least one team member to SBP-BRiMS to present the poster in the poster session on June 30. That team member must be registered for the conference by the early registration deadline. Participants may bring in additional props to enhance their presentation.

NOTE: All challenge participants should:

  • Prepare 1 slide to give a 1-minute "come see our poster" talk.
  • Prepare a PDF of the poster.
  • Send the PDF of the poster and the single slide (as a PowerPoint slide) to sbp-brims@andrew.cmu.edu
  • Prepare a poster. Any format is fine. However, the size must be 3'x4' or 4'x3'.
  • You should print your own poster and bring it.
  • You will be asked to thumbtack your poster to a wall. Thumbtacks will be provided and the walls are cloth.
  • You also have the option of bringing your own easel and your own poster on a poster board. If you want your poster on a poster board, you must mount it on the board yourself.
  • You are also responsible for removing your poster after the session.

How entries will be judged

Entries will be judged by community voting at the poster session.

Who is eligible

Anyone with an interest in using this data to address a social or policy issue. Entries are accepted from single individuals or teams.

Winning Entry

The winning entry will be invited to send a team member to present the project at SBP-BRiMS 2017. In addition, the winning entry and selected other entries will submit a full paper to a special issue of Computational and Mathematical Organization Theory.

Data Sets

We invite participants to explore one or more of these data sets: GDELT, KEDS, ICEWS or Phoenix in conjunction with a second data set which may or may not be from this list. The datasets below consist of records of individual politically-relevant events, from statements of support to military attacks. Each record includes information on the source and target of each action, its date and location, and information about the event itself, machine-coded from media reports.
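
The record structure described above, and the challenge's focus on how events are distributed over time, can be pictured with a minimal Python sketch. The `EventRecord` class, its field names, and the sample values are hypothetical; they do not follow the actual schema of GDELT, KEDS, ICEWS, or Phoenix, whose column layouts should be taken from each dataset's own documentation.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class EventRecord:
    """Hypothetical, simplified event record; field names are illustrative."""
    source: str       # actor initiating the action
    target: str       # actor receiving the action
    event_date: date  # date of the event
    location: str     # where the event took place
    event_code: str   # event type code (placeholder values below)


def monthly_counts(events):
    """Count events per (year, month) to show their distribution over time."""
    return Counter((e.event_date.year, e.event_date.month) for e in events)


# Made-up records for illustration; not drawn from any of the datasets.
events = [
    EventRecord("A", "B", date(2015, 1, 3), "X", "code1"),
    EventRecord("B", "A", date(2015, 1, 20), "X", "code2"),
    EventRecord("A", "C", date(2015, 2, 7), "Y", "code1"),
]

counts = monthly_counts(events)
for (year, month), n in sorted(counts.items()):
    print(f"{year}-{month:02d}: {n}")
```

The same grouping could be keyed on location or on the (source, target) pair instead of the month to look at the spatial or dyadic distribution of events.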

GDELT:

The GDELT database (http://blog.gdeltproject.org/the-datasets-of-gdelt-as-of-february-2016/) is divided into three core data streams, capturing physical activity, counts of key incidents like death, and a graph structure capturing the latent and physical aspects of the global news into a single computable network. GDELT’s event data consists of a quarter-billion geo-referenced dyadic “event records” covering all countries in the world from 1979 to present, capturing who did what to whom, when, and where in the CAMEO taxonomy. GDELT’s count data (2013-present) records mentions of counts of things with respect to a set of predefined categories such as the number of protesters, the number killed, or the number displaced or sickened. GDELT’s Global Knowledge Graph (2013-present) is an attempt to connect the people, organizations, locations, counts, themes, news sources, and events appearing in the news media across the globe each day into a single massive network that captures what’s happening around the world, what its context is and who’s involved, and how the world is feeling about it, each day.

KEDS:

The Kansas Event Data System, and its successor, the Penn State Event Data System, were toolchains built and applied by political scientist Philip Schrodt in developing some of the first computer-extracted event data sets. They utilize the TABARI parser, variants of which are used by GDELT and ICEWS. Dr. Schrodt’s webpage (http://eventdata.parusanalytics.com/data.html) contains several specific event datasets covering different regions and periods of interest, many with area-specific dictionaries. These datasets are generally older than the other datasets presented here, and were created in order to answer particular research questions, and thus have been subject to more study.

ICEWS:

ICEWS is the Integrated Crisis Early Warning System dataset, originally developed for the United States Department of Defense by Lockheed Martin. Like GDELT, ICEWS contains worldwide, daily political events machine-coded from media sources, excluding internal US events. The public version of the dataset is released approximately monthly, on a year’s delay, and is available on the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/icews). ICEWS events date back to 1995, and each event includes source and target names, categories, and countries, as well as CAMEO codes, intensity scores, and geocoding down to the city level.

ICEWS also includes the Ground Truth Data Set. This dataset is expert-coded by month, and lists whether each country is undergoing an event of interest, such as an internal political crisis, insurgency, or external conflict.

Phoenix:

The Phoenix Data Project (http://phoenixdata.org/) is the output of the Open Event Data Alliance, using an open-source pipeline of tools to code daily political event data from news sources. It updates daily, and includes actor country and role codes, CAMEO event codes, and event geocoding. Since this is a more recent dataset, it only begins in April 2014 (and has several missing days in 2014 before being available daily in 2015). However, it also has specific codes for contemporary actors of interest, such as ISIS. Events are also annotated with Issues, which provide additional event context.

Since the Phoenix pipeline is completely open source, users can deploy the tools themselves to collect events from custom media sources and streams. They can also modify the actor, event and issue dictionaries for their own use, or propose improvements to the main repository.

2015 Challenge Winners