(as part of the SBP-BRiMS2016 conference - June 28-July 1, 2016)
Open source data, such as news, public data sets, and social media, are often used to understand human socio-cultural and political-economic behavior. The vast volume of open source information presents great opportunities, giving ever-increasing access to the real-time behavior and beliefs of the world's citizenry, but it also requires a fundamental shift in how this information is understood and used. The concept of quantifying human behavior into discrete "occurrences", "moments", or "networks of relations" underlies much of the social and policy sciences; today, however, the sheer volume of data that can be analyzed in such forms has escalated.
Fundamental research problems exist in how to fuse data, how to identify the relevant portions of the data, how to assess change in the data, how to sample the data, and how to visualize the data. These issues must be met to advance social theorizing and improve policy analysis. This year’s SBP-BRiMS challenge problem invites you to take part in addressing one or more of these challenges.
Using at least one of four political event datasets (GDELT, KEDS, ICEWS, Phoenix) and one other dataset (which may be a second one of these event datasets, or any other relevant dataset), this year’s challenge problem asks participants to address any issue of interest to them or their team that involves events and their distribution over time or space. All entries must have both a strong social theory, political theory, or policy perspective and a strong methodology perspective.
Please email sbp-2016-grand-challenge-group@googlegroups.com with any questions or inquiries.
A strong entry will have one or more of these components:
In addition, a strong entry should be well-written and provide some level of creativity in its use of or combination of data.
What to Submit
A 2-page abstract describing the project. This should define:
When to Submit
This abstract is due on May 30, 2016 (extended from the original deadline of May 1, 2016).
How to Submit
All abstracts should be submitted through the EasyChair link for the challenge on the SBP-BRiMS 2016 website: https://easychair.org/conferences/?conf=sbpbrims2016. Please be sure to select the Challenge track.
What to Present
Each entry must send at least one team member to SBP-BRiMS to present the team's poster in the poster session on June 30. That team member must be registered for the conference by the early registration deadline. Participants may bring additional props to enhance their presentation.
NOTE: All challenge participants should:
How entries will be judged
Entries will be judged by community voting at the poster session.
Who is eligible
Anyone with an interest in using this data to address a social or policy issue. Entries are accepted from single individuals or teams.
The winning entry will be invited to send a team member to present the project at SBP-BRiMS 2017. In addition, the winning entry and selected other entries will submit a full paper to a special issue of Computational and Mathematical Organization Theory.
We invite participants to explore one or more of these datasets (GDELT, KEDS, ICEWS, or Phoenix) in conjunction with a second dataset, which may or may not be from this list. The datasets below consist of records of individual politically relevant events, from statements of support to military attacks. Each record includes information on the source and target of the action, its date and location, and information about the event itself, machine-coded from media reports.
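As a rough illustration of what "events and their distribution over time or space" can mean in practice, the sketch below counts events per month and country. The records, field names, country labels, and event codes are all hypothetical placeholders; they do not reproduce the actual schema of GDELT, KEDS, ICEWS, or Phoenix, each of which has its own file layout.

```python
from collections import Counter
from datetime import date

# Hypothetical event records. Field names and values are illustrative
# assumptions, not the real schema of any of the four datasets.
events = [
    {"date": date(2015, 1, 3), "source": "GOV", "target": "OPP",
     "country": "AAA", "event_code": "010"},
    {"date": date(2015, 1, 17), "source": "OPP", "target": "GOV",
     "country": "AAA", "event_code": "190"},
    {"date": date(2015, 2, 2), "source": "GOV", "target": "CIV",
     "country": "AAA", "event_code": "010"},
    {"date": date(2015, 1, 9), "source": "MIL", "target": "REB",
     "country": "BBB", "event_code": "190"},
]


def count_by_month_and_country(records):
    """Count events in each (year, month, country) cell."""
    counts = Counter()
    for rec in records:
        key = (rec["date"].year, rec["date"].month, rec["country"])
        counts[key] += 1
    return counts


counts = count_by_month_and_country(events)
for (year, month, country), n in sorted(counts.items()):
    print(f"{year}-{month:02d} {country}: {n}")
```

The same grouping key could be swapped for a finer spatial unit (for example a city) or a coarser time unit, depending on the question an entry is trying to answer.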
GDELT:
The GDELT database (http://blog.gdeltproject.org/the-datasets-of-gdelt-as-of-february-2016/) is divided into three core data streams: one capturing physical activity, one capturing counts of key incidents such as deaths, and a graph structure that combines the latent and physical aspects of the global news into a single computable network. GDELT’s event data consists of a quarter-billion geo-referenced dyadic “event records” covering all countries in the world from 1979 to the present, capturing who did what to whom, when, and where, coded in the CAMEO taxonomy. GDELT’s count data (2013-present) records mentions of counts with respect to a set of predefined categories, such as the number of protesters, the number killed, or the number displaced or sickened. GDELT’s Global Knowledge Graph (2013-present) is an attempt to connect the people, organizations, locations, counts, themes, news sources, and events appearing in the news media across the globe each day into a single massive network that captures what is happening around the world, what its context is, who is involved, and how the world is feeling about it.
KEDS:
The Kansas Event Data System, and its successor, the Penn State Event Data System, were toolchains built and applied by political scientist Philip Schrodt in developing some of the first computer-extracted event data sets. They utilize the TABARI parser, variants of which are used by GDELT and ICEWS. Dr. Schrodt’s webpage (http://eventdata.parusanalytics.com/data.html) contains several specific event datasets covering different regions and periods of interest, many with area-specific dictionaries. These datasets are generally older than the other datasets presented here, and were created in order to answer particular research questions, and thus have been subject to more study.
ICEWS:
ICEWS is the Integrated Crisis Early Warning System dataset, originally developed for the United States Department of Defense by Lockheed Martin. Like GDELT, ICEWS contains worldwide, daily political events machine-coded from media sources, excluding internal US events. The public version of the dataset is released approximately monthly, on a year’s delay, and is available on the Harvard Dataverse (https://dataverse.harvard.edu/dataverse/icews). ICEWS events date back to 1995, and each event includes source and target names, categories, and countries, as well as CAMEO codes, intensity scores, and geocoding down to the city level.
ICEWS also includes the Ground Truth Data Set. This dataset is expert-coded by month, and lists whether each country is undergoing an event of interest, such as an internal political crisis, insurgency, or external conflict.
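One way an entry might combine machine-coded events with a month-level, expert-coded label is to join the two on country and month and compare event volume across labeled and unlabeled months. The sketch below assumes both inputs have already been reduced to dictionaries keyed by (country, year, month); the keys, counts, and crisis flags are invented for illustration and are not drawn from ICEWS.

```python
# Hypothetical monthly event counts per (country, year, month).
monthly_event_counts = {
    ("AAA", 2014, 1): 12,
    ("AAA", 2014, 2): 30,
    ("BBB", 2014, 1): 5,
    ("BBB", 2014, 2): 7,
}

# Hypothetical expert-coded flags: True means the country is coded as
# undergoing an event of interest in that month.
ground_truth = {
    ("AAA", 2014, 1): False,
    ("AAA", 2014, 2): True,
    ("BBB", 2014, 1): False,
    ("BBB", 2014, 2): False,
}


def mean_count_by_label(counts, labels):
    """Average monthly event count for flagged and unflagged country-months."""
    totals = {True: [0, 0], False: [0, 0]}  # flag -> [sum of counts, months]
    for key, flag in labels.items():
        totals[flag][0] += counts.get(key, 0)  # missing months count as zero
        totals[flag][1] += 1
    return {flag: (s / n if n else None) for flag, (s, n) in totals.items()}


means = mean_count_by_label(monthly_event_counts, ground_truth)
print("flagged months:  ", means[True])
print("unflagged months:", means[False])
```

Treating a missing country-month as zero events is a simplifying assumption; in real data a gap may instead reflect missing coverage and would need separate handling.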
Phoenix:
The Phoenix Data Project (http://phoenixdata.org/) is the output of the Open Event Data Alliance, using an open-source pipeline of tools to code daily political event data from news sources. It updates daily, and includes actor country and role codes, CAMEO event codes, and event geocoding. Since this is a more recent dataset, it only begins in April 2014 (and has several missing days in 2014 before being available daily in 2015). However, it also has specific codes for contemporary actors of interest, such as ISIS. Events are also annotated with Issues, which provide additional event context.
Since the Phoenix pipeline is completely open source, users can deploy the tools themselves to collect events from custom media sources and streams. They can also modify the actor, event and issue dictionaries for their own use, or propose improvements to the main repository.