17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation
September 18-20, 2024, Hybrid

Challenge 2: Synthetic Social Media Data Creation Challenge

Overview

Increasingly, synthetic data is being proposed as the preferred data source for social media training, planning, and tool evaluation. Synthetic social media data is beneficial in these educational, planning, and tool evaluation contexts for several reasons:

  • Data Privacy: Synthetic data can be generated without compromising user privacy. This is particularly important when dealing with sensitive data such as Personally Identifiable Information (PII) or Personal Health Information (PHI), which are subject to strict privacy regulations.
  • Data Availability: Synthetic data can be created to represent specific scenarios or conditions that may not be readily available in real-world data. This is especially useful when the data needed for training is rare or expensive to collect.
  • Bias Reduction: Training on synthetic data can help reduce bias present in real-world datasets. This is crucial for building fair and unbiased machine learning models.
  • Cost-Effectiveness: Generating synthetic data is often less expensive than collecting and labeling large real-world datasets.
  • Performance: In certain situations, machine learning models trained on synthetic data can outperform models trained on real data.

Large language models can be used to rapidly develop synthetic social media data with negligible human input. This allows large volumes of content to be created in a fraction of the time that traditional, manual methods would take.

However, large language models have limitations in creating such data, including an inability to reproduce broken, slang, or abusive language; hallucinations; and difficulty operating at scale. The goal of this challenge problem is to advance the state of the art in generating realistic and usable social media data for educational, planning, and tool evaluation purposes.

Challenge:

Each team should create one entry, consisting of 10,000 tweets. Using the scenario context below, teams should generate tweets like those one would expect to see if the event were actually occurring. Tweets should have the following baseline features:

  • Submitted in the Twitter/X API v2 JSON format.
  • Include between 300 and 500 synthetic user accounts.
  • Tweet text should be in English.
  • Tweets should be dated in the days leading up to and shortly after the scenario event; i.e., all synthetic tweets should have dates between May 30th and June 3rd, 2024.
  • Synthetic tweets should come from a set of actors, including but not limited to those described below. When a username is given, use it exactly as given.
  • Synthetic tweets can include hashtags, including, but not limited to, those listed below. When using a predefined hashtag, enter it exactly as shown.
  • Synthetic tweets can reference, quote, or paraphrase any of the material in the scenario description.
  • If synthetic tweets include URLs, they should link only to the URLs that are provided.
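
The baseline features above are concrete enough to check mechanically before a trial submission. The Python sketch below is one way to do that. It assumes the entry is a single JSON object shaped like a Twitter/X API v2 response, with tweets under "data" and the synthetic accounts under "includes" -> "users"; the exact file layout expected for submission is not specified here, so treat that shape, the function names, and the URL allow-list parameter as assumptions.

```python
from __future__ import annotations

from datetime import datetime, timezone

# Limits copied from the baseline features. The date window is taken literally
# (May 30 to June 3, 2024, UTC); adjust it if the scenario file says otherwise.
EXPECTED_TWEETS = 10_000
MIN_USERS, MAX_USERS = 300, 500
WINDOW_START = datetime(2024, 5, 30, tzinfo=timezone.utc)
WINDOW_END = datetime(2024, 6, 4, tzinfo=timezone.utc)  # exclusive, so all of June 3 is allowed


def parse_created_at(value: str) -> datetime:
    """Parse an API v2 timestamp such as "2024-06-01T14:03:22.000Z"."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def check_entry(entry: dict, allowed_urls: set[str]) -> list[str]:
    """Return human-readable problems; an empty list means every check passed.

    Assumes tweets live under "data" and accounts under "includes" -> "users".
    """
    problems: list[str] = []
    tweets = entry.get("data", [])
    users = entry.get("includes", {}).get("users", [])

    if len(tweets) != EXPECTED_TWEETS:
        problems.append(f"expected {EXPECTED_TWEETS} tweets, found {len(tweets)}")

    user_ids = {user["id"] for user in users}
    if not MIN_USERS <= len(user_ids) <= MAX_USERS:
        problems.append(f"{len(user_ids)} user accounts, outside {MIN_USERS}-{MAX_USERS}")

    for tweet in tweets:
        tid = tweet.get("id", "?")
        if tweet.get("author_id") not in user_ids:
            problems.append(f"tweet {tid}: author_id not found in includes.users")
        if tweet.get("lang") != "en":
            problems.append(f"tweet {tid}: lang is {tweet.get('lang')!r}, expected 'en'")
        created_at = tweet.get("created_at")
        if created_at is None or not WINDOW_START <= parse_created_at(created_at) < WINDOW_END:
            problems.append(f"tweet {tid}: created_at {created_at!r} is outside the window")
        # Only URLs from the organizers' list are allowed; compare the expanded form.
        for url in tweet.get("entities", {}).get("urls", []):
            if url.get("expanded_url") not in allowed_urls:
                problems.append(f"tweet {tid}: URL {url.get('expanded_url')!r} is not on the provided list")
    return problems
```

Load the entry with json.load and print each string that check_entry returns. An empty list only means the entry meets these structural requirements; it says nothing about how MOMUS will score its emotional, semantic, or syntactic realism.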

All challenge entries will be assessed by the MOMUS validation system for social media. MOMUS evaluates synthetic data for diverse platforms using over 200 indicators. For this challenge, we will use the MOMUS indicators for primary emotions, semantic features, and syntactic features.

For the convenience of synthetic data generators, the scenario information below is also provided in a JSON file entitled "scenario description." A few pre-generated images and news articles that can be used in the synthetic data will also be provided within the month.

Each team may submit its synthetic corpus twice: first to receive MOMUS feedback, and second as its final challenge contribution. The trial submission can be made at any time, and the MOMUS feedback can be used to improve the final submission. To submit a synthetic corpus, zip the JSON file and send it to kathleen.carley@cs.cmu.edu.
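
Packaging the corpus for e-mail needs only Python's standard zipfile module. This minimal sketch assumes the corpus is a single JSON file; the function name and file names are hypothetical, and sending the e-mail is left as a manual step.

```python
import zipfile
from pathlib import Path


def zip_submission(json_path: str, zip_path: str) -> Path:
    """Compress one JSON corpus file into a .zip archive for e-mail submission."""
    src = Path(json_path)
    out = Path(zip_path)
    with zipfile.ZipFile(out, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # arcname keeps only the file name, so local directory paths are not stored
        zf.write(src, arcname=src.name)
    return out
```

Storing only the bare file name means the recipient sees one JSON file when the archive is opened, regardless of where it lived on the sender's machine.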

Scenario:

Overview:

In 2040, the Arctic ice caps have melted, leading to increased maritime and air traffic to Arkhangelsk Oblast. Heliexpress has announced a new series of helicopter tours from Arkhangelsk Oblast, with routes flying over Kong Karls Land. This surge in traffic has sparked concerns among environmentalists, who fear it may threaten the polar bears, walruses, and other wildlife inhabiting the Nordaust-Svalbard Nature Reserve. The area, previously a polar desert, has recently seen a proliferation of deciduous plants, and there are concerns that helicopter landings in the region could damage this burgeoning forestation. In response, a large-scale protest against Heliexpress is scheduled for June 1, 2040.

Grass Roots Environmental Organization:

The environmental group "If Not Now, Then When?" (INNW) comprises a widespread network of activists around the globe, with significant concentrations in Australia, the United States (particularly the Pacific Northwest), Ireland, and the UK. Kaiara Willowbank, a prominent grassroots blogger for INNW, recently published a blog post addressing the issue at hand.

In her post, Willowbank criticizes the Norwegian Government and its President for their silence on the matter. She argues that in such challenging times, it is crucial for leaders and nations to take a firmer stance toward countries and companies, such as Russia and Heliexpress LTD, that seek to exploit the situation to their advantage.

Grass Roots Primary Source:

"In these critical moments, silence is not just absence—it's acquiescence. It's essential that our world leaders, including the Norwegian Government, rise to the challenge and confront those who view our environmental crises as opportunities for exploitation. Russia and Heliexpress LTD are just the tip of the iceberg. We need action, commitment, and transparency, now more than ever. Hold Norway to task, #ShameOnNorway," stated Kaiara Willowbank, a vocal advocate and blogger for "If Not Now, Then When?".

International Environmental Organization:

EcoVanguard Solutions, an international NGO, focuses on environmental issues in the Arctic Sea region, particularly pollution, given the area's increased activity in recent years. Anya Chatterjee-Smith, the Chief Communications Officer of its Arctic Sea Division, has publicly criticized Heliexpress LTD for not being transparent about how it plans to mitigate its impact on endangered species populations. She has also openly condemned Russia and Igor Petrovich Kuznetsov, the Governor of Arkhangelsk Oblast, for their disregard of the region's escalating environmental challenges.

International Environmental Organization, Primary Source:

"As the Chief Communications Officer of EcoVanguard Solutions' Arctic Sea Division, I must express our profound disappointment in the lack of transparency and concern from Heliexpress LTD, the Russian government, and particularly Governor Igor Petrovich Kuznetsov of Arkhangelsk Oblast. Their disregard for the critical environmental issues facing the Arctic Sea region, especially the threat to endangered species, is unacceptable. Immediate action and open dialogue are essential to address these pressing challenges effectively. We need to work together to make sure our children and grandchildren have A Greener Tomorrow™" – Public Statement from Anya Chatterjee-Smith, CCO of EcoVanguard Solutions Arctic Sea Division.

Environmental Economist Professor:

Rowan Emerson, a socio-ecologist and environmental economist, recently spoke on Planetwise Broadcast Radio (PBR), emphasizing that EcoVanguard Solutions should collaborate with grassroots organizations like "If Not Now, Then When?" (INNW). He noted that although INNW may lack the funding of larger organizations, it has a broader base of support and can mobilize more voices. Emerson argued that EcoVanguard's criticism of the Governor of Arkhangelsk Oblast and the President of Heliexpress, while excluding the Norwegian government, indicates a disconnect from the wider environmental movement, whose perspective INNW clearly demonstrates.

Environmental Economist Professor, Primary Source:

"Rowan Emerson criticizes EcoVanguard Solutions for their narrow focus on figures like the President of Heliexpress and Governor Igor Kuznetsov, overlooking the potential of grassroots mobilization through 'If Not Now, Then When?' and the need to engage with the Norwegian government and its president. 'True environmental progress demands that we harness grassroots energy and direct our advocacy towards all pivotal actors, including those at the highest levels of government. By sidelining groups like INNW and not mobilizing against broader targets such as Norway's leadership, we miss critical opportunities for impactful change,' Emerson argues."

Yet To Comment:

The following organizations and individuals have yet to comment, and therefore do not have any primary source information available.

  • Norway
    • Norwegian Ministry of Foreign Affairs
    • Norwegian President, Ingrid Johansen
  • Russia
    • Ministry of Foreign Affairs of the Russian Federation
    • Governor of Arkhangelsk Oblast, Petrovich Kuznetsov
  • Organizations
    • Heliexpress LTD

Actors:

  • People
    • Member of Environmental Group "If Not Now, Then When?"
      • Name: Kaiara Willowbank
      • Username: @KaiaraNoBrakesWillow
    • Chief Communications Officer (CCO) of INGO Environmental Organization "EcoVanguard Solutions"
      • Name: Anya Chatterjee-Smith
      • Username: @AnyaEVS
    • Social movement scholar from the United States (socio-ecologist and economics professor at a small liberal arts college outside of Boston, MA)
      • Name: Rowan Emerson
      • Username: @RowanEmersonPhD
    • Governor of Arkhangelsk Oblast, Russia
      • Name: Petrovich Kuznetsov
      • Username: @Kuznetsov_RF
    • Norwegian President
      • Name: Ingrid Johansen
      • Username: @IngridJohansen
  • Organizations
    • "If Not Now, Then When?" Grassroots Environmental Group
      • Usernames: @innw & @innw_US
      • Hashtags: #INNW #IfNotNowThenWhen #ShameOnNorway
    • "EcoVanguard Solutions," International Non-Governmental Environmental Organization
      • Username: @agreenertomorrowEVS
      • Hashtags: #agreenertomorrow #EcoVanguardSolutions #ecomovement #ProtectNSReserve
    • "Heliexpress LTD," Russian helicopter tour company with new tours from Arkhangelsk Oblast passing over Kong Karls Land
      • Username: @heliexpressLTD
      • Hashtags: #heliexpresstours #helitours_RU
    • Norwegian Ministry of Foreign Affairs
      • Username: @NorwayMFA
    • Ministry of Foreign Affairs of the Russian Federation
      • Username: @MFA_Russia
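
Because the named actors must appear with their exact usernames while an entry needs 300 to 500 accounts in total, one approach is to seed the account pool with the listed actors and fill the remainder with generated personas. The sketch below assumes API v2 user objects (id, name, username, with the handle stored without the leading @). The sequential IDs and the synth_user_NNN filler handles are hypothetical placeholders; display names for organization accounts are taken from the descriptions above.

```python
from __future__ import annotations

MIN_USERS, MAX_USERS = 300, 500

# Exact usernames from the actor list, without "@"; values are display names.
NAMED_ACCOUNTS = {
    "KaiaraNoBrakesWillow": "Kaiara Willowbank",
    "AnyaEVS": "Anya Chatterjee-Smith",
    "RowanEmersonPhD": "Rowan Emerson",
    "Kuznetsov_RF": "Petrovich Kuznetsov",
    "IngridJohansen": "Ingrid Johansen",
    "innw": "If Not Now, Then When?",
    "innw_US": "If Not Now, Then When?",
    "agreenertomorrowEVS": "EcoVanguard Solutions",
    "heliexpressLTD": "Heliexpress LTD",
    "NorwayMFA": "Norwegian Ministry of Foreign Affairs",
    "MFA_Russia": "Ministry of Foreign Affairs of the Russian Federation",
}


def build_user_pool(total: int) -> list[dict]:
    """Return `total` user objects: every named actor first, then filler accounts."""
    if not MIN_USERS <= total <= MAX_USERS:
        raise ValueError(f"total must be between {MIN_USERS} and {MAX_USERS}")
    users = [
        {"id": str(1_000_000 + i), "name": name, "username": handle}
        for i, (handle, name) in enumerate(NAMED_ACCOUNTS.items())
    ]
    # Hypothetical filler accounts; a real entry would give these realistic personas.
    for i in range(len(users), total):
        users.append({"id": str(1_000_000 + i), "name": f"Synthetic User {i}",
                      "username": f"synth_user_{i:03d}"})
    return users
```

Putting the named actors first guarantees they are never crowded out when the pool size is near the lower bound.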

General Use Hashtags:

  • #Artic
  • #ArkhangelskOblast
  • #EcoHellTours
  • #GreenerEco
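
Since predefined hashtags must be entered exactly as shown, a generator that lets a language model write free text should catch case variants such as #shameonnorway. The sketch below builds API v2-style entities.hashtags entries and flags those variants. It assumes offsets are Python string positions spanning the # character, and the hashtag set is copied verbatim from the lists above; the regular expression is a simplification of Twitter's actual hashtag rules, and the function name is hypothetical.

```python
from __future__ import annotations

import re

# Predefined hashtags, spelled exactly as listed in this document.
PREDEFINED_HASHTAGS = {
    "INNW", "IfNotNowThenWhen", "ShameOnNorway",
    "agreenertomorrow", "EcoVanguardSolutions", "ecomovement", "ProtectNSReserve",
    "heliexpresstours", "helitours_RU",
    "Artic", "ArkhangelskOblast", "EcoHellTours", "GreenerEco",
}
_BY_LOWER = {tag.lower(): tag for tag in PREDEFINED_HASHTAGS}
HASHTAG_RE = re.compile(r"#(\w+)")  # simplified: letters, digits, underscore


def hashtag_entities(text: str) -> tuple[list[dict], list[str]]:
    """Build entities.hashtags for `text` and warn about case variants of predefined tags."""
    entities: list[dict] = []
    warnings: list[str] = []
    for match in HASHTAG_RE.finditer(text):
        tag = match.group(1)
        entities.append({"start": match.start(), "end": match.end(), "tag": tag})
        canonical = _BY_LOWER.get(tag.lower())
        if canonical is not None and canonical != tag:
            warnings.append(f"#{tag} should be written #{canonical}")
    return entities, warnings
```

Hashtags outside the predefined set pass through without a warning, because the rules allow teams to add their own.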

URLs:

  • A series of URLs for news articles and images will be provided.

Other requirements:

  • Each team should have a member registered for the conference who will attend and present a poster describing how the team addressed this challenge and the strengths and weaknesses of its synthetic data generation system. Note that this must be a unique registration: if a team member is presenting a paper or poster at SBP-BRiMS, the person registered for the challenge should be a different person.
  • The presenter is responsible for printing and bringing the team poster. The poster should be 5x3 or 3x5. Easels will be provided for displaying the posters.
  • A 2-5 page description of the proposed solution to the challenge problem, including its strengths and limitations, must be submitted through EasyChair by July 21, 2024.
  • Author notification, August 4, 2024.
  • The final synthetic data should be submitted by September 1, 2024.
  • The authors should print their own poster and bring it to the conference and plan on discussing it during a poster session.
  • The winner of this challenge will write a short 10-20 page paper detailing its approach, strengths, and weaknesses, and discussing the MOMUS results. The paper will be sent to the journal Computational and Mathematical Organization Theory for the special issue based on SBP-BRiMS 2024.
  • Note: any questions on this challenge should be addressed to Dr. Kathleen M. Carley (kathleen.carley@cs.cmu.edu).