Using Scalable Machine Learning to
Understand Violent Collective Action

APSA 2016 Annual Meeting & Exhibition
Panel: Coding and Validating Political Event Data
September 1st, 2016


Jake Ryland Williams
Assistant Professor
Department of Information Science
College of Computing and Informatics
Drexel University

Lefteris Jason Anastasopoulos
Assistant Professor
School of Public and International Affairs
Georgia Informatics Institute
University of Georgia, Athens

Overview

  • Focus: Understanding social action through social media text.
    • Why?
      • The use of social media has become ubiquitous.
      • Users can record even the most mundane actions.
      • Other important data may accompany the text.

    Twitter: a grassroots tool for mobilization

  • Traditional mass media requires significant resources to employ.
  • Cheap and accessible, social media has empowered many.

    • For those unfamiliar with Twitter:
      • Tweets are short (140-character limit) conversational messages.
      • These can be geo-located, allowing for movement tracking.
      • Twitter has a huge user base (more than 300 million active users).

    Dataset 1: location-tagged messages

    • The 'PullTweets' database:
      • Collected from Twitter's public (spritzer, 1%) API.
      • At the time (2014), over 1% of tweets were geo-tagged.
      • Targeted filtering yielded over 600 million geo-tagged tweets.
      • Data were enriched with polygon tags (country, state, county).
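
The polygon-tagging step can be illustrated with a minimal sketch. It assumes each tweet carries an optional (longitude, latitude) pair and that each region is a simple polygon; the field name `coordinates`, the helper names, and the toy rectangle are hypothetical and are not taken from the PullTweets pipeline. A standard ray-casting test decides containment.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (longitude, latitude)


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count polygon edges crossed by a ray going east."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def tag_tweet(tweet: dict, regions: Dict[str, List[Point]]) -> Optional[str]:
    """Name of the first region containing the tweet, or None if untagged."""
    coords = tweet.get("coordinates")
    if coords is None:
        return None
    for name, polygon in regions.items():
        if point_in_polygon(coords, polygon):
            return name
    return None


# Hypothetical toy region (a rectangle), not a real county boundary.
regions = {
    "toy_county": [(-76.0, 39.0), (-75.0, 39.0), (-75.0, 40.0), (-76.0, 40.0)]
}
tweets = [
    {"text": "at the march", "coordinates": (-75.5, 39.5)},
    {"text": "no location"},
    {"text": "elsewhere", "coordinates": (-70.0, 45.0)},
]
tags = [tag_tweet(t, regions) for t in tweets]
```

At the scale of hundreds of millions of tweets, a spatial index over the polygons would likely be needed; the linear scan here is only for illustration.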

    Tweets

    Dataset 2: space-time tagged events

    • The AP images database:
      • A database of news-covered Associated Press (AP) images.
      • Images are searchable, tagged by keywords, e.g., 'protest.'
      • Along with captions, the data include space-time tags.

    AP images


    The language of modern social action

  • We've gathered 2014 protest times & locations from the AP.
  • Using these, we've filtered tweets by protest times and locations.
    • We code these tweets as representatives of social action types:

      • collective peace, e.g., vigils, trials, singing/chanting;
      • collective force, e.g., mass-arrests, looting, blockades;
      • singular peace, e.g., discourse, care, disgust; and
      • singular force, e.g., physical assault, instigation.
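
Filtering tweets by protest times and locations might look like the sketch below. The distance threshold, the time window, and the record fields (`lat`, `lon`, `time`, `id`) are assumptions chosen for illustration; the slides do not state the actual values or schema used.

```python
import math
from datetime import datetime, timedelta


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))


def near_event(tweet, event, max_km=5.0, window=timedelta(hours=12)):
    """True if the tweet falls inside the event's space-time window."""
    close_in_time = abs(tweet["time"] - event["time"]) <= window
    close_in_space = haversine_km(
        tweet["lat"], tweet["lon"], event["lat"], event["lon"]
    ) <= max_km
    return close_in_time and close_in_space


# Hypothetical toy records, not real AP events or tweets.
event = {"lat": 40.0, "lon": -75.0, "time": datetime(2014, 8, 10, 20, 0)}
tweets = [
    {"id": 1, "lat": 40.01, "lon": -75.01, "time": datetime(2014, 8, 10, 22, 0)},
    {"id": 2, "lat": 41.00, "lon": -75.00, "time": datetime(2014, 8, 10, 20, 0)},
    {"id": 3, "lat": 40.00, "lon": -75.00, "time": datetime(2014, 8, 13, 20, 0)},
]
kept = [t for t in tweets if near_event(t, event)]
```

Only the first toy tweet is kept: the second is too far away and the third is too late.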

  • Language is most often entirely unrelated to social action, but can represent any number of action types at once.
  • Examples

    Methodology

  • Task: Classify tweets for action types independently.

    • Algorithm: binary, Naive Bayes (NB) classifiers with enhancements:
      • Instead of just words, phrases, e.g., 'tear gas,' employ context.
      • We decompose NB classifications to help answer 'why?'


    • We also:
      • cluster tweets spatially to track movements of groups, and
      • track time series and look for aberrant signatures.
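
One way to read the decomposition idea: under Naive Bayes, the log-odds that a tweet expresses an action type is the prior log-odds plus a sum of per-feature log-likelihood ratios, so each word or phrase contributes a separate, inspectable term. The sketch below is a generic illustration of that property, not the authors' implementation; the add-one smoothing, the use of adjacent-word pairs as "phrases", and the toy training texts are all assumptions. One such binary model would be trained per action type.

```python
import math
from collections import Counter


def features(text):
    """Lowercased words plus adjacent-word phrases (bigrams)."""
    words = text.lower().split()
    return words + [" ".join(pair) for pair in zip(words, words[1:])]


def train(labeled_texts, alpha=1.0):
    """Fit one binary NB model; labels are True (type present) or False."""
    counts = {True: Counter(), False: Counter()}
    docs = Counter()
    for text, label in labeled_texts:
        counts[label].update(features(text))
        docs[label] += 1
    vocab = set(counts[True]) | set(counts[False])
    totals = {c: sum(counts[c].values()) + alpha * len(vocab)
              for c in (True, False)}
    # Per-feature log-likelihood ratio: log P(f|present) - log P(f|absent).
    llr = {
        f: math.log((counts[True][f] + alpha) / totals[True])
        - math.log((counts[False][f] + alpha) / totals[False])
        for f in vocab
    }
    prior = math.log(docs[True] / docs[False])  # prior log-odds
    return prior, llr


def explain(text, prior, llr):
    """Return the log-odds score and the per-feature terms that sum to it."""
    terms = [(f, llr[f]) for f in features(text) if f in llr]
    score = prior + sum(v for _, v in terms)
    # Largest contributions (by magnitude) first, to answer 'why?'.
    return score, sorted(terms, key=lambda t: -abs(t[1]))


# Hypothetical toy training set for a single 'collective force' classifier.
data = [
    ("police fired tear gas at the crowd", True),
    ("tear gas and looting downtown", True),
    ("great coffee downtown this morning", False),
    ("traffic at the mall this morning", False),
]
prior, llr = train(data)
score, terms = explain("tear gas downtown", prior, llr)
neg_score, _ = explain("coffee this morning", prior, llr)
```

A positive `score` favors the action type; `terms` lists which words and phrases pushed the decision, largest first.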

    Expected performance

    In-domain

    Out-of-domain

    Prototypical applications

  • We elucidate social action at very fine temporal and spatial scales.
  • With this, we can aggregate all collective action at higher levels of geography (e.g., Census tract or county).
  • This enables researchers to study protest activity spatially.
  • Collective action by county

    Forceful collective action by county
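
Aggregation to a higher level of geography can be sketched as a simple count of classified tweets per region and action type. The record layout (`county`, `actions`) and the region names are hypothetical; the slides do not specify how the classified tweets are stored.

```python
from collections import Counter, defaultdict


def aggregate_by_region(classified_tweets, level="county"):
    """Count tweets per region and action type at the chosen geography."""
    counts = defaultdict(Counter)
    for tweet in classified_tweets:
        region = tweet.get(level)
        if region is None:
            continue  # skip tweets without a tag at this level
        # A tweet may represent several action types at once.
        for action in tweet["actions"]:
            counts[region][action] += 1
    return counts


# Hypothetical toy output of the classifiers, not real results.
classified = [
    {"county": "A", "actions": ["collective_force"]},
    {"county": "A", "actions": ["collective_force", "singular_force"]},
    {"county": "B", "actions": ["collective_peace"]},
    {"county": "B", "actions": []},
    {"actions": ["collective_peace"]},
]
by_county = aggregate_by_region(classified)
forceful = {name: c["collective_force"] for name, c in by_county.items()}
```

Counts like `forceful` are the kind of per-county quantity a map of forceful collective action would display.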

    Building a real-time application

  • Our tweets were collected from the public API over 2014.
  • Back then, geo-tagging was prevalent (>1% of all tweets).
    • Policy decisions (integration with Foursquare) led to a decline.
      • Currently, roughly 0.2% of tweets are geo-tagged.
      • Replaced by soft locations, e.g., San Francisco, Hawaii.

    Upcoming challenges

  • Coding remaining data and building stronger coding guidelines.
  • Building explorable, online interactive graphics.
  • Acquisition of and scaling for larger-stream data.
  • Re-engaging Twitter's community for geo-location participation.
  • Collating actions into events.
  • Questions? Thanks!

