Using social media ‘Big Data’ analytics to analyze or predict events

Louis Chabot

In July, the Defense Advanced Research Projects Agency (DARPA) issued an RFP seeking new Big Data tools to track social media postings and interactions, reflecting a growing interest within government in using social media and open source data to "fill in" and "complement" traditional data sources.

These publicly available data sources include social networking sites, such as Twitter, Facebook and LinkedIn; websites that contain publicly available or subscription-based content; and topic-specific online message boards and chat rooms. The government's focus is on using this public data to develop analytics that can anticipate how an adversary or potential friend "thinks" and "feels" about a particular situation, in the hope of predicting their behavior, actions and reactions.

There are significant challenges and points for consideration in developing analytics against social media and open source data: 

Provenance and pedigree -- One of the major weaknesses of public data is that you cannot reliably confirm the identity of the sources or their motivations. From the outset you need to consider data provenance (the metadata that describes the source of the data) and make appropriate pedigree decisions (how much confidence to place in the data) when using this data in any analytic. Maintaining provenance metadata and making pedigree-based decisions is not something you "bolt onto" analytical software after the fact -- it is an integral part of the analytic that must be designed in from the outset.
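
As a rough illustration of carrying provenance and pedigree with the data rather than adding them later, the sketch below attaches both to every scored item and lets pedigree weight the aggregate. The class names, fields, the 0-to-1 pedigree scale and the cutoff value are all assumptions made for this example, not part of any particular product.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Provenance:
    """Metadata describing where an item came from (hypothetical fields)."""
    source: str             # e.g. "twitter" or "message-board"
    author_id: str          # account handle as observed, not a verified identity
    collected_at: datetime  # when the item was collected


@dataclass(frozen=True)
class Observation:
    """A scored item that always travels with its provenance and pedigree."""
    text: str
    score: float            # analytic output, e.g. sentiment in [-1, 1]
    provenance: Provenance
    pedigree: float         # assumed confidence in the item, from 0 to 1


def pedigree_weighted_mean(observations, min_pedigree=0.2):
    """Average the scores, weighting each by pedigree.

    Items whose pedigree falls below min_pedigree are ignored entirely.
    Returns None when nothing is trustworthy enough to report.
    """
    usable = [o for o in observations if o.pedigree >= min_pedigree]
    total_weight = sum(o.pedigree for o in usable)
    if total_weight == 0:
        return None
    return sum(o.score * o.pedigree for o in usable) / total_weight
```

Because the pedigree is a field of the observation itself, no analytic can consume a score without also having the confidence value at hand.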

Merging and re-using existing analytic techniques -- Various initiatives, such as the Mood of the Nation Project, have developed statistical machine-learning techniques for processing Twitter feeds to measure the Joy/Sadness/Anger/Fear level of an entire nation as major events unfold. These analytical techniques are often merged with other classic capabilities, such as entity extraction, relationship discovery and link analysis, to offer greater insight into the data.
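
To make the idea of a nation-level mood measure concrete, here is a deliberately simple stand-in: a word-list lookup rather than the statistical machine-learning techniques such projects actually use. The lexicon, function names and emotion labels are hypothetical and far too small for real use.

```python
import re
from collections import Counter

# Tiny illustrative lexicon (an assumption for this sketch); a real word list
# would be far larger and tuned to the language and platform.
EMOTION_LEXICON = {
    "happy": "joy", "glad": "joy", "celebrate": "joy",
    "sad": "sadness", "mourn": "sadness",
    "angry": "anger", "furious": "anger",
    "afraid": "fear", "scared": "fear",
}


def emotion_counts(posts):
    """Count lexicon hits per emotion across a collection of posts."""
    counts = Counter()
    for post in posts:
        for word in re.findall(r"[a-z']+", post.lower()):
            emotion = EMOTION_LEXICON.get(word)
            if emotion is not None:
                counts[emotion] += 1
    return counts


def emotion_shares(posts):
    """Return each emotion's share of all lexicon hits (empty dict if none)."""
    counts = emotion_counts(posts)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {emotion: n / total for emotion, n in counts.items()}
```

The per-emotion shares could then be joined with the output of entity extraction or link analysis, for example to see which people or places are mentioned when fear spikes.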

Natural language processing (NLP) -- For Latin-rooted languages, NLP techniques are relatively mature, although they still struggle with social aspects of language such as sarcasm, metaphors, acronyms, Internet shorthand and intentional encoding meant to disguise meaning. It is critical that more mature techniques be developed to handle non-Latin-based languages, as future conflicts are likely to take place in countries where these languages are used.

Data Integration -- It is very likely that social media and open source data will have to be integrated (or linked) with other, more trusted data sources to derive a complete picture. Various data integration approaches have been developed over time (e.g., canonical data model, uber data model, software integration layer, semantics-based), but this is a very complex problem at any scale. The lack of any unique identifier (e.g., keys) in social media and open source data makes it particularly difficult to integrate.
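
When there is no shared key, one common fallback is approximate matching on whatever attributes the two sources do share. The sketch below links records on name similarity alone, using Python's standard difflib module. The record layout, field names and the 0.8 threshold are assumptions for illustration, and a real system would compare several attributes and handle ambiguous matches.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase and keep only letters and digits so names compare cleanly."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def similarity(a, b):
    """Similarity ratio between two names, from 0.0 (none) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def link_records(social_records, trusted_records, threshold=0.8):
    """Link each social record to its most similar trusted record.

    A link is kept only if the best similarity reaches the threshold.
    Returns (social handle, trusted id, similarity) tuples.
    """
    links = []
    for social in social_records:
        best, best_score = None, 0.0
        for trusted in trusted_records:
            score = similarity(social["name"], trusted["name"])
            if score > best_score:
                best, best_score = trusted, score
        if best is not None and best_score >= threshold:
            links.append((social["handle"], best["id"], best_score))
    return links
```

Keeping the similarity value with each link matters: a linked record is only as trustworthy as the match that produced it, so that value is a natural input to any pedigree decision.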

Predictive aspect -- Many social media analytics look to the past and measure sentiment, influence or the size of an organization of interest. What is really needed are analytics that can predict potential future events. For example, an analytic might anticipate how villagers and local militia will react to rising food prices when climate change brings drier local weather patterns, which in turn produce a localized drought, a smaller harvest and a subsequent rise in flour prices. Sophisticated models are required for this kind of analytic, but therein lies the power of analytics.

Also, in line with the predictive aspect, social media can contain "side channel" clues to otherwise hidden activity, such as discussions or tweets from people witnessing something odd, like high foot traffic around a previously isolated house. These details can then be used to focus our analytics on an area, person or group of interest.

Real-time analytics -- Just a few minutes after an event -- and sometimes even before an event happens -- social media is abuzz with associated details. Traditional batch-based analytics, or analytics that execute against data at rest, are not appropriate for such a scenario; a near-real-time, streaming execution model is required. Real-time analytics make it possible to witness an event as it unfolds, with a level of timeliness that cannot be achieved with at-rest analytics. Thus, we use social media analytics as a real-time indicator of where to look, as opposed to some form of at-rest data mining across the entire data spectrum, which often delivers its results too late.
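
The difference between batch and streaming execution can be sketched with a sliding-window counter that updates as each post arrives, instead of rescanning stored data. The class name, the five-minute default window and the assumption that timestamps arrive in non-decreasing order are all choices made for this example.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Counts terms seen within the last window_seconds of a stream.

    Assumes timestamps (in seconds) arrive in non-decreasing order.
    """

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self._events = deque()    # (timestamp, term) pairs, oldest first
        self._counts = Counter()  # current count per term inside the window

    def add(self, timestamp, term):
        """Record one occurrence of term and drop anything now out of window."""
        self._events.append((timestamp, term))
        self._counts[term] += 1
        self._evict(timestamp)

    def _evict(self, now):
        """Remove events at or before the cutoff (now - window_seconds)."""
        cutoff = now - self.window_seconds
        while self._events and self._events[0][0] <= cutoff:
            _, old_term = self._events.popleft()
            self._counts[old_term] -= 1
            if self._counts[old_term] == 0:
                del self._counts[old_term]

    def top(self, n=3):
        """Return the n most frequent terms currently inside the window."""
        return self._counts.most_common(n)
```

Each arriving post costs only a small amount of work, and the current picture is available at any moment through top(), which is what allows such an analytic to point to where to look while an event is still unfolding.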

Careful forethought and planning regarding the above points can ensure that social media and open source analytics reach their maximum potential. 

Louis Chabot, a professional software architect with more than 25 years of experience in information systems, leads the Big Data practice for Dynamics Research Corp. He can be reached at:

lchabot@drc.com

 
