Digital Version of November/December 2014 Print Edition
Open-Source R software driving Big Data analytics in government
As government agencies and departments expand their capabilities for collecting information, the volume and complexity of digital data stored for public purposes are far outstripping departments’ ability to make sense of it all. Even worse, with data siloed within individual departments and little cross-agency collaboration, untold hours and dollars are being spent on data collection and storage with little return on investment in the form of information-based products and services for the public good.
But that may now be starting to change, with the Obama administration’s Big Data Research and Development Initiative.
In fact, the administration has had a Big Data agenda since its earliest days, with the appointment of Aneesh Chopra as the nation's first chief technology officer in 2009. (Chopra passed the mantle to current CTO Todd Park in March.) One of Chopra’s first initiatives was the creation of data.gov, a vehicle to make government data and open-source tools available in a timely and accessible format for a community of citizen data scientists to make sense of it all.
For example, independent statistical analysis of data released by data.gov revealed a flaw in the 2000 Census results that apparently went unnoticed by the Census Bureau.
The combination of Big Data in government and the CTO’s support of open-source initiatives has led to a proliferation of big-data applications in government departments. For agencies looking to gain insight from data stored in big-data platforms like Hadoop, not to mention lower their software costs, the open-source statistical software “R” is a natural choice.
With a vibrant and active user community and support from big-data analytics provider Revolution Analytics, many government agencies have turned to R for their data analysis needs. Here are a few examples:
*** In the early days of the Deepwater Horizon disaster, NIST used uncertainty analysis in R to harmonize spill estimates from various sources, and to provide ranges of estimates to other agencies and the media.
*** Before new drugs are allowed on the market, the FDA works with pharmaceutical companies to verify safety and efficacy through clinical trials. Despite a false perception that only commercial software may be used, many pharmaceutical companies are now using open-source R to analyze data from clinical trials.
*** The National Weather Service uses R for research and development of models to predict river flooding.
*** The newly formed Consumer Financial Protection Bureau -- freed from the restrictions of a legacy IT infrastructure -- is championing the use of open-source technologies in government.
*** Local governments are also building data-based applications. The SF Estuary Institute uses R and Google Maps to provide a tool to track pollution in the San Francisco Bay area.
Together, Big Data and open-source technologies for data storage and predictive analytics form a powerful combination for government agencies to provide improved services and to better understand how well they are fulfilling their missions. With continued focus on Big Data technology, and more open government programs like data.gov to encourage citizen participation in crowd-sourced data analysis, we can expect to see more departments providing more sophisticated data-based applications for their customers and constituents.
David Smith is the vice president of marketing and community for analytics software provider Revolution Analytics. He can be reached at: