Undefined By Data: A Survey of Big Data Definitions


Paper by Jonathan Stuart Ward and Adam Barker: “The term big data has become ubiquitous. Owing to a shared origin between academia, industry and the media there is no single unified definition, and various stakeholders provide diverse and often contradictory definitions. The lack of a consistent definition introduces ambiguity and hampers discourse relating to big data. This short paper attempts to collate the various definitions which have gained some degree of traction and to furnish a clear and concise definition of an otherwise ambiguous term…
Despite the range and differences existing within each of the aforementioned definitions there are some points of similarity. Notably all definitions make at least one of the following assertions:
Size: the volume of the datasets is a critical factor.
Complexity: the structure, behaviour and permutations of the datasets are a critical factor.
Technologies: the tools and techniques which are used to process a sizable or complex dataset are a critical factor.
The definitions surveyed here all encompass at least one of these factors, most encompass two. An extrapolation of these factors would therefore postulate the following: Big data is a term describing the storage and analysis of large and/or complex data sets using a series of techniques including, but not limited to, NoSQL, MapReduce and machine learning.”

Using Participatory Crowdsourcing in South Africa to Create a Safer Living Environment


New Paper by Bhaveer Bhana, Stephen Flowerday, and Aharon Satt in the International Journal of Distributed Sensor Networks: “The increase in urbanisation is making the management of city resources a difficult task. Data collected through observations (utilising humans as sensors) of the city surroundings can be used to improve decision making in terms of managing these resources. However, the data collected must be of a certain quality in order to ensure that effective and efficient decisions are made. This study is focused on the improvement of emergency and non-emergency services (city resources) through the use of participatory crowdsourcing (humans as sensors) as a data collection method (collect public safety data), utilising voice technology in the form of an interactive voice response (IVR) system.
The study illustrates how participatory crowdsourcing (specifically humans as sensors) can be used as a Smart City initiative focusing on public safety by illustrating what is required to contribute to the Smart City, and developing a roadmap in the form of a model to assist decision making when selecting an optimal crowdsourcing initiative. Public safety data quality criteria were developed to assess and identify the problems affecting data quality.
This study is guided by design science methodology and applies three driving theories: the Data Information Knowledge Action Result (DIKAR) model, the characteristics of a Smart City, and a credible Data Quality Framework. Four critical success factors were developed to ensure high quality public safety data is collected through participatory crowdsourcing utilising voice technologies.”

Digital Participation – The Case of the Italian 'Dialogue with Citizens'


New paper by Gianluca Sgueo presented at Democracy and Technology – Europe in Tension from the 19th to the 21st Century – Sorbonne Paris, 2013: “This paper focuses on the initiative named “Dialogue With Citizens” that the Italian Government introduced in 2012. The Dialogue was an entirely web-based experiment in participatory democracy aimed, first, at informing citizens through documents and in-depth analysis and, second, at answering their questions and requests. During the year and a half that the initiative ran, roughly 90,000 people wrote in (approximately 5,000 messages/month). Additionally, almost 200,000 people participated in a number of public online consultations that the government launched in conjunction with the adoption of crucial decisions (e.g. the national spending review program).
From the analysis of this experiment of participatory democracy three questions can be raised. (1) How can a public institution maximize the profits of participation and minimize its costs? (2) How can public administrations manage the (growing) expectations of the citizens once they become accustomed to participation? (3) Is online participatory democracy going to develop further, and why?
In order to fully answer such questions, the paper proceeds as follows: it will initially provide a general overview of online public participation at both the central and the local level. It will then discuss the “Dialogue with Citizens” and a selected number of online public consultations led by the Italian government in 2012. The conclusions will develop a theoretical framework for reflection on the peculiarities and problems of web participation.”

Mobile phone data are a treasure-trove for development


Paul van der Boor and Amy Wesolowski in SciDevNet: “Each of us generates streams of digital information — a digital ‘exhaust trail’ that provides real-time information to guide decisions that affect our lives. For example, Google informs us about traffic by using both its ‘My Location’ feature on mobile phones and third-party databases to aggregate location data. BBVA, one of Spain’s largest banks, analyses transactions such as credit card payments as well as ATM withdrawals to find out when and where peak spending occurs. This type of data harvest is of great value. But, often, there is so much data that its owners lack the know-how to process it and fail to realise its potential value to policymakers.
Meanwhile, many countries, particularly in the developing world, have a dearth of information. In resource-poor nations, the public sector often lives in an analogue world where piles of paper impede operations and policymakers are hindered by uncertainty about their own strengths and capabilities. Nonetheless, mobile phones have quickly pervaded the lives of even the poorest: 75 per cent of the world’s 5.5 billion mobile subscriptions are in emerging markets. These people are also generating digital trails of anything from their movements to mobile phone top-up patterns. It may seem that putting this information to use would take vast analytical capacity. But using relatively simple methods, researchers can analyse existing mobile phone data, especially in poor countries, to improve decision-making.
Think of existing, available data as low-hanging fruit that we — two graduate students — could analyse in less than a month. This is not a test of data-scientist prowess, but more a way of saying that anyone could do it.
There are three areas that should be ‘low-hanging fruit’ in terms of their potential to dramatically improve decision-making in information-poor countries: coupling healthcare data with mobile phone data to predict disease outbreaks; using mobile phone money transactions and top-up data to assess economic growth; and predicting travel patterns after a natural disaster using historical movement patterns from mobile phone data to design robust response programmes.
Another possibility is using call-data records to analyse urban movement to identify traffic congestion points. Nationally, this can be used to prioritise infrastructure projects such as road expansion and bridge building.
The information that these analyses could provide would be lifesaving — not just informative or revenue-increasing, like much of this work currently performed in developed countries.
But some work of high social value is being done. For example, different teams of European and US researchers are trying to estimate the links between mobile phone use and regional economic development. They are using various techniques, such as merging night-time satellite imagery from NASA with mobile phone data to create behavioural fingerprints. They have found that this may be a cost-effective way to understand a country’s economic activity and, potentially, guide government spending.
Another example is given by researchers (including one of this article’s authors) who have analysed call-data records from subscribers in Kenya to understand malaria transmission within the country and design better strategies for its elimination. [1]
In this study, published in Science, the location data of the mobile phones of more than 14 million Kenyan subscribers was combined with national malaria prevalence data. After identifying the sources and sinks of malaria parasites and overlaying these with phone movements, analysis was used to identify likely transmission corridors. UK scientists later used similar methods to create different epidemic scenarios for the Côte d’Ivoire.”
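The Kenya analysis is only summarised above. As a rough illustration of the general idea (not the authors' actual method), the sketch below assumes a simplified input: call-data records as (subscriber, timestamp, region) tuples plus one prevalence value per region. It counts region-to-region trips, then weights each trip by prevalence at its origin, so regions with positive net flow look like parasite sources and regions with negative net flow look like sinks. The function names, record layout and weighting rule are hypothetical simplifications.

```python
from collections import Counter, defaultdict


def trip_counts(records):
    """Count region-to-region trips from (subscriber, timestamp, region) records.

    Simplifying assumption: a "trip" is any change of region between two
    consecutive records of the same subscriber.
    """
    by_subscriber = defaultdict(list)
    for subscriber, timestamp, region in records:
        by_subscriber[subscriber].append((timestamp, region))

    trips = Counter()
    for events in by_subscriber.values():
        events.sort()  # order each subscriber's records by time
        for (_, origin), (_, dest) in zip(events, events[1:]):
            if origin != dest:
                trips[(origin, dest)] += 1
    return trips


def net_parasite_flow(trips, prevalence):
    """Weight each trip by prevalence at its origin and net the flows per region.

    Hypothetical weighting rule: each trip carries prevalence[origin] expected
    infections out of its origin and into its destination. Positive totals
    suggest a source, negative totals a sink. Regions missing from
    `prevalence` are treated as having zero prevalence.
    """
    net = defaultdict(float)
    for (origin, dest), n in trips.items():
        flow = n * prevalence.get(origin, 0.0)
        net[origin] += flow
        net[dest] -= flow
    return dict(net)


if __name__ == "__main__":
    # Toy, made-up records: not data from the Kenya study.
    records = [
        ("a", 1, "Coast"), ("a", 2, "Nairobi"),
        ("b", 1, "Coast"), ("b", 3, "Nairobi"), ("b", 5, "Coast"),
        ("c", 1, "Nairobi"), ("c", 2, "Nairobi"),
    ]
    trips = trip_counts(records)
    print(trips.most_common())
    print(net_parasite_flow(trips, {"Coast": 0.3, "Nairobi": 0.05}))
```

With these toy records, "Coast" comes out as a net source (about +0.55) and "Nairobi" as a sink (about -0.55). Ranking the raw trip counts with `most_common()` is also a crude way to spot the busiest corridors, in the spirit of the traffic-congestion example above.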

Making All Voices Count


Launch of Making All Voices Count: “Making All Voices Count is a global initiative that supports innovation, scaling-up, and research to deepen existing innovations and help harness new technologies to enable citizen engagement and government responsiveness… Solvable problems need not remain unsolved. Democratic systems in the 21st century continue to be inhibited by 19th century timescales, with only occasional opportunities for citizens to express their views formally, such as during elections. In this century, many citizens have access to numerous tools that enable them to express their views – and measure government performance – in real time.
For example, online reporting platforms enable citizens to monitor the election process by reporting intimidation, vote buying, bias and misinformation; access to mobile technology allows citizens to update water suppliers on gaps in service delivery; crisis information can be crowdsourced via eyewitness reports of violence sent by email and SMS.
The rise of mobile communication, the installation of broadband and the fast-growing availability of open data offer tremendous opportunities for data journalism and new media channels. They can inspire governments to develop new ways to fight corruption and respond to citizens efficiently, effectively and fairly. In short, developments in technology and innovation mean that government and citizens can interact like never before.
Making All Voices Count is about seizing this moment to strengthen our commitments to promote transparency, fight corruption, empower citizens, and harness the power of new technologies to make government more effective and accountable.
The programme specifically aims to address the following barriers that weaken the link between governments and citizens:

  • Citizens lack incentives: Citizens may not have the necessary incentives to express their feedback on government performance – due to a sense of powerlessness, distrust in the government, fear of retribution, or lack of reliable information.
  • Governments lack incentives: At the same time, governments need incentives to respond to citizen input whenever possible and to leverage citizen participation. The government’s response to citizens should be reinforced by proactive, public communication. This initiative will help create incentives for government to respond. Where government responds effectively, citizens’ confidence in government performance and approval ratings are likely to increase.
  • Governments lack the ability to translate citizen feedback into action: This could be due to anything from political constraints to a lack of skills and systems. Governments need better tools to effectively analyze and translate citizen input into information that will lead to solutions and shape resource allocation. Once captured, citizens’ feedback (on their experiences with government performance) must be communicated so as to engage both the government and the broader public in finding a solution.
  • Citizens lack meaningful opportunities: Citizens need greater access to better tools and know-how to easily engage with government in a way that results in government action and citizen empowerment”

MicroMappers: Microtasking for Disaster Response


Patrick Meier: “My team and I at QCRI are about to launch MicroMappers: the first ever set of microtasking apps specifically customized for digital humanitarian response. If you’re new to microtasking in the context of disaster response, then I recommend reading this, this and this. The purpose of our web-based microtasking apps (we call them Clickers) is to quickly make sense of all the user-generated, multi-media content posted on social media during disasters. How? By using microtasking and making it as easy as a single click of the mouse to become a digital humanitarian volunteer. This is how volunteers with Zooniverse were able to click-and-thus-tag well over 2,000,000 images in under 48 hours.
We have already developed and customized four Clickers using the free and open source microtasking platform CrowdCrafting: TweetClicker, TweetGeoClicker, ImageClicker and ImageGeoClicker. Each Clicker includes a mini-tutorial to guide volunteers.”
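The post does not say how the Clickers combine answers from several volunteers. A common approach in microtasking is to show each item to several people and keep a label only when enough of them agree; the sketch below assumes that majority-vote rule. The thresholds (`min_votes`, `min_agreement`), the function name and the tag names are hypothetical, not MicroMappers or CrowdCrafting settings.

```python
from collections import Counter


def aggregate_tags(votes, min_votes=3, min_agreement=0.6):
    """Reduce several volunteers' tags per item to a single label by majority vote.

    votes: dict mapping an item id (e.g. a tweet or image) to the list of tags
    that different volunteers assigned to it.
    Returns a dict mapping each item id to the winning tag, or to None when the
    item has fewer than `min_votes` tags or the top tag's share of the votes is
    below `min_agreement` (such items could be sent back out for more clicks).
    """
    results = {}
    for item, tags in votes.items():
        if len(tags) < min_votes:
            results[item] = None
            continue
        tag, count = Counter(tags).most_common(1)[0]
        results[item] = tag if count / len(tags) >= min_agreement else None
    return results


if __name__ == "__main__":
    # Hypothetical tags, not MicroMappers' actual categories.
    votes = {
        "img1": ["damage", "damage", "no damage"],
        "img2": ["damage", "no damage", "not relevant"],
        "img3": ["damage"],
    }
    print(aggregate_tags(votes))
```

Here "img1" resolves to "damage", while "img2" (no majority) and "img3" (too few clicks) stay unresolved. Requiring agreement trades speed for reliability: each extra click per item lowers the chance that one careless volunteer decides the label.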

5 Ways Cities Are Using Big Data


Eric Larson in Mashable: “New York City released more than 200 high-value data sets to the public on Monday, in part to provide more content for open-source mapping projects like OpenStreetMap.
It’s one of many releases since Local Law 11 of 2012, which calls for more transparency of the city government’s collected data, passed in February.
But it’s not just New York: Cities across the world, large and small, are utilizing big data sets — like traffic statistics, energy consumption rates and GPS mapping — to launch projects to help their respective communities.
We rounded up a few of our favorites below….

1. Seattle’s Power Consumption

The city of Seattle recently partnered with Microsoft and Accenture on a pilot project to reduce the area’s energy usage. Using Microsoft’s Azure cloud, the project will collect and analyze hundreds of data sets collected from four downtown buildings’ management systems.
With predictive analytics, then, the system will work to find out what’s working and what’s not — i.e. where energy can be used less, or not at all. The goal is to reduce power usage by 25%.

2. SpotHero

Finding parking spots, especially in big cities, is undoubtedly a headache.

SpotHero is an app, for both iOS and Android devices, that tracks down parking spots in a select number of cities. How it works: Users type in an address or neighborhood (say, Adams Morgan in Washington, D.C.) and are taken to a listing of available garages and lots nearby — complete with prices and time durations.
The app tracks availability in real-time, too, so a spot is updated in the system as soon as it’s snagged.
Seven cities are currently synced with the app: Washington, D.C., New York, Chicago, Baltimore, Boston, Milwaukee and Newark, N.J.

3. Adopt-a-Hydrant

Anyone who’s spent a winter in Boston will agree: it snows.

In January, the city’s Office of New Urban Mechanics released an app called Adopt-a-Hydrant. The app maps every fire hydrant in the city proper (more than 13,000, according to a Harvard blog post) and lets residents pledge to shovel out one, or as many as they choose, in the almost inevitable event of a blizzard.
Once a pledge is made, volunteers receive a notification if their hydrant — or hydrants — become buried in snow.

4. Adopt-a-Sidewalk

Similar to Adopt-a-Hydrant, Chicago’s Adopt-a-Sidewalk app lets residents of the Windy City pledge to shovel sidewalks after snowfall. In a city just as notorious for snowstorms as Boston, it’s an effective way to ensure public spaces remain free of snow and ice — especially spaces belonging to the elderly or disabled.

If you’re unsure which part of town you’d like to “adopt,” just register on the website and browse the map — you’ll receive a pop-up notification for each street you swipe that’s still available.

5. Less Congestion for Lyon

Last year, researchers at IBM teamed up with the city of Lyon, France (about four hours south of Paris), to build a system that helps traffic operators reduce congestion on the road.

The system, called the “Decision Support System Optimizer (DSSO),” uses real-time traffic reports to detect and predict congestion. If an operator sees that a traffic jam is likely to occur, she or he can then adjust traffic signals accordingly to keep cars moving smoothly.
It’s an especially helpful tool in emergencies: say, when an ambulance is en route to the hospital. Over time, the algorithms in the system will “learn” from their most successful recommendations, then apply that knowledge when making future predictions.”

Open-Government Laws Fuel Hedge-Fund Profits


Wall Street Journal: “Hedge Funds Are Using FOIA Requests to Obtain Nonpublic Information From Federal Agencies… When SAC Capital Advisors LP was weighing an investment in Vertex Pharmaceuticals Inc., the hedge-fund firm contacted a source it knew would provide nonpublic information without blinking: the federal government.
An investment manager for an SAC affiliate asked the Food and Drug Administration last December for any “adverse event reports” for Vertex’s recently approved cystic-fibrosis drug. Under the Freedom of Information Act, the agency had to hand over the material, which revealed no major problems. The bill: $72.50, cheaper than the price of two Vertex shares.
SAC and its affiliate, Sigma Capital Management LLC, snapped up 13,500 Vertex shares in the first quarter and options to buy 25,000 more, securities filings indicate. The stock rose that quarter, then surged 62% on a single day in April when Vertex announced positive results from safety tests on a separate cystic-fibrosis drug designed to be used in combination with the first.
Finance professionals have been pulling every lever they can these days to extract information from the government. Many have discovered that the biggest lever of all is the one available to everyone—the Freedom of Information Act—conceived by advocates of open government to shine light on how officials make decisions. FOIA is part of an array of techniques sophisticated investors are using to try to obtain potentially market-moving information about products, legislation, regulation and government economic statistics.
“It’s an information arms race,” says Les Funtleyder, a longtime portfolio manager and now partner at private-equity firm Poliwogg Holdings Inc. “It’s important to try every avenue. If anyone else is doing it, you need to, too.”
A review by The Wall Street Journal of more than 100,000 of the roughly three million FOIA requests filed over the past five years, including all of those sent to the FDA, shows that investors use the process to troll for all kinds of information. They ask the Environmental Protection Agency about pollution regulations, the Department of Energy about grants for energy-efficient vehicles, and the Securities and Exchange Commission about whether publicly held companies are under investigation. Such requests are perfectly legal.”
See also “Making FOIA More Free and Open” (Joel Gurin)

Collaboration Between Government and Outreach Organizations: A Case Study of the Department of Veterans Affairs


“In this report, Drs. Lael Keiser and Susan Miller examine the critical role of non-governmental outreach organizations in assisting government agencies to determine benefit eligibility of citizens applying for services. Many non-profits and other organizations help low-income applicants apply for Social Security, Medicaid, and the Supplemental Nutrition Assistance Program (SNAP, or food stamps).
Some outreach organizations help veterans navigate the complexity of the veterans disability benefits program. These organizations include the American Legion, the Disabled American Veterans, and the Veterans of Foreign Wars, as well as state government-run veterans agencies. Drs. Keiser and Miller interviewed dozens of managers from the Department of Veterans Affairs (VA) and outreach organizations about their interactions in helping veterans. They found “there is indeed effective collaboration” and that these organizations serve a key role for veterans in processing their claims. These organizations also help lighten the workload of VA benefit examiners by ensuring the paperwork is in order in advance, as well as serving as a communications conduit.
Drs. Keiser and Miller found variations in the effectiveness of the relationships between VA and outreach organization staffs and identified best practices for increasing effectiveness. These lessons can be applied to other agencies that interact frequently with outreach organizations that assist citizens in navigating the complexity of applying for various government benefit programs.
Listen to the interview on Federal News Radio.”

Data Swap


GlobeLab @ The Boston Globe: “DataSwap 2013 is an exclusive opportunity to work on complex, real-world problems with rich, large-scale datasets and with individuals of diverse skills and backgrounds from research, government, and civic organizations throughout Boston.
This isn’t your mother’s hackathon.
There’s no conference room full of over-caffeinated and under-deodorized engineers, no 72-hour time limit, and no room for shoddy prototypes. This is an opportunity for a select number of gifted researchers to join interdisciplinary teams to work on the pressing and meaningful problems facing Boston communities.
Unlike hackathons, meant to generate quick ideas and prototypes in a short period of time, DataSwap is about forging and supporting long-term collaborations between researchers, communities and data guardians. Groups sharing common interests and complementary skills will collaborate around specific problems. Each problem will be proposed by the owners of one of the datasets who present. On day one at The Boston Globe, you’ll learn more about that dataset and others to help you in your research. You’ll be given a community facilitator to help you craft useful research that is relevant outside the bounds of academia. Then, it’s up to you! Over the next several months, you and your team are challenged to craft a presentation around the problem you were given. At the conclusion of the time frame, we’ll reconvene to share our findings with one another and choose a winner.”