Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts,” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write tweets such as ‘Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.’ Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schönberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
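Arbesman’s warning about spurious correlations is easy to demonstrate. The Python sketch below generates variables that are pure random noise, so none is genuinely related to any other, and then searches every pair for the strongest correlation. The function names and the choice of 200 series of 10 observations each are illustrative assumptions, not taken from the article. With that many pairs and so few observations, some pair will almost always look strongly correlated by chance alone.

```python
import random
from itertools import combinations


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_spurious_correlation(num_series: int, length: int, seed: int = 0) -> float:
    """Return the largest |r| found among all pairs of independent noise series.

    Every series is drawn independently from a standard normal distribution,
    so any strong correlation between two of them is spurious by construction.
    """
    rng = random.Random(seed)  # seeded so the experiment is reproducible
    series = [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(num_series)]
    return max(abs(pearson(a, b)) for a, b in combinations(series, 2))


if __name__ == "__main__":
    # 200 unrelated "variables", each observed only 10 times (19,900 pairs).
    r = strongest_spurious_correlation(200, 10)
    print(f"strongest correlation found in pure noise: {r:.2f}")
```

Adding variables without adding observations makes the problem worse: the number of pairs grows with the square of the number of variables, and with it the number of chances for a fluke.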
Announcing Project Open Data from Cloudant Labs
Yuriy Dybskiy from Cloudant: “There has been an emerging pattern over the last few years of more and more government datasets becoming available for public access. Earlier this year, the White House announced official policy on such data – Project Open Data.
Available resources
Here are four resources on the topic:
- Tim Berners-Lee: Open, Linked Data for a Global Community – [10 min video]
- Rufus Pollock: Open Data – How We Got Here and Where We’re Going – [24 min video]
- Open Knowledge Foundation Datasets – http://data.okfn.org/data
- Max Ogden: Project dat – collaborative data – [github repo]
One of the main challenges is access to the datasets. If only there were a database with easy access to its data baked right in.
Luckily, there are CouchDB and Cloudant, which share the same API for accessing data over HTTP. This makes them a really great option for storing interesting datasets.
Cloudant Open Data
Today we are happy to announce a Cloudant Labs project – Cloudant Open Data!
Several datasets are available at the moment, for example, businesses_sf – data regarding businesses registered in San Francisco and sf_pd_incidents – a collection of incident reports (criminal and non-criminal) made by the San Francisco Police Department.
We’ll add more, but if you have one you’d like us to add faster – drop us a line at [email protected]
Create an account and play with these datasets yourself.”
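Because CouchDB and Cloudant expose data over plain HTTP, reading one of these datasets needs nothing beyond a standard HTTP client. The minimal Python sketch below assumes the database is publicly readable and uses CouchDB’s `_all_docs` endpoint with `include_docs=true` to fetch a handful of documents. The base URL is a hypothetical placeholder (the announcement does not give the account address), and the helper function names are illustrative.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint: substitute the real Cloudant account or CouchDB server URL.
BASE_URL = "https://example.cloudant.com"


def all_docs_url(base_url: str, db: str, limit: int = 5) -> str:
    """Build an `_all_docs` URL that also returns the full document bodies."""
    query = urlencode({"include_docs": "true", "limit": limit})
    return f"{base_url.rstrip('/')}/{db}/_all_docs?{query}"


def docs_from_response(body: str) -> list[dict]:
    """Extract the documents from an `_all_docs` JSON response."""
    return [row["doc"] for row in json.loads(body)["rows"] if "doc" in row]


def fetch_docs(db: str, limit: int = 5) -> list[dict]:
    """Fetch up to `limit` documents with a plain HTTP GET (requires network access)."""
    with urlopen(all_docs_url(BASE_URL, db, limit)) as resp:
        return docs_from_response(resp.read().decode("utf-8"))
```

With a real endpoint in place, `fetch_docs("businesses_sf", limit=3)` would return up to three documents as Python dictionaries. Keeping URL construction and response parsing separate from the network call means both can be checked offline.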
OpenCounter
Code for America: “OpenCounter’s mission is to empower entrepreneurs and foster local economic development by simplifying the process of registering a business.
Economic development happens in many forms, from projects like the revitalization of the Brooklyn Navy Yard or Hudson Rail Yards in New York City, to campaigns to encourage residents to shop at local merchants. While the majority of headlines will focus on a city’s effort to secure a major new employer (think Apple’s 1,000,000-square-foot expansion in Austin, Texas), most economic development and job creation happens on a much smaller scale, as individuals stake their financial futures on creating a new product, store, service or firm.
But these new businesses aren’t in a position to accept tax breaks on capital equipment or enter into complex development and disposition agreements to build new offices or stores. Many new businesses can’t even meet the underwriting criteria of SBA-backed revolving-loan programs. Competition for local grants for facade improvements or signage assistance can be fierce….
Despite many cities’ genuine efforts to be “business-friendly,” their default user interface consists of fluorescent-lit Formica, waiting lines, and stacks of forms. Online resources often remind one of a phone book, with little interactivity or specialization based on either the business’s function or its location within a jurisdiction.
That’s why we built OpenCounter….See what we’re up to at opencounter.us or visit a live version of our software at http://opencounter.cityofsantacruz.com.”
From Machinery to Mobility: Government and Democracy in a Participative Age
New book by Jeffrey Roy: “The Westminster-stylized model of Parliamentary democratic politics and public service accountability is increasingly out of step with the realities of today’s digitally and socially networked era. This book explores the reconfiguration of democratic and managerial governance within democratic societies due to the advent of technological mobility. More specifically, the traditional public sector prism of organization and accountability, denoted as ‘machinery of government’, is increasingly strained in an era characterized by smart devices, social media, and cloud computing. This book examines the roots and implications of the tensions between machinery and mobility and the sorts of investments and initiatives that have been undertaken by governments around the world as well as their appropriateness and relative impacts. This book also examines the prospects for holistic adaptation of democratic and managerial systems going forward, identifying the most crucial directions and determinants for improving public sector performance in terms of outcomes, accountability, and agility. Accordingly, the ultimate aim of this initiative is to contribute to the formation of intellectual foundations for more systemic reforms of public sector governance in Canada and elsewhere, and to offer forward-looking trajectories for government adaptation in shifting from a traditional prism of ‘machinery’ to new organizational and institutional arrangements better suited for an era of ‘mobility’.”
Smartphones As Weather Surveillance Systems
Tom Simonite in MIT Technology Review: “You probably never think about the temperature of your smartphone’s battery, but it turns out to provide an interesting method for tracking outdoor air temperature. It’s a discovery that adds to other evidence that mobile apps could provide a new way to measure what’s happening in the atmosphere and improve weather forecasting.
Startup OpenSignal, whose app crowdsources data on cellphone reception, first noticed in 2012 that changes in battery temperature correlated with those outdoors. On Tuesday, they published a scientific paper on that technique in a geophysics journal and announced that the technique will be used to interpret data from a weather crowdsourcing app. OpenSignal originally started collecting data on battery temperatures to try to understand the connections between signal strength and how quickly a device chews through its battery.
OpenSignal’s crowdsourced weather-tracking effort joins another accidentally enabled by smartphones. A project called PressureNET collects air pressure data by taking advantage of the fact that many Android phones have a barometer inside to aid their GPS function (see “App Feeds Scientists Atmospheric Data From Thousands of Smartphones”). Cliff Mass, an atmospheric scientist at the University of Washington, is working to incorporate PressureNET data into weather models that usually rely on data from weather stations. He believes that smartphones could provide valuable data from places where there are no weather stations, if enough people start sharing data using apps like PressureNET.
Other research suggests that logging changes in cell network signal strength perceived by smartphones could provide yet more weather data. In February researchers in the Netherlands produced detailed maps of rainfall compiled by monitoring fluctuations in the signal strength measured by cellular network masts, caused by water droplets in the atmosphere.”
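Rainfall mapping of this kind rests on rain weakening the radio signal between masts. The Python sketch below assumes the power-law relation k = aR^b between specific attenuation k (dB/km) and rain rate R (mm/h), a form commonly used for microwave links (for example in ITU-R Recommendation P.838). It is not necessarily the Dutch researchers’ exact method, the function name is hypothetical, and the coefficients a and b depend on link frequency and polarization, so the values used below are illustrative only.

```python
def rain_rate_mm_per_h(loss_db: float, dry_baseline_db: float, length_km: float,
                       a: float, b: float) -> float:
    """Estimate the path-averaged rain rate R (mm/h) from extra signal loss on one link.

    Assumes the power law k = a * R**b, where k is the specific attenuation in dB/km
    and a, b are frequency- and polarization-dependent coefficients supplied by the
    caller. Ignores complications such as wet-antenna attenuation.
    """
    if length_km <= 0 or a <= 0 or b <= 0:
        raise ValueError("length_km, a and b must be positive")
    # Loss beyond the dry-weather baseline is attributed to rain; clamp at zero.
    rain_loss_db = max(loss_db - dry_baseline_db, 0.0)
    k = rain_loss_db / length_km      # specific attenuation, dB/km
    return (k / a) ** (1.0 / b)      # invert k = a * R**b
```

For example, a link 2 km long that loses 4 dB more than its dry-weather baseline has k = 2 dB/km; with the illustrative coefficients a = 0.5 and b = 1.0 that corresponds to R = 4 mm/h. Combining estimates from many links across a country is what turns single readings into a rainfall map.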
We the People Update
Washington Post: “The White House launched the We The People petition site in 2011 as a way for Americans to get their government to respond to their calls for action. On the digital platform, people can create and sign petitions seeking specific action on an issue from the federal government. In theory, once a petition has garnered a certain number of signatures within a certain time frame, it is reviewed by White House staff and receives an official response.
But that’s not always what happens.
Now a new site, www.whpetitions.info, takes its own tally and highlights petitions that have received enough signatures but have not received responses. By its count, the White House has responded to 87 percent of petitions that have met their signature thresholds with an average response time of 61 days. But the average waiting time so far for the 30 unanswered petitions is 240 days. And six of them have been waiting for over a year.”
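Figures like these come from simple bookkeeping over petition dates. The Python sketch below shows one hypothetical way such a tally could be computed; the article does not describe how whpetitions.info actually counts, and the `Petition` record and its fields are assumptions. A petition counts as eligible once it crosses the signature threshold; answered petitions contribute their response time, and unanswered ones their elapsed waiting time.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Petition:
    """Hypothetical record for one petition."""
    title: str
    met_threshold_on: Optional[date]  # date the signature threshold was reached, if ever
    responded_on: Optional[date]      # date of the official response, if any


def tally(petitions: list[Petition], today: date) -> dict:
    """Summarize responsiveness for petitions that reached the signature threshold."""
    eligible = [p for p in petitions if p.met_threshold_on is not None]
    answered = [p for p in eligible if p.responded_on is not None]
    waiting = [p for p in eligible if p.responded_on is None]

    response_days = [(p.responded_on - p.met_threshold_on).days for p in answered]
    waiting_days = [(today - p.met_threshold_on).days for p in waiting]

    def mean(values: list[int]) -> float:
        return sum(values) / len(values) if values else 0.0

    return {
        "response_rate": len(answered) / len(eligible) if eligible else 0.0,
        "avg_response_days": mean(response_days),
        "avg_wait_days": mean(waiting_days),
        "waiting_over_a_year": sum(1 for d in waiting_days if d > 365),
    }
```

Measuring unanswered petitions by elapsed time since they qualified, rather than leaving them out, is what surfaces long-ignored petitions that a plain response-time average would hide.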
Hackers Called Into Civic Duty
Wall Street Journal: “Cash-strapped cities are turning to an unusual source to improve their online services on the cheap: helpful hackers, who use city data to create tools tracking everything from real-time subway delays to where to get a free flu shot near your home and information about a contentious school-closing plan.
Hackers have been popularly portrayed as giving fits to national-security officials and credit-card companies, but the term also refers to people who like to write their own computer programs and help solve a variety of problems. Recently, hackers have begun working with cities to find ways of building applications, or apps, that make use of data—which gets stripped of personally identifiable information—that municipalities are collecting anyway in the regular course of governance….Last year, Chicago Mayor Rahm Emanuel signed an executive order mandating the city make available all data not protected by privacy laws. Today, the city has nearly 950 data sets publicly available, the most of any U.S. city, according to Code for America, a nonprofit that promotes openness in government.”
Citizen-Centered Governance: The Mayor's Office of New Urban Mechanics and the Evolution of CRM in Boston
New Paper by Susan P. Crawford and Dana Walters (Berkman Center Research Publication No. 17): “Over the last three years, the Boston Mayor’s Office of New Urban Mechanics, the innovative, collaborative ethos within City Hall fostered by Mayor Menino and his current chief of staff, Mitchell Weiss, and Boston’s launch of a CRM system and its associated Citizens Connect smartphone app have all attracted substantial media attention. In particular, the City of Boston’s strategy to put citizen engagement and participation at the center of its efforts, implemented by Chris Osgood and Nigel Jacob as co-chairs of the Mayor’s Office of New Urban Mechanics, has drawn attention to the potential power of collaboration and technology to transform citizens’ connections to their government and to each other. Several global developments have combined to make Boston’s collaborative efforts interesting: First, city managers around the world confront shrinking budgets and diminishing trust in the role of government; second, civic entrepreneurs and technology innovators are pressuring local governments to adopt new forms of engagement with citizens; and third, new digital tools are emerging that can help make city services both more visible and more effective. Boston’s experience in pursuing partnerships that facilitate opportunities for engaging citizens may provide scalable (and disruptive) lessons for other cities.
During the summer of 2013, in anticipation of Mayor Menino’s retirement in January 2014, Prof. Susan Crawford and Project Assistant Dana Walters carried out a case study examining the ongoing evolution of the Boston Mayor’s Hotline into a platform for civic engagement. We chose this CRM focus because the initial development of the system provides a concrete example of how leaders in government can connect to local partners and citizens. In the course of this research, we interviewed 21 city employees and several of their partners outside government, and gathered data about the use of the system.
We found a traditional technology story—selection and integration of CRM software, initial performance management using that software, development of ancillary channels of communication, initial patterns of adoption and use—that reflects the commitment of Mayor Menino to personalized constituent service. We also found that that commitment, his long tenure, and the particular personalities of the people on the New Urban Mechanics team make this both a cultural story as well as a technology story. Here are the highlights…”
Operation Decode San Francisco Will Hack the City's Legal Code
Motherboard: “The city of San Francisco is set to be hacked tonight. Legally, of course. It’s all part of the Operation Decode San Francisco effort, which will unwrap and simplify the city’s dense, labyrinthine laws and re-package them in a fresh, easy-to-use and searchable format.
The crew behind this, OpenGov, originally cut its teeth on KeepTheWebOpen.org. Founded by Rep. Darrell Issa and others to combat SOPA/PIPA, and running on a $5,000 piece of software called the Madison Project, the site also offered up an alternative bill: Issa’s Online Protection and Enforcement of Digital Trade Act (OPEN). Characterized as the first technological crowd-sourcing of legislation, the bill is still stuck in committee, but the site was certainly one of the many tentacles that helped suffocate SOPA/PIPA.”
Empirically Informed Regulation
Paper by Cass Sunstein: “In recent years, social scientists have been incorporating empirical findings about human behavior into economic models. These findings offer important insights for thinking about regulation and its likely consequences. They also offer some suggestions about the appropriate design of effective, low-cost, choice-preserving approaches to regulatory problems, including disclosure requirements, default rules, and simplification. A general lesson is that small, inexpensive policy initiatives can have large and highly beneficial effects. In the United States, a large number of recent practices and reforms reflect an appreciation of this lesson. They also reflect an understanding of the need to ensure that regulations have strong empirical foundations, both through careful analysis of costs and benefits in advance and through retrospective review of what works and what does not.”