Big Data


Special Report on Big Data by Volta – A newsletter on Science, Technology and Society in Europe: “Locating crime spots, or the next outbreak of a contagious disease, Big Data promises benefits for society as well as business. But more means messier. Do policy-makers know how to use this scale of data-driven decision-making in an effective way for their citizens and ensure their privacy?
90% of the world’s data have been created in the last two years. Every minute, more than 100 million new emails are created, 72 hours of new video are uploaded to YouTube and Google processes more than 2 million searches. Nowadays, almost everyone walks around with a small computer in their pocket, uses the internet on a daily basis and shares photos and information with their friends, family and networks. The digital exhaust we leave behind every day contributes to an enormous amount of data produced, and at the same time leaves electronic traces that contain a great deal of personal information….
Until recently, traditional technology and analysis techniques have not been able to handle this quantity and type of data. But recent technological developments have enabled us to collect, store and process data in new ways. There seem to be no limitations, either to the volume of data or to the technology for storing and analyzing them. Big Data can map a driver’s sitting position to identify a car thief, it can use Google searches to predict outbreaks of the H1N1 flu virus, it can data-mine Twitter to predict the price of rice or use mobile phone top-ups to describe unemployment in Asia.
The word ‘data’ means ‘given’ in Latin. It commonly refers to a description of something that can be recorded and analyzed. While there is no clear definition of the concept of ‘Big Data’, it usually refers to processing huge amounts and new types of data in ways that have not been possible with traditional tools.

The notion of Big Data is kind of misleading, argues Robindra Prabhu, a project manager at the Norwegian Board of Technology. “The new development is not necessarily that there is so much more data. It’s rather that data is available to us in a new way. The digitalization of society gives us access to both ‘traditional’, structured data – like the content of a database or register – and unstructured data, for example the content in a text, pictures and videos. Information designed to be read by humans is now also readable by machines. And this development makes a whole new world of data gathering and analysis available. Big Data is exciting not just because of the amount and variety of data out there, but because we can process data about so much more than before.”

Open data: Unlocking innovation and performance with liquid information


New report by McKinsey Global Institute: “Open data—machine-readable information, particularly government data, that’s made available to others—has generated a great deal of excitement around the world for its potential to empower citizens, change how government works, and improve the delivery of public services. It may also generate significant economic value, according to a new McKinsey report. Our research suggests that seven sectors alone could generate more than $3 trillion a year in additional value as a result of open data, which is already giving rise to hundreds of entrepreneurial businesses and helping established companies to segment markets, define new products and services, and improve the efficiency and effectiveness of operations.

Although the open-data phenomenon is in its early days, we see a clear potential to unlock significant economic value by applying advanced analytics to both open and proprietary knowledge. Open data can become an instrument for breaking down information gaps across industries, allowing companies to share benchmarks and spread best practices that raise productivity. Blended with proprietary data sets, it can propel innovation and help organizations replace traditional and intuitive decision-making approaches with data-driven ones. Open-data analytics can also help uncover consumer preferences, allowing companies to improve new products and to uncover anomalies and needless variations. That can lead to leaner, more reliable processes.
However, investments in technology and expertise are required to use the data effectively. And there is much work to be done by governments, companies, and consumers to craft policies that protect privacy and intellectual property, as well as establish standards to speed the flow of data that is not only open but also “liquid.” After all, consumers have serious privacy concerns, and companies are reluctant to share proprietary information—even when anonymity is assured—for fear of losing competitive advantage…
See also Executive Summary and Full Report”

Making government simpler is complicated


Mike Konczal in The Washington Post: “Here’s something a politician would never say: “I’m in favor of complex regulations.” But what would the opposite mean? What would it mean to have “simple” regulations?

There are two definitions of “simple” that have come to dominate liberal conversations about government. One is the idea that we should make use of “nudges” in regulation. The other is the idea that we should avoid “kludges.” As it turns out, however, these two definitions conflict with each other, and the battle between them will dominate conversations about the state in the years ahead.

The case for “nudges”

The first definition of a “simple” regulation is one emphasized in Cass Sunstein’s recent book titled Simpler: The Future of Government. A simple policy is one that simply “nudges” people into one choice or another using a variety of default rules, disclosure requirements, and other market structures. Think, for instance, of rules that require fast-food restaurants to post calories on their menus, or a mortgage that has certain terms clearly marked in disclosures.

These sorts of regulations are deemed “choice preserving.” Consumers are still allowed to buy unhealthy fast-food meals or sign up for mortgages they can’t reasonably afford. The regulations are just there to inform people about their choices. These rules are designed to keep the market “free,” where all possibilities are ultimately possible, although there are rules to encourage certain outcomes.
In his book, however, Sunstein adds that there’s another very different way to understand the term “simple.” What most people mean when they think of simple regulations is a rule that is “simple to follow.” Usually a rule is simple to follow because it outright excludes certain possibilities and thus ensures others. Which means, by definition, it limits certain choices.

The case against “kludges”
This second definition of simple plays a key role in political scientist Steve Teles’ excellent recent essay, “Kludgeocracy in America.” For Teles, a “kludge” is a “clumsy but temporarily effective” fix for a policy problem. (The term comes from computer science.) These kludges tend to pile up over time, making government cumbersome and inefficient overall.
Teles focuses on several ways that kludges are introduced into policy, with a particularly sharp focus on overlapping jurisdictions and the related mess of federal and state overlap in programs. But, without specifically invoking it, he also suggests that a reliance on “nudge” regulations can lead to more kludges.
After all, a non-kludge policy proposal is one that will be simple to follow and will clearly cause a certain outcome, with an obvious causality chain. This is in contrast to a web of “nudges” and incentives designed to try and guide certain outcomes.

Why “nudges” aren’t always simpler
The distinction between the two is clear if we take a specific example core to both definitions: retirement security.
For Teles, “one of the often overlooked benefits of the Social Security program… is that recipients automatically have taxes taken out of their paychecks, and, then without much effort on their part, checks begin to appear upon retirement. It’s simple and direct. By contrast, 401(k) retirement accounts… require enormous investments of time, effort, and stress to manage responsibly.”

Yet 401(k)s are the ultimate fantasy laboratory for nudge enthusiasts. A whole cottage industry has grown up around figuring out ways to default people into certain contributions, designing the architecture of investment choices, and trying to effortlessly and painlessly guide people into certain savings.
Each approach emphasizes different things. If you want to focus your energy on making people better consumers and market participants, expanding our government’s resources and energy into 401(k)s is a good choice. If you want to focus on providing retirement security directly, expanding Social Security is a better choice.
The first is “simple” in that it doesn’t exclude any possibility but encourages market choices. The second is “simple” in that it is easy to follow, and the result is simple as well: a certain amount of security in old age is provided directly. This second approach understands the government as playing a role in stopping certain outcomes, and providing for the opposite of those outcomes, directly….

Why it’s hard to create “simple” regulations
Like all supposed binaries this is really a continuum. Taxes, for instance, sit somewhere in the middle of the two definitions of “simple.” They tend to preserve the market as it is but raise (or lower) the price of certain goods, influencing choices.
And reforms and regulations are often most effective when there’s a combination of these two types of “simple” rules.
Consider an important new paper, “Regulating Consumer Financial Products: Evidence from Credit Cards,” by Sumit Agarwal, Souphala Chomsisengphet, Neale Mahoney and Johannes Stroebel. The authors analyze the CARD Act of 2009, which regulated credit cards. They found that the nudge-type disclosure rules “increased the number of account holders making the 36-month payment value by 0.5 percentage points.” However, more direct regulations on fees had an even bigger effect, saving U.S. consumers $20.8 billion per year with no notable reduction in credit access…
The balance between these two approaches of making regulations simple will be front and center as liberals debate the future of government, whether they’re trying to pull back on the “submerged state” or consider the implications for privacy. The debate over the best way for government to be simple is still far from over.”

Are We Puppets in a Wired World?


Sue Halpern in The New York Review of Books: “Also not obvious was how the Web would evolve, though its open architecture virtually assured that it would. The original Web, the Web of static homepages, documents laden with “hot links,” and electronic storefronts, segued into Web 2.0, which, by providing the means for people without technical knowledge to easily share information, recast the Internet as a global social forum with sites like Facebook, Twitter, FourSquare, and Instagram.
Once that happened, people began to make aspects of their private lives public, letting others know, for example, when they were shopping at H+M and dining at Olive Garden, letting others know what they thought of the selection at that particular branch of H+M and the waitstaff at that Olive Garden, then modeling their new jeans for all to see and sharing pictures of their antipasti and lobster ravioli—to say nothing of sharing pictures of their girlfriends, babies, and drunken classmates, or chronicling life as a high-paid escort, or worrying about skin lesions or seeking a cure for insomnia or rating professors, and on and on.
The social Web celebrated, rewarded, routinized, and normalized this kind of living out loud, all the while anesthetizing many of its participants. Although they likely knew that these disclosures were funding the new information economy, they didn’t especially care…
The assumption that decisions made by machines that have assessed reams of real-world information are more accurate than those made by people, with their foibles and prejudices, may be correct generally and wrong in the particular; and for those unfortunate souls who might never commit another crime even if the algorithm says they will, there is little recourse. In any case, computers are not “neutral”; algorithms reflect the biases of their creators, which is to say that prediction cedes an awful lot of power to the algorithm creators, who are human after all. Some of the time, too, proprietary algorithms, like the ones used by Google and Twitter and Facebook, are intentionally biased to produce results that benefit the company, not the user, and some of the time algorithms can be gamed. (There is an entire industry devoted to “optimizing” Google searches, for example.)
But the real bias inherent in algorithms is that they are, by nature, reductive. They are intended to sift through complicated, seemingly discrete information and make some sort of sense of it, which is the definition of reductive.”

From open data to open democracy


Article by : “Such debates further underscore the complexities of open data and where it might lead. While open data may be viewed by some inside and outside government as a technically-focused and largely incremental project based upon information formatting and accessibility (with the degree of openness subject to a myriad of security and confidentiality provisions), such an approach greatly limits its potential. Indeed, the growing ubiquity of mobile and smart devices, the advent of open source operating systems and social media platforms, and the growing commitment by governments themselves to expansive public engagement objectives, all suggest a widening scope.
Yet, what will incentivize the typical citizen to access open data and to partake in collective efforts to create public value? It is here where our digital culture may well fall short, emphasizing individualized service and convenience at the expense of civic responsibility and community-mindedness. For one American academic, this “citizenship deficit” erodes democratic legitimacy and renders our politics more polarized and less discursive. For other observers in Europe, notions of the digital divide are giving rise to new “data divides.”
The politics and practicalities of data privacy often bring further confusion. While privacy advocates call for greater protection and a culture of data activism among Internet users themselves, the networked ethos of online communities and commercialization fuels speed and sharing, often with little understanding of the ramifications of doing so. Differences between consumerism and citizenship are subtle yet profoundly important, while increasingly blurred and overlooked.
A key conundrum provincially and federally, within the Westminster confines of parliamentary democracy, is that open data is being hatched mainly from within the executive branch, whereas the legislative branch watches and withers. In devising genuine democratic openness, politicians and their parties must do more than post expenses online: they must become partners and advocates for renewal. A lesson of open source technology, however, is that systemic change demands an informed and engaged civil society, disgruntled with the status quo but also determined to act anew.
Most often, such actions are highly localized, even in a virtual world, giving rise to the purpose and meaning of smarter and more intelligent communities. And in Canada it bears noting that we see communities both large and small embracing open data and other forms of online experimentation such as participatory budgeting. It is often within small but connected communities where a virtuous cycle of online and in-person identities and actions can deepen and impact decision-making most directly.
How, then, do we reconcile traditional notions of top-down political federalism and national leadership with this bottom-up approach to community engagement and democratic renewal? Shifting from open data to open democracy is likely to be an uneven, diverse, and at times messy affair. Better this way than attempting to ordain top-down change in a centralized and standardized manner.”

Our Privacy Problem is a Democracy Problem in Disguise


Evgeny Morozov in MIT Technology Review: “Intellectually, at least, it’s clear what needs to be done: we must confront the question not only in the economic and legal dimensions but also in a political one, linking the future of privacy with the future of democracy in a way that refuses to reduce privacy either to markets or to laws. What does this philosophical insight mean in practice?

First, we must politicize the debate about privacy and information sharing. Articulating the existence—and the profound political consequences—of the invisible barbed wire would be a good start. We must scrutinize data-intensive problem solving and expose its occasionally antidemocratic character. At times we should accept more risk, imperfection, improvisation, and inefficiency in the name of keeping the democratic spirit alive.
Second, we must learn how to sabotage the system—perhaps by refusing to self-track at all. If refusing to record our calorie intake or our whereabouts is the only way to get policy makers to address the structural causes of problems like obesity or climate change—and not just tinker with their symptoms through nudging—information boycotts might be justifiable. Refusing to make money off your own data might be as political an act as refusing to drive a car or eat meat. Privacy can then reëmerge as a political instrument for keeping the spirit of democracy alive: we want private spaces because we still believe in our ability to reflect on what ails the world and find a way to fix it, and we’d rather not surrender this capacity to algorithms and feedback loops.
Third, we need more provocative digital services. It’s not enough for a website to prompt us to decide who should see our data. Instead it should reawaken our own imaginations. Designed right, sites would not nudge citizens to either guard or share their private information but would reveal the hidden political dimensions to various acts of information sharing. We don’t want an electronic butler—we want an electronic provocateur. Instead of yet another app that could tell us how much money we can save by monitoring our exercise routine, we need an app that can tell us how many people are likely to lose health insurance if the insurance industry has as much data as the NSA, most of it contributed by consumers like us. Eventually we might discern such dimensions on our own, without any technological prompts.
Finally, we have to abandon fixed preconceptions about how our digital services work and interconnect. Otherwise, we’ll fall victim to the same logic that has constrained the imagination of so many well-meaning privacy advocates who think that defending the “right to privacy”—not fighting to preserve democracy—is what should drive public policy. While many Internet activists would surely argue otherwise, what happens to the Internet is of only secondary importance. Just as with privacy, it’s the fate of democracy itself that should be our primary goal.”

GitHub and Government


New site: “Make government better, together. Stories of open source, open data, and open government.
This site is an open source effort to showcase best practices of open sourcing government. See something that you think could be better? Want to submit your own story? Simply fork the project and submit a pull request.

Ready to get started on GitHub? Here are some ideas that are easy to get your feet wet with.

Feedback Repository

GitHub’s about connecting with developers. Whether you’re an API publishing pro, or just getting started, creating a “feedback” repository can go a long way to connect your organization with the community. Get feedback from current and potential data consumers by creating a specific repository for them to contribute ideas and suggestions for types of data or other information they’d like to see opened. Here’s how:

  1. Create a new repository
    • Choose your organization as the Owner
    • Name the repository “feedback” or similar
    • Click the checkbox to automatically create a README.md file
  2. Set up your Readme
    • Click README.md within your newly created repository
    • Click Edit
    • Introduce yourself, describe why you’ve joined GitHub, what you’re hoping to do and what you’d like to learn from the development community. Encourage them to leave feedback through issues on the repository.

Sample text for your README.md:

# City of Gotham Feedback
We've just joined GitHub and want to know: what data would be interesting to our development community?
Leave us comments via issues!
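For agencies that prefer to script setup, the repository step above can also be done through GitHub's REST API. The sketch below is an illustration under stated assumptions, not part of the original guide: it builds, without sending, a `POST /orgs/{org}/repos` request with `auto_init` (GitHub then creates an initial commit containing a README) and `has_issues` enabled so feedback can arrive as issues. The organization name `city-of-gotham`, the function name, and the token are hypothetical; the token is assumed to have permission to create repositories in that organization.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(org: str, token: str, name: str = "feedback",
                              description: str = "") -> urllib.request.Request:
    """Build (but do not send) a request that creates a repository owned by `org`.

    Hypothetical helper: `org`, `token`, and `description` are placeholders.
    """
    payload = {
        "name": name,
        "description": description,
        "auto_init": True,   # create an initial commit with a README
        "has_issues": True,  # feedback is collected through issues
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/orgs/{org}/repos",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # assumed token with repo-creation rights
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen(req)` would send it; GitHub answers a successful creation with `201 Created`. The README text can then be edited in the web interface as described above.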

Open source a Dataset

Open sourcing a dataset can be as simple as uploading a .csv to GitHub and letting people know about it. Rather than publishing data as a zip file on your website or an FTP server, you can add the files through the GitHub.com web interface, or via the GitHub for Windows or GitHub for Mac native clients. Create a new repository to store your datasets – in many cases, it’s as easy as drag, drop, sync.
GitHub can host any file type (although open, non-binary files like .csvs tend to work best). Plus, GitHub supports rendering certain open data formats interactively such as the popular geospatial .geojson format. Once uploaded, citizens can view the files, and can even open issues or submit pull requests with proposed fixes.
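As a rough illustration of the .geojson point above, the following sketch converts a CSV of point locations into a GeoJSON `FeatureCollection` using only the Python standard library. The column names `longitude` and `latitude`, the function name, and the hydrant sample are hypothetical; note that GeoJSON (RFC 7946) orders coordinates as longitude first, then latitude.

```python
import csv
import io
import json


def csv_to_geojson(csv_text: str, lon_field: str = "longitude",
                   lat_field: str = "latitude") -> dict:
    """Turn CSV rows with coordinate columns into a GeoJSON FeatureCollection.

    Hypothetical helper: assumes every row has numeric values in both
    coordinate columns; float() raises ValueError otherwise.
    """
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lon = float(row.pop(lon_field))
        lat = float(row.pop(lat_field))
        features.append({
            "type": "Feature",
            # GeoJSON positions are [longitude, latitude]
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            # every remaining CSV column becomes a feature property
            "properties": row,
        })
    return {"type": "FeatureCollection", "features": features}


sample = "id,longitude,latitude,status\nH-1,-71.0589,42.3601,clear\n"
collection = csv_to_geojson(sample)
print(json.dumps(collection, indent=2))
```

Writing the result with `json.dump(collection, f)` to a file such as `hydrants.geojson` and committing it would let GitHub display the points on a map, with the other columns shown as feature properties.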

Explore Open Source Civic Apps

There are many open source applications freely available on GitHub that were built just for government. Check them out, and see if one fits a need. Here are some examples:

  • Adopt-a – This open source web app was created for the City of Boston in 2011 by Code for America fellows. It allows residents to “adopt” a hydrant and make sure it’s clear of snow in the winter so that emergency crews can locate it when needed. It has since been adopted in Chicago (for sidewalks), Seattle (for storm drains), and Honolulu (for tsunami sirens).
  • StreetMix – Another creation of Code for America fellows (2013), this website, www.streetmix.net, allows anyone to create street sections in a way that is not only beautiful but educational, too. No downloading, no installing, no paying – make and save your creations right at the website. Great for internal or public community planning meetings.
  • We The People – The White House’s petitions application, hosted at petitions.whitehouse.gov. It is a Drupal module that allows citizens to submit and digitally sign petitions.

Open source something small

Chances are you’ve got something small you can open source. Check in with your web or new media team, and see if they’ve got something they’ve been dying to share or blog about, no matter how small. It can be a snippet of analytics code, or maybe a small script used internally. It doesn’t even have to be code.
Post your website’s privacy policy, comment moderation policy, or terms of service and let the community weigh in before your next edit. No matter how small it is, getting your first open source project going is a great first step.

Improve an existing project

Does your agency use an existing open source project to conduct its own business? Open an issue on the project’s repository with a feature request or a bug you spot. Better yet, fork the project, and submit your improvements. Even if it’s one or two lines of code, such examples are great to blog about to showcase your efforts.
Don’t forget, this site is an open source project, too. Making a needed edit is another great way to get started.”
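The “Improve an existing project” suggestion starts with opening an issue, which can also be done programmatically. This hedged sketch builds, but does not send, a `POST /repos/{owner}/{repo}/issues` request against GitHub's REST API. The names `example-org` and `example-project`, the helper name, and the token are placeholders, and the token is assumed to be allowed to open issues on that repository.

```python
import json
import urllib.request


def build_issue_request(owner: str, repo: str, token: str,
                        title: str, body: str) -> urllib.request.Request:
    """Build (but do not send) a request that opens an issue on owner/repo.

    Hypothetical helper: all arguments are placeholders supplied by the caller.
    """
    payload = {"title": title, "body": body}
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{owner}/{repo}/issues",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",  # assumed token that may open issues
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` would create the issue; a successful call returns `201 Created` along with the new issue's details.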

The move toward 'crowdsourcing' public safety


PhysOrg: “Earlier this year, Martin Dias, assistant professor in the D’Amore-McKim School of Business, presented research for the National Law Enforcement Telecommunications System (Nlets), in which he examined Nlets’ network and how its governance and technology helped enable inter-agency information sharing. This work builds on his research aimed at understanding design principles for these public safety “social networks” and other collaborative networks. We asked Dias to discuss how information sharing around public safety has evolved in recent years and the benefits and challenges of what he describes as “crowdsourcing public safety.” …

What is “crowdsourcing public safety” and why are public safety agencies moving toward this trend?
Crowdsourcing—the term coined by our own assistant professor of journalism Jeff Howe—involves taking a task or job traditionally performed by a distinct agent, or employee, and having that activity be executed by an “undefined, generally large group of people in an open call.” Crowdsourcing public safety involves engaging and enabling private citizens to assist public safety professionals in addressing natural disasters, terror attacks, organized crime incidents, and large-scale industrial accidents.
Public safety agencies have long recognized the need for citizen involvement. Tip lines and missing persons bulletins have been used to engage citizens for years, but with advances in mobile applications and big data analytics, the ability to receive, process, and make use of a high volume of tips and leads makes crowdsourcing searches and investigations more feasible. You saw this in the FBI Boston Marathon Bombing web-based Tip Line. You see it in the “See Something Say Something” initiatives throughout the country. You see it in AMBER alerts or even remote search and rescue efforts. You even see it in more routine instances like Washington State’s HERO program to reduce traffic violations.
Have these efforts been successful, and what challenges remain?
There are a number of issues to overcome with regard to crowdsourcing public safety—such as maintaining privacy rights, ensuring data quality, and improving trust between citizens and officers. Controversies over the National Security Agency’s surveillance program and neighborhood watch programs (particularly the shooting death of teenager Trayvon Martin by neighborhood watch captain George Zimmerman) reflect some of these challenges. Research has not yet established a precise set of success criteria, but the efforts that appear successful at the moment have tended to be centered around a particular crisis incident—such as a specific attack or missing person. But as more crowdsourcing public safety mobile applications are developed, adoption and use are likely to increase. One trend to watch is whether national public safety programs are able to tap into the existing social networks of community-based responders like American Red Cross volunteers, Community Emergency Response Teams, and United Way mentors.
The move toward crowdsourcing is part of an overall trend toward improving community resilience, which refers to a system’s ability to bounce back after a crisis or disturbance. Stephen Flynn and his colleagues at Northeastern’s George J. Kostas Research Institute for Homeland Security are playing a key role in driving a national conversation in this area. Community resilience is inherently multi-disciplinary, so you see research being done regarding transportation infrastructure, social media use after a crisis event, and designing sustainable urban environments. Northeastern is a place where use-inspired research is addressing real-world problems. It will take a village to improve community resilience capabilities, and our institution is a vital part of thought leadership for that village.”
 

If big data is an atomic bomb, disarmament begins in Silicon Valley


at GigaOM: “Big data is like atomic energy, according to scientist Albert-László Barabási in a Monday column on Politico. It’s very beneficial when used ethically, and downright destructive when turned into a weapon. He argues scientists can help resolve the damage done by government spying by embracing the principles of nuclear nonproliferation that helped bring an end to Cold War fears and distrust.
Barabási’s analogy is rather poetic:

“Powered by the right type of Big Data, data mining is a weapon. It can be just as harmful, with long-term toxicity, as an atomic bomb. It poisons trust, straining everything from human relations to political alliances and free trade. It may target combatants, but it cannot succeed without sifting through billions of data points scraped from innocent civilians. And when it is a weapon, it should be treated like a weapon.”

I think he’s right, but I think the fight to disarm the big data bomb begins in places like Silicon Valley and Madison Avenue. And it’s not just scientists; all citizens should have a role…
I write about big data and data mining for a living, and I think the underlying technologies and techniques are incredibly valuable, even if the applications aren’t always ideal. On the one hand, advances in machine learning from companies such as Google and Microsoft are fantastic. On the other hand, Facebook’s newly expanded Graph Search makes Europe’s proposed right-to-be-forgotten laws seem a lot more sensible.
But it’s all within the bounds of our user agreements and beauty is in the eye of the beholder.
Perhaps the reason we don’t vote with our feet by moving to web platforms that embrace privacy, even though we suspect it’s being violated, is that we really don’t know what privacy means. Instead of regulating what companies can and can’t do, perhaps lawmakers can mandate a degree of transparency that actually lets users understand how data is being used, not just what data is being collected. Great, some company knows my age, race, ZIP code and web history: What I really need to know is how it’s using that information to target, discriminate against or otherwise serve me.
An intelligent national discussion about the role of the NSA is probably in order. For all anyone knows, it could even turn out we’re willing to put up with more snooping than the government might expect. But until we get a handle on privacy from the companies we choose to do business with, I don’t think most Americans have the stomach for such a difficult fight.”

The Value of Personal Data


The Digital Enlightenment Yearbook 2013 is dedicated this year to Personal Data: “The value of personal data has traditionally been understood in ethical terms as a safeguard for personality rights such as human dignity and privacy. However, we have entered an era where personal data are mined, traded and monetized in the process of creating added value – often in terms of free services including efficient search, support for social networking and personalized communications. This volume investigates whether the economic value of personal data can be realized without compromising privacy, fairness and contextual integrity. It brings scholars and scientists from the disciplines of computer science, law and social science together with policymakers, engineers and entrepreneurs with practical experience of implementing personal data management.
The resulting collection will be of interest to anyone concerned about privacy in our digital age, especially those working in the field of personal information management, whether academics, policymakers, or those working in the private sector.”