Open data and the API economy: when it makes sense to give away data


 at ZDNet: “Open data is one of those refreshing trends that flows in the opposite direction of the culture of fear that has developed around data security. Instead of putting data under lock and key, surrounded by firewalls and sandboxes, some organizations see value in making data available to all comers — especially developers.

The GovLab.org, a nonprofit advocacy group, published an overview of the benefits governments and organizations are realizing from open data, as well as some of the challenges. The group defines open data as “publicly available data that can be universally and readily accessed, used and redistributed free of charge. It is structured for usability and computability.”…

For enterprises, an open-data stance may be the fuel to build a vibrant ecosystem of developers and business partners. Scott Feinberg, API architect for The New York Times, is one of the people helping to lead the charge to open-data ecosystems. In a recent CXOTalk interview with ZDNet colleague Michael Krigsman, he explains how, through the NYT APIs program, developers can sign up for access to 165 years’ worth of content.

But it requires a lot more than simply throwing some APIs out into the market. Establishing such a comprehensive effort across APIs requires a change in mindset that many organizations may not be ready for, Feinberg cautions. “You can’t be stingy,” he says. “You have to just give it out. When we launched our developer portal there’s a lot of questions like, are people going to be stealing our data, questions like that. Just give it away. You don’t have to give it all but don’t be stingy, and you will find that first off not that many people are going to use it at first. You’re going to find that out, but the people who do, you’re going to find those passionate people who are really interested in using your data in new ways.”

Feinberg clarifies that the NYT’s APIs are not giving out articles for free. Rather, he explains, “what we give is everything but article content. You can search for articles. You can find out what’s trending. You can almost do anything you want with our data through our APIs with the exception of actually reading all of the content. It’s really about giving people the opportunity to really interact with your content in ways that you’ve never thought of, and empowering your community to figure out what they want. You know while we don’t give our actual article text away, we give pretty much everything else and people build a lot of really cool stuff on top of that.”
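
As a rough illustration of the kind of access Feinberg describes (searching article metadata rather than reading article text), the sketch below builds a request URL for the NYT Article Search API and pulls headlines and links out of a response. The endpoint path, the `api-key` query parameter, and the `response.docs[].headline.main` and `web_url` field names follow the public developer documentation as understood here and should be checked against the current docs; the helper names and the sample payload are hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint for the Article Search API; verify against the developer portal.
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(query, api_key, page=0):
    """Return a search URL for the given query (hypothetical helper)."""
    params = {"q": query, "page": page, "api-key": api_key}
    return f"{ARTICLE_SEARCH_URL}?{urlencode(params)}"


def extract_headlines(payload):
    """Return (headline, url) pairs from a decoded JSON response.

    Only metadata is read here; the response is not expected to carry
    the full article text.
    """
    docs = (payload.get("response") or {}).get("docs") or []
    return [
        ((doc.get("headline") or {}).get("main", ""), doc.get("web_url", ""))
        for doc in docs
    ]


if __name__ == "__main__":
    # Made-up payload in the assumed response shape, so no network call is needed.
    sample = {
        "response": {
            "docs": [
                {
                    "headline": {"main": "Example headline"},
                    "web_url": "https://www.nytimes.com/example",
                }
            ]
        }
    }
    print(build_search_url("open data", "YOUR_API_KEY"))
    print(extract_headlines(sample))
```

In practice the URL would be fetched with an HTTP client and the body decoded with `json.loads` before being passed to `extract_headlines`.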

Open data sets, of course, have to be worthy of the APIs that offer them. In his post, Borne outlines the seven qualities open data needs to have to be of value to developers and consumers. (Yes, they’re also “Vs” like big data.)

  1. Validity: It’s “critical to pay attention to these data validity concerns when your organization’s data are exposed to scrutiny and inspection by others,” Borne states.
  2. Value: The data needs to be the font of new ideas, new businesses, and innovations.
  3. Variety: Exposing the wide variety of data available can be “a scary proposition for any data scientist,” Borne observes, but nonetheless is essential.
  4. Voice: Remember that “your open data becomes the voice of your organization to your stakeholders.”
  5. Vocabulary: “The semantics and schema (data models) that describe your data are more critical than ever when you provide the data for others to use,” says Borne. “Search, discovery, and proper reuse of data all require good metadata, descriptions, and data modeling.”
  6. Vulnerability: Accept that open data, because it is so open, will be subjected to “misuse, abuse, manipulation, or alteration.”
  7. proVenance: This is the governance requirement behind open data offerings. “Provenance includes ownership, origin, chain of custody, transformations that have been made to it, processing that has been applied to it (including which versions of processing software were used), the data’s uses and their context, and more,” says Borne….(More)”
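
The vocabulary and provenance items in the list above both amount to shipping structured metadata alongside the data. One way this could look is sketched below: a small record covering the elements Borne names (ownership, origin, chain of custody, transformations with software versions, and uses). The class names, field names, and example values are all hypothetical, not taken from Borne’s post or from any metadata standard.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class ProcessingStep:
    """One transformation applied to the data set."""
    description: str
    software: str
    software_version: str


@dataclass
class ProvenanceRecord:
    """Hypothetical provenance metadata published next to an open data set."""
    owner: str
    origin: str
    custody_chain: List[str] = field(default_factory=list)
    processing: List[ProcessingStep] = field(default_factory=list)
    known_uses: List[str] = field(default_factory=list)

    def to_json(self):
        """Serialize the record so it can be served with the data."""
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    # Placeholder values for illustration only.
    record = ProvenanceRecord(
        owner="Example Agency",
        origin="Example field survey",
        custody_chain=["Collection team", "Data office"],
        processing=[ProcessingStep("Removed duplicate rows", "example-tool", "1.0")],
        known_uses=["Example dashboard"],
    )
    print(record.to_json())
```

Keeping the software version on each processing step is what lets a downstream user reproduce or question a transformation later.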

Evaluating e-Participation: Frameworks, Practice, Evidence


Book edited by Georg Aichholzer, Herbert Kubicek and Lourdes Torres: “There is a widely acknowledged evaluation gap in the field of e-participation practice and research, a lack of systematic evaluation with regard to process organization, outcome and impacts. This book addresses the state of the art of e-participation research and the existing evaluation gap by reviewing various evaluation approaches and providing a multidisciplinary concept for evaluating the output, outcome and impact of citizen participation via the Internet as well as via traditional media. It offers new knowledge based on empirical results of its application (tailored to different forms and levels of e-participation) in an international comparative perspective. The book will advance the academic study and practical application of e-participation through fresh insights, largely drawing on theoretical arguments and empirical research results gained in the European collaborative project “e2democracy”. It applies the same research instruments to a set of similar citizen participation processes in seven local communities in three countries (Austria, Germany and Spain). The generic evaluation framework has been tailored to a tested toolset, and the presentation and discussion of related evaluation results aims at clarifying to what extent these tools can be applied to other consultation and collaboration processes, making the book of interest to policymakers and scholars alike….(More)”

Elements of a New Ethical Framework for Big Data Research


The Berkman Center is pleased to announce the publication of a new paper from the Privacy Tools for Sharing Research Data project team. In this paper, Effy Vayena, Urs Gasser, Alexandra Wood, and David O’Brien from the Berkman Center, with Micah Altman from MIT Libraries, outline elements of a new ethical framework for big data research.

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

The authors argue that a framework with the following elements would support big data utilization and help harness the value of big data in a sustainable and trust-building manner:

  • Oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle.

  • New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects.

  • Researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available.

  • Oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time.

The full paper is available for download through the Washington and Lee Law Review Online as part of a collection of papers featured at the Future of Privacy Forum workshop Beyond IRBs: Designing Ethical Review Processes for Big Data Research held on December 10, 2015, in Washington, DC….(More)”

The Curious Journalist’s Guide to Data


New book by The Tow Center: “This is a book about the principles behind data journalism. Not what visualization software to use and how to scrape a website, but the fundamental ideas that underlie the human use of data. This isn’t “how to use data” but “how data works.”

This gets into some of the mathy parts of statistics, but also the difficulty of taking a census of race and the cognitive psychology of probabilities. It traces where data comes from, what journalists do with it, and where it goes after—and tries to understand the possibilities and limitations. Data journalism is as interdisciplinary as it gets, which can make it difficult to assemble all the pieces you need. This is one attempt. This is a technical book, and uses standard technical language, but all mathematical concepts are explained through pictures and examples rather than formulas.

The life of data has three parts: quantification, analysis, and communication. Quantification is the process that creates data. Analysis involves rearranging the data or combining it with other information to produce new knowledge. And none of this is useful without communicating the result.

Quantification is a problem without a home. Although physicists study measurement extensively, physical theory doesn’t say much about how to quantify things like “educational attainment” or even “unemployment.” There are deep philosophical issues here, but the most useful question to a journalist is simply, how was this data created? Data is useful because it represents the world, but we can only understand data if we correctly understand how it came to be. Representation through data is never perfect: all data has error. Randomly sampled surveys are both a powerful quantification technique and the prototype for all measurement error, so this report explains where the margin of error comes from and what it means – from first principles, using pictures.
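
For readers who want the arithmetic behind the margin of error, a minimal sketch follows. It uses the standard textbook normal approximation for a proportion estimated from a simple random sample, which is an assumption here and not necessarily how the book presents it; the function name and the default 95% multiplier of 1.96 are choices made for this example.

```python
import math


def margin_of_error(p_hat, n, z=1.96):
    """Approximate margin of error for a sample proportion.

    p_hat: observed proportion in the sample (between 0 and 1)
    n:     number of randomly sampled respondents
    z:     normal quantile; 1.96 corresponds to roughly 95% confidence
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 <= p_hat <= 1.0:
        raise ValueError("p_hat must be between 0 and 1")
    standard_error = math.sqrt(p_hat * (1.0 - p_hat) / n)
    return z * standard_error


if __name__ == "__main__":
    # A 50/50 result from 1,000 respondents, then from 4,000.
    print(round(margin_of_error(0.5, 1000), 4))
    print(round(margin_of_error(0.5, 4000), 4))
```

Because the sample size sits under a square root, quadrupling the number of respondents only halves the margin of error, which is why most polls settle for a margin of a few percentage points.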

All data analysis is really data interpretation, which requires much more than math. Data needs context to mean anything at all: Imagine if someone gave you a spreadsheet with no column names. Each data set could be the source of many different stories, and there is no objective theory that tells us which true stories are the best. But the stories still have to be true, which is where data journalism relies on established statistical principles. The theory of statistics solves several problems: accounting for the possibility that the pattern you see in the data was purely a fluke, reasoning from incomplete and conflicting information, and attempting to isolate causes. Stats has been taught as something mysterious, but it’s not. The analysis chapter centers on a single problem – asking if an earlier bar closing time really did reduce assaults in a downtown neighborhood – and traces through the entire process of analysis by explaining the statistical principles invoked at each step, building up to the state-of-the-art methods of Bayesian inference and causal graphs.
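
The first of those problems, ruling out a fluke, can be illustrated with a short sketch. The book builds toward Bayesian inference and causal graphs; the code below swaps in a simpler technique, a permutation test, to ask how often a drop at least as large as the observed one appears when the "before" and "after" labels are shuffled at random. The weekly counts are invented for illustration and do not come from the book, and the function name is hypothetical.

```python
import random


def permutation_p_value(before, after, n_shuffles=10_000, seed=0):
    """Share of random relabelings whose drop in the mean is at least
    as large as the observed drop (one-sided permutation test)."""
    rng = random.Random(seed)
    k = len(before)
    observed = sum(before) / k - sum(after) / len(after)
    pooled = list(before) + list(after)
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)
        drop = sum(pooled[:k]) / k - sum(pooled[k:]) / (len(pooled) - k)
        if drop >= observed:
            hits += 1
    return hits / n_shuffles


if __name__ == "__main__":
    # Invented weekly assault counts, for illustration only.
    weeks_before = [20, 22, 19, 24, 21, 23]
    weeks_after = [18, 21, 17, 22, 19, 20]
    print(permutation_p_value(weeks_before, weeks_after))
```

A small result says chance alone rarely produces such a drop; it does not say the closing time caused it, which is the separate problem of isolating causes.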

A story isn’t finished until you’ve communicated your results. Data visualization works because it relies on the biology of human visual perception, just as all data communication relies on human cognitive processing. People tend to overestimate small risks and underestimate large risks; examples leave a much stronger impression than statistics; and data about some will, unconsciously, come to represent all, no matter how well you warn that your sample doesn’t generalize. If you’re not aware of these issues you can leave people with skewed impressions or reinforce harmful stereotypes. The journalist isn’t only responsible for what they put in the story, but what ends up in the mind of the audience.

This report brings together many fields to explore where data comes from, how to analyze it, and how to communicate your results. It uses examples from journalism to explain everything from Bayesian statistics to the neurobiology of data visualization, all in plain language with lots of illustrations. Some of these ideas are thousands of years old, some were developed only a decade ago, and all of them have come together to create the 21st century practice of data journalism….(More)”

The Bottom of the Data Pyramid: Big Data and the Global South


Payal Arora at the International Journal of Communication: “To date, little attention has been given to the impact of big data in the Global South, about 60% of whose residents are below the poverty line. Big data manifests in novel and unprecedented ways in these neglected contexts. For instance, India has created biometric national identities for her 1.2 billion people, linking them to welfare schemes, and social entrepreneurial initiatives like the Ushahidi project that leveraged crowdsourcing to provide real-time crisis maps for humanitarian relief.

While these projects are indeed inspirational, this article argues that in the context of the Global South there is a bias in the framing of big data as an instrument of empowerment. Here, the poor, or the “bottom of the pyramid” populace are the new consumer base, agents of social change instead of passive beneficiaries. This neoliberal outlook of big data facilitating inclusive capitalism for the common good sidelines critical perspectives urgently needed if we are to channel big data as a positive social force in emerging economies. This article proposes to assess these new technological developments through the lens of databased democracies, databased identities, and databased geographies to make evident normative assumptions and perspectives in this under-examined context….(More)”.

When open data is a Trojan Horse: The weaponization of transparency in science and governance


Karen E.C. Levy and David Merritt Johns in Big Data and Society: “Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis….(More)”

How to train Public Entrepreneurs


10 Lessons: “…The GovLab and its network of 25 world-class coaches and over 100 mentors helped 446 participants in more than a dozen US cities and thirty foreign countries to take a public interest technology project from idea to implementation. In the process, we’ve learned a lot about the need for new ways of training the next generation of leaders and problem solvers.

Our aim has been to aid public entrepreneurs — passionate and innovative people who wish to take advantage of new technology to do good in the world. That’s why we measure success, not by the number of participants in a class, but by the projects participants create and the impact those projects have on communities….

Lesson 1: There is growing, and unmet, demand for training a new kind of public servant: the public entrepreneur…

Lesson 2: Tap the distributed supply of talent and expertise to accelerate learning…

Lesson 3: Create new methods for training public entrepreneurs to solve problems…

Lesson 4: Develop tools to help public interest innovators “cross the chasm” from idea to implementation…

Lesson 5: Teach collaboration and partnering for change…

Lesson 6: In order to be successful, public entrepreneurs must be able to define the problem — a skill widely lacking…

Lesson 7: Connecting innovators and alumni with one another generates a lasting public infrastructure that can help solve problems more effectively…

Lesson 8: Pedagogical priorities include making problem solving more data driven and evidence based….

Lesson 9: The demand and supply are global — which requires a global mindset and platform in order to learn what has worked elsewhere and why…

Lesson 10: Collaboration and coordination among anchor organizations is key to meeting the demand and coordinating the supply….(More)

Mapping a flood of new data


Rebecca Lipman at Economist Intelligence Unit Perspectives on “One city tweets to stay dry: From drones to old-fashioned phone calls, data come from many unlikely sources. In a disaster, such as a flood or earthquake, responders will take whatever information they can get to visualise the crisis and best direct their resources. Increasingly, cities prone to natural disasters are learning to better aid their citizens by empowering their local agencies and responders with sophisticated tools to cut through the large volume and velocity of disaster-related data and synthesise actionable information.

Consider the plight of the metro area of Jakarta, Indonesia, home to some 28m people, 13 rivers and 1,100 km of canals. With 40% of the city below sea level (and sinking), and regularly subject to extreme weather events including torrential downpours in monsoon season, Jakarta’s residents face far-too-frequent, life-threatening floods. Despite the unpredictability of flooding conditions, citizens have long taken a passive approach that depended on government entities to manage the response. But the information Jakarta’s responders had on the flooding conditions was patchy at best. So in the last few years, the government began to turn to the local population for help. It helped.

Today, Jakarta’s municipal government is relying on the web-based PetaJakarta.org project and a handful of other crowdsourcing mobile apps such as Qlue and CROP to collect data and respond to floods and other disasters. Through these programmes, crowdsourced, time-sensitive data derived from citizens’ social-media inputs have made it possible for city agencies to more precisely map the locations of rising floods and help the residents at risk. In January 2015, for example, the web-based Peta Jakarta received 5,209 reports on floods via tweets with detailed text and photos. Anytime there’s a flood, Peta Jakarta’s data from the tweets are mapped and updated every minute, and often cross-checked by Jakarta Disaster Management Agency (BPBD) officials through calls with community leaders to assess the information and guide responders.

But in any city Twitter is only one piece of a very large puzzle. …

Even with such life-and-death examples, government agencies remain deeply protective of data because of issues of security, data ownership and citizen privacy. They are also concerned about liability issues if incorrect data lead to an activity that has unsuccessful outcomes. These concerns encumber the combination of crowdsourced data with operational systems of record, and impede the fast progress needed in disaster situations….Download the case study.”

Feedback Loop Failure: Implications for the Self-Regulation of the Sharing Economy


Essay by Abbey Stemler: “Ratings and reviews are the lifeblood of the sharing economy. They provide a reputation proxy and make us feel comfortable jumping into strangers’ cars, sleeping in their beds, and having a meal at their kitchen tables. However, as the fields of psychology, management, and behavioral economics are beginning to tell us, these trust-building mechanisms might be flawed. Instead of relying on the wisdom of the crowd, we might be relying on the collective bias of the crowd. This essay examines how proposed theories for regulating the sharing economy depend on accurate feedback mechanisms and argues that this reliance should be questioned, because feedback loop failure occurs in the sharing economy and distorts the risk calculation for participants. This failure can lead to uninformed decision-making and consumer fraud….(More)”

The creative citizen unbound


Book by Ian Hargreaves and John Hartley on “How social media and DIY culture contribute to democracy, communities and the creative economy”: “The creative citizen unbound introduces the concept of ‘creative citizenship’ to explore the potential of civic-minded creative individuals in the era of social media and in the context of an expanding creative economy. Drawing on the findings of a 30-month study of communities supported by the UK research funding councils, multidisciplinary contributors examine the value and nature of creative citizenship, not only in terms of its contribution to civic life and social capital but also to more contested notions of value, both economic and cultural. This original book will be beneficial to researchers and students across a range of disciplines including media and communication, political science, economics, planning and economic geography, and the creative and performing arts….(More)”