New Orleans Gamifies the City Budget


Kelsey E. Thomas at Next City: “New Orleanians can try their hand at being ‘mayor for a day’ with a new interactive website released Wednesday by the Committee for a Better New Orleans.

The Big Easy Budget Game uses open data from the city to allow players to create their own version of an operating budget. Players are given a digital $602 million and have to balance the budget, keeping in mind the government’s responsibilities, the previous year’s spending, and their personal priorities.

Each department in the game has a minimum funding level (players can’t just quit funding public schools if they feel like it), and restricted funding, such as state or federal dollars, is off limits.
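
Under the hood, the game amounts to a constrained allocation problem: every discretionary dollar must be assigned, and no department may fall below its floor. The sketch below is a hypothetical illustration of that rule check, not CBNO's actual code; the department names and minimum figures are invented.

```python
# Hypothetical sketch of the game's core rule check. The department names,
# minimum funding levels, and the single discretionary pot are illustrative
# assumptions, not CBNO's actual implementation.
DISCRETIONARY_TOTAL = 602_000_000  # the digital dollars each player allocates

MINIMUM_FUNDING = {  # floors a player cannot cut below (invented figures)
    "Police": 120_000_000,
    "Fire": 60_000_000,
    "Public Works": 40_000_000,
    "Parks & Recreation": 5_000_000,
}

def validate_budget(allocations: dict) -> list:
    """Return a list of rule violations for a player's proposed budget."""
    problems = []
    for dept, floor in MINIMUM_FUNDING.items():
        if allocations.get(dept, 0) < floor:
            problems.append(f"{dept} is below its minimum of ${floor:,}")
    total = sum(allocations.values())
    if total != DISCRETIONARY_TOTAL:
        problems.append(f"Budget is off balance by ${total - DISCRETIONARY_TOTAL:,}")
    return problems
```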

CBNO hopes to attract 600 players this year, and plans to compile the data from each player into a crowdsourced meta-budget called “The People’s Budget.” Next fall, the People’s Budget will be released along with the city’s proposed 2017 budget.

Along with the budgeting game, CBNO released a more detailed website, also using the city’s open data, that breaks down the city’s budgeted versus actual spending from 2007 to the present and can be filtered. The goal is to allow users without big-data experience to easily research funding relevant to their neighborhoods.
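
For readers comfortable with a bit of scripting, the same open data invites direct exploration. The snippet below is a sketch, not the site's implementation; the file name and column names ("year", "department", "budgeted", "actual") are assumptions about how a budget export might be structured.

```python
# Sketch only: assumes a CSV export of the city's open budget data with
# hypothetical columns "year", "department", "budgeted", and "actual".
import pandas as pd

df = pd.read_csv("new_orleans_operating_budget.csv")  # hypothetical file name

# One department's budgeted vs. actual spending since 2007, plus the gap
parks = df[(df["department"] == "Parks & Parkways") & (df["year"] >= 2007)]
summary = parks.assign(variance=parks["actual"] - parks["budgeted"])
print(summary[["year", "budgeted", "actual", "variance"]].sort_values("year"))
```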

Many cities have been releasing interactive websites to make their data more accessible to residents. Checkbook NYC updates more than $70 billion in city expenses daily and breaks them down by transaction. Fiscal Focus Pittsburgh is an online visualization tool that outlines revenues and expenses in the city’s budget….(More)”

Website Seeks to Make Government Data Easier to Sift Through


Steve Lohr at the New York Times: “For years, the federal government, states and some cities have enthusiastically made vast troves of data open to the public. Acres of paper records on demographics, public health, traffic patterns, energy consumption, family incomes and many other topics have been digitized and posted on the web.

This abundance of data can be a gold mine for discovery and insights, but finding the nuggets can be arduous, requiring special skills.

A project coming out of the M.I.T. Media Lab on Monday seeks to ease that challenge and to make the value of government data available to a wider audience. The project, called Data USA, bills itself as “the most comprehensive visualization of U.S. public data.” It is free, and its software code is open source, meaning that developers can build custom applications by adding other data.

Cesar A. Hidalgo, an assistant professor of media arts and sciences at the M.I.T. Media Lab who led the development of Data USA, said the website was devised to “transform data into stories.” Those stories are typically presented as graphics, charts and written summaries….Type “New York” into the Data USA search box, and a drop-down menu presents choices — the city, the metropolitan area, the state and other options. Select the city, and the page displays an aerial shot of Manhattan with three basic statistics: population (8.49 million), median household income ($52,996) and median age (35.8).

Lower on the page are six icons for related subject categories, including economy, demographics and education. If you click on demographics, one of the so-called data stories appears, based largely on data from the American Community Survey of the United States Census Bureau.

Using colorful graphics and short sentences, it shows the median age of foreign-born residents of New York (44.7) and of residents born in the United States (28.6); the most common countries of origin for immigrants (the Dominican Republic, China and Mexico); and the percentage of residents who are American citizens (82.8 percent, compared with a national average of 93 percent).
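
Because the site's code is open source and its data is exposed programmatically, developers can pull the same figures the pages display. The endpoint and parameter names below are assumptions based on Data USA's public documentation and may have changed, so treat this as a sketch rather than a stable interface.

```python
# Sketch of pulling data from Data USA's public API; the endpoint and the
# parameter names are assumptions and may differ from the live service.
import requests

resp = requests.get(
    "https://datausa.io/api/data",
    params={"drilldowns": "Nation", "measures": "Population"},
    timeout=30,
)
resp.raise_for_status()
for row in resp.json().get("data", []):
    print(row.get("Year"), row.get("Population"))
```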

Data USA features a selection of data results on its home page. They include the gender wage gap in Connecticut; the racial breakdown of poverty in Flint, Mich.; the wages of physicians and surgeons across the United States; and the institutions that award the most computer science degrees….(More)

Data to the Rescue: Smart Ways of Doing Good


Nicole Wallace in the Chronicle of Philanthropy: “For a long time, data served one purpose in the nonprofit world: measuring program results. But a growing number of charities are rejecting the idea that data equals evaluation and only evaluation.

Of course, many nonprofits struggle even to build the simplest data system. They have too little money, too few analysts, and convoluted data pipelines. Yet some cutting-edge organizations are putting data to work in new and exciting ways that drive their missions. A prime example: The Polaris Project is identifying criminal networks in the human-trafficking underworld and devising strategies to fight back by analyzing its data storehouse along with public information.

Other charities dive deep into their data to improve services, make smarter decisions, and identify measures that predict success. Some have such an abundance of information that they’re even pruning their collection efforts to allow for more sophisticated analysis.

The groups highlighted here are among the best nationally. In their work, we get a sneak peek at how the data revolution might one day achieve its promise.

House Calls: Living Goods

Living Goods launched in eastern Africa in 2007 with an innovative plan to tackle health issues in poor families and reduce deaths among children. The charity provides loans, training, and inventory to locals in Uganda and Kenya — mostly women — to start businesses selling vitamins, medicine, and other health products to friends and neighbors.

Founder Chuck Slaughter copied the Avon model and its army of housewives-turned-sales agents. But in recent years, Living Goods has embraced a 21st-century data system that makes its entrepreneurs better health practitioners. Armed with smartphones, they confidently diagnose and treat major illnesses. At the same time, they collect information that helps the charity track health across communities and plot strategy….

Unraveling Webs of Wickedness: Polaris Project

Calls and texts to the Polaris Project’s national human-trafficking hotline are often heartbreaking, terrifying, or both.

Relatives fear that something terrible has happened to a missing loved one. Trafficking survivors suffering from their ordeal need support. The most harrowing calls are from victims in danger and pleading for help.

Last year more than 5,500 potential cases of exploitation for labor or commercial sex were reported to the hotline. Since it got its start in 2007, the total is more than 24,000.

As it helps victims and survivors get the assistance they need, the Polaris Project, a Washington nonprofit, is turning those phone calls and texts into an enormous storehouse of information about the shadowy world of trafficking. By analyzing this data and connecting it with public sources, the nonprofit is drawing detailed pictures of how trafficking networks operate. That knowledge, in turn, shapes the group’s prevention efforts, its policy work, and even law-enforcement investigations….

Too Much Information: Year Up

Year Up has a problem that many nonprofits can’t begin to imagine: It collects too much data about its program. “Predictive analytics really start to stink it up when you put too much in,” says Garrett Yursza Warfield, the group’s director of evaluation.

What Mr. Warfield describes as the “everything and the kitchen sink” problem started soon after Year Up began gathering data. The group, which fights poverty by helping low-income young adults land entry-level professional jobs, first got serious about measuring its work nearly a decade ago. Though challenged at first to round up even basic information, the group over time began tracking virtually everything it could: the percentage of young people who finish the program, their satisfaction, their paths after graduation through college or work, and much more.

Now the nonprofit is diving deeper into its data to figure out which measures can predict whether a young person is likely to succeed in the program. And halfway through this review, it’s already identified and eliminated measures that it’s found matter little. A small example: Surveys of participants early in the program asked them to rate their proficiency at various office skills. Those self-evaluations, Mr. Warfield’s team concluded, were meaningless: How can novice professionals accurately judge their Excel spreadsheet skills until they’re out in the working world?…

On the Wild Side: Wilderness Society

…Without room to roam, wild animals and plants breed among themselves and risk losing genetic diversity. They also fall prey to disease. And that’s in the best of times. As wildlife adapts to climate change, the chance to migrate becomes vital even to survival.

National parks and other large protected areas are part of the answer, but they’re not enough if wildlife can’t move between them, says Travis Belote, lead ecologist at the Wilderness Society.

“Nature needs to be able to shuffle around,” he says.

Enter the organization’s Wildness Index. It’s a national map that shows the parts of the country most touched by human activity as well as wilderness areas best suited for wildlife. Mr. Belote and his colleagues created the index by combining data on land use, population density, road location and size, water flows, and many other factors. It’s an important tool to help the nonprofit prioritize the locations it fights to protect.
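
Conceptually, an index like this is a weighted overlay of spatial layers. The toy sketch below shows the general idea; the layers, weights, and normalization are invented for illustration and are not the Wilderness Society's actual method.

```python
# Toy sketch of a wildness score as a weighted overlay of normalized layers.
# The layers, weights, and grid are invented; the real index combines many
# more factors at far finer resolution.
import numpy as np

rng = np.random.default_rng(1)
shape = (100, 100)  # a small raster grid standing in for the landscape

layers = {  # higher value = more human influence, each scaled to 0..1
    "land_use_intensity": rng.random(shape),
    "population_density": rng.random(shape),
    "road_density": rng.random(shape),
}
weights = {"land_use_intensity": 0.4, "population_density": 0.3, "road_density": 0.3}

human_footprint = sum(weights[name] * layer for name, layer in layers.items())
wildness = 1.0 - human_footprint  # invert so that higher means wilder
print(f"Wildest cell score: {wildness.max():.3f}")
```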

In Idaho, for example, the nonprofit compares the index with information about known wildlife corridors and federal lands that are unprotected but meet the criteria for conservation designation. The project’s goal: determine which areas in the High Divide — a wild stretch that connects Greater Yellowstone with other protected areas — the charity should advocate to legally protect….(More)”

Open data and the API economy: when it makes sense to give away data


At ZDNet: “Open data is one of those refreshing trends that flows in the opposite direction of the culture of fear that has developed around data security. Instead of putting data under lock and key, surrounded by firewalls and sandboxes, some organizations see value in making data available to all comers — especially developers.

The GovLab.org, a nonprofit advocacy group, published an overview of the benefits governments and organizations are realizing from open data, as well as some of the challenges. The group defines open data as “publicly available data that can be universally and readily accessed, used and redistributed free of charge. It is structured for usability and computability.”…

For enterprises, an open-data stance may be the fuel to build a vibrant ecosystem of developers and business partners. Scott Feinberg, API architect for The New York Times, is one of the people helping to lead the charge to open-data ecosystems. In a recent CXOTalk interview with ZDNet colleague Michael Krigsman, he explains how through the NYT APIs program, developers can sign up for access to 165 years worth of content.

But it requires a lot more than simply throwing some APIs out into the market. Establishing such a comprehensive effort across APIs requires a change in mindset that many organizations may not be ready for, Feinberg cautions. “You can’t be stingy,” he says. “You have to just give it out. When we launched our developer portal there’s a lot of questions like, are people going to be stealing our data, questions like that. Just give it away. You don’t have to give it all but don’t be stingy, and you will find that first off not that many people are going to use it at first. You’re going to find that out, but the people who do, you’re going to find those passionate people who are really interested in using your data in new ways.”

Feinberg clarifies that the NYT’s APIs are not giving out articles for free. Rather, he explains, “What we give is everything but article content. You can search for articles. You can find out what’s trending. You can almost do anything you want with our data through our APIs with the exception of actually reading all of the content. It’s really about giving people the opportunity to really interact with your content in ways that you’ve never thought of, and empowering your community to figure out what they want. You know, while we don’t give our actual article text away, we give pretty much everything else and people build a lot of really cool stuff on top of that.”
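
For a concrete sense of what "everything but article content" means, here is a minimal sketch of calling the Article Search endpoint from the NYT developer APIs. The URL and response fields reflect the publicly documented v2 API, but check the developer portal before relying on the details; the query itself is just an example.

```python
# Sketch of querying the NYT Article Search API (v2). Requires a free key from
# the developer portal; responses carry metadata (headline, byline, snippet,
# web URL), not full article text.
import os
import requests

API_KEY = os.environ["NYT_API_KEY"]  # set after signing up at developer.nytimes.com

resp = requests.get(
    "https://api.nytimes.com/svc/search/v2/articlesearch.json",
    params={"q": "open data", "sort": "newest", "api-key": API_KEY},
    timeout=30,
)
resp.raise_for_status()
for doc in resp.json()["response"]["docs"]:
    print(doc["pub_date"], doc["headline"]["main"])
```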

Open data sets, of course, have to be worthy of the APIs that offer them. In his post, Borne outlines the seven qualities open data needs to have to be of value to developers and consumers. (Yes, they’re also “Vs” like big data.)

  1. Validity: It’s “critical to pay attention to these data validity concerns when your organization’s data are exposed to scrutiny and inspection by others,” Borne states.
  2. Value: The data needs to be the font of new ideas, new businesses, and innovations.
  3. Variety: Exposing the wide variety of data available can be “a scary proposition for any data scientist,” Borne observes, but nonetheless is essential.
  4. Voice: Remember that “your open data becomes the voice of your organization to your stakeholders.”
  5. Vocabulary: “The semantics and schema (data models) that describe your data are more critical than ever when you provide the data for others to use,” says Borne. “Search, discovery, and proper reuse of data all require good metadata, descriptions, and data modeling.”
  6. Vulnerability: Accept that open data, because it is so open, will be subjected to “misuse, abuse, manipulation, or alteration.”
  7. proVenance: This is the governance requirement behind open data offerings. “Provenance includes ownership, origin, chain of custody, transformations that have been made to it, processing that has been applied to it (including which versions of processing software were used), the data’s uses and their context, and more,” says Borne….(More)”

Selected Readings on Data and Humanitarian Response


By Prianka Srinivasan and Stefaan G. Verhulst *

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data and humanitarian response was originally published in 2016.

Data, when used well and in a trusted manner, allows humanitarian organizations to innovate how they respond to emergency events, including better coordination of post-disaster relief efforts, the ability to harness local knowledge to create more targeted relief strategies, and tools to predict and monitor disasters in real time. Consequently, in recent years both multinational groups and community-based advocates have begun to integrate data collection and evaluation strategies into their humanitarian operations in order to respond to emergencies better and more quickly. However, this movement poses a number of challenges. Compared to the private sector, humanitarian organizations are often less equipped to successfully analyze and manage big data, which poses a number of risks to the security of victims’ data. Furthermore, the complex power dynamics that exist within humanitarian spaces may be further exacerbated by the introduction of new technologies and big data collection mechanisms. Below we share:

  • Selected Reading List (summaries and hyperlinks)
  • Annotated Selected Reading List
  • Additional Readings

Selected Reading List (summaries in alphabetical order)

Data and Humanitarian Response

Risks of Using Big Data in Humanitarian Context

Annotated Selected Reading List (in alphabetical order)

Karlsrud, John. “Peacekeeping 4.0: Harnessing the Potential of Big Data, Social Media, and Cyber Technologies.” Cyberspace and International Relations, 2013. http://bit.ly/235Qb3e

  • This chapter from the book “Cyberspace and International Relations” suggests that advances in big data give humanitarian organizations unprecedented opportunities to prevent and mitigate natural disasters and humanitarian crises. However, the sheer amount of unstructured data necessitates effective “data mining” strategies for multinational organizations to make the most use of this data.
  • By profiling some civil-society organizations who use big data in their peacekeeping efforts, Karlsrud suggests that these community-focused initiatives are leading the movement toward analyzing and using big data in countries vulnerable to crisis.
  • The chapter concludes by offering ten recommendations to UN peacekeeping forces to best realize the potential of big data and new technology in supporting their operations.

Mancini, Francesco. “New Technology and the Prevention of Violence and Conflict.” International Peace Institute, 2013. http://bit.ly/1ltLfNV

  • This report from the International Peace Institute looks at five case studies to assess how information and communications technologies (ICTs) can help prevent humanitarian conflicts and violence. Its findings suggest that context has a significant impact on the ability of these ICTs to prevent conflict, and any strategy must take into account the specific contingencies of the region to be successful.
  • The report suggests seven lessons gleaned from the five case studies:
    • New technologies are just one in a variety of tools to combat violence. Consequently, organizations must investigate a variety of complementary strategies to prevent conflicts, and not simply rely on ICTs.
    • Not every community or social group will have the same relationship to technology, and their ability to adopt new technologies is similarly influenced by their context. Therefore, a detailed needs assessment must take place before any new technologies are implemented.
    • New technologies may be co-opted by violent groups seeking to maintain conflict in the region. Consequently, humanitarian groups must be sensitive to existing political actors and be aware of possible negative consequences these new technologies may spark.
    • Local input is integral to support conflict prevention measures, and there exists need for collaboration and awareness-raising with communities to ensure new technologies are sustainable and effective.
    • Information shared among civil-society groups has more potential to support early-warning systems. This horizontal distribution of information can also allow communities to hold local leaders accountable.

Meier, Patrick. “Digital humanitarians: how big data is changing the face of humanitarian response.” CRC Press, 2015. http://amzn.to/1RQ4ozc

  • This book traces the emergence of “Digital Humanitarians”—people who harness new digital tools and technologies to support humanitarian action. Meier suggests that this has created a “nervous system” to connect people from disparate parts of the world, revolutionizing the way we respond to humanitarian crises.
  • Meier argues that such technology is reconfiguring the structure of the humanitarian space, where victims are not simply passive recipients of aid but can contribute with other global citizens. This in turn makes us more humane and engaged people.

Robertson, Andrew and Olson, Steve. “Using Data Sharing to Improve Coordination in Peacebuilding.” United States Institute of Peace, 2012. http://bit.ly/235QuLm

  • This report functions as an overview of a roundtable workshop on Technology, Science and Peace Building held at the United States Institute of Peace. The workshop aimed to investigate how data-sharing techniques can be developed for use in peace building or conflict management.
  • Four main themes emerged from discussions during the workshop:
    • “Data sharing requires working across a technology-culture divide”—Data sharing needs the foundation of a strong relationship, which can depend on sociocultural, rather than technological, factors.
    • “Information sharing requires building and maintaining trust”—These relationships are often built on trust, which can include both technological and social perspectives.
    • “Information sharing requires linking civilian-military policy discussions to technology”—Even when sophisticated data-sharing technologies exist, continuous engagement between different stakeholders is necessary. Therefore, procedures used to maintain civil-military engagement should be broadened to include technology.
    • “Collaboration software needs to be aligned with user needs”—Technology providers need to keep in mind the needs of their users, in this case peacebuilders, in order to ensure sustainability.

United Nations Independent Expert Advisory Group on a Data Revolution for Sustainable Development. “A World That Counts, Mobilizing the Data Revolution.” 2014. https://bit.ly/2Cb3lXq

  • This report focuses on the potential benefits and risks data holds for sustainable development. Included in this is a strategic framework for using and managing data for humanitarian purposes. It describes a need for a multinational consensus to be developed to ensure data is shared effectively and efficiently.
  • It suggests that “people who are counted”—i.e., those who are included in data collection processes—have better development outcomes and a better chance for humanitarian response in emergency or conflict situations.

Whipkey, Katie and Verity, Andrej. “Guidance for Incorporating Big Data into Humanitarian Operations.” Digital Humanitarian Network, 2015. http://bit.ly/1Y2BMkQ

  • This report produced by the Digital Humanitarian Network provides an overview of big data, and how humanitarian organizations can integrate this technology into their humanitarian response. It primarily functions as a guide for organizations, and provides concise, brief outlines of what big data is, and how it can benefit humanitarian groups.
  • The report puts forward four main benefits acquired through the use of big data by humanitarian organizations: 1) the ability to leverage real-time information; 2) the ability to make more informed decisions; 3) the ability to learn new insights; 4) the ability for organizations to be more prepared.
  • It goes on to assess challenges big data poses for humanitarian organizations, including: 1) geography, and the unequal access to technology across regions; 2) the potential for user error when processing data; 3) limited technology; 4) questionable validity of data; 5) underdeveloped policies and ethics relating to data management; and 6) limitations relating to staff knowledge.

Risks of Using Big Data in Humanitarian Context

Crawford, Kate, and Megan Finn. “The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters.” GeoJournal 80.4, 2015. http://bit.ly/1X0F7AI

  • Crawford & Finn present a critical analysis of the use of big data in disaster management, taking a more skeptical tone to the data revolution facing humanitarian response.
  • They argue that though social and mobile data analysis can yield important insights and tools in crisis events, it also presents a number of limitations which can lead to oversights being made by researchers or humanitarian response teams.
  • Crawford & Finn explore the ethical concerns the use of big data in disaster events introduces, including issues of power, privacy, and consent.
  • The paper concludes by recommending that critical data studies, such as those presented in the paper, be integrated into crisis event research in order to analyze some of the assumptions which underlie mobile and social data.

Jacobsen, Katja Lindskov. “Making design safe for citizens: A hidden history of humanitarian experimentation.” Citizenship Studies 14.1: 89-103, 2010. http://bit.ly/1YaRTwG

  • This paper explores the phenomenon of “humanitarian experimentation,” where victims of disaster or conflict are the subjects of experiments to test the application of technologies before they are administered in greater civilian populations.
  • By analyzing the use of iris recognition technology during the repatriation of Afghan refugees from Pakistan between 2002 and 2007, Jacobsen suggests that this “humanitarian experimentation” compromises the security of already vulnerable refugees in order to better deliver biometric products to the rest of the world.

Responsible Data Forum. “Responsible Data Reflection Stories: An Overview.” http://bit.ly/1Rszrz1

  • This piece from the Responsible Data forum is primarily a compilation of “war stories” which follow some of the challenges in using big data for social good. By drawing on these crowdsourced cases, the Forum also presents an overview which makes key recommendations to overcome some of the challenges associated with big data in humanitarian organizations.
  • It finds that most of these challenges occur when organizations are ill-equipped to manage data and new technologies, or are unaware about how different groups interact in digital spaces in different ways.

Sandvik, Kristin Bergtora. “The humanitarian cyberspace: shrinking space or an expanding frontier?” Third World Quarterly 37:1, 17-32, 2016. http://bit.ly/1PIiACK

  • This paper analyzes the shift toward technology-driven humanitarian work, which increasingly takes place online in cyberspace, reshaping the definition and application of aid. This has occurred alongside what many suggest is a shrinking of the humanitarian space.
  • Sandvik provides three interpretations of this phenomenon:
    • First, traditional threats remain in the humanitarian space, which are both modified and reinforced by technology.
    • Second, new threats are introduced by the increasing use of technology in humanitarianism, and consequently the humanitarian space may be broadening, not shrinking.
    • Finally, if the shrinking humanitarian space theory holds, cyberspace offers one example of this, where the increasing use of digital technology to manage disasters leads to a contraction of space through the proliferation of remote services.

Additional Readings on Data and Humanitarian Response

* Thanks to: Kristin B. Sandvik; Zara Rahman; Jennifer Schulte; Sean McDonald; Paul Currion; Dinorah Cantú-Pedraza and the Responsible Data Listserv for valuable input.

Evaluating e-Participation: Frameworks, Practice, Evidence


Book edited by Georg Aichholzer, Herbert Kubicek and Lourdes Torres: “There is a widely acknowledged evaluation gap in the field of e-participation practice and research, a lack of systematic evaluation with regard to process organization, outcome and impacts. This book addresses the state of the art of e-participation research and the existing evaluation gap by reviewing various evaluation approaches and providing a multidisciplinary concept for evaluating the output, outcome and impact of citizen participation via the Internet as well as via traditional media. It offers new knowledge based on empirical results of its application (tailored to different forms and levels of e-participation) in an international comparative perspective. The book will advance the academic study and practical application of e-participation through fresh insights, largely drawing on theoretical arguments and empirical research results gained in the European collaborative project “e2democracy”. It applies the same research instruments to a set of similar citizen participation processes in seven local communities in three countries (Austria, Germany and Spain). The generic evaluation framework has been tailored to a tested toolset, and the presentation and discussion of related evaluation results aims at clarifying to what extent these tools can be applied to other consultation and collaboration processes, making the book of interest to policymakers and scholars alike….(More)”

Elements of a New Ethical Framework for Big Data Research


The Berkman Center is pleased to announce the publication of a new paper from the Privacy Tools for Sharing Research Data project team. In this paper, Effy Vayena, Urs Gasser, Alexandra Wood, and David O’Brien from the Berkman Center, with Micah Altman from MIT Libraries, outline elements of a new ethical framework for big data research.

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

The authors argue that a framework with the following elements would support big data utilization and help harness the value of big data in a sustainable and trust-building manner:

  • Oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle.

  • New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects.

  • Researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available.

  • Oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time.

The full paper is available for download through the Washington and Lee Law Review Online as part of a collection of papers featured at the Future of Privacy Forum workshop Beyond IRBs: Designing Ethical Review Processes for Big Data Research held on December 10, 2015, in Washington, DC….(More)”

The Curious Journalist’s Guide to Data


New book by The Tow Center: “This is a book about the principles behind data journalism. Not what visualization software to use and how to scrape a website, but the fundamental ideas that underlie the human use of data. This isn’t “how to use data” but “how data works.”

This gets into some of the mathy parts of statistics, but also the difficulty of taking a census of race and the cognitive psychology of probabilities. It traces where data comes from, what journalists do with it, and where it goes after—and tries to understand the possibilities and limitations. Data journalism is as interdisciplinary as it gets, which can make it difficult to assemble all the pieces you need. This is one attempt. This is a technical book, and uses standard technical language, but all mathematical concepts are explained through pictures and examples rather than formulas.

The life of data has three parts: quantification, analysis, and communication. Quantification is the process that creates data. Analysis involves rearranging the data or combining it with other information to produce new knowledge. And none of this is useful without communicating the result.

Quantification is a problem without a home. Although physicists study measurement extensively, physical theory doesn’t say much about how to quantify things like “educational attainment” or even “unemployment.” There are deep philosophical issues here, but the most useful question to a journalist is simply, how was this data created? Data is useful because it represents the world, but we can only understand data if we correctly understand how it came to be. Representation through data is never perfect: all data has error. Randomly sampled surveys are both a powerful quantification technique and the prototype for all measurement error, so this report explains where the margin of error comes from and what it means – from first principles, using pictures.
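
The key formula behind that margin of error is short enough to state here. For a simple random sample of size n estimating a proportion p, the 95 percent margin of error is approximately the following (a standard textbook result, not something specific to the report's treatment):

```latex
% 95 percent margin of error for a proportion from a simple random sample
\mathrm{MOE} \approx z \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}, \qquad z \approx 1.96
% Example: \hat{p} = 0.5, n = 1000 gives MOE \approx 1.96\sqrt{0.00025} \approx 0.031,
% i.e. roughly plus or minus 3.1 percentage points.
```

Because the error shrinks with the square root of n, halving the margin of error requires roughly quadrupling the sample size.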

All data analysis is really data interpretation, which requires much more than math. Data needs context to mean anything at all: Imagine if someone gave you a spreadsheet with no column names. Each data set could be the source of many different stories, and there is no objective theory that tells us which true stories are the best. But the stories still have to be true, which is where data journalism relies on established statistical principles. The theory of statistics solves several problems: accounting for the possibility that the pattern you see in the data was purely a fluke, reasoning from incomplete and conflicting information, and attempting to isolate causes. Stats has been taught as something mysterious, but it’s not. The analysis chapter centers on a single problem – asking if an earlier bar closing time really did reduce assaults in a downtown neighborhood – and traces through the entire process of analysis by explaining the statistical principles invoked at each step, building up to the state-of-the-art methods of Bayesian inference and causal graphs.
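
To make that kind of analysis concrete, here is a deliberately simplified Bayesian before-and-after comparison in the spirit of the chapter's example. It is not the book's actual analysis, and the monthly assault counts are invented: model counts as Poisson, put a Gamma prior on each period's rate, and ask how likely it is that the rate fell after the earlier closing time.

```python
# Simplified Bayesian before/after comparison in the spirit of the chapter's
# bar-closing example. The monthly assault counts below are invented.
import numpy as np

before = np.array([14, 11, 16, 13, 15, 12])  # assaults per month, old closing time
after = np.array([10, 9, 12, 8, 11, 10])     # assaults per month, new closing time

rng = np.random.default_rng(0)

def posterior_rate_samples(counts, n_samples=100_000, prior_shape=1.0, prior_rate=0.1):
    """Gamma-Poisson conjugate update: samples from the posterior monthly rate."""
    shape = prior_shape + counts.sum()
    rate = prior_rate + len(counts)
    return rng.gamma(shape, 1.0 / rate, size=n_samples)  # numpy's gamma takes a scale

rate_before = posterior_rate_samples(before)
rate_after = posterior_rate_samples(after)

p_drop = (rate_after < rate_before).mean()
print(f"Posterior probability that the assault rate fell: {p_drop:.2f}")
```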

A story isn’t finished until you’ve communicated your results. Data visualization works because it relies on the biology of human visual perception, just as all data communication relies on human cognitive processing. People tend to overestimate small risks and underestimate large risks; examples leave a much stronger impression than statistics; and data about some will, unconsciously, come to represent all, no matter how well you warn that your sample doesn’t generalize. If you’re not aware of these issues you can leave people with skewed impressions or reinforce harmful stereotypes. The journalist isn’t only responsible for what they put in the story, but for what ends up in the mind of the audience.

This report brings together many fields to explore where data comes from, how to analyze it, and how to communicate your results. It uses examples from journalism to explain everything from Bayesian statistics to the neurobiology of data visualization, all in plain language with lots of illustrations. Some of these ideas are thousands of years old, some were developed only a decade ago, and all of them have come together to create the 21st century practice of data journalism….(More)”

The Bottom of the Data Pyramid: Big Data and the Global South


Payal Arora at the International Journal of Communication: “To date, little attention has been given to the impact of big data in the Global South, about 60% of whose residents are below the poverty line. Big data manifests in novel and unprecedented ways in these neglected contexts. For instance, India has created biometric national identities for her 1.2 billion people, linking them to welfare schemes, and social entrepreneurial initiatives like the Ushahidi project have leveraged crowdsourcing to provide real-time crisis maps for humanitarian relief.

While these projects are indeed inspirational, this article argues that in the context of the Global South there is a bias in the framing of big data as an instrument of empowerment. Here, the poor, or the “bottom of the pyramid” populace are the new consumer base, agents of social change instead of passive beneficiaries. This neoliberal outlook of big data facilitating inclusive capitalism for the common good sidelines critical perspectives urgently needed if we are to channel big data as a positive social force in emerging economies. This article proposes to assess these new technological developments through the lens of databased democracies, databased identities, and databased geographies to make evident normative assumptions and perspectives in this under-examined context….(More)”.

When open data is a Trojan Horse: The weaponization of transparency in science and governance


Karen E.C. Levy and David Merritt Johns in Big Data and Society: “Openness and transparency are becoming hallmarks of responsible data practice in science and governance. Concerns about data falsification, erroneous analysis, and misleading presentation of research results have recently strengthened the call for new procedures that ensure public accountability for data-driven decisions. Though we generally count ourselves in favor of increased transparency in data practice, this Commentary highlights a caveat. We suggest that legislative efforts that invoke the language of data transparency can sometimes function as “Trojan Horses” through which other political goals are pursued. Framing these maneuvers in the language of transparency can be strategic, because approaches that emphasize open access to data carry tremendous appeal, particularly in current political and technological contexts. We illustrate our argument through two examples of pro-transparency policy efforts, one historical and one current: industry-backed “sound science” initiatives in the 1990s, and contemporary legislative efforts to open environmental data to public inspection. Rules that exist mainly to impede science-based policy processes weaponize the concept of data transparency. The discussion illustrates that, much as Big Data itself requires critical assessment, the processes and principles that attend it—like transparency—also carry political valence, and, as such, warrant careful analysis….(More)”