Website Seeks to Make Government Data Easier to Sift Through


Steve Lohr at the New York Times: “For years, the federal government, states and some cities have enthusiastically made vast troves of data open to the public. Acres of paper records on demographics, public health, traffic patterns, energy consumption, family incomes and many other topics have been digitized and posted on the web.

This abundance of data can be a gold mine for discovery and insights, but finding the nuggets can be arduous, requiring special skills.

A project coming out of the M.I.T. Media Lab on Monday seeks to ease that challenge and to make the value of government data available to a wider audience. The project, called Data USA, bills itself as “the most comprehensive visualization of U.S. public data.” It is free, and its software code is open source, meaning that developers can build custom applications by adding other data.

Cesar A. Hidalgo, an assistant professor of media arts and sciences at the M.I.T. Media Lab who led the development of Data USA, said the website was devised to “transform data into stories.” Those stories are typically presented as graphics, charts and written summaries….Type “New York” into the Data USA search box, and a drop-down menu presents choices — the city, the metropolitan area, the state and other options. Select the city, and the page displays an aerial shot of Manhattan with three basic statistics: population (8.49 million), median household income ($52,996) and median age (35.8).

Lower on the page are six icons for related subject categories, including economy, demographics and education. If you click on demographics, one of the so-called data stories appears, based largely on data from the American Community Survey of the United States Census Bureau.

Using colorful graphics and short sentences, it shows the median age of foreign-born residents of New York (44.7) and of residents born in the United States (28.6); the most common countries of origin for immigrants (the Dominican Republic, China and Mexico); and the percentage of residents who are American citizens (82.8 percent, compared with a national average of 93 percent).

Data USA features a selection of data results on its home page. They include the gender wage gap in Connecticut; the racial breakdown of poverty in Flint, Mich.; the wages of physicians and surgeons across the United States; and the institutions that award the most computer science degrees….(More)
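Because the site’s code is open source and its statistics are served as JSON, figures like the ones quoted above can also be pulled programmatically. What follows is a hedged sketch: the endpoint and field names are assumptions based on Data USA’s publicly documented API, not details taken from the article.

```python
import requests

# A hedged sketch: fetch a national population series from Data USA's
# public JSON API. The endpoint and the field names ("data", "Year",
# "Population") are assumptions based on the site's documented API.
resp = requests.get(
    "https://datausa.io/api/data",
    params={"drilldowns": "Nation", "measures": "Population"},
)
for row in resp.json()["data"]:
    print(row["Year"], row["Population"])
```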

Selected Readings on Data and Humanitarian Response


By Prianka Srinivasan and Stefaan G. Verhulst *

The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of data and humanitarian response was originally published in 2016.

Data, when used well and in a trusted manner, allows humanitarian organizations to innovate in how they respond to emergency events, including better coordination of post-disaster relief efforts, the ability to harness local knowledge to create more targeted relief strategies, and tools to predict and monitor disasters in real time. Consequently, in recent years both multinational groups and community-based advocates have begun to integrate data collection and evaluation strategies into their humanitarian operations in order to respond to emergencies better and more quickly. However, this movement poses a number of challenges. Compared to the private sector, humanitarian organizations are often less equipped to analyze and manage big data successfully, which poses a number of risks to the security of victims’ data. Furthermore, the complex power dynamics that exist within humanitarian spaces may be further exacerbated by the introduction of new technologies and big data collection mechanisms. Below we share:

  • Selected Reading List (summaries and hyperlinks)
  • Annotated Selected Reading List
  • Additional Readings

Selected Reading List (summaries in alphabetical order)

Data and Humanitarian Response

Risks of Using Big Data in Humanitarian Contexts

Annotated Selected Reading List (in alphabetical order)

Karlsrud, John. “Peacekeeping 4.0: Harnessing the Potential of Big Data, Social Media, and Cyber Technologies.” Cyberspace and International Relations, 2013. http://bit.ly/235Qb3e

  • This chapter from the book “Cyberspace and International Relations” suggests that advances in big data give humanitarian organizations unprecedented opportunities to prevent and mitigate natural disasters and humanitarian crises. However, the sheer amount of unstructured data necessitates effective “data mining” strategies if multinational organizations are to make the most of it.
  • By profiling some civil-society organizations that use big data in their peacekeeping efforts, Karlsrud suggests that these community-focused initiatives are leading the movement toward analyzing and using big data in countries vulnerable to crisis.
  • The chapter concludes by offering ten recommendations to UN peacekeeping forces to best realize the potential of big data and new technology in supporting their operations.

Mancini, Francesco. “New Technology and the Prevention of Violence and Conflict.” International Peace Institute, 2013. http://bit.ly/1ltLfNV

  • This report from the International Peace Institute looks at five case studies to assess how information and communications technologies (ICTs) can help prevent violence and humanitarian conflict. The findings suggest that context has a significant impact on the ability of these ICTs to prevent conflict, and that any strategy must take into account the specific contingencies of the region to be successful.
  • The report suggests seven lessons gleaned from the five case studies:
    • New technologies are just one of a variety of tools to combat violence. Consequently, organizations must investigate a variety of complementary strategies to prevent conflicts and not simply rely on ICTs.
    • Not every community or social group will have the same relationship to technology, and their ability to adopt new technologies is similarly influenced by their context. Therefore, a detailed needs assessment must take place before any new technologies are implemented.
    • New technologies may be co-opted by violent groups seeking to maintain conflict in the region. Consequently, humanitarian groups must be sensitive to existing political actors and be aware of possible negative consequences these new technologies may spark.
    • Local input is integral to supporting conflict prevention measures, and there is a need for collaboration and awareness-raising with communities to ensure new technologies are sustainable and effective.
    • Information shared within civil society has more potential to develop early-warning systems. This horizontal distribution of information can also allow communities to hold local leaders accountable.

Meier, Patrick. “Digital Humanitarians: How Big Data Is Changing the Face of Humanitarian Response.” CRC Press, 2015. http://amzn.to/1RQ4ozc

  • This book traces the emergence of “Digital Humanitarians”—people who harness new digital tools and technologies to support humanitarian action. Meier suggests that this has created a “nervous system” to connect people from disparate parts of the world, revolutionizing the way we respond to humanitarian crises.
  • Meier argues that such technology is reconfiguring the structure of the humanitarian space: victims are no longer simply passive recipients of aid but can contribute alongside other global citizens. This in turn makes us more humane and engaged people.

Robertson, Andrew, and Steve Olson. “Using Data Sharing to Improve Coordination in Peacebuilding.” United States Institute of Peace, 2012. http://bit.ly/235QuLm

  • This report functions as an overview of a roundtable workshop on Technology, Science and Peacebuilding held at the United States Institute of Peace. The workshop aimed to investigate how data-sharing techniques can be developed for use in peacebuilding or conflict management.
  • Four main themes emerged from discussions during the workshop:
    • “Data sharing requires working across a technology-culture divide”—Data sharing needs the foundation of a strong relationship, which can depend on sociocultural, rather than technological, factors.
    • “Information sharing requires building and maintaining trust”—Such relationships are built on trust, which has both technological and social dimensions.
    • “Information sharing requires linking civilian-military policy discussions to technology”—Even when sophisticated data-sharing technologies exist, continuous engagement between different stakeholders is necessary. Therefore, procedures used to maintain civil-military engagement should be broadened to include technology.
    • “Collaboration software needs to be aligned with user needs”—Technology providers need to keep in mind the needs of their users, in this case peacebuilders, in order to ensure sustainability.

United Nations Independent Expert Advisory Group on a Data Revolution for Sustainable Development. “A World That Counts: Mobilizing the Data Revolution.” 2014. https://bit.ly/2Cb3lXq

  • This report focuses on the potential benefits and risks data holds for sustainable development. Included in this is a strategic framework for using and managing data for humanitarian purposes. It describes a need for a multinational consensus to be developed to ensure data is shared effectively and efficiently.
  • It suggests that “people who are counted”—i.e., those who are included in data collection processes—have better development outcomes and a better chance for humanitarian response in emergency or conflict situations.

Whipkey, Katie, and Andrej Verity. “Guidance for Incorporating Big Data into Humanitarian Operations.” Digital Humanitarian Network, 2015. http://bit.ly/1Y2BMkQ

  • This report, produced by the Digital Humanitarian Network, provides an overview of big data and of how humanitarian organizations can integrate it into their humanitarian response. It primarily functions as a guide for organizations, offering concise outlines of what big data is and how it can benefit humanitarian groups.
  • The report puts forward four main benefits acquired through the use of big data by humanitarian organizations: 1) the ability to leverage real-time information; 2) the ability to make more informed decisions; 3) the ability to learn new insights; 4) the ability for organizations to be more prepared.
  • It goes on to assess six challenges big data poses for humanitarian organizations: 1) geography, and the unequal access to technology across regions; 2) the potential for user error when processing data; 3) limited technology; 4) questionable validity of data; 5) underdeveloped policies and ethics relating to data management; 6) limitations relating to staff knowledge.

Risks of Using Big Data in Humanitarian Contexts

Crawford, Kate, and Megan Finn. “The limits of crisis data: analytical and ethical challenges of using social and mobile data to understand disasters.” GeoJournal 80.4, 2015. http://bit.ly/1X0F7AI

  • Crawford & Finn present a critical analysis of the use of big data in disaster management, taking a more skeptical tone to the data revolution facing humanitarian response.
  • They argue that though social and mobile data analysis can yield important insights and tools in crisis events, it also has a number of limitations that can lead researchers or humanitarian response teams to significant oversights.
  • Crawford & Finn explore the ethical concerns the use of big data in disaster events introduces, including issues of power, privacy, and consent.
  • The paper concludes by recommending that critical data studies, such as those presented in the paper, be integrated into crisis event research in order to analyze some of the assumptions which underlie mobile and social data.

Jacobsen, Katja Lindskov. “Making design safe for citizens: A hidden history of humanitarian experimentation.” Citizenship Studies 14.1: 89-103, 2010. http://bit.ly/1YaRTwG

  • This paper explores the phenomenon of “humanitarian experimentation,” in which victims of disaster or conflict become the subjects of experiments that test new technologies before they are administered to wider civilian populations.
  • By analyzing the use of iris recognition technology during the repatriation of Afghan refugees from Pakistan between 2002 and 2007, Jacobsen suggests that this “humanitarian experimentation” compromised the security of already vulnerable refugees in order to better deliver biometric products to the rest of the world.

Responsible Data Forum. “Responsible Data Reflection Stories: An Overview.” http://bit.ly/1Rszrz1

  • This piece from the Responsible Data Forum is primarily a compilation of “war stories” documenting some of the challenges of using big data for social good. Drawing on these crowdsourced cases, the Forum also presents an overview with key recommendations for overcoming some of the challenges big data poses for humanitarian organizations.
  • It finds that most of these challenges occur when organizations are ill-equipped to manage data and new technologies, or are unaware of how different groups interact in digital spaces in different ways.

Sandvik, Kristin Bergtora. “The humanitarian cyberspace: shrinking space or an expanding frontier?” Third World Quarterly 37:1, 17-32, 2016. http://bit.ly/1PIiACK

  • This paper analyzes the shift toward more technology-driven humanitarian work, in which aid increasingly takes place online, in cyberspace, reshaping the definition and application of aid. This shift has occurred alongside what many suggest is a shrinking of the humanitarian space.
  • Sandvik provides three interpretations of this phenomenon:
    • First, traditional threats remain in the humanitarian space, which are both modified and reinforced by technology.
    • Second, new threats are introduced by the increasing use of technology in humanitarianism, and consequently the humanitarian space may be broadening, not shrinking.
    • Finally, if the shrinking humanitarian space theory holds, cyberspace offers one example of this, where the increasing use of digital technology to manage disasters leads to a contraction of space through the proliferation of remote services.

Additional Readings on Data and Humanitarian Response

* Thanks to: Kristin B. Sandvik; Zara Rahman; Jennifer Schulte; Sean McDonald; Paul Currion; Dinorah Cantú-Pedraza and the Responsible Data listserv for valuable input.

Elements of a New Ethical Framework for Big Data Research


The Berkman Center is pleased to announce the publication of a new paper from the Privacy Tools for Sharing Research Data project team. In this paper, Effy Vayena, Urs Gasser, Alexandra Wood, and David O’Brien from the Berkman Center, with Micah Altman from MIT Libraries, outline elements of a new ethical framework for big data research.

Emerging large-scale data sources hold tremendous potential for new scientific research into human biology, behaviors, and relationships. At the same time, big data research presents privacy and ethical challenges that the current regulatory framework is ill-suited to address. In light of the immense value of large-scale research data, the central question moving forward is not whether such data should be made available for research, but rather how the benefits can be captured in a way that respects fundamental principles of ethics and privacy.

The authors argue that a framework with the following elements would support big data utilization and help harness the value of big data in a sustainable and trust-building manner:

  • Oversight should aim to provide universal coverage of human subjects research, regardless of funding source, across all stages of the information lifecycle.

  • New definitions and standards should be developed based on a modern understanding of privacy science and the expectations of research subjects.

  • Researchers and review boards should be encouraged to incorporate systematic risk-benefit assessments and new procedural and technological solutions from the wide range of interventions that are available.

  • Oversight mechanisms and the safeguards implemented should be tailored to the intended uses, benefits, threats, harms, and vulnerabilities associated with a specific research activity.

Development of a new ethical framework with these elements should be the product of a dynamic multistakeholder process that is designed to capture the latest scientific understanding of privacy, analytical methods, available safeguards, community and social norms, and best practices for research ethics as they evolve over time.

The full paper is available for download through the Washington and Lee Law Review Online as part of a collection of papers featured at the Future of Privacy Forum workshop “Beyond IRBs: Designing Ethical Review Processes for Big Data Research,” held on December 10, 2015, in Washington, DC….(More)

Data Mining Reveals the Four Urban Conditions That Create Vibrant City Life


Emerging Technology from the arXiv: “A lack of evidence in city planning has ruined cities all over the world. But data-mining techniques are finally revealing the rules that make cities successful, vibrant places to live. …Back in 1961, the gradual decline of many city centers in the U.S. began to puzzle urban planners and activists alike. One of them, the urban sociologist Jane Jacobs, began a widespread and detailed investigation of the causes and published her conclusions in The Death and Life of Great American Cities, a controversial book that proposed four conditions that are essential for vibrant city life.

Jacobs’s conclusions have become hugely influential. Her ideas have had a significant impact on the development of many modern cities such as Toronto and New York City’s Greenwich Village. However, her ideas have also attracted criticism because of the lack of empirical evidence to back them up, a problem that is widespread in urban planning.

Today, that looks set to change thanks to the work of Marco De Nadai at the University of Trento and a few pals, who have developed a way to gather urban data that they use to test Jacobs’s conditions and how they relate to the vitality of city life. The new approach heralds a new age of city planning in which planners have an objective way of assessing city life and working out how it can be improved.

In her book, Jacobs argues that vibrant activity can only flourish in cities when the physical environment is diverse. This diversity, she says, requires four conditions. The first is that city districts must serve more than two functions so that they attract people with different purposes at different times of the day and night. Second, city blocks must be small with dense intersections that give pedestrians many opportunities to interact. The third condition is that buildings must be diverse in terms of age and form to support a mix of low-rent and high-rent tenants. By contrast, an area with exclusively new buildings can only attract businesses and tenants wealthy enough to support the cost of new building. Finally, a district must have a sufficient density of people and buildings.

While Jacobs’s arguments are persuasive, her critics say there is little evidence to show that these factors are linked with vibrant city life. That changed last year when urban scientists in Seoul, South Korea, published the result of a 10-year study of pedestrian activity in the city at unprecedented resolution. This work successfully tested Jacobs’s ideas for the first time.

However, the data was gathered largely through pedestrian surveys, a process that is time-consuming, costly, and generally impractical for use in most modern cities.

De Nadai and co have come up with a much cheaper and quicker alternative using a new generation of city databases and the way people use social media and mobile phones. The new databases include OpenStreetMap, the collaborative mapping tool; census data, which records populations and building use; land use data, which uses satellite images to classify land use according to various categories; Foursquare data, which records geographic details about personal activity; and mobile-phone records showing the number and frequency of calls in an area.

De Nadai and co gathered this data for six cities in Italy—Rome, Naples, Florence, Bologna, Milan, and Palermo.

Their analysis is straightforward. The team used mobile-phone activity as a measure of urban vitality and land-use records, census data, and Foursquare activity as a measure of urban diversity. Their goal was to see how vitality and diversity are correlated in the cities they studied. The results make for interesting reading….(More)
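The core of that analysis lends itself to a compact sketch. The toy example below invents district-level numbers purely to illustrate the kind of vitality-diversity correlation the team computes; none of the column names or values come from the study itself, whose real inputs are the OpenStreetMap, census, land-use, Foursquare and call-record data described above.

```python
import pandas as pd

# A toy sketch of the approach described above: treat mobile-phone
# activity as the vitality measure and correlate it with diversity
# measures per district. All columns and values here are invented.
districts = pd.DataFrame({
    "phone_activity": [120, 340, 80, 510, 260],   # vitality proxy (calls per capita)
    "land_use_mix":   [0.2, 0.6, 0.1, 0.8, 0.5],  # mixed-use diversity measure
    "block_density":  [15, 42, 9, 60, 33],        # intersections per km^2
})

# Pairwise correlations between the vitality proxy and each diversity measure
print(districts.corr()["phone_activity"])
```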

Innovating for pro-poor services: Why politics matter


Nathaniel Mason, Clare Cummings and Julian Doczi for ODI insights: “To solve sustainable development challenges, such as the provision of universal access to basic services, we need new ideas, as well as old ideas applied in new ways and new places. The pace of global innovation, particularly digital innovation, is generating optimism, positioning the world at the start of the ‘Fourth Industrial Revolution’. Innovation can make basic services cheaper, more accessible, more relevant and more desirable for poor people. However, we also know that few innovations lead to sustainable, systemic change. The barriers to this are often political – including problems related to motivation, power and collective action. Yet, just as political factors can prevent innovations from being widely adopted, politically smart approaches can help in navigating and mitigating these challenges. And, because innovations can alter the balance of power in societies and markets, they can both provoke new and challenging politics themselves and also help unlock systemic political change. When and why does politics affect innovation? What does this mean for donors, foundations and impact investors backing innovations for development?…(More)

Wikidata


Wikidata aims to create a multilingual free knowledge base about the world that can be read and edited by humans and machines alike. It provides data in all the languages of the Wikimedia projects, and allows for central access to the data in a similar vein to how Wikimedia Commons does for multimedia files; it is also used by many other websites. The data on Wikidata is added by a community of volunteers, both manually and by using software, much as on other Wikimedia projects, including Wikipedia.

Wikidata has millions of items, each representing a human, a place, a painting, a concept, etc. Each item has statements (key-value pairs), with each statement in turn consisting of a property, such as “birth date”, and the appropriate value for that item. Likewise, there can be statements for external IDs, such as a VIAF identifier.
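The item-statement model is easy to see in practice. The sketch below reads one item through Wikidata’s public per-entity JSON endpoint (Special:EntityData); Q42 (Douglas Adams) and P569 (“date of birth”) are real identifiers, chosen here purely for illustration.

```python
import requests

# A minimal sketch of Wikidata's item/statement model, read through the
# public per-entity JSON endpoint. Q42 is Douglas Adams; P569 is the
# "date of birth" property.
url = "https://www.wikidata.org/wiki/Special:EntityData/Q42.json"
entity = requests.get(url).json()["entities"]["Q42"]

label = entity["labels"]["en"]["value"]  # the item's English label
claims = entity["claims"]["P569"]        # all "date of birth" statements
born = claims[0]["mainsnak"]["datavalue"]["value"]["time"]
print(label, "was born", born)           # e.g. +1952-03-11T00:00:00Z
```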

Wikidata is hosted by the Wikimedia Foundation, a nonprofit charitable organization dedicated to encouraging the growth, development and distribution of free, multilingual, educational content, and to providing the full content of these wiki-based projects to the public free of charge. Wikidata focuses on a basic level of useful information about the world and links to other resources for specialized data on the subject. Sources for data on Wikidata must be:…


Why add data to Wikidata

There are many reasons to add data to Wikidata, including:

Help more people to see your information

Data from Wikidata is used by many high-traffic websites, including Wikipedia, one of the most used websites in the world, which receives over 15 billion page views per month.

Improve open knowledge

Wikidata hosts data that can be used on Wikimedia projects and beyond. By adding data to Wikidata you can ensure the data on your subject is well covered and up to date in all Wikimedia project languages.

Increase traffic to your website

Anyone looking at Wikidata, or at other sites that use Wikidata, including Wikipedia, can see the reference link for the source of the data.

Make your data more useful for yourself and others

Adding your data to Wikidata makes it more useful. You can:

  • Combine it with other data
  • Use Wikidata tools to explore the data
  • Visualise your data along with data from other sources…(More)
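One way to explore the data, as the list above suggests, is the Wikidata Query Service, which answers SPARQL queries over all items and statements. A minimal sketch, assuming only the public endpoint at query.wikidata.org; the identifiers used (P31 “instance of”, P170 “creator”, Q3305213 “painting”) are real, but the query itself is purely illustrative.

```python
import requests

# A sketch of exploring Wikidata with SPARQL via the public query
# service: list five paintings and their creators.
query = """
SELECT ?paintingLabel ?creatorLabel WHERE {
  ?painting wdt:P31 wd:Q3305213 ;   # instance of: painting
            wdt:P170 ?creator .     # creator
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 5
"""
resp = requests.get(
    "https://query.wikidata.org/sparql",
    params={"query": query, "format": "json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["paintingLabel"]["value"], "-", row["creatorLabel"]["value"])
```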

Participatory Budgeting


From waterfall to agile: How a public-sector agency successfully changed its system-development approach to become digital


Martin Lundqvist and Peter Braad Olesen at McKinsey: “Government agencies around the world are under internal and external pressure to become more efficient by incorporating digital technologies and processes into their day-to-day operations. For a lot of public-sector organizations, however, the digital transformation has been bumpy. In many cases, agencies are trying to streamline and automate workflow and processes using antiquated systems-development approaches. Such methods make direct connections between citizens and governments over the Internet more difficult. They also prevent IT organizations from quickly adapting to ever-changing systems requirements or easily combining information from disparate systems. Despite the emergence, over the past decade, of a number of productivity-enhancing technologies, many government institutions continue to cling to old, familiar ways of developing new processes and systems. Nonetheless, a few have been able to change mind-sets internally, shed outdated approaches to improving processes and developing systems, and build new ones. Critically, they have embraced newer techniques, such as agile development, and succeeded in accelerating the digital transformation in core areas of their operations. The Danish Business Authority is one of those organizations….(More)”

Responsible Data reflection stories


Responsible Data Forum: “Through the various Responsible Data Forum events over the past couple of years, we’ve heard many anecdotes of responsible data challenges faced by people or organizations. These include potentially harmful data management practices, situations where people have experienced gut feelings that there is potential for harm, or workarounds that people have created to avoid those situations.

But we feel that trading in these “war stories” isn’t the most useful way for us to learn from these experiences as a community. Instead, we have worked with our communities to build a set of Reflection Stories: a structured, well-researched knowledge base on the unforeseen challenges and (sometimes) negative consequences of using technology and data for social change.

We hope that this can offer opportunities for reflection and learning, as well as helping to develop innovative strategies for engaging with technology and data in new and responsible ways….

What we learned from the stories

New spaces, new challenges

Moving into new digital spaces is bringing new challenges, and social media is one such space where these challenges are proving very difficult to navigate. This seems to stem from a number of key points:

  • organisations with low levels of technical literacy and experience in tech- or data-driven projects, deciding to engage suddenly with a certain tool or technology without realising what this entails. For some, this seems to stem from funders being more willing to support ‘innovative’ tech projects.
  • organisations wishing to engage more with social media while not being aware of more nuanced understandings of public/private spaces online, and how different communities engage with social media. (see story #2)
  • unpredictability and different levels of visibility: due to how privacy settings on Twitter are currently set, the visibility of users can be increased hugely by the actions of others – and once that happens, a user actually has very little agency to change or reverse it. Sadly, being more visible on, for example, Twitter disproportionately affects women and minority groups in a negative way – so while ‘signal boosting’ to raise someone’s profile might be well-meant, the consequences are hard to predict, and almost impossible to reverse manually. (see story #4)
  • consent: related to the above point, “giving consent” can mean many different things when it comes to digital spaces, especially if the person in question has little experience or understanding of using the technology in question (see stories #4 and #5).

Grey areas of responsible data

In almost all of the cases we looked at, very few decisions were concretely “right” or “wrong”. There are many, many grey areas here, which need to be addressed on a case-by-case basis. In some cases, the people involved really did think through their actions and approached their problems thoughtfully and responsibly – but consequences they had not imagined still occurred (see story #8).

Additionally, given the quickly moving nature of the space, challenges can arise that simply would not have been possible at the start.

….Despite the very varying settings of the stories collected, the shared mitigation strategies indicate that there are indeed a few key principles that can be kept in mind throughout the development of a new tech- or data-driven project.

The starkest of these – and one key aspect underlying many of these challenges – is a fundamental lack of technical literacy among advocacy organisations. This affects the way they interact with technical partners, the decisions they make around the project, the level to which they can have meaningful input, and more. Perhaps more crucially, it also affects their ability to know what to ask for help with – i.e., to ‘know the unknowns’.

Building an organisation’s technical literacy might not mean being able to answer all technical questions in-house, but rather knowing what to ask of others and what to expect in an answer. For advocacy organisations that don’t (yet) have this, it becomes all too easy to outsource not just the actual technical work but the contextual decisions too, when these should be a collaborative process benefiting from both sets of expertise.

There seems to be a lot of scope to expand this set of stories, both by collecting more from other advocacy organisations and by moving into other sectors. Ultimately, we hope that sharing our collective intelligence around lessons learned from responsible data challenges faced in projects will contribute to a greater understanding for all of us….Read all the stories here

UN statistics commission agrees starting point for SDG oversight


Emma Rumney at Public Finance: “The United Nations Statistical Commission agreed on a set of 230 preliminary indicators to measure progress towards the 17 Sustainable Development Goals published last September.

Wu Hongbo, under-secretary-general of the UN Department of Economic and Social Affairs, of which the UN Statistical Commission is part, said “completing the indicator framework is not the end of the story – on the contrary, it is the beginning”.

Wu said it was necessary to acknowledge that developing a high-quality set of indicators is a technical and necessarily continuous process, “with refinements and improvements” made as “knowledge improves and new data sources become available”.

One challenge will entail the effective disaggregation of data, by income, sex, age, race, ethnicity, migratory status, disability, geographic location and more, to allow coverage of specific sectors of the population.

This will be essential if the SDGs are to be implemented successfully.

Wu said this will require “an unprecedented amount of data to be produced and analysed”, posing a significant challenge to national statistics systems in both the developing and developed world.

National and regional authorities will also have to develop their own indicators for regional, national and sub-national monitoring, as the global indicators won’t be able to account for different realities, capacities and levels of development.

The statistical commission will now submit its initial global indicator framework to the UN’s Economic and Social Council and General Assembly for adoption….(More)

See also: