Matthew S. Mayernik at Big Data & Society: “The movements by national governments, funding agencies, universities, and research communities toward “open data” face many difficult challenges. In high-level visions of open data, researchers’ data and metadata practices are expected to be robust and structured. The integration of the internet into scientific institutions amplifies these expectations. When examined critically, however, the data and metadata practices of scholarly researchers often appear incomplete or deficient. The concepts of “accountability” and “transparency” provide insight into these perceived gaps. Researchers’ primary accountabilities are related to meeting the expectations of research competency, not to external standards of data deposition or metadata creation. Likewise, making data open in a transparent way can involve a significant investment of time and resources with no obvious benefits. This paper uses differing notions of accountability and transparency to conceptualize “open data” as the result of ongoing achievements, not one-time acts….(More)”.
Avoiding Garbage In – Garbage Out: Improving Administrative Data Quality for Research
Blog by China Layne: “In June, I presented the webinar, “Improving Administrative Data Quality for Research and Analysis”, for members of the Association of Public Data Users (APDU). APDU is a national network that provides a venue to promote education, share news, and advocate on behalf of public data users.
The webinar served as a primer to help smaller organizations begin to use their data for research. Participants were given the tools to transform their administrative data into “research-ready” datasets.
I first reviewed seven major issues for administrative data quality and discussed how these issues can affect research and analysis. For instance, issues with incorrect value formats, the wrong unit of analysis, and duplicate records can make the data difficult to use. Invalid or inconsistent values lead to inaccurate analysis results. Missing or outlier values can produce inaccurate and biased analysis results. All these issues make the data less useful for research.
Next, I presented concrete strategies for reviewing the data to identify each of these quality issues. I also discussed several tips to make the data review process easier, faster, and simpler to replicate. The most important of these tips are: (1) reviewing every variable in the data set, whether you expect problems or not, and (2) relying on data documentation to understand how the data should look….(More)”.
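The “review every variable” tip lends itself to a short script. Below is a minimal sketch, assuming a pandas DataFrame loaded from a hypothetical administrative extract (the file name and columns are placeholders), that profiles each variable for the quality issues listed above: missing values, duplicate records, inconsistent text values, and numeric outliers.

```python
import pandas as pd

def profile_dataset(df: pd.DataFrame) -> pd.DataFrame:
    """Summarize every variable, whether or not problems are expected."""
    rows = []
    for col in df.columns:
        s = df[col]
        row = {
            "variable": col,
            "dtype": str(s.dtype),
            "missing": int(s.isna().sum()),
            "distinct": int(s.nunique(dropna=True)),
        }
        if pd.api.types.is_numeric_dtype(s):
            # Flag potential outliers with a simple 1.5 * IQR rule.
            q1, q3 = s.quantile(0.25), s.quantile(0.75)
            iqr = q3 - q1
            row["outliers"] = int(((s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)).sum())
        else:
            # Inconsistent formats often show up as near-duplicate text values.
            row["sample_values"] = sorted(s.dropna().astype(str).str.strip().unique())[:5]
        rows.append(row)
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = pd.read_csv("admin_records.csv")  # hypothetical administrative extract
    print("duplicate records:", int(df.duplicated().sum()))
    print(profile_dataset(df).to_string(index=False))
```

A summary table like this does not fix the data, but it shows, variable by variable, where invalid formats, missing values, or outliers need attention before the extract can be treated as research-ready.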
Data for Development: The Case for Information, Not Just Data
Daniela Ligiero at the Council on Foreign Relations: “When it comes to development, more data is often better—but in the quest for more data, we can often forget about ensuring we have information, which is even more valuable. Information is data that have been recorded, classified, organized, analyzed, interpreted, and translated within a framework so that meaning emerges. At the end of the day, information is what guides action and change.
The need for more data
In 2015, world leaders came together to adopt a new global agenda to guide efforts over the next fifteen years, the Sustainable Development Goals. The High-level Political Forum (HLPF), to be held this year at the United Nations on July 10-19, is an opportunity for review of the 2030 Agenda, and will include an in-depth analysis of seven of the seventeen goals—including those focused on poverty, health, and gender equality. As part of the HLPF, member states are encouraged to undergo voluntary national reviews of progress across goals to facilitate the sharing of experiences, including successes, challenges, and lessons learned; to strengthen policies and institutions; and to mobilize multi-stakeholder support and partnerships for the implementation of the agenda.
A significant challenge that countries continue to face in this process, and one that becomes painfully evident during the HLPF, is the lack of data to establish baselines and track progress. Fortunately, new initiatives aligned with the 2030 Agenda are working to focus on data, such as the Global Partnership for Sustainable Development Data. There are also initiatives focused on collecting more and better data in particular areas, like gender data (e.g., Data2X; UN Women’s Making Every Girl and Woman Count). This work is important and urgently needed.
Data to monitor global progress on the goals is critical to keeping countries accountable to their commitments and allows countries to examine how they are doing across multiple, ambitious goals. However, equally important is the rich, granular national and sub-national level data that can guide the development and implementation of evidence-based, effective programs and policies. These kinds of data are also often lacking or of poor quality, in which case more and better data are essential. But a frequently ignored piece of the puzzle at the national level is improved use of the data we already have.
Making the most of the data we have
To illustrate this point, consider the Together for Girls partnership, which was built on obtaining new data where it was lacking and effectively translating it into information to change policies and programs. We are a partnership between national governments, UN agencies and private sector organizations working to break cycles of violence, with special attention to sexual violence against girls. …The first pillar of our work is focused on understanding violence against children within a country, always at the request of the national government. We do this through a national household survey – the Violence Against Children Survey (VACS), led by national governments, CDC, and UNICEF as part of the Together for Girls Partnership….
The truth is there is a plethora of data at the country level, generated by surveys, special studies, administrative systems, the private sector, and citizens, that can provide meaningful insights across all the development goals.
Connecting the dots
But data—like our programs—often remain in silos. For example, data focused on violence against children is typically not top of mind for those working on women’s empowerment or adolescent health. Yet, as an example, the VACS can offer valuable information about how sexual violence against girls, as young as 13, is connected to adolescent pregnancy—or how one of the most common perpetrators of sexual violence against girls is a partner, a pattern that starts early and is a predictor for victimization and perpetration later in life. However, these data are not consistently used across actors working on programs related to adolescent pregnancy and violence against women….(More)”.
Research data infrastructures in the UK
The Open Research Data Task Force: “This report is intended to inform the work of the Open Research Data Task Force, which has been established with the aim of building on the principles set out in the Open Research Data Concordat (published in July 2016) to co-ordinate the creation of a roadmap to develop the infrastructure for open research data across the UK. As an initial contribution to that work, the report provides an outline of the policy and service infrastructure in the UK as it stands in the first half of 2017, including some comparisons with other countries; and it points to some key areas and issues which require attention. It does not seek to identify possible courses of action, nor even to suggest priorities the Task Force might consider in creating its final report, to be published in 2018. That will be the focus of work for the Task Force over the next few months.
Why is this important?
The digital revolution continues to bring fundamental changes to all aspects of research: how it is conducted, the findings that are produced, and how they are interrogated and transmitted not only within the research community but more widely. We are as yet still in the early stages of a transformation in which progress is patchy across the research community, but which has already posed significant challenges for research funders and institutions, as well as for researchers themselves. Research data is at the heart of those challenges: not simply the datasets that provide the core of the evidence analysed in scholarly publications, but all the data created and collected throughout the research process. Such data represents a potentially valuable resource for people and organisations in the commercial, public and voluntary sectors, as well as for researchers. Access to such data, and more general moves towards open science, are also critically important in ensuring that research is reproducible, and thus in sustaining public confidence in the work of the research community. But effective use of research data depends on an infrastructure – of hardware, software and services, but also of policies, organisations and individuals operating at various levels – that is as yet far from fully formed. The exponential increases in volumes of data being generated by researchers create in themselves new demands for storage and computing power. But since the data is characterised more by heterogeneity than by uniformity, development of the infrastructure to manage it involves a complex set of requirements in preparing, collecting, selecting, analysing, processing, storing and preserving that data throughout its life cycle.
Over the past decade and more, there have been many initiatives on the part of research institutions, funders, and members of the research community at local, national and international levels to address some of these issues. Diversity is a key feature of the landscape, in terms of institutional types and locations, funding regimes, and nature and scope of partnerships, as well as differences between disciplines and subject areas. Hence decision-makers at various levels have fostered via their policies and strategies many community-organised developments, as well as their own initiatives and services. Significant progress has been achieved as a result, through the enthusiasm and commitment of key organisations and individuals. The less positive features have been a relative lack of harmonisation or consolidation, and there is an increasing awareness of patchiness in provision, with gaps, overlaps and inconsistencies. This is not surprising, since policies, strategies and services relating to research data necessarily affect all aspects of support for the diverse processes of research itself. Developing new policies and infrastructure for research data implies significant re-thinking of structures and regimes for supporting, fostering and promoting research itself. That in turn implies taking full account of widely-varying characteristics and needs of research of different kinds, while also keeping in clear view the benefits to be gained from better management of research data, and from greater openness in making data accessible for others to re-use for a wide range of different purposes….(More)”.
The State of Open Data Portals in Latin America
Michael Steinberg at Center for Data Innovation: “Many Latin American countries publish open data—government data made freely available online in machine-readable formats and without license restrictions. However, there is a tremendous amount of variation in the quantity and type of datasets governments publish on national open data portals—central online repositories for open data that make it easier for users to find data. Despite the wide variation among the countries, the most popular datasets tend to be those that either provide transparency into government operations or offer information that citizens can use directly. As governments continue to update and improve their open data portals, they should take steps to ensure that they are publishing the datasets most valuable to their citizens.
To better understand this variation, we collected information about open data portals in 20 Latin American countries including Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, Paraguay, Peru, and Uruguay. Not all Latin American countries have an open data portal, but even those that do not operate a unified portal may still publish some open data. Four Latin American countries—Belize, Guatemala, Honduras, and Nicaragua—do not have open data portals. One country—El Salvador—does not have a government-run open data portal, but does have a national open data portal (datoselsalvador.org) run by volunteers….
There are many steps Latin American governments can take to improve open data in their countries. Those nations without open data portals should create them, and those that already have them should continue to update them and publish more datasets to better serve their constituents. One way to do this is to monitor the popular datasets on other countries’ open data portals and, where applicable, ensure the government produces similar datasets. Those running open data portals should also routinely monitor search queries to see what users are looking for, and if users are searching for datasets that have not yet been posted, work with the relevant government agencies to make those datasets available.
In summary, there are stark differences in the amount of data published, the format of the data, and the most popular datasets in open data portals in Latin America. However, in every country there is an appetite for data that either provides public accountability for government functions or supplies helpful information to citizens…(More)”.
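The search-query monitoring recommended above can be prototyped with a short script. The following is a minimal sketch, assuming a portal can export its search log and dataset catalogue as CSV files (file and column names are hypothetical); it surfaces the most frequent queries that match no published dataset title, as candidates for new releases.

```python
from collections import Counter
import csv

def unmet_queries(query_log_path: str, catalogue_path: str, top_n: int = 20):
    """Return the most frequent search queries that match no published dataset title."""
    with open(catalogue_path, newline="", encoding="utf-8") as f:
        titles = [row["title"].lower() for row in csv.DictReader(f)]

    misses = Counter()
    with open(query_log_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            query = row["query"].strip().lower()
            # A query counts as "met" if any published title contains it as a substring.
            if query and not any(query in title for title in titles):
                misses[query] += 1
    return misses.most_common(top_n)

if __name__ == "__main__":
    for query, count in unmet_queries("search_log.csv", "catalogue.csv"):
        print(f"{count:5d}  {query}")
```

Reviewed periodically and matched against agencies’ holdings, a report like this is one concrete way to prioritize which datasets to publish next.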
The Right of Access to Public Information
Book by Hermann-Josef Blanke and Ricardo Perlingeiro: “This book presents a comparative study on access to public information in the context of the main legal orders worldwide. The international team of authors analyzes transparency and freedom-of-information legislation with regard to the scope of the right to access, limitations of this right inherent in the respective national laws, the procedure, the relationship with domestic legislation on administrative procedure, as well as judicial protection. It particularly focuses on the Brazilian law of access to information, which is interpreted as a benchmark for regulations in other Latin American states….(More)”
Index: Collective Intelligence
By Hannah Pierce and Audrie Pirkl
The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on collective intelligence and was originally published in 2017.
The Collective Intelligence Universe
- Amount of money that Reykjavik’s Better Neighbourhoods program has provided each year to crowdsourced citizen projects since 2012: € 2 million (Citizens Foundation)
- Number of U.S. government challenges that people are currently participating in to submit their community solutions: 778 (Challenge.gov).
- Percent of U.S. arts organizations that used social media to crowdsource ideas in 2013, from programming decisions to seminar scheduling details: 52% (Pew Research)
- Number of Wikipedia members who have contributed to a page in the last 30 days: over 120,000 (Wikipedia Page Statistics)
- Number of languages that the multinational crowdsourced Letters for Black Lives has been translated into: 23 (Letters for Black Lives)
- Number of comments in a Reddit thread that established a more comprehensive timeline of the theater shooting in Aurora than the media: 1,272 (Reddit)
- Number of physicians that are members of SERMO, a platform to crowdsource medical research: 800,000 (SERMO)
- Number of citizen scientist projects registered on SciStarter: over 1,500 (Collective Intelligence 2017 Plenary Talk: Darlene Cavalier)
- Entrants to NASA’s 2009 TopCoder Challenge: over 1,800 (NASA)
Infrastructure
- Number of submissions for Block Holm (a digital platform that allows citizens to build “Minecraft” ideas on vacant city lots) within the first six months: over 10,000 (OpenLearn)
- Number of people engaged through The Participatory Budgeting Project in the U.S.: over 300,000 (Participatory Budgeting Project)
- Amount of money allocated to community projects through this initiative: $238,000,000
Health
- Percentage of internet-using adults in the US with chronic health conditions who have gone online to connect with others suffering from similar conditions: 23% (Pew Research)
- Number of posts to Patient Opinion, a UK-based platform for patients to provide anonymous feedback to healthcare providers: over 120,000 (Nesta)
- Percent of NHS health trusts utilizing the posts to improve services in 2015: 90%
- Stories posted per month: nearly 1,000 (The Guardian)
- Number of tumors reported to the English National Cancer Registration each year: over 300,000 (Gov.UK)
- Number of users of an open source artificial pancreas system: 310 (Collective Intelligence 2017 Plenary Talk: Dana Lewis)
Government
- Number of submissions from 40 countries to the 2017 Open (Government) Contracting Innovation Challenge: 88 (The Open Data Institute)
- Public-service complaints received each day via Indonesian digital platform Lapor!: over 500 (McKinsey & Company)
- Number of registered users of Unicef Uganda’s weekly SMS poll, U-Report: 356,468 (U-Report)
- Number of reports regarding government corruption in India submitted to IPaidaBribe since 2011: over 140,000 (IPaidaBribe)
Business
- Reviews posted since Yelp’s creation in 2004: 121 million reviews (Statista)
- Percent of Americans in 2016 who trust online customer reviews as much as personal recommendations: 84% (BrightLocal)
- Number of companies and their subsidiaries mapped through the OpenCorporates platform: 60 million (Omidyar Network)
Crisis Response
- Number of diverse stakeholders digitally connected to solve climate change problems through the Climate CoLab: over 75,000 (MIT ILP Institute Insider)
- Number of project submissions to USAID’s 2014 Fighting Ebola Grand Challenge: over 1,500 (Fighting Ebola: A Grand Challenge for Development)
- Reports submitted to open source flood mapping platform Peta Jakarta in 2016: 5,000 (The Open Data Institute)
Public Safety
- Number of sexual harassment reports submitted from 50 cities in India and Nepal to SafeCity, a crowdsourcing site and mobile app: over 4,000 (SafeCity)
- Number of people that used Facebook’s Safety Check, a feature that is being used in a new disaster mapping project, in the first 24 hours after the terror attacks in Paris: 4.1 million (Facebook)
Examining the Mistrust of Science
Proceedings of a National Academies Workshop: “The Government-University-Industry Research Roundtable held a meeting on February 28 and March 1, 2017, to explore trends in public opinion of science, examine potential sources of mistrust, and consider ways that cross-sector collaboration between government, universities, and industry may improve public trust in science and scientific institutions in the future. The keynote address on February 28 was given by Shawn Otto, co-founder and producer of the U.S. Presidential Science Debates and author of The War on Science.
“There seems to be an erosion of the standing and understanding of science and engineering among the public,” Otto said. “People seem much more inclined to reject facts and evidence today than in the recent past. Why could that be?” Otto began exploring that question after the candidates in the 2008 presidential election declined an invitation to debate science-driven policy issues and instead chose to debate faith and values.
“Wherever the people are well-informed, they can be trusted with their own government,” wrote Thomas Jefferson. Now, some 240 years later, science is so complex that it is difficult even for scientists and engineers to understand the science outside of their particular fields. Otto argued,
“The question is, are people still well-enough informed to be trusted with their own government? Of the 535 members of Congress, only 11—less than 2 percent—have a professional background in science or engineering. By contrast, 218—41 percent—are lawyers. And lawyers approach a problem in a fundamentally different way than a scientist or engineer. An attorney will research both sides of a question, but only so that he or she can argue against the position that they do not support. A scientist will approach the question differently, not starting with a foregone conclusion and arguing towards it, but examining both sides of the evidence and trying to make a fair assessment.”
According to Otto, anti-science positions are now acceptable in public discourse, in Congress, state legislatures and city councils, in popular culture, and in presidential politics. Discounting factually incorrect statements does not necessarily reshape public opinion in the way some assume it will. What is driving this change? “Science is never partisan, but science is always political,” said Otto. “Science takes nothing on faith; it says, ‘show me the evidence and I’ll judge for myself.’ But the discoveries that science makes either confirm or challenge somebody’s cherished beliefs or vested economic or ideological interests. Science creates knowledge—knowledge is power, and that power is political.”…(More)”.
Big Data: A Twenty-First Century Arms Race
Report by Atlantic Council and Thomson Reuters: “We are living in a world awash in data. Accelerated interconnectivity, driven by the proliferation of internet-connected devices, has led to an explosion of data—big data. A race is now underway to develop new technologies and implement innovative methods that can handle the volume, variety, velocity, and veracity of big data and apply it smartly to provide decisive advantage and help solve major challenges facing companies and governments.
For policy makers in government, big data and associated technologies, like machine learning and artificial intelligence, have the potential to drastically improve their decision-making capabilities. How governments use big data may be a key factor in improved economic performance and national security. This publication looks at how big data can maximize the efficiency and effectiveness of government and business, while minimizing modern risks. Five authors explore big data across three cross-cutting issues: security, finance, and law.
Chapter 1, “The Conflict Between Protecting Privacy and Securing Nations,” Els de Busser
Chapter 2, “Big Data: Exposing the Risks from Within,” Erica Briscoe
Chapter 3, “Big Data: The Latest Tool in Fighting Crime,” Benjamin Dean, Fellow
Chapter 4, “Big Data: Tackling Illicit Financial Flows,” Tatiana Tropina
Chapter 5, “Big Data: Mitigating Financial Crime Risk,” Miren Aparicio….Read the Publication (PDF)
Public Data Is More Important Than Ever–And Now It’s Easier To Find
Meg Miller at Co.Design: “Public data, in theory, is meant to be accessible to everyone. But in practice, even finding it can be near impossible, to say nothing of figuring out what to do with it once you do. Government data websites are often clunky and outdated, and some data is still trapped on physical media–like CDs or individual hard drives.
Tens of thousands of these CDs and hard drives, full of data on topics from Arkansas amusement parks to fire incident reporting, have arrived at the doorstep of the New York-based start-up Enigma over the past four years. The company has obtained thousands upon thousands more datasets by way of Freedom of Information Act (FOIA) requests. Enigma specializes in open data: gathering it, curating it, and analyzing it for insights into a client’s industry, for example, or for public service initiatives.
Enigma also shares its 100,000 datasets with the world through an online platform called Public—the broadest collection of public data that is open and searchable by everyone. Public has been around since Enigma launched in 2013, but today the company is introducing a redesigned version of the site that’s fresher and more user-friendly, with easier navigation and additional features that allow users to drill further down into the data.
But while the first iteration of Public was mostly concerned with making Enigma’s enormous trove of data—which it was already gathering and reformatting for client work—accessible to the public, the new site focuses more on linking that data in new ways. For journalists, researchers, and data scientists, the tool will offer more sophisticated ways of making sense of the data that they have access to through Enigma….
…the new homepage also curates featured datasets and collections to reinforce a sense of discoverability. For example, an Enigma-curated collection of U.S. sanctions data from the U.S. Treasury Department’s Office of Foreign Assets Control (OFAC) shows data on the restrictions on entities or individuals that American companies can and can’t do business with in an effort to achieve specific national security or foreign policy objectives. A new round of sanctions against Russia has been in the news lately as an effort by President Trump to loosen restrictions on blacklisted businesses and individuals in Russia was overruled by the Senate last week. Enigma’s curated data selection on U.S. sanctions could help journalists contextualize recent events with data that shows changes in sanctions lists over time by presidential administration, for instance–or they could compare the U.S. sanctions list to the European Union’s….(More).