Can government stop losing its mind?


Report by Gavin Starks: “Can government remember? Is it condemned to repeat mistakes? Or does it remember too much and so see too many reasons why anything new is bound to fail?

While we are at the beginnings of a data revolution, we are also at a point where the deluge of data is creating the potential for an ‘information collapse’ in complex administrations: structured information and knowledge are lost in the noise or, worse, misinformation rises as fact.

There are many reasons for this: the technical design of systems, turnover of people, and contracting out. Information is stored in silos and often guarded jealously. Cultural and process issues lead to poor use of technologies. Knowledge is both formal (codified) and informal (held in people’s brains). The greatest value will be unlocked by combining these with existing and emerging tools.

This report sets out how the public sector could benefit from a federated, data-driven approach: one that provides greater power to its leaders, benefits its participants and users, and improves performance through better use of, and structured access to, data.

The report explores examples from the Open Data Institute, Open Banking Standard, BBC Archives, Ministry of Justice, NHS Blood and Transplant, Defence Science and Technology Laboratory and Ministry of Defence.

Recommendations:

  1. Design for open; build for search
  2. Build reciprocity into data supply chains
  3. Develop data ethics standards that can evolve at pace
  4. Create a Digital Audit Office
  5. Develop and value a culture of network thinking

To shorten the path between innovation and policy in a way that is repeatable and scalable, the report proposes that six areas of focus be considered in any implementation design.

  1. Policy: Providing strategic leadership and governance; framing and analysing economic, legal and regulatory impacts (e.g. GDPR, data ethics, security) and highlighting opportunities and threats.
  2. Culture: Creating compelling peer, press and public communication and engagement that both address concerns and inspire people to engage in the solutions.
  3. Making: Commissioning startups, running innovation competitions and programmes to create practice-based evidence that illustrates the challenges and business opportunities.
  4. Learning: Creating training materials that aid implementation and defining evidence-based sustainable business models that are anchored around user needs.
  5. Standards: Defining common human and machine processes that enable both repeatability and scale within commercial and non-commercial environments.
  6. Infrastructure: Defining and framing how people and machines will use data, algorithms and open APIs to create sustainable impact….(More)”.

Data in the EU: Commission steps up efforts to increase availability and boost healthcare data sharing


Press Release: “Today, the European Commission is putting forward a set of measures to increase the availability of data in the EU, building on previous initiatives to boost the free flow of non-personal data in the Digital Single Market.

Data-driven innovation is a key enabler of market growth, job creation, particularly for SMEs and startups, and the development of new technologies. It allows citizens to easily access and manage their health data, and allows public authorities to use data better in research, prevention and health system reforms….

Today’s proposals build on the General Data Protection Regulation (GDPR), which will enter into application as of 25 May 2018. They will ensure:

  • Better access to and reusability of public sector data: A revised law on Public Sector Information covers data held by public undertakings in the transport and utilities sectors. The new rules limit the exceptions that allow public bodies to charge more than the marginal costs of data dissemination for the reuse of their data. They also facilitate the reusability of open research data resulting from public funding, and oblige Member States to develop open access policies. Finally, the new rules require – where applicable – technical solutions like Application Programming Interfaces (APIs) to provide real-time access to data.
  • Scientific data sharing in 2018: A new set of recommendations addresses the policy and technological changes since the last Commission proposal on access to and preservation of scientific information. The recommendations offer guidance on implementing open access policies in line with open science objectives, research data and data management, the creation of a European Open Science Cloud, and text and data-mining. They also highlight the importance of incentives, rewards, skills and metrics appropriate for the new era of networked research.
  • Private sector data sharing in business-to-business and business-to-government contexts: A new Communication entitled “Towards a common European data space” provides guidance for businesses operating in the EU on the legal and technical principles that should govern data-sharing collaboration in the private sector.
  • Securing citizens’ healthcare data while fostering European cooperation: The Commission is today setting out a plan of action that puts citizens first when it comes to data on citizens’ health: by securing citizens’ access to their health data and introducing the possibility to share their data across borders; by using larger data sets to enable more personalised diagnoses and medical treatment, and better anticipate epidemics; and by promoting appropriate digital tools, allowing public authorities to better use health data for research and for health system reforms. Today’s proposal also covers the interoperability of electronic health records as well as a mechanism for voluntary coordination in sharing data – including genomic data – for disease prevention and research….(More)”.

The Efficiency Paradox: What Big Data Can’t Do


Book by Edward Tenner: “A bold challenge to our obsession with efficiency–and a new understanding of how to benefit from the powerful potential of serendipity

Algorithms, multitasking, the sharing economy, life hacks: our culture can’t get enough of efficiency. One of the great promises of the Internet and big data revolutions is the idea that we can improve the processes and routines of our work and personal lives to get more done in less time than we ever have before. There is no doubt that we’re performing at higher levels and moving at unprecedented speed, but what if we’re headed in the wrong direction?

Melding the long-term history of technology with the latest headlines and findings of computer science and social science, The Efficiency Paradox questions our ingrained assumptions about efficiency, persuasively showing how relying on the algorithms of digital platforms can in fact lead to wasted efforts, missed opportunities, and above all an inability to break out of established patterns. Edward Tenner offers a smarter way of thinking about efficiency, revealing what we and our institutions, when equipped with an astute combination of artificial intelligence and trained intuition, can learn from the random and unexpected….(More)”

How Artificial Intelligence Could Increase the Risk of Nuclear War


RAND Corporation: “The fear that computers, by mistake or malice, might lead humanity to the brink of nuclear annihilation has haunted imaginations since the earliest days of the Cold War.

The danger might soon be more science than fiction. Stunning advances in AI have created machines that can learn and think, provoking a new arms race among the world’s major nuclear powers. It’s not the killer robots of Hollywood blockbusters that we need to worry about; it’s how computers might challenge the basic rules of nuclear deterrence and lead humans into making devastating decisions.

That’s the premise behind a new paper from RAND Corporation, How Might Artificial Intelligence Affect the Risk of Nuclear War? It’s part of a special project within RAND, known as Security 2040, to look over the horizon and anticipate coming threats.

“This isn’t just a movie scenario,” said Andrew Lohn, an engineer at RAND who coauthored the paper and whose experience with AI includes using it to route drones, identify whale calls, and predict the outcomes of NBA games. “Things that are relatively simple can raise tensions and lead us to some dangerous places if we are not careful.”…(More)”.

Using Data to Inform the Science of Broadening Participation


Donna K. Ginther at the American Behavioral Scientist: “In this article, I describe how data and econometric methods can be used to study the science of broadening participation. I start by showing that theory can be used to structure the approach to using data to investigate gender and race/ethnicity differences in career outcomes. I also illustrate this process by examining whether women of color who apply for National Institutes of Health research funding are confronted with a double bind where race and gender compound their disadvantage relative to Whites. Although high-quality data are needed for understanding the barriers to broadening participation in science careers, they alone cannot fully explain why women and underrepresented minorities are less likely to be scientists or have less productive science careers. As researchers, it is important to use all forms of data—quantitative, experimental, and qualitative—to deepen our understanding of the barriers to broadening participation….(More)”.
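As a rough illustration of the kind of econometric test described above, the sketch below uses synthetic data and a hypothetical specification (it is not Ginther’s actual NIH data, variables or model) to show how a “double bind” can be probed as a gender-by-race interaction in a logistic regression on funding outcomes.

```python
# Hedged sketch (synthetic data, hypothetical specification; not Ginther's
# actual NIH analysis): probing a "double bind" as a gender x race interaction
# in a logistic regression on whether an application is funded.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
n = 5000
df = pd.DataFrame({
    "female": rng.integers(0, 2, n),
    "black": rng.integers(0, 2, n),
})
# Simulate an outcome with separate gender and race gaps plus an extra
# interaction penalty, standing in for the hypothesised double bind.
logit_p = 0.3 - 0.2 * df["female"] - 0.3 * df["black"] - 0.25 * df["female"] * df["black"]
df["funded"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

# The formula "female * black" expands to female + black + female:black.
model = smf.logit("funded ~ female * black", data=df).fit(disp=False)
print(model.summary())
# A negative coefficient on female:black, beyond the two main effects,
# is the signature of a compounding rather than merely additive disadvantage.
```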

Use our personal data for the common good


Hetan Shah at Nature: “Data science brings enormous potential for good — for example, to improve the delivery of public services, and even to track and fight modern slavery. No wonder researchers around the world — including members of my own organization, the Royal Statistical Society in London — have had their heads in their hands over headlines about how Facebook and the data-analytics company Cambridge Analytica might have handled personal data. We know that trustworthiness underpins public support for data innovation, and we have just seen what happens when that trust is lost….But how else might we ensure the use of data for the public good rather than for purely private gain?

Here are two proposals towards this goal.

First, governments should pass legislation to allow national statistical offices to gain anonymized access to large private-sector data sets under openly specified conditions. This provision was part of the United Kingdom’s Digital Economy Act last year and will improve the ability of the UK Office for National Statistics to assess the economy and society for the public interest.

My second proposal is inspired by the legacy of John Sulston, who died earlier this month. Sulston was known for his success in advocating for the Human Genome Project to be openly accessible to the science community, while a competitor sought to sequence the genome first and keep data proprietary.

Like Sulston, we should look for ways of making data available for the common interest. Intellectual-property rights expire after a fixed time period: what if, similarly, technology companies were allowed to use the data that they gather only for a limited period, say, five years? The data could then revert to a national charitable corporation that could provide access to certified researchers, who would both be held to account and be subject to scrutiny that ensures the data are used for the common good.

Technology companies would move from being data owners to becoming data stewards…(More)” (see also http://datacollaboratives.org/).

Obfuscating with transparency


“These approaches…limit the impact of valuable information in developing policies…”

“Under the new policy, studies that do not fully meet transparency criteria would be excluded from use in EPA policy development. This proposal follows unsuccessful attempts to enact the Honest and Open New EPA Science Treatment (HONEST) Act and its predecessor, the Secret Science Reform Act. These approaches undervalue many scientific publications and limit the impact of valuable information in developing policies in the areas that the EPA regulates….In developing effective policies, earnest evaluations of facts and fair-minded assessments of the associated uncertainties are foundational. Policy discussions require an assessment of the likelihood that a particular observation is true and examinations of the short- and long-term consequences of potential actions or inactions, including a wide range of different sorts of costs. Those with training in making these judgments, and with access to as much relevant information as possible, are crucial to this process. Of course, policy development requires considerations other than those related to science. Such discussions should follow a clear assessment made with access to all of the available evidence. The scientific enterprise should stand up against efforts that distort initiatives aimed at improving scientific practice, just to pursue other agendas…(More)”.

Literature review on collective intelligence: a crowd science perspective


Chao Yu in the International Journal of Crowd Science: “A group can have more power and greater wisdom than the sum of its individuals. Scholars have long observed this and called it collective intelligence. It emerges from communication, collaboration, competition, brainstorming and the like. Collective intelligence appears in many fields such as public decision-making, voting activities, social networks and crowdsourcing.

Crowd science mainly focuses on the basic principles and laws of the intelligent activities of groups under the new interconnection model. It explores how to give full play to the intelligence of individual agents and of groups, and how to tap their potential to solve problems that are difficult for a single agent.

In this paper, we present a literature review on collective intelligence from a crowd science perspective. We focus on researchers’ related work, especially on the circumstances under which a group can show its wisdom, how to measure it, how to optimize it, and its present and future applications in the digital world. That is exactly what crowd science pays close attention to….(More)”.

What if a nuke goes off in Washington, D.C.? Simulations of artificial societies help planners cope with the unthinkable


Mitchell Waldrop at Science: “…The point of such models is to avoid describing human affairs from the top down with fixed equations, as is traditionally done in such fields as economics and epidemiology. Instead, outcomes such as a financial crash or the spread of a disease emerge from the bottom up, through the interactions of many individuals, leading to a real-world richness and spontaneity that is otherwise hard to simulate.

That kind of detail is exactly what emergency managers need, says Christopher Barrett, a computer scientist who directs the Biocomplexity Institute at Virginia Polytechnic Institute and State University (Virginia Tech) in Blacksburg, which developed the NPS1 model for the government. The NPS1 model can warn managers, for example, that a power failure at point X might well lead to a surprise traffic jam at point Y. If they decide to deploy mobile cell towers in the early hours of the crisis to restore communications, NPS1 can tell them whether more civilians will take to the roads, or fewer. “Agent-based models are how you get all these pieces sorted out and look at the interactions,” Barrett says.

The downside is that models like NPS1 tend to be big—each of the model’s initial runs kept a 500-microprocessor computing cluster busy for a day and a half—forcing the agents to be relatively simple-minded. “There’s a fundamental trade-off between the complexity of individual agents and the size of the simulation,” says Jonathan Pfautz, who funds agent-based modeling of social behavior as a program manager at the Defense Advanced Research Projects Agency in Arlington, Virginia.

But computers keep getting bigger and more powerful, as do the data sets used to populate and calibrate the models. In fields as diverse as economics, transportation, public health, and urban planning, more and more decision-makers are taking agent-based models seriously. “They’re the most flexible and detailed models out there,” says Ira Longini, who models epidemics at the University of Florida in Gainesville, “which makes them by far the most effective in understanding and directing policy.”

The roots of agent-based modeling go back at least to the 1940s, when computer pioneers such as Alan Turing experimented with locally interacting bits of software to model complex behavior in physics and biology. But the current wave of development didn’t get underway until the mid-1990s….(More)”.
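To make the article’s bottom-up idea concrete, here is a deliberately tiny, hypothetical agent-based sketch (it bears no relation to the NPS1 model itself): each of 1,000 agents repeatedly picks one of two roads using only the congestion it saw last round, and the aggregate traffic pattern emerges from those local choices.

```python
# Toy agent-based sketch (illustrative only; not the NPS1 model).
# Each agent follows a simple local rule: occasionally switch to the road
# that was less congested on the previous round. Aggregate congestion
# emerges from these individual decisions rather than from a global equation.
import random

N_AGENTS = 1000
ROUNDS = 20
SWITCH_PROB = 0.1  # how readily an agent reacts to congestion

choices = [random.choice("AB") for _ in range(N_AGENTS)]

for t in range(ROUNDS):
    load = {"A": choices.count("A"), "B": choices.count("B")}
    print(f"round {t:2d}: road A = {load['A']:4d}, road B = {load['B']:4d}")
    for i, road in enumerate(choices):
        other = "B" if road == "A" else "A"
        if load[other] < load[road] and random.random() < SWITCH_PROB:
            choices[i] = other
```

Models like NPS1 replace these two roads and one rule with millions of agents, detailed geography and behavioural data, which is why they need large computing clusters, but the emergent, bottom-up logic is the same.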

The citation graph is one of humankind’s most important intellectual achievements


Dario Taraborelli at BoingBoing: “When researchers write, we don’t just describe new findings — we place them in context by citing the work of others. Citations trace the lineage of ideas, connecting disparate lines of scholarship into a cohesive body of knowledge, and forming the basis of how we know what we know.

Today, citations are also a primary source of data. Funders and evaluation bodies use them to appraise scientific impact and decide which ideas are worth funding to support scientific progress. Because of this, data that forms the citation graph should belong to the public. The Initiative for Open Citations was created to achieve this goal.

Back in the 1950s, reference works like Shepard’s Citations provided lawyers with tools to reconstruct which relevant cases to cite in the context of a court trial. No such tool existed at the time for identifying citations in scientific publications. Eugene Garfield — the pioneer of modern citation analysis and citation indexing — described the idea of extending this approach to science and engineering as his Eureka moment. Garfield’s first experimental Genetics Citation Index, compiled by the newly-formed Institute for Scientific Information (ISI) in 1961, offered a glimpse into what a full citation index could mean for science at large. It was distributed, for free, to 1,000 libraries and scientists in the United States.

Fast forward to the end of the 20th century. The Web of Science citation index — maintained by Thomson Reuters, which acquired ISI in 1992 — has become the canonical source for scientists, librarians, and funders to search scholarly citations, and for the field of scientometrics to study the structure and evolution of scientific knowledge. ISI could have turned into a publicly funded initiative, but it started instead as a for-profit effort. In 2016, Thomson Reuters sold its Intellectual Property & Science business to a private-equity fund for $3.55 billion. Its citation index is now owned by Clarivate Analytics.

Raw citation data being non-copyrightable, it’s ironic that the vision of building a comprehensive index of scientific literature has turned into a billion-dollar business, with academic institutions paying cripplingly expensive annual subscriptions for access and the public locked out.

Enter the Initiative for Open Citations.

In 2016, a small group founded the Initiative for Open Citations (I4OC) as a voluntary effort to work with scholarly publishers — who routinely publish this data — to persuade them to release it in the open and promote its unrestricted availability. Before the launch of the I4OC, only 1% of indexed scholarly publications with references were making citation data available in the public domain. When the I4OC was officially announced in 2017, we were able to report that this number had shifted from 1% to 40%. In the main, this was thanks to the swift action of a small number of large academic publishers.

In April 2018, we are celebrating the first anniversary of the initiative. Since the launch, the fraction of indexed scientific articles with open citation data (as measured by Crossref) has surpassed 50% and the number of participating publishers has risen to 490. Over half a billion references are now openly available to the public without any copyright restriction. Of the top-20 biggest publishers with citation data, all but 5 — Elsevier, IEEE, Wolters Kluwer Health, IOP Publishing, ACS — now make this data open via Crossref and its APIs. Over 50 organisations — including science funders, platforms and technology organizations, libraries, research and advocacy institutions — have joined us in this journey to help advocate and promote the reuse of open citations….(More)”.
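Because the open references described above are exposed through Crossref’s public REST API, a reader can inspect them directly. The sketch below is a minimal example: the endpoint https://api.crossref.org/works/{doi} and its “reference” field are as publicly documented, but whether any references come back for a given DOI depends on the publisher having deposited and opened them; the default DOI here (the 2016 LIGO detection paper) is only an arbitrary illustration.

```python
# Minimal sketch: fetch the open reference list a publisher has deposited
# with Crossref for a given DOI, via Crossref's public REST API.
# Whether any references are returned depends on the publisher having
# deposited them and made them open, as discussed above.
import sys
import requests


def open_references(doi: str) -> list:
    """Return the references Crossref exposes for a DOI (empty if none are open)."""
    resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=30)
    resp.raise_for_status()
    work = resp.json()["message"]
    return work.get("reference", [])


if __name__ == "__main__":
    # Pass a DOI on the command line, or fall back to an example DOI.
    doi = sys.argv[1] if len(sys.argv) > 1 else "10.1103/PhysRevLett.116.061102"
    refs = open_references(doi)
    print(f"{doi}: {len(refs)} open references")
    for ref in refs[:5]:
        print(" -", ref.get("DOI") or ref.get("unstructured", ""))
```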