UK’s Digital Strategy


Executive Summary: “This government’s Plan for Britain is a plan to build a stronger, fairer country that works for everyone, not just the privileged few. …Our digital strategy now develops this further, applying the principles outlined in the Industrial Strategy green paper to the digital economy. The UK has a proud history of digital innovation: from the earliest days of computing to the development of the World Wide Web, the UK has been a cradle for inventions which have changed the world. And from Ada Lovelace – widely recognised as the first computer programmer – to the pioneers of today’s revolution in artificial intelligence, the UK has always been at the forefront of invention. …

Maintaining the UK government as a world leader in serving its citizens online

From personalised services in health, to safer care for the elderly at home, to tailored learning in education and access to culture – digital tools, techniques and technologies give us more opportunities than ever before to improve the vital public services on which we all rely.

The UK is already a world leader in digital government, but we want to go further and faster. The new Government Transformation Strategy published on 9 February 2017 sets out our intention to serve the citizens and businesses of the UK with a better, more coherent experience when using government services online – one that meets the raised expectations set by the many other digital services and tools they use every day. So, we will continue to develop single cross-government platform services, including by working towards 25 million GOV.UK Verify users by 2020 and adopting new services onto the government’s GOV.UK Pay and GOV.UK Notify platforms.

We will build on the ‘Government as a Platform’ concept, ensuring we make greater reuse of platforms and components across government. We will also continue to move towards common technology, ensuring that where it is right we are consuming commodity hardware or cloud-based software instead of building something that is needlessly government specific.
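To make the platform-reuse idea concrete, here is a minimal sketch of how a departmental service might call a shared notification component instead of building its own mailer. The endpoint, template identifier and API-key variable are hypothetical placeholders, not the actual GOV.UK Notify interface.

```python
import os
import requests

# Hypothetical endpoint standing in for a shared cross-government
# notification platform (a GOV.UK Notify-style component).
PLATFORM_URL = "https://notifications.example.gov.uk/v2/notifications/email"


def send_status_update(email_address: str, reference: str) -> dict:
    """Ask the shared platform to email a citizen, rather than building
    a bespoke mail system inside the department."""
    payload = {
        "email_address": email_address,
        "template_id": "licence-renewal-confirmation",  # hypothetical template
        "personalisation": {"reference": reference},
    }
    headers = {"Authorization": f"Bearer {os.environ['PLATFORM_API_KEY']}"}
    response = requests.post(PLATFORM_URL, json=payload, headers=headers, timeout=10)
    response.raise_for_status()
    return response.json()  # the platform returns a notification id for auditing
```

The division of labour is the point: the department owns the service logic, while the shared platform owns delivery, templates and audit.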

We will also continue to work, across government and the public sector, to harness the potential of digital to radically improve the efficiency of our public services – enabling us to provide a better service to citizens and service users at a lower cost. In education, for example, we will address the barriers faced by schools in regions not connected to appropriate digital infrastructure and we will invest in the Network of Teaching Excellence in Computer Science to help teachers and school leaders build their knowledge and understanding of technology. In transport, we will make our infrastructure smarter, more accessible and more convenient for passengers. At Autumn Statement 2016 we announced that the National Productivity Investment Fund would allocate £450 million from 2018-19 to 2020-21 to trial digital signalling technology on the rail network. And in policing, we will enable police officers to use biometric applications to match fingerprint and DNA from scenes of crime and return results including records and alerts to officers over mobile devices at the crime scene.

Read more about digital government.

Unlocking the power of data in the UK economy and improving public confidence in its use

As part of creating the conditions for sustainable growth, we will take the actions needed to make the UK a world-leading data-driven economy, where data fuels economic and social opportunities for everyone, and where people can trust that their data is being used appropriately.

Data is a global commodity and we need to ensure that our businesses can continue to compete and communicate effectively around the world. To maintain our position at the forefront of the data revolution, we will implement the General Data Protection Regulation by May 2018. This will ensure a shared and higher standard of protection for consumers and their data.

Read more about data….(More)”

AI, machine learning and personal data


Jo Pedder at the Information Commissioner’s Office Blog: “Today sees the publication of the ICO’s updated paper on big data and data protection.

But why now? What’s changed in the two and a half years since we first visited this topic? Well, quite a lot actually:

  • big data is becoming the norm for many organisations, using it to profile people and inform their decision-making processes, whether that’s to determine your car insurance premium or to accept/reject your job application;
  • artificial intelligence (AI) is stepping out of the world of science-fiction and into real life, providing the ‘thinking’ power behind virtual personal assistants and smart cars; and
  • machine learning algorithms are discovering patterns in data that traditional data analysis couldn’t hope to find, helping to detect fraud and diagnose diseases.

The complexity and opacity of these types of processing operations mean that it’s often hard to know what’s going on behind the scenes. This can be problematic when personal data is involved, especially when decisions are made that have significant effects on people’s lives. The combination of these factors has led some to call for new regulation of big data, AI and machine learning, to increase transparency and ensure accountability.

In our view though, whilst the means by which personal data are processed are changing, the underlying issues remain the same. Are people being treated fairly? Are decisions accurate and free from bias? Is there a legal basis for the processing? These are issues that the ICO has been addressing for many years, through oversight of existing European data protection legislation….(More)”
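The fairness and bias questions the ICO raises can be made operational in simple ways. Below is a minimal, illustrative check, assuming made-up column names, that compares positive-decision rates across groups – a rough demographic-parity test, not an ICO-prescribed method.

```python
import pandas as pd


def outcome_rates_by_group(decisions: pd.DataFrame,
                           group_col: str = "age_band",
                           outcome_col: str = "application_accepted") -> pd.Series:
    """Share of positive decisions per group -- a first, crude fairness check.
    Column names are assumed for illustration; large gaps between groups
    would prompt a closer look at the model and its training data."""
    return decisions.groupby(group_col)[outcome_col].mean().sort_values()


# Example with made-up data:
df = pd.DataFrame({
    "age_band": ["18-30", "18-30", "31-50", "31-50", "51+", "51+"],
    "application_accepted": [1, 0, 1, 1, 0, 0],
})
print(outcome_rates_by_group(df))
```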

When the Big Lie Meets Big Data


Peter Bruce in Scientific American: “…The science of predictive modeling has come a long way since 2004. Statisticians now build “personality” models and tie them into other predictor variables. … One such model bears the acronym “OCEAN,” standing for the personality characteristics (and their opposites) of openness, conscientiousness, extroversion, agreeableness, and neuroticism. Using Big Data at the individual level, machine learning methods might classify a person as, for example, “closed, introverted, neurotic, not agreeable, and conscientious.”
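As a hedged illustration of the kind of modelling Bruce describes (not Cambridge Analytica’s actual pipeline, which is not public), the sketch below trains one classifier per OCEAN trait on invented behavioural features and turns the predictions into a profile label.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

TRAITS = ["openness", "conscientiousness", "extroversion", "agreeableness", "neuroticism"]

# Invented behavioural features for illustration: e.g. counts of purchases,
# subscriptions and social-media likes per topic.
rng = np.random.default_rng(0)
X_train = rng.random((500, 20))
y_train = rng.integers(0, 2, size=(500, 5))   # 1 = high on trait, 0 = low

# One binary classifier per trait, wrapped as a single multi-output model.
model = MultiOutputClassifier(RandomForestClassifier(n_estimators=100, random_state=0))
model.fit(X_train, y_train)


def profile(person_features: np.ndarray) -> str:
    """Turn predicted trait flags into a label such as
    'closed, introverted, neurotic, not agreeable, conscientious'."""
    flags = model.predict(person_features.reshape(1, -1))[0]
    words = {
        "openness": ("closed", "open"),
        "conscientiousness": ("not conscientious", "conscientious"),
        "extroversion": ("introverted", "extroverted"),
        "agreeableness": ("not agreeable", "agreeable"),
        "neuroticism": ("emotionally stable", "neurotic"),
    }
    return ", ".join(words[t][flag] for t, flag in zip(TRAITS, flags))


print(profile(rng.random(20)))
```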

Alexander Nix, CEO of Cambridge Analytica (owned by Trump’s chief donor, Rebekah Mercer), says he has thousands of data points on you, and every other voter: what you buy or borrow, where you live, what you subscribe to, what you post on social media, etc. At a recent Concordia Summit, using the example of gun rights, Nix described how messages will be crafted to appeal specifically to you, based on your personality profile. Are you highly neurotic and conscientious? Nix suggests the image of a sinister gloved hand reaching through a broken window.

In his presentation, Nix noted that the goal is to induce behavior, not communicate ideas. So where does truth fit in? Johan Ugander, Assistant Professor of Management Science at Stanford, suggests that, for Nix and Cambridge Analytica, it doesn’t. In counseling the hypothetical owner of a private beach how to keep people off his property, Nix eschews the merely factual “Private Beach” sign, advocating instead a lie: “Sharks sighted.” Ugander, in his critique, cautions all data scientists against “building tools for unscrupulous targeting.”

The warning is needed, but may be too late. What Nix described in his presentation involved carefully crafted messages aimed at his target personalities. His messages pulled subtly on various psychological strings to manipulate us, and they obeyed no boundary of truth, but they required humans to create them.  The next phase will be the gradual replacement of human “craftsmanship” with machine learning algorithms that can supply targeted voters with a steady stream of content (from whatever source, true or false) designed to elicit desired behavior. Cognizant of the Pandora’s box that data scientists have opened, the scholarly journal Big Data has issued a call for papers for a future issue devoted to “Computational Propaganda.”…(More)”

Democracy at Work: Moving Beyond Elections to Improve Well-Being


Michael Touchton, Natasha Borges Sugiyama and Brian Wampler in the American Political Science Review: “How does democracy work to improve well-being? In this article, we disentangle the component parts of democratic practice—elections, civic participation, expansion of social provisioning, local administrative capacity—to identify their relationship with well-being. We draw from the citizenship debates to argue that democratic practices allow citizens to gain access to a wide range of rights, which then serve as the foundation for improving social well-being. Our analysis of an original dataset covering over 5,550 Brazilian municipalities from 2006 to 2013 demonstrates that competitive elections alone do not explain variation in infant mortality rates, one outcome associated with well-being. We move beyond elections to show how participatory institutions, social programs, and local state capacity can interact to buttress one another and reduce infant mortality rates. It is important to note that these relationships are independent of local economic growth, which also influences infant mortality. The result of our thorough analysis offers a new understanding of how different aspects of democracy work together to improve a key feature of human development….(More)”.
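For readers who want a sense of the modelling behind such claims, here is a hedged sketch – with placeholder variable names and data file – of a regression of infant mortality on participatory institutions, social-programme coverage and state capacity, their interactions, and economic growth as a control. It is not the authors’ replication code.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placeholder panel of municipality-year observations; column names are assumed.
df = pd.read_csv("municipalities_2006_2013.csv")

# Infant mortality as a function of participatory institutions, social
# programmes and state capacity, their pairwise interactions, and GDP growth
# as a control. C(year) absorbs shocks common to all municipalities in a year.
model = smf.ols(
    "infant_mortality ~ participatory_institutions * social_program_coverage"
    " + participatory_institutions * state_capacity"
    " + social_program_coverage * state_capacity"
    " + gdp_growth + C(year)",
    data=df,
).fit(cov_type="cluster", cov_kwds={"groups": df["municipality_id"]})

print(model.summary())
```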

Handbook of Big Data Technologies


Handbook by Albert Y. Zomaya and Sherif Sakr: “…offers comprehensive coverage of recent advancements in Big Data technologies and related paradigms. Chapters are authored by international leading experts in the field, and have been reviewed and revised for maximum reader value. The volume consists of twenty-five chapters organized into four main parts. Part One covers the fundamental concepts of Big Data technologies including data curation mechanisms, data models, storage models, programming models and programming platforms. It also dives into the details of implementing Big SQL query engines and big stream processing systems. Part Two focuses on the semantic aspects of Big Data management including data integration and exploratory ad hoc analysis in addition to structured querying and pattern matching techniques. Part Three presents a comprehensive overview of large scale graph processing. It covers the most recent research in large scale graph processing platforms, introducing several scalable graph querying and mining mechanisms in domains such as social networks. Part Four details novel applications that have been made possible by the rapid emergence of Big Data technologies such as Internet of Things (IoT), Cognitive Computing and SCADA Systems. All parts of the book discuss open research problems, including potential opportunities, that have arisen from the rapid progress of Big Data technologies and the associated increasing requirements of application domains.
Designed for researchers, IT professionals and graduate students, this book is a timely contribution to the growing Big Data field. Big Data has been recognized as one of the leading emerging technologies that will have a major impact on the various fields of science and on many aspects of human society over the coming decades. Therefore, the content in this book will be an essential tool to help readers understand the development and future of the field….(More)”

Crowdsourcing Expertise


Simons Foundation: “Ever wish there was a quick, easy way to connect your research to the public?

By hosting a Wikipedia ‘edit-a-thon’ at a science conference, you can instantly share your research knowledge with millions while improving the science content on the most heavily trafficked and broadly accessible resource in the world. In 2016, in partnership with the Wiki Education Foundation, we helped launch the Wikipedia Year of Science, an ambitious initiative designed to better connect the work of scientists and students to the public. Here, we share some of what we learned.

The Simons Foundation — through its Science Sandbox initiative, dedicated to public engagement — co-hosted a series of Wikipedia edit-a-thons throughout 2016 at almost every major science conference, in collaboration with the world’s leading scientific societies and associations.

At our edit-a-thons, we leveraged the collective brainpower of scientists, giving them basic training on Wikipedia guidelines and facilitating marathon editing sessions — powered by free pizza, coffee and sometimes beer — during which they made copious contributions within their respective areas of expertise.

These efforts, combined with the Wiki Education Foundation’s powerful classroom model, have had a clear impact. To date, we’ve reached over 150 universities including more than 6,000 students and scientists. As for output, 6,306 articles have been created or edited, garnering more than 304 million views; over 2,000 scientific images have been donated; and countless new scientist-editors have been minted, many of whom will likely continue to update Wikipedia content. The most common response we got from scientists and conference organizers about the edit-a-thons was: “Can we do that again next year?”

That’s where this guide comes in.

Through collaboration, input from Wikipedians and scientists, and more than a little trial and error, we arrived at a model that can help you organize your own edit-a-thons. This informal guide captures our main takeaways and lessons learned….Our hope is that edit-a-thons will become another integral part of science conferences, just like tweetups, communication workshops and other recent outreach initiatives. This would ensure that the content of the public’s most common gateway to science research will continually improve in quality and scope.

Download: “Crowdsourcing Expertise: A working guide for organizing Wikipedia edit-a-thons at science conferences”

From big data to smart data: FDA’s INFORMED initiative


Sean Khozin, Geoffrey Kim & Richard Pazdur in Nature: “….Recent advances in our understanding of disease mechanisms have led to the development of new drugs that are enabling precision medicine. For example, the co-development of kinase inhibitors that target ‘driver mutations’ in metastatic non-small-cell lung cancer (NSCLC) with companion diagnostics has led to substantial improvements in the treatment of some patients. However, growing evidence suggests that most patients with metastatic NSCLC and other advanced cancers may not have tumours with single driver mutations. Furthermore, the generation of clinical evidence in genomically diverse and geographically dispersed groups of patients using traditional trial designs and multiple competing therapies is becoming more costly and challenging.

Strategies aimed at creating new efficiencies in clinical evidence generation and extending the benefits of precision medicine to larger groups of patients are driving a transformation from a reductionist approach to drug development (for example, a single drug targeting a driver mutation and traditional clinical trials) to a holistic approach (for example, combination therapies targeting complex multiomic signatures and real-world evidence). This transition is largely fuelled by the rapid expansion in the four dimensions of biomedical big data, which has created a need for greater organizational and technical capabilities (Fig. 1). Appropriate management and analysis of such data requires specialized tools and expertise in health information technology, data science and high-performance computing. For example, efforts to generate clinical evidence using real-world data are being limited by challenges such as capturing clinically relevant variables from vast volumes of unstructured content (such as physician notes) in electronic health records and organizing various structured data elements that are primarily designed to support billing rather than clinical research. So, new standards and quality-control mechanisms are needed to ensure the validity of the design and analysis of studies based on electronic health records.

Figure 1: Conceptual map of technical and organizational capacity for biomedical big data.

Big data can be defined as having four dimensions: volume (data size), variety (data type), veracity (data noise and uncertainty) and velocity (data flow and processing). Currently, FDA approval decisions are generally based on data of limited variety, mainly from clinical trials and preclinical studies (1) that are mostly structured (2), in data sets usually no more than a few gigabytes in size (3), that are processed intermittently as part of regulatory submissions (4). The expansion of big data in the four dimensions (grey lines) calls for increasing organizational and technical capacity. This could transform big data into smart data by enabling a holistic approach to personalization of therapies that takes patient, disease and environmental characteristics into account….(More)”
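Returning to the point above about unstructured electronic health records: the sketch below illustrates, with deliberately crude pattern rules, what it means to pull a single clinically relevant variable (smoking status) out of free-text physician notes. Real systems use validated clinical NLP; this is only meant to make the “structured from unstructured” step tangible.

```python
import re

NEVER = re.compile(r"\b(never smoker|non-?smoker|denies smoking)\b", re.IGNORECASE)
FORMER = re.compile(r"\b(former smoker|quit smoking|ex-?smoker)\b", re.IGNORECASE)
CURRENT = re.compile(r"\b(current smoker|smokes \d+|\d+ pack[- ]?years?)\b", re.IGNORECASE)


def smoking_status(note: str) -> str:
    """Very rough rule-based extraction of smoking status from a free-text note.
    Negation handling, abbreviations and templated text make the real task far
    harder; this only illustrates turning unstructured notes into a variable."""
    if FORMER.search(note):
        return "former"
    if NEVER.search(note):
        return "never"
    if CURRENT.search(note):
        return "current"
    return "unknown"


print(smoking_status("62 y/o male, former smoker (quit 2009), presents with cough."))
```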

Curating Research Data: Practical Strategies for Your Digital Repository


Two books edited by Lisa R. Johnston: “Data are becoming the proverbial coin of the digital realm: a research commodity that might purchase reputation credit in a disciplinary culture of data sharing, or buy transparency when faced with funding agency mandates or publisher scrutiny. Unlike most monetary systems, however, digital data can flow in all too great an abundance. Not only does this currency actually “grow” on trees, but it comes from animals, books, thoughts, and each of us! And that is what makes data curation so essential. The abundance of digital research data challenges library and information science professionals to harness this flow of information streaming from research discovery and scholarly pursuit and preserve the unique evidence for future use.

In two volumes—Practical Strategies for Your Digital Repository and A Handbook of Current Practice—Curating Research Data presents those tasked with long-term stewardship of digital research data a blueprint for how to curate those data for eventual reuse. Volume One explores the concepts of research data and the types and drivers for establishing digital data repositories. Volume Two guides you across the data lifecycle through the practical strategies and techniques for curating research data in a digital repository setting. Data curators, archivists, research data management specialists, subject librarians, institutional repository managers, and digital library staff will benefit from these current and practical approaches to data curation.

Digital data is ubiquitous and rapidly reshaping how scholarship progresses now and into the future. The information expertise of librarians can help ensure the resiliency of digital data, and the information it represents, by addressing how the meaning, integrity, and provenance of digital data generated by researchers today will be captured and conveyed to future researchers….(More)”

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others, such as childhood vaccines, facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.
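As a hedged sketch of the kind of before/after comparison such a database supports, the snippet below assumes a simple CSV export of weekly case counts and estimates the drop in mean annual cases after a vaccine’s introduction; the file layout is a placeholder, not the published analysis.

```python
import pandas as pd

# Placeholder export of weekly notifiable-disease counts, e.g. from a
# Project Tycho-style database: columns year, week, state, disease, cases.
cases = pd.read_csv("weekly_cases.csv")


def vaccine_era_reduction(disease: str, vaccine_year: int) -> float:
    """Percentage drop in mean annual cases after vaccine introduction,
    a crude estimate of averted burden for one disease."""
    annual = (cases[cases["disease"] == disease]
              .groupby("year")["cases"].sum())
    before = annual[annual.index < vaccine_year].mean()
    after = annual[annual.index >= vaccine_year].mean()
    return 100 * (before - after) / before


print(f"Measles cases fell ~{vaccine_era_reduction('MEASLES', 1963):.0f}% "
      "after vaccine licensure (illustrative calculation only)")
```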

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”