From big data to smart data: FDA’s INFORMED initiative


Sean Khozin, Geoffrey Kim & Richard Pazdur in Nature: “….Recent advances in our understanding of disease mechanisms have led to the development of new drugs that are enabling precision medicine. For example, the co-development of kinase inhibitors that target ‘driver mutations’ in metastatic non-small-cell lung cancer (NSCLC) with companion diagnostics has led to substantial improvements in the treatment of some patients. However, growing evidence suggests that most patients with metastatic NSCLC and other advanced cancers may not have tumours with single driver mutations. Furthermore, the generation of clinical evidence in genomically diverse and geographically dispersed groups of patients using traditional trial designs and multiple competing therapies is becoming more costly and challenging.

Strategies aimed at creating new efficiencies in clinical evidence generation and extending the benefits of precision medicine to larger groups of patients are driving a transformation from a reductionist approach to drug development (for example, a single drug targeting a driver mutation and traditional clinical trials) to a holistic approach (for example, combination therapies targeting complex multiomic signatures and real-world evidence). This transition is largely fuelled by the rapid expansion in the four dimensions of biomedical big data, which has created a need for greater organizational and technical capabilities (Fig. 1). Appropriate management and analysis of such data requires specialized tools and expertise in health information technology, data science and high-performance computing. For example, efforts to generate clinical evidence using real-world data are being limited by challenges such as capturing clinically relevant variables from vast volumes of unstructured content (such as physician notes) in electronic health records and organizing various structured data elements that are primarily designed to support billing rather than clinical research. So, new standards and quality-control mechanisms are needed to ensure the validity of the design and analysis of studies based on electronic health records.
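The electronic-health-record challenge described above is easy to picture in code. Below is a deliberately naive Python sketch, not any FDA system, of how a rule-based pass might pull one clinically relevant variable (EGFR mutation status) out of free-text physician notes; the notes and the regular expression are invented for illustration.

```python
import re

# Hypothetical free-text physician notes; real EHR content is far messier.
notes = [
    "Pt with metastatic NSCLC. Molecular panel: EGFR L858R mutation detected.",
    "Follow-up visit. No actionable mutations found on repeat biopsy.",
    "NSCLC, stage IV. EGFR exon 19 deletion; started on targeted therapy.",
]

# Naive rule: an EGFR mention followed by a known alteration pattern.
EGFR_PATTERN = re.compile(
    r"EGFR\s+(?P<variant>L858R|exon\s*19\s*deletion)", re.IGNORECASE
)

for i, note in enumerate(notes):
    match = EGFR_PATTERN.search(note)
    variant = match.group("variant") if match else None
    print(f"note {i}: EGFR variant = {variant}")
```

Even this toy example hints at why new standards and quality control are needed: negation (“no actionable mutations”), abbreviations, and layout quirks defeat simple rules, so real pipelines layer natural-language processing and validation on top.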

Figure 1: Conceptual map of technical and organizational capacity for biomedical big data.

Big data can be defined as having four dimensions: volume (data size), variety (data type), veracity (data noise and uncertainty) and velocity (data flow and processing). Currently, FDA approval decisions are generally based on data of limited variety, mainly from clinical trials and preclinical studies (1) that are mostly structured (2), in data sets usually no more than a few gigabytes in size (3), that are processed intermittently as part of regulatory submissions (4). The expansion of big data in the four dimensions (grey lines) calls for increasing organizational and technical capacity. This could transform big data into smart data by enabling a holistic approach to personalization of therapies that takes patient, disease and environmental characteristics into account….(More)”

Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media


Paper by Rupinder Paul Khandpur, Taoran Ji, Steve Jan, Gang Wang, Chang-Tien Lu, Naren Ramakrishnan: “Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods….(More)”
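The paper’s full pipeline builds on convolutional kernels over dependency parses; the Python sketch below is a much simpler stand-in that illustrates only the seed-trigger idea: filter a tweet stream with a small fixed set of attack-related terms, then expand the query with terms that co-occur with them. All tweets, seed terms, and thresholds here are invented for illustration.

```python
from collections import Counter

# A small fixed set of seed event triggers (simplified stand-ins).
SEEDS = {"ddos", "breach", "hacked", "hijacked", "defaced"}

tweets = [  # stand-in for a live Twitter stream
    "Massive DDoS takes down bank's site, outage ongoing",
    "Customer database breach confirmed, passwords leaked in dump",
    "Account hijacked? Passwords leaked, support lines flooded",
    "Great weather for a picnic today",
]

def tokens(text):
    return [w.strip(",.?!'\"").lower() for w in text.split()]

# Step 1: keep tweets that mention at least one seed trigger.
matched = [t for t in tweets if SEEDS & set(tokens(t))]

# Step 2: naive query expansion -- promote terms that co-occur
# with seeds across several matched tweets into the query.
co_occurring = Counter(
    w for t in matched for w in tokens(t) if w not in SEEDS and len(w) > 3
)
expanded = SEEDS | {w for w, n in co_occurring.items() if n >= 2}

print("matched tweets:", len(matched))
print("new query terms:", sorted(expanded - SEEDS))
```

The paper’s expansion strategy replaces step 2 with kernels over dependency parses, which capture how an attack is reported rather than merely which words appear.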

Open Data Privacy Playbook


A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.
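Re-identification risk of this kind is often screened with k-anonymity: every released record should be indistinguishable from at least k-1 others on its quasi-identifiers. A minimal Python sketch, with invented records and a hypothetical choice of quasi-identifiers:

```python
from collections import Counter

# Invented records from a hypothetical city release.
records = [
    {"zip": "02139", "age_band": "30-39", "pickup": "Main St", "fare": 12.5},
    {"zip": "02139", "age_band": "30-39", "pickup": "Main St", "fare": 8.0},
    {"zip": "02139", "age_band": "30-39", "pickup": "Main St", "fare": 22.0},
    {"zip": "02141", "age_band": "60-69", "pickup": "Elm Ave", "fare": 15.0},
]

# Assumed quasi-identifiers: attributes linkable to outside data sources.
QUASI_IDS = ("zip", "age_band", "pickup")

def k_anonymity(rows, quasi_ids):
    """Size of the smallest group of rows sharing quasi-identifier values."""
    groups = Counter(tuple(r[q] for q in quasi_ids) for r in rows)
    return min(groups.values())

k = k_anonymity(records, QUASI_IDS)
print(f"k = {k}")  # k = 1: the Elm Ave record is unique, hence re-identifiable
if k < 3:  # illustrative threshold
    print("Below threshold: generalize or suppress before release.")
```

A check like this is only one tactic; the report’s risk-benefit framing also weighs what utility is lost when zip codes are coarsened or rare records suppressed.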

Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:

  • Conduct risk-benefit analyses to inform the design and implementation of open data programs.
  • Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
  • Develop operational structures and processes that codify privacy management widely throughout the City.
  • Emphasize public engagement and public priorities as essential aspects of data management programs.

Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”

Denmark is appointing an ambassador to big tech


Matthew Hughes in The Next Web: “Question: Is Facebook a country? It sounds silly, but when you think about it, it does have many attributes in common with nation states. For starters, it’s got a population that’s bigger than that of India, and its 2016 revenue wasn’t too far from Estonia’s GDP. It also has a ‘national ethos’. If America’s philosophy is capitalism, Cuba’s is communism, and Sweden’s is social democracy, Facebook’s is ‘togetherness’, as corny as that may sound.

 Given all of the above, is it really any surprise that Denmark is considering appointing a ‘big tech ambassador’ whose job is to establish and manage the country’s relationship with the world’s most powerful tech companies?

Denmark’s “digital ambassador” is a first; no country has ever created such a role. Their job will be to liaise with the likes of Google, Twitter, and Facebook.

Given the fraught relationship many European countries have with American big-tech – especially on issues of taxation, privacy, and national security – Denmark’s decision to extend an olive branch seems sensible.

Speaking with the Washington Post, Danish Foreign Minister Anders Samuelsen said, “just as we engage in a diplomatic dialogue with countries, we also need to establish and prioritize comprehensive relations with tech actors, such as Google, Facebook, Apple and so on. The idea is, we see a lot of companies and new technologies that will in many ways involve and be part of everyday life of citizens in Denmark.”….(More)”

Curating Research Data: Practical Strategies for Your Digital Repository


Two books edited by Lisa R. Johnston: “Data are becoming the proverbial coin of the digital realm: a research commodity that might purchase reputation credit in a disciplinary culture of data sharing, or buy transparency when faced with funding agency mandates or publisher scrutiny. Unlike most monetary systems, however, digital data can flow in all too great an abundance. Not only does this currency actually “grow” on trees, but it comes from animals, books, thoughts, and each of us! And that is what makes data curation so essential. The abundance of digital research data challenges library and information science professionals to harness this flow of information streaming from research discovery and scholarly pursuit and preserve the unique evidence for future use.

In two volumes—Practical Strategies for Your Digital Repository and A Handbook of Current Practice—Curating Research Data presents those tasked with long-term stewardship of digital research data a blueprint for how to curate those data for eventual reuse. Volume One explores the concepts of research data and the types and drivers for establishing digital data repositories. Volume Two guides you across the data lifecycle through the practical strategies and techniques for curating research data in a digital repository setting. Data curators, archivists, research data management specialists, subject librarians, institutional repository managers, and digital library staff will benefit from these current and practical approaches to data curation.

Digital data is ubiquitous and rapidly reshaping how scholarship progresses now and into the future. The information expertise of librarians can help ensure the resiliency of digital data, and the information it represents, by addressing how the meaning, integrity, and provenance of digital data generated by researchers today will be captured and conveyed to future researchers….(More)”

Will Democracy Survive Big Data and Artificial Intelligence?


Dirk Helbing, Bruno S. Frey, Gerd Gigerenzer, Ernst Hafen, Michael Hagner, Yvonne Hofstetter, Jeroen van den Hoven, Roberto V. Zicari, and Andrej Zwitter in Scientific American: “….In summary, it can be said that we are now at a crossroads (see Fig. 2). Big data, artificial intelligence, cybernetics and behavioral economics are shaping our society—for better or worse. If such widespread technologies are not compatible with our society’s core values, sooner or later they will cause extensive damage. They could lead to an automated society with totalitarian features. In the worst case, a centralized artificial intelligence would control what we know, what we think and how we act. We are at a historic moment, where we have to decide on the right path—a path that allows us all to benefit from the digital revolution. Therefore, we urge adherence to the following fundamental principles:

1. to increasingly decentralize the function of information systems;

2. to support informational self-determination and participation;

3. to improve transparency in order to achieve greater trust;

4. to reduce the distortion and pollution of information;

5. to enable user-controlled information filters;

6. to support social and economic diversity;

7. to improve interoperability and collaborative opportunities;

8. to create digital assistants and coordination tools;

9. to support collective intelligence, and

10. to promote responsible behavior of citizens in the digital world through digital literacy and enlightenment.

Following this digital agenda we would all benefit from the fruits of the digital revolution: the economy, government and citizens alike. What are we waiting for?

A strategy for the digital age

Big data and artificial intelligence are undoubtedly important innovations. They have an enormous potential to catalyze economic value and social progress, from personalized healthcare to sustainable cities. It is totally unacceptable, however, to use these technologies to incapacitate the citizen. Big nudging and citizen scores abuse centrally collected personal data for behavioral control in ways that are totalitarian in nature. This is not only incompatible with human rights and democratic principles, but also inappropriate to manage modern, innovative societies. In order to solve the genuine problems of the world, far better approaches in the fields of information and risk management are required. The research area of responsible innovation and the initiative ”Data for Humanity” (see “Big Data for the benefit of society and humanity”) provide guidance as to how big data and artificial intelligence should be used for the benefit of society….(More)”

From Nairobi to Manila, mobile phones are changing the lives of bus riders


Shomik Mehndiratta at Transport for Development Blog: “Every day around the world, millions of people rely on buses to get around. In many cities, these services carry the bulk of urban trips, especially in Africa and Latin America. They are known by many different names—matatus, dala dalas, minibus taxis, colectivos, diablos rojos, micros, etc.—but all have one thing in common: they are either hardly regulated… or not regulated at all. Although buses play a critical role in the daily life of many urban dwellers, there are a variety of complaints that have spurred calls for improvement and reform.

However, we are now witnessing a different, more organic kind of change that is disrupting the world of informal buses using ubiquitous cheap sensors and mobile technology. One hotbed of innovation is Nairobi, Kenya’s bustling capital. Two years ago, Nairobi made a splash in the world of urban transport by mapping all the routes of informal matatus. Other countries have sought to replicate this model, with open source tools and crowdsourcing supporting similar efforts in Mexico, Manila, and beyond. Back in Nairobi, the Magic Bus app helps commuters use SMS services to reserve and pay for seats in matatus; in September 2016, Magic Bus’s potential for easing commuter pain in the Kenyan capital was rewarded with a $1 million prize. Other programs implemented in collaboration with insurers and operators are experimenting with on-board sensors to identify and correct dangerous driver behavior such as sudden braking and acceleration. Ma3Route, also in Nairobi (there is a pattern here!), used crowdsourcing to identify dangerous drivers as well as congestion. At the same time, operators are upping their game: using technology to improve system management, control and routing in La Paz, and working with universities to improve their financial planning and capabilities in Cape Town.
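The on-board sensor idea is simple enough to sketch. Below is a minimal Python example, not any vendor’s product, that derives acceleration from periodic speed samples and flags harsh braking or acceleration; the 1 Hz sample rate and thresholds are assumptions for illustration.

```python
# Speed samples in m/s, one per second (hypothetical GPS/OBD feed).
speeds = [14.0, 14.2, 13.8, 8.5, 8.3, 8.4, 12.9, 13.1]

SAMPLE_INTERVAL_S = 1.0   # assumed 1 Hz sampling
HARSH_BRAKE = -3.0        # m/s^2, illustrative threshold
HARSH_ACCEL = 3.0         # m/s^2, illustrative threshold

events = []
for t in range(1, len(speeds)):
    accel = (speeds[t] - speeds[t - 1]) / SAMPLE_INTERVAL_S
    if accel <= HARSH_BRAKE:
        events.append((t, "harsh braking", accel))
    elif accel >= HARSH_ACCEL:
        events.append((t, "harsh acceleration", accel))

for t, kind, accel in events:
    print(f"t={t}s: {kind} ({accel:+.1f} m/s^2)")
```

Paired with a driver ID and location, a handful of rules like this is enough for the insurer-operator pilots described above to spot and coach the riskiest drivers.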

Against this backdrop, the question is then: can these ongoing experimental initiatives offer a coherent alternative to formal reform? …(More)”.

Connecting the dots: Building the case for open data to fight corruption


Web Foundation: “This research, published with Transparency International, measures the progress made by 5 key countries in implementing the G20 Anti-Corruption Open Data Principles.

These principles, adopted by G20 countries in 2015, committed countries to increasing and improving the publication of public information, driving forward open data as a tool in anti-corruption efforts.

However, this research – looking at Brazil, France, Germany, Indonesia and South Africa – finds a disappointing lack of progress. No country studied has released all the datasets identified as being key to anti-corruption, and much of the information is hard to find and hard to use.

Key findings:

  • No country released all anti-corruption datasets
  • Quality issues mean data is often not useful or usable
  • Much of the data is not published in line with open data standards, making comparability difficult
  • In many countries there is a lack of open data skills among officials in charge of anti-corruption initiatives

Download the overview report here (PDF), and access the individual country case studies for Brazil, France, Germany, Indonesia and South Africa… (More)”

Nuts and Bolts of Encryption: A Primer for Policymakers


Edward W. Felten: “This paper offers a straightforward introduction to encryption, as it is implemented in modern systems, at a level of detail suitable for policy discussions. No prior background on encryption or data security is assumed.

Encryption is used in two main scenarios. Encrypted storage allows information to be stored on a device, with encryption protecting the data should a malicious party get access to the device. Encrypted communication allows information to be transmitted from one party to another party, often across a network, with encryption protecting the data should a malicious party get access to the data while it is in transit. Encryption is used somewhat differently in these two scenarios, so it makes sense to present them separately. We’ll discuss encrypted storage first, because it is simpler.
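As a concrete illustration of the encrypted-storage scenario, here is a short Python sketch using the widely deployed cryptography package: a key is derived from a passphrase with PBKDF2, and data is encrypted so that someone who obtains the device, but not the passphrase, sees only ciphertext. This is a generic sketch in the spirit of the paper, not a description of any particular product, and the parameters are illustrative.

```python
import base64
import hashlib
import os

from cryptography.fernet import Fernet  # pip install cryptography

# Derive an encryption key from a user passphrase (illustrative parameters).
passphrase = b"correct horse battery staple"
salt = os.urandom(16)  # stored beside the ciphertext; need not be secret
derived = hashlib.pbkdf2_hmac("sha256", passphrase, salt, 600_000)
key = base64.urlsafe_b64encode(derived)  # Fernet expects a base64 32-byte key

# Encrypted storage: only the ciphertext (and salt) live on the device.
f = Fernet(key)
ciphertext = f.encrypt(b"messages, photos, records...")

# Without the passphrase there is no key, and the ciphertext is useless.
print(f.decrypt(ciphertext))  # b'messages, photos, records...'
```

Encrypted communication reuses the same symmetric machinery but adds the problem of getting a shared key to the other party, which is part of why the paper treats storage as the simpler case.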

We emphasize that the approaches described here are not detailed descriptions of any particular existing system, but rather generic descriptions of how state-of-the-art systems typically operate. Specific products and standards fill in the details differently, but they are roughly similar at the level of detail given here….(More)”

Data Disrupts Corruption


Carlos Santiso & Ben Roseth at Stanford Social Innovation Review: “…The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistleblowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions….

Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”

In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption: Descriptive analytics uses data to describe what has happened in analyzing complex policy issues; diagnostic analytics goes a step further by mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends; predictive analytics uses data and algorithms to predict what is most likely to occur, by utilizing machine learning; and prescriptive analytics proposes what should be done to cause or prevent something from happening….
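To make these stages concrete, here is a small Python sketch at the descriptive end of that spectrum: scoring public contracts for a classic corruption red flag, amounts that are extreme outliers for a given agency. The records and threshold are invented; real systems combine many such indicators.

```python
from statistics import mean, stdev

# Invented procurement records: (agency, contract amount in dollars).
contracts = [
    ("roads", 100_000), ("roads", 120_000), ("roads", 95_000),
    ("roads", 110_000), ("roads", 480_000),   # planted outlier
    ("health", 60_000), ("health", 65_000), ("health", 58_000),
]

# Descriptive step: group what happened, per agency.
by_agency = {}
for agency, amount in contracts:
    by_agency.setdefault(agency, []).append(amount)

# Simple red-flag rule: z-score above an illustrative threshold.
Z_THRESHOLD = 1.5
for agency, amounts in by_agency.items():
    if len(amounts) < 3:
        continue  # too little history to score
    mu, sigma = mean(amounts), stdev(amounts)
    for amount in amounts:
        z = (amount - mu) / sigma
        if z > Z_THRESHOLD:
            print(f"flag: {agency} contract ${amount:,} (z = {z:.1f})")
```

Diagnostic and prescriptive layers would then ask why the flagged contract exists (a single bidder? a late amendment?) and what control should change, which is the strategic institutional framework the authors insist analytics must sit inside.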

Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run…(More)”.