(Open Access) book edited by Mirko Tobias Schäfer & Karin van Es: “As more and more aspects of everyday life are turned into machine-readable data, researchers are provided with rich resources for researching society. The novel methods and innovative tools for working with this data not only require new knowledge and skills, but also raise issues concerning the practices of investigation and publication. This book critically reflects on the role of data in academia and society, and challenges overly optimistic expectations that treat data practices as a means for understanding social reality. It introduces its readers to practices and methods for data analysis and visualization, and raises questions not only about the politics of data tools, but also about the ethics of collecting and sifting through data, and of presenting data research….(More)”.
The Techno-Politics of Data and Smart Devolution in City-Regions: Comparing Glasgow, Bristol, Barcelona, and Bilbao
Paper by Igor Calzada: “This paper explores the substantial effect that the critical understanding and techno-political consideration of data are having in some smart city strategies. Particularly, the paper presents some results of a comparative study of four cases of smart city transitions: Glasgow, Bristol, Barcelona, and Bilbao. Likewise, considering the relevance of city-regional path-dependency in each territorial context, the paper elucidates the notion of smart devolution as a key governance component that is enabling some cities to formulate their own smart city-regional governance policies and to implement them by treating smart citizens as decision makers rather than mere data providers. The paper concludes by identifying an implicit smart city-regional governance strategy for each case based on the techno-politics of data and smart devolution….(More)”
From big data to smart data: FDA’s INFORMED initiative
Sean Khozin, Geoffrey Kim & Richard Pazdur in Nature: “….Recent advances in our understanding of disease mechanisms have led to the development of new drugs that are enabling precision medicine. For example, the co-development of kinase inhibitors that target ‘driver mutations’ in metastatic non-small-cell lung cancer (NSCLC) with companion diagnostics has led to substantial improvements in the treatment of some patients. However, growing evidence suggests that most patients with metastatic NSCLC and other advanced cancers may not have tumours with single driver mutations. Furthermore, the generation of clinical evidence in genomically diverse and geographically dispersed groups of patients using traditional trial designs and multiple competing therapies is becoming more costly and challenging.
Strategies aimed at creating new efficiencies in clinical evidence generation and extending the benefits of precision medicine to larger groups of patients are driving a transformation from a reductionist approach to drug development (for example, a single drug targeting a driver mutation and traditional clinical trials) to a holistic approach (for example, combination therapies targeting complex multiomic signatures and real-world evidence). This transition is largely fuelled by the rapid expansion in the four dimensions of biomedical big data, which has created a need for greater organizational and technical capabilities (Fig. 1). Appropriate management and analysis of such data requires specialized tools and expertise in health information technology, data science and high-performance computing. For example, efforts to generate clinical evidence using real-world data are being limited by challenges such as capturing clinically relevant variables from vast volumes of unstructured content (such as physician notes) in electronic health records and organizing various structured data elements that are primarily designed to support billing rather than clinical research. So, new standards and quality-control mechanisms are needed to ensure the validity of the design and analysis of studies based on electronic health records.

Big data can be defined as having four dimensions: volume (data size), variety (data type), veracity (data noise and uncertainty) and velocity (data flow and processing). Currently, FDA approval decisions are generally based on data of limited variety, mainly from clinical trials and preclinical studies (1) that are mostly structured (2), in data sets usually no more than a few gigabytes in size (3), that are processed intermittently as part of regulatory submissions (4). The expansion of big data in the four dimensions (grey lines) calls for increasing organizational and technical capacity. This could transform big data into smart data by enabling a holistic approach to personalization of therapies that takes patient, disease and environmental characteristics into account. (More)”
Crowdsourcing Cybersecurity: Cyber Attack Detection using Social Media
Paper by Rupinder Paul Khandpur, Taoran Ji, Steve Jan, Gang Wang, Chang-Tien Lu, Naren Ramakrishnan: “Social media is often viewed as a sensor into various societal events such as disease outbreaks, protests, and elections. We describe the use of social media as a crowdsourced sensor to gain insight into ongoing cyber-attacks. Our approach detects a broad range of cyber-attacks (e.g., distributed denial of service (DDoS) attacks, data breaches, and account hijacking) in an unsupervised manner using just a limited fixed set of seed event triggers. A new query expansion strategy based on convolutional kernels and dependency parses helps model reporting structure and aids in identifying key event characteristics. Through a large-scale analysis over Twitter, we demonstrate that our approach consistently identifies and encodes events, outperforming existing methods….(More)”
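The paper's actual pipeline models reporting structure with dependency parses and convolutional kernels; as a much-simplified sketch (the seed terms and co-occurrence expansion below are illustrative assumptions, not the authors' method), seed-trigger detection with naive query expansion might look like:

```python
from collections import Counter

# Toy sketch: flag posts that mention a seed attack trigger, then grow
# the trigger set from terms that co-occur with hits. The real method is
# far more sophisticated; seed terms here are invented for illustration.

SEED_TRIGGERS = {"ddos", "breach", "hijacked"}

def detect(posts, seeds=SEED_TRIGGERS):
    """Return posts containing at least one seed trigger term."""
    return [p for p in posts if set(p.lower().split()) & seeds]

def expand_seeds(hit_posts, seeds=SEED_TRIGGERS, top_n=3):
    """Add the most frequent co-occurring terms to the seed set."""
    counts = Counter(
        tok
        for post in hit_posts
        for tok in post.lower().split()
        if tok not in seeds
    )
    return seeds | {term for term, _ in counts.most_common(top_n)}
```

In the paper, expansion is unsupervised but syntax-aware, so the grown query set captures how attacks are reported, not merely which words co-occur.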
Open Data Privacy Playbook
A data privacy playbook by Ben Green, Gabe Cunningham, Ariel Ekblaw, Paul Kominers, Andrew Linzer, and Susan Crawford: “Cities today collect and store a wide range of data that may contain sensitive or identifiable information about residents. As cities embrace open data initiatives, more of this information is available to the public. While releasing data has many important benefits, sharing data comes with inherent risks to individual privacy: released data can reveal information about individuals that would otherwise not be public knowledge. In recent years, open data such as taxi trips, voter registration files, and police records have revealed information that many believe should not be released.
Effective data governance is a prerequisite for successful open data programs. The goal of this document is to codify responsible privacy-protective approaches and processes that could be adopted by cities and other government organizations that are publicly releasing data. Our report is organized around four recommendations:
- Conduct risk-benefit analyses to inform the design and implementation of open data programs.
- Consider privacy at each stage of the data lifecycle: collect, maintain, release, delete.
- Develop operational structures and processes that codify privacy management widely throughout the City.
- Emphasize public engagement and public priorities as essential aspects of data management programs.
Each chapter of this report is dedicated to one of these four recommendations, and provides fundamental context along with specific suggestions to carry them out. In particular, we provide case studies of best practices from numerous cities and a set of forms and tactics for cities to implement our recommendations. The Appendix synthesizes key elements of the report into an Open Data Privacy Toolkit that cities can use to manage privacy when releasing data….(More)”
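One concrete way to operationalize the risk-benefit analyses the report recommends is to measure how re-identifiable a dataset is before release. The sketch below is a minimal k-anonymity check (the column names and threshold are assumptions for illustration, not drawn from the playbook): a record is risky if its combination of quasi-identifiers, such as ZIP code plus birth year, is shared by too few other records.

```python
from collections import Counter

# Minimal k-anonymity check for a dataset a city is about to release.
# Quasi-identifier columns and the k threshold are illustrative, not
# taken from the report.

def min_group_size(records, quasi_identifiers):
    """Smallest equivalence class over the quasi-identifier columns."""
    groups = Counter(
        tuple(row[col] for col in quasi_identifiers) for row in records
    )
    return min(groups.values())

def is_k_anonymous(records, quasi_identifiers, k=5):
    """True if every quasi-identifier combination occurs at least k times."""
    return min_group_size(records, quasi_identifiers) >= k
```

A release pipeline could run such a check at the "release" stage of the data lifecycle and generalize or suppress columns until the threshold is met.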
Denmark is appointing an ambassador to big tech
Matthew Hughes in The Next Web: “Question: Is Facebook a country? It sounds silly, but when you think about it, it does have many attributes in common with nation states. For starters, it’s got a population that’s bigger than that of India, and its 2016 revenue wasn’t too far from Estonia’s GDP. It also has a ‘national ethos’. If America’s philosophy is capitalism, Cuba’s is communism, and Sweden’s is social democracy, Facebook’s is ‘togetherness’, as corny as that may sound.
Denmark’s “digital ambassador” is a first. No country has ever created such a role, and the ambassador’s job will be to liaise with the likes of Google, Twitter, and Facebook.
Given the fraught relationship many European countries have with American big-tech – especially on issues of taxation, privacy, and national security – Denmark’s decision to extend an olive branch seems sensible.
Speaking with the Washington Post, Danish Foreign Minister Anders Samuelsen said, “just as we engage in a diplomatic dialogue with countries, we also need to establish and prioritize comprehensive relations with tech actors, such as Google, Facebook, Apple and so on. The idea is, we see a lot of companies and new technologies that will in many ways involve and be part of everyday life of citizens in Denmark.”….(More)”
Will Democracy Survive Big Data and Artificial Intelligence?
Dirk Helbing, Bruno S. Frey, Gerd Gigerenzer, Ernst Hafen, Michael Hagner, Yvonne Hofstetter, Jeroen van den Hoven, Roberto V. Zicari, and Andrej Zwitter in Scientific American: “….In summary, it can be said that we are now at a crossroads (see Fig. 2). Big data, artificial intelligence, cybernetics and behavioral economics are shaping our society—for better or worse. If such widespread technologies are not compatible with our society’s core values, sooner or later they will cause extensive damage. They could lead to an automated society with totalitarian features. In the worst case, a centralized artificial intelligence would control what we know, what we think and how we act. We are at a historic moment, where we have to decide on the right path—a path that allows us all to benefit from the digital revolution. Therefore, we urge adherence to the following fundamental principles:
1. to increasingly decentralize the function of information systems;
2. to support informational self-determination and participation;
3. to improve transparency in order to achieve greater trust;
4. to reduce the distortion and pollution of information;
5. to enable user-controlled information filters;
6. to support social and economic diversity;
7. to improve interoperability and collaborative opportunities;
8. to create digital assistants and coordination tools;
9. to support collective intelligence, and
10. to promote responsible behavior of citizens in the digital world through digital literacy and enlightenment.
Following this digital agenda, we would all benefit from the fruits of the digital revolution: the economy, government and citizens alike. What are we waiting for?
A strategy for the digital age
Big data and artificial intelligence are undoubtedly important innovations. They have an enormous potential to catalyze economic value and social progress, from personalized healthcare to sustainable cities. It is totally unacceptable, however, to use these technologies to incapacitate the citizen. Big nudging and citizen scores abuse centrally collected personal data for behavioral control in ways that are totalitarian in nature. This is not only incompatible with human rights and democratic principles, but also inappropriate to manage modern, innovative societies. In order to solve the genuine problems of the world, far better approaches in the fields of information and risk management are required. The research area of responsible innovation and the initiative ”Data for Humanity” (see “Big Data for the benefit of society and humanity”) provide guidance as to how big data and artificial intelligence should be used for the benefit of society….(More)”
Connecting the dots: Building the case for open data to fight corruption
Web Foundation: “This research, published with Transparency International, measures the progress made by five key countries in implementing the G20 Anti-Corruption Open Data Principles.
These principles, adopted by G20 countries in 2015, committed countries to increasing and improving the publication of public information, driving forward open data as a tool in anti-corruption efforts.
However, this research – looking at Brazil, France, Germany, Indonesia and South Africa – finds a disappointing lack of progress. No country studied has released all the datasets identified as being key to anti-corruption, and much of the information is hard to find and hard to use.
Key findings:
- No country released all anti-corruption datasets
- Quality issues mean data is often not useful or usable
- Much of the data is not published in line with open data standards, making comparability difficult
- In many countries there is a lack of open data skills among officials in charge of anti-corruption initiatives
Download the overview report here (PDF), and access the individual country case studies Brazil; France; Germany; Indonesia and South Africa… (More)”
Data Disrupts Corruption
Carlos Santiso & Ben Roseth at Stanford Social Innovation Review: “…The Panama Papers scandal demonstrates the power of data analytics to uncover corruption in a world flooded with terabytes needing only the computing capacity to make sense of it all. The Rousseff impeachment illustrates how open data can be used to bring leaders to account. Together, these stories show how data, both “big” and “open,” is driving the fight against corruption with fast-paced, evidence-driven, crowd-sourced efforts. Open data can put vast quantities of information into the hands of countless watchdogs and whistleblowers. Big data can turn that information into insight, making corruption easier to identify, trace, and predict. To realize the movement’s full potential, technologists, activists, officials, and citizens must redouble their efforts to integrate data analytics into policy making and government institutions….
Making big data open cannot, in itself, drive anticorruption efforts. “Without analytics,” a 2014 White House report on big data and individual privacy underscored, “big datasets could be stored, and they could be retrieved, wholly or selectively. But what comes out would be exactly what went in.”
In this context, it is useful to distinguish the four main stages of data analytics to illustrate its potential in the global fight against corruption:
- Descriptive analytics uses data to describe what has happened in analyzing complex policy issues;
- diagnostic analytics goes a step further by mining and triangulating data to explain why a specific policy problem has happened, identify its root causes, and decipher underlying structural trends;
- predictive analytics uses data and algorithms, often with machine learning, to predict what is most likely to occur; and
- prescriptive analytics proposes what should be done to cause or prevent something from happening….
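A toy walk-through of how the four stages might be chained on public contract awards (the figures, thresholds, and anomaly rule below are invented for illustration, not from the article):

```python
from statistics import mean

# Invented contract award amounts; one is suspiciously large.
contracts = [100, 110, 105, 98, 400]

# 1. Descriptive: what happened? Summarize the spending.
avg = mean(contracts)

# 2. Diagnostic: why? Isolate the records driving the average up.
outliers = [c for c in contracts if c > 2 * avg]

# 3. Predictive: a crude rule scoring how anomalous a new award would be.
def anomaly_score(amount, baseline=avg):
    """Ratio of an award to the historical baseline; >2 is suspicious."""
    return amount / baseline

# 4. Prescriptive: what to do — flag high-scoring awards for audit.
flagged = [c for c in contracts if anomaly_score(c) > 2]
```

Real anti-corruption analytics would of course replace the ratio rule with trained models over far richer features, but the progression from describing to prescribing is the same.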
Despite the big data movement’s promise for fighting corruption, many challenges remain. The smart use of open and big data should focus not only on uncovering corruption, but also on better understanding its underlying causes and preventing its recurrence. Anticorruption analytics cannot exist in a vacuum; it must fit in a strategic institutional framework that starts with quality information and leads to reform. Even the most sophisticated technologies and data innovations cannot prevent what French novelist Théophile Gautier described as the “inexplicable attraction of corruption, even amongst the most honest souls.” Unless it is harnessed for improvements in governance and institutions, data analytics will not have the impact that it could, nor be sustainable in the long run…(More)”.
Big and open data are prompting a reform of scientific governance
Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.
What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.
Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.
Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.
The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.
Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.
New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.
Data-centric science is emerging in concert with calls for increased openness in research….(More)”