Patients are Pooling Data to Make Diabetes Research More Representative


Blog by Tracy Kariuki: “Saira Khan-Gallo knows how overwhelming managing and living healthily with diabetes can be. As a person living with type 1 diabetes for over two decades, she understands how tracking glucose levels, blood pressure, blood cholesterol, insulin intake, and, and, and…could all feel like drowning in an infinite pool of numbers.

But that doesn’t need to be the case. This is why Tidepool, a non-profit tech organization composed of caregivers and other people living with diabetes such as Gallo, is transforming diabetes data management. Its data visualization platform enables users to make sense of the data and derive insights into their health status….

Through its Big Data Donation Project, Tidepool has been supporting the advancement of diabetes research by sharing anonymized data from people living with diabetes with researchers.

To date, more than 40,000 individuals have chosen to donate data uploaded from their diabetes devices, such as blood glucose meters, insulin pumps and continuous glucose monitors, which Tidepool then shares with students, academics, researchers, and industry partners, making the database larger than many clinical trials. For instance, Oregon Health and Science University has used datasets collected from Tidepool to build an algorithm that predicts hypoglycemia (low blood sugar), with the goal of advancing closed-loop therapy for diabetes management…(More)”.

Datafication, Identity, and the Reorganization of the Category Individual


Paper by Juan Ortiz Freuler: “A combination of political, sociocultural, and technological shifts suggests a change in the way we understand human rights. Undercurrents fueling this process are digitization and datafication. Through this process of change, categories that might have been cornerstones of our past and present might very well become outdated. A key category that is under pressure is that of the individual. Since datafication is typically accompanied by technologies and processes aimed at segmenting and grouping, such groupings become increasingly relevant at the expense of the notion of the individual. This concept might become but another collection of varied characteristics, a unit of analysis that is considered at times too broad—and at other times too narrow—to be considered relevant or useful by the systems driving our key economic, social, and political processes.

This Essay provides a literature review and a set of key definitions linking the processes of digitization, datafication, and the concept of the individual to existing conceptions of individual rights. It then presents a framework to dissect and showcase the ways in which current technological developments are putting pressure on our existing conceptions of the individual and individual rights…(More)”.

What prevents us from reusing medical real-world data in research


Paper by Julia Gehrmann, Edit Herczog, Stefan Decker & Oya Beyan: “Recent studies show that Medical Data Science (MDS) carries great potential to improve healthcare. Considering data from several medical areas and of different types, i.e. using multimodal data, significantly increases the quality of the research results. On the other hand, the inclusion of more features in an MDS analysis means that more medical cases are required to represent the full range of possible feature combinations in a quantity sufficient for a meaningful analysis. Historically, data acquisition in medical research has relied on prospective data collection, e.g. in clinical studies. However, prospectively collecting the amount of data needed for advanced multimodal data analyses is not feasible, for two reasons. Firstly, such a data collection process would cost an enormous amount of money. Secondly, it would take decades to generate enough data for longitudinal analyses, while the results are needed now. A worthwhile alternative is using real-world data (RWD) from the clinical systems of, e.g., university hospitals. This data is immediately accessible in large quantities, providing full flexibility in the choice of the research questions analyzed. However, compared to prospectively curated data, medical RWD usually lacks quality due to the specificities of medical RWD outlined in section 2. This reduced quality makes its preparation for analysis more challenging…(More)”.

Data-driven research and healthcare: public trust, data governance and the NHS


Paper by Angeliki Kerasidou & Charalampia (Xaroula) Kerasidou: “It is widely acknowledged that trust plays an important role in the acceptability of data sharing practices in research and healthcare, and in the adoption of new health technologies such as AI. Yet there is reported distrust in this domain. Although the NHS is one of the most trusted public institutions in the UK, public trust does not appear to accompany its data sharing practices for research and innovation, specifically with the private sector, that have been introduced in recent years. In this paper, we examine the question: what is it about sharing NHS data for research and innovation with for-profit companies that challenges public trust? To address this question, we draw from political theory to provide an account of public trust that helps better understand the relationship between the public and the NHS within a democratic context, as well as the kind of obligations and expectations that govern this relationship. We then examine whether the way in which the NHS is managing patient data and its collaboration with the private sector fit under this trust-based relationship. We argue that the datafication of healthcare and the broader ‘health and wealth’ agenda adopted by consecutive UK governments represent a major shift in the institutional character of the NHS, which brings into question the meaning of the public good the NHS is expected to provide, challenging public trust. We conclude by suggesting that to address the problem of public trust, a theoretical and empirical examination of the benefits but also the costs associated with this shift needs to take place, as well as an open conversation at the public level to determine what values should be promoted by a public institution like the NHS…(More)”.

Setting data free: The politics of open data for food and agriculture


Paper by M. Fairbairn and Z. Kish: “Open data is increasingly being promoted as a route to achieve food security and agricultural development. This article critically examines the promotion of open agri-food data for development through a document-based case study of the Global Open Data for Agriculture and Nutrition (GODAN) initiative as well as through interviews with open data practitioners and participant observation at open data events. While the concept of openness is striking for its ideological flexibility, we argue that GODAN propagates an anti-political, neoliberal vision for how open data can enhance agricultural development. This approach centers values such as private innovation, increased production, efficiency, and individual empowerment, in contrast to more political and collectivist approaches to openness practiced by some agri-food social movements. We further argue that open agri-food data projects, in general, have a tendency to reproduce elements of “data colonialism,” extracting data with minimal consideration for the collective harms that may result, and embedding their own values within universalizing information infrastructures…(More)”.

Unleashing the power of data for electric vehicles and charging infrastructure


Report by Thomas Deloison: “As the world moves toward widespread electric vehicle (EV) adoption, a key challenge lies ahead: deploying charging infrastructure rapidly and effectively. Solving this challenge will be essential to decarbonize transport, which has a higher reliance on fossil fuels than any other sector and accounts for a fifth of global carbon emissions. However, the companies and governments investing in charging infrastructure face significant hurdles, including high initial capital costs and difficulties related to infrastructure planning, permitting, grid connections and grid capacity development.

Data has the power to facilitate these processes: increased predictability and optimized planning and infrastructure management go a long way in easing investments and accelerating deployment. Last year, members of the World Business Council for Sustainable Development (WBCSD) demonstrated that digital solutions based on data sharing could reduce carbon emissions from charging by 15% and unlock crucial grid capacity and capital efficiency gains.

Exceptional advances in data, analytics and connectivity are making digital solutions a potent tool to plan and manage transport, energy and infrastructure. Thanks to the deployment of sensors and the rise of connectivity, businesses are collecting information faster than ever before, allowing for data flows between physical assets. Charging infrastructure operators, automotive companies, fleet operators, energy providers, building managers and governments collect insights on all aspects of electric vehicle charging infrastructure (EVCI), from planning and design to charging experiences at the station.

The real value of data lies in its aggregation. This will require breaking down silos across industries and enabling digital collaboration. A digital action framework released by WBCSD, in collaboration with Arcadis, Fujitsu and other member companies and partners, introduces a set of recommendations for companies and governments to realize the full potential of digital solutions and accelerate EVCI deployments:

  • Map proprietary data, knowledge gaps and digital capacity across the value chain to identify possible synergies. The highest value potential from digital solutions will lie at the nexus of infrastructure, consumer behavior insights, grid capacity and transport policy. For example, to ensure the deployment of charging stations where they will be most needed and at the right capacity level, it is crucial to plan investments within energy grid capacity, spatial constraints and local projected demand for EVs.
  • Develop internal data collection and storage capacity with due consideration for existing structures for data sharing. A variety of schemes allow actors to engage in data sharing or monetization. Yet their use is limited by mismatched data standards and specifications and by process uncertainty. Companies must build a strong understanding of these structures by providing internal training and guidance, and invest in sound data collection, storage and analysis capacity.
  • Foster a policy environment that supports digital collaboration across sectors and industries. Digital policies must provide incentives and due diligence frameworks to guide data exchanges across industries and support the adoption of common standards and protocols. For instance, it will be crucial to integrate linkages with energy systems and infrastructure beyond roads in the rollout of the European mobility data space…(More)”.

‘Not for Machines to Harvest’: Data Revolts Break Out Against A.I.


Article by Sheera Frenkel and Stuart A. Thompson: “Fan fiction writers are just one group now staging revolts against A.I. systems as a fever over the technology has gripped Silicon Valley and the world. In recent months, social media companies such as Reddit and Twitter, news organizations including The New York Times and NBC News, authors such as Paul Tremblay and the actress Sarah Silverman have all taken a position against A.I. sucking up their data without permission.

Their protests have taken different forms. Writers and artists are locking their files to protect their work or are boycotting certain websites that publish A.I.-generated content, while companies like Reddit want to charge for access to their data. At least 10 lawsuits have been filed this year against A.I. companies, accusing them of training their systems on artists’ creative work without consent. This past week, Ms. Silverman and the authors Christopher Golden and Richard Kadrey sued OpenAI, the maker of ChatGPT, and others over A.I.’s use of their work.

At the heart of the rebellions is a newfound understanding that online information — stories, artwork, news articles, message board posts and photos — may have significant untapped value.

The new wave of A.I. — known as “generative A.I.” for the text, images and other content it generates — is built atop complex systems such as large language models, which are capable of producing humanlike prose. These models are trained on hoards of all kinds of data so they can answer people’s questions, mimic writing styles or churn out comedy and poetry.

That has set off a hunt by tech companies for even more data to feed their A.I. systems. Google, Meta and OpenAI have essentially used information from all over the internet, including large databases of fan fiction, troves of news articles and collections of books, much of which was available free online. In tech industry parlance, this was known as “scraping” the internet…(More)”.

Digital divides are lower in Smart Cities


Paper by Andrea Caragliu and Chiara F. Del Bo: “Ever since the emergence of digital technologies in the early 1990s, the literature has discussed the potential pitfalls of an uneven distribution of e-skills under the umbrella of the digital divide. To provide a definition of the concept, Lloyd Morrisett coined the term digital divide to mean “a discrepancy in access to technology resources between socioeconomic groups” (Robyler and Doering, 2014, p. 27).

Despite the digital divide being high on the policy agenda, statistics suggest the persisting relevance of this issue. For instance, focusing on Europe, according to EUROSTAT statistics, in 2021 about 90 per cent of people living in Zeeland, a NUTS2 region in the Netherlands, had ordered goods or services over the internet for private use at least once, against a minimum in the EU27 of 15 per cent (in the region of Yugoiztochen, in Bulgaria). In the same year, while basically all (99 per cent) interviewees in the NUTS2 region of Northern and Western Ireland declared using the internet at least once a week, the same statistic drops to two thirds of the sample in the Bulgarian region of Severozapaden. While these territorial divides are converging over time, they can still significantly affect the potential positive impact of the diffusion of digital technologies.

Over the past three years, the digital divide has been made dramatically apparent by the outbreak of the COVID-19 pandemic. When, during the first waves of full lockdowns enacted in most countries, tertiary education and schooling activities moved online, many economic outcomes worsened significantly. Pupils’ learning outcomes and service sectors’ productivity were particularly affected.

A simultaneous development in the scientific literature has discussed the attractive features of planning and managing cities ‘smartly’. Smart Cities have been initially identified as urban areas with a tendency to invest and deploy ICTs. More recently, this notion also started to encompass the context characteristics that make a city capable of reaping the benefits of ICTs – social and human capital, soft and hard institutions.

While mounting empirical evidence suggests a superior economic performance of cities ticking all these boxes, the Smart City movement has not come without critiques. The debate on urban smartness as an instrument for planning and managing more efficient cities has recently posited that Smart Cities could be raising inequalities. This effect would be due to multinational corporations acting as drivers of smart urban transformations and, in a dystopian view, influencing local policymakers’ agendas.

Given these issues, and our own research on Smart Cities, we started asking ourselves whether the risks of increasing inequalities associated with the Smart City model were substantiated. To this end, we focused on empirically verifying whether cities moving forward along the smart city model were facing increases in income and digital inequalities. We answered the first question in Caragliu and Del Bo (2022), and found compelling evidence that smart city characteristics actually decrease income inequalities…(More)”.

A new way to look at data privacy


Article by Adam Zewe: “Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.

But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or generic randomness, to the model, which makes it harder for an adversary to guess the original data. However, this perturbation reduces a model’s accuracy, so the less noise one must add, the better.

MIT researchers have developed a technique that enables a user to add the smallest amount of noise possible while still ensuring that the sensitive data are protected.

The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.

In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings…

A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?

Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data was used in the training process. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.

PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem…(More)”
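The noise–accuracy tradeoff these two excerpts describe can be made concrete with a minimal sketch of the Laplace mechanism, the classic construction behind Differential Privacy. This is an illustration only, not the PAC Privacy framework; the function names, data, and clipping bounds are invented for the example.

```python
import math
import random

def laplace_noise(scale: float) -> float:
    """Sample from a Laplace(0, scale) distribution via inverse transform sampling."""
    u = random.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def private_mean(values, lo, hi, epsilon):
    """Differentially private mean of values clipped to [lo, hi].

    One person changes the clipped sum by at most (hi - lo), so the
    sensitivity of the mean over n records is (hi - lo) / n. Smaller
    epsilon means stronger privacy, larger noise, and a less accurate answer.
    """
    n = len(values)
    clipped = [min(max(v, lo), hi) for v in values]
    true_mean = sum(clipped) / n
    sensitivity = (hi - lo) / n
    return true_mean + laplace_noise(sensitivity / epsilon)

# Hypothetical glucose-style readings; compare a loose and a strict privacy budget.
data = [63, 71, 58, 80, 66, 75, 69, 62]
print(private_mean(data, lo=50, hi=90, epsilon=1.0))
print(private_mean(data, lo=50, hi=90, epsilon=0.1))
```

Halving epsilon doubles the noise scale, which is precisely the accuracy cost the article describes; the point of an approach like PAC Privacy is to determine the smallest such noise automatically rather than bounding it through a worst-case definition.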

How do we know how smart AI systems are?


Article by Melanie Mitchell: “In 1967, Marvin Minsky, a founder of the field of artificial intelligence (AI), made a bold prediction: “Within a generation…the problem of creating ‘artificial intelligence’ will be substantially solved.” Assuming that a generation is about 30 years, Minsky was clearly overoptimistic. But now, nearly two generations later, how close are we to the original goal of human-level (or greater) intelligence in machines?

Some leading AI researchers would answer that we are quite close. Earlier this year, deep-learning pioneer and Turing Award winner Geoffrey Hinton told Technology Review, “I have suddenly switched my views on whether these things are going to be more intelligent than us. I think they’re very close to it now and they will be much more intelligent than us in the future.” His fellow Turing Award winner Yoshua Bengio voiced a similar opinion in a recent blog post: “The recent advances suggest that even the future where we know how to build superintelligent AIs (smarter than humans across the board) is closer than most people expected just a year ago.”

These are extraordinary claims that, as the saying goes, require extraordinary evidence. However, it turns out that assessing the intelligence—or more concretely, the general capabilities—of AI systems is fraught with pitfalls. Anyone who has interacted with ChatGPT or other large language models knows that these systems can appear quite intelligent. They converse with us in fluent natural language, and in many cases seem to reason, to make analogies, and to grasp the motivations behind our questions. Despite their well-known unhumanlike failings, it’s hard to escape the impression that behind all that confident and articulate language there must be genuine understanding…(More)”.