Private UK health data donated for medical research shared with insurance companies


Article by Shanti Das: “Sensitive health information donated for medical research by half a million UK citizens has been shared with insurance companies despite a pledge that it would not be.

An Observer investigation has found that UK Biobank opened up its vast biomedical database to insurance sector firms several times between 2020 and 2023. The data was provided to insurance consultancy and tech firms for projects to create digital tools that help insurers predict a person’s risk of getting a chronic disease. The findings have raised concerns among geneticists, data privacy experts and campaigners over vetting and ethical checks at Biobank.

Set up in 2006 to help researchers investigating diseases, the database contains millions of blood, saliva and urine samples, collected regularly from about 500,000 adult volunteers – along with medical records, scans, wearable device data and lifestyle information.

Approved researchers around the world can pay £3,000 to £9,000 to access records ranging from medical history and lifestyle information to whole genome sequencing data. The resulting research has yielded major medical discoveries and led to Biobank being considered a “jewel in the crown” of British science.

Biobank said it strictly guarded access to its data, only allowing access by bona fide researchers for health-related projects in the public interest. It said this included researchers of all stripes, whether employed by academic, charitable or commercial organisations – including insurance companies – and that “information about data sharing was clearly set out to participants at the point of recruitment and the initial assessment”.

But evidence gathered by the Observer suggests Biobank did not explicitly tell participants it would share data with insurance companies – and made several public commitments not to do so.

When the project was announced, in 2002, Biobank promised that data would not be given to insurance companies after concerns were raised that it could be used in a discriminatory way, such as by the exclusion of people with a particular genetic makeup from insurance.

In an FAQ section on the Biobank website, participants were told: “Insurance companies will not be allowed access to any individual results nor will they be allowed access to anonymised data.” The statement remained online until February 2006, during which time the Biobank project was subject to public scrutiny and discussed in parliament.

The promise was also reiterated in several public statements by backers of Biobank, who said safeguards would be built in to ensure that “no insurance company or police force or employer will have access”.

This weekend, Biobank said the pledge – made repeatedly over four years – no longer applied. It said the commitment had been made before recruitment formally began in 2007 and that when Biobank volunteers enrolled they were given revised information.

This included leaflets and consent forms that contained a provision that anonymised Biobank data could be shared with private firms for “health-related” research, but did not explicitly mention insurance firms or correct the previous assurances…(More)”

Managing smart city governance – A playbook for local and regional governments


Report by UN Habitat: “This playbook and its recommendations are primarily aimed at municipal governments and their political leaders, local administrators, and public officials who are involved in smart city initiatives. The recommendations, which are delineated in the subsequent sections of this playbook, are intended to help develop more effective, inclusive, and sustainable governance practices for urban digital transformations. The guidance offered on these pages could also be useful for national agencies, private companies, non-governmental organizations, and all stakeholders committed to promoting the sustainable development of urban communities through the implementation of smart city initiatives…(More)”.

Unlocking the Potential: The Call for an International Decade of Data


Working Paper by Stefaan Verhulst: “The goal of this working paper is to reiterate the central importance of data – to Artificial Intelligence (AI) in particular, but more generally to the landscape of digital technology.

What follows serves as a clarion call to the global community to prioritize and advance data as the bedrock for social and economic development, especially for the UN’s Sustainable Development Goals. It begins by recognizing the significant challenges that remain around data, encompassing issues of accessibility, distribution, divides, and asymmetries. In light of these challenges, and as we propel ourselves into an era increasingly dominated by AI and AI-related innovation, the paper argues that establishing a more robust foundation for the stewardship of data is critical: a foundation that, for instance, embodies inclusivity, self-determination, and responsibility.

Finally, the paper advocates for the creation of an International Decade of Data (IDD), an initiative aimed at solidifying this foundation globally and advancing our collective efforts towards data-driven progress.

Download ‘Unlocking the Potential: The Call for an International Decade of Data’ here

New Tools to Guide Data Sharing Agreements


Article by Andrew J. Zahuranec, Stefaan Verhulst, and Hannah Chafetz: “The process of forming a data-sharing agreement is not easy. The process involves figuring out incentives, evaluating the degree to which others are willing and able to collaborate, and defining the specific conduct that is and is not allowed. Even under the best of circumstances, these steps can be costly and time-consuming.

Today, the Open Data Policy Lab took a step to help data practitioners control these costs. “Moving from Idea to Practice: Three Resources to Streamline the Creation of Data Sharing Agreements” provides data practitioners with three resources meant to support them throughout the process of developing an agreement. These include:

  • A Guide to Principled Data Sharing Agreement Negotiation by Design: A document outlining the different principles that a data practitioner might seek to uphold while negotiating an agreement;
  • The Contractual Wheel of Data Collaboration 2.0: A listing of the different kinds of data sharing agreement provisions that a data practitioner might include in an agreement;
  • A Readiness Matrix for Data Sharing Agreements: A form to evaluate the degree to which a partner can participate in a data-sharing agreement.

The resources are a result of a series of Open Data Action Labs, an initiative from the Open Data Policy Lab to define new strategies and tools that can help organizations resolve policy challenges they face. The Action Labs are built around a series of workshops (called “studios”) which give experts and stakeholders an opportunity to define the problems facing them and then ideate possible solutions in a collaborative setting. In February and March 2023, the Open Data Policy Lab and Trust Relay co-hosted conversations with experts in law, data, and smart cities on the challenge of forming a data sharing agreement. Find all the resources here.”

Researchers warn we could run out of data to train AI by 2026. What then?


Article by Rita Matulionyte: “As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much data there is on the web? And is there a way to address the risk?…

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, the stable diffusion algorithm (which is behind many AI image-generating apps such as DALL-E, Lensa and Midjourney) was trained on the LAION-5B dataset, comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.

The quality of the training data is also important…This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. The Google Assistant was trained on 11,000 romance novels taken from self-publishing site Smashwords to make it more conversational.

The AI industry has been training AI systems on ever-larger datasets, which is why we now have high-performing models such as ChatGPT or DALL-E 3. At the same time, research shows online data stocks are growing much slower than datasets used to train AI.

In a paper published last year, a group of researchers predicted we will run out of high-quality text data before 2026 if the current AI training trends continue. They also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.

AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC. But running out of usable data could slow down its development…(More)”.

What Is Public Trust in the Health System? Insights into Health Data Use


Open Access Book by Felix Gille: “This book explores the concept of public trust in health systems.

In the context of recent events, including public response to interventions to tackle the COVID-19 pandemic, vaccination uptake and the use of health data and digital health, this important book uses empirical evidence to address why public trust is vital to a well-functioning health system.

In doing so, it provides a comprehensive contemporary explanation of public trust, how it affects health systems and how it can be nurtured and maintained as an integral component of health system governance…(More)”.

Chatbots May ‘Hallucinate’ More Often Than Many Realize


Cade Metz at The New York Times: “When the San Francisco start-up OpenAI unveiled its ChatGPT online chatbot late last year, millions were wowed by the humanlike way it answered questions, wrote poetry and discussed almost any topic. But most people were slow to realize that this new kind of chatbot often makes things up.

When Google introduced a similar chatbot several weeks later, it spewed nonsense about the James Webb telescope. The next day, Microsoft’s new Bing chatbot offered up all sorts of bogus information about the Gap, Mexican nightlife and the singer Billie Eilish. Then, in March, ChatGPT cited a half dozen fake court cases while writing a 10-page legal brief that a lawyer submitted to a federal judge in Manhattan.

Now a new start-up called Vectara, founded by former Google employees, is trying to figure out how often chatbots veer from the truth. The company’s research estimates that even in situations designed to prevent it from happening, chatbots invent information at least 3 percent of the time — and as high as 27 percent.

Experts call this chatbot behavior “hallucination.” It may not be a problem for people tinkering with chatbots on their personal computers, but it is a serious issue for anyone using this technology with court documents, medical information or sensitive business data.

Because these chatbots can respond to almost any request in an unlimited number of ways, there is no way of definitively determining how often they hallucinate. “You would have to look at all of the world’s information,” said Simon Hughes, the Vectara researcher who led the project…(More)”.

Climate data can save lives. Most countries can’t access it.


Article by Zoya Teirstein: “Earth just experienced one of its hottest, and most damaging, periods on record. Heat waves in the United States, Europe, and China; catastrophic flooding in India, Brazil, Hong Kong, and Libya; and outbreaks of malaria, dengue, and other mosquito-borne illnesses across southern Asia claimed tens of thousands of lives. The vast majority of these deaths could have been averted with the right safeguards in place.

The World Meteorological Organization, or WMO, published a report last week that shows just 11 percent of countries have the full arsenal of tools required to save lives as the impacts of climate change — including deadly weather events, infectious diseases, and respiratory illnesses like asthma — become more extreme. The United Nations climate agency predicts that significant natural disasters will hit the planet 560 times per year by the end of this decade. What’s more, countries that lack early warning systems, such as extreme heat alerts, will see eight times more climate-related deaths than countries that are better prepared. By midcentury, some 50 percent of these deaths will take place in Africa, a continent that is responsible for around 4 percent of the world’s greenhouse gas emissions each year…(More)”.

Smart City Data Governance


OECD Report: “Smart cities leverage technologies, in particular digital, to generate a vast amount of real-time data to inform policy- and decision-making for an efficient and effective public service delivery. Their success largely depends on the availability and effective use of data. However, the amount of data generated is growing more rapidly than governments’ capacity to store and process it, and the growing number of stakeholders involved in data production, analysis and storage pushes cities’ data management capacity to the limit. Despite the wide range of local and national initiatives to enhance smart city data governance, urban data is still a challenge for national and city governments due to: insufficient financial resources; lack of business models for financing and refinancing of data collection; limited access to skilled experts; the lack of full compliance with the national legislation on data sharing and protection; and data and security risks. Facing these challenges is essential to managing and sharing data sensibly if cities are to boost citizens’ well-being and promote sustainable environments…(More)”

Assessing and Suing an Algorithm


Report by Elina Treyger, Jirka Taylor, Daniel Kim, and Maynard A. Holliday: “Artificial intelligence algorithms are permeating nearly every domain of human activity, including processes that make decisions about interests central to individual welfare and well-being. How do public perceptions of algorithmic decisionmaking in these domains compare with perceptions of traditional human decisionmaking? What kinds of judgments about the shortcomings of algorithmic decisionmaking processes underlie these perceptions? Will individuals be willing to hold algorithms accountable through legal channels for unfair, incorrect, or otherwise problematic decisions?

Answers to these questions matter at several levels. In a democratic society, a degree of public acceptance is needed for algorithms to become successfully integrated into decisionmaking processes. And public perceptions will shape how the harms and wrongs caused by algorithmic decisionmaking are handled. This report shares the results of a survey experiment designed to contribute to researchers’ understanding of how U.S. public perceptions are evolving in these respects in one high-stakes setting: decisions related to employment and unemployment…(More)”.