Innovation in Anticipation for Migration: A Deep Dive into Methods, Tools, and Data Sources


Blog by Sara Marcucci and Stefaan Verhulst: “In the ever-evolving landscape of anticipatory methods for migration policy, innovation is a dynamic force propelling the field forward. This seems to be happening in two main ways: first, as we mentioned in our previous blog, one of the significant shifts lies in the blurring of boundaries between quantitative forecasting and qualitative foresight, as emerging mixed-method approaches challenge traditional paradigms. This transformation opens up new pathways for understanding complex phenomena, particularly in the context of human migration flows. 

Second, the innovation happening today is not necessarily rooted in the development of entirely new methodologies, but rather in how existing methods are adapted and enhanced. Indeed, innovation seems to extend to the utilization of diverse tools and data sources that bolster the effectiveness of existing methods, offering a more comprehensive and timely perspective on migration trends.

In the context of this blog series, methods refer to the various approaches and techniques used to anticipate and analyze migration trends, challenges, and opportunities. These methods are employed to make informed decisions and develop policies related to human migration. They can include a wide range of strategies to gather and interpret data and insights in the field of migration policy. 

Tools, on the other hand, refer to the specific instruments or technologies used to support and enhance the effectiveness of these methods. They encompass a diverse set of resources and technologies that facilitate data collection, analysis, and decision-making in the context of migration policy. These tools can include both quantitative and qualitative data collection and analysis tools, as well as innovative data sources, software, and techniques that help enhance anticipatory methods.

This blog takes a deep dive into the main anticipatory methods adopted in the field of migration, as well as some of the tools and data sources employed to enhance and experiment with them. First, the blog provides a list of the methods considered; second, it illustrates the main innovative tools employed; and finally, it presents a set of new, non-traditional data sources that are increasingly being used to feed anticipatory methods…(More)”.

The State of Open Data 2023


Report by Springer Nature, Digital Science and Figshare: “The 2023 survey showed that the key motivations for researchers to share their data remain very similar to previous years, with full citation of research papers or a data citation ranking highly. 89% of respondents also said they make their data available publicly; however, almost three quarters of respondents had never received support with planning, managing or sharing research data.

One size does not fit all: Variations in responses from different areas of expertise and geographies highlight a need for a more nuanced approach to research data management support globally. For example, 64% of respondents supported the idea of a national mandate for making research data openly available, with Indian and German respondents more likely to support this idea (both 71%).

Credit is an ongoing issue: For eight years running, our survey has revealed a recurring concern among researchers: the perception that they don’t receive sufficient recognition for openly sharing their data. 60% of respondents said they receive too little credit for sharing their data.

AI awareness hasn’t translated to action: For the first time, this year we asked survey respondents to indicate if they were using ChatGPT or similar AI tools for data collection, data processing and metadata collection. The most common response to all three questions was ‘I’m aware of these tools but haven’t considered it’…(More)”.

Data Governance and Privacy Challenges in the Digital Healthcare Revolution


Paper by Nargiz Kazimova: “The onset of the COVID-19 pandemic has catalyzed an imperative for digital transformation in the healthcare sector. This study investigates the accelerated shift towards a digitally-enhanced healthcare delivery system, advocating for the widespread adoption of telemedicine and the relaxation of regulatory barriers. The paper also scrutinizes the burgeoning use of electronic health records, wearable devices, artificial intelligence, and machine learning, and how these technologies offer promising avenues for improving patient care and medical outcomes. Despite the advancements, the rapid digital integration raises significant privacy and security concerns. The stigma associated with certain illnesses and the potential for discrimination present serious challenges that digital healthcare innovations can exacerbate.
This research underscores the criticality of stringent data governance to safeguard personal health information in the face of growing digitalization. The analysis begins with an exploration of the role of data governance in optimizing healthcare outcomes and preserving privacy, followed by an assessment of the breadth and depth of health data proliferation. The paper subsequently navigates the complex legal and ethical terrain, contrasting HIPAA and GDPR frameworks to underline the current regulatory challenges.
A comprehensive set of strategic recommendations is provided for reinforcing data governance and enhancing privacy protection in healthcare. The author advises on updating legal provisions to match the dynamic healthcare environment, widening the scope of privacy laws, and improving the transparency of data-sharing practices. The establishment of ethical guidelines for the collection and use of health data is also recommended, focusing on explicit consent, decision-making transparency, harm accountability, maintenance of data anonymity, and the mitigation of biases in datasets.
Moreover, the study advocates for stronger transparency in data sharing with clear communication on data use, rigorous internal and external audit mechanisms, and informed consent processes. The conclusion calls for increased collaboration between healthcare providers, patients, administrative staff, ethicists, regulators, and technology companies to create governance models that reconcile patient rights with the expansive use of health data. The paper culminates in a call to action for a balanced approach to privacy and innovation in the data-driven era of healthcare…(More)”.

The AI regulations that aren’t being talked about


Article by Deloitte: “…But our research shows that this focus may be overlooking some of the most important tools already on the books. Of the 1,600+ policies we analyzed, only 11% were focused on regulating AI-adjacent issues like data privacy, cybersecurity, intellectual property, and so on (Figure 5). Even when limiting the search to only regulations, 60% were focused directly on AI and only 40% on AI-adjacent issues (Figure 5). For example, several countries have data protection agencies with regulatory powers to help protect citizens’ data privacy. But while these agencies may not have AI or machine learning named specifically in their charters, the importance of data in training and using AI models makes them an important AI-adjacent tool.

This can be problematic because directly regulating a fast-moving technology like AI can be difficult. Take the hypothetical example of removing bias from home loan decisions. Regulators could accomplish this goal by mandating that AI should have certain types of training data to ensure that the models are representative and will not produce biased results, but such an approach can become outdated when new methods of training AI models emerge. Given the diversity of different types of AI models already in use, from recurrent neural networks to generative pretrained transformers to generative adversarial networks and more, finding a single set of rules that can deliver what the public desires both now, and in the future, may be a challenge…(More)”.

The battle over right to repair is a fight over your car’s data


Article by Ofer Tur-Sinai: “Cars are no longer just a means of transportation. They have become rolling hubs of data communication. Modern vehicles regularly transmit information wirelessly to their manufacturers.

However, as cars grow “smarter,” the right to repair them is under siege.

As legal scholars, we find that the question of whether you and your local mechanic can tap into your car’s data to diagnose and repair it spans issues of property rights, trade secrets, cybersecurity, data privacy and consumer rights. Policymakers are forced to navigate this complex legal landscape, ideally aiming for a balanced approach that upholds the right to repair while also ensuring the safety and privacy of consumers…

Until recently, repairing a car involved connecting to its standard on-board diagnostics port to retrieve diagnostic data. The ability for independent repair shops – not just those authorized by the manufacturer – to access this information was protected by a state law in Massachusetts, approved by voters on Nov. 6, 2012, and by a nationwide memorandum of understanding between major car manufacturers and the repair industry signed on Jan. 15, 2014.

However, with the rise of telematics systems, which combine computing with telecommunications, these dynamics are shifting. Unlike the standardized onboard diagnostics ports, telematics systems vary across car manufacturers. These systems are often protected by digital locks, and circumventing these locks could be considered a violation of copyright law. The telematics systems also encrypt the diagnostic data before transmitting it to the manufacturer.

This reduces the accessibility of telematics information, potentially locking out independent repair shops and jeopardizing consumer choice – a lack of choice that can lead to increased costs for consumers….

One issue left unresolved by the legislation is the ownership of vehicle data. A vehicle generates all sorts of data as it operates, including location, diagnostic, driving behavior, and even usage patterns of in-car systems – for example, which apps you use and for how long.

In recent years, the question of data ownership has gained prominence. In 2015, Congress legislated that the data stored in event data recorders belongs to the vehicle owner. This was a significant step in acknowledging the vehicle owner’s right over specific datasets. However, the broader issue of data ownership in today’s connected cars remains unresolved…(More)”.

Private UK health data donated for medical research shared with insurance companies


Article by Shanti Das: “Sensitive health information donated for medical research by half a million UK citizens has been shared with insurance companies despite a pledge that it would not be.

An Observer investigation has found that UK Biobank opened up its vast biomedical database to insurance sector firms several times between 2020 and 2023. The data was provided to insurance consultancy and tech firms for projects to create digital tools that help insurers predict a person’s risk of getting a chronic disease. The findings have raised concerns among geneticists, data privacy experts and campaigners over vetting and ethical checks at Biobank.

Set up in 2006 to help researchers investigating diseases, the database contains millions of blood, saliva and urine samples, collected regularly from about 500,000 adult volunteers – along with medical records, scans, wearable device data and lifestyle information.

Approved researchers around the world can pay £3,000 to £9,000 to access records ranging from medical history and lifestyle information to whole genome sequencing data. The resulting research has yielded major medical discoveries and led to Biobank being considered a “jewel in the crown” of British science.

Biobank said it strictly guarded access to its data, only allowing access by bona fide researchers for health-related projects in the public interest. It said this included researchers of all stripes, whether employed by academic, charitable or commercial organisations – including insurance companies – and that “information about data sharing was clearly set out to participants at the point of recruitment and the initial assessment”.

But evidence gathered by the Observer suggests Biobank did not explicitly tell participants it would share data with insurance companies – and made several public commitments not to do so.

When the project was announced, in 2002, Biobank promised that data would not be given to insurance companies after concerns were raised that it could be used in a discriminatory way, such as by the exclusion of people with a particular genetic makeup from insurance.

In an FAQ section on the Biobank website, participants were told: “Insurance companies will not be allowed access to any individual results nor will they be allowed access to anonymised data.” The statement remained online until February 2006, during which time the Biobank project was subject to public scrutiny and discussed in parliament.

The promise was also reiterated in several public statements by backers of Biobank, who said safeguards would be built in to ensure that “no insurance company or police force or employer will have access”.

This weekend, Biobank said the pledge – made repeatedly over four years – no longer applied. It said the commitment had been made before recruitment formally began in 2007 and that when Biobank volunteers enrolled they were given revised information.

This included leaflets and consent forms that contained a provision that anonymised Biobank data could be shared with private firms for “health-related” research, but did not explicitly mention insurance firms or correct the previous assurances…(More)”

Managing smart city governance – A playbook for local and regional governments


Report by UN Habitat: “This playbook and its recommendations are primarily aimed at municipal governments and their political leaders, local administrators, and public officials who are involved in smart city initiatives. The recommendations, which are delineated in the subsequent sections of this playbook, are intended to help develop more effective, inclusive, and sustainable governance practices for urban digital transformations. The guidance offered on these pages could also be useful for national agencies, private companies, non-governmental organizations, and all stakeholders committed to promoting the sustainable development of urban communities through the implementation of smart city initiatives…(More)”.

Unlocking the Potential: The Call for an International Decade of Data


Working Paper by Stefaan Verhulst: “The goal of this working paper is to reiterate the central importance of data – to Artificial Intelligence (AI) in particular, but more generally to the landscape of digital technology.

What follows serves as a clarion call to the global community to prioritize and advance data as the bedrock for social and economic development, especially for the UN’s Sustainable Development Goals. It begins by recognizing the existence of significant remaining challenges related to data, encompassing issues of accessibility, distribution, divides, and asymmetries. In light of these challenges, and as we propel ourselves into an era increasingly dominated by AI and AI-related innovation, the paper argues that establishing a more robust foundation for the stewardship of data is critical: a foundation that, for instance, embodies inclusivity, self-determination, and responsibility.

Finally, the paper advocates for the creation of an International Decade of Data (IDD), an initiative aimed at solidifying this foundation globally and advancing our collective efforts towards data-driven progress.

Download ‘Unlocking the Potential: The Call for an International Decade of Data’ here

New Tools to Guide Data Sharing Agreements


Article by Andrew J. Zahuranec, Stefaan Verhulst, and Hannah Chafetz: “The process of forming a data-sharing agreement is not easy. It involves figuring out incentives, evaluating the degree to which others are willing and able to collaborate, and defining the specific conduct that is and is not allowed. Even under the best of circumstances, these steps can be costly and time-consuming.

Today, the Open Data Policy Lab took a step to help data practitioners control these costs. “Moving from Idea to Practice: Three Resources to Streamline the Creation of Data Sharing Agreements” provides data practitioners with three resources meant to support them throughout the process of developing an agreement. These include:

  • A Guide to Principled Data Sharing Agreement Negotiation by Design: A document outlining the different principles that a data practitioner might seek to uphold while negotiating an agreement;
  • The Contractual Wheel of Data Collaboration 2.0: A listing of the different kinds of data sharing agreement provisions that a data practitioner might include in an agreement;
  • A Readiness Matrix for Data Sharing Agreements: A form to evaluate the degree to which a partner can participate in a data-sharing agreement.

The resources are a result of a series of Open Data Action Labs, an initiative from the Open Data Policy Lab to define new strategies and tools that can help organizations resolve policy challenges they face. The Action Labs are built around a series of workshops (called “studios”) which give experts and stakeholders an opportunity to define the problems facing them and then ideate possible solutions in a collaborative setting. In February and March 2023, the Open Data Policy Lab and Trust Relay co-hosted conversations with experts in law, data, and smart cities on the challenge of forming a data sharing agreement. Find all the resources here.”

Researchers warn we could run out of data to train AI by 2026. What then?


Article by Rita Matulionyte: “As artificial intelligence (AI) reaches the peak of its popularity, researchers have warned the industry might be running out of training data – the fuel that runs powerful AI systems. This could slow down the growth of AI models, especially large language models, and may even alter the trajectory of the AI revolution.

But why is a potential lack of data an issue, considering how much of it there is on the web? And is there a way to address the risk?…

We need a lot of data to train powerful, accurate and high-quality AI algorithms. For instance, ChatGPT was trained on 570 gigabytes of text data, or about 300 billion words.

Similarly, the stable diffusion algorithm (which is behind many AI image-generating apps such as DALL-E, Lensa and Midjourney) was trained on the LAION-5B dataset, comprising 5.8 billion image-text pairs. If an algorithm is trained on an insufficient amount of data, it will produce inaccurate or low-quality outputs.

The quality of the training data is also important…This is why AI developers seek out high-quality content such as text from books, online articles, scientific papers, Wikipedia, and certain filtered web content. The Google Assistant was trained on 11,000 romance novels taken from self-publishing site Smashwords to make it more conversational.

The AI industry has been training AI systems on ever-larger datasets, which is why we now have high-performing models such as ChatGPT or DALL-E 3. At the same time, research shows online data stocks are growing much slower than datasets used to train AI.

In a paper published last year, a group of researchers predicted we will run out of high-quality text data before 2026 if the current AI training trends continue. They also estimated low-quality language data will be exhausted sometime between 2030 and 2050, and low-quality image data between 2030 and 2060.
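The logic behind such predictions is essentially one of compound growth: training-set sizes grow much faster than the stock of human-produced text, so cumulative demand eventually overtakes supply. The sketch below illustrates that mechanic only; the token stocks and growth rates are hypothetical placeholders, not the researchers’ actual estimates.

```python
# Toy model of the "data exhaustion" argument: find the first year in which
# demand for training tokens overtakes the available stock of text, given
# assumed (hypothetical) starting values and annual growth rates.

def exhaustion_year(stock_now, stock_growth, demand_now, demand_growth, start_year=2023):
    """Return the first year in which demand for tokens meets or exceeds the stock."""
    stock, demand = stock_now, demand_now
    for year in range(start_year, start_year + 50):
        if demand >= stock:
            return year
        stock *= 1 + stock_growth    # slow growth of human-generated text
        demand *= 1 + demand_growth  # rapid growth of training-set sizes
    return None  # no crossover within the horizon

# Hypothetical inputs: 1e14 tokens of high-quality text growing 7% per year,
# versus training runs consuming 1e13 tokens and growing 50% per year.
print(exhaustion_year(1e14, 0.07, 1e13, 0.50))  # prints 2030 with these made-up inputs
```

With these placeholder figures the crossover lands in 2030; the paper’s actual 2026–2060 ranges come from empirical measurements of data stocks and dataset scaling trends, not from numbers like these.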

AI could contribute up to US$15.7 trillion (A$24.1 trillion) to the world economy by 2030, according to accounting and consulting group PwC. But running out of usable data could slow down its development…(More)”.