Global Data Stewardship


On-line Course by Stefaan G. Verhulst: “Creating a systematic and sustainable data access program is critical for data stewardship. What you do with your data, how you reuse it, and how you make it available to the general public can help others reimagine what’s possible for data sharing and cross-sector data collaboration. In this course, instructor Stefaan Verhulst shows you how to develop and manage data reuse initiatives as a competent and responsible global data steward.

Following the insights of current research and practical, real-world examples, learn about the growing importance of data stewardship, data supply, and data demand to understand the value proposition and societal case for data reuse. Get tips on designing and implementing data collaboration models, governance frameworks, and infrastructure, as well as best practices for measuring, sunsetting, and supporting data reuse initiatives. Upon completing this course, you’ll be ready to start pushing your new skill set and continue your data stewardship learning journey….(More)”

For chemists, the AI revolution has yet to happen


Editorial Team at Nature: “Many people are expressing fears that artificial intelligence (AI) has gone too far — or risks doing so. Take Geoffrey Hinton, a prominent figure in AI, who recently resigned from his position at Google, citing the desire to speak out about the technology’s potential risks to society and human well-being.

But against those big-picture concerns, in many areas of science you will hear a different frustration being expressed more quietly: that AI has not yet gone far enough. One of those areas is chemistry, for which machine-learning tools promise a revolution in the way researchers seek and synthesize useful new substances. But a wholesale revolution has yet to happen — because of the lack of data available to feed hungry AI systems.

Any AI system is only as good as the data it is trained on. These systems rely on what are called neural networks, which their developers teach using training data sets that must be large, reliable and free of bias. If chemists want to harness the full potential of generative-AI tools, they need to help to establish such training data sets. More data are needed — both experimental and simulated — including historical data and otherwise obscure knowledge, such as that from unsuccessful experiments. And researchers must ensure that the resulting information is accessible. This task is still very much a work in progress…(More)”.

The latest in homomorphic encryption: A game-changer shaping up


Article by Katharina Koerner: “Privacy professionals are witnessing a revolution in privacy technology. The emergence and maturing of new privacy-enhancing technologies that allow for data use and collaboration without sharing plain text data or sending data to a central location are part of this revolution.

The United Nations, the Organisation for Economic Co-operation and Development, the U.S. White House, the European Union Agency for Cybersecurity, the UK Royal Society, and Singapore’s media and privacy authorities all released reports, guidelines and regulatory sandboxes around the use of PETs in quick succession. We are in an era where there are high hopes for data insights to be leveraged for the public good while maintaining privacy principles and enhanced security.

A prominent example of a PET is fully homomorphic encryption, often mentioned in the same breath as differential privacy, federated learning, secure multiparty computation, private set intersection, synthetic data, zero knowledge proofs or trusted execution environments.

As FHE advances and becomes standardized, it has the potential to revolutionize the way we handle, protect and utilize personal data. Staying informed about the latest advancements in this field can help privacy pros prepare for the changes ahead in this rapidly evolving digital landscape.

Homomorphic encryption: A game changer?

FHE is a groundbreaking cryptographic technique that enables third parties to process information without revealing the data itself by running computations on encrypted data.

This technology can have far-reaching implications for secure data analytics. Requests to a databank can be answered without accessing its plain text data, as the analysis is conducted on data that remains encrypted. This adds a third layer of security for data when in use, along with protecting data at rest and in transit…(More)”.
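As a minimal illustration of what "running computations on encrypted data" means, the sketch below implements a toy version of the Paillier cryptosystem in Python (3.8 or later). This is not taken from the article, and Paillier is only additively homomorphic rather than fully homomorphic as FHE is. The tiny primes and the function names (keygen, encrypt, decrypt, add_encrypted) are assumptions chosen for readability, not for security.

```python
import math
import secrets


def keygen(p, q):
    """Build a toy Paillier key pair from two distinct primes (assumed inputs)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1                 # common simplifying choice of generator
    mu = pow(lam, -1, n)      # modular inverse of lam mod n (valid because g = n + 1)
    return (n, g), (lam, mu)  # (public key, private key)


def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n with fresh randomness r."""
    n, g = pub
    n_sq = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # r in 1..n-1
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n_sq) * pow(r, n, n_sq)) % n_sq


def decrypt(pub, priv, c):
    """Recover the plaintext from ciphertext c."""
    n, _ = pub
    lam, mu = priv
    x = pow(c, lam, n * n)
    return ((x - 1) // n * mu) % n


def add_encrypted(pub, c1, c2):
    """Multiply ciphertexts; the result decrypts to the SUM of the plaintexts."""
    n, _ = pub
    return (c1 * c2) % (n * n)


if __name__ == "__main__":
    # Toy primes only: real deployments use primes of 1024 bits or more.
    pub, priv = keygen(1009, 1013)
    c_total = add_encrypted(pub, encrypt(pub, 120), encrypt(pub, 45))
    # The party doing the addition never sees 120 or 45 in plain text.
    print(decrypt(pub, priv, c_total))  # prints 165
```

The party that calls add_encrypted needs only the public key and the ciphertexts, which mirrors the article's point that a request can be answered while the underlying data stays encrypted. Fully homomorphic schemes extend this idea to support both addition and multiplication on ciphertexts, at considerably higher computational cost.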

Data Privacy and Algorithmic Inequality


Paper by Zhuang Liu, Michael Sockin & Wei Xiong: “This paper develops a foundation for a consumer’s preference for data privacy by linking it to the desire to hide behavioral vulnerabilities. Data sharing with digital platforms enhances the matching efficiency for standard consumption goods, but also exposes individuals with self-control issues to temptation goods. This creates a new form of inequality in the digital era—algorithmic inequality. Although data privacy regulations provide consumers with the option to opt out of data sharing, these regulations cannot fully protect vulnerable consumers because of data-sharing externalities. The coordination problem among consumers may also lead to multiple equilibria with drastically different levels of data sharing by consumers. Our quantitative analysis further illustrates that although data is non-rival and beneficial to social welfare, it can also exacerbate algorithmic inequality…(More)”.

Big data proves mobility is not gender-neutral


Blog by Ellin Ivarsson, Aiga Stokenberg and Juan Ignacio Fulponi: “All over the world, there is growing evidence showing that women and men travel differently. While there are many reasons behind this, one key factor is the persistence of traditional gender norms and roles that translate into different household responsibilities, different work schedules, and, ultimately, different mobility needs. Greater overall risk aversion and sensitivity to safety issues also play an important role in how women get around. Yet gender often remains an afterthought in the transport sector, meaning most policies or infrastructure investment plans are not designed to take into account the specific mobility needs of women.

The good news is that big data can help change that. In a recent study, the World Bank Transport team combined several data sources to analyze how women travel around the Buenos Aires Metropolitan Area (AMBA), including mobile phone signal data, congestion data from Waze, public transport smart card data, and data from a survey implemented by the team in early 2022 with over 20,300 car and motorcycle users.

Our research revealed that, on average, women in AMBA travel less often than men, travel shorter distances, and tend to engage in more complex trips with multiple stops and purposes. On average, 65 percent of the trips made by women are shorter than 5 kilometers, compared to 60 percent among men. Also, women’s hourly travel patterns are different, with 10 percent more trips than men during the mid-day off-peak hour, mostly originating in central AMBA. This reflects the larger burden of household responsibilities faced by women – such as picking children up from school – and the fact that women tend to work more irregular hours…(More)” See also Gender gaps in urban mobility.

China’s new AI rules protect people — and the Communist Party’s power


Article by Johanna M. Costigan: “In April, in an effort to regulate rapidly advancing artificial intelligence technologies, China’s internet watchdog introduced draft rules on generative AI. They cover a wide range of issues — from how data is trained to how users interact with generative AI such as chatbots. 

Under the new regulations, companies are ultimately responsible for the “legality” of the data they use to train AI models. Additionally, generative AI providers must not share personal data without permission, and must guarantee the “veracity, accuracy, objectivity, and diversity” of their pre-training data. 

These strict requirements by the Cyberspace Administration of China (CAC) for AI service providers could benefit Chinese users, granting them greater protections from private companies than many of their global peers. Article 11 of the regulations, for instance, prohibits providers from “conducting profiling” on the basis of information gained from users. Any Instagram user who has received targeted ads after their smartphone tracked their activity would stand to benefit from this additional level of privacy.  

Another example is Article 10 — it requires providers to employ “appropriate measures to prevent users from excessive reliance on generated content,” which could help prevent addiction to new technologies and increase user safety in the long run. As companion chatbots such as Replika become more popular, companies should be responsible for managing software to ensure safe use. While some view social chatbots as a cure for loneliness, depression, and social anxiety, they also present real risks to users who become reliant on them…(More)”.

AI-assisted diplomatic decision-making during crises—Challenges and opportunities


Article by Neeti Pokhriyal and Till Koebe: “Recent academic works have demonstrated the efficacy of employing or integrating “non-traditional” data (e.g., social media, satellite imagery, etc) for situational awareness tasks…

Despite these successes, we identify four critical challenges unique to the area of diplomacy that need to be considered within the growing AI and diplomacy community going ahead:

1. First, decisions during crises are almost always taken using limited or incomplete information. There may be deliberate misuse and obfuscation of data/signals between different parties involved. At the start of a crisis, information is usually limited and potentially biased, especially along socioeconomic and rural-urban lines as crises are known to exacerbate the vulnerabilities already existing in the populations. This requires AI tools to quantify and visualize calibrated uncertainty in their outputs in an appropriate manner.

2. Second, in many cases, human lives and livelihoods are at stake. Therefore, any forecast, reasoning, or recommendation provided by AI assistance needs to be explainable and transparent for authorized users, but also secure against unauthorized access as diplomatic information is often highly sensitive. The question of accountability in case of misleading AI assistance needs to be addressed beforehand.

3. Third, in complex situations with high stakes but limited information, cultural differences and value-laden judgment driven by personal experiences play a central role in diplomatic decision-making. This calls for the use of learning techniques that can incorporate domain knowledge and experience.

4. Fourth, diplomatic interests during crises are often multifaceted, resulting in deep mistrust in and strategic misuse of information. Social media data, when used for consular tasks, has been shown to be susceptible to various dis-/misinformation campaigns, some by the public, others by state actors for strategic manipulation…(More)”

What do data portals do? Tracing the politics of online devices for making data public


Paper by Jonathan Gray: “The past decade has seen the rise of “data portals” as online devices for making data public. They have been accorded a prominent status in political speeches, policy documents, and official communications as sites of innovation, transparency, accountability, and participation. Drawing on research on data portals around the world, data portal software, and associated infrastructures, this paper explores three approaches for studying the social life of data portals as technopolitical devices: (a) interface analysis, (b) software analysis, and (c) metadata analysis. These three approaches contribute to the study of the social lives of data portals as dynamic, heterogeneous, and contested sites of public sector datafication. They are intended to contribute to critically assessing how participation around public sector datafication is invited and organized with portals, as well as to rethinking and recomposing them…(More)”.

As the Quantity of Data Explodes, Quality Matters


Article by Katherine Barrett and Richard Greene: “With advances in technology, governments across the world are increasingly using data to help inform their decision making. This has been one of the most important byproducts of the use of open data, which is “a philosophy – and increasingly a set of policies – that promotes transparency, accountability and value creation by making government data available to all,” according to the Organisation for Economic Co-operation and Development (OECD).

But as data has become ever more important to governments, the quality of that data has become an increasingly serious issue. A number of nations, including the United States, are taking steps to deal with it. For example, according to a study from Deloitte, “The Dutch government is raising the bar to enable better data quality and governance across the public sector.” In the same report, a case study about Finland states that “data needs to be shared at the right time and in the right way. It is also important to improve the quality and usability of government data to achieve the right goals.” And the United Kingdom has developed its Government Data Quality Hub to help public sector organizations “better identify their data challenges and opportunities and effectively plan targeted improvements.”

Our personal experience is with U.S. states and local governments, and in that arena the road toward higher quality data is a long and difficult one, particularly as the sheer quantity of data has grown exponentially. As things stand, based on our ongoing research into performance audits, it is clear that issues with data are impediments to the smooth process of state and local governments…(More)”.

Digital Equity 2.0: How to Close the Data Divide


Report by Gillian Diebold: “For the last decade, closing the digital divide, or the gap between those subscribing to broadband and those not subscribing, has been a top priority for policymakers. But high-speed Internet and computing device access are no longer the only barriers to fully participating and benefiting from the digital economy. Data is also increasingly essential, including in health care, financial services, and education. Like the digital divide, a gap has emerged between the data haves and the data have-nots, and this gap has introduced a new set of inequities: the data divide.

Policymakers have put a great deal of effort into closing the digital divide, and there is now near-universal acceptance of the notion that obtaining widespread Internet access generates social and economic benefits. But closing the data divide has received little attention. Moreover, efforts to improve data collection are typically overshadowed by privacy advocates’ warnings against collecting any data. In fact, unlike the digital divide, many ignore the data divide or argue that the way to close it is to collect vastly less data. But without substantial efforts to increase data representation and access, certain individuals and communities will be left behind in an increasingly data-driven world.

This report describes the multipronged efforts needed to address digital inequity. For the digital divide, policymakers have expanded digital connectivity, increased digital literacy, and improved access to digital devices. For the data divide, policymakers should similarly take a holistic approach, including by balancing privacy and data innovation, increasing data collection efforts across a wide array of fronts, enhancing access to data, improving data quality, and improving data analytics efforts. Applying lessons from the digital divide to this new challenge will help policymakers design effective and efficient policy and create a more equitable and effective data economy for all Americans…(More)”.