Paper by M. Fairbairn, and Z. Kish: “Open data is increasingly being promoted as a route to achieve food security and agricultural development. This article critically examines the promotion of open agri-food data for development through a document-based case study of the Global Open Data for Agriculture and Nutrition (GODAN) initiative as well as through interviews with open data practitioners and participant observation at open data events. While the concept of openness is striking for its ideological flexibility, we argue that GODAN propagates an anti-political, neoliberal vision for how open data can enhance agricultural development. This approach centers values such as private innovation, increased production, efficiency, and individual empowerment, in contrast to more political and collectivist approaches to openness practiced by some agri-food social movements. We further argue that open agri-food data projects, in general, have a tendency to reproduce elements of “data colonialism,” extracting data with minimal consideration for the collective harms that may result, and embedding their own values within universalizing information infrastructures…(More)”.
Unleashing the power of data for electric vehicles and charging infrastructure
Report by Thomas Deloison: “As the world moves toward widespread electric vehicle (EV) adoption, a key challenge lies ahead: deploying charging infrastructure rapidly and effectively. Solving this challenge will be essential to decarbonize transport, which has a higher reliance on fossil fuels than any other sector and accounts for a fifth of global carbon emissions. However, the companies and governments investing in charging infrastructure face significant hurdles, including high initial capital costs and difficulties related to infrastructure planning, permitting, grid connections and grid capacity development.
Data has the power to facilitate these processes: increased predictability and optimized planning and infrastructure management go a long way in easing investments and accelerating deployment. Last year, members of the World Business Council for Sustainable Development (WBCSD) demonstrated that digital solutions based on data sharing could reduce carbon emissions from charging by 15% and unlock crucial grid capacity and capital efficiency gains.
Exceptional advances in data, analytics and connectivity are making digital solutions a potent tool to plan and manage transport, energy and infrastructure. Thanks to the deployment of sensors and the rise of connectivity, businesses are collecting information faster than ever before, allowing for data flows between physical assets. Charging infrastructure operators, automotive companies, fleet operators, energy providers, building managers and governments collect insights on all aspects of electric vehicle charging infrastructure (EVCI), from planning and design to charging experiences at the station.
The real value of data lies in its aggregation. This will require breaking down siloes across industries and enabling digital collaboration. A digital action framework released by WBCSD, in collaboration with Arcadis, Fujitsu and other member companies and partners, introduces a set of recommendations for companies and governments to realize the full potential of digital solutions and accelerate EVCI deployments:
- Map proprietary data, knowledge gaps and digital capacity across the value chain to identify possible synergies (see the illustrative sketch after this list). The highest value potential from digital solutions will lie at the nexus of infrastructure, consumer behavior insights, grid capacity and transport policy. For example, to ensure that charging stations are deployed where they will be most needed and at the right capacity level, it is crucial to plan investments within energy grid capacity, spatial constraints and local projected demand for EVs.
- Develop internal data collection and storage capacity with due consideration for existing data-sharing structures. A variety of schemes allow actors to engage in data sharing or monetization, yet their use is limited by mismatched data standards and specifications and by process uncertainty. Companies must build a strong internal understanding of these structures through training and guidance, and invest in sound data collection, storage and analysis capacity.
- Foster a policy environment that supports digital collaboration across sectors and industries. Digital policies must provide incentives and due diligence frameworks to guide data exchanges across industries and support the adoption of common standards and protocols. For instance, it will be crucial to integrate linkages with energy systems and infrastructure beyond roads in the rollout of the European mobility data space…(More)”.
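As a purely hypothetical illustration of the data-mapping recommendation above, the sketch below combines the kinds of fields different actors might contribute for a candidate charging site. The class, field names and thresholds are invented for this example and are not drawn from the WBCSD framework or any industry schema.

```python
from dataclasses import dataclass

# Hypothetical cross-sector record for a candidate charging site; every field
# name and value here is invented for illustration only.
@dataclass
class CandidateChargingSite:
    site_id: str
    latitude: float
    longitude: float
    available_grid_capacity_kw: float   # contributed by the energy provider
    projected_daily_ev_sessions: int    # from fleet / consumer behaviour insights
    zoning_approved: bool               # from local transport and spatial policy

def is_viable(site: CandidateChargingSite, required_kw: float = 150.0) -> bool:
    """Toy screening rule: enough grid headroom, enough demand, and permission."""
    return (
        site.available_grid_capacity_kw >= required_kw
        and site.projected_daily_ev_sessions >= 20
        and site.zoning_approved
    )

example = CandidateChargingSite("NL-ZE-001", 51.5, 3.6, 400.0, 55, True)
print(is_viable(example))  # True
```

Even a toy record like this shows why the value lies at the nexus the report identifies: no single actor holds all of these fields, which is precisely why aggregation and digital collaboration matter.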
‘Not for Machines to Harvest’: Data Revolts Break Out Against A.I.
Article by Sheera Frenkel, and Stuart A. Thompson: “Fan fiction writers are just one group now staging revolts against A.I. systems as a fever over the technology has gripped Silicon Valley and the world. In recent months, social media companies such as Reddit and Twitter, news organizations including The New York Times and NBC News, authors such as Paul Tremblay and the actress Sarah Silverman have all taken a position against A.I. sucking up their data without permission.
Their protests have taken different forms. Writers and artists are locking their files to protect their work or are boycotting certain websites that publish A.I.-generated content, while companies like Reddit want to charge for access to their data. At least 10 lawsuits have been filed this year against A.I. companies, accusing them of training their systems on artists’ creative work without consent. This past week, Ms. Silverman and the authors Christopher Golden and Richard Kadrey sued OpenAI, the maker of ChatGPT, and others over A.I.’s use of their work.
At the heart of the rebellions is a newfound understanding that online information — stories, artwork, news articles, message board posts and photos — may have significant untapped value.
The new wave of A.I. — known as “generative A.I.” for the text, images and other content it generates — is built atop complex systems such as large language models, which are capable of producing humanlike prose. These models are trained on hoards of all kinds of data so they can answer people’s questions, mimic writing styles or churn out comedy and poetry.
That has set off a hunt by tech companies for even more data to feed their A.I. systems. Google, Meta and OpenAI have essentially used information from all over the internet, including large databases of fan fiction, troves of news articles and collections of books, much of which was available free online. In tech industry parlance, this was known as “scraping” the internet…(More)”.
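As a rough illustration of what "scraping" involves in practice, here is a minimal sketch using only the Python standard library. The URL is a placeholder, and this is not how any particular company collects training data; real crawlers operate at vast scale and should respect robots.txt and sites' terms of service.

```python
import urllib.request
from html.parser import HTMLParser

class TextCollector(HTMLParser):
    """Collect the visible text fragments from an HTML page."""
    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)

def scrape(url):
    """Download one page and return its text content as a single string."""
    with urllib.request.urlopen(url) as response:
        html = response.read().decode("utf-8", errors="ignore")
    parser = TextCollector()
    parser.feed(html)
    return " ".join(parser.chunks)

# corpus = [scrape(u) for u in page_urls]  # text gathered this way feeds training corpora
```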
Digital divides are lower in Smart Cities
Paper by Andrea Caragliu and Chiara F. Del Bo: “Ever since the emergence of digital technologies in the early 1990s, the literature has discussed the potential pitfalls of an uneven distribution of e-skills under the umbrella of the digital divide. To provide a definition of the concept: “Lloyd Morrisett coined the term digital divide to mean ‘a discrepancy in access to technology resources between socioeconomic groups’” (Robyler and Doering, 2014, p. 27).
Despite the digital divide being high on the policy agenda, statistics suggest the issue remains relevant. For instance, focusing on Europe, EUROSTAT statistics show that in 2021 about 90 per cent of people living in Zeeland, a NUTS2 region in the Netherlands, had at least once ordered goods or services over the internet for private use, against a minimum in the EU27 of 15 per cent (in the region of Yugoiztochen, in Bulgaria). In the same year, while virtually all (99 per cent of) interviewees in the NUTS2 region of Northern and Western Ireland declared using the internet at least once a week, the same statistic drops to two thirds of the sample in the Bulgarian region of Severozapaden. While these territorial divides are converging over time, they can still significantly dampen the potential positive impact of the diffusion of digital technologies.
Over the past three years, the digital divide has been made dramatically apparent by the outbreak of the COVID-19 pandemic. When, during the first waves of full lockdowns enacted in most countries, tertiary education and schooling activities were moved online, many economic outcomes worsened significantly. Pupils’ learning outcomes and service sectors’ productivity were particularly affected.
A parallel development in the scientific literature has discussed the attractive features of planning and managing cities ‘smartly’. Smart Cities were initially identified as urban areas with a tendency to invest in and deploy ICTs. More recently, the notion has also come to encompass the contextual characteristics that make a city capable of reaping the benefits of ICTs – social and human capital, and soft and hard institutions.
While mounting empirical evidence suggests a superior economic performance of cities ticking all these boxes, the Smart City movement has not gone without critique. The debate on urban smartness as an instrument for planning and managing more efficient cities has recently posited that Smart Cities could be raising inequalities. This effect would stem from the role multinational corporations play as drivers of smart urban transformations, which, in a dystopian view, would allow them to influence local policymakers’ agendas.
Given these issues, and our own research on Smart Cities, we started asking ourselves whether the risks of increasing inequalities associated with the Smart City model were substantiated. To this end, we focused on empirically verifying whether cities moving forward along the smart city model were facing increases in income and digital inequalities. We answered the first question in Caragliu and Del Bo (2022), and found compelling evidence that smart city characteristics actually decrease income inequalities…(More)”.
A new way to look at data privacy
Article by Adam Zewe: “Imagine that a team of scientists has developed a machine-learning model that can predict whether a patient has cancer from lung scan images. They want to share this model with hospitals around the world so clinicians can start using it in diagnosis.
But there’s a problem. To teach their model how to predict cancer, they showed it millions of real lung scan images, a process called training. Those sensitive data, which are now encoded into the inner workings of the model, could potentially be extracted by a malicious agent. The scientists can prevent this by adding noise, or more general randomness, to the model, which makes it harder for an adversary to guess the original data. However, this perturbation reduces a model’s accuracy, so the less noise one needs to add, the better.
MIT researchers have developed a technique that enables the user to potentially add the smallest amount of noise possible, while still ensuring the sensitive data are protected.
The researchers created a new privacy metric, which they call Probably Approximately Correct (PAC) Privacy, and built a framework based on this metric that can automatically determine the minimal amount of noise that needs to be added. Moreover, this framework does not need knowledge of the inner workings of a model or its training process, which makes it easier to use for different types of models and applications.
In several cases, the researchers show that the amount of noise required to protect sensitive data from adversaries is far less with PAC Privacy than with other approaches. This could help engineers create machine-learning models that provably hide training data, while maintaining accuracy in real-world settings…
A fundamental question in data privacy is: How much sensitive data could an adversary recover from a machine-learning model with noise added to it?
Differential Privacy, one popular privacy definition, says privacy is achieved if an adversary who observes the released model cannot infer whether an arbitrary individual’s data was used in the training process. But provably preventing an adversary from distinguishing data usage often requires large amounts of noise to obscure it. This noise reduces the model’s accuracy.
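To make the noise calibration concrete, here is a minimal sketch of the standard Gaussian mechanism from the differential-privacy literature. It illustrates the conventional approach the article contrasts with PAC Privacy, not the PAC Privacy framework itself; the function name and parameter values are hypothetical.

```python
import numpy as np

def gaussian_mechanism(values, sensitivity, epsilon, delta, rng=None):
    """Release `values` with Gaussian noise calibrated to (epsilon, delta)-differential privacy.

    `sensitivity` is the maximum change any single individual's record can cause
    in `values` (L2 sensitivity); a smaller epsilon means stronger privacy and
    therefore more noise.
    """
    rng = np.random.default_rng() if rng is None else rng
    # Standard calibration: sigma >= sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return values + rng.normal(0.0, sigma, size=np.shape(values))

# Hypothetical example: releasing three model coefficients under strict vs. looser budgets.
coefficients = np.array([0.42, -1.07, 3.14])
strict = gaussian_mechanism(coefficients, sensitivity=0.1, epsilon=0.1, delta=1e-5)
loose = gaussian_mechanism(coefficients, sensitivity=0.1, epsilon=0.9, delta=1e-5)
```

The sketch captures the trade-off the article describes: the tighter the privacy budget (smaller epsilon), the larger sigma becomes and the more accuracy is lost; PAC Privacy aims to certify protection while adding far less noise.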
PAC Privacy looks at the problem a bit differently. It characterizes how hard it would be for an adversary to reconstruct any part of randomly sampled or generated sensitive data after noise has been added, rather than only focusing on the distinguishability problem…(More)”
How do we know how smart AI systems are?
Article by Melanie Mitchell: “In 1967, Marvin Minsky, a founder of the field of artificial intelligence (AI), made a bold prediction: “Within a generation…the problem of creating ‘artificial intelligence’ will be substantially solved.” Assuming that a generation is about 30 years, Minsky was clearly overoptimistic. But now, nearly two generations later, how close are we to the original goal of human-level (or greater) intelligence in machines?
Some leading AI researchers would answer that we are quite close. Earlier this year, deep-learning pioneer and Turing Award winner Geoffrey Hinton told Technology Review, “I have suddenly switched my views on whether these things are going to be more intelligent than us. I think they’re very close to it now and they will be much more intelligent than us in the future.” His fellow Turing Award winner Yoshua Bengio voiced a similar opinion in a recent blog post: “The recent advances suggest that even the future where we know how to build superintelligent AIs (smarter than humans across the board) is closer than most people expected just a year ago.”
These are extraordinary claims that, as the saying goes, require extraordinary evidence. However, it turns out that assessing the intelligence—or more concretely, the general capabilities—of AI systems is fraught with pitfalls. Anyone who has interacted with ChatGPT or other large language models knows that these systems can appear quite intelligent. They converse with us in fluent natural language, and in many cases seem to reason, to make analogies, and to grasp the motivations behind our questions. Despite their well-known unhumanlike failings, it’s hard to escape the impression that behind all that confident and articulate language there must be genuine understanding…(More)”.
Questions as a Device for Data Responsibility: Toward a New Science of Questions to Steer and Complement the Use of Data Science for the Public Good in a Polycentric Way
Paper by Stefaan G. Verhulst: “We are at an inflection point today in our search to responsibly handle data in order to maximize the public good while limiting both private and public risks. This paper argues that the way we formulate questions should be given more consideration as a device for modern data responsibility. We suggest that designing a polycentric process for co-defining the right questions can play an important role in ensuring that data are used responsibly, and with maximum positive social impact. In making these arguments, we build on two bodies of knowledge—one conceptual and the other more practical. These observations are supplemented by the author’s own experience as founder and lead of “The 100 Questions Initiative.” The 100 Questions Initiative uses a unique participatory methodology to identify the world’s 100 most pressing, high-impact questions across a variety of domains—including migration, gender inequality, air quality, the future of work, disinformation, food sustainability, and governance—that could be answered by unlocking datasets and other resources. This initiative provides valuable practical insights and lessons into building a new “science of questions” and builds on theoretical and practical knowledge to outline a set of benefits of using questions for data responsibility. More generally, this paper argues that, combined with other methods and approaches, questions can help achieve a variety of key data responsibility goals, including data minimization and proportionality, increasing participation, and enhancing accountability…(More)”.
Building Responsive Investments in Gender Equality using Gender Data System Maturity Models
Tools and resources by Data2X and Open Data Watch: “…to help countries check the maturity of their gender data systems and set priorities for gender data investments. The new Building Responsive Investments in Data for Gender Equality (BRIDGE) tool is designed for use by gender data focal points in national statistical offices (NSOs) of low- and middle-income countries and by their partners within the national statistical system (NSS) to communicate gender data priorities to domestic sources of financing and international donors.
The BRIDGE results will help gender data stakeholders understand the current maturity level of their gender data system, diagnose strengths and weaknesses, and identify priority areas for improvement. They will also serve as an input to any roadmap or action plan developed in collaboration with key stakeholders within the NSS.
Below are links to and explanations of our ‘Gender Data System Maturity Model’ briefs (a long and short version), our BRIDGE assessment and tools methodology, how-to guide, questionnaire, and scoring form that will provide an overall assessment of system maturity and insight into potential action plans to strengthen gender data systems…(More)”.
How Statisticians Should Grapple with Privacy in a Changing Data Landscape
Article by Joshua Snoke, and Claire McKay Bowen: “Suppose you had a data set that contained records of individuals, including demographics such as their age, sex, and race. Suppose also that these data contained additional in-depth personal information, such as financial records, health status, or political opinions. Finally, suppose that you wanted to glean relevant insights from these data using machine learning, causal inference, or survey sampling adjustments. What methods would you use? What best practices would you ensure you followed? Where would you seek information to help guide you in this process?…(More)”
AI tools are designing entirely new proteins that could transform medicine
Article by Ewen Callaway: “OK. Here we go.” David Juergens, a computational chemist at the University of Washington (UW) in Seattle, is about to design a protein that, in 3-billion-plus years of tinkering, evolution has never produced.
On a video call, Juergens opens a cloud-based version of an artificial intelligence (AI) tool he helped to develop, called RFdiffusion. This neural network, and others like it, are helping to bring the creation of custom proteins — until recently a highly technical and often unsuccessful pursuit — to mainstream science.
These proteins could form the basis for vaccines, therapeutics and biomaterials. “It’s been a completely transformative moment,” says Gevorg Grigoryan, the co-founder and chief technical officer of Generate Biomedicines in Somerville, Massachusetts, a biotechnology company applying protein design to drug development.
The tools are inspired by AI software that synthesizes realistic images, such as the Midjourney software that, this year, was famously used to produce a viral image of Pope Francis wearing a designer white puffer jacket. A similar conceptual approach, researchers have found, can churn out realistic protein shapes to criteria that designers specify — meaning, for instance, that it’s possible to speedily draw up new proteins that should bind tightly to another biomolecule. And early experiments show that when researchers manufacture these proteins, a useful fraction do perform as the software suggests.
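To give a flavour of the iterative "denoising" idea behind such generative tools, here is a deliberately toy sketch. It is not RFdiffusion or any real protein model: the stand-in "denoiser", the target shape and the step counts are invented purely to show how a sample is refined from random noise toward a structure that satisfies a designer's target.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_denoiser(x, target):
    """Stand-in for a trained network: nudge the noisy sample toward a cleaner one.

    A real model is trained on known protein structures and conditioned on the
    designer's criteria; here we simply pull the sample toward a fixed target.
    """
    return x + 0.1 * (target - x)

# Pretend "backbone coordinates" we want the generator to recover.
target_shape = np.sin(np.linspace(0, 2 * np.pi, 50))

x = rng.normal(size=50)  # start from pure random noise
for step in range(200):
    x = toy_denoiser(x, target_shape)      # refine the sample a little
    x += rng.normal(scale=0.01, size=50)   # retain some randomness while refining

print(round(float(np.abs(x - target_shape).mean()), 3))  # small residual: noise became "structure"
```

In a real system, the hand-written pull toward a fixed target is replaced by a trained neural network conditioned on the criteria the designer specifies.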
The tools have revolutionized the process of designing proteins in the past year, researchers say. “It is an explosion in capabilities,” says Mohammed AlQuraishi, a computational biologist at Columbia University in New York City, whose team has developed one such tool for protein design. “You can now create designs that have sought-after qualities.”
“You’re building a protein structure customized for a problem,” says David Baker, a computational biophysicist at UW whose group, which includes Juergens, developed RFdiffusion. The team released the software in March 2023, and a paper describing the neural network appears this week in Nature [1]. (A preprint version was released in late 2022, at around the same time that several other teams, including AlQuraishi’s [2] and Grigoryan’s [3], reported similar neural networks)…(More)”.