Time to recognize authorship of open data


Nature Editorial: “At times, it seems there’s an unstoppable momentum towards the principle that data sets should be made widely available for research purposes (also called open data). Research funders all over the world are endorsing the open data-management standards known as the FAIR principles (which ensure data are findable, accessible, interoperable and reusable). Journals are increasingly asking authors to make the underlying data behind papers accessible to their peers. Data sets are accompanied by a digital object identifier (DOI) so they can be easily found. And this citability helps researchers to get credit for the data they generate.
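
(A concrete aside on citability: a data-set DOI is machine-actionable. The minimal sketch below, our illustration rather than anything from the editorial, asks the doi.org resolver for citation metadata using standard DOI content negotiation; the DOI string is a placeholder.)

```python
# A minimal sketch, not from the editorial: fetch citation metadata for a
# data-set DOI via content negotiation against the doi.org resolver.
# The DOI below is a placeholder; substitute any registered data-set DOI.
import requests

def fetch_citation_metadata(doi: str) -> dict:
    """Ask the DOI resolver for machine-readable citation metadata (CSL JSON)."""
    response = requests.get(
        f"https://doi.org/{doi}",
        headers={"Accept": "application/vnd.citationstyles.csl+json"},
        timeout=30,
    )
    response.raise_for_status()
    return response.json()

metadata = fetch_citation_metadata("10.1234/placeholder-dataset")  # hypothetical DOI
print(metadata.get("title"), "|", metadata.get("publisher"))
```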

But reality sometimes tells a different story. The world’s systems for evaluating science do not (yet) value openly shared data in the same way that they value outputs such as journal articles or books. Funders and research leaders who design these systems accept that there are many kinds of scientific output, but many reject the idea that there is a hierarchy among them.

In practice, those in powerful positions in science tend not to regard open data sets in the same way as publications when it comes to making hiring and promotion decisions or awarding memberships to important committees, or in national evaluation systems. The open-data revolution will stall unless this changes….

Universities, research groups, funding agencies and publishers should, together, start to consider how they could better recognize open data in their evaluation systems. They need to ask: how can those who have gone the extra mile on open data be credited appropriately?

There will always be instances in which researchers cannot be given access to human data. Data from infants, for example, are highly sensitive and need to pass stringent privacy and other tests. Moreover, making data sets accessible takes time and funding that researchers don’t always have. And researchers in low- and middle-income countries have concerns that their data could be used by researchers or businesses in high-income countries in ways that they have not consented to.

But crediting all those who contribute their knowledge to a research output is a cornerstone of science. The prevailing convention — whereby those who make their data open for researchers to use make do with acknowledgement and a citation — needs a rethink. As long as authorship on a paper is significantly more valued than data generation, this will disincentivize making data sets open. The sooner we change this, the better….(More)”.

Artificial intelligence is creating a new colonial world order


Series by Karen Hao: “…Over the last few years, an increasing number of scholars have argued that the impact of AI is repeating the patterns of colonial history. European colonialism, they say, was characterized by the violent capture of land, extraction of resources, and exploitation of people—for example, through slavery—for the economic enrichment of the conquering country. While it would diminish the depth of past traumas to say the AI industry is repeating this violence today, it is now using other, more insidious means to enrich the wealthy and powerful at the great expense of the poor….

MIT Technology Review’s new AI Colonialism series, which will be published throughout this week, digs into these and other parallels between AI development and the colonial past by examining communities that have been profoundly changed by the technology. In part one, we head to South Africa, where AI surveillance tools, built on the extraction of people’s behaviors and faces, are re-entrenching racial hierarchies and fueling a digital apartheid.

In part two, we head to Venezuela, where AI data-labeling firms found cheap and desperate workers amid a devastating economic crisis, creating a new model of labor exploitation. The series also looks at ways to move away from these dynamics. In part three, we visit ride-hailing drivers in Indonesia who, by building power through community, are learning to resist algorithmic control and fragmentation. In part four, we end in Aotearoa, the Māori name for New Zealand, where an Indigenous couple are wresting back control of their community’s data to revitalize its language.

Together, the stories reveal how AI is impoverishing the communities and countries that don’t have a say in its development—the same communities and countries already impoverished by former colonial empires. They also suggest how AI could be so much more—a way for the historically dispossessed to reassert their culture, their voice, and their right to determine their own future.

That is ultimately the aim of this series: to broaden the view of AI’s impact on society so as to begin to figure out how things could be different. It’s not possible to talk about “AI for everyone” (Google’s rhetoric), “responsible AI” (Facebook’s rhetoric), or “broadly distribut[ing]” its benefits (OpenAI’s rhetoric) without honestly acknowledging and confronting the obstacles in the way….(More)”.

How Democracies Spy on Their Citizens 


Ronan Farrow at the New Yorker: “…Commercial spyware has grown into an industry estimated to be worth twelve billion dollars. It is largely unregulated and increasingly controversial. In recent years, investigations by the Citizen Lab and Amnesty International have revealed the presence of Pegasus on the phones of politicians, activists, and dissidents under repressive regimes. An analysis by Forensic Architecture, a research group at the University of London, has linked Pegasus to three hundred acts of physical violence. It has been used to target members of Rwanda’s opposition party and journalists exposing corruption in El Salvador. In Mexico, it appeared on the phones of several people close to the reporter Javier Valdez Cárdenas, who was murdered after investigating drug cartels. Around the time that Prince Mohammed bin Salman of Saudi Arabia approved the murder of the journalist Jamal Khashoggi, a longtime critic, Pegasus was allegedly used to monitor phones belonging to Khashoggi’s associates, possibly facilitating the killing, in 2018. (Bin Salman has denied involvement, and NSO said, in a statement, “Our technology was not associated in any way with the heinous murder.”) Further reporting through a collaboration of news outlets known as the Pegasus Project has reinforced the links between NSO Group and anti-democratic states. But there is evidence that Pegasus is being used in at least forty-five countries, and it and similar tools have been purchased by law-enforcement agencies in the United States and across Europe. Cristin Flynn Goodwin, a Microsoft executive who has led the company’s efforts to fight spyware, told me, “The big, dirty secret is that governments are buying this stuff—not just authoritarian governments but all types of governments.”…(More)”.

How Smart Tech Tried to Solve the Mental Health Crisis and Only Made It Worse


Article by Emma Bedor Hiland: “Crisis Text Line was supposed to be the exception. Skyrocketing rates of depression, anxiety, and mental distress over the last decade demanded new, innovative solutions. The non-profit organization was founded in 2013 with the mission of providing free mental health text messaging services and crisis intervention tools. It seemed like the right moment to use technology to make the world a better place. Over the following years, the accolades and praise the platform received reflected its success. But its sterling reputation was tarnished overnight at the beginning of 2022 when Politico published an investigation into the way Crisis Text Line had handled and shared user data. The problem with the organization, however, goes well beyond its alleged mishandling of user information.

Despite Crisis Text Line’s assurance that its platform was anonymous, Politico’s January report showed that the company’s private messaging sessions were not actually anonymous. Data about users, including what they shared with Crisis Text Line’s volunteers, had been provided and sold to an entirely different company called Loris.ai, a tech startup that specializes in artificial intelligence software for human resources and customer service. The report brought to light a troubling relationship between the two organizations. Both had previously been headed by the same CEO, Nancy Lublin. In 2019, however, Lublin had stepped down from Loris, and in 2020 Crisis Text Line’s board ousted her following allegations that she had engaged in workplace racism.

But the troubles that enveloped Crisis Text Line can’t be blamed on one bad apple. Crisis Text Line’s board of directors had approved the relationship between the entities. In the technology and big data sectors, commodification of user data is fundamental to a platform or toolset’s economic survival, and by sharing data with Loris.ai, Crisis Text Line was able to provide needed services. The harsh reality revealed by the Politico report was that even mental healthcare is not immune from commodification, despite the risks of aggregating and sharing information about experiences and topics which continue to be stigmatized.

In the case of the Crisis Text Line-Loris.ai partnership, Loris used the nonprofit’s data to improve its own, for-profit development of machine learning algorithms sold to corporations and governments. Although Crisis Text Line maintains that all of the data shared with Loris was anonymized, the transactional nature of the relationship between the two was still fundamentally an economic one. As the Loris.ai website states, “Crisis Text Line is a Loris shareholder. Our success offers material benefit to CTL, helping this non-profit organization continue its important work. We believe this model is a blueprint for ways for-profit companies can infuse social good into their culture and operations, and for nonprofits to prosper.”…(More)”.

A.I. Is Mastering Language. Should We Trust What It Says?


Steven Johnson at the New York Times: “You are sitting in a comfortable chair by the fire, on a cold winter’s night. Perhaps you have a mug of tea in hand, perhaps something stronger. You open a magazine to an article you’ve been meaning to read. The title suggested a story about a promising — but also potentially dangerous — new technology on the cusp of becoming mainstream, and after reading only a few sentences, you find yourself pulled into the story. A revolution is coming in machine intelligence, the author argues, and we need, as a society, to get better at anticipating its consequences. But then the strangest thing happens: You notice that the writer has, seemingly deliberately, omitted the very last word of the first .

The missing word jumps into your consciousness almost unbidden: “the very last word of the first paragraph.” There’s no sense of an internal search query in your mind; the word “paragraph” just pops out. It might seem like second nature, this filling-in-the-blank exercise, but doing it makes you think of the embedded layers of knowledge behind the thought. You need a command of the spelling and syntactic patterns of English; you need to understand not just the dictionary definitions of words but also the ways they relate to one another; you have to be familiar enough with the high standards of magazine publishing to assume that the missing word is not just a typo, and that editors are generally loath to omit key words in published pieces unless the author is trying to be clever — perhaps trying to use the missing word to make a point about your cleverness, how swiftly a human speaker of English can conjure just the right word.

Before you can pursue that idea further, you’re back into the article, where you find the author has taken you to a building complex in suburban Iowa. Inside one of the buildings lies a wonder of modern technology: 285,000 CPU cores yoked together into one giant supercomputer, powered by solar arrays and cooled by industrial fans. The machines never sleep: Every second of every day, they churn through innumerable calculations, using state-of-the-art techniques in machine intelligence that go by names like “stochastic gradient descent” and “convolutional neural networks.” The whole system is believed to be one of the most powerful supercomputers on the planet.

And what, you may ask, is this computational dynamo doing with all these prodigious resources? Mostly, it is playing a kind of game, over and over again, billions of times a second. And the game is called: Guess what the missing word is.…(More)”.
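
(An editor's aside: the game described here can be played at any scale. The sketch below is a deliberately tiny stand-in, a bigram model fit by counting on invented text, not the neural, SGD-trained system in the article, but the objective is the same: guess the missing word.)

```python
# A toy stand-in for the "guess the missing word" game: a bigram model fit
# by counting, nothing like the supercomputer-scale system described above.
# The tiny corpus is invented for illustration.
from collections import Counter, defaultdict

corpus = (
    "you notice the writer omitted the very last word of the first paragraph . "
    "the last word of the first paragraph is missing . "
    "the missing word of the paragraph pops out ."
).split()

# Count how often each word follows each preceding word.
follows = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    follows[prev][nxt] += 1

def guess_missing_word(context: str) -> str:
    """Predict the next word as the most frequent continuation of the last one."""
    last_word = context.split()[-1]
    candidates = follows[last_word]
    return candidates.most_common(1)[0][0] if candidates else "<unknown>"

print(guess_missing_word("the very last word of the first"))  # -> "paragraph"
```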

Should we get rid of the scientific paper?


Article by Stuart Ritchie: “But although the internet has transformed the way we read it, the overall system for how we publish science remains largely unchanged. We still have scientific papers; we still send them off to peer reviewers; we still have editors who give the ultimate thumbs up or down as to whether a paper is published in their journal.

This system comes with big problems. Chief among them is the issue of publication bias: reviewers and editors are more likely to give a scientific paper a good write-up and publish it in their journal if it reports positive or exciting results. So scientists go to great lengths to hype up their studies, lean on their analyses so they produce “better” results, and sometimes even commit fraud in order to impress those all-important gatekeepers. This drastically distorts our view of what really went on.
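
(As an aside, this distortion is easy to demonstrate. The simulation sketch below, our illustration rather than Ritchie's, shows what happens when a hypothetical journal publishes only positive, significant results: a small true effect looks much larger in print.)

```python
# A simulation sketch of publication bias (our illustration, not from the
# article): a "journal" that only accepts positive, significant results
# makes a small true effect look much larger in the published record.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_effect, n_per_group, n_studies = 0.2, 30, 2000

published_effects = []
for _ in range(n_studies):
    treatment = rng.normal(true_effect, 1.0, n_per_group)
    control = rng.normal(0.0, 1.0, n_per_group)
    _, p_value = stats.ttest_ind(treatment, control)
    observed = treatment.mean() - control.mean()
    # The gatekeeper: only positive, statistically significant results get in.
    if p_value < 0.05 and observed > 0:
        published_effects.append(observed)

print(f"true effect size: {true_effect}")
print(f"mean published effect: {np.mean(published_effects):.2f}")  # noticeably inflated
```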

There are some possible fixes that change the way journals work. Maybe the decision to publish could be made based only on the methodology of a study, rather than on its results (this is already happening to a modest extent in a few journals). Maybe scientists could just publish all their research by default, and journals would curate, rather than decide, which results get out into the world. But maybe we could go a step further, and get rid of scientific papers altogether.

Scientists are obsessed with papers – specifically, with having more papers published under their name, extending the crucial “publications” section of their CV. So it might sound outrageous to suggest we could do without them. But that obsession is the problem. Paradoxically, the sacred status of a published, peer-reviewed paper makes it harder to get the contents of those papers right.

Consider the messy reality of scientific research. Studies almost always throw up weird, unexpected numbers that complicate any simple interpretation. But a traditional paper – word count and all – pretty well forces you to dumb things down. If what you’re working towards is a big, milestone goal of a published paper, the temptation is ever-present to file away a few of the jagged edges of your results, to help “tell a better story”. Many scientists admit, in surveys, to doing just that – making their results into unambiguous, attractive-looking papers, but distorting the science along the way.

And consider corrections. We know that scientific papers regularly contain errors. One algorithm that ran through thousands of psychology papers found that, at worst, more than 50% had one specific statistical error, and more than 15% had an error serious enough to overturn the results. With papers, correcting this kind of mistake is a slog: you have to write in to the journal, get the attention of the busy editor, and get them to issue a new, short paper that formally details the correction. Many scientists who request corrections find themselves stonewalled or otherwise ignored by journals. Imagine the number of errors that litter the scientific literature that haven’t been corrected because to do so is just too much hassle.
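
(An aside on how such an algorithm can work: parse the reported test statistic, recompute the p-value it implies, and flag mismatches. The sketch below is a hedged reconstruction in the spirit of tools like statcheck, not the actual program behind the study cited here.)

```python
# A hedged reconstruction of this kind of checking algorithm, not the
# study's actual code: recompute the p-value implied by a reported t-test
# and flag results where it disagrees with the p-value the paper reports.
import re
from scipy import stats

def reported_t_test_checks_out(sentence: str, tolerance: float = 0.01) -> bool:
    """Verify an APA-style report like 't(28) = 2.20, p = .04'."""
    match = re.search(r"t\((\d+)\)\s*=\s*([-\d.]+),\s*p\s*=\s*(\.\d+)", sentence)
    if match is None:
        raise ValueError("no APA-style t-test found in sentence")
    df, t_value, reported_p = int(match[1]), float(match[2]), float(match[3])
    recomputed_p = 2 * stats.t.sf(abs(t_value), df)  # two-tailed p-value
    return abs(recomputed_p - reported_p) <= tolerance

print(reported_t_test_checks_out("t(28) = 2.20, p = .04"))   # True: consistent
print(reported_t_test_checks_out("t(28) = 2.20, p = .001"))  # False: inconsistent
```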

Finally, consider data. Back in the day, sharing the raw data that formed the basis of a paper with that paper’s readers was more or less impossible. Now it can be done in a few clicks, by uploading the data to an open repository. And yet, we act as if we live in the world of yesteryear: papers still hardly ever have the data attached, preventing reviewers and readers from seeing the full picture.

The solution to all these problems is the same as the answer to “How do I organise my journals if I don’t use cornflakes boxes?” Use the internet. We can change papers into mini-websites (sometimes called “notebooks”) that openly report the results of a given study. Not only does this give everyone a view of the full process from data to analysis to write-up – the dataset would be appended to the website along with all the statistical code used to analyse it, and anyone could reproduce the full analysis and check they get the same numbers – but any corrections could be made swiftly and efficiently, with the date and time of all updates publicly logged…(More)”.
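
(What might a notebook-style paper look like in practice? The sketch below is one hedged construction, with invented file names: the data, the analysis code, and an append-only change log kept side by side, so anyone can rerun the numbers and see exactly when anything changed.)

```python
# A minimal sketch of the "paper as notebook" idea (our construction; the
# file names are invented): raw data, analysis code, and a timestamped
# change log live together, so anyone can rerun and audit the numbers.
import csv
import statistics
from datetime import datetime, timezone
from pathlib import Path

DATA_FILE = Path("study_data.csv")   # raw data appended to the "paper"
LOG_FILE = Path("changelog.txt")     # publicly visible record of updates

def run_analysis() -> dict:
    """Recompute the headline numbers straight from the raw data."""
    with DATA_FILE.open() as f:
        scores = [float(row["score"]) for row in csv.DictReader(f)]
    return {"n": len(scores),
            "mean": round(statistics.mean(scores), 3),
            "sd": round(statistics.stdev(scores), 3)}

def log_update(message: str) -> None:
    """Append a timestamped entry so every correction is publicly logged."""
    stamp = datetime.now(timezone.utc).isoformat(timespec="seconds")
    with LOG_FILE.open("a") as f:
        f.write(f"{stamp}  {message}\n")

if __name__ == "__main__":
    results = run_analysis()
    log_update(f"re-ran full analysis: {results}")
    print(results)
```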

Cities Take the Lead in Setting Rules Around How AI Is Used


Jackie Snow at the Wall Street Journal: “As cities and states roll out algorithms to help them provide services like policing and traffic management, they are also racing to come up with policies for using this new technology.

AI, at its worst, can disadvantage already marginalized groups, adding to human-driven bias in hiring, policing and other areas. And its decisions can often be opaque—making it difficult to tell how to fix that bias, as well as other problems. (The Wall Street Journal discussed calls for regulation of AI, or at least greater transparency about how the systems work, with three experts.)

Cities are looking at a number of solutions to these problems. Some require disclosure when an AI model is used in decisions, while others mandate audits of algorithms, track where AI causes harm or seek public input before putting new AI systems in place.

Here are some ways cities are redefining how AI will work within their borders and beyond.

Explaining the algorithms: Amsterdam and Helsinki

One of the biggest complaints against AI is that it makes decisions that can’t be explained, which can lead to complaints about arbitrary or even biased results.

To let their citizens know more about the technology already in use in their cities, Amsterdam and Helsinki collaborated on websites that document how each city government uses algorithms to deliver services. The registry includes information on the data sets used to train an algorithm, a description of how an algorithm is used, how public servants use the results, the human oversight involved and how the city checks the technology for problems like bias.

Amsterdam has six algorithms fully explained—with a goal of 50 to 100—on the registry website, including how the city’s automated parking-control and trash-complaint reports work. Helsinki, which is only focusing on the city’s most advanced algorithms, also has six listed on its site, with another 10 to 20 left to put up.
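
(The registry records the two cities publish follow a fairly regular structure. The sketch below is our schematization of such a record, inferred from the fields described above rather than taken from either city's actual schema.)

```python
# A rough schematization of one registry record, based on the fields the
# article lists (training data, usage, oversight, bias checks). This schema
# is our illustration, not Amsterdam's or Helsinki's actual format.
from dataclasses import dataclass, field

@dataclass
class AlgorithmRegistryEntry:
    name: str                     # the service, e.g. automated parking control
    purpose: str                  # what the algorithm is used to decide
    training_datasets: list[str]  # data sets used to train the model
    results_usage: str            # how public servants act on the output
    human_oversight: str          # who reviews or can override decisions
    bias_checks: list[str] = field(default_factory=list)

entry = AlgorithmRegistryEntry(
    name="Automated parking control (illustrative)",
    purpose="Detect parked cars without a valid permit from scan-car imagery",
    training_datasets=["scan-car images", "parking-permit records"],
    results_usage="Flags candidate violations for review",
    human_oversight="A human inspector confirms every case before a fine is issued",
    bias_checks=["periodic review of error rates across neighborhoods"],
)
print(entry.name)
```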

“We needed to assess the risk ourselves,” says Linda van de Fliert, an adviser at Amsterdam’s Chief Technology Office. “And we wanted to show the world that it is possible to be transparent.”…(More)”. See also AI Localism: The Responsible Use and Design of Artificial Intelligence at the Local Level

Russia Is Leaking Data Like a Sieve


Matt Burgess at Wired: “Names, birthdays, passport numbers, job titles—the personal information goes on for pages and looks like any typical data breach. But this data set is very different. It allegedly contains the personal information of 1,600 Russian troops who served in Bucha, a Ukrainian city devastated during Russia’s war and the scene of multiple potential war crimes.

The data set is not the only one. Another allegedly contains the names and contact details of 620 Russian spies who are registered to work at the Moscow office of the FSB, the country’s main security agency. Neither set of information was published by hackers. Instead they were put online by Ukraine’s intelligence services, with all the names and details freely available to anyone online. “Every European should know their names,” Ukrainian officials wrote in a Facebook post as they published the data.

Since Russian troops crossed Ukraine’s borders at the end of February, colossal amounts of information about the Russian state and its activities have been made public. The data offers unparalleled glimpses into closed-off private institutions, and it may be a gold mine for investigators, from journalists to those tasked with investigating war crimes. Broadly, the data comes in two flavors: information published proactively by Ukrainian authorities or their allies, and information obtained by hacktivists. Hundreds of gigabytes of files and millions of emails have been made public.

“Both sides in this conflict are very good at information operations,” says Philip Ingram, a former colonel in British military intelligence. “The Russians are quite blatant about the lies that they’ll tell,” he adds. Since the war started, Russian disinformation has been consistently debunked. Ingram says Ukraine has to be more tactical with the information it publishes. “They have to make sure that what they’re putting out is credible and they’re not caught out telling lies in a way that would embarrass them or embarrass their international partners.”

Both the lists of alleged FSB officers and Russian troops were published online by Ukraine’s Central Intelligence Agency at the end of March and start of April, respectively. While WIRED has not been able to verify the accuracy of the data—and Ukrainian cybersecurity officials did not respond to a request for comment—Aric Toler, from investigative outlet Bellingcat, tweeted that the FSB details appear to have been combined from previous leaks and open source information. It is unclear how up-to-date the information is…(More)”.

The Power of Narrative


Essay by Klaus Schwab and Thierry Malleret: “…The expression “failure of imagination” captures this by describing the expectation that future opportunities and risks will resemble those of the past. Novelist Graham Greene used it in The Power and the Glory, but the 9/11 Commission made it popular by invoking it as the main reason why intelligence agencies had failed to anticipate the “unimaginable” events of that day.

Ever since, the expression has been associated with situations in which strategic thinking and risk management are stuck in unimaginative and reactive thinking. Considering today’s wide and interdependent array of risks, we can’t afford to be unimaginative, even though, as the astrobiologist Caleb Scharf points out, we risk getting imprisoned in a dangerous cognitive lockdown because of the magnitude of the task. “Indeed, we humans do seem to struggle in general when too many new things are thrown at us at once. Especially when those things are outside of our normal purview. Like, well, weird viruses or new climate patterns,” Scharf writes. “In the face of such things, we can simply go into a state of cognitive lockdown, flipping from one small piece of the problem to another and not quite building a cohesive whole.”

Imagination is precisely what is required to escape a state of “cognitive lockdown” and to build a “cohesive whole.” It gives us the capacity to dream up innovative solutions to successfully address the multitude of risks that confront us. For decades now, we’ve been destabilizing the world, having failed to imagine the consequences of our actions on our societies and our biosphere, and the way in which they are connected. Now, following this failure and the stark realization of what it has entailed, we need to do just the opposite: rely on the power of imagination to get us out of the holes we’ve dug ourselves into. It is incumbent upon us to imagine the contours of a more equitable and sustainable world. Imagination being boundless, the variety of social, economic, and political solutions is infinite.

With respect to the assertion that there are things we don’t imagine to be socially or politically possible, a recent book shows that nothing is preordained. We are in fact only bound by the power of our own imaginations. In The Dawn of Everything, David Graeber and David Wengrow (an anthropologist and an archaeologist) prove this by showing that every imaginable form of social and economic organization has existed from the very beginning of humankind. Over the past 300,000 years, we’ve pursued knowledge, experimentation, happiness, development, freedom, and other human endeavors in myriad different ways. During these times that preceded our modern world, none of the arrangements that we devised to live together exhibited a single point of origin or an invariant pattern. Early societies were peaceful and violent, authoritarian and democratic, patriarchal and matriarchal, slaveholding and abolitionist, some moving between different types of organizations all the time, others not. Antique industrial cities were flourishing at the heart of empires while others existed in the absence of a sovereign entity…(More)”

Opening up Science—to Skeptics


Essay by Rohan R. Arcot and Hunter Gehlbach: “Recently, the soaring trajectory of science skepticism seems to be rivaled only by global temperatures. Empirically established facts—around vaccines, elections, climate science, and the like—face potent headwinds. Despite the scientific consensus on these issues, much of the public remains unconvinced. In turn, science skepticism threatens our health, the health of our democracy, and the health of our planet.

The research community is no stranger to skepticism. Its own members have been questioning the integrity of many scientific findings with particular intensity of late. In response, we have seen a swell of open science norms and practices, which provide greater transparency about key procedural details of the research process, mitigating many research skeptics’ misgivings. These open practices greatly facilitate how science is communicated—but only between scientists. 

Given the present historical moment’s critical need for science, we wondered: What if scientists allowed skeptics in the general public to look under the hood at how their studies were conducted? Could opening up the basic ideas of open science beyond scholars help combat the epidemic of science skepticism?  

Intrigued by this possibility, we sought a qualified skeptic and returned to Rohan’s father. If we could chaperone someone through a scientific journey—a person who could vicariously experience the key steps along the way—could our openness assuage their skepticism?…(More)”.