The Lancet Countdown: Tracking progress on health and climate change using data from the International Energy Agency (IEA)


Victoria Moody at the UK Data Service: “The 2015 Lancet Commission on Health and Climate Change—which assessed responses to climate change with a view to ensuring the highest attainable standards of health for populations worldwide—concluded that “tackling climate change could be the greatest global health opportunity of the 21st century”. The Commission recommended that more accurate national quantification of the health co-benefits and economic impacts of mitigation decisions was essential in promoting a low-carbon transition.

Building on these foundations, the Lancet Countdown: tracking progress on health and climate change was formed as an independent research collaboration…

The partnership comprises 24 academic institutions from every continent, bringing together individuals with a broad range of expertise across disciplines (including climate scientists, ecologists, mathematicians, geographers, engineers, energy, food, and transport experts, economists, social and political scientists, public health professionals, and physicians).

Four of the indicators developed for Working Group 3 (Mitigation actions and health co-benefits) use International Energy Agency (IEA) data made available by the IEA via the UK Data Service for use by researchers, learners and teaching staff in UK higher and further education. Additionally, two of the indicators developed for Working Group 4 (Finance and economics) also use IEA data.

Read our impact case study to find out more about the impact and reach of the Lancet Countdown, watch the YouTube film below, read the Lancet Countdown 2018 Report …(More)”

https://web.archive.org/web/2000/https://www.youtube.com/watch?v=moYzcYNX1iM

Claudette: an automated detector of potentially unfair clauses in online terms of service


Marco Lippi et al in AI and the Law Journal: “Terms of service of on-line platforms too often contain clauses that are potentially unfair to the consumer. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike….(More)”.

Responsible AI for conservation


Oliver Wearn, Robin Freeman and David Jacoby in Nature: “Machine learning (ML) is revolutionizing efforts to conserve nature. ML algorithms are being applied to predict the extinction risk of thousands of species, assess the global footprint of fisheries, and identify animals and humans in wildlife sensor data recorded in the field. These efforts have recently been given a huge boost with support from the commercial sector. New initiatives, such as Microsoft’s AI for Earth and Google’s AI for Social Good, are bringing new resources and new ML tools to bear on some of the biggest challenges in conservation. In parallel to this, the open data revolution means that global-scale, conservation-relevant datasets can be fed directly to ML algorithms from open data repositories, such as Google Earth Engine for satellite data or Movebank for animal tracking data. Added to these will be Wildlife Insights, a Google-supported platform for hosting and analysing wildlife sensor data that launches this year. With new tools and a proliferation of data comes a bounty of new opportunities, but also new responsibilities….(More)”

Weather Service prepares to launch prediction model many forecasters don’t trust


Jason Samenow in the Washington Post: “In a month, the National Weather Service plans to launch its “next generation” weather prediction model with the aim of “better, more timely forecasts.” But many meteorologists familiar with the model fear it is unreliable.

The introduction of a model that forecasters lack confidence in matters, considering the enormous impact that weather has on the economy, valued at around $485 billion annually.

The Weather Service announced Wednesday that the model, known as the GFS-FV3 (FV3 stands for Finite-Volume Cubed-Sphere dynamical core), is “tentatively” set to become the United States’ primary forecast model on March 20, pending tests. It is an update to the current version of the GFS (Global Forecast System), popularly known as the American model, which has existed in various forms for more than 30 years….

A concern is that if forecasters cannot rely on the FV3, they will be left to rely only on the European model for their predictions without a credible alternative for comparisons. And they’ll also have to pay large fees for the European model data. Whereas model data from the Weather Service is free, the European Center for Medium-Range Weather Forecasts, which produces the European model, charges for access.

But there is an alternative perspective, which is that forecasters will just need to adjust to the new model and learn to account for its biases. That is, a little short-term pain is worth the long-term potential benefits as the model improves….

The Weather Service’s parent agency, the National Oceanic and Atmospheric Administration, recently entered an agreement with the National Center for Atmospheric Research to increase collaboration between forecasters and researchers in improving forecast modeling.

In addition, President Trump recently signed into law the Weather Research and Forecast Innovation Act Reauthorization, which establishes the NOAA Earth Prediction Innovation Center, aimed at further enhancing prediction capabilities. But even while NOAA develops relationships and infrastructure to improve the Weather Service’s modeling, the question remains whether the FV3 can meet the forecasting needs of the moment. Until the problems identified are addressed, its introduction could represent a step back in U.S. weather prediction despite a well-intended effort to leap forward….(More)”.

Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice


Paper by Rashida Richardson, Jason Schultz, and Kate Crawford: “Law enforcement agencies are increasingly using algorithmic predictive policing systems to forecast criminal activity and allocate police resources. Yet in numerous jurisdictions, these systems are built on data produced within the context of flawed, racially fraught and sometimes unlawful practices (‘dirty policing’). This can include systemic data manipulation, falsifying police reports, unlawful use of force, planted evidence, and unconstitutional searches. These policing practices shape the environment and the methodology by which data is created, which leads to inaccuracies, skews, and forms of systemic bias embedded in the data (‘dirty data’). Predictive policing systems informed by such data cannot escape the legacy of unlawful or biased policing practices that they are built on. Nor do claims by predictive policing vendors that these systems provide greater objectivity, transparency, or accountability hold up. While some systems offer the ability to see the algorithms used and even occasionally access to the data itself, there is no evidence to suggest that vendors independently or adequately assess the impact that unlawful and biased policing practices have on their systems, or otherwise assess how broader societal biases may affect their systems.

In our research, we examine the implications of using dirty data with predictive policing, and look at jurisdictions that (1) have utilized predictive policing systems and (2) have done so while under government commission investigations or federal court monitored settlements, consent decrees, or memoranda of agreement stemming from corrupt, racially biased, or otherwise illegal policing practices. In particular, we examine the link between unlawful and biased police practices and the data used to train or implement these systems across thirteen case studies. We highlight three of these: (1) Chicago, an example of where dirty data was ingested directly into the city’s predictive system; (2) New Orleans, an example where the extensive evidence of dirty policing practices suggests an extremely high risk that dirty data was or will be used in any predictive policing application, and (3) Maricopa County where despite extensive evidence of dirty policing practices, lack of transparency and public accountability surrounding predictive policing inhibits the public from assessing the risks of dirty data within such systems. The implications of these findings have widespread ramifications for predictive policing writ large. Deploying predictive policing systems in jurisdictions with extensive histories of unlawful police practices presents elevated risks that dirty data will lead to flawed, biased, and unlawful predictions which in turn risk perpetuating additional harm via feedback loops throughout the criminal justice system. Thus, for any jurisdiction where police have been found to engage in such practices, the use of predictive policing in any context must be treated with skepticism and mechanisms for the public to examine and reject such systems are imperative….(More)”.

Should Libraries Be the Keepers of Their Cities’ Public Data?


Linda Poon at CityLab: “In recent years, dozens of U.S. cities have released pools of public data. It’s an effort to improve transparency and drive innovation, and done well, it can succeed at both: Governments, nonprofits, and app developers alike have eagerly gobbled up that data, hoping to improve everything from road conditions to air quality to food delivery.

But what often gets lost in the conversation is the idea of how public data should be collected, managed, and disseminated so that it serves everyone—rather than just a few residents—and so that people’s privacy and data rights are protected. That’s where librarians come in.

“As far as how private and public data should be handled, there isn’t really a strong model out there,” says Curtis Rogers, communications director for the Urban Libraries Council (ULC), an association of leading libraries across North America. “So to have the library as the local institution that is the most trusted, and to give them that responsibility, is a whole new paradigm for how data could be handled in a local government.”

In fact, librarians have long been advocates of digital inclusion and literacy. That’s why, last month, ULC launched a new initiative to give public libraries a leading role in a future with artificial intelligence. They kicked it off with a working group meeting in Washington, D.C., where representatives from libraries in cities like Baltimore, Toronto, Toledo, and Milwaukee met to exchange ideas on how to achieve that through education and by taking on a larger role in data governance.

It’s a broad initiative, and Rogers says they are still in the beginning stages of determining what that role will ultimately look like. But the group will discuss how data should be organized and managed, hash out the potential risks of artificial intelligence, and eventually develop a field-wide framework for how libraries can help drive equitable public data policies in cities.

Already, individual libraries are involved with their city’s data. Chattanooga Public Library (which wasn’t part of the working group, but is a member of ULC) began hosting the city’s open data portal in 2014, turning a traditionally print-centered institution into a community data hub. Since then, the portal has added more than 280 data sets and garnered hundreds of thousands of page views, according to a report for the 2018 fiscal year….

The Toronto Public Library is also in a unique position because it may soon sit inside one of North America’s “smartest” cities. Last month, the city’s board of trade published a 17-page report titled “BiblioTech,” calling for the library to oversee data governance for all smart city projects.

It’s a grand example of just how big the potential is for public libraries. Ryan says the proposal remains just that at the moment, and there are no details yet on what such a model would even look like. She adds that they were not involved in drafting the proposal, and were only asked to provide feedback. But the library is willing to entertain the idea.

Such ambitions would be a large undertaking in the U.S., however, especially for smaller libraries that are already understaffed and under-resourced. According to ULC’s survey of its members, only 23 percent of respondents said they have a staff person designated as the AI lead. A little over a quarter said they even have AI-related educational programming, and just 15 percent report being part of any local or national initiative.

Debbie Rabina, a professor of library science at Pratt Institute in New York, also cautions that putting libraries in charge of data governance has to be carefully thought out. It’s one thing for libraries to teach data literacy and privacy, and to help cities disseminate data. But to go further than that—to have libraries collecting and owning data and to have them assessing who can and can’t use the data—can lead to ethical conflicts and unintended consequences that could erode the public’s trust….(More)”.

7 things we’ve learned about computer algorithms


Aaron Smith at Pew Research Center: “Algorithms are all around us, using massive stores of data and complex analytics to make decisions with often significant impacts on humans – from choosing the content people see on social media to judging whether a person is a good credit risk or job candidate. Pew Research Center released several reports in 2018 that explored the role and meaning of algorithms in people’s lives today. Here are some of the key themes that emerged from that research.

  1. Algorithmically generated content platforms play a prominent role in Americans’ information diets. Sizable shares of U.S. adults now get news on sites like Facebook or YouTube that use algorithms to curate the content they show to their users. A study by the Center found that 81% of YouTube users say they at least occasionally watch the videos suggested by the platform’s recommendation algorithm, and that these recommendations encourage users to watch progressively longer content as they click through the videos suggested by the site.
  2. The inner workings of even the most common algorithms can be confusing to users. Facebook is among the most popular social media platforms, but roughly half of Facebook users – including six-in-ten users ages 50 and older – say they do not understand how the site’s algorithmically generated news feed selects which posts to show them. And around three-quarters of Facebook users are not aware that the site automatically estimates their interests and preferences based on their online behaviors in order to deliver them targeted advertisements and other content.
  3. The public is wary of computer algorithms being used to make decisions with real-world consequences. The public expresses widespread concern about companies and other institutions using computer algorithms in situations with potential impacts on people’s lives. More than half (56%) of U.S. adults think it is unacceptable to use automated criminal risk scores when evaluating people who are up for parole. And 68% think it is unacceptable for companies to collect large quantities of data about individuals for the purposes of offering them deals or other financial incentives. When asked to elaborate about their worries, many feel that these programs violate people’s privacy, are unfair, or simply will not work as well as decisions made by humans….(More)”.

Achieving Digital Permanence


Raymond Blum with Betsy Beyer at ACM Queue: “Digital permanence has become a prevalent issue in society. This article focuses on the forces behind it and some of the techniques to achieve a desired state in which “what you read is what was written.” While techniques that can be imposed as layers above basic data stores—blockchains, for example—are valid approaches to achieving a system’s information assurance guarantees, this article won’t discuss them.

First, let’s define digital permanence and the more basic concept of data integrity.

Data integrity is the maintenance of the accuracy and consistency of stored information. Accuracy means that the data is stored as the set of values that were intended. Consistency means that these stored values remain the same over time—they do not unintentionally waver or morph as time passes.

Digital permanence refers to the techniques used to anticipate and then meet the expected lifetime of data stored in digital media. Digital permanence not only considers data integrity, but also targets guarantees of relevance and accessibility: the ability to recall stored data and to recall it with predicted latency and at a rate acceptable to the applications that require that information.

To illustrate the aspects of relevance and accessibility, consider two counterexamples: journals that were safely stored redundantly on Zip drives or punch cards may as well not exist if the hardware required to read the media into a current computing system isn’t available. Nor is it very useful to have receipts and ledgers stored on a tape medium that will take eight days to read in when you need the information for an audit on Thursday.

The Multiple Facets of Digital Permanence

Human memory is the most subjective record imaginable. Common adages and clichés such as “He said, she said,” “IIRC (If I remember correctly),” and “You might recall” recognize the truth of memories—that they are based only on fragments of the one-time subjective perception of any objective state of affairs. What’s more, research indicates that people alter their memories over time. Over the years, as the need to provide a common ground for actions based on past transactions arises, so does the need for an objective record of fact—an independent “true” past. These records must be both immutable to a reasonable degree and durable. Media such as clay tablets, parchment, photographic prints, and microfiche became popular because they satisfied the “write once, read many” requirement of society’s record keepers.

Information storage in the digital age has evolved to fit the scale of access (frequent) and volume (high) by moving to storage media that record and deliver information in an almost intangible state. Such media have distinct advantages: electrical impulses and the polarity of magnetized ferric compounds can be moved around at great speed and density. These media, unfortunately, also score higher in another measure: fragility. Paper and clay can survive large amounts of neglect and punishment, but a stray electromagnetic discharge or microscopic rupture can render a digital library inaccessible or unrecognizable.

It stands to reason that storing permanent records in some immutable and indestructible medium would be ideal—something that, once altered to encode information, could never be altered again, either by an overwrite or destruction. Experience shows that such ideals are rarely realized; with enough force and will, the hardest stone can be broken and the most permanent markings defaced.

In considering and ensuring digital permanence, you want to guard against two different failures: the destruction of the storage medium, and a loss of the integrity or “truthfulness” of the records….(More)”.

Decoding Algorithms


Macalester College: “Ada Lovelace probably didn’t foresee the impact of the mathematical formula she published in 1843, now considered the first computer algorithm.

Nor could she have anticipated today’s widespread use of algorithms, in applications as different as the 2016 U.S. presidential campaign and Mac’s first-year seminar registration. “Over the last decade algorithms have become embedded in every aspect of our lives,” says Shilad Sen, professor in Macalester’s Math, Statistics, and Computer Science (MSCS) Department.

How do algorithms shape our society? Why is it important to be aware of them? And for readers who don’t know, what is an algorithm, anyway?…(More)”.

Leveraging and Sharing Data for Urban Flourishing


Testimony by Stefaan Verhulst before New York City Council Committee on Technology and the Commission on Public Information and Communication (COPIC): “We live in challenging times. From climate change to economic inequality, the difficulties confronting New York City, its citizens, and decision-makers are unprecedented in their variety, and also in their complexity and urgency. Our standard policy toolkit increasingly seems stale and ineffective. Existing governance institutions and mechanisms seem outdated and distrusted by large sections of the population.

To tackle today’s problems we need not only new solutions but also new methods for arriving at solutions. Data can play a central role in this task. Access to and the use of data in a trusted and responsible manner is central to meeting the challenges we face and enabling public innovation.

This hearing, called by the Technology Committee and the Commission on Public Information and Communication, is therefore timely and very important. It is my firm belief that rapid progress on developing an effective data sharing framework is among the most important steps our New York City leaders can take to tackle the myriad of 21st-century challenges....

I am joined today by some of my distinguished NYU colleagues, Prof. Julia Lane and Prof. Julia Stoyanovich, who have worked extensively on the technical and privacy challenges associated with data sharing. I will, therefore, avoid duplicating our testimonies and won’t focus on issues of privacy, trust and how to establish a responsible data sharing infrastructure, while these are central considerations for the type of data-driven approaches I will discuss. I am, of course, happy to elaborate on these topics during the question and answer session.

Instead, I want to focus on four core issues associated with data collaboration. I phrase these issues as answers to four questions. For each of these questions, I also provide a set of recommended actions that this Committee could consider undertaking or studying.

The four core questions are:

  • First, why should NYC care about data and data sharing?
  • Second, if you build a data-sharing framework, will they come?
  • Third, how can we best engage the private sector when it comes to sharing and using their data?
  • And fourth, is technology the main (or best) answer?…(More)”.