Tomorrow’s Data Heroes


Article by Florian Gröne, Pierre Péladeau, and Rawia Abdel Samad: “Telecom companies are struggling to find a profitable identity in today’s digital sphere. What about helping customers control their information?…

By 2025, Alex had had enough. There no longer seemed to be any distinction between her analog and digital lives. Everywhere she went, every purchase she completed, and just about every move she made, from exercising at the gym to idly surfing the Web, triggered a vast flow of data. That in turn meant she was bombarded with personalized advertising messages, targeted more and more eerily to her. As she walked down the street, messages appeared on her phone about the stores she was passing. Ads popped up on her all-purpose tablet–computer–phone pushing drugs for minor health problems she didn’t know she had — until the symptoms appeared the next day. Worse, she had recently learned that she was being reassigned at work. An AI machine had mastered her current job by analyzing her use of the firm’s productivity software.

It was as if the algorithms of global companies knew more about her than she knew herself — and they probably did. How was it that her every action and conversation, even her thoughts, added to the store of data held about her? After all, it was her data: her preferences, dislikes, interests, friendships, consumer choices, activities, and whereabouts — her very identity — that was being collected, analyzed, profited from, and even used to manage her. All these companies seemed to be making money buying and selling this information. Why shouldn’t she gain some control over the data she generated, and maybe earn some cash by selling it to the companies that had long collected it free of charge?

So Alex signed up for the “personal data manager,” a new service that promised to give her control over her privacy and identity. It was offered by her U.S.-based connectivity company (in this article, we’ll call it DigiLife, but it could be one of many former telephone companies providing Internet services in 2025). During the previous few years, DigiLife had transformed itself into a connectivity hub: a platform that made it easier for customers to join, manage, and track interactions with media and software entities across the online world. Thanks to recently passed laws regarding digital identity and data management, including the “right to be forgotten,” the DigiLife data manager was more than window dressing. It laid out easy-to-follow choices that all Web-based service providers were required by law to honor….

Today, in 2019, personal data management applications like the one Alex used exist only in nascent form, and consumers have yet to demonstrate that they trust these services. Nor can they yet profit by selling their data. But the need is great, and so is the opportunity for companies that fulfill it. By 2025, the total value of the data economy as currently structured will rise to more than US$400 billion, and by monetizing the vast amounts of data they produce, consumers can potentially recapture as much as a quarter of that total.

Given the critical role of telecom operating companies within the digital economy — the central position of their data networks, their networking capabilities, their customer relationships, and their experience in government affairs — they are in a good position to seize this business opportunity. They might not do it alone; they are likely to form consortia with software companies or other digital partners. Nonetheless, for legacy connectivity companies, providing this type of service may be the most sustainable business option. It may also be the best option for the rest of us, as we try to maintain control in a digital world flooded with our personal data….(More)”.

Governance of artificial intelligence and personal health information


Jenifer Sunrise Winter in Digital Policy, Regulation and Governance: “This paper aims to assess the increasing challenges to governing the personal health information (PHI) essential for advancing artificial intelligence (AI) machine learning innovations in health care. Risks to privacy and justice/equity are discussed, along with potential solutions….

This paper argues that these characteristics of machine learning will overwhelm existing data governance approaches such as privacy regulation and informed consent. Enhanced governance techniques and tools will be required to help preserve the autonomy and rights of individuals to control their PHI. Debate among all stakeholders and informed critique of how, and for whom, PHI-fueled health AI are developed and deployed are needed to channel these innovations in societally beneficial directions.

Health data may be used to address pressing societal concerns, such as operational and system-level improvement, and innovations such as personalized medicine. This paper informs work seeking to harness these resources for societal good amidst many competing value claims and substantial risks for privacy and security….(More)”.

The Role of Big Data Analytics in Predicting Suicide


Chapter by Ronald C. Kessler et al: “…reviews the long history of using electronic medical records and other types of big data to predict suicide. Although a number of the most recent of these studies used machine learning (ML) methods, these studies were all suboptimal both in the features used as predictors and in the analytic approaches used to develop the prediction models. We review these limitations and describe opportunities for making improvements in future applications.

We also review the controversy among clinical experts about using structured suicide risk assessment tools (be they based on ML or older prediction methods) versus in-depth clinical evaluations of needs for treatment planning. Rather than seeing them as competitors, we propose integrating these different approaches to capitalize on their complementary strengths. We also emphasize the distinction between two types of ML analyses: those aimed at predicting which patients are at highest suicide risk, and those aimed at predicting the treatment options that will be best for individual patients. We explain why both are needed to optimize the value of big data ML methods in addressing the suicide problem….(More)”.

See also How Search Engine Data Enhance the Understanding of Determinants of Suicide in India and Inform Prevention: Observational Study.
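To make the chapter’s distinction concrete, here is a purely illustrative sketch contrasting the two analyses on synthetic data: (1) a risk-stratification model that flags the highest-scoring patients, and (2) a simple T-learner that compares each patient’s predicted outcomes under two treatment options. The features, models, and numbers are assumptions for illustration, not anything from the chapter’s studies.

```python
# Illustrative only: synthetic stand-ins for EHR features and outcomes.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 10))  # stand-ins for EHR features (prior attempts, diagnoses, ...)
p_risk = 1 / (1 + np.exp(-(1.5 * X[:, 0] + 0.8 * X[:, 1] - 1.0)))
y = rng.binomial(1, p_risk)   # 1 = suicide-related outcome during follow-up

# (1) Risk prediction: flag the patients at highest predicted risk for review.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
risk_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
scores = risk_model.predict_proba(X_te)[:, 1]
flagged = scores >= np.quantile(scores, 0.9)  # top decile of predicted risk

# (2) Treatment selection: a simple T-learner comparing predicted outcomes
# under two treatment options (assumes randomized assignment in the data).
t = rng.binomial(1, 0.5, size=n)   # 0 = option A, 1 = option B
benefit = 0.1 * X[:, 2]            # option B helps some patients more than others
y_out = rng.binomial(1, np.clip(p_risk - t * benefit, 0, 1))
m0 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X[t == 0], y_out[t == 0])
m1 = RandomForestClassifier(n_estimators=200, random_state=1).fit(X[t == 1], y_out[t == 1])
# Recommend, per patient, the option with the lower predicted adverse outcome.
recommend_B = m0.predict_proba(X_te)[:, 1] > m1.predict_proba(X_te)[:, 1]
```

As the chapter argues, the two analyses answer different questions: the first ranks patients for scarce clinical attention, while the second informs what to do for a given patient once flagged.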

Open-Data: A Solution When Data Constitutes an Essential Facility?


Chapter by Claire Borsenberger, Mathilde Hoang and Denis Joram: “Thanks to appropriate data algorithms, firms, especially those on-line, are able to extract detailed knowledge about consumers and markets. This raises the question of the essential facility character of data. Moreover, the features of digital markets lead to a concentration of this core input in the hands of few big “superstars” and arouse legitimate economic and societal concerns. In a more and more data-driven society, one could ask if data openness is a solution to deal with power derived from data concentration. We conclude that only a case-by-case approach should be followed. Mandatory open data policy should be conditioned on an ex-ante cost-benefit analysis proving that the benefits of disclosure exceed its costs….(More)”.

The Lancet Countdown: Tracking progress on health and climate change using data from the International Energy Agency (IEA)


Victoria Moody at the UK Data Service: “The 2015 Lancet Commission on Health and Climate Change—which assessed responses to climate change with a view to ensuring the highest attainable standards of health for populations worldwide—concluded that “tackling climate change could be the greatest global health opportunity of the 21st century”. The Commission recommended that more accurate national quantification of the health co-benefits and economic impacts of mitigation decisions was essential in promoting a low-carbon transition.

Building on these foundations, the Lancet Countdown: tracking progress on health and climate change was formed as an independent research collaboration…

The partnership comprises 24 academic institutions from every continent, bringing together individuals with a broad range of expertise across disciplines (including climate scientists, ecologists, mathematicians, geographers, engineers, energy, food, and transport experts, economists, social and political scientists, public health professionals, and physicians).

Four of the indicators developed for Working Group 3 (Mitigation actions and health co-benefits) use International Energy Agency (IEA) data made available by the IEA via the UK Data Service for use by researchers, learners and teaching staff in UK higher and further education. Additionally, two of the indicators developed for Working Group 4 (Finance and economics) also use IEA data.

Read our impact case study to find out more about the impact and reach of the Lancet Countdown, watch the YouTube film below, or read the Lancet Countdown 2018 Report…(More)”.

https://web.archive.org/web/2000/https://www.youtube.com/watch?v=moYzcYNX1iM

Claudette: an automated detector of potentially unfair clauses in online terms of service


Marco Lippi et al. in Artificial Intelligence and Law: “Terms of service of on-line platforms too often contain clauses that are potentially unfair to the consumer. We present an experimental study where machine learning is employed to automatically detect such potentially unfair clauses. Results show that the proposed system could provide a valuable tool for lawyers and consumers alike….(More)”.
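As a rough sketch of the kind of system the paper describes, the snippet below trains a clause-level classifier; the toy training clauses and the TF-IDF-plus-linear-SVM pipeline are illustrative assumptions, a minimal baseline in the same spirit rather than the paper’s corpus or best-performing model.

```python
# Minimal sketch of clause-level "potentially unfair" detection.
# The hand-written training set below is invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

train_clauses = [
    "We may terminate your account at any time without notice.",
    "Any dispute shall be resolved exclusively by arbitration.",
    "We are not liable for any damages arising from use of the service.",
    "We may change these terms at any time without notifying you.",
    "You can cancel your subscription at any time from your settings.",
    "We will notify you by email before changes take effect.",
    "Refunds are available within 30 days of purchase.",
    "Your data is stored in encrypted form.",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = potentially unfair, 0 = fair

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(train_clauses, labels)

new_clause = "We reserve the right to suspend the service without prior notice."
print(clf.predict([new_clause]))  # e.g. [1]: flag the clause for a lawyer's review
```

In practice such a tool would be trained on thousands of expert-annotated clauses and tuned for high recall, since missing an unfair clause costs more than flagging a fair one for human review.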

Responsible AI for conservation


Oliver Wearn, Robin Freeman, and David Jacoby in Nature Machine Intelligence: “Machine learning (ML) is revolutionizing efforts to conserve nature. ML algorithms are being applied to predict the extinction risk of thousands of species, assess the global footprint of fisheries, and identify animals and humans in wildlife sensor data recorded in the field. These efforts have recently been given a huge boost with support from the commercial sector. New initiatives, such as Microsoft’s AI for Earth and Google’s AI for Social Good, are bringing new resources and new ML tools to bear on some of the biggest challenges in conservation. In parallel to this, the open data revolution means that global-scale, conservation-relevant datasets can be fed directly to ML algorithms from open data repositories, such as Google Earth Engine for satellite data or Movebank for animal tracking data. Added to these will be Wildlife Insights, a Google-supported platform for hosting and analysing wildlife sensor data that launches this year. With new tools and a proliferation of data comes a bounty of new opportunities, but also new responsibilities….(More)”
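As a toy illustration of one task mentioned above, predicting species’ extinction risk from trait data, here is a hedged sketch on synthetic inputs; the traits, risk categories, and model are invented for illustration and do not correspond to any published conservation pipeline.

```python
# Illustrative only: synthetic species traits and invented risk categories.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
n_species = 2000
# Synthetic traits: body mass (log kg), range size (log km^2), litter size, ...
X = rng.normal(size=(n_species, 6))
# Toy ground truth: small ranges and large bodies raise risk.
risk_score = -1.2 * X[:, 1] + 0.6 * X[:, 0] + rng.normal(scale=0.5, size=n_species)
y = np.digitize(risk_score, bins=[-1.0, 1.0])  # 0=Least Concern, 1=Vulnerable, 2=Endangered

model = GradientBoostingClassifier()
print(cross_val_score(model, X, y, cv=5).mean())  # rough accuracy on the toy task
```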

Weather Service prepares to launch prediction model many forecasters don’t trust


Jason Samenow in the Washington Post: “In a month, the National Weather Service plans to launch its “next generation” weather prediction model with the aim of “better, more timely forecasts.” But many meteorologists familiar with the model fear it is unreliable.

The introduction of a model that forecasters lack confidence in matters, considering the enormous impact that weather has on the economy, valued at around $485 billion annually.

The Weather Service announced Wednesday that the model, known as the GFS-FV3 (FV3 stands for Finite-Volume Cubed-Sphere dynamical core), is “tentatively” set to become the United States’ primary forecast model on March 20, pending tests. It is an update to the current version of the GFS (Global Forecast System), popularly known as the American model, which has existed in various forms for more than 30 years….

A concern is that if forecasters cannot rely on the FV3, they will be left to rely only on the European model for their predictions without a credible alternative for comparisons. And they’ll also have to pay large fees for the European model data. Whereas model data from the Weather Service is free, the European Center for Medium-Range Weather Forecasts, which produces the European model, charges for access.

But there is an alternative perspective, which is that forecasters will just need to adjust to the new model and learn to account for its biases. That is, a little short-term pain is worth the long-term potential benefits as the model improves….

The Weather Service’s parent agency, the National Oceanic and Atmospheric Administration, recently entered an agreement with the National Center for Atmospheric Research to increase collaboration between forecasters and researchers in improving forecast modeling.

In addition, President Trump recently signed into law the Weather Research and Forecast Innovation Act Reauthorization, which establishes the NOAA Earth Prediction Innovation Center, aimed at further enhancing prediction capabilities. But even while NOAA develops relationships and infrastructure to improve the Weather Service’s modeling, the question remains whether the FV3 can meet the forecasting needs of the moment. Until the problems identified are addressed, its introduction could represent a step back in U.S. weather prediction despite a well-intended effort to leap forward….(More)”.

Dirty Data, Bad Predictions: How Civil Rights Violations Impact Police Data, Predictive Policing Systems, and Justice


Paper by Rashida Richardson, Jason Schultz, and Kate Crawford: “Law enforcement agencies are increasingly using algorithmic predictive policing systems to forecast criminal activity and allocate police resources. Yet in numerous jurisdictions, these systems are built on data produced within the context of flawed, racially fraught and sometimes unlawful practices (‘dirty policing’). This can include systemic data manipulation, falsifying police reports, unlawful use of force, planted evidence, and unconstitutional searches. These policing practices shape the environment and the methodology by which data is created, which leads to inaccuracies, skews, and forms of systemic bias embedded in the data (‘dirty data’). Predictive policing systems informed by such data cannot escape the legacy of unlawful or biased policing practices that they are built on. Nor do claims by predictive policing vendors that these systems provide greater objectivity, transparency, or accountability hold up. While some systems offer the ability to see the algorithms used and even occasionally access to the data itself, there is no evidence to suggest that vendors independently or adequately assess the impact that unlawful and biased policing practices have on their systems, or otherwise assess how broader societal biases may affect their systems.

In our research, we examine the implications of using dirty data with predictive policing, and look at jurisdictions that (1) have utilized predictive policing systems and (2) have done so while under government commission investigations or federal court monitored settlements, consent decrees, or memoranda of agreement stemming from corrupt, racially biased, or otherwise illegal policing practices. In particular, we examine the link between unlawful and biased police practices and the data used to train or implement these systems across thirteen case studies. We highlight three of these: (1) Chicago, an example where dirty data was ingested directly into the city’s predictive system; (2) New Orleans, an example where the extensive evidence of dirty policing practices suggests an extremely high risk that dirty data was or will be used in any predictive policing application; and (3) Maricopa County, where, despite extensive evidence of dirty policing practices, a lack of transparency and public accountability surrounding predictive policing inhibits the public from assessing the risks of dirty data within such systems. The implications of these findings have widespread ramifications for predictive policing writ large. Deploying predictive policing systems in jurisdictions with extensive histories of unlawful police practices presents elevated risks that dirty data will lead to flawed, biased, and unlawful predictions, which in turn risk perpetuating additional harm via feedback loops throughout the criminal justice system. Thus, for any jurisdiction where police have been found to engage in such practices, the use of predictive policing in any context must be treated with skepticism and mechanisms for the public to examine and reject such systems are imperative….(More)”.
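The feedback loop the authors warn about is easy to see in a toy simulation: two districts with identical true crime rates, where “dirty data” gives one district a head start in recorded incidents, and a naive predictive system sends patrols wherever the records are highest. Everything below, the rates, counts, and allocation rule, is an invented assumption for illustration only.

```python
# Toy simulation of a predictive-policing feedback loop. All numbers invented.
import random

random.seed(0)
true_rate = {"A": 0.10, "B": 0.10}   # identical underlying crime rates
recorded = {"A": 60, "B": 40}        # historical skew from biased recording

patrol_days = {"A": 0, "B": 0}
for day in range(365):
    target = max(recorded, key=recorded.get)  # "predicted hot spot" = most recorded crime
    patrol_days[target] += 1
    # 100 potential incidents per district per day, but only the patrolled
    # district's incidents are observed and added to the records.
    recorded[target] += sum(random.random() < true_rate[target] for _ in range(100))

print(patrol_days)  # e.g. {'A': 365, 'B': 0}: district A absorbs every patrol
print(recorded)     # A's recorded count keeps growing; B's never can
```

Because new incidents are only recorded where officers are sent, district A’s lead compounds even though the two districts are truly identical: the skew in the historical data, not actual crime, drives the allocation.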

Should Libraries Be the Keepers of Their Cities’ Public Data?


Linda Poon at CityLab: “In recent years, dozens of U.S. cities have released pools of public data. It’s an effort to improve transparency and drive innovation, and done well, it can succeed at both: Governments, nonprofits, and app developers alike have eagerly gobbled up that data, hoping to improve everything from road conditions to air quality to food delivery.

But what often gets lost in the conversation is the idea of how public data should be collected, managed, and disseminated so that it serves everyone—rather than just a few residents—and so that people’s privacy and data rights are protected. That’s where librarians come in.

“As far as how private and public data should be handled, there isn’t really a strong model out there,” says Curtis Rogers, communications director for the Urban Libraries Council (ULC), an association of leading libraries across North America. “So to have the library as the local institution that is the most trusted, and to give them that responsibility, is a whole new paradigm for how data could be handled in a local government.”

In fact, librarians have long been advocates of digital inclusion and literacy. That’s why, last month, ULC launched a new initiative to give public libraries a leading role in a future with artificial intelligence. They kicked it off with a working group meeting in Washington, D.C., where representatives from libraries in cities like Baltimore, Toronto, Toledo, and Milwaukee met to exchange ideas on how to achieve that through education and by taking on a larger role in data governance.

It’s a broad initiative, and Rogers says they are still in the beginning stages of determining what that role will ultimately look like. But the group will discuss how data should be organized and managed, hash out the potential risks of artificial intelligence, and eventually develop a field-wide framework for how libraries can help drive equitable public data policies in cities.

Already, individual libraries are involved with their city’s data. Chattanooga Public Library (which wasn’t part of the working group, but is a member of ULC) began hosting the city’s open data portal in 2014, turning a traditionally print-centered institution into a community data hub. Since then, the portal has added more than 280 data sets and garnered hundreds of thousands of page views, according to a report for the 2018 fiscal year….

The Toronto Public Library is also in a unique position because it may soon sit inside one of North America’s “smartest” cities. Last month, the city’s board of trade published a 17-page report titled “BiblioTech,” calling for the library to oversee data governance for all smart city projects.

It’s a grand example of just how big the potential is for public libraries. Ryan says the proposal remains just that at the moment, and there are no details yet on what such a model would even look like. She adds that they were not involved in drafting the proposal, and were only asked to provide feedback. But the library is willing to entertain the idea.

Such ambitions would be a large undertaking in the U.S., however, especially for smaller libraries that are already understaffed and under-resourced. According to ULC’s survey of its members, only 23 percent of respondents said they have a staff person designated as the AI lead. A little over a quarter said they even have AI-related educational programming, and just 15 percent report being part of any local or national initiative.

Debbie Rabina, a professor of library science at Pratt Institute in New York, also cautions that putting libraries in charge of data governance has to be carefully thought out. It’s one thing for libraries to teach data literacy and privacy, and to help cities disseminate data. But to go further than that—to have libraries collecting and owning data and to have them assessing who can and can’t use the data—can lead to ethical conflicts and unintended consequences that could erode the public’s trust….(More)”.