DATA – Page 321 – The Living Library

Assessing the Legitimacy of “Open” and “Closed” Data Partnerships for Sustainable Development

Curated on February 13, 2019 by Stefaan Verhulst

Paper by Andreas Rasche, Mette Morsing and Erik Wetter in Business and Society: “This article examines the legitimacy attached to different types of multi-stakeholder data partnerships occurring in the context of sustainable development. We develop a framework to assess the democratic legitimacy of two types of data partnerships: open data partnerships (where data and insights are mainly freely available) and closed data partnerships (where data and insights are mainly shared within a network of organizations). Our framework specifies criteria for assessing the legitimacy of relevant partnerships with regard to their input legitimacy as well as their output legitimacy. We demonstrate which particular characteristics of open and closed partnerships can be expected to influence an analysis of their input and output legitimacy….(More)”.

Evaluation and accurate diagnoses of pediatric diseases using artificial intelligence

Curated on February 12, 2019 by Stefaan Verhulst

Paper by Huimin Xia et al. in Nature Medicine: “Artificial intelligence (AI)-based methods have emerged as powerful tools to transform medical care. Although machine learning classifiers (MLCs) have already demonstrated strong performance in image-based diagnoses, analysis of diverse and massive electronic health record (EHR) data remains challenging. Here, we show that MLCs can query EHRs in a manner similar to the hypothetico-deductive reasoning used by physicians and unearth associations that previous statistical methods have not found. Our model applies an automated natural language processing system using deep learning techniques to extract clinically relevant information from EHRs. In total, 101.6 million data points from 1,362,559 pediatric patient visits presenting to a major referral center were analyzed to train and validate the framework.

Our model demonstrates high diagnostic accuracy across multiple organ systems and is comparable to experienced pediatricians in diagnosing common childhood diseases. Our study provides a proof of concept for implementing an AI-based system as a means to aid physicians in tackling large amounts of data, augmenting diagnostic evaluations, and to provide clinical decision support in cases of diagnostic uncertainty or complexity. Although this impact may be most evident in areas where healthcare providers are in relative shortage, the benefits of such an AI system are likely to be universal….(More)”.

Show me the Data! A Systematic Mapping on Open Government Data Visualization

Curated on February 10, 2019 (updated February 13, 2019) by Stefaan Verhulst

Paper by André Eberhardt and Milene Selbach Silveira: “During the last years many government organizations have adopted Open Government Data policies to make their data publicly available. Although governments are having success on publishing their data, the availability of the datasets is not enough to people to make use of it due to lack of technical expertise such as programming skills and knowledge on data management. In this scenario, Visualization Techniques can be applied to Open Government Data in order to help to solve this problem.

In this sense, we analyzed previously published papers related to Open Government Data Visualization in order to provide an overview about how visualization techniques are being applied to Open Government Data and which are the most common challenges when dealing with it. A systematic mapping study was conducted to survey the papers that were published in this area. The study found 775 papers and, after applying all inclusion and exclusion criteria, 32 papers were selected. Among other results, we found that datasets related to transportation are the main ones being used and Map is the most used visualization technique. Finally, we report that data quality is the main challenge being reported by studies that applied visualization techniques to Open Government Data…(More)”.

Urban Computing

Curated on February 10, 2019 (updated February 14, 2019) by Stefaan Verhulst

Book by Yu Zheng: “…Urban computing brings powerful computational techniques to bear on such urban challenges as pollution, energy consumption, and traffic congestion. Using today’s large-scale computing infrastructure and data gathered from sensing technologies, urban computing combines computer science with urban planning, transportation, environmental science, sociology, and other areas of urban studies, tackling specific problems with concrete methodologies in a data-centric computing framework. This authoritative treatment of urban computing offers an overview of the field, fundamental techniques, advanced models, and novel applications.

Each chapter acts as a tutorial that introduces readers to an important aspect of urban computing, with references to relevant research. The book outlines key concepts, sources of data, and typical applications; describes four paradigms of urban sensing in sensor-centric and human-centric categories; introduces data management for spatial and spatio-temporal data, from basic indexing and retrieval algorithms to cloud computing platforms; and covers beginning and advanced topics in mining knowledge from urban big data, beginning with fundamental data mining algorithms and progressing to advanced machine learning techniques. Urban Computing provides students, researchers, and application developers with an essential handbook to an evolving interdisciplinary field….(More)”

This is how AI bias really happens—and why it’s so hard to fix

Curated on February 7, 2019 by Stefaan Verhulst

Karen Hao at MIT Technology Review: “Over the past few months, we’ve documented how the vast majority of AI’s applications today are based on the category of algorithms known as deep learning, and how deep-learning algorithms find patterns in data. We’ve also covered how these technologies affect people’s lives: how they can perpetuate injustice in hiring, retail, and security and may already be doing so in the criminal legal system.

But it’s not enough just to know that this bias exists. If we want to be able to fix it, we need to understand the mechanics of how it arises in the first place.

How AI bias happens

We often shorthand our explanation of AI bias by blaming it on biased training data. The reality is more nuanced: bias can creep in long before the data is collected as well as at many other stages of the deep-learning process. For the purposes of this discussion, we’ll focus on three key stages.

Framing the problem. The first thing computer scientists do when they create a deep-learning model is decide what they actually want it to achieve. A credit card company, for example, might want to predict a customer’s creditworthiness, but “creditworthiness” is a rather nebulous concept. In order to translate it into something that can be computed, the company must decide whether it wants to, say, maximize its profit margins or maximize the number of loans that get repaid. It could then define creditworthiness within the context of that goal. The problem is that “those decisions are made for various business reasons other than fairness or discrimination,” explains Solon Barocas, an assistant professor at Cornell University who specializes in fairness in machine learning. If the algorithm discovered that giving out subprime loans was an effective way to maximize profit, it would end up engaging in predatory behavior even if that wasn’t the company’s intention.

Collecting the data. There are two main ways that bias shows up in training data: either the data you collect is unrepresentative of reality, or it reflects existing prejudices. The first case might occur, for example, if a deep-learning algorithm is fed more photos of light-skinned faces than dark-skinned faces. The resulting face recognition system would inevitably be worse at recognizing darker-skinned faces. The second case is precisely what happened when Amazon discovered that its internal recruiting tool was dismissing female candidates. Because it was trained on historical hiring decisions, which favored men over women, it learned to do the same.

Preparing the data. Finally, it is possible to introduce bias during the data preparation stage, which involves selecting which attributes you want the algorithm to consider. (This is not to be confused with the problem-framing stage. You can use the same attributes to train a model for very different goals or use very different attributes to train a model for the same goal.) In the case of modeling creditworthiness, an “attribute” could be the customer’s age, income, or number of paid-off loans. In the case of Amazon’s recruiting tool, an “attribute” could be the candidate’s gender, education level, or years of experience. This is what people often call the “art” of deep learning: choosing which attributes to consider or ignore can significantly influence your model’s prediction accuracy. But while its impact on accuracy is easy to measure, its impact on the model’s bias is not.

Why AI bias is hard to fix

Given that context, some of the challenges of mitigating bias may already be apparent to you. Here we highlight four main ones….(More)”
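
The second data-collection case in the excerpt above (training data that reflects existing prejudices) can be illustrated with a toy sketch. This is not the method of any system named in the article: the records, the `fit_hire_rates` and `predict` helpers, and the 0.5 threshold are all hypothetical, invented only to show how a model fitted to skewed historical decisions reproduces the skew.

```python
from collections import defaultdict

# Hypothetical historical hiring records: (gender, years_experience, hired).
# The values are invented for illustration. Experience is identical across
# the two groups, so any disparity comes from the past decisions alone.
HISTORY = [
    ("m", 5, True), ("m", 2, True), ("m", 1, False), ("m", 4, True),
    ("f", 5, True), ("f", 2, False), ("f", 1, False), ("f", 4, False),
]


def fit_hire_rates(records):
    """Learn the historical hire rate per gender (a deliberately naive 'model')."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for gender, _years, was_hired in records:
        total[gender] += 1
        hired[gender] += int(was_hired)
    return {g: hired[g] / total[g] for g in total}


def predict(rates, gender, threshold=0.5):
    """Recommend a candidate when the learned rate for their group meets the threshold."""
    return rates[gender] >= threshold


rates = fit_hire_rates(HISTORY)
print(rates)                                     # learned hire rate per group
print(predict(rates, "m"), predict(rates, "f"))  # recommendation per group
```

Both groups in this toy data have the same experience profile, yet the fitted rates differ because the past decisions differ, and the predictions inherit that gap. Real systems are far more complex, but the mechanism the excerpt describes is the same.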

Fact-Based Policy: How Do State and Local Governments Accomplish It?

Curated on February 7, 2019 by Stefaan Verhulst

Report and Proposal by Justine Hastings: “Fact-based policy is essential to making government more effective and more efficient, and many states could benefit from more extensive use of data and evidence when making policy. Private companies have taken advantage of declining computing costs and vast data resources to solve problems in a fact-based way, but state and local governments have not made as much progress….

Drawing on her experience in Rhode Island, Hastings proposes that states build secure, comprehensive, integrated databases, and that they transform those databases into data lakes that are optimized for developing insights. Policymakers can then use the insights from this work to sharpen policy goals, create policy solutions, and measure progress against those goals. Policymakers, computer scientists, engineers, and economists will work together to build the data lake and analyze the data to generate policy insights….(More)”.

Hundreds of Bounty Hunters Had Access to AT&T, T-Mobile, and Sprint Customer Location Data for Years

Curated on February 7, 2019 by Stefaan Verhulst

Joseph Cox at Motherboard: “In January, Motherboard revealed that AT&T, T-Mobile, and Sprint were selling their customers’ real-time location data, which trickled down through a complex network of companies until eventually ending up in the hands of at least one bounty hunter. Motherboard was also able to purchase the real-time location of a T-Mobile phone on the black market from a bounty hunter source for $300. In response, telecom companies said that this abuse was a fringe case.

In reality, it was far from an isolated incident.

Around 250 bounty hunters and related businesses had access to AT&T, T-Mobile, and Sprint customer location data, with one bail bond firm using the phone location service more than 18,000 times, and others using it thousands or tens of thousands of times, according to internal documents obtained by Motherboard from a company called CerCareOne, a now-defunct location data seller that operated until 2017. The documents list not only the companies that had access to the data, but specific phone numbers that were pinged by those companies.

In some cases, the data sold is more sensitive than that offered by the service used by Motherboard last month, which estimated a location based on the cell phone towers that a phone connected to. CerCareOne sold cell phone tower data, but also sold highly sensitive and accurate GPS data to bounty hunters; an unprecedented move that means users could locate someone so accurately so as to see where they are inside a building. This company operated in near-total secrecy for over 5 years by making its customers agree to “keep the existence of CerCareOne.com confidential,” according to a terms of use document obtained by Motherboard.

Some of these bounty hunters then resold location data to those unauthorized to handle it, according to two independent sources familiar with CerCareOne’s operations.

The news shows how widely available Americans’ sensitive location data was to bounty hunters. This ease-of-access dramatically increased the risk of abuse….(More)”.

Artificial Intelligence and National Security

Curated on February 7, 2019 by Stefaan Verhulst

Report by Congressional Research Service: “Artificial intelligence (AI) is a rapidly growing field of technology with potentially significant implications for national security. As such, the U.S. Department of Defense (DOD) and other nations are developing AI applications for a range of military functions. AI research is underway in the fields of intelligence collection and analysis, logistics, cyber operations, information operations, command and control, and in a variety of semi-autonomous and autonomous vehicles.

Already, AI has been incorporated into military operations in Iraq and Syria. Congressional action has the potential to shape the technology’s development further, with budgetary and legislative decisions influencing the growth of military applications as well as the pace of their adoption.

AI technologies present unique challenges for military integration, particularly because the bulk of AI development is happening in the commercial sector. Although AI is not unique in this regard, the defense acquisition process may need to be adapted for acquiring emerging technologies like AI.

In addition, many commercial AI applications must undergo significant modification prior to being functional for the military. A number of cultural issues also challenge AI acquisition, as some commercial AI companies are averse to partnering with DOD due to ethical concerns, and even within the department, there can be resistance to incorporating AI technology into existing weapons systems and processes.

Potential international rivals in the AI market are creating pressure for the United States to compete for innovative military AI applications. China is a leading competitor in this regard, releasing a plan in 2017 to capture the global lead in AI development by 2030. Currently, China is primarily focused on using AI to make faster and more well-informed decisions, as well as on developing a variety of autonomous military vehicles. Russia is also active in military AI development, with a primary focus on robotics. Although AI has the potential to impart a number of advantages in the military context, it may also introduce distinct challenges.

AI technology could, for example, facilitate autonomous operations, lead to more informed military decisionmaking, and increase the speed and scale of military action. However, it may also be unpredictable or vulnerable to unique forms of manipulation. As a result of these factors, analysts hold a broad range of opinions on how influential AI will be in future combat operations.

While a small number of analysts believe that the technology will have minimal impact, most believe that AI will have at least an evolutionary—if not revolutionary—effect….(More)”.

Smart Contracts and Their Identity Crisis

Curated on February 7, 2019 by Stefaan Verhulst

Paper by Alvaro Gonzalez Rivas, Mariya Tsyganova and Eliza Mik: “Many expect Smart Contracts (SC’s) to disrupt the way contracts are done implying that SC have the potential to affect all commercial relationships. SC’s are automatization tools; therefore, proponents claim that SC’s can reduce transaction costs through disintermediation and risk reduction.

This is an over-simplification of the role of relationships, contract law, and risk. We believe there is a gap in the understanding of the capabilities of SC’s. With that in mind we seek to define an amorphous term and clarify the capabilities of SC’s, intending to facilitate future SC research. We’ve examined the legal, technical, and IS views from an academic and practitioner’s perspective. We conclude that SC’s have taken many forms, becoming a suitcase word for any sort of code stored on a blockchain, including the embodiment of contractual terms; and that the immutable nature of SC’s is a barrier to their adoption in uncertain and multi-contextual environments….(More)”.

Using Personal Informatics Data in Collaboration among People with Different Expertise

Curated on February 6, 2019 by Stefaan Verhulst

Dissertation by Chia-Fang Chung: “Many people collect and analyze data about themselves to improve their health and wellbeing. With the prevalence of smartphones and wearable sensors, people are able to collect detailed and complex data about their everyday behaviors, such as diet, exercise, and sleep. This everyday behavioral data can support individual health goals, help manage health conditions, and complement traditional medical examinations conducted in clinical visits. However, people often need support to interpret this self-tracked data. For example, many people share their data with health experts, hoping to use this data to support more personalized diagnosis and recommendations as well as to receive emotional support. However, when attempting to use this data in collaborations, people and their health experts often struggle to make sense of the data. My dissertation examines how to support collaborations between individuals and health experts using personal informatics data.

My research builds an empirical understanding of individual and collaboration goals around using personal informatics data, current practices of using this data to support collaboration, and challenges and expectations for integrating the use of this data into clinical workflows. These understandings help designers and researchers advance the design of personal informatics systems as well as the theoretical understandings of patient-provider collaboration.

Based on my formative work, I propose design and theoretical considerations regarding interactions between individuals and health experts mediated by personal informatics data. System designers and personal informatics researchers need to consider collaborations occurred throughout the personal tracking process. Patient-provider collaboration might influence individual decisions to track and to review, and systems supporting this collaboration need to consider individual and collaborative goals as well as support communication around these goals. Designers and researchers should also attend to individual privacy needs when personal informatics data is shared and used across different healthcare contexts. With these design guidelines in mind, I design and develop Foodprint, a photo-based food diary and visualization system. I also conduct field evaluations to understand the use of lightweight data collection and integration to support collaboration around personal informatics data. Findings from these field deployments indicate that photo-based visualizations allow both participants and health experts to easily understand eating patterns relevant to individual health goals. Participants and health experts can then focus on individual health goals and questions, exchange knowledge to support individualized diagnoses and recommendations, and develop actionable and feasible plans to accommodate individual routines….(More)”.