Book edited by Anand J. Kulkarni, Patrick Siarry, Pramod Kumar Singh, Ajith Abraham, Mengjie Zhang, Albert Zomaya and Fazle Baki: “This book includes state-of-the-art discussions on various issues and aspects of the implementation, testing, validation, and application of big data in the context of healthcare. The concept of big data is revolutionary, both from a technological standpoint and in terms of societal well-being. This book provides a comprehensive reference guide for engineers, scientists, and students studying or involved in the development of big data tools in the areas of healthcare and medicine. It also features a multifaceted and state-of-the-art literature review on healthcare data, its modalities, complexities, and methodologies, along with mathematical formulations.
The book is divided into two main sections, the first of which discusses the challenges and opportunities associated with the implementation of big data in the healthcare sector. In turn, the second addresses the mathematical modeling of healthcare problems, as well as current and potential future big data applications and platforms…(More)”.
Book edited by Normann Witzleb, Moira Paterson, and Janice Richardson on “Democracy and Privacy in the Age of Micro-Targeting”…: “In this multidisciplinary book, experts from around the globe examine how data-driven political campaigning works, what challenges it poses for personal privacy and democracy, and how emerging practices should be regulated.
The rise of big data analytics in the political process has triggered official investigations in many countries around the world and has become the subject of broad and intense debate. Political parties increasingly rely on data analytics to profile the electorate and to target specific voter groups with individualised messages based on their demographic attributes. Political micro-targeting has become a major factor in modern campaigning, because of its potential to influence opinions, to mobilise supporters and to get out the vote. The book explores the legal, philosophical and political dimensions of big data analytics in the electoral process. It demonstrates that the unregulated use of big personal data for political purposes not only infringes voters’ privacy rights, but also has the potential to jeopardise the future of the democratic process, and it proposes reforms to address the key regulatory and ethical questions arising from the mining, use and storage of massive amounts of voter data.
Providing an interdisciplinary assessment of the use and regulation of big data in the political process, this book will appeal to scholars from law, political science, political philosophy, and media studies, policy makers and anyone who cares about democracy in the age of data-driven political campaigning….(More)”.
Hannah Fry at The New Yorker: “Harold Eddleston, a seventy-seven-year-old from Greater Manchester, was still reeling from a cancer diagnosis he had been given that week when, on a Saturday morning in February, 1998, he received the worst possible news. He would have to face the future alone: his beloved wife had died unexpectedly, from a heart attack.
Eddleston’s daughter, concerned for his health, called their family doctor, a well-respected local man named Harold Shipman. He came to the house, sat with her father, held his hand, and spoke to him tenderly. Pushed for a prognosis as he left, Shipman replied portentously, “I wouldn’t buy him any Easter eggs.” By Wednesday, Eddleston was dead; Dr. Shipman had murdered him.
Harold Shipman was one of the most prolific serial killers in history. In a twenty-three-year career as a mild-mannered and well-liked family doctor, he injected at least two hundred and fifteen of his patients with lethal doses of opiates. He was finally arrested in September, 1998, six months after Eddleston’s death.
David Spiegelhalter, the author of an important and comprehensive new book, “The Art of Statistics” (Basic), was one of the statisticians tasked by the ensuing public inquiry to establish whether the mortality rate of Shipman’s patients should have aroused suspicion earlier. Then a biostatistician at Cambridge, Spiegelhalter found that Shipman’s excess mortality—the number of his older patients who had died in the course of his career, over and above the number that would be expected for an average doctor—was a hundred and seventy-four women and forty-nine men at the time of his arrest. The total closely matched the number of victims confirmed by the inquiry….
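To make the statistical idea concrete: below is a minimal sketch, in Python, of how excess mortality might be estimated and how surprising it would be if the doctor were average. The observed and expected counts are hypothetical, and the Poisson check is only an illustration, not the monitoring methodology Spiegelhalter or the inquiry actually used.

```python
# Minimal illustration (not the inquiry's actual methodology) of "excess
# mortality": compare observed deaths among a doctor's patients with the
# number expected under an average doctor's mortality rate.
# The counts below are hypothetical.
from scipy.stats import poisson

def excess_mortality(observed_deaths: int, expected_deaths: float) -> float:
    """Observed deaths minus the number expected for an average doctor."""
    return observed_deaths - expected_deaths

def surprise(observed_deaths: int, expected_deaths: float) -> float:
    """Probability of seeing at least this many deaths if the doctor were
    average, assuming deaths follow a Poisson distribution (an assumption)."""
    return poisson.sf(observed_deaths - 1, expected_deaths)

# Hypothetical example: 250 observed deaths where 76 would be expected.
obs, exp = 250, 76.0
print(excess_mortality(obs, exp))   # 174.0
print(surprise(obs, exp))           # vanishingly small -> should raise alarm
```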
In 1825, the French Ministry of Justice ordered the creation of a national collection of crime records. It seems to have been the first of its kind anywhere in the world—the statistics of every arrest and conviction in the country, broken down by region, assembled and ready for analysis. It’s the kind of data set we take for granted now, but at the time it was extraordinarily novel. This was an early instance of Big Data—the first time that mathematical analysis had been applied in earnest to the messy and unpredictable realm of human behavior.
Or maybe not so unpredictable. In the early eighteen-thirties, a Belgian astronomer and mathematician named Adolphe Quetelet analyzed the numbers and discovered a remarkable pattern. The crime records were startlingly consistent. Year after year, irrespective of the actions of courts and prisons, the number of murders, rapes, and robberies reached almost exactly the same total. There is a “terrifying exactitude with which crimes reproduce themselves,” Quetelet said. “We know in advance how many individuals will dirty their hands with the blood of others. How many will be forgers, how many poisoners.”
To Quetelet, the evidence suggested that there was something deeper to discover. He developed the idea of a “Social Physics,” and began to explore the possibility that human lives, like planets, had an underlying mechanistic trajectory. There’s something unsettling in the idea that, amid the vagaries of choice, chance, and circumstance, mathematics can tell us something about what it is to be human. Yet Quetelet’s overarching findings still stand: at some level, human life can be quantified and predicted. We can now forecast, with remarkable accuracy, the number of women in Germany who will choose to have a baby each year, the number of car accidents in Canada, the number of plane crashes across the Southern Hemisphere, even the number of people who will visit a New York City emergency room on a Friday evening….(More)”
Sara Fischer and Scott Rosenberg at Axios: “Over the past two years, the U.S. government has tried to rein in how major tech companies use the personal data they’ve gathered on their customers. At the same time, government agencies are themselves seeking to harness those troves of data.
Why it matters: Tech platforms use personal information to target ads, whereas the government can use it to prevent and solve crimes, deliver benefits to citizens — or (illegally) target political dissent.
Driving the news: A new report from the Wall Street Journal details the ways in which family DNA testing sites like FamilyTreeDNA are pressured by the FBI to hand over customer data to help solve criminal cases using DNA.
The trend has privacy experts worried about the potential implications of the government having access to large pools of genetic data, even though many people whose data is included never agreed to its use for that purpose.
The FBI has particular interest in data from genetic and social media sites, because it could help solve crimes and protect the public.
For example, the FBI is “soliciting proposals from outside vendors for a contract to pull vast quantities of public data” from Facebook, Twitter Inc. and other social media companies, the Wall Street Journal reports.
The request is meant to help the agency surveil social behavior to “mitigate multifaceted threats, while ensuring all privacy and civil liberties compliance requirements are met.”
Meanwhile, the Trump administration has also urged social media platforms to cooperate with the government in efforts to flag individual users as potential mass shooters.
Other agencies have their eyes on big data troves as well.
Earlier this year, settlement talks between Facebook and the Department of Housing and Urban Development broke down over an advertising discrimination lawsuit when, according to a Facebook spokesperson, HUD “insisted on access to sensitive information — like user data — without adequate safeguards.”
HUD presumably wanted access to the data to ensure advertising discrimination wasn’t occurring on the platform, but it’s unclear whether the agency needed user data to be able to support that investigation….(More)”.
Article by Priceonomics Data Studio: “For all the talk of how data is the new oil and the most valuable resource of any enterprise, there is a deep dark secret companies are reluctant to share — most of the data collected by businesses simply goes unused.
This unknown and unused data, known as dark data, comprises more than half the data collected by companies. Given that some estimates indicate that 7.5 septillion (7,700,000,000,000,000,000,000) gigabytes of data are generated every single day, not using most of it is a considerable issue.
In this article, we’ll look at this dark data: just how much of it companies create, why it isn’t being analyzed, and what the costs and implications are when companies fail to use the majority of the data they collect.
Before diving into the analysis, it’s worth spending a moment clarifying what we mean by the term “dark data.” Gartner defines dark data as:
“The information assets organizations collect, process and store during regular business activities, but generally fail to use for other purposes (for example, analytics, business relationships and direct monetizing).”
To learn more about this phenomenon, Splunk commissioned a global survey of 1,300+ business leaders to better understand how much data they collect, and how much of it is dark. Respondents came from IT and business roles across various industries, and were located in Australia, China, France, Germany, Japan, the United States, and the United Kingdom. For the report, Splunk defines dark data as: “all the unknown and untapped data across an organization, generated by systems, devices and interactions.”
While the cost of storing data has decreased over time, the cost of saving septillions of gigabytes of wasted data is still significant. What’s more, over the same period the strategic importance of data has increased as companies have found more and more uses for it. Given the cost of storage and the value of data, why does so much of it go unused?
The following chart shows the reasons why dark data isn’t currently being harnessed:
By a large margin, the number one reason given for not using dark data is that companies lack a tool to capture or analyze the data. Companies accumulate data from server logs, GPS networks, security tools, call records, web traffic and more. Companies track everything from digital transactions to the temperature of their server rooms to the contents of retail shelves. Most of this data lies in separate systems, is unstructured, and cannot be connected or analyzed.
Second, the data that is captured often just isn’t good enough. You might have important customer information about a transaction, but it’s missing location or other important metadata because that information sits somewhere else or was never captured in a usable format.
Additionally, dark data exists because there is simply too much data out there, and a lot of it is unstructured. The larger the dataset (or the less structured it is), the more sophisticated the tool required for analysis. These kinds of datasets also often require analysis by individuals with significant data science expertise, who are frequently in short supply.
The implications of the prevalence of dark data are vast. As a result of the data deluge, companies often don’t know where all their sensitive data is stored and can’t be confident they are complying with consumer data protection measures like GDPR. …(More)”.
Introduction to a special issue of Social Studies of Science by Klaus Hoeyer, Susanne Bauer, and Martyn Pickersgill: “In recent years and across many nations, public health has become subject to forms of governance that are said to be aimed at establishing accountability. In this introduction to a special issue, From Person to Population and Back: Exploring Accountability in Public Health, we suggest opening up accountability assemblages by asking a series of ostensibly simple questions that inevitably yield complicated answers: What is counted? What counts? And to whom, how and why does it count? Addressing such questions involves staying attentive to the technologies and infrastructures through which data come into being and are made available for multiple political agendas. Through a discussion of public health, accountability and datafication we present three key themes that unite the various papers as well as illustrate their diversity….(More)”.
Book by Kris Shaffer: “Human attention is in the highest demand it has ever been. The drastic increase in available information has compelled individuals to find a way to sift through the media that is literally at their fingertips. Content recommendation systems have emerged as the technological solution to this social and informational problem, but they’ve also created a bigger crisis by confirming our biases, showing us only, and exactly, what they predict we want to see. Data versus Democracy investigates and explores how, in the era of social media, human cognition, algorithmic recommendation systems, and human psychology are all working together to reinforce (and exaggerate) human bias. The dangerous confluence of these factors is driving media narratives, influencing opinions, and possibly changing election results.
In this book, algorithmic recommendations, clickbait, familiarity bias, propaganda, and other pivotal concepts are analyzed and then expanded upon via fascinating and timely case studies: the 2016 US presidential election, Ferguson, GamerGate, international political movements, and other events that affect every one of us. What are the implications of how we engage with information in the digital age? Data versus Democracy explores this topic and an abundance of related crucial questions. We live in a culture vastly different from any that has come before. In a society where engagement is currency, we are the product. Understanding the value of our attention, how organizations operate based on this concept, and how engagement can be used against our best interests is essential in responsibly equipping ourselves against the perils of disinformation….(More)”.
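To illustrate the feedback loop the book describes, here is a toy simulation (not drawn from the book; the topic labels and click probabilities are invented) of an engagement-driven recommender that keeps serving whatever a user clicked on before, so exposure narrows over time.

```python
import random
from collections import Counter

# Toy simulation (not from the book) of an engagement-driven recommender.
# The user starts with a mild preference; the recommender serves more of
# whatever earned clicks before, so exposure narrows toward a few topics.
TOPICS = ["politics", "sports", "science", "cooking"]   # hypothetical labels
preference = {"politics": 0.6, "sports": 0.5, "science": 0.5, "cooking": 0.4}

random.seed(1)
clicks = Counter({t: 1 for t in TOPICS})  # start from a uniform prior

exposure = Counter()
for step in range(1000):
    # Recommend topics in proportion to past clicks (engagement feedback).
    topic = random.choices(TOPICS, weights=[clicks[t] for t in TOPICS])[0]
    exposure[topic] += 1
    if random.random() < preference[topic]:
        clicks[topic] += 1  # the click feeds back into future recommendations

print(exposure.most_common())  # exposure concentrates on early-reinforced topics
```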
Sheri Fink in The New York Times: “The company called One Concern has all the characteristics of a buzzy and promising Silicon Valley start-up: young founders from Stanford, tens of millions of dollars in venture capital and a board with prominent names.
Its particular niche is disaster response. And it markets a way to use artificial intelligence to address one of the most vexing issues facing emergency responders in disasters: figuring out where people need help in time to save them.
That promise to bring new smarts and resources to an anachronistic field has generated excitement. Arizona, Pennsylvania and the World Bank have entered into contracts with One Concern over the past year. New York City and San Jose, Calif., are in talks with the company. And a Japanese city recently became One Concern’s first overseas client.
But when T.J. McDonald, who works for Seattle’s office of emergency management, reviewed a simulated earthquake on the company’s damage prediction platform, he spotted problems. A popular big-box store was grayed out on the web-based map, meaning there was no analysis of the conditions there, and shoppers and workers who might be in danger would not receive immediate help if rescuers relied on One Concern’s results.
“If that Costco collapses in the middle of the day, there’s going to be a lot of people who are hurt,” he said.
The error? The simulation, the company acknowledged, missed many commercial areas because damage calculations relied largely on residential census data.
One Concern has marketed its products as lifesaving tools for emergency responders after earthquakes, floods and, soon, wildfires. But interviews and documents show the company has often exaggerated its tools’ abilities and has kept outside experts from reviewing its methodology. In addition, some product features are available elsewhere at no charge, and data-hungry insurance companies — whose interests can diverge from those of emergency workers — are among One Concern’s biggest investors and customers.
Some critics even suggest that shortcomings in One Concern’s approach could jeopardize lives….(More)”.
Paper by Marco De Nadai, Angelo Cardoso, Antonio Lima, Bruno Lepri, and Nuria Oliver: “Cognition has been found to constrain several aspects of human behaviour, such as the number of friends and the number of favourite places a person keeps stable over time. This limitation has been empirically defined in the physical and social spaces. But do people exhibit similar constraints in the digital space? We address this question through the analysis of pseudonymised mobility and mobile application (app) usage data of 400,000 individuals in a European country over six months. Despite the enormous heterogeneity of app usage, we find that individuals exhibit a conserved capacity that limits the number of applications they regularly use. Moreover, we find that this capacity steadily decreases with age, as does the corresponding capacity in the physical space, though with more complex dynamics. Even though people might have the same capacity, the applications themselves get added and removed over time.
In this respect, we identify two profiles of individuals: app keepers and explorers, which differ in their stable (keepers) vs exploratory (explorers) behaviour regarding their use of mobile applications. Finally, we show that the capacity for applications predicts mobility capacity and vice versa. By contrast, the behaviour of keepers and explorers may considerably vary across the two domains. Our empirical findings provide an intriguing picture linking human behaviour in the physical and digital worlds, which bridges research studies from Computer Science, Social Physics and Computational Social Sciences…(More)”.
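As a rough illustration of how such a capacity might be measured from usage logs (this is our own sketch, not the authors' pipeline; the record format, the eight-days-per-month threshold and the turnover cut-off are assumptions), one could count the apps each user employs regularly per month and classify keepers and explorers by how much that set changes between months:

```python
from collections import defaultdict

# Illustration only (not the paper's actual pipeline): estimate each user's
# "app capacity" (apps used regularly per month) and the turnover between
# consecutive months. Records, field names and thresholds are invented.
# records: (user_id, month, app_id, days_active_in_month)
records = [
    ("u1", 1, "mail", 25), ("u1", 1, "maps", 10), ("u1", 2, "mail", 22),
    ("u2", 1, "mail", 20), ("u2", 2, "game", 15), ("u2", 2, "news", 12),
]

REGULAR_DAYS = 8  # an app counts as "regularly used" if active >= 8 days/month

def regular_apps(recs):
    """Map (user, month) -> set of regularly used apps."""
    usage = defaultdict(set)
    for user, month, app, days in recs:
        if days >= REGULAR_DAYS:
            usage[(user, month)].add(app)
    return usage

def turnover(apps_now: set, apps_prev: set) -> float:
    """Fraction of the current regular apps that were not used last month."""
    return len(apps_now - apps_prev) / len(apps_now) if apps_now else 0.0

usage = regular_apps(records)
for user in ("u1", "u2"):
    prev, curr = usage[(user, 1)], usage[(user, 2)]
    label = "explorer" if turnover(curr, prev) > 0.3 else "keeper"
    print(user, "capacity:", len(curr), "->", label)
```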
Paper by Stevenson, Phillip Douglas and Mattson, Christopher Andrew: “Organizations all over the world, both national and international, gather demographic data so that the progress of nations and peoples can be tracked. This data is often made available to the public in the form of aggregated national level data or individual responses (microdata). Product designers likewise conduct surveys to better understand their customer and create personas. Personas are archetypes of the individuals who will use, maintain, sell or otherwise be affected by the products created by designers. Personas help designers better understand the person the product is designed for. Unfortunately, collecting customer information and creating personas is often a slow and expensive process.
In this paper, we introduce a new method of creating personas, leveraging publicly available databanks of both aggregated national-level data and information on individuals in the population. A computational persona generator is introduced that creates a population of personas mirroring a real population in terms of size and statistics. Realistic individual personas are filtered from this population for use in product development…(More)”.
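Below is a minimal sketch of the general approach described (not the authors' implementation): sample a synthetic population whose attribute frequencies mirror published aggregate statistics, then filter personas that match a design problem of interest. The attributes and proportions are invented, and a real generator would also draw on microdata to preserve correlations between attributes rather than sampling each marginal independently.

```python
import random

# Sketch of the general idea (not the authors' implementation): generate a
# synthetic population whose attribute frequencies mirror published aggregate
# statistics, then filter individual personas relevant to a design problem.
# The attributes and proportions below are invented for illustration.
AGGREGATES = {
    "age_group": {"18-29": 0.25, "30-49": 0.40, "50+": 0.35},
    "setting":   {"urban": 0.6, "rural": 0.4},
    "income":    {"low": 0.3, "middle": 0.5, "high": 0.2},
}

def sample_persona(rng: random.Random) -> dict:
    """Draw one persona by sampling each attribute from its marginal distribution."""
    return {
        attr: rng.choices(list(dist), weights=list(dist.values()))[0]
        for attr, dist in AGGREGATES.items()
    }

def generate_population(n: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [sample_persona(rng) for _ in range(n)]

population = generate_population(10_000)

# Filter personas for, say, a product aimed at rural, lower-income users.
target = [p for p in population if p["setting"] == "rural" and p["income"] == "low"]
print(len(target), target[:2])
```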