Big Data and Discriminatory Pricing


White House: “In response to the big data and privacy report’s finding that these technologies and tools can enable new forms of discrimination, the White House Council of Economic Advisers conducted a study examining whether and how companies may use big data technologies to offer different prices to different consumers — a practice known as “discriminatory pricing.” The CEA found that many companies already use big data for targeted marketing, and others are experimenting in a limited way with personalized pricing, but this practice is not yet widespread. While the economic literature contends that discriminatory pricing will often, though not always, be welfare-enhancing for businesses and consumers, the CEA concludes that policymakers should be vigilant against the potential for discriminatory outcomes, particularly in cases where prices are not transparent and could give rise to fraud or scams…. To read the Council of Economic Advisers report on discriminatory pricing, click here.”
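
The welfare claim is easiest to see with toy numbers. The sketch below is my illustration, not an example from the CEA study; the willingness-to-pay values and unit cost are invented. It contrasts one profit-maximizing uniform price with perfectly personalized prices:

```python
# Toy illustration of the welfare argument (invented numbers, not CEA data).
willingness_to_pay = [10, 7, 4]  # one hypothetical buyer at each valuation
unit_cost = 2

def uniform_outcome(price):
    """Profit and consumer surplus when everyone faces the same price."""
    buyers = [w for w in willingness_to_pay if w >= price]
    profit = (price - unit_cost) * len(buyers)
    consumer_surplus = sum(w - price for w in buyers)
    return profit, consumer_surplus

# The seller picks the single profit-maximizing price (here, from the
# candidate valuations themselves).
best_price = max(willingness_to_pay, key=lambda p: uniform_outcome(p)[0])
profit, surplus = uniform_outcome(best_price)
print(f"uniform price {best_price}: profit={profit}, "
      f"consumer surplus={surplus}, welfare={profit + surplus}")

# Perfect personalization: each buyer pays exactly their valuation, so
# everyone who values the good above cost gets served.
pp_profit = sum(w - unit_cost for w in willingness_to_pay if w >= unit_cost)
print(f"personalized prices: profit={pp_profit}, "
      f"consumer surplus=0, welfare={pp_profit}")
```

With these numbers, personalization serves the low-valuation buyer whom the uniform price excludes, so total welfare rises even as all surplus shifts to the seller. That is the “often, though not always, welfare-enhancing” pattern the CEA describes, and it also shows what consumers stand to lose.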

Digital Enlightenment Yearbook 2014


Book edited by O’Hara, K., Nguyen, M-H.C., and Haynes, P.: “Tracking the evolution of digital technology is no easy task; changes happen so fast that keeping pace presents quite a challenge. This is, nevertheless, the aim of the Digital Enlightenment Yearbook.
This book is the third in the series which began in 2012 under the auspices of the Digital Enlightenment Forum. This year the focus is on the relationship of individuals with their networks, exploring “Social networks and social machines, surveillance and empowerment”. In what is now the well-established tradition of the yearbook, different stakeholders in society and various disciplinary communities (technology, law, philosophy, sociology, economics, policymaking) bring their very different opinions and perspectives to bear on this topic.
The book is divided into four parts: the individual as data manager; the individual, society and the market; big data and open data; and new approaches. These are bookended by a Prologue and an Epilogue, which provide illuminating perspectives on the discussions in between. The division of the book is not definitive; it suggests one narrative, but others are clearly possible.
The 2014 Digital Enlightenment Yearbook gathers together the science, social science, law and politics of the digital environment in order to help us reformulate and address the timely and pressing questions which this new environment raises. We are all of us affected by digital technology, and the subjects covered here are consequently of importance to us all. (Contents)”

With a Few Bits of Data, Researchers Identify ‘Anonymous’ People


in the New York Times: “Even when real names and other personal information are stripped from big data sets, it is often possible to use just a few pieces of the information to identify a specific person, according to a study to be published Friday in the journal Science.

In the study, titled “Unique in the Shopping Mall: On the Reidentifiability of Credit Card Metadata,” a group of data scientists analyzed credit card transactions made by 1.1 million people in 10,000 stores over a three-month period. The data set contained details including the date of each transaction, amount charged and name of the store.

Although the information had been “anonymized” by removing personal details like names and account numbers, the uniqueness of people’s behavior made it easy to single them out.

In fact, knowing just four random pieces of information was enough to reidentify 90 percent of the shoppers as unique individuals and to uncover their records, researchers calculated. And that uniqueness of behavior — or “unicity,” as the researchers termed it — combined with publicly available information, like Instagram or Twitter posts, could make it possible to reidentify people’s records by name.
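
To make “unicity” concrete, here is a minimal sketch of the idea on synthetic data (my reconstruction, not the authors’ code or dataset; all sizes are illustrative): for each person, check whether a handful of known (store, day) points matches exactly one trace in the whole dataset.

```python
# Sketch of the paper's "unicity" measure on synthetic data (my
# reconstruction, not the authors' code; all sizes are illustrative).
import random

random.seed(0)
N_USERS, N_STORES, N_DAYS = 1000, 200, 90
TX_PER_USER, K_KNOWN = 30, 4

# Each person's "trace" is their set of (store, day) transaction points.
traces = [
    {(random.randrange(N_STORES), random.randrange(N_DAYS))
     for _ in range(TX_PER_USER)}
    for _ in range(N_USERS)
]

unique = 0
for trace in traces:
    # k points an outside observer might already know about this person
    known = random.sample(sorted(trace), K_KNOWN)
    matches = sum(1 for t in traces if all(p in t for p in known))
    if matches == 1:  # only the true owner fits all k points
        unique += 1

print(f"unicity with {K_KNOWN} known points: {unique / N_USERS:.0%}")
```

Because individual traces are sparse relative to the number of possible (store, day) cells, four points almost always pin down a single person, which is the intuition behind the paper’s 90 percent figure.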

“The message is that we ought to rethink and reformulate the way we think about data protection,” said Yves-Alexandre de Montjoye, a graduate student in computational privacy at the M.I.T. Media Lab who was the lead author of the study. “The old model of anonymity doesn’t seem to be the right model when we are talking about large-scale metadata.”

The analysis of large data sets containing details on people’s behavior holds great potential to improve public health, city planning and education.

But the study calls into question the standard methods many companies, hospitals and government agencies currently use to anonymize their records. It may also give ammunition to some technologists and privacy advocates who have challenged the consumer-tracking processes used by advertising software and analytics companies to tailor ads to so-called anonymous users online….(More).”

Big Data Now


at Radar – O’Reilly: “In the four years we’ve been producing Big Data Now, our wrap-up of important developments in the big data field, we’ve seen tools and applications mature, multiply, and coalesce into new categories. This year’s free wrap-up of Radar coverage is organized around eight themes:

  • Cognitive augmentation: As data processing and data analytics become more accessible, jobs that can be automated will go away. But to be clear, there are still many tasks where the combination of humans and machines produces superior results.
  • Intelligence matters: Artificial intelligence is now playing a bigger and bigger role in everyone’s lives, from sorting our email to rerouting our morning commutes, from detecting fraud in financial markets to predicting dangerous chemical spills. The computing power and algorithmic building blocks to put AI to work have never been more accessible.
  • The convergence of cheap sensors, fast networks, and distributed computation: The amount of quantified data available is increasing exponentially — and aside from tools for centrally handling huge volumes of time-series data as it arrives, devices and software are getting smarter about placing their own data accurately in context, extrapolating without needing to ‘check in’ constantly.
  • Reproducing, managing, and maintaining data pipelines: The coordination of processes and personnel within organizations to gather, store, analyze, and make use of data.
  • The evolving, maturing marketplace of big data components: Open-source components like Spark, Kafka, Cassandra, and ElasticSearch are reducing the need for companies to build in-house proprietary systems. On the other hand, vendors are developing industry-specific suites and applications optimized for the unique needs and data sources in a field.
  • The value of applying techniques from design and social science: While data science knows human behavior in the aggregate, design works in the particular, where A/B testing won’t apply — you only get one shot to communicate your proposal to a CEO, for example. Similarly, social science enables extrapolation from sparse data. Both sets of tools enable you to ask the right questions, and scope your problems and solutions realistically.
  • The importance of building a data culture: An organization that is comfortable with gathering data, curious about its significance, and willing to act on its results will perform demonstrably better than one that doesn’t. These priorities must be shared throughout the business.
  • The perils of big data: From poor analysis (driven by false correlation or lack of domain expertise) to intrusiveness (privacy invasion, price profiling, self-fulfilling predictions), big data has negative potential.

Download our free snapshot of big data in 2014, and follow the story this year on Radar.”

Survive and Thrive: How Big Data Is Transforming Health Care


at Pacific Standard: “When you step on a scale, take your temperature, or check your blood pressure, you’re using data from your body to measure your health. Advances in fitness trackers have made health quantification more accessible to casual users. But for researchers, health care providers, and people with chronic conditions, advances in tracking technology, data analysis, and automation offer significant improvements in medical treatment and quality of life.

This three-part series explores health quantification through the eyes of Rutgers University Ph.D. student Maria Qadri, who has both professional and personal experience in the matter. Qadri’s research aims to help people with traumatic brain injury and Parkinson’s Disease better manage their illness, and, as a Type 1 diabetic, she relies on glucose monitoring in her own daily life. Below, we take a look at how number crunching and personal data factor into Qadri’s research and life….(More).”

The Black Box Society


New book by Frank Pasquale on “The Secret Algorithms That Control Money and Information”: “Every day, corporations are connecting the dots about our personal behavior—silently scrutinizing clues left behind by our work habits and Internet use. The data compiled and portraits created are incredibly detailed, to the point of being invasive. But who connects the dots about what firms are doing with this information? The Black Box Society argues that we all need to be able to do so—and to set limits on how big data affects our lives.
Hidden algorithms can make (or ruin) reputations, decide the destiny of entrepreneurs, or even devastate an entire economy. Shrouded in secrecy and complexity, decisions at major Silicon Valley and Wall Street firms were long assumed to be neutral and technical. But leaks, whistleblowers, and legal disputes have shed new light on automated judgment. Self-serving and reckless behavior is surprisingly common, and easy to hide in code protected by legal and real secrecy. Even after billions of dollars of fines have been levied, underfunded regulators may have only scratched the surface of this troubling behavior….(More).”

Big Data in Action for Development


New report by the World Bank: “Data provide critical inputs in designing effective development policy recommendations, supporting their implementation, and evaluating results. In this new report “Big Data in Action for Development,” the World Bank Group collaborated with Second Muse, a global innovation agency, to explore big data’s transformative potential for socioeconomic development. The report develops a conceptual framework to work with big data in the development sector and presents a variety of case studies that lay out big data’s innovations, challenges, and opportunities.”

Can Business And Tech Transform The Way Our Government Works By 2020?


Ben Schiller at Co.Exist: “The rise of open data, crowd-sourcing, predictive analytics, and other big tech trends isn’t just something for companies to contend with. It’s also a challenge for government. New technology gives public agencies the opportunity to develop and deliver services in new ways, track results more accurately, and open up decision-making.
Deloitte’s big new Government 2020 report looks at the trends impacting government and lays out a bunch of ideas for how they can innovate. We picked out a few below. There are more infographics in the slide show.

Consumerization of public services

Deloitte expects entrepreneurs to “develop innovative and radically user-friendly approaches to satisfy unmet consumer demand for better public services.” Startups like Uber or Lyft “reinvigorated transportation.” Now it expects a similar “focus on seamless customer experiences” in education and health care.

Open workforce

Deloitte expects governments to become looser: collections of people doing a job, rather than large hierarchical structures. “Governments [will] expand their talent networks to include ‘partnership talent’ (employees who are parts of joint ventures), ‘borrowed talent’ (employees of contractors), ‘freelance talent’ (independent, individual contractors) and ‘open-source talent,'” the report says.

Outcome-based legislation

Just as big data analytics allows companies to measure the effectiveness of marketing campaigns, so it allows governments to measure how well legislation and regulation are working. They can “shift from a concentration on processes to the achievement of specific targets.” And, if the law isn’t working, someone has the data to throw it out….”
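
In its simplest form, that measurement is just a comparison of an outcome metric with and without the rule. The sketch below uses invented numbers and a plain two-sample test purely to show the shape of the exercise:

```python
# Hypothetical sketch: is an outcome metric different where the rule applies?
# Numbers are invented; a real evaluation needs a causal design
# (difference-in-differences, matched controls), not a bare t-test.
from scipy import stats

covered = [12.1, 10.8, 11.5, 9.9, 10.2, 11.0]     # e.g., accident rate with the rule
uncovered = [13.0, 12.4, 13.8, 12.9, 14.1, 13.3]  # comparable places without it

t_stat, p_value = stats.ttest_ind(covered, uncovered)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A target-based test: judge the law by the measured outcome, not the process.
```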

Governments and Citizens Getting to Know Each Other? Open, Closed, and Big Data in Public Management Reform


New paper by Amanda Clarke and Helen Margetts in Policy and Internet: “Citizens and governments live increasingly digital lives, leaving trails of digital data that have the potential to support unprecedented levels of mutual government–citizen understanding, and in turn, vast improvements to public policies and services. Open data and open government initiatives promise to “open up” government operations to citizens. New forms of “big data” analysis can be used by government itself to understand citizens’ behavior and reveal the strengths and weaknesses of policy and service delivery. In practice, however, open data emerges as a reform development directed to a range of goals, including the stimulation of economic development, and not strictly transparency or public service improvement. Meanwhile, governments have been slow to capitalize on the potential of big data, while the largest data they do collect remain “closed” and under-exploited within the confines of intelligence agencies. Drawing on interviews with civil servants and researchers in Canada, the United Kingdom, and the United States between 2011 and 2014, this article argues that a big data approach could offer the greatest potential as a vehicle for improving mutual government–citizen understanding, thus embodying the core tenets of Digital Era Governance, argued by some authors to be the most viable public management model for the digital age (Dunleavy, Margetts, Bastow, & Tinkler, 2005, 2006; Margetts & Dunleavy, 2013).”
 

Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency


at Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.

This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….

So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of way.


While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average,” may actually be detrimental to those in the minority group….
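
Hardt’s example is easy to reproduce with synthetic data. The sketch below is my construction of the scenario, not code from his article: a single threshold tuned for average accuracy on a majority-dominated training set performs well overall and fails the minority group.

```python
# Synthetic reconstruction of Hardt's example (my construction, not his code).
# Majority culture: real names are short; minority culture: real names are long.
import random

random.seed(1)

def sample(group, n):
    """Return (name_length, is_real) pairs; length distributions are invented."""
    real_mean, fake_mean = (6, 12) if group == "majority" else (12, 6)
    return ([(random.gauss(real_mean, 1.5), True) for _ in range(n)] +
            [(random.gauss(fake_mean, 1.5), False) for _ in range(n)])

# Minority is heavily underrepresented in the training data.
train = sample("majority", 900) + sample("minority", 50)

def accuracy(data, threshold):
    # The learned rule: a name is "real" iff it is shorter than the threshold.
    return sum((length < threshold) == is_real
               for length, is_real in data) / len(data)

# Pick the threshold that maximizes accuracy on the mixed training set.
best = max(range(1, 20), key=lambda th: accuracy(train, th))
for group in ("majority", "minority"):
    print(group, f"accuracy: {accuracy(sample(group, 500), best):.0%}")
```

On a run like this, majority accuracy lands near 98 percent while minority accuracy collapses to a few percent, even though the average over the training mix looks excellent.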

As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”