Can We Balance Data Protection With Value Creation?


A “privacy perspective” by Sara Degli Esposti: “In the last few years there has been a dramatic change in the opportunities organizations have to generate value from the data they collect about customers or service users. Customers and users are rapidly becoming collections of “data points” and organizations can learn an awful lot from the analysis of this huge accumulation of data points, also known as “Big Data.”

Organizations are perhaps thrilled, dreaming about new potential applications of digital data but also a bit concerned about hidden risks and unintended consequences. Take, for example, the human rights protections placed on personal data by the EU.  Regulators are watching closely, intending to preserve the eight basic privacy principles without compromising the free flow of information.
Some may ask whether it’s even possible to balance the two.
Enter the Big Data Protection Project (BDPP): an Open University study on organizations’ ability to leverage Big Data while complying with EU data protection principles. The study represents a chance for you to contribute to, and learn about, the debate on the reform of the EU Data Protection Directive. It is open to staff with interests in data management or use, from all types of organizations, both for-profit and nonprofit, with interests in Europe.
Join us by visiting the study’s page on the Open University website. Participants will receive a report with all the results. The BDP is a scientific project—no commercial organization is involved—with implications relevant to both policy-makers and industry representatives..
What kind of legislation do we need to create that positive system of incentive for organizations to innovate in the privacy field?
There is no easy answer.
That’s why we need to undertake empirical research into actual information management practices to understand the effects of regulation on people and organizations. Legal instruments conceived with the best intentions can be ineffective or detrimental in practice. However, other factors can also intervene and motivate business players to develop procedures and solutions which go far beyond compliance. Good legislation should complement market forces in bringing values and welfare to both consumers and organizations.
Is European data protection law keeping its promise of protecting users’ information privacy while contributing to the flourishing of the digital economy or not? Will the proposed General Data Protection Regulation (GDPR) be able to achieve this goal? What would you suggest to do to motivate organizations to invest in information security and take information privacy seriously?
Let’s consider for a second some basic ideas such as the eight fundamental data protection principles: notice, consent, purpose specification and limitation, data quality, respect of data subjects’ rights, information security and accountability. Many of these ideas are present in the EU 1995 Data Protection Directive, the U.S. Fair Information Practice Principles (FIPPs) andthe 1980 OECD Guidelines. The fundamental question now is, should all these ideas be brought into the future, as suggested in the proposed new GDPR, orshould we reconsider our approach and revise some of them, as recommended in the 21st century version of the 1980 OECD Guidelines?
As you may know, notice and consent are often taken as examples of how very good intentions can be transformed into actions of limited importance. Rather than increase people’s awareness of the growing data economy, notice and consent have produced a tick-box tendency accompanied by long and unintelligible privacy policies. Besides, consent is rarely freely granted. Individuals give their consent in exchange for some product or service or as part of a job relationship. The imbalance between the two goods traded—think about how youngsters perceive not having access to some social media as a form of social exclusion—and the lack of feasible alternatives often make an instrument, such as the current use made of consent, meaningless.
On the other hand, a principle such as data quality, which has received very limited attention, could offer opportunities to policy-makers and businesses to reopen the debate on users’ control of their personal data. Having updated, accurate data is something very valuable for organizations. Data quality is also key to the success of many business models. New partnerships between users and organizations could be envisioned under this principle.
Finally, data collection limitation and purpose specification could be other examples of the divide between theory and practice: The tendency we see is that people and businesses want to share, merge and reuse data over time and to do new and unexpected things. Of course, we all want to avoid function creep and prevent any detrimental use of our personal data. We probably need new, stronger mechanisms to ensure data are used for good purposes.
Digital data have become economic assets these days. We need good legislation to stop the black market for personal data and open the debate on how each of us wants to contribute to, and benefit from, the data economy.”

Structuring Big Data to Facilitate Democratic Participation in International Law


New paper by Roslyn Fuller: “This is an interdisciplinary article focusing on the interplay between information and communication technology (ICT) and international law (IL). Its purpose is to open up a dialogue between ICT and IL practitioners that focuses on the ways in which ICT can enhance equitable participation in international legal structures, particularly through capturing the possibilities associated with big data. This depends on the ability of individuals to access big data, for it to be structured in a manner that makes it accessible and for the individual to be able to take action based on it.”

The City as a Platform – Stripping out complexity and Making Things Happen


Emer Coleman: “The concept of data platforms has garnered a lot of coverage over the past few years and the City as a Platform is one that has wide traction in the “Smart City” space. It’s an idea that has been widely promulgated by service integrators and large consultancy firms. This idea has been adopted into the thinking of many cities in the UK, increasingly by local authorities who have both been forced by central government diktat to open their data and who are also engaging with many of the large private companies who sell infrastructure and capabilities and with whom they may have existing contractual arrangements.
Standard interpretations of city as platform usually involve the idea that the city authority will create the platform into which it will release its data. It then seeks the integration of API’s (both external and internal) into the platform so that theoretically the user can access that data via a unified City API on which developers can then create products and services.
Picture

Some local authorities seek to monetise access to this API while others see it as a mechanism for encouraging the development of new products and services that are of value to the state but which have been developed without direct additional investment by the state thereby generating public good from the public task of collecting and storing data.
This concept of city as platform integrated by local authorities appears at first glance to be a logical, linear and achievable goal but in my view completely misunderstands a number of key factors;
1. The evolution of the open data/big data market
2. Commercial and Technical realities
3. Governance and bureaucracy
I’ll explore these below…”

Big Data for Law


legislation.gov.uk: “The National Archives has received ‘big data’ funding from the Arts and Humanities Research Council (AHRC) to deliver the ‘Big Data for Law‘ project. Just over £550,000 will enable the project to transform how we understand and use current legislation, delivering a new service – legislation.gov.uk Research – by March 2015. There are an estimated 50 million words in the statute book, with 100,000 words added or changed every month. Search engines and services like legislation.gov.uk have transformed access to legislation. Law is accessed by a much wider group of people, the majority of whom are typically not legally trained or qualified. All users of legislation are confronted by the volume of legislation, its piecemeal structure, frequent amendments, and the interaction of the statute book with common law and European law. Not surprisingly, many find the law difficult to understand and comply with. There has never been a more relevant time for research into the architecture and content of law, the language used in legislation and how, through interpretation by the courts, it is given effect. Research that will underpin the drive to deliver good, clear and effective law. Researchers typically lack the raw data, the tools, and the methods to undertake research across the whole statute book. Meanwhile, the combination of low cost cloud computing, open source software and new methods of data analysis – the enablers of the big data revolution – are transforming research in other fields. Big data research is perfectly possible with legislation if only the basic ingredients – the data, the tools and some tried and trusted methods – were as readily available as the computing power and the storage. The vision for this project is to address that gap by providing a new Legislation Data Research Infrastructure at research.legislation.gov.uk. Specifically tailored to researchers’ needs, it will consist of downloadable data, online tools for end-users; and open source tools for researchers to download, adapt and use….
There are three main areas for research:

  • Understanding researchers’ needs: to ensure the service is based on evidenced need, capabilities and limitations, putting big data technologies in the hands of non-technical researchers for the first time.
  • Deriving new open data from closed data: no one has all the data that researchers might find useful. For example, the potentially personally identifiable data about users and usage of legislation.gov.uk cannot be made available as open data but is perfect for processing using existing big data tools; eg to identify clusters in legislation or “recommendations” datasets of “people who read Act A or B also looked at Act Y or Z”. The project will look whether it is possible to create new open data sets from this type of closed data. An N-Grams dataset and appropriate user interface for legislation or related case law, for example, would contain sequences of words/phrases/statistics about their frequency of occurrence per document. N-Grams are useful for research in linguistics or history, and could be used to provide a predictive text feature in a drafting tool for legislation.
  • Pattern language for legislation: We need new ways of codifying and modelling the architecture of the statute book to make it easier to research its entirety using big data technologies. The project will seek to learn from other disciplines, applying the concept of a ‘pattern language’ to legislation. Pattern languages have revolutionised software engineering over the last twenty years and have the potential to do the same for our understanding of the statute book. A pattern language is simply a structured method of describing good design practices, providing a common vocabulary between users and specialists, structured around problems or issues, with a solution. Patterns are not created or invented – they are identified as ‘good design’ based on evidence about how useful and effective they are. Applied to legislation, this might lead to a common vocabulary between the users of legislation and legislative drafters, to identifying useful and effective drafting practices and solutions that deliver good law. This could enable a radically different approach to structuring teaching materials or guidance for legislators.”

DARPA Open Catalog Makes Agency-Sponsored Software and Publications Available to All


Press Release: “Public website aims to encourage communities interested in DARPA research to build off the agency’s work, starting with big data…
DARPA has invested in many programs that sponsor fundamental and applied research in areas of computer science, which have led to new advances in theory as well as practical software. The R&D community has asked about the availability of results, and now DARPA has responded by creating the DARPA Open Catalog, a place for organizing and sharing those results in the form of software, publications, data and experimental details. The Catalog can be found at http://go.usa.gov/BDhY.
Many DoD and government research efforts and software procurements contain publicly releasable elements, including open source software. The nature of open source software lends itself to collaboration where communities of developers augment initial products, build on each other’s expertise, enable transparency for performance evaluation, and identify software vulnerabilities. DARPA has an open source strategy for areas of work including big data to help increase the impact of government investments in building a flexible technology base.
“Making our open source catalog available increases the number of experts who can help quickly develop relevant software for the government,” said Chris White, DARPA program manager. “Our hope is that the computer science community will test and evaluate elements of our software and afterward adopt them as either standalone offerings or as components of their products.”

"Natural Cities" Emerge from Social Media Location Data


Emerging Technology From the arXiv: “Nobody agrees on how to define a city. But the emergence of “natural cities” from social media data sets may change that, say computational geographers…
A city is a large, permanent human settlement. But try and define it more carefully and you’ll soon run into trouble. A settlement that qualifies as a city in Sweden may not qualify in China, for example. And the reasons why one settlement is classified as a town while another as a city can sometimes seem almost arbitrary.
City planners know this problem well.  They tend to define cities by administrative, legal or even historical boundaries that have little logic to them. Indeed, the same city can sometimes be defined in various different ways.
That causes all kinds of problems from counting the total population to working out who pays for the upkeep of the place.  Which definition do you use?
Now help may be at hand thanks to the work of Bin Jiang and Yufan Miao at the University of Gävle in Sweden. These guys have found a way to use people’s location recorded by social media to define the boundaries of so-called natural cities which have a close resemblance to real cities in the US.
Jiang and Miao began with a dataset from the Brightkite social network, which was active between 2008 and 2010. The site encouraged users to log in with their location details so that they could see other users nearby. So the dataset consists of almost 3 million locations in the US and the dates on which they were logged.
To start off, Jiang and Miao simply placed a dot on a map at the location of each login. They then connected these dots to their neighbours to form triangles that end up covering the entire mainland US.
Next, they calculated the size of each triangle on the map and plotted this size distribution, which turns out to follow a power law. So there are lots of tiny triangles but only a few  large ones.
Finally, the calculated the average size of the triangles and then coloured in all those that were smaller than average. The coloured areas are “natural cities”, say Jiang and Miao.
It’s easy to imagine that resulting map of triangles is of little value.  But to the evident surprise of ther esearchers, it produces a pretty good approximation of the cities in the US. “We know little about why the procedure works so well but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities,” they say.
That’s handy because it suddenly gives city planners a way to study and compare cities on a level playing field. It allows them to see how cities evolve and change over time too. And it gives them a way to analyse how cities in different parts of the world differ.
Of course, Jiang and Miao will want to find out why this approach reveals city structures in this way. That’s still something of a puzzle but the answer itself may provide an important insight into the nature of cities (or at least into the nature of this dataset).
A few days ago, this blog wrote about how a new science of cities is emerging from the analysis of big data.  This is another example and expect to see more.
Ref:  http://arxiv.org/abs/1401.6756 : The Evolution of Natural Cities from the Perspective of Location-Based Social Media”

Selected Readings on Personal Data: Security and Use


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of personal data was originally published in 2014.

Advances in technology have greatly increased the potential for policymakers to utilize the personal data of large populations for the public good. However, the proliferation of vast stores of useful data has also given rise to a variety of legislative, political, and ethical concerns surrounding the privacy and security of citizens’ personal information, both in terms of collection and usage. Challenges regarding the governance and regulation of personal data must be addressed in order to assuage individuals’ concerns regarding the privacy, security, and use of their personal information.

Selected Reading List (in alphabetical order)

Annotated Selected Reading List (in alphabetical order)

Cavoukian, Ann. “Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control.” Privacy by Design, October 15, 2013. https://bit.ly/2S00Yfu.

  • In this paper, Cavoukian describes the Personal Data Ecosystem (PDE), an “emerging landscape of companies and organizations that believe individuals should be in control of their personal data, and make available a growing number of tools and technologies to enable this control.” She argues that, “The right to privacy is highly compatible with the notion of PDE because it enables the individual to have a much greater degree of control – “Radical Control” – over their personal information than is currently possible today.”
  • To ensure that the PDE reaches its privacy-protection potential, Cavouckian argues that it must practice The 7 Foundational Principles of Privacy by Design:
    • Proactive not Reactive; Preventative not Remedial
    • Privacy as the Default Setting
    • Privacy Embedded into Design
    • Full Functionality – Positive-Sum, not Zero-Sum
    • End-to-End Security – Full Lifecycle Protection
    • Visibility and Transparency – Keep it Open
    • Respect for User Privacy – Keep it User-Centric

Kirkham, T., S. Winfield, S. Ravet, and S. Kellomaki. “A Personal Data Store for an Internet of Subjects.” In 2011 International Conference on Information Society (i-Society). 92–97.  http://bit.ly/1alIGuT.

  • This paper examines various factors involved in the governance of personal data online, and argues for a shift from “current service-oriented applications where often the service provider is in control of the person’s data” to a person centric architecture where the user is at the center of personal data control.
  • The paper delves into an “Internet of Subjects” concept of Personal Data Stores, and focuses on implementation of such a concept on personal data that can be characterized as either “By Me” or “About Me.”
  • The paper also presents examples of how a Personal Data Store model could allow users to both protect and present their personal data to external applications, affording them greater control.

OECD. The 2013 OECD Privacy Guidelines. 2013. http://bit.ly/166TxHy.

  • This report is indicative of the “important role in promoting respect for privacy as a fundamental value and a condition for the free flow of personal data across borders” played by the OECD for decades. The guidelines – revised in 2013 for the first time since being drafted in 1980 – are seen as “[t]he cornerstone of OECD work on privacy.”
  • The OECD framework is built around eight basic principles for personal data privacy and security:
    • Collection Limitation
    • Data Quality
    • Purpose Specification
    • Use Limitation
    • Security Safeguards
    • Openness
    • Individual Participation
    • Accountability

Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review 57, 1701 (2010). http://bit.ly/18Q5Mta.

  • This article explores the implications of the “astonishing ease” with which scientists have demonstrated the ability to “reidentify” or “deanonmize” supposedly anonymous personal information.
  • Rather than focusing exclusively on whether personal data is “anonymized,” Ohm offers five factors for governments and other data-handling bodies to use for assessing the risk of privacy harm: data-handling techniques, private versus public release, quantity, motive and trust.

Polonetsky, Jules and Omer Tene. “Privacy in the Age of Big Data: A Time for Big Decisions.” Stanford Law Review Online 64 (February 2, 2012): 63. http://bit.ly/1aeSbtG.

  • In this article, Tene and Polonetsky argue that, “The principles of privacy and data protection must be balanced against additional societal values such as public health, national security and law enforcement, environmental protection, and economic efficiency. A coherent framework would be based on a risk matrix, taking into account the value of different uses of data against the potential risks to individual autonomy and privacy.”
  • To achieve this balance, the authors believe that, “policymakers must address some of the most fundamental concepts of privacy law, including the definition of ‘personally identifiable information,’ the role of consent, and the principles of purpose limitation and data minimization.”

Shilton, Katie, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun. “Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing”. TPRC, 2009. http://bit.ly/18gh8SN.

  • This article argues that the Codes of Fair Information Practice, which have served as a model for data privacy for decades, do not take into account a world of distributed data collection, nor the realities of data mining and easy, almost uncontrolled, dissemination.
  • The authors suggest “expanding the Codes of Fair Information Practice to protect privacy in this new data reality. An adapted understanding of the Codes of Fair Information Practice can promote individuals’ engagement with their own data, and apply not only to governments and corporations, but software developers creating the data collection programs of the 21st century.”
  • In order to achieve this change in approach, the paper discusses three foundational design principles: primacy of participants, data legibility, and engagement of participants throughout the data life cycle.

Big Data, Privacy, and the Public Good


Forthcoming book and website by Julia Lane, Victoria Stodden, Stefan Bender, and Helen Nissenbaum (editors): “The overarching goal of the book is to identify ways in which vast new sets of data on human beings can be collected, integrated, and analysed to improve evidence based decision making while protecting confidentiality. …
Massive amounts of new data on human beings can now be accessed and analyzed.  Much has been made of the many uses of such data for pragmatic purposes, including selling goods and services, winning political campaigns, and identifying possible terrorists. Yet “big data” can also be harnessed to serve the public good: scientists can use new forms of data to do research that improves the lives of human beings, federal, state and local governments can use data to improve services and reduce taxpayer costs and public organizations can use information to advocate for public causes.
Much has also been made of the privacy and confidentiality issues associated with access. A survey of statisticians at the 2013 Joint Statistical Meeting found that the majority thought consumers should worry about privacy issues, and that an ethical framework should be in place to guide data scientists.  Yet there are many unanswered questions. What are the ethical and legal requirements for scientists and government officials seeking to serve the public good without harming individual citizens?  What are the rules of engagement?  What are the best ways to provide access while protecting confidentiality? Are there reasonable mechanisms to compensate citizens for privacy loss?
The goal of this book is to answer some of these questions.  The book’s authors paint an intellectual landscape that includes the legal, economic and statistical context necessary to frame the many privacy issues, including the value to the public of data access.   The authors also identify core practical approaches that use new technologies to simultaneously maximize the utility of data access while minimizing information risk. As is appropriate for such a new and evolving field, each chapter also identifies important questions that require future research.
The work in this book is also intended to be accessible to an audience broader than the academy. In addition to informing the public, we hope that the book will be useful to people trying to provide data access but protect confidentiality in the roles as data custodians for federal, state and local agencies, or decision makers on institutional review boards.”
 

Visual Insights: A Practical Guide to Making Sense of Data


New book by Katy Börner and David E. Polley: “In the age of Big Data, the tools of information visualization offer us a macroscope to help us make sense of the avalanche of data available on every subject. This book offers a gentle introduction to the design of insightful information visualizations. It is the only book on the subject that teaches nonprogrammers how to use open code and open data to design insightful visualizations. Readers will learn to apply advanced data mining and visualization techniques to make sense of temporal, geospatial, topical, and network data.

The book, developed for use in an information visualization MOOC, covers data analysis algorithms that enable extraction of patterns and trends in data, with chapters devoted to “when” (temporal data), “where” (geospatial data), “what” (topical data), and “with whom” (networks and trees); and to systems that drive research and development. Examples of projects undertaken for clients include an interactive visualization of the success of game player activity in World of Warcraft; a visualization of 311 number adoption that shows the diffusion of non-emergency calls in the United States; a return on investment study for two decades of HIV/AIDS research funding by NIAID; and a map showing the impact of the HiveNYC Learning Network.
Visual Insights will be an essential resource on basic information visualization techniques for scholars in many fields, students, designers, or anyone who works with data.”

Check out also the Information Visualization MOOC at http://ivmooc.cns.iu.edu/
 

Big Data’s Dangerous New Era of Discrimination


Michael Schrage in HBR blog: “Congratulations. You bought into Big Data and it’s paying off Big Time. You slice, dice, parse and process every screen-stroke, clickstream, Like, tweet and touch point that matters to your enterprise. You now know exactly who your best — and worst — customers, clients, employees and partners are.  Knowledge is power.  But what kind of power does all that knowledge buy?
Big Data creates Big Dilemmas. Greater knowledge of customers creates new potential and power to discriminate. Big Data — and its associated analytics — dramatically increase both the dimensionality and degrees of freedom for detailed discrimination. So where, in your corporate culture and strategy, does value-added personalization and segmentation end and harmful discrimination begin?
Let’s say, for example, that your segmentation data tells you the following:
Your most profitable customers by far are single women between the ages of 34 and 55 closely followed by “happily married” women with at least one child. Divorced women are slightly more profitable than “never marrieds.” Gay males — single and in relationships — are also disproportionately profitable. The “sweet spot” is urban and 28 to 50. These segments collectively account for roughly two-thirds of your profitability.  (Unexpected factoid: Your most profitable customers are overwhelmingly Amazon Prime subscriber. What might that mean?)
Going more granular, as Big Data does, offers even sharper ethno-geographic insight into customer behavior and influence:

  • Single Asian, Hispanic, and African-American women with urban post codes are most likely to complain about product and service quality to the company. Asian and Hispanic complainers happy with resolution/refund tend to be in the top quintile of profitability. African-American women do not.
  • Suburban Caucasian mothers are most likely to use social media to share their complaints, followed closely by Asian and Hispanic mothers. But if resolved early, they’ll promote the firm’s responsiveness online.
  • Gay urban males receiving special discounts and promotions are the most effective at driving traffic to your sites.

My point here is that these data are explicit, compelling and undeniable. But how should sophisticated marketers and merchandisers use them?
Campaigns, promotions and loyalty programs targeting women and gay males seem obvious. But should Asian, Hispanic and white females enjoy preferential treatment over African-American women when resolving complaints? After all, they tend to be both more profitable and measurably more willing to effectively use social media. Does it make more marketing sense encouraging African-American female customers to become more social media savvy? Or are resources better invested in getting more from one’s best customers? Similarly, how much effort and ingenuity flow should go into making more gay male customers better social media evangelists? What kinds of offers and promotions could go viral on their networks?…
Of course, the difference between price discrimination and discrimination positively correlated with gender, ethnicity, geography, class, personality and/or technological fluency is vanishingly small. Indeed, the entire epistemological underpinning of Big Data for business is that it cost-effectively makes informed segmentation and personalization possible…..
But the main source of concern won’t be privacy, per se — it will be whether and how companies and organizations like your own use Big Data analytics to justify their segmentation/personalization/discrimination strategies. The more effective Big Data analytics are in profitably segmenting and serving customers, the more likely those algorithms will be audited by regulators or litigators.
Tomorrow’s Big Data challenge isn’t technical; it’s whether managements have algorithms and analytics that are both fairly transparent and transparently fair. Big Data champions and practitioners had better be discriminating about how discriminating they want to be.”