Neelie KROES (European Commission): “Nearly 200 years ago, the industrial revolution saw new networks take over. Not just a new form of transport, the railways connected industries, connected people, energised the economy, transformed society.
Now we stand facing a new industrial revolution: a digital one.
With cloud computing its new engine, big data its new fuel. Transporting the amazing innovations of the internet, and the internet of things. Running on broadband rails: fast, reliable, pervasive.
My dream is that Europe takes its full part. With European industry able to supply, European citizens and businesses able to benefit, European governments able and willing to support. But we must get all those components right.
What does it mean to say we’re in the big data era?
First, it means more data than ever at our disposal. Take all the information of humanity from the dawn of civilisation until 2003 – nowadays that is produced in just two days. We are also acting to have more and more of it become available as open data, for science, for experimentation, for new products and services.
Second, we have ever more ways – not just to collect that data – but to manage it, manipulate it, use it. That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
Third, this is not just some niche product for tech-lovers. The impact and difference to people’s lives are huge: in so many fields.
Transforming healthcare, using data to develop new drugs, and save lives. Greener cities with fewer traffic jams, and smarter use of public money.
A business boost: like retailers who communicate smarter with customers, for more personalisation, more productivity, a better bottom line.
No wonder big data is growing 40% a year. No wonder data jobs grow fast. No wonder skills and profiles that didn’t exist a few years ago are now hot property: and we need them all, from data cleaner to data manager to data scientist.
This can make a difference to people’s lives. Wherever you sit in the data ecosystem – never forget that. Never forget that real impact and real potential.
Politicians are starting to get this. The EU’s Presidents and Prime Ministers have recognised the boost to productivity, innovation and better services from big data and cloud computing.
But those technologies need the right environment. We can’t go on struggling with poor quality broadband. With each country trying on its own. With infrastructure and research that are individual and ineffective, separate and subscale. With different laws and practices shackling and shattering the single market. We can’t go on like that.
Nor can we continue in an atmosphere of insecurity and mistrust.
Recent revelations show what is possible online. They show implications for privacy, security, and rights.
You can react in two ways. One is to throw up your hands and surrender. To give up and put big data in the box marked “too difficult”. To turn away from this opportunity, and turn your back on problems that need to be solved, from cancer to climate change. Or – even worse – to simply accept that Europe won’t figure on this map but will be reduced to importing the results and products of others.
Alternatively: you can decide that we are going to master big data – and master all its dependencies, requirements and implications, including cloud and other infrastructures, Internet of things technologies as well as privacy and security. And do it on our own terms.
And by the way – privacy and security safeguards do not just have to be about protecting and limiting. Data generates value, and unlocks the door to new opportunities: you don’t need to “protect” people from their own assets. What you need is to empower people, give them control, give them a fair share of that value. Give them rights over their data – and responsibilities too, and the digital tools to exercise them. And ensure that the networks and systems they use are affordable, flexible, resilient, trustworthy, secure.
One thing is clear: the answer to greater security is not just to build walls. Many millennia ago, the Greek people realised that. They realised that you can build walls as high and as strong as you like – it won’t make a difference, not without the right awareness, the right risk management, the right security, at every link in the chain. If only the Trojans had realised that too! The same is true in the digital age: keep our data locked up in Europe, engage in an impossible dream of isolation, and we lose an opportunity; without gaining any security.
But master all these areas, and we would truly have mastered big data. Then we would have shown that technology can take account of democratic values; and that a dynamic democracy can cope with technology. Then we would have a boost to benefit every European.
So let’s turn this asset into gold. With the infrastructure to capture and process. Cloud capability that is efficient, affordable, on-demand. Let’s tackle the obstacles, from standards and certification, trust and security, to ownership and copyright. With the right skills, so our workforce can seize this opportunity. With new partnerships, getting all the right players together. And investing in research and innovation. Over the next two years, we are putting 90 million euros on the table for big data and 125 million for the cloud.
I want to respond to this economic imperative. And I want to respond to the call of the European Council – looking at all the aspects relevant to tomorrow’s digital economy.
You can help us build this future. All of you. Helping to bring about the digital data-driven economy of the future. Expanding and deepening the ecosystem around data. New players, new intermediaries, new solutions, new jobs, new growth….”
Personal Data for the Public Good
Final report on “New Opportunities to Enrich Understanding of Individual and Population Health” of the health data exploration project: “Individuals are tracking a variety of health-related data via a growing number of wearable devices and smartphone apps. More and more data relevant to health are also being captured passively as people communicate with one another on social networks, shop, work, or do any number of activities that leave “digital footprints.”
Almost all of these forms of “personal health data” (PHD) are outside of the mainstream of traditional health care, public health or health research. Medical, behavioral, social and public health research still largely rely on traditional sources of health data such as those collected in clinical trials, sifting through electronic medical records, or conducting periodic surveys.
Self-tracking data can provide better measures of everyday behavior and lifestyle and can fill in gaps in more traditional clinical data collection, giving us a more complete picture of health. With support from the Robert Wood Johnson Foundation, the Health Data Exploration (HDE) project conducted a study to better understand the barriers to using personal health data in research from the individuals who track the data about their own personal health, the companies that market self-tracking devices, apps or services and aggregate and manage that data, and the researchers who might use the data as part of their research.
Perspectives
Through a series of interviews and surveys, we discovered strong interest in contributing and using PHD for research. It should be noted that, because our goal was to access individuals and researchers who are already generating or using digital self-tracking data, there was some bias in our survey findings—participants tended to have more education and higher household incomes than the general population. Our survey also drew slightly more white and Asian participants and more female participants than in the general population.
Individuals were very willing to share their self-tracking data for research, in particular if they knew the data would advance knowledge in fields related to PHD such as public health, health care, computer science and social and behavioral science. Most expressed an explicit desire to have their information shared anonymously, and we discovered a wide range of thoughts and concerns regarding privacy.
Equally, researchers were generally enthusiastic about the potential for using self-tracking data in their research. Researchers see value in these kinds of data and think these data can answer important research questions. Many consider them to be of equal quality and importance to data from existing high-quality clinical or public health data sources.
Companies operating in this space noted that advancing research was a worthy goal but not their primary business concern. Many companies expressed interest in research conducted outside of their company that would validate the utility of their device or application but noted the critical importance of maintaining their customer relationships. A number were open to data sharing with academics but noted the slow pace and administrative burden of working with universities as a challenge.
In addition to this considerable enthusiasm, it seems a new PHD research ecosystem may well be emerging. Forty-six percent of the researchers who participated in the study have already used self-tracking data in their research, and 23 percent of the researchers have already collaborated with application, device, or social media companies.
The Personal Health Data Research Ecosystem
A great deal of experimentation with PHD is taking place. Some individuals are experimenting with personal data stores or sharing their data directly with researchers in a small set of clinical experiments. Some researchers have secured one-off access to unique data sets for analysis. A small number of companies, primarily those with more of a health research focus, are working with others to develop data commons to regularize data sharing with the public and researchers.
SmallStepsLab serves as an intermediary between Fitbit, a data-rich company, and academic researchers via a “preferred status” API held by the company. Researchers pay SmallStepsLab for this access as well as other enhancements that they might want.
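How SmallStepsLab’s “preferred status” access works is not public, but the general shape of researcher access to a device vendor’s data is easy to illustrate. The sketch below pulls one consenting participant’s daily step count directly from Fitbit’s public Web API; the token value is a placeholder and the research-app registration it implies is assumed, with an intermediary such as SmallStepsLab sitting between this kind of call and the researcher in practice.

```python
import requests

# Placeholder OAuth bearer token, issued after the participant authorizes
# a (hypothetical) research app to read their activity data.
ACCESS_TOKEN = "..."

def fetch_daily_steps(date: str) -> int:
    """Fetch the authorizing user's step count for one day from the
    Fitbit Web API ("-" refers to the user who granted the token)."""
    url = f"https://api.fitbit.com/1/user/-/activities/date/{date}.json"
    resp = requests.get(url, headers={"Authorization": f"Bearer {ACCESS_TOKEN}"})
    resp.raise_for_status()
    return resp.json()["summary"]["steps"]

print(fetch_daily_steps("2014-03-01"))
```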
These promising early examples foreshadow a much larger set of activities with the potential to transform how research is conducted in medicine, public health and the social and behavioral sciences.
Opportunities and Obstacles
There is still work to be done to enhance the potential to generate knowledge out of personal health data:
- Privacy and Data Ownership: Among individuals surveyed, the dominant condition (57%) for making their PHD available for research was an assurance of privacy for their data, and over 90% of respondents said that it was important that the data be anonymous. Further, while some didn’t care who owned the data they generate, a clear majority wanted to own or at least share ownership of the data with the company that collected it.
- Informed Consent: Researchers are concerned about the privacy of PHD as well as respecting the rights of those who provide it. For most of our researchers, this came down to a straightforward question of whether there is informed consent. Our research found that current methods of informed consent are challenged by the ways PHD are being used and reused in research. A variety of new approaches to informed consent are being evaluated and this area is ripe for guidance to assure optimal outcomes for all stakeholders.
- Data Sharing and Access: Among individuals, there is growing interest in, as well as willingness and opportunity to, share personal health data with others. People now share these data with others with similar medical conditions in online groups like PatientsLikeMe or Crohnology, with the intention to learn as much as possible about mutual health concerns. Looking across our data, we find that individuals’ willingness to share is dependent on what data is shared, how the data will be used, who will have access to the data and when, what regulations and legal protections are in place, and the level of compensation or benefit (both personal and public).
- Data Quality: Researchers highlighted concerns about the validity of PHD and lack of standardization of devices. While some of this may be addressed as the consumer health device, apps and services market matures, reaching the optimal outcome for researchers might benefit from strategic engagement of important stakeholder groups.
We are reaching a tipping point. More and more people are tracking their health, and there is a growing number of tracking apps and devices on the market with many more in development. There is overwhelming enthusiasm from individuals and researchers to use this data to better understand health. To maximize personal data for the public good, we must develop creative solutions that allow individual rights to be respected while providing access to high-quality and relevant PHD for research, that balance open science with intellectual property, and that enable productive and mutually beneficial collaborations between the private sector and the academic research community.”
The Parable of Google Flu: Traps in Big Data Analysis
David Lazer: “…big data had its “Dewey beats Truman” moment last winter, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu, overshooting the peak by 130% (and indeed, it has been systematically overshooting by wide margins for 3 years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We are posting the working paper version, The Parable of Google Flu (WP-Final).pdf, that we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.]
Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe likely to have increased flu-related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job estimating the likely flu prevalence today by ignoring GFT altogether and simply projecting 3-week-old CDC data forward to today (better still would have been to combine the two) [a toy sketch of such a combination follows these lessons]. That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc.). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
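Lesson 3 lends itself to a small worked example. The sketch below, with entirely invented numbers, projects a 3-week-old CDC series forward along its recent trend and then blends it with a GFT-style nowcast; the equal weights are an assumption, where a real synthesis would fit them on past forecast errors.

```python
import numpy as np

# Hypothetical weekly ILI (influenza-like illness) rates. CDC reporting
# lags, so the most recent observation is three weeks old.
cdc_ili = np.array([1.2, 1.5, 1.9, 2.4, 3.0])  # weeks t-7 .. t-3
gft_now = 6.1                                   # hypothetical GFT nowcast for week t

# Naive version of "just project 3-week-old CDC data": continue the
# average recent weekly change for three more weeks.
trend = np.mean(np.diff(cdc_ili[-3:]))
cdc_projection = cdc_ili[-1] + 3 * trend

# Better still, per the paper: combine the two signals.
combined = 0.5 * cdc_projection + 0.5 * gft_now

print(f"CDC projection {cdc_projection:.2f} | GFT {gft_now:.2f} | combined {combined:.2f}")
```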
How Maps Drive Decisions at EPA
EPA once relied on spreadsheets detailing places with large oil and gas production and other possible pollutants where the agency might want to focus its own inspection efforts or reach out to state-level enforcement agencies.
During the past two years, the agency has largely replaced those spreadsheets and tables with digital maps, which make it easier for participants to visualize precisely where the top polluting areas are and how those areas correspond to population centers, said Harvey Simon, EPA’s geospatial information officer. That shift has made it easier for the agency to focus inspections and enforcement efforts where they will do the most good.
“Rather than verbally going through tables and spreadsheets you have a lot of people who are not [geographic information systems] practitioners who are able to share map information,” Simon said. “That’s allowed them to take a more targeted and data-driven approach to deciding what to do where.”
The change is a result of the EPA Geoplatform, a tool built off Esri’s ArcGIS Online product, which allows companies and government agencies to build custom Web maps using base maps provided by Esri mashed up with their own data.
When the EPA Geoplatform launched in May 2012 there were about 250 people registered to create and share mapping data within the agency. That number has grown to more than 1,000 during the past 20 months, Simon said.
“The whole idea of the platform effort is to democratize the use of geospatial information within the agency,” he said. “It’s relatively simple now to make a Web map and mash up data that’s useful for your work, so many users are creating Web maps themselves without any support from a consultant or from a GIS expert in their office.”
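The workflow Simon describes, mashing up agency data with an Esri base map, looks roughly like the sketch below, written against Esri’s ArcGIS API for Python (a tool that postdates this article). The org URL, credentials, and CSV file are hypothetical.

```python
from arcgis.gis import GIS
from arcgis.mapping import WebMap

# Hypothetical org URL and analyst credentials.
gis = GIS("https://epa.maps.arcgis.com", "analyst_user", "password")

# Publish an agency CSV of candidate sites (hypothetical file with
# lat/lon columns) as a hosted feature layer.
item = gis.content.add(
    {"title": "Oil and gas inspection candidates", "tags": "enforcement"},
    data="inspection_candidates.csv",
)
layer_item = item.publish()

# Mash the new layer up with the default base map and save a web map
# that non-GIS colleagues can open in a browser.
wm = WebMap()
wm.add_layer(layer_item)
wm.save({"title": "Targeted inspections", "tags": "enforcement",
         "snippet": "Candidate sites relative to population centers"})
```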
A governmentwide Geoplatform launched in 2012, spurred largely by agencies’ frustrations with the difficulty of sharing mapping data after the 2010 explosion of the Deepwater Horizon oil rig in the Gulf of Mexico. The platform’s goal was twofold. First, officials wanted agencies to share mapping data more widely so they could avoid duplicating one another’s work and could exchange data more easily during an emergency.
Second, the government wanted to simplify the process for viewing and creating Web maps so they could be used more easily by nonspecialists.
EPA’s geoplatform has essentially the same goals. The majority of the maps the agency builds using the platform aren’t publicly accessible so the EPA doesn’t have to worry about scrubbing maps of data that could reveal personal information about citizens or proprietary data about companies. It publishes some maps that don’t pose any privacy concerns on EPA websites as well as on the national geoplatform and to Data.gov, the government data repository.
Once ArcGIS Online is judged compliant with the Federal Information Security Management Act, or FISMA, which is expected this month, EPA will be able to share significantly more nonpublic maps through the national geoplatform and rely on more maps produced by other agencies, Simon said.
EPA’s geoplatform has also made it easier for the agency’s environmental justice office to share common data….”
Can A Serious Game Improve Privacy Awareness on Facebook?
Emerging Technology From the arXiv: “Do you know who can see the items you’ve posted on Facebook? This, of course, depends on the privacy settings you’ve used for each picture, text or link that you’ve shared throughout your Facebook history.
You might be extremely careful in deciding who can see these things. But as time goes on, the number of items people share increases. And the number of contacts they share them with increases too. So it’s easy to lose track of who can see what.
What’s more, an item that you may have been happy to share three years ago when you were at university, you may not be quite so happy to share now that you are looking for employment.
So how best to increase people’s awareness of their privacy settings? Today, Alexandra Cetto and pals from the University of Regensburg in Germany, say they’ve developed a serious game called Friend Inspector that allows users to increase their privacy awareness on Facebook.
And they say that within five months of its launch, the game had been requested over 100,000 times.
In recent years, serious games have become an increasingly important learning medium through digital simulations and virtual environments. So Cetto and co set about developing a game that could increase people’s awareness of privacy on Facebook.
Designing serious games is something of a black art. At the very least, there needs to be motivation to play and some kind of feedback or score to beat. And at the same time, the game has to achieve some kind of learning objective, in this case an enhanced awareness of privacy.
Aimed at 16-25 year olds, the game these guys came up with is deceptively simple. When potential players land on the home page, they’re asked a simple question: “Do you know who can see your Facebook profile?” This is followed by the teaser: “Playfully discover who can see your shared items and get advice to improve your privacy.”
When a player signs up, the game retrieves his or her contacts, shared items and their privacy settings from Facebook. It then presents the player with pairs of these shared items, asking which is more personal….
Finally, the game assesses the player’s score and makes a set of personalised recommendations about how to improve privacy, such as how to create friend lists, how to share personal items in a targeted manner and how the term friendship on a social network site differs from friendship in the real world…. Try it at http://www.friend-inspector.org/.
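Under the hood, the pairwise “which is more personal?” answers have to be turned into a sensitivity ranking that can be checked against each item’s actual visibility. The sketch below does this with Elo-style rating updates, which is one plausible mechanism and an assumption here, not necessarily the authors’ method; the items and audience counts are invented.

```python
# Elo-style ranking from pairwise "more personal" judgments (an assumed
# mechanism, not necessarily Friend Inspector's own).
K = 32

def expected(r_a: float, r_b: float) -> float:
    return 1 / (1 + 10 ** ((r_b - r_a) / 400))

def update(ratings: dict, more_personal: str, less_personal: str) -> None:
    e = expected(ratings[more_personal], ratings[less_personal])
    ratings[more_personal] += K * (1 - e)
    ratings[less_personal] -= K * (1 - e)

ratings = {"party photo": 1000.0, "news link": 1000.0, "address post": 1000.0}
audience = {"party photo": 480, "news link": 120, "address post": 950}  # who can see it

for winner, loser in [("address post", "news link"),
                      ("party photo", "news link"),
                      ("address post", "party photo")]:
    update(ratings, winner, loser)

# Flag items the player rates as personal but that a wide audience can see.
for item in sorted(ratings, key=ratings.get, reverse=True):
    if audience[item] > 300:
        print(f"'{item}' feels personal, yet {audience[item]} people can see it")
```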
Ref: arxiv.org/abs/1402.5878 : Friend Inspector: A Serious Game to Enhance Privacy Awareness in Social Networks”
Big Data, Big New Businesses
Nigel Shadbolt and Michael Chui: “Many people have long believed that if government and the private sector agreed to share their data more freely, and allow it to be processed using the right analytics, previously unimaginable solutions to countless social, economic, and commercial problems would emerge. They may have no idea how right they are.
Even the most vocal proponents of open data appear to have underestimated how many profitable ideas and businesses stand to be created. More than 40 governments worldwide have committed to opening up their electronic data – including weather records, crime statistics, transport information, and much more – to businesses, consumers, and the general public. The McKinsey Global Institute estimates that the annual value of open data in education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance could reach $3 trillion.
These benefits come in the form of new and better goods and services, as well as efficiency savings for businesses, consumers, and citizens. The range is vast. For example, drawing on data from various government agencies, the Climate Corporation (recently bought for $1 billion) has taken 30 years of weather data, 60 years of data on crop yields, and 14 terabytes of information on soil types to create customized insurance products.
Similarly, real-time traffic and transit information can be accessed on smartphone apps to inform users when the next bus is coming or how to avoid traffic congestion. And, by analyzing online comments about their products, manufacturers can identify which features consumers are most willing to pay for, and develop their business and investment strategies accordingly.
Opportunities are everywhere. A raft of open-data start-ups are now being incubated at the London-based Open Data Institute (ODI), which focuses on improving our understanding of corporate ownership, health-care delivery, energy, finance, transport, and many other areas of public interest.
Consumers are the main beneficiaries, especially in the household-goods market. Consumers making better-informed buying decisions across sectors could capture an estimated $1.1 trillion in value annually. Third-party data aggregators are already allowing customers to compare prices across online and brick-and-mortar shops. Many also permit customers to compare quality ratings, safety data (drawn, for example, from official injury reports), information about the provenance of food, and producers’ environmental and labor practices.
Consider the book industry. Bookstores once regarded their inventory as a trade secret. Customers, competitors, and even suppliers seldom knew what stock bookstores held. Nowadays, by contrast, bookstores not only report what stock they carry but also when customers’ orders will arrive. If they did not, they would be excluded from the product-aggregation sites that have come to determine so many buying decisions.
The health-care sector is a prime target for achieving new efficiencies. By sharing the treatment data of a large patient population, for example, care providers can better identify practices that could save $180 billion annually.
The Open Data Institute-backed start-up Mastodon C uses open data on doctors’ prescriptions to differentiate among expensive patent medicines and cheaper “off-patent” varieties; when applied to just one class of drug, that could save around $400 million in one year for the British National Health Service. Meanwhile, open data on acquired infections in British hospitals has led to the publication of hospital-performance tables, a major factor in the 85% drop in reported infections.
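The Mastodon C result rests on simple arithmetic over open prescribing records: within one drug class, compare what is spent on the branded drug with the price of its off-patent equivalent. A toy version of that calculation, with invented counts and prices rather than the real NHS figures:

```python
import pandas as pd

# Invented monthly prescribing data for one statin class.
rx = pd.DataFrame({
    "drug": ["rosuvastatin (branded)", "simvastatin (generic)"],
    "items": [120_000, 900_000],       # prescriptions dispensed
    "cost_per_item": [29.0, 1.8],      # GBP, hypothetical
})
rx["spend"] = rx["items"] * rx["cost_per_item"]

# Potential saving if branded items were dispensed at the generic price.
branded, generic = rx.iloc[0], rx.iloc[1]
saving = branded["items"] * (branded["cost_per_item"] - generic["cost_per_item"])

print(f"Monthly spend: {rx['spend'].sum():,.0f} GBP; potential saving: {saving:,.0f} GBP")
```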
There are also opportunities to prevent lifestyle-related diseases and improve treatment by enabling patients to compare their own data with aggregated data on similar patients. This has been shown to motivate patients to improve their diet, exercise more often, and take their medicines regularly. Similarly, letting people compare their energy use with that of their peers could prompt them to save hundreds of billions of dollars in electricity costs each year, to say nothing of reducing carbon emissions.
Such benchmarking is even more valuable for businesses seeking to improve their operational efficiency. The oil and gas industry, for example, could save $450 billion annually by sharing anonymized and aggregated data on the management of upstream and downstream facilities.
Finally, the move toward open data serves a variety of socially desirable ends, ranging from the reuse of publicly funded research to support work on poverty, inclusion, or discrimination, to the disclosure by corporations such as Nike of their supply-chain data and environmental impact.
There are, of course, challenges arising from the proliferation and systematic use of open data. Companies fear for their intellectual property; ordinary citizens worry about how their private information might be used and abused. Last year, Telefónica, the world’s fifth-largest mobile-network provider, tried to allay such fears by launching a digital confidence program to reassure customers that innovations in transparency would be implemented responsibly and without compromising users’ personal information.
The sensitive handling of these issues will be essential if we are to reap the potential $3 trillion in value that usage of open data could deliver each year. Consumers, policymakers, and companies must work together, not just to agree on common standards of analysis, but also to set the ground rules for the protection of privacy and property.”
Can We Balance Data Protection With Value Creation?
A “privacy perspective” by Sara Degli Esposti: “In the last few years there has been a dramatic change in the opportunities organizations have to generate value from the data they collect about customers or service users. Customers and users are rapidly becoming collections of “data points” and organizations can learn an awful lot from the analysis of this huge accumulation of data points, also known as “Big Data.”
Some may ask whether it’s even possible to balance the two.
Enter the Big Data Protection Project (BDPP): an Open University study on organizations’ ability to leverage Big Data while complying with EU data protection principles. The study represents a chance for you to contribute to, and learn about, the debate on the reform of the EU Data Protection Directive. It is open to staff with an interest in data management or use, from all types of organizations, both for-profit and nonprofit, with interests in Europe.
Join us by visiting the study’s page on the Open University website. Participants will receive a report with all the results. The BDPP is a scientific project—no commercial organization is involved—with implications relevant to both policy-makers and industry representatives.
What kind of legislation do we need to create that positive system of incentive for organizations to innovate in the privacy field?
There is no easy answer.
That’s why we need to undertake empirical research into actual information management practices to understand the effects of regulation on people and organizations. Legal instruments conceived with the best intentions can be ineffective or detrimental in practice. However, other factors can also intervene and motivate business players to develop procedures and solutions which go far beyond compliance. Good legislation should complement market forces in bringing values and welfare to both consumers and organizations.
Is European data protection law keeping its promise of protecting users’ information privacy while contributing to the flourishing of the digital economy or not? Will the proposed General Data Protection Regulation (GDPR) be able to achieve this goal? What would you suggest to do to motivate organizations to invest in information security and take information privacy seriously?
Let’s consider for a second some basic ideas such as the eight fundamental data protection principles: notice, consent, purpose specification and limitation, data quality, respect of data subjects’ rights, information security and accountability. Many of these ideas are present in the EU 1995 Data Protection Directive, the U.S. Fair Information Practice Principles (FIPPs) and the 1980 OECD Guidelines. The fundamental question now is, should all these ideas be brought into the future, as suggested in the proposed new GDPR, or should we reconsider our approach and revise some of them, as recommended in the 21st century version of the 1980 OECD Guidelines?
As you may know, notice and consent are often taken as examples of how very good intentions can be transformed into actions of limited importance. Rather than increase people’s awareness of the growing data economy, notice and consent have produced a tick-box tendency accompanied by long and unintelligible privacy policies. Besides, consent is rarely freely granted. Individuals give their consent in exchange for some product or service or as part of a job relationship. The imbalance between the two goods traded—think about how youngsters perceive not having access to some social media as a form of social exclusion—and the lack of feasible alternatives often make an instrument, such as the current use made of consent, meaningless.
On the other hand, a principle such as data quality, which has received very limited attention, could offer opportunities to policy-makers and businesses to reopen the debate on users’ control of their personal data. Having updated, accurate data is something very valuable for organizations. Data quality is also key to the success of many business models. New partnerships between users and organizations could be envisioned under this principle.
Finally, data collection limitation and purpose specification could be other examples of the divide between theory and practice: The tendency we see is that people and businesses want to share, merge and reuse data over time and to do new and unexpected things. Of course, we all want to avoid function creep and prevent any detrimental use of our personal data. We probably need new, stronger mechanisms to ensure data are used for good purposes.
Digital data have become economic assets these days. We need good legislation to stop the black market for personal data and open the debate on how each of us wants to contribute to, and benefit from, the data economy.”
New Programming Language Removes Human Error from Privacy Equation
MIT Technology Review: “Anytime you hear about Facebook inadvertently making your location public, or revealing who is stalking your profile, it’s likely because a programmer added code that led to a bug.
But what if there was a system in place that could substantially reduce such privacy breaches and effectively remove human error from the equation?
One MIT PhD thinks she has the answer, and its name is Jeeves.
This past month, Jean Yang released an open-source Python version of “Jeeves,” a programming language with built-in privacy features that frees programmers from having to provide on-the-go ad-hoc maintenance of privacy settings.
Given that somewhere between 10 and 20 percent of all code is related to privacy policy, Yang thinks that Jeeves will be an attractive option for social app developers who are looking to be more efficient in their use of programmer resources – as well as those who are hoping to assuage users’ privacy concerns about if and how their data are used.
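The sketch below is adapted from the examples published with the open-source python-jeeves repository, so treat the exact names as approximate; it assumes the library is installed. The point is that the policy is attached to the value once, and the runtime, rather than checks scattered through application code, decides which facet each viewer sees.

```python
import JeevesLib  # from the python-jeeves repository

JeevesLib.init()

alice, bob = "alice", "bob"  # stand-ins for user objects

# Attach the policy once: only Alice may see the confidential facet.
label = JeevesLib.mkLabel()
JeevesLib.restrict(label, lambda viewer: viewer == alice)

# One faceted value carries both the secret and the public fallback.
location = JeevesLib.mkSensitive(label, "Cambridge, MA", "Undisclosed")

# The runtime picks the right facet per viewer at output time.
print(JeevesLib.concretize(alice, location))  # -> "Cambridge, MA"
print(JeevesLib.concretize(bob, location))    # -> "Undisclosed"
```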
For more information about Jeeves visit the project site.
For more information on Yang visit her CSAIL page.”
We need a new Bismarck to tame the machines
If, in the words of Google chairman Eric Schmidt, there is a “race between people and computers” that even he suspects people may not win, democrats everywhere should be worried. In the same vein, Lawrence Summers, former Treasury secretary, recently noted that new technology could be liberating but that the government needed to soften its negative effects and make sure the benefits were distributed fairly. The problem, he went on, was that “we don’t yet have the Gladstone, the Teddy Roosevelt or the Bismarck of the technology era”.
These Victorian giants have much to teach us. They were at the helm when their societies were transformed by the telegraph, the electric light, the telephone and the combustion engine. Each tried to soften the blow of change, and to equalise the benefits of prosperity for working people. With William Gladstone it was universal primary education and the vote for Britain’s working men. With Otto von Bismarck it was legislation that insured German workers against ill-health and old age. For Roosevelt it was the entire progressive agenda, from antitrust legislation and regulation of freight rates to the conservation of America’s public lands….
The Victorians created the modern state to tame the market in the name of democracy but they wanted a nightwatchman state, not a Leviathan. Thanks to the new digital technologies, the state they helped create now has powers of surveillance that threaten our privacy and freedom. What new technology makes possible, states will do. Keeping technology in the service of democracy will not be easy. Asking judges to guard the guards only bloats the state apparatus still further. Allowing dissident insiders to get away with leaking the state’s secrets will only result in more secretive, paranoid and controlling government.
The Victorians would have said there is a solution – representative government itself – but it requires citizens to trust their representatives to hold the government in check. The Victorians created modern, mass representative democracy so that collective public choice could control change for everyone’s benefit. They believed that representatives, if given the authority and the necessary information, could control the power that technology confers on the modern state.
This is still a viable ideal but we have plenty of rebuilding to do before our democratic institutions are ready for the task. Congress and parliament need to regain trust and capability; and, if they do, we can start recovering the faith of the Victorians we so sorely need: the belief that democracy can master the technologies that are transforming our lives.”
Selected Readings on Personal Data: Security and Use
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of personal data was originally published in 2014.
Advances in technology have greatly increased the potential for policymakers to utilize the personal data of large populations for the public good. However, the proliferation of vast stores of useful data has also given rise to a variety of legislative, political, and ethical concerns surrounding the privacy and security of citizens’ personal information, both in terms of collection and usage. Challenges regarding the governance and regulation of personal data must be addressed in order to assuage individuals’ concerns regarding the privacy, security, and use of their personal information.
Selected Reading List (in alphabetical order)
- Ann Cavoukian – Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control – a paper describing the emerging framework of technologies enabling individuals to hold greater control of their data.
- T. Kirkham, S. Winfield, S. Ravet and S. Kellomaki – A Personal Data Store for an Internet of Subjects – a paper arguing for a shift from the current service-oriented data control architecture to a system centered on individuals controlling their data.
- OECD – The 2013 OECD Privacy Guidelines – a privacy framework built around eight core principles.
- Paul Ohm – Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization – a paper revealing the ease with which researchers have been able to reattach supposedly anonymized personal data to individuals.
- Jules Polonetsky and Omer Tene – Privacy in the Age of Big Data: A Time for Big Decisions – a paper proposing the development of a risk-value matrix for personal data in the big data era.
- Katie Shilton, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun – Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing – a paper arguing for a reimagined system for protecting the privacy of personal data, moving past the out-of-date Codes of Fair Information Practice.
Annotated Selected Reading List (in alphabetical order)
Cavoukian, Ann. “Personal Data Ecosystem (PDE) – A Privacy by Design Approach to an Individual’s Pursuit of Radical Control.” Privacy by Design, October 15, 2013. https://bit.ly/2S00Yfu.
- In this paper, Cavoukian describes the Personal Data Ecosystem (PDE), an “emerging landscape of companies and organizations that believe individuals should be in control of their personal data, and make available a growing number of tools and technologies to enable this control.” She argues that, “The right to privacy is highly compatible with the notion of PDE because it enables the individual to have a much greater degree of control – “Radical Control” – over their personal information than is currently possible today.”
- To ensure that the PDE reaches its privacy-protection potential, Cavoukian argues that it must practice The 7 Foundational Principles of Privacy by Design:
- Proactive not Reactive; Preventative not Remedial
- Privacy as the Default Setting
- Privacy Embedded into Design
- Full Functionality – Positive-Sum, not Zero-Sum
- End-to-End Security – Full Lifecycle Protection
- Visibility and Transparency – Keep it Open
- Respect for User Privacy – Keep it User-Centric
Kirkham, T., S. Winfield, S. Ravet, and S. Kellomaki. “A Personal Data Store for an Internet of Subjects.” In 2011 International Conference on Information Society (i-Society), 92–97. http://bit.ly/1alIGuT.
- This paper examines various factors involved in the governance of personal data online, and argues for a shift from “current service-oriented applications where often the service provider is in control of the person’s data” to a person centric architecture where the user is at the center of personal data control.
- The paper delves into an “Internet of Subjects” concept of Personal Data Stores, and focuses on implementation of such a concept on personal data that can be characterized as either “By Me” or “About Me.”
- The paper also presents examples of how a Personal Data Store model could allow users to both protect and present their personal data to external applications, affording them greater control.
OECD. The 2013 OECD Privacy Guidelines. 2013. http://bit.ly/166TxHy.
- This report is indicative of the “important role in promoting respect for privacy as a fundamental value and a condition for the free flow of personal data across borders” played by the OECD for decades. The guidelines – revised in 2013 for the first time since being drafted in 1980 – are seen as “[t]he cornerstone of OECD work on privacy.”
- The OECD framework is built around eight basic principles for personal data privacy and security:
- Collection Limitation
- Data Quality
- Purpose Specification
- Use Limitation
- Security Safeguards
- Openness
- Individual Participation
- Accountability
Ohm, Paul. “Broken Promises of Privacy: Responding to the Surprising Failure of Anonymization.” UCLA Law Review 57 (2010): 1701. http://bit.ly/18Q5Mta.
- This article explores the implications of the “astonishing ease” with which scientists have demonstrated the ability to “reidentify” or “deanonymize” supposedly anonymous personal information.
- Rather than focusing exclusively on whether personal data is “anonymized,” Ohm offers five factors for governments and other data-handling bodies to use for assessing the risk of privacy harm: data-handling techniques, private versus public release, quantity, motive and trust.
Polonetsky, Jules and Omer Tene. “Privacy in the Age of Big Data: A Time for Big Decisions.” Stanford Law Review Online 64 (February 2, 2012): 63. http://bit.ly/1aeSbtG.
- In this article, Tene and Polonetsky argue that, “The principles of privacy and data protection must be balanced against additional societal values such as public health, national security and law enforcement, environmental protection, and economic efficiency. A coherent framework would be based on a risk matrix, taking into account the value of different uses of data against the potential risks to individual autonomy and privacy.”
- To achieve this balance, the authors believe that, “policymakers must address some of the most fundamental concepts of privacy law, including the definition of ‘personally identifiable information,’ the role of consent, and the principles of purpose limitation and data minimization.”
Shilton, Katie, Jeff Burke, Deborah Estrin, Ramesh Govindan, Mark Hansen, Jerry Kang, and Min Mun. “Designing the Personal Data Stream: Enabling Participatory Privacy in Mobile Personal Sensing.” TPRC, 2009. http://bit.ly/18gh8SN.
- This article argues that the Codes of Fair Information Practice, which have served as a model for data privacy for decades, do not take into account a world of distributed data collection, nor the realities of data mining and easy, almost uncontrolled, dissemination.
- The authors suggest “expanding the Codes of Fair Information Practice to protect privacy in this new data reality. An adapted understanding of the Codes of Fair Information Practice can promote individuals’ engagement with their own data, and apply not only to governments and corporations, but software developers creating the data collection programs of the 21st century.”
- In order to achieve this change in approach, the paper discusses three foundational design principles: primacy of participants, data legibility, and engagement of participants throughout the data life cycle.