Remembering and Forgetting in the Digital Age


Book by Thouvenin, Florent (et al.): “… examines the fundamental question of how legislators and other rule-makers should handle remembering and forgetting information (especially personally identifiable information) in the digital age. It encompasses such topics as privacy, data protection, individual and collective memory, and the right to be forgotten when considering data storage, processing and deletion. The authors argue in support of maintaining the new digital default, that (personally identifiable) information should be remembered rather than forgotten.

The book offers guidelines for legislators as well as private and public organizations on how to make decisions on remembering and forgetting personally identifiable information in the digital age. It draws on three main perspectives: law, based on a comprehensive analysis of Swiss law that serves as an example; technology, specifically search engines, internet archives, social media and the mobile internet; and an interdisciplinary perspective with contributions from various disciplines such as philosophy, anthropology, sociology, psychology, and economics, amongst others. Thanks to this multifaceted approach, readers will benefit from a holistic view of the informational phenomenon of “remembering and forgetting”.

This book will appeal to lawyers, philosophers, sociologists, historians, economists, anthropologists, and psychologists among many others. Such wide appeal is due to its rich and interdisciplinary approach to the challenges for individuals and society at large with regard to remembering and forgetting in the digital age…(More)”

Better ways to measure the new economy


Valerie Hellinghausen and Evan Absher at Kauffman Foundation: “The old measure of “jobs numbers” as an economic indicator is shifting to new metrics to measure a new economy.

With more communities embracing inclusive entrepreneurial ecosystems as the new model of economic development, entrepreneurs, ecosystem builders, and government agencies – at all levels – need to work together on data-driven initiatives. While established measures still have a place, new metrics have the potential to deliver the timely and granular information that is more useful at the local level….

Three better ways to measure the new economy:

  1. National and local datasets: Numbers used to discuss the economy are national level and usually not very timely. These numbers are useful to understand large trends, but fail to capture local realities. One way to better measure local economies is to use local administrative datasets. There are many obstacles with this approach, but the idea is gaining interest. Data infrastructure, policies, and projects are building connections between local and national agencies. Joining different levels of government data will provide national scale and local specificity.
  2. Private and public data: The words private and public typically reflect privacy issues, but there is another public and private dimension. Public institutions possess vast amounts of data, but so do private companies. For instance, sites like PayPal, Square, Amazon, and Etsy possess data that could provide real-time assessment of an individual company’s financial health. The concept of credit and risk could be expanded to benefit those currently underserved, if combined with local administrative information like tax, wage, and banking data. Fair and open use of private data could open credit to currently underfunded entrepreneurs.
  3. New metrics: Developing connections between different datasets will result in new metrics of entrepreneurial activity: metrics that measure human connection, social capital, community creativity, and quality of life. Metrics that capture economic activity at the community level and in real time. For example, the Kauffman Foundation has funded research that uses labor data from private job-listing sites to better understand the match between the workforce entrepreneurs need and the workforce available within the immediate community. But new metrics are not enough; they must connect to the final goal of economic independence. Using new metrics to help ecosystems understand how policies and programs impact entrepreneurship is the final step to measuring local economies….(More)”.

Self-Invasion And The Invaded Self


Rochelle Gurstein in the Baffler: “WHAT DO WE LOSE WHEN WE LOSE OUR PRIVACY? This question has become increasingly difficult to answer, living as we do in a society that offers boundless opportunities for men and women to expose themselves (in all dimensions of that word) as never before, to commit what are essentially self-invasions of privacy. Although this is a new phenomenon, it has become as ubiquitous as it is quotidian, and for that reason, it is perhaps one of the most telling signs of our time. To get a sense of the sheer range of unconscious exhibitionism, we need only think of the popularity of reality TV shows, addiction-recovery memoirs, and cancer diaries. Then there are the banal but even more conspicuous varieties, like soaring, all-glass luxury apartment buildings and hotels in which inhabitants display themselves in all phases of their private lives to the casual glance of thousands of city walkers below. Or the incessant sound of people talking loudly—sometimes gossiping, sometimes crying—on their cell phones, broadcasting to total strangers the intimate details of their lives.

And, of course, there are now unprecedented opportunities for violating one’s own privacy, furnished by the technology of the internet. The results are everywhere, from selfies and Instagrammed trivia to the almost automatic, everyday activity of Facebook users registering their personal “likes” and preferences. (As we recently learned, this online pastime is nowhere near as private as we had been led to believe; more than fifty million users’ idly generated “data” was “harvested” by Cambridge Analytica to make “personality profiles” that were then used to target voters with advertisements from Donald Trump’s presidential campaign.)

Beyond these branded and aggressively marketed forums for self-invasions of privacy there are all the giddy, salacious forms that circulate in graphic images and words online—the sort that led not so long ago to the downfall of Anthony Weiner. The mania for attention of any kind is so pervasive—and the invasion of privacy so nonchalant—that many of us no longer notice, let alone mind, what in the past would have been experienced as insolent violations of privacy….(More)”.

Trust, Security, and Privacy in Crowdsourcing


Guest Editorial to Special Issue of IEEE Internet of Things Journal: “As we become increasingly reliant on intelligent, interconnected devices in every aspect of our lives, critical trust, security, and privacy concerns are raised as well.

First, the sensing data provided by individual participants is not always reliable. It may be noisy or even faked due to various reasons, such as poor sensor quality, lack of sensor calibration, background noise, context impact, mobility, incomplete view of observations, or malicious attacks. The crowdsourcing applications should be able to evaluate the trustworthiness of collected data in order to filter out the noisy and fake data that may disturb or intrude a crowdsourcing system. Second, providing data (e.g., photographs taken with personal mobile devices) or using IoT applications may compromise data providers’ personal data privacy (e.g., location, trajectory, and activity privacy) and identity privacy. Therefore, it becomes essential to assess the trust of the data while preserving the data providers’ privacy. Third, data analytics and mining in crowdsourcing may disclose the privacy of data providers or related entities to unauthorized parties, which lowers the willingness of participants to contribute to the crowdsourcing system, impacts system acceptance, and greatly impedes its further development. Fourth, the identities of data providers could be forged by malicious attackers to intrude the whole crowdsourcing system. In this context, trust, security, and privacy start to attract special attention in order to achieve high quality of service in each step of crowdsourcing with regard to data collection, transmission, selection, processing, analysis and mining, as well as utilization.

Trust, security, and privacy in crowdsourcing receive increasing attention. Many methods have been proposed to protect privacy in the process of data collection and processing. For example, data perturbation can be adopted to hide the real data values during data collection. When preprocessing the collected data, data anonymization (e.g., k-anonymization) and fusion can be applied to break the links between the data and their sources/providers. In the application layer, anonymity is used to mask the real identities of data sources/providers. To enable privacy-preserving data mining, secure multiparty computation (SMC) and homomorphic encryption provide options for protecting raw data when multiple parties jointly run a data mining algorithm. Through cryptographic techniques, no party knows anything other than its own input and expected results. For data truth discovery, applicable solutions include correlation-based data quality analysis and trust evaluation of data sources. But current solutions are still imperfect, incomprehensive, and inefficient….(More)”.
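
As a rough illustration of the k-anonymization idea mentioned in the editorial, the sketch below generalizes two quasi-identifiers and checks whether every resulting combination is shared by at least k records. The field names (`age`, `zip`), the generalization rules, and the sample records are hypothetical and are not taken from the special issue.

```python
from collections import Counter

def generalize(record):
    """Coarsen quasi-identifiers: age to a 10-year band, ZIP code to its first 3 digits."""
    age_band = (record["age"] // 10) * 10
    return (age_band, record["zip"][:3])

def is_k_anonymous(records, k):
    """True if every generalized quasi-identifier combination occurs at least k times."""
    counts = Counter(generalize(r) for r in records)
    return all(c >= k for c in counts.values())

# Hypothetical sample records, for illustration only.
records = [
    {"age": 23, "zip": "10115"},
    {"age": 27, "zip": "10117"},
    {"age": 34, "zip": "20095"},
    {"age": 38, "zip": "20099"},
]

print(is_k_anonymous(records, 2))
print(is_k_anonymous(records, 3))
```

With these four records each generalized group holds two people, so the set passes the check for k = 2 but not for k = 3. A real deployment would also need to consider the sensitive attributes within each group, which this sketch ignores.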

Countries Can Learn from France’s Plan for Public Interest Data and AI


Nick Wallace at the Center for Data Innovation: “French President Emmanuel Macron recently endorsed a national AI strategy that includes plans for the French state to make public and private sector datasets available for reuse by others in applications of artificial intelligence (AI) that serve the public interest, such as for healthcare or environmental protection. Although this strategy fails to set out how the French government should promote widespread use of AI throughout the economy, it will nevertheless give a boost to AI in some areas, particularly public services. Furthermore, the plan for promoting the wider reuse of datasets, particularly in areas where the government already calls most of the shots, is a practical idea that other countries should consider as they develop their own comprehensive AI strategies.

The French strategy, drafted by mathematician and Member of Parliament Cédric Villani, calls for legislation to mandate repurposing both public and private sector data, including personal data, to enable public-interest uses of AI by government or others, depending on the sensitivity of the data. For example, public health services could use data generated by Internet of Things (IoT) devices to help doctors better treat and diagnose patients. Researchers could use data captured by motorway CCTV to train driverless cars. Energy distributors could manage peaks and troughs in demand using data from smart meters.

Repurposed data held by private companies could be made publicly available, shared with other companies, or processed securely by the public sector, depending on the extent to which sharing the data presents privacy risks or undermines competition. The report suggests that the government would not require companies to share data publicly when doing so would impact legitimate business interests, nor would it require that any personal data be made public. Instead, Dr. Villani argues that, if wider data sharing would do unreasonable damage to a company’s commercial interests, it may be appropriate to only give public authorities access to the data. But where the stakes are lower, companies could be required to share the data more widely, to maximize reuse. Villani rightly argues that it is virtually impossible to come up with generalizable rules for how data should be shared that would work across all sectors. Instead, he argues for a sector-specific approach to determining how and when data should be shared.

After making the case for state-mandated repurposing of data, the report goes on to highlight four key sectors as priorities: health, transport, the environment, and defense. Since these all have clear implications for the public interest, France can create national laws authorizing extensive repurposing of personal data without violating the General Data Protection Regulation (GDPR) which allows national laws that permit the repurposing of personal data where it serves the public interest. The French strategy is the first clear effort by an EU member state to proactively use this clause in aid of national efforts to bolster AI….(More)”.

Mapping the Privacy-Utility Tradeoff in Mobile Phone Data for Development


Paper by Alejandro Noriega-Campero, Alex Rutherford, Oren Lederman, Yves A. de Montjoye, and Alex Pentland: “Today’s age of data holds high potential to enhance the way we pursue and monitor progress in the fields of development and humanitarian action. We study the relation between data utility and privacy risk in large-scale behavioral data, focusing on mobile phone metadata as paradigmatic domain. To measure utility, we survey experts about the value of mobile phone metadata at various spatial and temporal granularity levels. To measure privacy, we propose a formal and intuitive measure of reidentification risk, the information ratio, and compute it at each granularity level. Our results confirm the existence of a stark tradeoff between data utility and reidentifiability, where the most valuable datasets are also most prone to reidentification. When data is specified at ZIP-code and hourly levels, outside knowledge of only 7% of a person’s data suffices for reidentification and retrieval of the remaining 93%. In contrast, in the least valuable dataset, specified at municipality and daily levels, reidentification requires on average outside knowledge of 51%, or 31 data points, of a person’s data to retrieve the remaining 49%. Overall, our findings show that coarsening data directly erodes its value, and highlight the need for using data-coarsening, not as stand-alone mechanism, but in combination with data-sharing models that provide adjustable degrees of accountability and security….(More)”.
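
The effect of coarsening on reidentifiability can be illustrated with a toy calculation. The sketch below is not the paper's information ratio, whose definition is not given in the abstract; it uses a simpler stand-in, the fraction of users whose coarsened trace matches no other user's. The traces, the integer-division coarsening, and all function names are assumptions made for illustration.

```python
def coarsen(point, space_div, time_div):
    """Map a (cell_id, hour) point onto a coarser grid by integer division."""
    cell, hour = point
    return (cell // space_div, hour // time_div)

def unique_fraction(traces, space_div, time_div):
    """Fraction of users whose coarsened trace matches no other user's trace."""
    coarse = [
        frozenset(coarsen(p, space_div, time_div) for p in points)
        for points in traces.values()
    ]
    unique = sum(1 for trace in coarse if coarse.count(trace) == 1)
    return unique / len(coarse)

# Hypothetical traces: each point is (cell_id, hour of observation).
traces = {
    "a": [(10, 8), (11, 9)],
    "b": [(10, 9), (12, 10)],
    "c": [(13, 8), (11, 20)],
}

print(unique_fraction(traces, 1, 1))    # fine granularity
print(unique_fraction(traces, 10, 24))  # coarse granularity
```

At the fine granularity every toy trace is distinct, while at the coarse granularity all three collapse into the same trace. The same collapse that hides individuals also removes the detail that makes the data useful, which is the tradeoff the abstract describes.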

A roadmap for restoring trust in Big Data


Mark Lawler et al in the Lancet: “The fallout from the Cambridge Analytica–Facebook scandal marks a significant inflection point in the public’s trust concerning Big Data. The health-science community must use this crisis-in-confidence to redouble its commitment to talk openly and transparently about benefits and risks and to act decisively to deliver robust effective governance frameworks, under which personal health data can be responsibly used. Activities such as the Innovative Medicines Initiative’s Big Data for Better Outcomes emphasise how a more granular data-driven understanding of human diseases including cancer could underpin innovative therapeutic intervention.

Health Data Research UK is developing national research expertise and infrastructure to maximise the value of health data science for the National Health Service and ultimately British citizens.

Comprehensive data analytics are crucial to national programmes such as the US Cancer Moonshot, the UK’s 100 000 Genomes project, and other national genomics programmes. Cancer Core Europe, a research partnership between seven leading European oncology centres, has personal data sharing at its core. The Global Alliance for Genomics and Health recently highlighted the need for a global cancer knowledge network to drive evidence-based solutions for a disease that kills more than 8·7 million citizens annually worldwide. These activities risk being fatally undermined by the recent data-harvesting controversy.

We need to restore the public’s trust in data science and emphasise its positive contribution in addressing global health and societal challenges. An opportunity to affirm the value of data science in Europe was afforded by Digital Day 2018, which took place on April 10, 2018, in Brussels, and where European Health Ministers signed a declaration of support to link existing or future genomic databanks across the EU, through the Million European Genomes Alliance.

So how do we address evolving challenges in analysis, sharing, and storage of information, ensure transparency and confidentiality, and restore public trust? We must articulate a clear Social Contract, where citizens (as data donors) are at the heart of decision-making. We need to demonstrate integrity, honesty, and transparency as to what happens to data and what level of control people can, or cannot, expect. We must embed ethical rigour in all our data-driven processes. The Framework for Responsible Sharing of Genomic and Health Related Data represents a practical global approach, promoting effective and ethical sharing and use of research or patient data, while safeguarding individual privacy through secure and accountable data transfer…(More)”.

Americans Want to Share Their Medical Data. So Why Can’t They?


Eleni Manis at RealClearHealth: “Americans are willing to share personal data — even sensitive medical data — to advance the common good. A recent Stanford University study found that 93 percent of medical trial participants in the United States are willing to share their medical data with university scientists and 82 percent are willing to share with scientists at for-profit companies. In contrast, less than a third are concerned that their data might be stolen or used for marketing purposes.

However, the majority of regulations surrounding medical data focus on individuals’ ability to restrict the use of their medical data, with scant attention paid to supporting the ability to share personal data for the common good. Policymakers can begin to right this balance by establishing a national medical data donor registry that lets individuals contribute their medical data to support research after their deaths. Doing so would help medical researchers pursue cures and improve health care outcomes for all Americans.

Increased medical data sharing facilitates advances in medical science in three key ways. First, de-identified participant-level data can be used to understand the results of trials, enabling researchers to better explicate the relationship between treatments and outcomes. Second, researchers can use shared data to verify studies and identify cases of data fraud and research misconduct in the medical community. For example, one researcher recently discovered a prolific Japanese anesthesiologist had falsified data for almost two decades. Third, shared data can be combined and supplemented to support new studies and discoveries.

Despite these benefits, researchers, research funders, and regulators have struggled to establish a norm for sharing clinical research data. In some cases, regulatory obstacles are to blame. HIPAA — the federal law regulating medical data — blocks some sharing on grounds of patient privacy, while federal and state regulations governing data sharing are inconsistent. Researchers themselves have a proprietary interest in data they produce, while academic researchers seeking to maximize publications may guard data jealously.

Though funding bodies are aware of this tension, they are unable to resolve it on their own. The National Institutes of Health, for example, requires a data sharing plan for big-ticket funding but recognizes that proprietary interests may make sharing impossible….(More)”.

Reclaiming the Smart City: Personal Data, Trust and the New Commons


Report by Theo Bass, Emma Sutherland and Tom Symons: “Cities are becoming a major focal point in the personal data economy. In city governments, there is a clamour for data-informed approaches to everything from waste management and public transport through to policing and emergency response.

This is a triumph for advocates of the better use of data in how we run cities. After years of making the case, there is now a general acceptance that social, economic and environmental pressures can be better responded to by harnessing data.

But as that argument is won, a fresh debate is bubbling up under the surface of the glossy prospectus of the smart city: who decides what we do with all this data, and how do we ensure that its generation and use does not result in discrimination, exclusion and the erosion of privacy for citizens?

This report brings together a range of case studies featuring cities which have pioneered innovative practices and policies around the responsible use of data about people. Our methods combined desk research and over 20 interviews with city administrators in a number of cities across the world.

Recommendations

Based on our case studies, we also compile a range of lessons that policymakers can use to build an alternative version to the smart city – one which promotes ethical data collection practices and responsible innovation with new technologies:

  1. Build consensus around clear ethical principles, and translate them into practical policies.
  2. Train public sector staff in how to assess the benefits and risks of smart technologies.
  3. Look outside the council for expertise and partnerships, including with other city governments.
  4. Find and articulate the benefits of privacy and digital ethics to multiple stakeholders.
  5. Become a test-bed for new services that give people more privacy and control.
  6. Make time and resources available for genuine public engagement on the use of surveillance technologies.
  7. Build digital literacy and make complex or opaque systems more understandable and accountable.
  8. Find opportunities to involve citizens in the process of data collection and analysis from start to finish….(More)”.

Big Data Is Getting Bigger. So Are the Privacy and Ethical Questions.


Goldie Blumenstyk at The Chronicle of Higher Education: “…The next step in using “big data” for student success is upon us. It’s a little cool. And also kind of creepy.

This new approach goes beyond the tactics now used by hundreds of colleges, which depend on data collected from sources like classroom teaching platforms and student-information systems. It not only makes a technological leap; it also raises issues around ethics and privacy.

Here’s how it works: Whenever you log on to a wireless network with your cellphone or computer, you leave a digital footprint. Move from one building to another while staying on the same network, and that network knows how long you stayed and where you went. That data is collected continuously and automatically from the network’s various nodes.

Now, with the help of a company called Degree Analytics, a few colleges are beginning to use location data collected from students’ cellphones and laptops as they move around campus. Some colleges are using it to improve the kind of advice they might send to students, like a text-message reminder to go to class if they’ve been absent.

Others see it as a tool for making decisions on how to use their facilities. St. Edward’s University, in Austin, Tex., used the data to better understand how students were using its computer-equipped spaces. It found that a renovated lounge, with relatively few computers but with Wi-Fi access and several comfy couches, was one of the most popular such sites on campus. Now the university knows it may not need to buy as many computers as it once thought.

As Gary Garofalo, a co-founder and chief revenue officer of Degree Analytics, told me, “the network data has very intriguing advantages” over the forms of data that colleges now collect.

Some of those advantages are obvious: If you’ve got automatic information on every person walking around with a cellphone, your dataset is more complete than if you need to extract it from a learning-management system or from the swipe-card readers some colleges use to track students’ activities. Many colleges now collect such data to determine students’ engagement with their coursework and campus activities.

Of course, the 24-7 reporting of the data is also what makes this approach seem kind of creepy….

I’m not the first to ask questions like this. A couple of years ago, a group of educators organized by Martin Kurzweil of Ithaka S+R and Mitchell Stevens of Stanford University issued a series of guidelines for colleges and companies to consider as they began to embrace data analytics. Among other principles, the guidelines highlighted the importance of being transparent about how the information is used, and ensuring that institutions’ leaders really understand what companies are doing with the data they collect. Experts at New America weighed in too.

I asked Kurzweil what he makes of the use of Wi-Fi information. Location tracking tends toward the “dicey” side of the spectrum, he says, though perhaps not as far out as using students’ social-media habits, health information, or what they check out from the library. The fundamental question, he says, is “how are they managing it?”… So is this the future? Benz, at least, certainly hopes so. Inspired by the Wi-Fi-based StudentLife research project at Dartmouth College and the experiences Purdue University is having with students’ use of its Forecast app, he’s in talks now with a research university about a project that would generate other insights that might be gleaned from students’ Wi-Fi-usage patterns….(More)”.
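
The Wi-Fi footprint described in the Chronicle excerpt, where a network knows where a device went and how long it stayed, can be sketched in a few lines. The event format, the building names, and the `dwell_minutes` function below are hypothetical. They do not describe how Degree Analytics or any campus network actually processes its logs.

```python
from collections import defaultdict

def dwell_minutes(events):
    """Sum minutes per building from time-ordered (minute, building) association events.

    Each event marks the moment a device joined an access point in that building.
    The final event has no known end time, so it is not counted.
    """
    totals = defaultdict(int)
    for (start, building), (end, _) in zip(events, events[1:]):
        totals[building] += end - start
    return dict(totals)

# Hypothetical log for one device: minutes since the start of the day.
events = [(0, "library"), (45, "lecture hall"), (95, "library"), (110, "cafeteria")]

print(dwell_minutes(events))
```

Even this toy log yields a per-building timeline for one device, which shows why the article treats continuous, automatic collection as both useful and "kind of creepy".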