The Emerging Science of Human-Data Interaction


Emerging Technology From the arXiv: “The rapidly evolving ecosystems associated with personal data are creating an entirely new field of scientific study, say computer scientists. And this requires a much more powerful ethics-based infrastructure….
Now Richard Mortier at the University of Nottingham in the UK and a few pals say the increasingly complex, invasive and opaque use of data should be a call to arms to change the way we study data, interact with it and control its use. Today, they publish a manifesto describing how a new science of human-data interaction is emerging from this “data ecosystem” and say that it combines disciplines such as computer science, statistics, sociology, psychology and behavioural economics.
They start by pointing out that the long-standing discipline of human-computer interaction research has always focused on computers as devices to be interacted with. But our interaction with the cyber world has become more sophisticated as computing power has become ubiquitous, a phenomenon driven by the Internet but also by mobile devices such as smartphones. Consequently, humans are constantly producing and revealing data in all kinds of different ways.
Mortier and co say there is an important distinction between data that is consciously created and released such as a Facebook profile; observed data such as online shopping behaviour; and inferred data that is created by other organisations about us, such as preferences based on friends’ preferences.
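This three-way distinction is concrete enough to model directly. As a minimal sketch in Python – the record types and examples below are hypothetical illustrations, not anything defined in the paper:

    from dataclasses import dataclass
    from enum import Enum

    class DataOrigin(Enum):
        CREATED = "consciously created and released, e.g. a social network profile"
        OBSERVED = "recorded as a side effect of activity, e.g. shopping behaviour"
        INFERRED = "derived about us by others, e.g. preferences based on friends' preferences"

    @dataclass
    class PersonalDatum:
        subject: str        # the person the datum relates to
        origin: DataOrigin  # which of the three classes it falls into
        value: str

    # Hypothetical records, one per class:
    records = [
        PersonalDatum("alice", DataOrigin.CREATED, "profile: enjoys hiking"),
        PersonalDatum("alice", DataOrigin.OBSERVED, "browsed 12 tents in an online shop"),
        PersonalDatum("alice", DataOrigin.INFERRED, "likely interested in camping gear"),
    ]

    for r in records:
        print(r.origin.name, "-", r.value)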
This leads the team to identify three key themes associated with human-data interaction that they believe the communities involved with data should focus on.
The first of these is concerned with making data, and the analytics associated with it, both transparent and comprehensible to ordinary people. Mortier and co describe this as the legibility of data and say that the goal is to ensure that people are clearly aware of the data they are providing, the methods used to draw inferences about it and the implications of this.
Making people aware of the data being collected is straightforward, but understanding the implications of this data collection process and the processing that follows is much harder. In particular, this could be in conflict with the intellectual property rights of the companies that do the analytics.
An even more significant factor is that the implications of this processing are not always clear at the time the data is collected. A good example is the way the New York Times tracked down an individual after her seemingly anonymized searches were published by AOL. It is hard to imagine that this individual had any idea that the searches she was making would later allow her identification.
The second theme is concerned with giving people the ability to control and interact with the data relating to them. Mortier and co describe this as “agency”. People must be allowed to opt in or opt out of data collection programs and to correct data if it turns out to be wrong or outdated, and so on. That will require simple-to-use data access mechanisms that have yet to be developed.
The final theme builds on this to allow people to change their data preferences in future, an idea the team call “negotiability”. Something like this is already coming into force in the European Union where the Court of Justice has recently begun to enforce the “right to be forgotten”, which allows people to remove information from search results under certain circumstances….”
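Taken together, “agency” and “negotiability” amount to an interface contract between individuals and data holders. A toy sketch of what such a mechanism might expose – the class and method names are our illustrative assumptions, not anything specified in the manifesto:

    class ConsentRegistry:
        """Toy store of per-subject opt-ins and correctable records."""

        def __init__(self):
            self.opted_in = {}  # subject -> set of data-collection programs
            self.records = {}   # (subject, key) -> value

        # Agency: opt in to, or out of, a data collection program.
        def opt_in(self, subject, program):
            self.opted_in.setdefault(subject, set()).add(program)

        def opt_out(self, subject, program):
            self.opted_in.get(subject, set()).discard(program)

        # Agency: correct data that turns out to be wrong or outdated.
        def rectify(self, subject, key, new_value):
            self.records[(subject, key)] = new_value

        # Negotiability: revisit earlier choices later, up to erasure
        # (cf. the EU's "right to be forgotten").
        def erase(self, subject, key):
            self.records.pop((subject, key), None)

    registry = ConsentRegistry()
    registry.opt_in("alice", "loyalty-scheme")
    registry.rectify("alice", "postcode", "NG7 2RD")
    registry.opt_out("alice", "loyalty-scheme")  # preferences can change later
    registry.erase("alice", "postcode")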
Ref: http://arxiv.org/abs/1412.6159  Human-Data Interaction: The Human Face of the Data-Driven Society

Pricey privacy: Framing the economy of information in the digital age


Paper by Federica Fornaciari in FirstMonday: “As new information technologies become ubiquitous, individuals are often prompted to rethink disclosure. Available media narratives may influence one’s understanding of the benefits and costs related to sharing personal information. This study, guided by frame theory, undertakes a Critical Discourse Analysis (CDA) of media discourse developed to discuss the privacy concerns related to the corporate collection and trade of personal information. The aim is to investigate the frames — the central organizing ideas — used in the media to discuss such an important aspect of the economics of personal data. The CDA explored 130 articles published in the New York Times between 2000 and 2012. Findings reveal that the articles utilized four frames: confusion and lack of transparency, justification and private interests, law and self-regulation, and commodification of information. Articles used episodic framing often discussing specific instances of infringements rather than broader thematic accounts. Media coverage tended to frame personal information as a commodity that may be traded, rather than as a fundamental value.”

Restoring Confidence in Open, Shared and Personal Data


Report of the UK Digital Government Review: “It is obvious that government needs to be able to use data both to deliver services and to present information to public view. How else would government know which bank account to place a pension payment into, or a citizen know the results of an election or how to contact their elected representatives?

As more and more data is created, preserved and shared in ever-increasing volumes, a number of urgent questions arise: over opportunities and hazards; over the importance of using best-practice techniques, insights and technologies developed in the private sector, academia and elsewhere; over the promises and limitations of openness; and over how all this might be articulated and made accessible to the public.

Government has already adopted “open data” (we will discuss this more in the next section) and there are now increasing calls for government to pay more attention to data analytics and so-called “big data” – although the first faltering steps to unlock benefits here have often ended in the discovery that using large-scale data is a far more nuanced business than was initially assumed.

Debates around government and data have often been extremely high-profile – the NHS care.data [27] debate was raging while this review was in progress – but they are also shrouded in terms that can generate confusion and complexities that are not easily summarized.

In this chapter we will unpick some of these terms and some parts of the debate. This is a detailed and complex area and there is much more that could have been included [28]. This is not an area that can easily be summarized into a simple bullet-pointed list of policies.

Within this report we will use the following terms and definitions, proceeding to a detailed analysis of each in turn:

Type of Data – Definition [29] – Examples

1. Open Data: Data that can be freely used, reused and redistributed by anyone – subject only, at most, to the requirement to attribute and share alike. Examples: insolvency notices in the London Gazette; government spending information; public transport information; Official National Statistics.

2. Shared Data: Restricted data provided to restricted organisations or individuals for restricted purposes. Examples: the National Pupil Database; NHS care.data; integrated health and social care; individual census returns.

3. Personal Data: Data that relate to a living individual who can be identified from that data (for the full legal definition see [30]). Examples: health records; individual tax records; insolvency notices in the London Gazette; the National Pupil Database.

NB: These definitions overlap. Personal data can exist in both open and shared data.
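Because the categories overlap, a dataset needs at least two independent tags rather than a single label. A small illustrative sketch using the report's own examples (the access tags are our reading of the table above):

    # Each dataset gets an access category plus a personal-data flag,
    # reflecting the note above that the definitions overlap.
    datasets = {
        "Official National Statistics":        {"access": "open",   "personal": False},
        "Insolvency notices (London Gazette)": {"access": "open",   "personal": True},
        "National Pupil Database":             {"access": "shared", "personal": True},
        "NHS care.data":                       {"access": "shared", "personal": True},
    }

    # Personal data can exist in both open and shared data:
    for name, tags in datasets.items():
        if tags["personal"]:
            print(f"{name}: personal data, published as {tags['access']} data")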

This social productivity will help build future economic productivity; in the meantime it will improve people’s lives and it will enhance our democracy. From our analysis it was clear that there was room for improvement…”

Gov.uk quietly disrupts the problem of online identity login


The Guardian: “A new “verified identity” scheme for gov.uk is making it simpler to apply for a new driving licence, passport or to file a tax return online, allowing users to register securely using one login that connects and securely stores their personal data.
After nearly a year of closed testing with a few thousand Britons, the “Gov.UK Verify” scheme quietly opened to general users on 14 October, expanding across more services. It could have as many as half a million users within a year.
The most popular services are expected to be one for tax credit renewals, and CAP farm information – both expected to have around 100,000 users by April next year, together making up nearly half of the total use.
The team behind the system claim this is a world first. Those countries that have developed advanced government services online, such as Estonia, rely on state identity cards – which the UK has rejected.
“This is a federated model of identity, not a centralised one,” said Janet Hughes, head of policy and engagement at the Government Digital Service’s identity assurance program, which developed and tested the system.
How it works
The Verify system has taken three years to develop, and involves checking a user’s identity against details from a range of sources, including credit reference agencies, utility bills, driving licences and mobile provider bills.
But it does not retain those pieces of information, and the credit checking companies do not know what service is being used. Only a mobile or landline number is kept in order to send verification codes for subsequent logins.
When people subsequently log in, they have to provide a user ID and password, and verify their identity by entering a code sent to the stored phone number.
To enrol in the system, users have to be over 19, living in the UK, and have been resident for over 12 months. A faked passport would not be sufficient: “they would need a very full false ID, and have to not appear on any list of fraudulent identities,” one source at the GDS told the Guardian.
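In outline, then, a subsequent login combines a password check with a one-time code sent out of band to the stored number. A minimal sketch of that flow, assuming a six-digit SMS code and toy password hashing – an illustration of the described behaviour, not GDS's implementation:

    import hashlib, hmac, secrets

    def hash_password(pw, salt="demo-salt"):  # toy hashing, for the sketch only
        return hashlib.sha256((salt + pw).encode()).hexdigest()

    def send_sms(phone, code):                # stand-in for an SMS gateway
        print(f"[sms to {phone}] your verification code is {code}")

    # Only credentials and a phone number are retained after enrolment;
    # the identity evidence checked against external sources is not stored.
    users = {"alice": {"pw_hash": hash_password("correct horse"),
                       "phone": "+44 7700 900123"}}
    pending = {}

    def start_login(user_id, password):
        user = users.get(user_id)
        if not user or not hmac.compare_digest(hash_password(password), user["pw_hash"]):
            return False
        code = f"{secrets.randbelow(10**6):06d}"  # six-digit one-time code
        pending[user_id] = code
        send_sms(user["phone"], code)             # verified out of band
        return True

    def finish_login(user_id, submitted_code):
        expected = pending.pop(user_id, None)
        return expected is not None and hmac.compare_digest(expected, submitted_code)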
Banks now following gov.uk’s lead
Government developers are confident that it presents a higher barrier to authentication than any other digital service – so that fraudulent transactions will be minimised. That has attracted banks, which are understood to be interested in using the same service to verify customer identities through an arms-length verification system.
The government system would not pass on people’s data, but would instead verify that someone is who they claim to be, much like Twitter and Facebook verify users’ identity to log in to third party sites, yet don’t share their users’ data.
The US, Canada and New Zealand have also expressed interest in following the UK’s lead with the system, which requires users to provide separate pieces of verified information about themselves from different sources.
The system then cross-references that verified information with credit reference agencies and other sources, which can include a mobile phone provider, passport, bank account, utility bill or driving licence.
Confidence in an individual’s identity is split into four levels. The lowest is for the creation of simple accounts to receive reports or updates: “we don’t need to know who it is, only that it’s the same person returning,” said Hughes.
Level 2 requires that “on the balance of probability” someone is who they say they are – which is the level to which Verify will be able to identify people. Hughes says that this will cover the majority of services.
Level 3 requires identity “beyond reasonable doubt” – perhaps including the first application for a passport – and Level 4 would require biometric information to confirm individual identity.
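The four levels form a simple ordering that a service can check a user against. A minimal rendering of the scheme as described – the level names are ours, not official GDS terminology:

    from enum import IntEnum

    class AssuranceLevel(IntEnum):
        RETURNING_USER = 1           # "same person returning"; identity unknown
        BALANCE_OF_PROBABILITY = 2   # Verify's level; covers most services
        BEYOND_REASONABLE_DOUBT = 3  # e.g. a first passport application
        BIOMETRIC = 4                # biometric confirmation of identity

    def service_allows(required: AssuranceLevel, proven: AssuranceLevel) -> bool:
        # A user may access any service whose requirement they meet or exceed.
        return proven >= required

    assert service_allows(AssuranceLevel.BALANCE_OF_PROBABILITY,
                          AssuranceLevel.BIOMETRIC)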

Privacy Identity Innovation: Innovator Spotlight


pii2014: “Every year, we invite a select group of startup CEOs to present their technologies on stage at Privacy Identity Innovation as part of the Innovator Spotlight program. This year’s conference (pii2014) is taking place November 12-14 in Silicon Valley, and we’re excited to announce that the following eight companies will be participating in the pii2014 Innovator Spotlight:
* BeehiveID – Led by CEO Mary Haskett, BeehiveID is a global identity validation service that enables trust by identifying bad actors online BEFORE they have a chance to commit fraud.
* Five – Led by CEO Nikita Bier, Five is a mobile chat app crafted around the experience of a house party. With Five, you can browse thousands of rooms and have conversations about any topic.
* Glimpse – Led by CEO Elissa Shevinsky, Glimpse is a private (disappearing) photo messaging app just for groups.
* Humin – Led by CEO Ankur Jain, Humin is a phone and contacts app designed to think about people the way you naturally do by remembering the context of your relationships and letting you search them the way you think.
* Kpass – Led by CEO Dan Nelson, Kpass is an identity platform that provides brands, apps and developers with an easy-to-implement technology solution to help manage the notice and consent requirements for the Children’s Online Privacy Protection Act (COPPA) laws.
* Meeco – Led by CEO Katryna Dow, Meeco is a Life Management Platform that offers an all-in-one solution for you to transact online, collect your own personal data, and be more anonymous with greater control over your own privacy.
* TrustLayers – Led by CEO Adam Towvim, TrustLayers is privacy intelligence for big data. TrustLayers enables confident use of personal data, keeping companies secure in the knowledge that teams across the organization are following the rules.
* Virtru – Led by CEO John Ackerly, Virtru is the first company to make email privacy accessible to everyone. With a single plug-in, Virtru empowers individuals and businesses to control who receives, reviews, and retains their digital information — wherever it travels, throughout its lifespan.
Learn more about the startups on the Innovator Spotlight page…”

Ello


What is Ello?

“Ello is a simple, beautiful, and ad-free social network created by a small group of artists and designers.
We originally built Ello as a private social network. Over time, so many people wanted to join Ello that we built a public version of Ello for everyone to use.

Ad Free

Ello doesn’t sell ads. Nor do we sell data about you to third parties.
Virtually every other social network is run by advertisers. Behind the scenes they employ armies of ad salesmen and data miners to record every move you make. Data about you is then auctioned off to advertisers and data brokers. You’re the product that’s being bought and sold.
Collecting and selling your personal data, reading your posts to your friends, and mapping your social connections for profit is both creepy and unethical. Under the guise of offering a “free” service, users pay a high price in intrusive advertising and lack of privacy.
We also think ads are tacky, that they insult our intelligence and that we’re better without them.
Read more about our no-ad policy here.

Support Ello

Ello is completely free to use.
We occasionally offer special features to our users. If we create a special feature that you really like, you may choose to support Ello by paying a very small amount of money to add that feature to your Ello account.
You never have to pay anything, and you can keep using Ello forever, for free. By choosing to buy a feature now and then for a very small amount of money you support our work and help us make Ello better and better….
Read the manifesto

The Stasi, casinos and the Big Data rush


Book Review by Hannah Kuchler of “What Stays in Vegas” (by Adam Tanner) in the Financial Times: “Books with sexy titles and decidedly unsexy topics – like, say, data – have a tendency to disappoint. But What Stays in Vegas is an engrossing, story-packed takedown of the data industry.

It begins, far from America’s gambling capital, in communist East Germany. The author, Adam Tanner, now a fellow at Harvard’s Institute for Quantitative Social Science, was in the late 1980s a travel writer taking notes on Dresden. What he did not realise was that the Stasi was busy taking notes on him – 50 pages in all – which he found when the files were opened after reunification. The secret police knew where he had stopped to consult a map, to whom he asked questions and when he looked in on a hotel.
Today, Tanner explains: “Thanks to meticulous data gathering from both public documents and commercial records, companies . . . know far more about typical consumers than the feared East German secret police recorded about me.”
Shining a light on how businesses outside the tech sector have become data addicts, Tanner focuses on Las Vegas casinos, which spotted the value in data decades ago. He was given access to Caesars Entertainment, one of the world’s largest casino operators. When chief executive Gary Loveman joined in the late 1990s, the former Harvard Business School professor bet the company’s future on harvesting personal data from its loyalty scheme. Rather than wooing the “whales” who spent the most, the company would use the data to decide which freebies were worth giving away to lure in mid-spenders who came back often – a strategy credited with helping the business grow.
The real revelations come when Tanner examines the data brokers’ “Cheez Whiz”. Like the maker of a popular processed dairy spread, he argues, data brokers blend ingredients from a range of sources, such as public records, marketing lists and commercial records, to create a detailed picture of your identity – and you will never quite be able to pin down the origin of any component…
The Big Data rush has gone into overdrive since the global economic crisis as marketers from different industries have sought new methods to grab the limited consumer spending available. Tanner argues that while users have in theory given permission for much of this information to be made public in bits and pieces, its increasingly industrial-scale aggregation often feels like an invasion of privacy.
Privacy policies are so long and obtuse (one study Tanner quotes found that it would take a person more than a month, working full-time, to read all the privacy statements they come across in a year) that people are unwittingly littering their data all over the internet. In any case, marketers can intuit what we are like from the people we are connected to online. And as the data brokers’ lists are usually private, there is no way to check that the compilers have got their facts right…”
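The “more than a month” figure is easy to sanity-check with back-of-the-envelope arithmetic; the numbers below are illustrative assumptions of ours, not those of the study Tanner quotes:

    policies_per_year = 1400  # assumed count of distinct policies encountered
    minutes_per_policy = 10   # assumed average reading time per policy

    hours = policies_per_year * minutes_per_policy / 60
    working_days = hours / 8  # an eight-hour working day
    print(f"{hours:.0f} hours, roughly {working_days:.0f} full working days")
    # -> 233 hours, roughly 29 full working days: over a month of full-time work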

The Crypto-democracy and the Trustworthy


New Paper by Sebastien Gambs, Samuel Ranellucci, and Alain Tapp: “In the current architecture of the Internet, there is a strong asymmetry in terms of power between the entities that gather and process personal data (e.g., major Internet companies, telecom operators, cloud providers, …) and the individuals from which this personal data is issued. In particular, individuals have no choice but to blindly trust that these entities will respect their privacy and protect their personal data. In this position paper, we address this issue by proposing a utopian crypto-democracy model based on existing scientific achievements from the field of cryptography. More precisely, our main objective is to show that cryptographic primitives, including in particular secure multiparty computation, offer a practical solution to protect privacy while minimizing the trust assumptions. In the crypto-democracy envisioned, individuals do not have to trust a single physical entity with their personal data but rather their data is distributed among several institutions. Together these institutions form a virtual entity called the Trustworthy that is responsible for the storage of this data but which can also compute on it (provided first that all the institutions agree on this). Finally, we also propose a realistic proof-of-concept of the Trustworthy, in which the roles of institutions are played by universities. This proof-of-concept would have an important impact in demonstrating the possibilities offered by the crypto-democracy paradigm.”
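The core cryptographic primitive is easy to illustrate. Below is a toy additive secret-sharing sketch in Python – a simplification of what secure multiparty computation builds on, not the paper's actual protocol. Each “institution” holds one share; no single share reveals the datum, yet the shares jointly reconstruct it and can even be added locally:

    import secrets

    P = 2**61 - 1  # a public prime modulus

    def share(secret, n):
        """Split `secret` into n additive shares modulo P."""
        shares = [secrets.randbelow(P) for _ in range(n - 1)]
        shares.append((secret - sum(shares)) % P)  # final share makes the sum work
        return shares

    def reconstruct(shares):
        return sum(shares) % P

    salary = 52_000                       # a personal datum
    shares = share(salary, 3)             # one share per institution
    assert reconstruct(shares) == salary  # only jointly do they recover it

    # Additive shares are homomorphic: institutions can sum two shared values
    # locally, computing on data none of them can individually read.
    bonus = share(3_000, 3)
    total = reconstruct([(a + b) % P for a, b in zip(shares, bonus)])
    assert total == 55_000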

As Data Overflows Online, Researchers Grapple With Ethics


The New York Times: “Scholars are exhilarated by the prospect of tapping into the vast troves of personal data collected by Facebook, Google, Amazon and a host of start-ups, which they say could transform social science research.

Once forced to conduct painstaking personal interviews with subjects, scientists can now sit at a screen and instantly play with the digital experiences of millions of Internet users. It is the frontier of social science — experiments on people who may never even know they are subjects of study, let alone explicitly consent.

“This is a new era,” said Jeffrey T. Hancock, a Cornell University professor of communication and information science. “I liken it a little bit to when chemistry got the microscope.”

But the new era has brought some controversy with it. Professor Hancock was a co-author of the Facebook study in which the social network quietly manipulated the news feeds of nearly 700,000 people to learn how the changes affected their emotions. When the research was published in June, the outrage was immediate…

Such testing raises fundamental questions. What types of experiments are so intrusive that they need prior consent or prompt disclosure after the fact? How do companies make sure that customers have a clear understanding of how their personal information might be used? Who even decides what the rules should be?

Existing federal rules governing research on human subjects, intended for medical research, generally require consent from those studied unless the potential for harm is minimal. But many social science scholars say the federal rules never contemplated large-scale research on Internet users and provide inadequate guidance for it.

For Internet projects conducted by university researchers, institutional review boards can be helpful in vetting projects. However, corporate researchers like those at Facebook don’t face such formal reviews.

Sinan Aral, a professor at the Massachusetts Institute of Technology’s Sloan School of Management who has conducted large-scale social experiments with several tech companies, said any new rules must be carefully formulated.

“We need to understand how to think about these rules without chilling the research that has the promise of moving us miles and miles ahead of where we are today in understanding human populations,” he said. Professor Aral is planning a panel discussion on ethics at an M.I.T. conference on digital experimentation in October. (The professor also does some data analysis for The New York Times Company.)

Mary L. Gray, a senior researcher at Microsoft Research and associate professor at Indiana University’s Media School, who has worked extensively on ethics in social science, said that too often, researchers conducting digital experiments work in isolation with little outside guidance.

She and others at Microsoft Research spent the last two years setting up an ethics advisory committee and training program for researchers in the company’s labs who are working with human subjects. She is now working with Professor Hancock to bring such thinking to the broader research world.

“If everyone knew the right thing to do, we would never have anyone hurt,” she said. “We really don’t have a place where we can have these conversations.”…

Selected Readings on Economic Impact of Open Data


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of open data was originally published in 2014.

Open data is publicly available data – often released by governments, scientists, and occasionally private companies – that is made available for anyone to use, in a machine-readable format, free of charge. Considerable attention has been devoted to the economic potential of open data for businesses and other organizations, and it is now widely accepted that open data plays an important role in spurring innovation, growth, and job creation. From new business models to innovation in local governance, open data is being quickly adopted as a valuable resource at many levels.

Measuring and analyzing the economic impact of open data in a systematic way is challenging, and governments as well as other providers of open data seek to provide access to the data in a standardized way. As governmental transparency increases and open data changes business models and activities in many economic sectors, it is important to understand best practices for releasing and using non-proprietary, public information. Costs, social challenges, and technical barriers also influence the economic impact of open data.

These selected readings are intended as a first step toward answering the question of whether, and how, opening up data spurs economic impact.

Annotated Selected Reading List (in alphabetical order)

Bonina, Carla. New Business Models and the Values of Open Data: Definitions, Challenges, and Opportunities. NEMODE 3K – Small Grants Call 2013. http://bit.ly/1xGf9oe

  • In this paper, Dr. Carla Bonina provides an introduction to open data and open data business models, evaluating their potential economic value and identifying future challenges for the effectiveness of open data, such as personal data and privacy, the emerging data divide, and the costs of collecting, producing and releasing open (government) data.

Carpenter, John and Phil Watts. Assessing the Value of OS OpenData™ to the Economy of Great Britain – Synopsis. June 2013. Accessed July 25, 2014. http://bit.ly/1rTLVUE

  • John Carpenter and Phil Watts of Ordnance Survey undertook a study to examine the economic impact of open data on the economy of Great Britain. Using a variety of methods such as case studies, interviews, download analysis, adoption rates, impact calculation, and CGE modeling, the authors estimate that the OS OpenData initiative will deliver a net increase in GDP of £13–28.5 million for Great Britain in 2013.

Capgemini Consulting. The Open Data Economy: Unlocking Economic Value by Opening Government and Public Data. Capgemini Consulting. Accessed July 24, 2014. http://bit.ly/1n7MR02

  • This report explores how governments are leveraging open data for economic benefits. Using a comparative approach, the authors study open data from organizational, technological, social and political perspectives. The study highlights the potential of open data to drive profit through increasing the effectiveness of benchmarking and other data-driven business strategies.

Deloitte. Open Growth: Stimulating Demand for Open Data in the UK. Deloitte Analytics. December 2012. Accessed July 24, 2014. http://bit.ly/1oeFhks

  • This early paper on open data by Deloitte uses case studies and statistical analysis on open government data to create models of businesses using open data. They also review the market supply and demand of open government data in emerging sectors of the economy.

Gruen, Nicholas, John Houghton and Richard Tooth. Open for Business: How Open Data Can Help Achieve the G20 Growth Target. Accessed July 24, 2014. http://bit.ly/UOmBRe

  • This report highlights the potential economic value of the open data agenda in Australia and the G20. It provides an initial literature review on the economic value of open data, a set of case studies on the economic value of open data, and a set of recommendations for how open data can help the G20 and Australia achieve target objectives in the areas of trade, finance, fiscal and monetary policy, anti-corruption, employment, energy, and infrastructure.

Heusser, Felipe I. Understanding Open Government Data and Addressing Its Impact (draft version). World Wide Web Foundation. http://bit.ly/1o9Egym

  • The World Wide Web Foundation, in collaboration with IDRC, has begun a research network to explore the impacts of open data in developing countries. In addition to the Web Foundation and IDRC, the network includes the Berkman Center for Internet and Society at Harvard, the Open Development Technology Alliance and Practical Participation.

Howard, Alex. San Francisco Looks to Tap Into the Open Data Economy. O’Reilly Radar: Insight, Analysis, and Research about Emerging Technologies. October 19, 2012. Accessed July 24, 2014. http://oreil.ly/1qNRt3h

  • Alex Howard points to San Francisco as one of the first municipalities in the United States to embrace an open data platform.  He outlines how open data has driven innovation in local governance.  Moreover, he discusses the potential impact of open data on job creation and government technology infrastructure in the City and County of San Francisco.

Huijboom, Noor and Tijs Van den Broek. Open Data: An International Comparison of Strategies. European Journal of ePractice. March 2011. Accessed July 24, 2014.  http://bit.ly/1AE24jq

  • This article examines five countries and their open data strategies, identifying key features, main barriers, and drivers of progress for open data programs. The authors outline the key challenges facing European and other national open data policies, highlighting the emerging role open data initiatives are playing in political and administrative agendas around the world.

Manyika, J., Michael Chui, Diana Farrell, Steve Van Kuiken, Peter Groves, and Elizabeth Almasi Doshi. Open Data: Unlocking Innovation and Performance with Liquid Information. McKinsey Global Institute. October 2013. Accessed July 24, 2014. http://bit.ly/1lgDX0v

  • This research focuses on quantifying the potential value of open data in seven “domains” in the global economy: education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance.

Moore, Alida. Congressional Transparency Caucus: How Open Data Creates Jobs. Socrata. April 2, 2014. Accessed July 30, 2014. http://bit.ly/1n7OJpp

  • Socrata provides a summary of the March 24th briefing of the Congressional Transparency Caucus on the need to increase government transparency by adopting open data initiatives. The summary includes key takeaways from the panel discussion, as well as Socrata’s role in making open data available to businesses.

Stott, Andrew. Open Data for Economic Growth. The World Bank. June 25, 2014. Accessed July 24, 2014. http://bit.ly/1n7PRJF

  • In this report, the World Bank examines the evidence for the economic potential of open data, holding that the potential is quite large despite variation in the published estimates and the methodological difficulties of assessing it. The report provides five archetypes of businesses using open data, and offers recommendations for governments trying to maximize economic growth from open data.