Michelle Winowatan

By Alexandra Shaw, Andrew Zahuranec, Andrew Young, Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on public perceptions regarding secondary uses of personal data (or the re-use of data initially collected for a different purpose). It provides a summary of societal perspectives toward personal data usage, sharing, and control. It is not meant to be comprehensive – rather, it intends to illustrate conflicting, and often confusing, attitudes toward the re-use of personal data.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Data ownership and control

Percentage of Americans who say it is “very important” they control information collected about them: 74% – 2016
Americans who think that today’s privacy laws are not good enough at protecting people’s privacy online: 68% – 2016
Americans who say they have “a lot” of control over how companies collect and use their information: 9% – 2015
In a survey of 507 online shoppers, the number of respondents who indicated they don’t want brands tracking their location: 62% – 2015
In a survey of 507 online shoppers, the number of respondents who “prefer offers that are targeted to where they are and what they are doing:” 60% – 2015
Number of surveyed American consumers willing to provide data to corporations under the following conditions:
- “Data about my social concerns to better connect me with non-profit organizations that advance those causes:” 19% – 2018
- “Data about my DNA to help me uncover any hereditary illnesses:” 21% – 2018
- “Data about my interests and hobbies to receive relevant information and offers from online sellers:” 32% – 2018
- “Data about my location to help me find the fastest route to my destination:” 40% – 2018
- “My email address to receive exclusive offers from my favorite brands:” 56% – 2018

Consumer Attitudes

Academic study participants willing to donate personal data to research if it could lead to public good: 60% – 2014
Academic study participants willing to share personal data for research purposes in the interest of public good: 25% – 2014
Percentage who expect companies to “treat [them] like an individual, not as a member of some segment like ‘millennials’ or ‘suburban mothers:’” 74% – 2018
- Percentage who believe that brands should understand a “consumer’s individual situation (e.g. marital status, age, location, etc.)” when they’re being marketed to: 70% – 2018
Number who are “more annoyed” by companies now compared to 5 years ago: 40% – 2018
Percentage worried their data is shared across companies without their permission: 88% – 2018
Percentage worried about a brand’s ability to track their behavior while on the brand’s website, app, or neither: 75% – 2018
Consumers globally who expect brands to anticipate needs before they arise: 33% – 2018
Surveyed residents of the United Kingdom who identify as:
- “Data pragmatists” willing to share personal data “under the right circumstances:” 58% – 2017
- “Fundamentalists,” who would not share personal data for better services: 24% – 2017
- Respondents who think data sharing is part of participating in the modern economy: 62% – 2018
- Respondents who believe that data sharing benefits enterprises more than consumers: 75% – 2018
- People who want more control over their data that enterprises collect: 84% – 2018
- Percentage “unconcerned” about personal data protection: 18% – 2018
Percentage of Americans who think that government should do more to regulate large technology companies: 55% – 2018
Registered American voters who trust broadband companies with personal data “a great deal” or “a fair amount”: 43% – 2017
Americans who report experiencing a major data breach: 64% – 2017
Number of Americans who believe that their personal data is less secure than it was 5 years ago: 49% – 2019
Percentage of surveyed American citizens who consider trust in a company an important factor for sharing data: 54% – 2018

Convenience

Microsoft’s 2015 Consumer Data Value Exchange Report attempts to understand consumer attitudes on the exchange of personal data across the global markets of Australia, Brazil, Canada, Colombia, Egypt, Germany, Kenya, Mexico, Nigeria, Spain, South Africa, United Kingdom and the United States. From their survey of 16,500 users, they find:

The most popular incentives for sharing data are:
- Cash rewards: 64% – 2015
- Significant discounts: 49% – 2015
- Streamlined processes: 29% – 2015
- New ideas: 28% – 2015
Respondents who would prefer to see more ads to get new services: 34% – 2015
Respondents willing to share search terms for a service that enabled fewer steps to get things done: 70% – 2015
Respondents willing to share activity data for such an improvement: 82% – 2015
Respondents willing to share their gender for “a service that inspires something new based on others like them:” 79% – 2015

A 2015 Pew Research Center survey presented Americans with several data-sharing scenarios related to convenience. Participants could respond: “acceptable,” “it depends,” or “not acceptable” to the following scenarios:

Share health information to get access to personal health records and arrange appointments more easily:
- Acceptable: 52% – 2015
- It depends: 20% – 2015
- Not acceptable: 26% – 2015
Share data for discounted auto insurance rates:
- Acceptable: 37% – 2015
- It depends: 16% – 2015
- Not acceptable: 45% – 2015
Share data for free social media services:
- Acceptable: 33% – 2015
- It depends: 15% – 2015
- Not acceptable: 51% – 2015
Share data on smart thermostats for cheaper energy bills:
- Acceptable: 33% – 2015
- It depends: 15% – 2015
- Not acceptable: 51% – 2015

Other Studies

Surveyed banking and insurance customers who would exchange personal data for:
- Targeted auto insurance premiums: 64% – 2019
- Better life insurance premiums for healthy lifestyle choices: 52% – 2019
Surveyed banking and insurance customers willing to share data specifically related to income, location and lifestyle habits to:
- Secure faster loan approvals: 81.3% – 2019
- Lower the chances of injury or loss: 79.7% – 2019
- Receive discounts on non-insurance products or services: 74.6% – 2019
- Receive text alerts related to banking account activity: 59.8% – 2019
- Get saving advice based on spending patterns: 56.6% – 2019
In a survey of over 7,000 members of the public around the globe, respondents indicated:
- “Smartphone and tablet apps used for navigation, chat, and news that can access your contacts, photos, and browsing history” are “creepy:” 16% – 2016
- Emailing a friend about a trip to Paris and receiving advertisements for hotels, restaurants and excursions in Paris is “creepy:” 32% – 2016
- A free fitness-tracking device that monitors your well-being and sends a monthly report to you and your employer is “creepy:” 45% – 2016
- A telematics device that allows emergency services to track your vehicle is “creepy:” 78% – 2016
The number of British residents who do not want to work with virtual agents of any kind: 48% – 2017
Americans who disagree that “if companies give me a discount, it is a fair exchange for them to collect information about me without my knowing”: 91% – 2015

Data Brokers, Intermediaries, and Third Parties

Americans who consider it acceptable for a grocery store to offer a free loyalty card in exchange for selling their shopping data to third parties: 47% – 2016
Number of people who know that “searches, site visits and purchases” are reviewed without consent: 55% – 2015
The number of people in 1991 who wanted companies to ask them for permission first before collecting their personal information and selling that data to intermediaries: 93% – 1991
- Number of Americans who “would be very concerned if the company at which their data were stored sold it to another party:” 90% – 2008
- Percentage of Americans who think it’s unacceptable for their grocery store to share their shopping data with third parties in exchange for a free loyalty card: 32% – 2016
Percentage of Americans who think that government needs to do more to regulate advertisers: 64% – 2016
- Number of Americans who “want to have control over what marketers can learn about” them online: 84% – 2015
- Percentage of Americans who think they have no power over marketers to figure out what they’re learning about them: 58% – 2015
Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites using personal data to recommend stories, articles, or videos: 56% – 2017
Registered American voters who are “somewhat uncomfortable” or “very uncomfortable” with companies like Internet service providers or websites selling their personal information to third parties for advertising purposes: 64% – 2017

Personal Health Data

The Robert Wood Johnson Foundation’s 2014 Health Data Exploration Project Report analyzes attitudes about personal health data (PHD). PHD is self-tracking data related to health that is traceable through wearable devices and sensors. The three major stakeholder groups involved in using PHD for public good are users, companies that track the users’ data, and researchers.

Overall Respondents:
- Percentage who believe anonymity is “very” or “extremely” important: 67% – 2014
- Percentage who “probably would” or “definitely would” share their personal data with researchers: 78% – 2014
- Percentage who believe that they own—or should own—all the data about them, even when it is indirectly collected: 54% – 2014
- Percentage who think they share or ought to share ownership with the company: 30% – 2014
- Percentage who think companies alone own or should own all the data about them: 4% – 2014
- Percentage for whom data ownership “is not something I care about”: 13% – 2014
- Percentage who indicated they wanted to own their data: 75% – 2014
- Percentage who would share data only if “privacy were assured:” 68% – 2014
- People who would supply data regardless of privacy or compensation: 27% – 2014
  - Percentage of participants who mentioned privacy, anonymity, or confidentiality when asked under what conditions they would share their data: 63% – 2014
  - Percentage who would be “more” or “much more” likely to share data for compensation: 56% – 2014
  - Percentage who indicated compensation would make no difference: 38% – 2014
  - Percentage opposed to commercial or profit-making use of their data: 13% – 2014
- Percentage of people who would only share personal health data with a guarantee of:
  - Privacy: 57% – 2014
  - Anonymization: 90% – 2014

Surveyed Researchers:
- Percentage who agree or strongly agree that self-tracking data would help provide more insights in their research: 89% – 2014
- Percentage who say PHD could answer questions that other data sources could not: 95% – 2014
- Percentage who have used public datasets: 57% – 2014
- Percentage who have paid for data for research: 19% – 2014
- Percentage who have used self-tracking data before for research purposes: 46% – 2014
- Percentage who have worked with application, device, or social media companies: 23% – 2014
- Percentage who “somewhat disagree” or “strongly disagree” there are barriers that cannot be overcome to using self-tracking data in their research: 82% – 2014

SOURCES:

“2019 Accenture Global Financial Services Consumer Study: Discover the Patterns in Personality”, Accenture, 2019.

“Americans’ Views About Data Collection and Security”, Pew Research Center, 2015.

“Data Donation: Sharing Personal Data for Public Good?”, ResearchGate, 2014.

“Data privacy: What the consumer really thinks,” Acxiom, 2018.

“Exclusive: Public wants Big Tech regulated”, Axios, 2018.

“Consumer data value exchange,” Microsoft, 2015.

“Crossing the Line: Staying on the right side of consumer privacy,” KPMG International Cooperative, 2016.

“How do you feel about the government sharing our personal data? – livechat”, The Guardian, 2017.

“Personal data for public good: using health information in medical research”, The Academy of Medical Sciences, 2006.

“Personal Data for the Public Good: New Opportunities to Enrich Understanding of Individual and Population Health”, Robert Wood Johnson Foundation, Health Data Exploration Project, Calit2, UC Irvine and UC San Diego, 2014.

“Pew Internet and American Life Project: Cloud Computing Raises Privacy Concerns”, Pew Research Center, 2008.

“Poll: Little Trust That Tech Giants Will Keep Personal Data Private”, Morning Consult & Politico, 2017.

“Privacy and Information Sharing”, Pew Research Center, 2016.

“Privacy, Data and the Consumer: What US Thinks About Sharing Data”, MarTech Advisor, 2018.

“Public Opinion on Privacy”, Electronic Privacy Information Center, 2019.

“Selligent Marketing Cloud Study Finds Consumer Expectations and Marketer Challenges are Rising in Tandem”, Selligent Marketing Cloud, 2018.

“The Data-Sharing Disconnect: The Impact of Context, Consumer Trust, and Relevance in Retail Marketing,” Boxever, 2015.

“Microsoft Research reveals understanding gap in the brand-consumer data exchange,” Microsoft Research, 2015.

“Survey: 58% will share personal data under the right circumstances”, Marketing Land: Third Door Media, 2019.

“The state of privacy in post-Snowden America”, Pew Research Center, 2016.

“The Tradeoff Fallacy: How Marketers Are Misrepresenting American Consumers And Opening Them Up to Exploitation”, University of Pennsylvania, 2015.

Index: Secondary Uses of Personal Data

By Michelle Winowatan, Andrew J. Zahuranec, Andrew Young, Stefaan Verhulst, Max Jun Kim

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on the data universe.

Please share any additional, illustrative statistics on data, or other issues at the nexus of technology and governance, with us at info@thelivinglib.org

Internet Traffic:

Percentage of the world’s population that uses the internet: 51.2% (3.9 billion people) – 2018
Number of searches processed worldwide by Google every year: at least 2 trillion – 2016
Website traffic worldwide generated through mobile phones: 52.2% – 2018
The total number of mobile subscriptions in the first quarter of 2019: 7.9 billion (addition of 44 million in quarter) – 2019
Amount of mobile data traffic worldwide: nearly 30 billion GB – 2018
Data category with highest traffic worldwide: video (60%) – 2018
Global average of data traffic per smartphone per month: 5.6 GB – 2018
- North America: 7 GB – 2018
- Latin America: 3.1 GB – 2018
- Western Europe: 6.7 GB – 2018
- Central and Eastern Europe: 4.5 GB – 2018
- North East Asia: 7.1 GB – 2018
- Southeast Asia and Oceania: 3.6 GB – 2018
- India, Nepal, and Bhutan: 9.8 GB – 2018
- Middle East and Africa: 3.0 GB – 2018
Time between the creation of each new bitcoin block: 9.27 minutes – 2019

Streaming Services:

Total hours of video streamed by Netflix users every minute: 97,222 – 2017
Hours of YouTube watched per day: over 1 billion – 2018
Number of tracks uploaded to Spotify every day: Over 20,000 – 2019
Number of Spotify’s monthly active users: 232 million – 2019
Spotify’s total subscribers: 108 million – 2019
Spotify’s hours of content listened: 17 billion – 2019
Total number of songs on Spotify’s catalog: over 30 million – 2019
Apple Music’s total subscribers: 60 million – 2019
Total number of songs on Apple Music’s catalog: 45 million – 2019

Social Media:

Number of snaps shared by Snapchat users every day: Over 3.5 billion – 2017
Number of tweets sent every day: 500 million – 2019
Number of Instagram users: over 700 million – 2017
Amount of data created by Facebook in a day: 4,000,000 GB – 2014
Number of LinkedIn members: 645 million – 2019
LinkedIn sign-up rate: 2 members per second – 2019
Number of photos and videos shared on Instagram every day: 95 million – 2019
Tinder dates per week: 1 million – 2019
Total matches on Tinder: over 30 billion – 2019
Most popular month on Tinder in the US: August – 2018
- Day: Monday – 2018
- Time of day: 9 PM EST – 2018

Calls and Messaging:

Estimated robocalls made in the US: 47.8 billion – 2018
Number of messages sent over WhatsApp each day: 65 billion – 2018
Minutes of voice and video calls made on WhatsApp each day: 2 billion – 2018
Top 3 most popular messaging apps worldwide: WhatsApp, Facebook Messenger, WeChat – 2019
Worldwide email users: 2.943 billion – 2019
Number of emails sent/received per day: 246.5 billion – 2019

Retail/Financial Transaction:

Number of packages shipped by Amazon in a year: 5 billion – 2017
Total value of payments processed by Venmo in a year: USD 62 billion – 2019
Based on an independent analysis of public transactions on Venmo in 2017:
- Number of public transactions on Venmo: 207,984,218 – 2017
- Top 5 most frequent last names on Venmo: Smith, Johnson, Lee, Williams, Brown – 2017
- Number of transactions that involved pizza or the pizza emoji: 2,979,619 – 2017
- Number of transactions involving house and money with wings emoji or rent: 3,020,484 – 2017
Based on a non-representative survey of 2,436 US consumers between the ages of 21 and 72 on P2P platforms:
- The average volume of transactions handled by Venmo: USD 64.2 billion – 2019
- The average volume of transactions handled by Zelle: USD 122.0 billion – 2019
- The average volume of transactions handled by PayPal: USD 141.8 billion – 2019
- Platform with the highest percent adoption among all consumers: PayPal (48%) – 2019

Internet of Things:

Number of connected IoT devices worldwide: 8.3 billion – 2018
Number of new devices connected to the Internet every second: 127 – 2017
Number of wearable devices: 526 million – 2017
Based on aggregated and anonymized data of Fitbit users from January 1, 2018 – 2018
- Total steps taken: 27 trillion – 2018
- Total hours slept: 12 billion – 2018
- Total active minutes: 119 billion – 2018
- Top 5 countries/territories with most steps: Hong Kong, Spain, Ireland, Sweden, Germany – 2018
- Top 5 countries that get the most sleep: Finland, New Zealand, Ireland, Belgium, Netherlands – 2018
- Top 5 US locales with the lowest resting heart rate: Bend, OR; Santa Barbara, Santa Maria, and San Luis Obispo, CA; Twin Falls, ID; Monterey-Salinas, CA; Juneau, AK – 2018
Amount of data produced by an autonomous car in one and a half hours of driving: 4,000 GB

Sources:

Al-Heeti, Abrar. “WhatsApp: 65B Messages Sent Each Day, and More than 2B Minutes of Calls.” CNET, May 1, 2018. https://www.cnet.com/news/whatsapp-65-billion-messages-sent-each-day-and-more-than-2-billion-minutes-of-calls/.
Bhuiyan, Johana. “Uber Powered Four Billion Rides in 2017. It Wants to Do More — and Cheaper — in 2018.” Vox, January 5, 2018. https://www.vox.com/2018/1/5/16854714/uber-four-billion-rides-coo-barney-harford-2018-cut-costs-customer-service.
Blockchain Staff. “Bitcoin Currency Statistics.” Blockchain.com, August 2019. https://www.blockchain.com/stats.
Carman, Ashley. “Amazon Shipped over 5 Billion Items Worldwide through Prime in 2017.” The Verge, January 2, 2018. https://www.theverge.com/2018/1/2/16841786/amazon-prime-2017-users-ship-five-billion.
Cisco®. “Cisco Visual Networking Index: Global Mobile Data Traffic Forecast Update, 2017–2022 White Paper.” Cisco, February 18, 2019. https://www.cisco.com/c/en/us/solutions/collateral/service-provider/visual-networking-index-vni/white-paper-c11-738429.html.
Clement, J. “Mobile Share of Website Visits Worldwide 2018.” Statista, July 22, 2019. https://www.statista.com/statistics/241462/global-mobile-phone-website-traffic-share/.
———. “Most Popular Messaging Apps 2019.” Statista, August 9, 2019. https://www.statista.com/statistics/258749/most-popular-global-mobile-messenger-apps/.
Desjardins, Jeff. “How Much Data Is Generated Each Day?” World Economic Forum, April 17, 2019. https://www.weforum.org/agenda/2019/04/how-much-data-is-generated-each-day-cf4bddf29f/.
Do Thi Duc, Hang. “PUBLIC BY DEFAULT – Venmo Stories of 2017.” Public By Default FYI, 2018. https://publicbydefault.fyi.
Dwyer, Erin. “2017 on Netflix – A Year in Bingeing.” Netflix Media Center, December 11, 2017. https://media.netflix.com/en/press-releases/2017-on-netflix-a-year-in-bingeing.
Fisher, Christine. “Apple Music Now Has 60 Million Paid Subscribers.” Engadget, June 27, 2019. https://www.engadget.com/2019/06/27/apple-music-60-million-paid-subscribers/.
Instagram. “700 Million.” Instagram Press (blog), April 26, 2017. https://instagram-press.com/blog/2017/04/26/700-million/.
Jonsson, Peter, Stephen Carson, Andres Torres, Per Lindberg, Kati Öhman, Athanasios Karapantelakis, Shamil Bajgin, et al. “Ericsson Mobility Report.” Stockholm, Sweden: Ericsson, 2019. https://www.ericsson.com/49d1d9/assets/local/mobility-report/documents/2019/ericsson-mobility-report-june-2019.pdf.
Lasse Lueth, Knud. “State of the IoT 2018: Number of IoT Devices Now at 7B – Market Accelerating,” August 8, 2018. https://iot-analytics.com/state-of-the-iot-update-q1-q2-2018-number-of-iot-devices-now-7b/.
Levenson, Josh, and Parker Hall. “Apple Music vs. Spotify.” Digital Trends, August 7, 2019. https://www.digitaltrends.com/music/apple-music-vs-spotify/.
LinkedIn. “About Us.” LinkedIn, 2019. https://news.linkedin.com/about-us.
Patel, Mark, Jason Shangkuan, and Christopher Thomas. “What’s New with the Internet of Things? | McKinsey.” McKinsey & Company, May 2017. https://www.mckinsey.com/industries/semiconductors/our-insights/whats-new-with-the-internet-of-things.
Trefis Research Team. “Estimating Lyft’s Valuation.” Trefis, 2019. https://dashboards.trefis.com/no-login-required/zrRBRShU/Estimating-Lyft’s-Valuation.
Rooney, Kate. “PayPal’s Venmo Had a Break-out Quarter with Payments Surging 80%.” CNBC, January 31, 2019. https://www.cnbc.com/2019/01/31/venmo-had-a-break-out-quarter-but-wont-make-money-for-paypal-until-at-mid-2019–.html.
Shevlin, Ron. “Fintech Adoption in the US: The Opportunity for Banks and Credit Unions.” Scottsdale, AZ: Cornerstone Advisors, 2018. https://www.q2ebanking.com/wp-content/uploads/2019/01/20181107-Q2-Fintech-Adoption-Index.pdf.
Smith. “Fitbit’s Fittest: The Countries (And Cities) That Stepped It Up and Slept More In 2018.” Fitbit Blog, January 12, 2019. https://blog.fitbit.com/fitbit-year-in-review-2018/.
Snap, Inc. “Snap Inc. Reports Fourth Quarter and Full Year 2017 Results.” Snap, February 6, 2018. https://investor.snap.com/~/media/Files/S/Snap-IR/reports-and-presentations/q4-17-earnings-slides.pdf.
Spotify. “Music – FAQ.” Spotify, 2019. https://artists.spotify.com/faq/music.
———. “Spotify Reports Second Quarter 2019 Earnings.” Spotify, July 31, 2019. https://newsroom.spotify.com/2019-07-31/spotify-reports-second-quarter-2019-earnings/.
Sullivan, Danny. “Google Now Handles at Least 2 Trillion Searches per Year.” Search Engine Land, May 24, 2016. https://searchengineland.com/google-now-handles-2-999-trillion-searches-per-year-250247.
The Radicati Group, Inc. “Email Statistics Report, 2015-2019: Executive Summary.” The Radicati Group, Inc, March 2015. https://www.radicati.com/wp/wp-content/uploads/2015/02/Email-Statistics-Report-2015-2019-Executive-Summary.pdf.
Tinder Press Team. “Tinder Press and Brand Assets.” Tinder, 2019. https://tinder.com.
Tinder Staff. “This Is What Happened On Tinder In 2018.” Swipe Life, December 5, 2018. https://swipelife.tinder.com/post/tinder-2018.
Twitter, Inc. “Twitter for Business.” Twitter, 2019. https://business.twitter.com/en.html.
Wiener, Janet, and Nathan Bronson. “Facebook’s Top Open Data Problems.” Facebook Research (blog), October 22, 2014. https://research.fb.com/blog/2014/10/facebook-s-top-open-data-problems/.
Winter, Kathy. “For Self-Driving Cars, There’s Big Meaning Behind One Big Number: 4 Terabytes.” Intel Newsroom, April 14, 2017. https://newsroom.intel.com/editorials/self-driving-cars-big-meaning-behind-one-number-4-terabytes/.
YouMail. “YouMail Robocall Index: July 2019 Nationwide Robocall Data.” Robocall Index, 2019. https://robocallindex.com/.
YouTube Press Team. “Press – YouTube.” YouTube, August 2019. https://www.youtube.com/yt/about/press/.
Zavazava, Cosmas, Rati Skhirtladze, Vanessa Gray, Esperanza Magpantay, Daniela Pokorna, Martin Schaaper, and Ivan Vallejo. “Measuring the Information Society Report 2018 – Volume 1.” Geneva, Switzerland: International Telecommunication Union, 2018. https://www.itu.int/en/ITU-D/Statistics/Documents/publications/misr2018/MISR-2018-Vol-1-E.pdf.

Index: The Data Universe 2019

/baɪˈlɪŋgwəl/

Practitioners across disciplines who possess both domain knowledge and data science expertise.

The Governance Lab (GovLab) just launched the 100 Questions Initiative, “an effort to identify the most important societal questions whose answers can be found in data and data science if the power of data collaboratives is harnessed.”

The initiative will seek to identify questions that could help unlock the potential of data and data science in solving various global and domestic issues, including but not limited to, climate change, economic inequality, and migration. These questions will be sourced from individuals who have expertise in both a public issue and data science or what The GovLab calls “bilinguals.”

Tom Kalil, the Chief Innovation Officer at Schmidt Futures, argues that the emergent use of data science and machine learning in the public sector will increase the demand for individuals “who speak data science and social sector.”

Similarly, within the business context, David Meer wrote that “being bilingual isn’t just a matter of native English speakers learning how to conjugate verbs in French or Spanish. Rather, it’s important that businesses cultivate talent that can simultaneously speak the language of advanced data analysis and nuts-and-bolts business operations. As data analysis becomes a more prevalent and powerful lever for strategy and growth, organizations increasingly need bilinguals to form the bridge between the work of advanced data scientists and business decision makers.”

For more info, visit www.the100questions.org

Bilingual

/ˈdɪʤətəl ˈsɜrfdəm/

A condition where consumers give up their personal and private information in order to be able to use a particular product or service.

Serfdom is a system of forced labor that existed in feudal societies and was very common in Europe during the Middle Ages. In this system, serfs or peasants performed a variety of labor for their lords in exchange for protection from bandits and a small piece of land that they could cultivate for themselves. Serfs were also required to pay some form of tax, often in the form of chickens or crops yielded from their piece of land.

Writing in The Next Web, Hassan Khan argues that the decline of property ownership indicates that we are living in digital serfdom:

“The percentage of households without a car is increasing. Ride-hailing services have multiplied. Netflix boasts over 188 million subscribers. Spotify gains ten million paid members every five to six months.
“The model of “impermanence” has become the new normal. But there’s still one place where permanence finds its home, with over two billion active monthly users, Facebook has become a platform of record for the connected world. If it’s not on social media, it may as well have never happened.”

Joshua A. T. Fairfield elaborates on this phenomenon in his book Owned: Property, Privacy, and the New Digital Serfdom. Fairfield discusses his book in an article in The Conversation, stating that:

“The issue of who gets to control property has a long history. In the feudal system of medieval Europe, the king owned almost everything, and everyone else’s property rights depended on their relationship with the king. Peasants lived on land granted by the king to a local lord, and workers didn’t always even own the tools they used for farming or other trades like carpentry and blacksmithing.
[…]
“Yet the expansion of the internet of things seems to be bringing us back to something like that old feudal model, where people didn’t own the items they used every day. In this 21st-century version, companies are using intellectual property law – intended to protect ideas – to control physical objects consumers think they own.”

In other words, Fairfield suggests that the devices and services we use (iPhones, Fitbits, Roombas, digital door locks, Spotify, Uber, and many more) are constantly capturing data about our behavior. Consumers who use these products have no choice but to trade their personal data for access to the full functionality of these devices or services. Private corporations use this data for targeted advertising, among other purposes. This system of digital serfdom binds consumers to private corporations that dictate the terms of use for their products or services.

Janet Burns wrote about Alex Rosenblat’s UBERLAND: How Algorithms Are Rewriting The Rules Of Work and gave some examples of how algorithms use personal data to manipulate consumers’ behaviors:

“For example, algorithms in control of assigning and pricing rides have often surprised drivers and riders, quietly taking into account other traffic in the area, regionally adjusted rates, and data on riders and drivers themselves.
“In recent years, we’ve seen similar adjustments happen behind the scenes in online shopping, as UBERLAND points out: major retailers have tweaked what price different customers see for the same item based on where they live, and how feasibly they could visit a brick-and-mortar store for it.”

To conclude, an excerpt from Fairfield’s book cautions:

“In the coming decade, if we do not take back our ownership rights, the same will be said of our self-driving cars and software-enabled homes. We risk becoming digital peasants, owned by software and advertising companies, not to mention overreaching governments.”

Sources and Further Readings:

Fairfield, Joshua A. T. Owned: Property, Privacy, and the New Digital Serfdom. Cambridge University Press, 2017.
Fairfield, Joshua A.T. “The ‘internet of things’ is sending us back to the Middle Ages.” The Conversation, September 6, 2017.
Burns, Janet. “Gigs And AI Are Driving Us Into Digital Servitude.” Forbes, October 28, 2018.
Khan, Hassan. “We’re living in digital serfdom — trading privacy for convenience.” The Next Web, November 10, 2018.

Digital Serfdom

/sɛlf-ˈsɑvrən aɪˈdɛntəti/

A decentralized identification mechanism that gives individuals control over what, when, and to whom their personal information is shared.

An identification document (ID) is a crucial part of every individual’s life, in that it is often a prerequisite for accessing a variety of services—ranging from creating a bank account to enrolling children in school to buying alcoholic beverages to signing up for an email account to voting in an election—and also a proof of simply being. This system poses fundamental problems, which a field report by The GovLab on Blockchain and Identity frames as follows:

“One of the central challenges of modern identity is its fragmentation and variation across platform and individuals. There are also issues related to interoperability between different forms of identity, and the fact that different identities confer very different privileges, rights, services or forms of access. The universe of identities is vast and manifold. Every identity in effect poses its own set of challenges and difficulties—and, of course, opportunities.”

A report published by New America echoes this point, arguing that:

“Societally, we lack a coherent approach to regulating the handling of personal data. Users share and generate far too much data—both personally identifiable information (PII) and metadata, or “data exhaust”—without a way to manage it. Private companies, by storing an increasing amount of PII, are taking on an increasing level of risk. Solution architects are recreating the wheel, instead of flying over the treacherous terrain we have just described.”

Self-Sovereign Identity (SSI) is often presented as a solution to the identity problems described above. Identity Woman, a researcher and advocate for SSI, goes even further by arguing that generating “a digital identity that is not under the control of a corporation, an organization or a government” is essential “in pursuit of social justice, deep democracy, and the development of new economies that share wealth and protect the environment.”

To inform the analysis of blockchain-based Self-Sovereign Identity (SSI), The GovLab report argues that identity is “a process, not a thing” and breaks it into a five-stage lifecycle: provisioning, administration, authentication, authorization, and auditing/monitoring. At each stage, identification serves a distinct function and poses different challenges.

With SSI, individuals have full control over how their personal information is shared, who gets access to it, and when. The New America report summarizes the potential of SSI in the following paragraphs:

“We believe that the great potential of SSI is that it can make identity in the digital world function more like identity in the physical world, in which every person has a unique and persistent identity which is represented to others by means of both their physical attributes and a collection of credentials attested to by various external sources of authority.”

[…]

“SSI, in contrast, gives the user a portable, digital credential (like a driver’s license or some other document that proves your age), the authenticity of which can be securely validated via cryptography without the recipient having to check with the authority that issued it. This means that while the credential can be used to access many different sites and services, there is no third-party broker to track the services to which the user is authenticating. Furthermore, cryptographic techniques called “zero-knowledge proofs” (ZKPs) can be used to prove possession of a credential without revealing the credential itself. This makes it possible, for example, for users to prove that they are over the age of 21 without having to share their actual birth dates, which are both sensitive information and irrelevant to a binary, yes-or-no ID transaction.”
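
To make the idea of a zero-knowledge proof more concrete, here is a minimal sketch of one of the simplest such protocols: a Schnorr proof of knowledge, made non-interactive with a hash-derived challenge. It is not the scheme used by any particular SSI system, and it does not implement the over-21 age check, which needs more elaborate machinery. It only illustrates how a holder can convince a verifier that they possess a secret (here standing in for a credential key) without revealing it. The function names and the group parameters are assumptions of this example, and the parameters are toy values that offer no real security.

```python
import hashlib
import secrets

# Toy group parameters (an assumption of this sketch, far too small for real use):
# P = 23 is prime, Q = 11 divides P - 1, and G = 2 has order Q modulo P.
P, Q, G = 23, 11, 2


def challenge(public_key: int, commitment: int) -> int:
    """Derive the challenge by hashing the public values (Fiat-Shamir heuristic)."""
    data = f"{G}|{public_key}|{commitment}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % Q


def prove(secret: int) -> tuple[int, int, int]:
    """Produce (public_key, commitment, response) without exposing `secret`."""
    public_key = pow(G, secret, P)
    nonce = secrets.randbelow(Q - 1) + 1  # fresh random value in 1..Q-1
    commitment = pow(G, nonce, P)
    c = challenge(public_key, commitment)
    response = (nonce + c * secret) % Q
    return public_key, commitment, response


def verify(public_key: int, commitment: int, response: int) -> bool:
    """Check G^response == commitment * public_key^c (mod P)."""
    c = challenge(public_key, commitment)
    return pow(G, response, P) == (commitment * pow(public_key, c, P)) % P
```

Calling `verify(*prove(7))` returns `True`: the verifier sees only the public key, the commitment, and the response, never the secret `7` itself. A real deployment would use a standardized group or elliptic curve and a vetted cryptographic library rather than hand-written arithmetic.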

Case studies of real-world SSI applications presented on The GovLab’s Blockchange website include a government-issued self-sovereign ID built on blockchain technology in the city of Zug, Switzerland; a mobile voting platform piloted in West Virginia, secured via smart biometrics, real-time ID verification, and the blockchain for irrefutability; and blockchain-based registration of land and property transactions in Sweden.

Nevertheless, the authors of The GovLab report caution against the hype surrounding this emerging technology:

“At their core, blockchain technologies offer new capacity for increasing the immutability, integrity, and resilience of information capture and disclosure mechanisms, fostering the potential to address some of the information asymmetries described above. By leveraging a shared and verified database of ledgers stored in a distributed manner, blockchain seeks to redesign information ecosystems in a more transparent, immutable, and trusted manner. Solving information asymmetries may turn out to be the real contribution of blockchain, and this—much more than the current enthusiasm over virtual currencies—is the real reason to assess its potential.

“It is important to emphasize, of course, that blockchain’s potential remains just that for the moment—only potential. Considerable hype surrounds the emerging technology, and much remains to be done and many obstacles to overcome if blockchain is to achieve the enthusiasts’ vision of “radical transparency.”

Further readings:

Allen, Christopher. “The Path to Self-Sovereign Identity.” Coindesk, 2016.
Apostle, Julia. “Lessons from Cambridge Analytica: one way to protect your data.” Financial Times, 2018.
Graglia, Michael, Christopher Mellon, and Tim Robustelli. “The Nail Finds a Hammer: Self-Sovereign Identity, Design Principles, and Property Rights in the Developing World.” New America, 2018.
Identity Woman, Kaliya. “Humanizing Technology.” Open Democracy, 2017.
Verhulst, Stefaan G., and Andrew Young. “On the Emergent Use of Distributed Ledger Technologies for Identity Management.” The GovLab, 2018.

Self-Sovereign Identity

Academic study participants willing to donate personal data to research if it could lead to public good (Skatova, Ng, and Goulding, 2014)

60%

/greɪ ˈdeɪtə/

Data accumulated by an institution for operational purposes that does not fall under any traditional data protection policies.

Organizations across all sectors accumulate a massive amount of data simply by virtue of operating, and universities are among them. In a paper, Christine L. Borgman categorizes these as grey data and suggests that universities should take the lead in demonstrating stewardship of these data, which include student applications, faculty dossiers, registrar records, ID card data, security camera footage, and many others.

“Some of these data are collected for mandatory reporting obligations such as enrollments, diversity, budgets, grants, and library collections. Many types of data about individuals are collected for operational and design purposes, whether for instruction, libraries, travel, health, or student services.”
(Borgman, p. 380)

Grey data typically do not fall under traditional data protection regimes such as the Health Insurance Portability and Accountability Act (HIPAA), the Family Educational Rights and Privacy Act (FERPA), or review by Institutional Review Boards. Consequently, there is considerable debate about how to use (or misuse) them. Borgman points out that universities have been “exploiting these data for research, learning analytics, faculty evaluation, strategic decisions, and other sensitive matters.” On top of this, for-profit companies “are besieging universities with requests for access to data or for partnerships to mine them.”

Recognizing both the value of data and the risks arising from the accumulation of grey data, Borgman proposes a model of Data Stewardship that draws on data protection practices at the University of California concerning information security, data governance, and cyber risk.

This model exemplifies the kind of Data Stewardship practice that The GovLab advocates amid the rise of public-private collaboration in leveraging data for public good.

The GovLab’s Data Stewards website presents the need for such practice as follows:

“With these new practices of data collaborations come the need to reimagine roles and responsibilities to steer the process of using private data, and the insights it can generate, to address some of society’s biggest questions and challenges: Data Stewards.
“Today, establishing and sustaining these new collaborative and accountable approaches requires significant and time-consuming effort and investment of resources for both data holders on the supply side, and institutions that represent the demand. By establishing Data Stewardship as a function, recognized within the private sector as a valued responsibility, the practice of Data Collaboratives can become more predictable, scaleable, sustainable and de-risked.”

Sources and Further Readings:

Borgman, Christine L. “Open Data, Grey Data, and Stewardship: Universities at the Privacy Frontier.” ArXiv, 2018.
Young, Andrew. “About the Data Stewards Network.” Medium, October 28, 2018.

Grey Data

/rɑwəfɪˈkeɪʃən/

A process of making datasets raw in three steps: reformatting, cleaning, and ungrounding (Denis and Goëta).

Hundreds of thousands of datasets are now made available via numerous channels from both public and private domains. Based on the stage of processing, these datasets can be categorized as either raw data or processed data. According to an Open Government Data principle, raw data (or primary data) “are published as collected at the source, with the finest possible level of granularity, not in aggregate or modified forms.” Processed data, by contrast, has been altered through categorization, codification, aggregation, or other similar processes.

A large amount of the data that is made publicly available comes in processed form. For example, population, trade, and budget data are often presented in aggregated forms, preventing researchers from understanding the underlying stories behind these data, such as how patterns or trends differ when gender, location, or other variables are factored in. Therefore, a rawification process is often needed for a dataset to be useful for more detailed and valuable secondary analysis.

Jérôme Denis and Samuel Goëta define ‘rawification’ as a process of reformatting, cleaning, and ungrounding data in order to obtain truly ‘raw’ datasets.

According to Denis and Goëta, reformatting means making sure that data that has been opened can also be easily read by its users. This is usually achieved by converting the data into a format that most processing programs can read and manipulate. One of the most commonly used formats is CSV (Comma Separated Values).

The next step in a rawification process is cleaning. At this stage, cleaning means correcting mistakes within the datasets, including but not limited to redundancies and incoherence. In many cases, datasets contain multiple entries for the same item. For example, ‘New York University’ and ‘NYU’ might be interpreted as two different entities, as might ‘the GovLab’ and ‘the Governance Lab’. Cleaning helps address issues like these.

The final step in a rawification process is ungrounding, which means removing any ties or links to previous uses of the data. Such ties include color coding, comments, and subcategories. This way, the datasets can be purely raw and free of all associations and bias.
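
The three steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Denis and Goëta’s own procedure: the column names, the alias table, and the sample rows are all hypothetical, and the CSV serialization is applied last here simply because the sample data starts out in memory.

```python
import csv
import io

# Hypothetical alias table mapping variant spellings to one canonical name.
ALIASES = {"NYU": "New York University", "the Governance Lab": "the GovLab"}

# Hypothetical columns that only record how the data was used before.
GROUNDING_COLUMNS = {"comment", "color_code", "subcategory"}


def clean(rows):
    """Normalize organization names, then drop rows that became exact duplicates."""
    seen = set()
    cleaned = []
    for row in rows:
        row = dict(row)  # copy so the caller's data is left untouched
        name = row["organization"].strip()
        row["organization"] = ALIASES.get(name, name)
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            cleaned.append(row)
    return cleaned


def unground(rows):
    """Remove columns that tie the data to an earlier use."""
    return [{k: v for k, v in row.items() if k not in GROUNDING_COLUMNS} for row in rows]


def reformat(rows):
    """Serialize the rows as CSV text that most tools can read."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def rawify(rows):
    """Apply cleaning, ungrounding, and reformatting in sequence."""
    return reformat(unground(clean(rows)))


# Made-up sample records for illustration only.
SAMPLE_ROWS = [
    {"organization": "New York University", "records": "120", "comment": "check", "color_code": "red"},
    {"organization": "NYU", "records": "120", "comment": "check", "color_code": "red"},
    {"organization": "the Governance Lab", "records": "45", "comment": "", "color_code": "green"},
]
```

With these sample rows, `rawify(SAMPLE_ROWS)` yields a two-row CSV with only the `organization` and `records` columns: the ‘NYU’ entry is merged into ‘New York University’, and the comment and color-coding columns are dropped. In practice, deciding which entries are true duplicates and which columns count as grounding requires human judgment about the specific dataset.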

Opening up data is a clear step for increasing public access to information held within institutions. However, in order to ensure the utility of that data for those accessing it, a rawification process will likely be necessary.

Additional resources:

Denis, Jérôme, & Goëta, Samuel. “Rawification and the careful generation of open government data.” Social Studies of Science, 47, no. 5 (2017): 604–629.
Denis, Jérôme, & Goëta, Samuel. “Exploration, Extraction and ‘Rawification’.” In The Shaping of Transparency in the Back Rooms of Open Data (SSRN Scholarly Paper No. ID 2403069). Rochester, NY: Social Science Research Network, 2014.

Rawification

Open data demand in New York City in fiscal year 2018 measured by unique users (NYC Annual Open Data Report 2018).

1,000,000+