New paper by Alex Marthews and Catherine Tucker: “This paper uses data from Google Trends on search terms from before and after the surveillance revelations of June 2013 to analyze whether Google users’ search behavior shifted as a result of an exogenous shock in information about how closely their internet searches were being monitored by the U.S. government. We use data from Google Trends on search volume for 282 search terms across eleven different countries. These search terms were independently rated for their degree of privacy-sensitivity along multiple dimensions. Using panel data, our results suggest that, cross-nationally, users were less likely to search using search terms that they believed might get them in trouble with the U.S. government. In the U.S., this was the main subset of search terms that were affected. However, internationally there was also a drop in traffic for search terms that were rated as personally sensitive. These results have implications for policy makers in terms of understanding the actual effects on search behavior of disclosures relating to the scale of government surveillance on the Internet and their potential effects on international competitiveness.”
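The abstract describes a before/after panel design on Google Trends volumes for privacy-rated search terms across countries. A minimal sketch of that kind of difference-in-differences specification is below; it is not the authors’ code, and the file name, column names (term, country, week, volume, sensitive) and clustering choice are assumptions.

```python
# Sketch of a before/after panel regression on Google Trends volumes.
# Not the authors' code; the file and column names are hypothetical.
import pandas as pd
import statsmodels.formula.api as smf

panel = pd.read_csv("trends_panel.csv", parse_dates=["week"])   # hypothetical panel: term x country x week
panel["post"] = (panel["week"] >= "2013-06-01").astype(int)     # after the June 2013 revelations

# Did volume for privacy-sensitive terms fall relative to other terms after June 2013?
model = smf.ols(
    "volume ~ post * sensitive + C(term) + C(country)",
    data=panel,
).fit(cov_type="cluster", cov_kwds={"groups": panel["term"]})   # cluster errors by search term

print(model.params[["post", "sensitive", "post:sensitive"]])    # the interaction carries the effect of interest
```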
Index: Privacy and Security
The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on privacy and security and was originally published in 2014.
Globally
- Percentage of people who feel the Internet is eroding their personal privacy: 56%
- Internet users who feel comfortable sharing personal data with an app: 37%
- Number of users who consider it important to know when an app is gathering information about them: 70%
- How many people in the online world use privacy tools to disguise their identity or location: 28%, or 415 million people
- Country with the highest penetration of general anonymity tools among Internet users: Indonesia, where 42% of users surveyed use proxy servers
- Percentage of China’s online population that disguises their online location to bypass governmental filters: 34%
In the United States
Over the Years
- In 1996, percentage of the American public who were categorized as having “high privacy concerns”: 25%
- Those with “Medium privacy concerns”: 59%
- Those who were unconcerned with privacy: 16%
- In 1998, number of computer users concerned about threats to personal privacy: 87%
- In 2001, those who reported “medium to high” privacy concerns: 88%
- Individuals who are unconcerned about privacy: 18% in 1990, down to 10% in 2004
- How many online American adults are more concerned about their privacy in 2014 than they were a year ago: 64%
- Number of respondents in 2012 who believe they have control over their personal information: 35%, a downward trend for seven years
- How many respondents in 2012 continue to perceive privacy and the protection of their personal information as very important or important to the overall trust equation: 78%, an upward trend for seven years
- How many consumers in 2013 trust that their bank is committed to ensuring the privacy of their personal information: 35%, down from 48% in 2004
Privacy Concerns and Beliefs
- How many Internet users worry about their privacy online: 92%
- Those who report that their level of concern has increased from 2013 to 2014: 7 in 10
- How many are at least sometimes worried when shopping online: 93%, up from 89% in 2012
- Those who have some concerns when banking online: 90%, up from 86% in 2012
- Number of Internet users who are worried about the amount of personal information about them online: 50%, up from 33% in 2009
- Those who report that their photograph is available online: 66%
- Their birthdate: 50%
- Home address: 30%
- Cell number: 24%
- A video: 21%
- Political affiliation: 20%
- Consumers who are concerned about companies tracking their activities: 58%
- Those who are concerned about the government tracking their activities: 38%
- How many users surveyed felt that the National Security Agency (NSA) overstepped its bounds in light of recent NSA revelations: 44%
- Respondents who are comfortable with advertisers using their web browsing history to tailor advertisements as long as it is not tied to any other personally identifiable information: 36%, up from 29% in 2012
- Percentage of voters who do not want political campaigns to tailor their advertisements based on their interests: 86%
- Percentage of respondents who do not want news tailored to their interests: 56%
- Percentage of users who are worried that their information will be stolen by hackers: 75%
- Those who are worried about companies tracking their browsing history for targeted advertising: 54%
- How many consumers say they do not trust businesses with their personal information online: 54%
- Top 3 most trusted companies for privacy identified by consumers from across 25 different industries in 2012: American Express, Hewlett Packard and Amazon
- Most trusted industries for privacy: Healthcare, Consumer Products and Banking
- Least trusted industries for privacy: Internet and Social Media, Non-Profits and Toys
- Respondents who admit to sharing their personal information with companies they did not trust in 2012 for reasons such as convenience when making a purchase: 63%
- Percentage of users who say they prefer free online services supported by targeted ads: 61%
- Those who prefer paid online services without targeted ads: 33%
- How many Internet users believe that it is not possible to be completely anonymous online: 59%
- Those who believe complete online anonymity is still possible: 37%
- Those who say people should have the ability to use the Internet anonymously: 59%
- Percentage of Internet users who believe that current laws are not good enough in protecting people’s privacy online: 68%
- Those who believe current laws provide reasonable protection: 24%
Security Related Issues
- How many have had an email or social networking account compromised or taken over without permission: 21%
- Those who have been stalked or harassed online: 12%
- Those who think the federal government should do more to act against identity theft: 74%
- Consumers who agree that they will avoid doing business with companies who they do not believe protect their privacy online: 89%
- Among 65+ year old consumers: 96%
Privacy-Related Behavior
- How many mobile phone users have decided not to install an app after discovering the amount of information it collects: 54%
- Number of Internet users who have taken steps to remove or mask their digital footprint (including clearing cookies, encrypting emails, and using virtual networks to mask their IP addresses): 86%
- Those who have set their browser to disable cookies: 65%
- Number of users who have not allowed a service to remember their credit card information: 73%
- Those who have chosen to block an app from accessing their location information: 53%
- How many have signed up for a two-step sign-in process: 57%
- Percentage of Gen-X (33-48 year olds) and Millennials (18-32 year olds) who say they never change their passwords or only change them when forced to: 41%
- How many report using a unique password for each site and service: 4 in 10
- Those who use the same password everywhere: 7%
Sources
- “2012 Most Trusted Companies for Privacy,” Ponemon Institute, January 28, 2013.
- “Fortinet 2014 Privacy Survey Reveals Gen-X and Millennial Attitudes Surrounding Passwords, Personal Data, Email Snooping and Online Marketing Practices,” Fortinet, February 24, 2014.
- “Global Privacy Survey 2013,” MEFMobile, 2013.
- “Internet Security Priorities,” Computer and Communications Industry Association, December 20, 2013.
- Kiss, Jemima. “Privacy Tools Used by 28% of the Online World, Research Finds,” The Guardian, January 21, 2014.
- Kumaraguru, Ponnurangam, and Lorrie Faith Cranor. “Privacy Indexes: A Survey of Westin’s Studies,” Institute for Software Research International, Carnegie Mellon University, and Carnegie Mellon CyLab, December 2005.
- “Most Trusted Retail Banks for Privacy,” Ponemon Institute, October 10, 2013.
- “TRUSTe 2014 US Consumer Confidence Privacy Report,” TRUSTe, 2014.
- Rainie, Lee, Sara Kiesler, Ruogu Kang, and Mary Madden. “Anonymity, Privacy, and Security Online,” Pew Research Internet Project, September 5, 2013.
- Smith, Aaron, and Mary Madden. “Privacy and Data Management on Mobile Devices,” Pew Research Internet Project, September 5, 2012.
- Turow, Joseph, Michael Carpini, Nora Draper, and Rowan Howard-Williams. “Americans Roundly Reject Tailored Political Advertising,” Annenberg School for Communication, University of Pennsylvania.
How Twitter Could Help Police Departments Predict Crime
Eric Jaffe in Atlantic Cities: “Initially, Matthew Gerber didn’t believe Twitter could help predict where crimes might occur. For one thing, Twitter’s 140-character limit leads to slang and abbreviations and neologisms that are hard to analyze from a linguistic perspective. Beyond that, while criminals occasionally taunt law enforcement via Twitter, few are dumb or bold enough to tweet their plans ahead of time. “My hypothesis was there was nothing there,” says Gerber.
But then, that’s why you run the data. Gerber, a systems engineer at the University of Virginia’s Predictive Technology Lab, did indeed find something there. He reports in a new research paper that public Twitter data improved the predictions for 19 of 25 crimes that occurred early last year in metropolitan Chicago, compared with predictions based on historical crime patterns alone. Predictions for stalking, criminal damage, and gambling saw the biggest bump…
Of course, the method says nothing about why Twitter data improved the predictions. Gerber speculates that people are tweeting about plans that correlate highly with illegal activity, as opposed to tweeting about crimes themselves.
Let’s use criminal damage as an example. The algorithm identified 700 Twitter topics related to criminal damage; of these, one topic involved the words “united center blackhawks bulls” and so on. Gather enough sports fans with similar tweets and some are bound to get drunk enough to damage public property after the game. Again this scenario extrapolates far more than the data tells, but it offers a possible window into the algorithm’s predictive power.
(Figure: the map on the left shows predicted crime threat based on historical patterns; the one on the right includes Twitter data. Via Decision Support Systems.)
From a logistical standpoint, it wouldn’t be too difficult for police departments to use this method in their own predictions; both the Twitter data and modeling software Gerber used are freely available. The big question, he says, is whether a department used the same historical crime “hot spot” data as a baseline for comparison. If not, a new round of tests would have to be done to show that the addition of Twitter data still offered a predictive upgrade.
There’s also the matter of public acceptance. Data-driven crime prediction tends to raise any number of civil rights concerns. In 2012, privacy advocates criticized the FBI for a similar plan to use Twitter for crime predictions. In recent months the Chicago Police Department’s own methods have been knocked as a high-tech means of racial profiling. Gerber says his algorithms don’t target any individuals and only cull data posted voluntarily to a public account.”
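Gerber’s code is not included in the article, but the pipeline it describes (a historical hot-spot baseline plus per-area Twitter topic features feeding a classifier) can be sketched roughly as follows. The grid files, kernel bandwidth, topic count, and the choice of logistic regression are illustrative assumptions, not the paper’s exact setup.

```python
# Rough sketch: hot-spot baseline + Twitter-topic features per grid cell.
# Data files, grid construction, and model choices are illustrative assumptions.
import pandas as pd
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KernelDensity

cells = pd.read_csv("grid_cells.csv")      # one row per grid cell: cell_id, lon, lat
tweets = pd.read_csv("tweets.csv")         # tweet text tagged with the cell it was posted from
past = pd.read_csv("past_crimes.csv")      # historical crime locations (lon, lat)
labels = pd.read_csv("future_crimes.csv")  # cell_id, crime_occurred (0/1) in the prediction window

# Baseline feature: kernel density estimate of historical crimes at each cell centre.
kde = KernelDensity(bandwidth=0.01).fit(past[["lon", "lat"]])
cells["hotspot"] = kde.score_samples(cells[["lon", "lat"]])

# Twitter features: topic proportions of the tweets observed in each cell.
docs = tweets.groupby("cell_id")["text"].apply(" ".join)
counts = CountVectorizer(max_features=5000, stop_words="english").fit_transform(docs)
topics = LatentDirichletAllocation(n_components=100, random_state=0).fit_transform(counts)
topic_df = pd.DataFrame(topics, index=docs.index).add_prefix("topic_")

# Combine both feature sets and fit a simple per-cell classifier.
X = cells.set_index("cell_id").join(topic_df).fillna(0.0)
y = labels.set_index("cell_id").loc[X.index, "crime_occurred"]
clf = LogisticRegression(max_iter=1000).fit(X[["hotspot"] + list(topic_df.columns)], y)
```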
The data gold rush
Neelie Kroes (European Commission): “Nearly 200 years ago, the industrial revolution saw new networks take over. Not just a new form of transport, the railways connected industries, connected people, energised the economy, transformed society.
Now we stand facing a new industrial revolution: a digital one.
With cloud computing its new engine, big data its new fuel. Transporting the amazing innovations of the internet, and the internet of things. Running on broadband rails: fast, reliable, pervasive.
My dream is that Europe takes its full part. With European industry able to supply, European citizens and businesses able to benefit, European governments able and willing to support. But we must get all those components right.
What does it mean to say we’re in the big data era?
First, it means more data than ever at our disposal. Take all the information of humanity from the dawn of civilisation until 2003 – nowadays that is produced in just two days. We are also acting to have more and more of it become available as open data, for science, for experimentation, for new products and services.
Second, we have ever more ways – not just to collect that data – but to manage it, manipulate it, use it. That is the magic to find value amid the mass of data. The right infrastructure, the right networks, the right computing capacity and, last but not least, the right analysis methods and algorithms help us break through the mountains of rock to find the gold within.
Third, this is not just some niche product for tech-lovers. The impact and difference to people’s lives are huge: in so many fields.
Transforming healthcare, using data to develop new drugs, and save lives. Greener cities with fewer traffic jams, and smarter use of public money.
A business boost: like retailers who communicate smarter with customers, for more personalisation, more productivity, a better bottom line.
No wonder big data is growing 40% a year. No wonder data jobs grow fast. No wonder skills and profiles that didn’t exist a few years ago are now hot property: and we need them all, from data cleaner to data manager to data scientist.
This can make a difference to people’s lives. Wherever you sit in the data ecosystem – never forget that. Never forget that real impact and real potential.
Politicians are starting to get this. The EU’s Presidents and Prime Ministers have recognised the boost to productivity, innovation and better services from big data and cloud computing.
But those technologies need the right environment. We can’t go on struggling with poor quality broadband. With each country trying on its own. With infrastructure and research that are individual and ineffective, separate and subscale. With different laws and practices shackling and shattering the single market. We can’t go on like that.
Nor can we continue in an atmosphere of insecurity and mistrust.
Recent revelations show what is possible online. They show implications for privacy, security, and rights.
You can react in two ways. One is to throw up your hands and surrender. To give up and put big data in the box marked “too difficult”. To turn away from this opportunity, and turn your back on problems that need to be solved, from cancer to climate change. Or – even worse – to simply accept that Europe won’t figure on this map but will be reduced to importing the results and products of others.
Alternatively: you can decide that we are going to master big data – and master all its dependencies, requirements and implications, including cloud and other infrastructures, Internet of things technologies as well as privacy and security. And do it on our own terms.
And by the way – privacy and security safeguards do not just have to be about protecting and limiting. Data generates value, and unlocks the door to new opportunities: you don’t need to “protect” people from their own assets. What you need is to empower people, give them control, give them a fair share of that value. Give them rights over their data – and responsibilities too, and the digital tools to exercise them. And ensure that the networks and systems they use are affordable, flexible, resilient, trustworthy, secure.
One thing is clear: the answer to greater security is not just to build walls. Many millennia ago, the Greek people realised that. They realised that you can build walls as high and as strong as you like – it won’t make a difference, not without the right awareness, the right risk management, the right security, at every link in the chain. If only the Trojans had realised that too! The same is true in the digital age: keep our data locked up in Europe, engage in an impossible dream of isolation, and we lose an opportunity; without gaining any security.
But master all these areas, and we would truly have mastered big data. Then we would have shown that technology can take account of democratic values; and that a dynamic democracy can cope with technology. Then we would have a boost to benefit every European.
So let’s turn this asset into gold. With the infrastructure to capture and process. Cloud capability that is efficient, affordable, on-demand. Let’s tackle the obstacles, from standards and certification, trust and security, to ownership and copyright. With the right skills, so our workforce can seize this opportunity. With new partnerships, getting all the right players together. And investing in research and innovation. Over the next two years, we are putting 90 million euros on the table for big data and 125 million for the cloud.
I want to respond to this economic imperative. And I want to respond to the call of the European Council – looking at all the aspects relevant to tomorrow’s digital economy.
You can help us build this future. All of you. Helping to bring about the digital data-driven economy of the future. Expanding and deepening the ecosystem around data. New players, new intermediaries, new solutions, new jobs, new growth….”
Personal Data for the Public Good
Final report of the Health Data Exploration project, “New Opportunities to Enrich Understanding of Individual and Population Health”: “Individuals are tracking a variety of health-related data via a growing number of wearable devices and smartphone apps. More and more data relevant to health are also being captured passively as people communicate with one another on social networks, shop, work, or do any number of activities that leave “digital footprints.”
Almost all of these forms of “personal health data” (PHD) are outside the mainstream of traditional health care, public health or health research. Medical, behavioral, social and public health research still largely relies on traditional sources of health data such as those collected in clinical trials, sifting through electronic medical records, or conducting periodic surveys.
Self-tracking data can provide better measures of everyday behavior and lifestyle and can fill in gaps in more traditional clinical data collection, giving us a more complete picture of health. With support from the Robert Wood Johnson Foundation, the Health Data Exploration (HDE) project conducted a study to better understand the barriers to using personal health data in research from the individuals who track data about their own personal health, the companies that market self-tracking devices, apps or services and aggregate and manage that data, and the researchers who might use the data as part of their research.
Perspectives
Through a series of interviews and surveys, we discovered strong interest in contributing and using PHD for research. It should be noted that, because our goal was to access individuals and researchers who are already generating or using digital self-tracking data, there was some bias in our survey findings—participants tended to have more education and higher household incomes than the general population. Our survey also drew slightly more white and Asian participants and more female participants than in the general population.
Individuals were very willing to share their self-tracking data for research, in particular if they knew the data would advance knowledge in fields related to PHD such as public health, health care, computer science and social and behavioral science. Most expressed an explicit desire to have their information shared anonymously, and we discovered a wide range of thoughts and concerns regarding privacy.
Equally, researchers were generally enthusiastic about the potential for using self-tracking data in their research. Researchers see value in these kinds of data and think these data can answer important research questions. Many consider it to be of equal quality and importance to data from existing high quality clinical or public health data sources.
Companies operating in this space noted that advancing research was a worthy goal but not their primary business concern. Many companies expressed interest in research conducted outside of their company that would validate the utility of their device or application but noted the critical importance of maintaining their customer relationships. A number were open to data sharing with academics but noted the slow pace and administrative burden of working with universities as a challenge.
In addition to this considerable enthusiasm, it seems a new PHD research ecosystem may well be emerging. Forty-six percent of the researchers who participated in the study have already used self-tracking data in their research, and 23 percent of the researchers have already collaborated with application, device, or social media companies.
The Personal Health Data Research Ecosystem
A great deal of experimentation with PHD is taking place. Some individuals are experimenting with personal data stores or sharing their data directly with researchers in a small set of clinical experiments. Some researchers have secured one-off access to unique data sets for analysis. A small number of companies, primarily those with more of a health research focus, are working with others to develop data commons to regularize data sharing with the public and researchers.
SmallStepsLab serves as an intermediary between Fitbit, a data-rich company, and academic researchers via a “preferred status” API held by the company. Researchers pay SmallStepsLab for this access as well as other enhancements that they might want.
These promising early examples foreshadow a much larger set of activities with the potential to transform how research is conducted in medicine, public health and the social and behavioral sciences.
Opportunities and Obstacles
There is still work to be done to enhance the potential to generate knowledge out of personal health data:
- Privacy and Data Ownership: Among individuals surveyed, the dominant condition (57%) for making their PHD available for research was an assurance of privacy for their data, and over 90% of respondents said that it was important that the data be anonymous. Further, while some didn’t care who owned the data they generate, a clear majority wanted to own or at least share ownership of the data with the company that collected it.
- Informed Consent: Researchers are concerned about the privacy of PHD as well as respecting the rights of those who provide it. For most of our researchers, this came down to a straightforward question of whether there is informed consent. Our research found that current methods of informed consent are challenged by the ways PHD are being used and reused in research. A variety of new approaches to informed consent are being evaluated and this area is ripe for guidance to assure optimal outcomes for all stakeholders.
- Data Sharing and Access: Among individuals, there is growing interest in, as well as willingness and opportunity to, share personal health data with others. People now share these data with others with similar medical conditions in online groups like PatientsLikeMe or Crohnology, with the intention to learn as much as possible about mutual health concerns. Looking across our data, we find that individuals’ willingness to share is dependent on what data is shared, how the data will be used, who will have access to the data and when, what regulations and legal protections are in place, and the level of compensation or benefit (both personal and public).
- Data Quality: Researchers highlighted concerns about the validity of PHD and lack of standardization of devices. While some of this may be addressed as the consumer health device, apps and services market matures, reaching the optimal outcome for researchers might benefit from strategic engagement of important stakeholder groups.
We are reaching a tipping point. More and more people are tracking their health, and there is a growing number of tracking apps and devices on the market with many more in development. There is overwhelming enthusiasm from individuals and researchers to use this data to better understand health. To maximize personal data for the public good, we must develop creative solutions that allow individual rights to be respected while providing access to high-quality and relevant PHD for research, that balance open science with intellectual property, and that enable productive and mutually beneficial collaborations between the private sector and the academic research community.”
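The SmallStepsLab example above is, in effect, programmatic access to self-tracked data over a vendor’s REST API. SmallStepsLab’s “preferred status” interface is not public, so the sketch below follows the pattern of Fitbit’s public Web API time-series endpoint; treat the exact endpoint and response keys as assumptions to verify against current documentation.

```python
# Sketch of pulling self-tracked step counts from a wearable vendor's REST API.
# Follows the pattern of Fitbit's public time-series endpoint; verify the exact
# endpoint and response keys against current API documentation.
import requests

ACCESS_TOKEN = "..."  # OAuth 2.0 token obtained with the participant's consent

resp = requests.get(
    "https://api.fitbit.com/1/user/-/activities/steps/date/today/30d.json",
    headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()

for day in resp.json().get("activities-steps", []):
    print(day["dateTime"], day["value"])  # one daily step total per line
```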
The Parable of Google Flu: Traps in Big Data Analysis
David Lazer: “…big data last winter had its “Dewey beats Truman” moment, when the poster child of big data (at least for behavioral data), Google Flu Trends (GFT), went way off the rails in “nowcasting” the flu–overshooting the peak last winter by 130% (and indeed, it has been systematically overshooting by wide margins for 3 years). Tomorrow we (Ryan Kennedy, Alessandro Vespignani, and Gary King) have a paper out in Science dissecting why GFT went off the rails, how that could have been prevented, and the broader lessons to be learned regarding big data.
[We have posted The Parable of Google Flu (WP-Final).pdf, the version we submitted before acceptance. We have also posted an SSRN paper evaluating GFT for 2013-14, since it was reworked in the Fall.] Key lessons that I’d highlight:
1) Big data are typically not scientifically calibrated. This goes back to my post last month regarding measurement. This does not make them useless from a scientific point of view, but you do need to build into the analysis that the “measures” of behavior are being affected by unseen things. In this case, the likely culprit was the Google search algorithm, which was modified in various ways that we believe likely to have increased flu related searches.
2) Big data + analytic code used in scientific venues with scientific claims need to be more transparent. This is a tricky issue, because there are both legitimate proprietary interests involved and privacy concerns, but much more can be done in this regard than has been done in the 3 GFT papers. [One of my aspirations over the next year is to work together with big data companies, researchers, and privacy advocates to figure out how this can be done.]
3) It’s about the questions, not the size of the data. In this particular case, one could have done a better job stating the likely flu prevalence today by ignoring GFT altogether and just projecting 3-week-old CDC data to today (better still would have been to combine the two). That is, a synthesis would have been more effective than a pure “big data” approach. I think this is likely the general pattern.
4) More generally, I’d note that there is much more that the academy needs to do. First, the academy needs to build the foundation for collaborations around big data (e.g., secure infrastructures, legal understandings around data sharing, etc). Second, there needs to be MUCH more work done to build bridges between the computer scientists who work on big data and social scientists who think about deriving insights about human behavior from data more generally. We have moved perhaps 5% of the way that we need to in this regard.”
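Lazer’s third lesson is concrete enough to sketch: nowcast flu prevalence by simply carrying forward lagged CDC ILI data, and compare that with a model that blends the lag with GFT. The file and column names below (cdc_ili, gft) are assumptions, and this is not the paper’s actual analysis.

```python
# Minimal sketch of lesson 3: project lagged CDC ILI data forward and,
# optionally, blend it with GFT. File and column names are assumptions.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("flu.csv", parse_dates=["week"]).set_index("week")
df["cdc_lag3"] = df["cdc_ili"].shift(3)   # CDC data arrive with roughly a 2-3 week lag

# (a) "Ignore GFT": nowcast = 3-week-old CDC value carried forward.
naive = df["cdc_lag3"]

# (b) "Combine the two": regress current CDC ILI on its own lag plus GFT, fit on earlier seasons.
train = df.dropna().loc[:"2012-12-31"]
combo = smf.ols("cdc_ili ~ cdc_lag3 + gft", data=train).fit()
df["combined"] = combo.predict(df)

for name, pred in [("lagged CDC only", naive), ("lag + GFT", df["combined"])]:
    err = (pred - df["cdc_ili"]).abs().mean()
    print(f"{name}: mean absolute error = {err:.2f}")
```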
How Maps Drive Decisions at EPA

Those spreadsheets detailed places with large oil and gas production and other possible pollutants where EPA might want to focus its own inspection efforts or reach out to state-level enforcement agencies.
During the past two years, the agency has largely replaced those spreadsheets and tables with digital maps, which make it easier for participants to visualize precisely where the top polluting areas are and how those areas correspond to population centers, said Harvey Simon, EPA’s geospatial information officer. That, in turn, has made it easier for the agency to focus inspections and enforcement efforts where they will do the most good.
“Rather than verbally going through tables and spreadsheets you have a lot of people who are not [geographic information systems] practitioners who are able to share map information,” Simon said. “That’s allowed them to take a more targeted and data-driven approach to deciding what to do where.”
The change is a result of the EPA Geoplatform, a tool built off Esri’s ArcGIS Online product, which allows companies and government agencies to build custom Web maps using base maps provided by Esri mashed up with their own data.
When the EPA Geoplatform launched in May 2012 there were about 250 people registered to create and share mapping data within the agency. That number has grown to more than 1,000 during the past 20 months, Simon said.
“The whole idea of the platform effort is to democratize the use of geospatial information within the agency,” he said. “It’s relatively simple now to make a Web map and mash up data that’s useful for your work, so many users are creating Web maps themselves without any support from a consultant or from a GIS expert in their office.”
A governmentwide Geoplatform launched in 2012, spurred largely by agencies’ frustrations with the difficulty of sharing mapping data after the 2010 explosion of the Deepwater Horizon oil rig in the Gulf of Mexico. The platform’s goal was twofold. First, officials wanted to share mapping data more widely between agencies, both to avoid duplicating each other’s work and to make data easier to share during an emergency.
Second, the government wanted to simplify the process for viewing and creating Web maps so they could be used more easily by nonspecialists.
EPA’s geoplatform has essentially the same goals. The majority of the maps the agency builds using the platform aren’t publicly accessible so the EPA doesn’t have to worry about scrubbing maps of data that could reveal personal information about citizens or proprietary data about companies. It publishes some maps that don’t pose any privacy concerns on EPA websites as well as on the national geoplatform and to Data.gov, the government data repository.
Once ArcGIS Online is judged compliant with the Federal Information Security Management Act, or FISMA, which is expected this month, EPA will be able to share significantly more nonpublic maps through the national geoplatform and rely on more maps produced by other agencies, Simon said.
EPA’s geoplatform has also made it easier for the agency’s environmental justice office to share common data….”
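EPA’s Geoplatform is built on Esri’s ArcGIS Online, so the snippet below only mirrors the concept with an open-source stack: mash a hypothetical CSV of facility readings onto a hosted base map and save a shareable web map, with no GIS specialist required. The file and column names are invented for illustration.

```python
# Illustrative mash-up of agency point data on a hosted base map.
# Mirrors the Geoplatform concept with folium; the CSV and its columns
# (facility, lon, lat, emissions) are hypothetical.
import folium
import pandas as pd

readings = pd.read_csv("facility_emissions.csv")

m = folium.Map(location=[39.8, -98.6], zoom_start=4)  # continental U.S. base map
for _, row in readings.iterrows():
    folium.CircleMarker(
        location=[row["lat"], row["lon"]],
        radius=max(3, row["emissions"] / 1000),  # scale marker size by reported emissions
        popup=row["facility"],
        fill=True,
    ).add_to(m)

m.save("emissions_map.html")  # a web map anyone in the agency can open and share
```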
Can A Serious Game Improve Privacy Awareness on Facebook?
Emerging Technology From the arXiv: “Do you know who can see the items you’ve posted on Facebook? This, of course, depends on the privacy settings you’ve used for each picture, text or link that you’ve shared throughout your Facebook history.
You might be extremely careful in deciding who can see these things. But as time goes on, the number of items people share increases. And the number of contacts they share them with increases too. So it’s easy to lose track of who can see what.
What’s more, an item that you may have been happy to share three years ago when you were at university, you may not be quite so happy to share now that you are looking for employment.
So how best to increase people’s awareness of their privacy settings? Today, Alexandra Cetto and pals from the University of Regensburg in Germany, say they’ve developed a serious game called Friend Inspector that allows users to increase their privacy awareness on Facebook.
And they say that within five months of its launch, the game had been requested over 100,000 times.
In recent years, serious games have become an increasingly important learning medium through digital simulations and virtual environments. So Cetto and co set about developing a game that could increase people’s awareness of privacy on Facebook.
Designing serious games is something of a black art. At the very least, there needs to be motivation to play and some kind of feedback or score to beat. And at the same time, the game has to achieve some kind of learning objective, in this case an enhanced awareness of privacy.
Aimed at 16-25 year olds, the game these guys came up with is deceptively simple. When potential players land on the home page, they’re asked a simple question: “Do you know who can see your Facebook profile?” This is followed by the teaser: “Playfully discover who can see your shared items and get advice to improve your privacy.”
When players sign up, the game retrieves his or her contacts, shared items and their privacy settings from Facebook. It then presents the player with a pair of these shared items asking which is more personal….
Finally, the game assesses the player’s score and makes a set of personalised recommendations about how to improve privacy, such as how to create friend lists, how to share personal items in a targeted manner and how the term friendship on a social network site differs from friendship in the real world…. Try it at http://www.friend-inspector.org/.
Ref: arxiv.org/abs/1402.5878 : Friend Inspector: A Serious Game to Enhance Privacy Awareness in Social Networks”
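The article does not reproduce the game’s internals, but the core idea (turn pairwise “which is more personal?” answers into a sensitivity ranking and compare it with each item’s actual audience) can be sketched as follows; the data structures and scoring rule are assumptions, not the authors’ design.

```python
# Sketch of the Friend Inspector idea: rank shared items by perceived sensitivity
# from pairwise answers, then flag items whose audience looks too wide.
# Data structures and the scoring rule are assumptions, not the paper's design.
from collections import Counter

# Each item maps to the number of contacts who can currently see it (hypothetical values).
items = {"photo_party": 350, "status_job": 120, "checkin_home": 240}

# Player's answers: the winner of each "which is more personal?" pair.
pairwise_answers = [
    ("checkin_home", "photo_party"),
    ("checkin_home", "status_job"),
    ("photo_party", "status_job"),
]

wins = Counter(winner for winner, _ in pairwise_answers)
sensitivity_rank = sorted(items, key=lambda i: wins[i], reverse=True)  # most personal first

# Recommend restricting items rated highly personal yet visible to many contacts.
for rank, item in enumerate(sensitivity_rank, start=1):
    if rank <= max(1, len(items) // 2) and items[item] > 100:
        print(f"Consider restricting '{item}': rated highly personal but visible to {items[item]} contacts")
```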
Predicting Individual Behavior with Social Networks
Article by Sharad Goel and Daniel Goldstein (Microsoft Research): “With the availability of social network data, it has become possible to relate the behavior of individuals to that of their acquaintances on a large scale. Although the similarity of connected individuals is well established, it is unclear whether behavioral predictions based on social data are more accurate than those arising from current marketing practices. We employ a communications network of over 100 million people to forecast highly diverse behaviors, from patronizing an off-line department store to responding to advertising to joining a recreational league. Across all domains, we find that social data are informative in identifying individuals who are most likely to undertake various actions, and moreover, such data improve on both demographic and behavioral models. There are, however, limits to the utility of social data. In particular, when rich transactional data were available, social data did little to improve prediction.”
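The paper’s claim is about incremental predictive value, which can be illustrated (not reproduced) with a toy comparison: fit a baseline model on demographic and behavioral features, then add a social feature such as the count of contacts who already adopted, and compare AUC. The data file and column names are hypothetical.

```python
# Toy illustration of the comparison: does adding a social feature improve
# prediction over a demographic + behavioral baseline? Data are hypothetical.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

df = pd.read_csv("users.csv")  # age, income, past_purchases, adopting_contacts, adopted
baseline_cols = ["age", "income", "past_purchases"]
social_cols = baseline_cols + ["adopting_contacts"]  # number of contacts who already adopted

train, test = train_test_split(df, test_size=0.3, random_state=0)

for name, cols in [("baseline", baseline_cols), ("baseline + social", social_cols)]:
    clf = LogisticRegression(max_iter=1000).fit(train[cols], train["adopted"])
    auc = roc_auc_score(test["adopted"], clf.predict_proba(test[cols])[:, 1])
    print(f"{name}: AUC = {auc:.3f}")
```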
Big Data, Big New Businesses
Nigel Shadbolt and Michael Chui: “Many people have long believed that if government and the private sector agreed to share their data more freely, and allow it to be processed using the right analytics, previously unimaginable solutions to countless social, economic, and commercial problems would emerge. They may have no idea how right they are.
Even the most vocal proponents of open data appear to have underestimated how many profitable ideas and businesses stand to be created. More than 40 governments worldwide have committed to opening up their electronic data – including weather records, crime statistics, transport information, and much more – to businesses, consumers, and the general public. The McKinsey Global Institute estimates that the annual value of open data in education, transportation, consumer products, electricity, oil and gas, health care, and consumer finance could reach $3 trillion.
These benefits come in the form of new and better goods and services, as well as efficiency savings for businesses, consumers, and citizens. The range is vast. For example, drawing on data from various government agencies, the Climate Corporation (recently bought for $1 billion) has taken 30 years of weather data, 60 years of data on crop yields, and 14 terabytes of information on soil types to create customized insurance products.
Similarly, real-time traffic and transit information can be accessed on smartphone apps to inform users when the next bus is coming or how to avoid traffic congestion. And, by analyzing online comments about their products, manufacturers can identify which features consumers are most willing to pay for, and develop their business and investment strategies accordingly.
Opportunities are everywhere. A raft of open-data start-ups are now being incubated at the London-based Open Data Institute (ODI), which focuses on improving our understanding of corporate ownership, health-care delivery, energy, finance, transport, and many other areas of public interest.
Consumers are the main beneficiaries, especially in the household-goods market. Consumers making better-informed buying decisions across sectors could capture an estimated $1.1 trillion in value annually. Third-party data aggregators are already allowing customers to compare prices across online and brick-and-mortar shops. Many also permit customers to compare quality ratings, safety data (drawn, for example, from official injury reports), information about the provenance of food, and producers’ environmental and labor practices.
Consider the book industry. Bookstores once regarded their inventory as a trade secret. Customers, competitors, and even suppliers seldom knew what stock bookstores held. Nowadays, by contrast, bookstores not only report what stock they carry but also when customers’ orders will arrive. If they did not, they would be excluded from the product-aggregation sites that have come to determine so many buying decisions.
The health-care sector is a prime target for achieving new efficiencies. By sharing the treatment data of a large patient population, for example, care providers can better identify practices that could save $180 billion annually.
The Open Data Institute-backed start-up Mastodon C uses open data on doctors’ prescriptions to differentiate among expensive patent medicines and cheaper “off-patent” varieties; when applied to just one class of drug, that could save around $400 million in one year for the British National Health Service. Meanwhile, open data on acquired infections in British hospitals has led to the publication of hospital-performance tables, a major factor in the 85% drop in reported infections.
There are also opportunities to prevent lifestyle-related diseases and improve treatment by enabling patients to compare their own data with aggregated data on similar patients. This has been shown to motivate patients to improve their diet, exercise more often, and take their medicines regularly. Similarly, letting people compare their energy use with that of their peers could prompt them to save hundreds of billions of dollars in electricity costs each year, to say nothing of reducing carbon emissions.
Such benchmarking is even more valuable for businesses seeking to improve their operational efficiency. The oil and gas industry, for example, could save $450 billion annually by sharing anonymized and aggregated data on the management of upstream and downstream facilities.
Finally, the move toward open data serves a variety of socially desirable ends, ranging from the reuse of publicly funded research to support work on poverty, inclusion, or discrimination, to the disclosure by corporations such as Nike of their supply-chain data and environmental impact.
There are, of course, challenges arising from the proliferation and systematic use of open data. Companies fear for their intellectual property; ordinary citizens worry about how their private information might be used and abused. Last year, Telefónica, the world’s fifth-largest mobile-network provider, tried to allay such fears by launching a digital confidence program to reassure customers that innovations in transparency would be implemented responsibly and without compromising users’ personal information.
The sensitive handling of these issues will be essential if we are to reap the potential $3 trillion in value that usage of open data could deliver each year. Consumers, policymakers, and companies must work together, not just to agree on common standards of analysis, but also to set the ground rules for the protection of privacy and property.”