The Changing Nature of Privacy Practice


Numerous commenters have observed that Facebook, among many marketers (including political campaigns like U.S. President Barack Obama’s), regularly conducts A-B tests and other research to measure how consumers respond to different products, messages and messengers. So what makes the Facebook-Cornell study different from what goes on all the time in an increasingly data-driven world? After all, the ability to conduct such testing continuously on a large scale is considered one of the special features of big data.
The answer calls for broader judgments than parsing the language of privacy policies or managing compliance with privacy laws and regulations. Existing legal tools such as notice-and-choice and use limitations are simply too narrow to address the array of issues presented and inform the judgment needed. Deciding whether Facebook ought to participate in research like its newsfeed study is not really about what the company can do but what it should do.
As Omer Tene and Jules Polonetsky, CIPP/US, point out in an article on Facebook’s research study, “Increasingly, corporate officers find themselves struggling to decipher subtle social norms and make ethical choices that are more befitting of philosophers than business managers or lawyers.” They add, “Going forward, companies will need to create new processes, deploying a toolbox of innovative solutions to engender trust and mitigate normative friction.” Tene and Polonetsky themselves have proposed a number of such tools. In recent comments on Consumer Privacy Bill of Rights legislation filed with the Commerce Department, the Future of Privacy Forum (FPF) endorsed the use of internal review boards along the lines of those used in academia for human-subject research. The FPF also submitted an initial framework for benefit-risk analysis in the big data context “to understand whether assuming the risk is ethical, fair, legitimate and cost-effective.” Increasingly, companies and other institutions are bringing to bear more holistic review of privacy issues. Conferences and panels on big data research ethics are proliferating.
The expanding variety and complexity of data uses also call for a broader public policy approach. The Obama administration’s Consumer Privacy Bill of Rights (of which I was an architect) adapted existing Fair Information Practice Principles to a principles-based approach that is intended not as a formalistic checklist but as a set of principles that work holistically in ways that are “flexible” and “dynamic.” In turn, much of the commentary submitted to the Commerce Department on the Consumer Privacy Bill of Rights addressed the question of the relationship between these principles and a “responsible use framework” as discussed in the White House Big Data Report….”

CrowdCriMa – a complete Next Generation Crowdsourced Crisis Management Platform


Pitch at ClimateCoLab: “The proposed #CrowdCriMa would be a disaster management platform based on an innovative, interactive and accountable Digital Governance Framework – in which common people, crisis response or disaster response workers, health workers, decision makers would participate actively. This application would be available for mobile phones and other smart devices.
Crowdsourcing Unheard Voices
The main function would be to help collect voice messages from disaster victims in the form of phone calls, recorded voice, SMS, e-mail and fax to seek urgent help from the authorities, and to spread those voices via online media, social media, SMS, etc. to inform the world about the situation. Because fax communication is still more powerful than SMS or e-mail in developing countries, we have also included fax as one of the reporting tools.
People will be able to record their observations and potential crises, seek help, and appeal for funds for disaster response work and different environment-related activities (e.g. projects for a pollution-free environment). To have all functions in the #CrowdCriMa platform, an IVR system and FrontlineSMS-type software will be developed / integrated in the proposed platform.
A cloud-based information management system would be used to sustain the flow of information. This would help not to lose any information if communications infrastructures are not functioning properly during and after the disaster.
Crowdfunding:
Another function of this #CrowdCriMa platform would be the crowdfunding function. When an individual donor logs in, they find the list of issues / crises where funds are needed. An innovative and sustainable approach would be taken to meet the financial need in crisis / disaster and post-crisis financial empowerment work for victims.
Some of these services are already available separately, but the innovative part of this proposal is that several services for dealing with disasters would be on one platform, so people do not need to use different platforms for disaster management work…”

Social Media and the ‘Spiral of Silence’


Report from the Pew Research Center: “A major insight into human behavior from pre-internet era studies of communication is the tendency of people not to speak up about policy issues in public—or among their family, friends, and work colleagues—when they believe their own point of view is not widely shared. This tendency is called the “spiral of silence.”
Some social media creators and supporters have hoped that social media platforms like Facebook and Twitter might produce different enough discussion venues that those with minority views might feel freer to express their opinions, thus broadening public discourse and adding new perspectives to everyday discussion of political issues.
We set out to study this by conducting a survey of 1,801 adults. It focused on one important public issue: Edward Snowden’s 2013 revelations of widespread government surveillance of Americans’ phone and email records. We selected this issue because other surveys by the Pew Research Center at the time we were fielding this poll showed that Americans were divided over whether the NSA contractor’s leaks about surveillance were justified and whether the surveillance policy itself was a good or bad idea. For instance, Pew Research found in one survey that 44% said the release of classified information harms the public interest while 49% said it serves the public interest.
The survey reported in this report sought people’s opinions about the Snowden leaks, their willingness to talk about the revelations in various in-person and online settings, and their perceptions of the views of those around them in a variety of online and off-line contexts.
This survey’s findings produced several major insights:

Google's fact-checking bots build vast knowledge bank


Hal Hodson in the New Scientist: “The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts

GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.

The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.

Knowledge Vault is a type of “knowledge base” – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type “Where was Madonna born” into Google, for example, the place given is pulled from Google’s existing knowledge base.

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far. So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as “confident facts”, to which Google’s model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
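As a rough illustration of the thresholding described above, the sketch below keeps only facts whose estimated probability of being true exceeds 90 per cent, and works out what share of the 1.6 billion facts the 271 million "confident facts" represent. It is not Google's pipeline: the `Fact` record, the `confident_facts` helper and the confidence scores on the sample facts are all hypothetical, invented for illustration.

```python
from dataclasses import dataclass

# Threshold taken from the article: "more than 90 per cent chance of being true".
CONFIDENCE_THRESHOLD = 0.9


@dataclass(frozen=True)
class Fact:
    """Hypothetical record for one extracted fact (not Google's schema)."""
    subject: str
    predicate: str
    obj: str
    confidence: float  # assumed: model's estimated probability the fact is true


def confident_facts(facts, threshold=CONFIDENCE_THRESHOLD):
    """Return the facts whose confidence is strictly above the threshold."""
    return [f for f in facts if f.confidence > threshold]


# Sample facts with made-up confidence scores.
facts = [
    Fact("Madonna", "born_in", "Bay City, Michigan", 0.97),
    Fact("Madonna", "born_in", "Detroit", 0.40),
    Fact("Example Person", "born_in", "Example Town", 0.90),  # not strictly above 0.9
]

kept = confident_facts(facts)

# Share of confident facts, using the two figures quoted in the article.
share = 271_000_000 / 1_600_000_000

print(len(kept), f"{share:.1%}")
```

On the article's figures, roughly one fact in six clears the bar. The strict comparison follows the article's "more than 90 per cent" wording, so a fact scored at exactly 0.9 is left out.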

“It’s a hugely impressive thing that they are pulling off,” says Fabian Suchanek, a data scientist at Télécom ParisTech in France.

Google’s Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.

Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it’s only going to get bigger. As well as the ability to analyse text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages, for example.

Tom Austin, a technology analyst at Gartner in Boston, says that the world’s biggest technology companies are racing to build similar vaults. “Google, Microsoft, Facebook, Amazon and IBM are all building them, and they’re tackling these enormous problems that we would never even have thought of trying 10 years ago,” he says.

The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin…”

Twitter Analytics Project HealthMap Outperforming WHO in Ebola Tracking


HIS Talk: “HealthMap, a collaborative data analytics project launched in 2006 between Harvard Medical School and Boston Children’s Hospital, has been quietly tracking the recent Ebola outbreak in Western Africa with notable accuracy, beating the World Health Organization’s own tracking efforts by two weeks in some instances.
HealthMap aggregates information from a variety of online sources to plot real-time disease outbreaks. Currently, the platform analyzes data from the World Health Organization, Google News, and GeoSentinel, a global disease tracking platform that tracks major geography changes in diseases carried through travelers, foreign visitors, and immigrants. The analytics project also got a new source of feeder-data this February when Twitter announced that the HealthMap project had been selected as a Twitter Data Grant recipient, which gives the 45 epidemiologists working on the project access to the “fire hose” of unfiltered data generated from Twitter’s 500 million daily tweets….”

Technology’s Crucial Role in the Fight Against Hunger


Crowdsourcing, predictive analytics and other new tools could go far toward finding innovative solutions for America’s food insecurity.

National Geographic recently sent three photographers to explore hunger in the United States. It was an effort to give a face to a very troubling statistic: Even today, one-sixth of Americans do not have enough food to eat. Fifty million people in this country are “food insecure” — having to make daily trade-offs among paying for food, housing or medical care — and 17 million of them skip at least one meal a day to get by. When choosing what to eat, many of these individuals must make choices between lesser quantities of higher-quality food and larger quantities of less-nutritious processed foods, the consumption of which often leads to expensive health problems down the road.
This is an extremely serious, but not easily visible, social problem. Nor does the challenge it poses become any easier when poorly designed public-assistance programs continue to count the sauce on a pizza as a vegetable. The deficiencies caused by hunger increase the likelihood that a child will drop out of school, lowering her lifetime earning potential. In 2010 alone, food insecurity cost America $167.5 billion, a figure that includes lost economic productivity, avoidable health-care expenses and social-services programs.
As much as we need specific policy innovations, if we are to eliminate hunger in America, food insecurity is just one of many extraordinarily complex and interdependent “systemic” problems facing us that would benefit from the application of technology, not just to identify innovative solutions but to implement them as well. In addition to laudable policy initiatives by such states as Illinois and Nevada, which have made hunger a priority, or Arkansas, which suffers the greatest level of food insecurity but which is making great strides at providing breakfast to schoolchildren, we can — we must — bring technology to bear to create a sustained conversation between government and citizens to engage more Americans in the fight against hunger.

Identifying who is genuinely in need cannot be done as well by a centralized government bureaucracy — even one with regional offices — as it can through a distributed network of individuals and organizations able to pinpoint with on-the-ground accuracy where the demand is greatest. Just as Ushahidi uses crowdsourcing to help locate and identify disaster victims, it should be possible to leverage the crowd to spot victims of hunger. As it stands, attempts to eradicate so-called food deserts are often built around developing solutions for residents rather than with residents. Strategies to date tend to focus on the introduction of new grocery stores or farmers’ markets but with little input from or involvement of the citizens actually affected.

Applying predictive analytics to newly available sources of public as well as private data, such as that regularly gathered by supermarkets and other vendors, could also make it easier to offer coupons and discounts to those most in need. In addition, analyzing nonprofits’ tax returns, which are legally open and available to all, could help map where the organizations serving those in need leave gaps that need to be closed by other efforts. The Governance Lab recently brought together U.S. Department of Agriculture officials with companies that use USDA data in an effort to focus on strategies supporting a White House initiative to use climate-change and other open data to improve food production.

Such innovative uses of technology, which put citizens at the center of the service-delivery process and streamline the delivery of government support, could also speed the delivery of benefits, thus reducing both costs and, every bit as important, the indignity of applying for assistance.

Being open to new and creative ideas from outside government through brainstorming and crowdsourcing exercises using social media can go beyond simply improving the quality of the services delivered. Some of these ideas, such as those arising from exciting new social-science experiments involving the use of incentives for “nudging” people to change their behaviors, might even lead them to purchase more healthful food.

Further, new kinds of public-private collaborative partnerships could create the means for people to produce their own food. Both new kinds of financing arrangements and new apps for managing the shared use of common real estate could make more community gardens possible. Similarly, with the kind of attention, convening and funding that government can bring to an issue, new neighbor-helping-neighbor programs — where, for example, people take turns shopping and cooking for one another to alleviate time away from work — could be scaled up.

Then, too, advances in citizen engagement and oversight could make it more difficult for lawmakers to cave to the pressures of lobbying groups that push for subsidies for those crops, such as white potatoes and corn, that result in our current large-scale reliance on less-nutritious foods. At the same time, citizen scientists reporting data through an app would be able to do a much better job than government inspectors in reporting what is and is not working in local communities.

As a society, we may not yet be able to banish hunger entirely. But if we commit to using new technologies and mechanisms of citizen engagement widely and wisely, we could vastly reduce its power to do harm.

Reddit, Imgur and Twitch team up as 'Derp' for social data research


In The Guardian: “Academic researchers will be granted unprecedented access to the data of major social networks including Imgur, Reddit, and Twitch as part of a joint initiative: The Digital Ecologies Research Partnership (Derp).
Derp – and yes, that really is its name – will be offering data to universities including Harvard, MIT and McGill, to promote “open, publicly accessible, and ethical academic inquiry into the vibrant social dynamics of the web”.
It came about “as a result of Imgur talking with a number of other community platforms online trying to learn about how they work with academic researchers,” says Tim Hwang, the image-sharing site’s head of special initiatives.
“In most cases, the data provided through Derp will already be accessible through public APIs,” he says. “Our belief is that there are ways of doing research better, and in a way that strongly respects user privacy and responsible use of data.
“Derp is an alliance of platforms that all believe strongly in this. In working with academic researchers, we support projects that meet institutional review at their home institution, and all research supported by Derp will be released openly and made publicly available.”
Hwang points to a Stanford paper analysing the success of Reddit’s Random Acts of Pizza subforum as an example of the sort of research Derp hopes to foster. In the research, Tim Althoff, Niloufar Salehi and Tuan Nguyen found that the likelihood of getting a free pizza from the Reddit community depended on a number of factors, including how the request was phrased, how much the user posted on the site, and how many friends they had online. In the end, they were able to predict with 67% accuracy whether or not a given request would be fulfilled.
The grouping aims to solve two problems academic research faces. Researchers themselves find it hard to get data outside of the largest social media platforms, such as Twitter and Facebook. The major services at least have a vibrant community of developers and researchers working on ways to access and use data, but for smaller communities, there’s little help provided.
Yet smaller is relative: Reddit may be a shrimp compared to Facebook, but with 115 million unique visitors every month, it’s still a sizeable community. And so Derp aims to offer “a single point of contact for researchers to get in touch with relevant team members across a range of different community sites….”

As Data Overflows Online, Researchers Grapple With Ethics


At The New York Times: “Scholars are exhilarated by the prospect of tapping into the vast troves of personal data collected by Facebook, Google, Amazon and a host of start-ups, which they say could transform social science research.

Once forced to conduct painstaking personal interviews with subjects, scientists can now sit at a screen and instantly play with the digital experiences of millions of Internet users. It is the frontier of social science — experiments on people who may never even know they are subjects of study, let alone explicitly consent.

“This is a new era,” said Jeffrey T. Hancock, a Cornell University professor of communication and information science. “I liken it a little bit to when chemistry got the microscope.”

But the new era has brought some controversy with it. Professor Hancock was a co-author of the Facebook study in which the social network quietly manipulated the news feeds of nearly 700,000 people to learn how the changes affected their emotions. When the research was published in June, the outrage was immediate…

Such testing raises fundamental questions. What types of experiments are so intrusive that they need prior consent or prompt disclosure after the fact? How do companies make sure that customers have a clear understanding of how their personal information might be used? Who even decides what the rules should be?

Existing federal rules governing research on human subjects, intended for medical research, generally require consent from those studied unless the potential for harm is minimal. But many social science scholars say the federal rules never contemplated large-scale research on Internet users and provide inadequate guidance for it.

For Internet projects conducted by university researchers, institutional review boards can be helpful in vetting projects. However, corporate researchers like those at Facebook don’t face such formal reviews.

Sinan Aral, a professor at the Massachusetts Institute of Technology’s Sloan School of Management who has conducted large-scale social experiments with several tech companies, said any new rules must be carefully formulated.

“We need to understand how to think about these rules without chilling the research that has the promise of moving us miles and miles ahead of where we are today in understanding human populations,” he said. Professor Aral is planning a panel discussion on ethics at an M.I.T. conference on digital experimentation in October. (The professor also does some data analysis for The New York Times Company.)

Mary L. Gray, a senior researcher at Microsoft Research and associate professor at Indiana University’s Media School, who has worked extensively on ethics in social science, said that too often, researchers conducting digital experiments work in isolation with little outside guidance.

She and others at Microsoft Research spent the last two years setting up an ethics advisory committee and training program for researchers in the company’s labs who are working with human subjects. She is now working with Professor Hancock to bring such thinking to the broader research world.

“If everyone knew the right thing to do, we would never have anyone hurt,” she said. “We really don’t have a place where we can have these conversations.”…

Reality Mining: Using Big Data to Engineer a Better World


New book by Nathan Eagle and Kate Greene: “Big Data is made up of lots of little data: numbers entered into cell phones, addresses entered into GPS devices, visits to websites, online purchases, ATM transactions, and any other activity that leaves a digital trail. Although the abuse of Big Data—surveillance, spying, hacking—has made headlines, it shouldn’t overshadow the abundant positive applications of Big Data. In Reality Mining, Nathan Eagle and Kate Greene cut through the hype and the headlines to explore the positive potential of Big Data, showing the ways in which the analysis of Big Data (“Reality Mining”) can be used to improve human systems as varied as political polling and disease tracking, while considering user privacy.

Eagle, a recognized expert in the field, and Greene, an experienced technology journalist, describe Reality Mining at five different levels: the individual, the neighborhood and organization, the city, the nation, and the world. For each level, they first offer a nontechnical explanation of data collection methods and then describe applications and systems that have been or could be built. These include a mobile app that helps smokers quit smoking; a workplace “knowledge system”; the use of GPS, Wi-Fi, and mobile phone data to manage and predict traffic flows; and the analysis of social media to track the spread of disease. Eagle and Greene argue that Big Data, used respectfully and responsibly, can help people live better, healthier, and happier lives.”

Monitoring Arms Control Compliance With Web Intelligence


Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
For H7N9 we found that open source social media were the first to report the outbreak and give ongoing updates. The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting. For the Shanghai pig die-off the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the pig die-off as had been originally speculated. Open source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….”