Learning Privacy Expectations by Crowdsourcing Contextual Informational Norms


At Freedom to Tinker: “The advent of social apps, smartphones and ubiquitous computing has brought a great transformation to our day-to-day life. The incredible pace at which new and disruptive services continue to emerge challenges our perception of privacy. To keep pace with this rapidly evolving cyber reality, we need to devise agile methods and frameworks for developing privacy-preserving systems that align with evolving users’ privacy expectations.

Previous efforts have tackled this under the assumption that privacy norms are provided by existing sources such as laws, privacy regulations and legal precedents. They have focused on formally expressing privacy norms and devising a corresponding logic to enable automatic inconsistency checks and efficient enforcement.

However, because many existing regulations and privacy handbooks were enacted well before the Internet revolution took place, they often lag behind and do not adequately reflect the information flows of modern systems. For example, the Family Educational Rights and Privacy Act (FERPA) was enacted in 1974, long before Facebook, Google and many other online applications were used in an educational context. More recent legislation faces similar challenges, as novel services introduce new ways to exchange information and consequently shape new, previously unconsidered information flows that can change our collective perception of privacy.

Crowdsourcing Contextual Privacy Norms

Armed with the theory of Contextual Integrity (CI), our work explores ways to uncover societal norms by leveraging advances in crowdsourcing technology.

In our recent paper, we present a methodology that we believe can be used to extract a societal notion of privacy expectations. The results can be used to fine-tune existing privacy guidelines, as well as to gain a better perspective on users’ expectations of privacy.

CI defines privacy as a collection of norms (privacy rules) that reflect appropriate information flows between different actors. Norms capture who shares what, with whom, in what role, and under which conditions. For example, while you may be comfortable sharing your medical information with your doctor, you might be less inclined to do so with your colleagues.

We use CI as a proxy to reason about privacy in the digital world and as a gateway to understanding how people perceive privacy in a systematic way. Crowdsourcing is a great fit for this method: we can ask hundreds of people how they feel about a particular information flow, capture their input, and map it directly onto the CI parameters. We used a simple template to write Yes-or-No questions for our crowdsourcing participants:

“Is it acceptable for the [sender] to share the [subject’s] [attribute] with [recipient] [transmission principle]?”

For example:

“Is it acceptable for the student’s professor to share the student’s record of attendance with the department chair if the student is performing poorly?”

In our experiments, we used Amazon’s Mechanical Turk (AMT) to ask 450 turkers over 1,400 such questions. Each question represents a specific contextual information flow that participants can approve, disapprove or mark under the “Doesn’t Make Sense” category; the last category can be used when 1) the sender is unlikely to have the information, 2) the recipient would already have the information, or 3) the question is ambiguous….(More)”
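
Because every survey item is the same template instantiated with different CI parameters, question generation is easy to mechanize. The sketch below is our own illustration, not the authors’ code; all parameter values are invented examples in the spirit of the one above:

```python
# A sketch of the question-generation step: every combination of CI
# parameters yields one survey item. All parameter values are invented.
from itertools import product

TEMPLATE = ("Is it acceptable for the {sender} to share the {subject}'s "
            "{attribute} with the {recipient} {principle}?")

senders = ["student's professor"]
subjects = ["student"]
attributes = ["record of attendance", "exam grades"]
recipients = ["department chair", "student's parents"]
principles = ["if the student is performing poorly",
              "if the student has given consent"]

# Response options offered to each turker, as described above.
ANSWERS = ("Yes", "No", "Doesn't Make Sense")

for s, subj, attr, r, p in product(senders, subjects, attributes,
                                   recipients, principles):
    print(TEMPLATE.format(sender=s, subject=subj, attribute=attr,
                          recipient=r, principle=p))
print("Options:", " / ".join(ANSWERS))
```

Even this tiny grid of parameters yields eight distinct information flows, which is how a short template scales to over 1,400 questions.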

Tackling Corruption with People-Powered Data


Sandra Prüfer at Mastercard Center for Inclusive Growth: “Informal fees plague India’s “free” maternal health services. In Nigeria, village households don’t receive the clean cookstoves their government paid for. Around the world, corruption – coupled with the inability to find and share information about it – stymies development in low-income communities.

Now, digital transparency platforms – supplemented with features that illiterate and rural populations can use – make it possible for traditionally excluded groups to make their voices heard and to access the tools they need to grow.

Mapping Corruption Hot Spots in India

One of the problems surrounding access to information is the lack of reliable information in the first place. A popular method to create that knowledge is crowdsourcing: enlisting the public to monitor and report on certain issues.

The Mera Swasthya Meri Aawaz platform, which means “My Health, My Voice”, is an interactive map in Uttar Pradesh launched by the Indian non-profit organization SAHAYOG. It enables women to anonymously report, using their mobile phones, illicit fees charged for services at maternal health clinics.

To reduce infant mortality and deaths in childbirth, the Indian government provides free prenatal care and cash incentives to use maternal health clinics, but many clinics charge illegal fees anyway – cutting mothers off from lifesaving healthcare and inhibiting communities’ growth. An estimated 45,000 women in India died in 2015 from complications of pregnancy and childbirth – one of the highest rates of any country in the world; low-income women are disproportionately affected…. “Documenting illegal payment demands in real time and aggregating the data online increased governmental willingness to listen,” Sandhya says. “Because the data is linked to technology, its authenticity is not questioned.”

Following the Money in Nigeria

In Nigeria, Connected Development (CODE) also champions open data to combat corruption in infrastructure building, health and education projects. Its mission is to improve access to information and empower local communities to share data that can expose financial irregularities. Since 2012, the Abuja-based watchdog group has investigated twelve capital projects, successfully pressuring the government to release funds including $5.3 million to treat 1,500 lead-poisoned children.

“People activate us: if they know about any project that is supposed to be in their community, but isn’t, they tell us they want us to follow the money – and we’ll take it from there,” says CODE co-founder Oludotun Babayemi.

Users alert the watchdog group directly through its webpage, which publishes open-source data about development projects that are supposed to be happening, based on reports from freedom of information requests to Nigeria’s federal minister of environment, World Bank data and government press releases.

Last year, as part of their #WomenCookstoves reporting campaign, CODE revealed an apparent scam by tracking a $49.8 million government project that was supposed to purchase 750,000 clean cookstoves for rural women. Smoke inhalation diseases disproportionately affect women who spend time cooking over wood fires; according to the World Health Organization, almost 100,000 people die yearly in Nigeria from inhaling wood smoke, the country’s third biggest killer after malaria and AIDS.

“After three months, we found out that only 15 percent of the $48 million was given to the contractor – meaning there were only 45,000 cook stoves out of 750,000 in the country,” Babayemi says….(More)”

Civic Crowd Analytics: Making sense of crowdsourced civic input with big data tools


Paper: “… examines the impact of crowdsourcing on a policymaking process by using a novel data analytics tool called Civic CrowdAnalytics, applying Natural Language Processing (NLP) methods such as concept extraction, word association and sentiment analysis. By drawing on data from a crowdsourced urban planning process in the City of Palo Alto in California, we examine the influence of civic input on the city’s Comprehensive City Plan update. The findings show that the impact of citizens’ voices depends on the volume and the tone of their demands: higher demand with a stronger tone results in more policy changes. We also found an interesting and unexpected result: the city government in Palo Alto more or less mirrors the online crowd’s voice, while citizen representatives filter rather than mirror the crowd’s will. While NLP methods show promise in making the analysis of crowdsourced input more efficient, several issues remain: accuracy rates should be improved, and a considerable amount of human work is still needed to train the algorithm….(More)”
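
For readers unfamiliar with these NLP building blocks, the fragment below illustrates just one of them – sentiment analysis – using NLTK’s off-the-shelf VADER scorer rather than the Civic CrowdAnalytics tool itself; the civic comments are invented:

```python
# A hedged sketch, not the Civic CrowdAnalytics implementation: score the
# sentiment of crowdsourced comments and flag how forceful each demand is.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
sia = SentimentIntensityAnalyzer()

comments = [
    "We urgently need safer bike lanes on El Camino Real!",
    "The plan for more housing near transit is excellent.",
    "Parking downtown is a disaster and the city keeps ignoring it.",
]

for text in comments:
    compound = sia.polarity_scores(text)["compound"]  # -1 (negative) .. +1 (positive)
    tone = "strong" if abs(compound) > 0.5 else "mild"
    print(f"{tone:6} {compound:+.2f}  {text}")
```

Aggregating such scores by topic is one plausible way to operationalize the paper’s finding that volume and tone drive policy impact.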

Essays on collective intelligence


Thesis by Yiftach Nagar: “This dissertation consists of three essays that advance our understanding of collective intelligence: how it works, how it can be used, and how it can be augmented. I combine theoretical and empirical work, spanning qualitative inquiry, lab experiments, and design, exploring how novel ways of organizing, enabled by advancements in information technology, can help us work better, innovate, and solve complex problems.

The first essay offers a collective sensemaking model to explain structurational processes in online communities. I draw upon Weick’s model of sensemaking as committed interpretation, which I ground in a qualitative inquiry into Wikipedia’s policy discussion pages, in an attempt to explain how structuration emerges as interpretations are negotiated and then committed through conversation. I argue that the wiki environment provides conditions that help commitments form, strengthen and diffuse, and that this, in turn, helps explain trends of stabilization observed in previous research.

In the second essay, we characterize a class of semi-structured prediction problems in which patterns are difficult to discern, data are difficult to quantify, and changes occur unexpectedly. Making correct predictions under these conditions can be extremely difficult, and is often associated with high stakes. We argue that in these settings, combining predictions from humans and models can outperform predictions made by groups of people or computers alone. In laboratory experiments, we combined human and machine predictions and found the combined predictions more accurate and more robust than predictions made by groups of only people or only machines.
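
A toy numerical illustration of that combination idea follows. It is our sketch, not the thesis experiments: the forecasts are synthetic and the unweighted average is just one simple pooling rule.

```python
# Average human and machine probability forecasts, then compare Brier scores.
import numpy as np

rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=500)  # actual binary outcomes

def noisy_forecast():
    """Probability forecasts centered near the truth, with independent noise."""
    return np.clip(truth * 0.6 + 0.2 + rng.normal(0, 0.25, truth.size), 0, 1)

human, machine = noisy_forecast(), noisy_forecast()
combined = (human + machine) / 2  # pool the two forecasts

brier = lambda p: np.mean((p - truth) ** 2)  # lower is better
print(f"human={brier(human):.3f} machine={brier(machine):.3f} "
      f"combined={brier(combined):.3f}")
```

Because the two error sources are independent, averaging reduces the noise variance, so the combined forecast typically scores better than either alone.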

The third essay addresses a critical bottleneck in open-innovation systems: reviewing and selecting the best submissions in settings where submissions are complex intellectual artifacts whose evaluation requires expertise. To aid expert reviewers, we offer a computational approach we developed and tested using data from the Climate CoLab, a large citizen-science platform. Our models approximate expert decisions about the submissions with high accuracy, and their use can save review labor and accelerate the review process….(More)”
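
The general shape of such a review-triage model can be sketched as follows. This is a hedged illustration with invented features, labels and thresholds, not the Climate CoLab models themselves: learn from past expert decisions, then let the model pre-screen only the submissions it is confident about.

```python
# Train on past expert accept/reject decisions; auto-triage confident cases.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))  # e.g., length, novelty, team size, references
logits = X @ np.array([1.0, 0.8, 0.2, 0.5]) + rng.normal(0, 1, 500)
y = (logits > 0).astype(int)   # 1 = past expert decision was "advance"

clf = LogisticRegression().fit(X[:400], y[:400])  # train on past rounds
proba = clf.predict_proba(X[400:])[:, 1]          # score new submissions

confident = (proba < 0.1) | (proba > 0.9)         # auto-triage these only
print(f"pre-screened automatically: {confident.mean():.0%}; "
      f"sent to expert reviewers: {(~confident).mean():.0%}")
```

The review labor saved is roughly the fraction of submissions the model can decide at high confidence, which is the efficiency claim the essay makes.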

Kenyans have launched a campaign on Twitter to fix their roads


Lily Kuo in Quartz: “Traffic is a problem in Nairobi. A short commute can last for hours during morning or evening rush hour. Buses and motorbikes cut in and out of traffic, worsening congestion. It’s estimated that road congestion costs Kenya’s capital as much as $570,000 a day in lost productivity.

One of the reasons for the city’s bad traffic is the state of the roads: drivers swerve to avoid potholes, bumps, or breaks in the roads, causing a buildup of traffic. To help, an online campaign called “What is a Road” is crowdsourcing the location and condition of potholes around the city in an effort to push local officials to fix them.

Nairobians tweet a photo and the location of a pothole under the hashtag #whatisaroad. Those reports are uploaded to a map and used to analyze where the city’s potholes are located and to track which ones have been fixed. “We decided to take a more data-driven approach to track progress, promises made and projects delivered,” says Muthuri Kinyamu, one of the organizers.
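
A minimal sketch of how such reports could be turned into a map layer is shown below; the data shapes and example reports are our assumptions, not the campaign’s actual pipeline. Emitting GeoJSON lets any standard web map render the points:

```python
# Turn parsed #whatisaroad reports into a GeoJSON layer for a web map.
import json

reports = [  # hypothetical parsed tweets
    {"lat": -1.2921, "lon": 36.8219,
     "photo": "https://example.com/pothole1.jpg", "fixed": False},
    {"lat": -1.3005, "lon": 36.7810,
     "photo": "https://example.com/pothole2.jpg", "fixed": True},
]

features = [{
    "type": "Feature",
    "geometry": {"type": "Point",
                 "coordinates": [r["lon"], r["lat"]]},  # GeoJSON order: lon, lat
    "properties": {"photo": r["photo"], "fixed": r["fixed"]},
} for r in reports]

print(json.dumps({"type": "FeatureCollection", "features": features}, indent=2))
```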

A map showing crowdsourced reports of potholes across Nairobi. (What Is a Road)

The campaign is also about addressing some of the fundamental problems that hold cities like Nairobi back. In Nairobi, branded the center of “Silicon Savannah” in recent years, there’s often more focus on entrepreneurship and innovation than on resolving simpler problems like the state of the roads….

The campaign, started in August, will continue until January. Chris Orwa, a data analyst helping with the project, says that they can’t take credit for all the repairs that have been documented around the city, but they have noticed that roads are being fixed within days of a #whatisaroad report. The average response time for fixing a road reported by a What is a Road user is three days, according to Orwa….(More)”

Crowdsourcing campaign rectifies translation errors


Springwise: “A few months ago, Seoul City launched a month-long campaign during September and October asking people to help correct poorly translated street signs. For example, the sign pictured below has incorrectly abbreviated “Bridge,” which should be corrected to “Brg.” Those who find mistakes can submit them via email, including a picture of the sign and location details. The initiative targets signs in English, Chinese and Japanese in public places such as subway stations, bus stops and tourist information sites. Seoul City is offering prizes to those who successfully spot mistakes; top spotters receive a reward of KRW 200,000 (around USD 180).

[Image: street sign showing the incorrect “Bridge” abbreviation]

The scheme comes as part of a drive to improve the experience of tourists travelling to the South Korean capital. According to a Seoul city official, “Multilingual signs are important standards to assess a country’s competitiveness in the tourism business. We want to make sure that foreigners in Seoul suffer no inconvenience.”…(More)”

Crowdsourcing and cellphone data could help guide urban revitalization


Science Magazine: “For years, researchers at the MIT Media Lab have been developing a database of images captured at regular distances around several major cities. The images are scored according to different visual characteristics – how safe the depicted areas look, how affluent, how lively, and the like…. Adjusted for factors such as population density and distance from city centers, the correlation between perceived safety and visitation rates was strong, and it was particularly strong for women and people over 50. For people under 30, however, the correlation was negative: males in their 20s were actually more likely to visit neighborhoods generally perceived to be unsafe than neighborhoods perceived to be safe.
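
The headline statistic here is a correlation between two neighborhood-level series. As a purely illustrative calculation (synthetic numbers, not the study’s data), the measurement looks like this:

```python
# Correlate crowdsourced safety scores with cellphone-derived visitation.
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(2)
safety = rng.uniform(0, 10, 80)               # perceived-safety score per area
visits = 2.0 * safety + rng.normal(0, 3, 80)  # visitation, correlated by design

r, p = pearsonr(safety, visits)
print(f"Pearson r = {r:.2f} (p = {p:.2g})")
```

The study’s adjustment for population density and distance from city centers corresponds to controlling for those covariates before correlating, a step this toy version omits.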

In the same paper, the researchers also identified several visual features that are highly correlated with judgments that a particular area is safe or unsafe. Consequently, the work could help guide city planners in decisions about how to revitalize declining neighborhoods….

Jacobs’ theory, Hidalgo says, is that neighborhoods in which residents can continuously keep track of street activity tend to be safer; a corollary is that buildings with street-facing windows tend to create a sense of safety, since they imply the possibility of surveillance. Newman’s theory is an elaboration on Jacobs’, suggesting that architectural features that demarcate public and private spaces, such as flights of stairs leading up to apartment entryways or archways separating plazas from the surrounding streets, foster the sense that crossing a threshold will bring on closer scrutiny….(More)”

The openness buzz in the knowledge economy: Towards taxonomy


Paper by Anne Lundgren in “Environment and Planning C: Government and Policy”: “In the networked information- and knowledge-based economy and society, the notions of ‘open’ and ‘openness’ are used in a variety of contexts: open source, open access, open economy, open government, open innovation – just to name a few. This paper aims to discuss openness and to develop a taxonomy that may be used to analyse the concept of openness. Are there different qualities of openness? How are these qualities interrelated? What analytical tools may be used to understand openness? In this paper four qualities of openness recurrent in the literature and debate are explored: accessibility, transparency, participation and sharing. To analyse openness further, new institutional theory as interpreted by Williamson (2000) is used, encompassing four different institutional levels: cultural embeddedness, institutional environment, governance structure and resource allocation. At which institutional levels is openness supported and/or constrained? Accessibility as a quality of openness seems to have a particularly strong relation to the other qualities, whereas sharing and collaborative economics seem to be the most complex and contested qualities of openness in the knowledge-based economy. This research contributes to academia, policy and governance, as handling the challenges of openness vs. closure in different contexts – territorial, institutional and/or organizational – demands not only a better understanding of the concept, but also tools for its analysis….(More)”

Crowdsourcing Gun Violence Research


Penn Engineering: “Gun violence is often described as an epidemic, but as visible and shocking as shooting incidents are, epidemiologists who study that particular source of mortality have a hard time tracking them. The Centers for Disease Control is prohibited by federal law from conducting gun violence research, so there is little in the way of centralized infrastructure to monitor where, how, when, why and to whom shootings occur.

Chris Callison-Burch, Aravind K. Joshi Term Assistant Professor in Computer and Information Science, and graduate student Ellie Pavlick are working to solve this problem.

They have developed the Gun Violence Database, which combines machine learning and crowdsourcing techniques to produce a national registry of shooting incidents. Callison-Burch and Pavlick’s algorithm scans thousands of articles from local newspaper and television stations, determines which are about gun violence, then asks everyday people to pull out vital statistics from those articles, compiling that information into a unified, open database.

For natural language processing experts like Callison-Burch and Pavlick, the most exciting prospect of this effort is that it is training computer systems to do this kind of analysis automatically. They recently presented their work on that front at Bloomberg’s Data for Good Exchange conference.

The Gun Violence Database project started in 2014, when it became the centerpiece of Callison-Burch’s “Crowdsourcing and Human Computation” class. There, Pavlick developed a series of homework assignments that challenged undergraduates to develop a classifier that could tell whether a given news article was about a shooting incident.
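
A minimal sketch of that kind of classifier (our illustration, not the students’ code; the tiny training set is invented) pairs TF-IDF features with logistic regression:

```python
# Classify whether a news article reports a shooting incident.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Two wounded in overnight shooting on Main Street",
    "Police investigate gunfire outside a convenience store",
    "City council approves new park budget",
    "Local bakery wins statewide pastry award",
]
train_labels = [1, 1, 0, 0]  # 1 = article is about gun violence

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(train_texts, train_labels)

print(clf.predict(["One injured in shooting near downtown bar"]))  # likely [1]
```

In practice the course corpus was far larger, and the classifier’s positive hits are exactly the articles that get routed to crowdworkers for annotation.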

“It allowed us to teach the things we want students to learn about data science and natural language processing, while giving them the motivation to do a project that could contribute to the greater good,” says Callison-Burch.

The articles students used to train their classifiers were sourced from “The Gun Report,” a daily blog from New York Times reporters that attempted to catalog shootings from around the country in the wake of the Sandy Hook massacre. Realizing that their algorithmic approach could be scaled up to automate what the Times’ reporters were attempting, the researchers began exploring how such a database could work. They consulted with Douglas Wiebe, an Associate Professor of Epidemiology in Biostatistics and Epidemiology in the Perelman School of Medicine, to learn more about what kind of information public health researchers needed to better study gun violence on a societal scale.

From there, the researchers enlisted people to annotate the articles their classifier found, connecting with them through Mechanical Turk, Amazon’s crowdsourcing platform, and their own website, http://gun-violence.org/…(More)”

Innovating for Better Management: The Contribution of Public Innovation Labs (Innovando para una mejor gestión: La contribución de los laboratorios de innovación pública)


Paper by Acevedo, Sebastián; and Dassen, Nicolás for the IDB: “The technological, economic and social changes of recent years demand governments capable of adapting to new challenges and to the growing demands of citizens. In many countries, and at different levels of government, this has led to the creation of innovation labs: units whose goal is to promote innovation in the public sector in a variety of ways. This paper analyzes the roles and challenges of Latin American labs, contrasting them with the good practices and characteristics that the literature has associated with higher levels of innovation in the public sector and in other organizations.

Drawing on a survey of lab directors and two case studies, it describes the landscape of Latin American labs and discusses the challenges they face in: i) working on core management issues, ii) getting innovations adopted and scaled, and iii) ensuring their sustainability.

In particular, four factors are key to labs’ performance in these respects: two political-institutional factors – leadership support and policy networks – and two methodological factors – the technical fit of the innovations and the construction of a shared meaning around them.

In addition, two main differences are identified between most of the labs surveyed here and the experience of other regions described in the existing literature: a stronger focus on open-government issues, and fewer activities for the controlled testing of innovations, such as randomized experiments and impact evaluations. Finally, conclusions and recommendations are presented for consolidating labs as effective channels for managing innovation, handling the inherent risks, and modernizing public management… (More – in Spanish)