Learning Privacy Expectations by Crowdsourcing Contextual Informational Norms


 at Freedom to Tinker: “The advent of social apps, smartphones and ubiquitous computing has brought a great transformation to our day-to-day life. The incredible pace at which new and disruptive services continue to emerge challenges our perception of privacy. To keep pace with this rapidly evolving cyber reality, we need to devise agile methods and frameworks for developing privacy-preserving systems that align with evolving users’ privacy expectations.

Previous efforts have tackled this under the assumption that privacy norms are provided by existing sources such as law, privacy regulations and legal precedents. They have focused on formally expressing privacy norms and devising a corresponding logic to enable automatic inconsistency checks and efficient enforcement of the norms.

However, because many of the existing regulations and privacy handbooks were enacted well before the Internet revolution took place, they often lag behind and do not adequately reflect the information flows of modern systems. For example, the Family Educational Rights and Privacy Act (FERPA) was enacted in 1974, long before Facebook, Google and many other online applications were used in an educational context. More recent legislation faces similar challenges as novel services introduce new ways to exchange information, and consequently shape new, previously unconsidered information flows that can change our collective perception of privacy.

Crowdsourcing Contextual Privacy Norms

Armed with the theory of Contextual Integrity (CI), in our work we are exploring ways to uncover societal norms by leveraging advances in crowdsourcing technology.

In our recent paper, we present a methodology that we believe can be used to extract a societal notion of privacy expectations. The results can be used to fine-tune existing privacy guidelines as well as to gain a better perspective on users’ expectations of privacy.

CI defines privacy as a collection of norms (privacy rules) that reflect appropriate information flows between different actors. Norms capture who shares what, with whom, in what role, and under which conditions. For example, while you are comfortable sharing your medical information with your doctor, you might be less inclined to do so with your colleagues.

We use CI as a proxy to reason about privacy in the digital world and as a gateway to understanding how people perceive privacy in a systematic way. Crowdsourcing is well suited to this method: we are able to ask hundreds of people how they feel about a particular information flow, and then we can capture their input and map it directly onto the CI parameters. We used a simple template to write Yes-or-No questions for our crowdsourcing participants:

“Is it acceptable for the [sender] to share the [subject’s] [attribute] with [recipient] [transmission principle]?”

For example:

“Is it acceptable for the student’s professor to share the student’s record of attendance with the department chair if the student is performing poorly?”

In our experiments, we leveraged Amazon’s Mechanical Turk (AMT) to ask 450 turkers over 1,400 such questions. Each question represents a specific contextual information flow that users can approve, disapprove or mark under the “Doesn’t Make Sense” category; the last category could be used when 1) the sender is unlikely to have the information, 2) the recipient would already have the information, or 3) the question is ambiguous….(More)”
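The survey template above fills five CI slots (sender, subject, attribute, recipient, transmission principle). A minimal sketch of how such questions could be generated by enumerating parameter combinations; the parameter lists here are invented for illustration, not the paper's actual survey values:

```python
from itertools import product

# Hypothetical CI parameter values (illustrative only, not the study's lists)
senders = ["the student's professor"]
subjects = ["student"]
attributes = ["record of attendance", "grades"]
recipients = ["the department chair", "a college counselor"]
principles = ["if the student is performing poorly", "if the student consents"]

TEMPLATE = ("Is it acceptable for {sender} to share the {subject}'s "
            "{attribute} with {recipient} {principle}?")

def generate_questions(senders, subjects, attributes, recipients, principles):
    """One Yes-or-No question per combination of CI parameters."""
    return [TEMPLATE.format(sender=s, subject=subj, attribute=a,
                            recipient=r, principle=p)
            for s, subj, a, r, p in product(senders, subjects, attributes,
                                            recipients, principles)]

questions = generate_questions(senders, subjects, attributes,
                               recipients, principles)
print(len(questions))   # 1 * 1 * 2 * 2 * 2 = 8 candidate flows
print(questions[0])
```

Enumerating the full cross-product is how a small set of parameter values can yield the hundreds of distinct information flows that were put to the crowd.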

Tackling Corruption with People-Powered Data


Sandra Prüfer at Mastercard Center for Inclusive Growth: “Informal fees plague India’s “free” maternal health services. In Nigeria, village households don’t receive the clean cookstoves their government paid for. Around the world, corruption – coupled with the inability to find and share information about it – stymies development in low-income communities.

Now, digital transparency platforms – supplemented with features illiterate and rural populations can use – make it possible for traditionally excluded groups to make their voices heard and access tools they need to grow.

Mapping Corruption Hot Spots in India

One of the problems surrounding access to information is the lack of reliable information in the first place: a popular method to create that knowledge is crowdsourcing, enlisting the public to monitor and report on certain issues.

The Mera Swasthya Meri Aawaz platform, which means “My Health, My Voice”, is an interactive map in Uttar Pradesh launched by the Indian non-profit organization SAHAYOG. It enables women to anonymously report illicit fees charged for services at maternal health clinics using their mobile phones.

To reduce infant mortality and deaths in childbirth, the Indian government provides free prenatal care and cash incentives to use maternal health clinics, but many charge illegal fees anyway – cutting mothers off from lifesaving healthcare and inhibiting communities’ growth. An estimated 45,000 women in India died in 2015 from complications of pregnancy and childbirth – one of the highest rates of any country in the world; low-income women are disproportionately affected….“Documenting illegal payment demands in real time and aggregating the data online increased governmental willingness to listen,” Sandhya says. “Because the data is linked to technology, its authenticity is not questioned.”

Following the Money in Nigeria

In Nigeria, Connected Development (CODE) also champions open data to combat corruption in infrastructure building, health and education projects. Its mission is to improve access to information and empower local communities to share data that can expose financial irregularities. Since 2012, the Abuja-based watchdog group has investigated twelve capital projects, successfully pressuring the government to release funds including $5.3 million to treat 1,500 lead-poisoned children.

“People activate us: if they know about any project that is supposed to be in their community, but isn’t, they tell us they want us to follow the money – and we’ll take it from there,” says CODE co-founder Oludotun Babayemi.

Users alert the watchdog group directly through its webpage, which publishes open-source data about development projects that are supposed to be happening, based on freedom of information requests to Nigeria’s federal ministry of environment, World Bank data and government press releases.

Last year, as part of their #WomenCookstoves reporting campaign, CODE revealed an apparent scam by tracking a $49.8 million government project that was supposed to purchase 750,000 clean cookstoves for rural women. Smoke inhalation diseases disproportionately affect women who spend time cooking over wood fires; according to the World Health Organization, almost 100,000 people die yearly in Nigeria from inhaling wood smoke, the country’s third biggest killer after malaria and AIDS.

“After three months, we found out that only 15 percent of the $48 million was given to the contractor – meaning there were only 45,000 cookstoves out of 750,000 in the country,” Babayemi says….(More)”

Civic Crowd Analytics: Making sense of crowdsourced civic input with big data tools


Paper by  that: “… examines the impact of crowdsourcing on a policymaking process by using a novel data analytics tool called Civic CrowdAnalytics, applying Natural Language Processing (NLP) methods such as concept extraction, word association and sentiment analysis. By drawing on data from a crowdsourced urban planning process in the City of Palo Alto in California, we examine the influence of civic input on the city’s Comprehensive City Plan update. The findings show that the impact of citizens’ voices depends on the volume and the tone of their demands: a higher demand with a stronger tone results in more policy changes. We also found an interesting and unexpected result: the city government in Palo Alto more or less mirrors the online crowd’s voice, while citizen representatives filter rather than mirror the crowd’s will. While NLP methods show promise in making the analysis of the crowdsourced input more efficient, there are several issues: the accuracy rates should be improved, and there is still a considerable amount of human work in training the algorithm….(More)”
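The paper's two explanatory factors, volume and tone of demands, can be illustrated with a toy aggregate. Civic CrowdAnalytics itself uses trained NLP models; the lexicon-based scoring below is only a sketch of the idea, with invented word lists and comments:

```python
# Tiny illustrative sentiment lexicon; a real system would use a trained model
STRONG_NEG = {"must", "demand", "urgent", "unacceptable", "never"}
POSITIVE = {"support", "great", "appreciate", "good", "like"}

def tone(comment):
    """Crude per-comment tone: +1 per positive word, -1 per strong-demand word."""
    words = comment.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in STRONG_NEG for w in words)

def summarize(theme_comments):
    """Per theme, report (volume, mean tone): the paper's two impact factors."""
    return {theme: (len(cs), sum(tone(c) for c in cs) / len(cs))
            for theme, cs in theme_comments.items()}

# Hypothetical crowdsourced input grouped by planning theme
comments = {
    "bike lanes": ["we must demand protected bike lanes",
                   "this is urgent and unacceptable"],
    "parking": ["i like the parking plan"],
}
print(summarize(comments))
```

Under the paper's finding, a theme with high volume and strongly negative tone (like the hypothetical "bike lanes" theme here) would be the one most likely to produce policy changes.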

Kenyans have launched a campaign on Twitter to fix their roads


Lily Kuo in Quartz: “Traffic is a problem in Nairobi. A short commute can last for hours during morning or evening rush hour. Buses and motorbikes cut in and out of traffic, worsening congestion. It’s estimated that road congestion costs Kenya’s capital as much as $570,000 a day in lost productivity.

One of the reasons for the city’s bad traffic is the state of the roads: drivers swerve to avoid potholes, bumps, or breaks in the roads, causing a buildup of traffic. To help, an online campaign called “What is a Road” is crowdsourcing the location and condition of potholes around the city in an effort to push local officials to fix them.

Nairobians tweet a photo and location of a pothole under the hashtag #whatisaroad. Those reports are uploaded to a map and used to analyze where the city’s potholes are located and track which ones have been fixed. “We decided to take a more data-driven approach to track progress, promises made and projects delivered,” says Muthuri Kinyamu, one of the organizers.

A map showing crowdsourced reports of potholes across Nairobi. (What Is a Road)

The campaign is also about addressing some of the fundamental problems that hold cities like Nairobi back. In Nairobi, branded the center of “Silicon Savannah” in recent years, there’s often more focus on entrepreneurship and innovation than on resolving simpler problems like the state of the roads. …

The campaign, started in August, will continue until January. Chris Orwa, a data analyst helping with the project, says that they can’t take credit for all the repairs that have been documented around the city, but they have noticed that roads are being fixed within days of a #Whatisaroad report. The average response time for fixing a road reported by a What is a Road user is three days, according to Orwa….(More)”
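The three-day average response time is the kind of figure that falls straight out of the crowdsourced report log. A minimal sketch of the computation, using invented report/fix dates rather than the campaign's actual data:

```python
from datetime import date

# Hypothetical #whatisaroad reports as (reported, fixed) dates; None = still open
reports = [
    (date(2016, 8, 1), date(2016, 8, 3)),
    (date(2016, 8, 2), date(2016, 8, 6)),
    (date(2016, 8, 5), None),
    (date(2016, 8, 10), date(2016, 8, 13)),
]

def average_response_days(reports):
    """Mean days between a pothole report and its recorded fix, skipping open reports."""
    deltas = [(fixed - reported).days for reported, fixed in reports if fixed]
    return sum(deltas) / len(deltas) if deltas else None

print(average_response_days(reports))  # (2 + 4 + 3) / 3 = 3.0
```

Tracking the still-open reports separately (the `None` entries) is what lets the campaign say which potholes have and have not been fixed.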

Crowdsourcing campaign rectifies translation errors


Springwise: “A few months ago, Seoul City launched a month-long campaign during September and October asking people to help correct poorly translated street signs. For example, the sign pictured below has incorrectly abbreviated “Bridge,” which should be corrected to “Brg.” Those who find mistakes can submit them via email, including a picture of the sign and location details. The initiative is targeting signs in English, Chinese and Japanese in public places such as subway stations, bus stops and tourist information sites. Seoul City is offering prizes to those who successfully spot mistakes. Top spotters receive a reward of KRW 200,000 (around USD 180).

A street sign with an incorrect abbreviation of “Bridge.”

The scheme comes as part of a drive to improve the experience of tourists travelling to the South Korean capital. According to a Seoul city official, “Multilingual signs are important standards to assess a country’s competitiveness in the tourism business. We want to make sure that foreigners in Seoul suffer no inconvenience.”…(More)”

Crowdsourcing and cellphone data could help guide urban revitalization


Science Magazine: “For years, researchers at the MIT Media Lab have been developing a database of images captured at regular distances around several major cities. The images are scored according to different visual characteristics — how safe the depicted areas look, how affluent, how lively, and the like….Adjusted for factors such as population density and distance from city centers, the correlation between perceived safety and visitation rates was strong, but it was particularly strong for women and people over 50. The correlation was negative for people under 30, which means that males in their 20s were actually more likely to visit neighborhoods generally perceived to be unsafe than to visit neighborhoods perceived to be safe.

In the same paper, the researchers also identified several visual features that are highly correlated with judgments that a particular area is safe or unsafe. Consequently, the work could help guide city planners in decisions about how to revitalize declining neighborhoods….

Jacobs’ theory, Hidalgo says, is that neighborhoods in which residents can continuously keep track of street activity tend to be safer; a corollary is that buildings with street-facing windows tend to create a sense of safety, since they imply the possibility of surveillance. Newman’s theory is an elaboration on Jacobs’, suggesting that architectural features that demarcate public and private spaces, such as flights of stairs leading up to apartment entryways or archways separating plazas from the surrounding streets, foster the sense that crossing a threshold will bring on closer scrutiny….(More)”

Crowdsourcing Gun Violence Research


Penn Engineering: “Gun violence is often described as an epidemic, but as visible and shocking as shooting incidents are, epidemiologists who study that particular source of mortality have a hard time tracking them. The Centers for Disease Control is prohibited by federal law from conducting gun violence research, so there is little in the way of centralized infrastructure to monitor where, how, when, why and to whom shootings occur.

Chris Callison-Burch, Aravind K. Joshi Term Assistant Professor in Computer and Information Science, and graduate student Ellie Pavlick are working to solve this problem.

They have developed the Gun Violence Database, which combines machine learning and crowdsourcing techniques to produce a national registry of shooting incidents. Callison-Burch and Pavlick’s algorithm scans thousands of articles from local newspaper and television stations, determines which are about gun violence, then asks everyday people to pull out vital statistics from those articles, compiling that information into a unified, open database.

For natural language processing experts like Callison-Burch and Pavlick, the most exciting prospect of this effort is that it is training computer systems to do this kind of analysis automatically. They recently presented their work on that front at Bloomberg’s Data for Good Exchange conference.

The Gun Violence Database project started in 2014, when it became the centerpiece of Callison-Burch’s “Crowdsourcing and Human Computation” class. There, Pavlick developed a series of homework assignments that challenged undergraduates to develop a classifier that could tell whether a given news article was about a shooting incident.
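The students' actual classifiers are not described here; as an illustrative stand-in for the assignment, a tiny bag-of-words Naive Bayes classifier with add-one smoothing (the training headlines and labels below are invented for demonstration):

```python
import math
from collections import Counter

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Bag-of-words Naive Bayes with add-one (Laplace) smoothing, toy scale."""

    def __init__(self):
        self.word_counts = {0: Counter(), 1: Counter()}  # per-class word frequencies
        self.doc_counts = {0: 0, 1: 0}                   # per-class document counts
        self.vocab = set()

    def train(self, labeled_docs):
        for text, label in labeled_docs:
            self.doc_counts[label] += 1
            for w in tokenize(text):
                self.word_counts[label][w] += 1
                self.vocab.add(w)

    def predict(self, text):
        total = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in (0, 1):
            # log prior + sum of smoothed log likelihoods
            score = math.log(self.doc_counts[label] / total)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokenize(text):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Invented toy training set: 1 = about a shooting incident, 0 = not
docs = [
    ("man shot in robbery", 1),
    ("police report shooting at mall", 1),
    ("gunman wounded two", 1),
    ("city council approves budget", 0),
    ("local team wins game", 0),
    ("school opens new library", 0),
]

clf = NaiveBayes()
clf.train(docs)
print(clf.predict("shooting reported at mall"))  # → 1
```

At real scale the classifier's positive predictions are what get routed to the crowd workers, who then extract the vital statistics the articles contain.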

“It allowed us to teach the things we want students to learn about data science and natural language processing, while giving them the motivation to do a project that could contribute to the greater good,” says Callison-Burch.

The articles students used to train their classifiers were sourced from “The Gun Report,” a daily blog from New York Times reporters that attempted to catalog shootings from around the country in the wake of the Sandy Hook massacre. Realizing that their algorithmic approach could be scaled up to automate what the Times’ reporters were attempting, the researchers began exploring how such a database could work. They consulted with Douglas Wiebe, an Associate Professor of Epidemiology in Biostatistics and Epidemiology in the Perelman School of Medicine, to learn more about what kind of information public health researchers needed to better study gun violence on a societal scale.

From there, the researchers enlisted people to annotate the articles their classifier found, connecting with them through Mechanical Turk, Amazon’s crowdsourcing platform, and their own website, http://gun-violence.org/…(More)”

Crowdsourcing investigative journalism


Convoca in Peru: “…collaborative effort is the essence of Convoca. We are a team of journalists and programmers who work with professionals from different disciplines and generations to expose facts that are hidden by networks of power and affect the lives of citizens. We rely on collaborative work to publish high-impact findings from Peru, where the Amazon covers almost 60% of the country, amid the exploitation of oil and minerals and criminal activities such as logging, illegal mining and human trafficking. Fifty percent of social conflicts have as their epicenter areas of natural resource extraction, where the communities with the highest poverty rates live.

Over one year and seven months, Convoca has uncovered facts of public relevance such as patterns of corruption and secrecy, working in a network with journalists from Latin America and the world. The series of reports with the BRIO platform revealed the cost overruns of highways and public works in Latin American countries in the hands of Brazilian companies financed by the National Bank of Economic and Social Development (BNDES), now under investigation in the most notorious corruption scandal in the region, ‘Lava Jato’. This research won the 2016 Journalistic Excellence Award granted by the Inter American Press Association (SIP). On a global scale, we dove into 11.5 million files of the ‘Panama Papers’ with more than a hundred media outlets and organizations led by the International Consortium of Investigative Journalists (ICIJ), which allowed us to lay bare the world of tax havens where companies and public figures hide their fortunes.

Our work on extractive industries, ‘Excesses Unpunished’, won the most important award in data journalism in the world, the Data Journalism Awards 2016, and is a finalist for the Gabriel García Márquez Award, which recognizes the best of journalism in Latin America. We invite you to be the voice of this effort to keep publishing new reports that allow citizens to make better decisions about their destinies and compel groups in power to come clean about their activities and fulfill their commitments. So join ConBoca: The Power of Citizens Call, our first fundraising campaign alongside our readers. We believe that journalism is a public service….(More)”

Remote Data Collection: Three Ways to Rethink How You Collect Data in the Field


Magpi: “As mobile devices have gotten less and less expensive – and as millions worldwide have climbed out of poverty – it’s become quite common to see a mobile phone in every person’s hand, or at least in every family, and this means that we can utilize an additional approach to data collection that was simply not possible before….

In our Remote Data Collection Guide, we discuss these new technologies and:

  • The key benefits of remote data collection in each of three different situations.
  • The direct impact of remote data collection on reducing the cost of your efforts.
  • How to start the process of choosing the right option for your needs….(More)”

When is the crowd wise or can the people ever be trusted?


Julie Simon at NESTA: “Democratic theory has tended to take a pretty dim view of people and their ability to make decisions. Many political philosophers believe that people are at best uninformed and at worst, ignorant and incompetent.  This view is a common justification for our system of representative democracy – people can’t be trusted to make decisions so this responsibility should fall to those who have the expertise, knowledge or intelligence to do so.

Think back to what Edmund Burke said on the subject in his speech to the Electors of Bristol in 1774: “Your representative owes you, not his industry only, but his judgement; and he betrays, instead of serving you, if he sacrifices it to your opinion.” He reminds us that “government and legislation are matters of reason and judgement, and not of inclination”. Others, like the journalist Charles Mackay, whose book on economic bubbles and crashes, Extraordinary Popular Delusions and the Madness of Crowds, took an even more damning view of the crowd’s capacity to exercise either judgement or reason.

The thing is, if you believe that ‘the crowd’ isn’t wise then there isn’t much point in encouraging participation – these sorts of activities can only ever be tokenistic or a way of legitimising the decisions taken by others.

There are then those political philosophers who effectively argue that citizens’ incompetence doesn’t matter. They argue that the aggregation of views – through voting – eliminates ‘noise’ which enables you to arrive at optimal decisions. The larger the group, the better its decisions will be.  The corollary of this view is that political decision making should involve mass participation and regular referenda – something akin to the Swiss model.

Another standpoint is to say that there is wisdom within crowds – it’s just that it’s domain specific, unevenly distributed and quite hard to transfer. This idea was put forward by Friedrich Hayek in his seminal 1945 essay on The Use of Knowledge in Society in which he argues that:

“…the knowledge of the circumstances of which we must make use never exists in concentrated or integrated form, but solely as the dispersed bits of incomplete and frequently contradictory knowledge which all the separate individuals possess. The economic problem of society is thus not merely a problem of how to allocate ‘given’ resources… it is a problem of the utilization of knowledge not given to anyone in its totality”.

Hayek argued that it was for this reason that central planning couldn’t work since no central planner could ever aggregate all the knowledge distributed across society to make good decisions.

More recently, Eric Von Hippel built on these foundations by introducing the concept of information stickiness; information is ‘sticky’ if it is costly to move from one place to another. One type of information that is frequently ‘sticky’ is information about users’ needs and preferences.[1] This helps to account for why manufacturers tend to develop innovations which are incremental – meeting already identified needs – and why so many organisations are engaging users in their innovation processes:  if knowledge about needs and tools for developing new solutions can be co-located in the same place (i.e. the user) then the cost of transferring sticky information is eliminated…..

There is growing evidence on how crowdsourcing can be used by governments to solve clearly defined technical, scientific or informational problems. Evidently there are significant needs and opportunities for governments to better engage citizens to solve these types of problems.

There’s also a growing body of evidence on how digital tools can be used to support and promote collective intelligence….

So, the critical task for public officials is to have greater clarity over the purpose of engagement, in order to better understand which methods of engagement should be used and what kinds of groups should be targeted.

At the same time, the central question for researchers is when and how to tap into collective intelligence: what tools and approaches can be used when we’re looking at arenas which are often sites of contestation? Should this input be limited to providing information and expertise to be used by public officials or representatives, or should these distributed experts exercise some decision making power too? And when we’re dealing with value based judgements when should we rely on large scale voting as a mechanism for making ‘smarter’ decisions and when are deliberative forms of engagement more appropriate? These are all issues we’re exploring as part of our ongoing programme of work on democratic innovations….(More)”