Wikipedia’s not as biased as you might think


Ananya Bhattacharya in Quartz: “The internet is as open as people make it. Often, people limit their Facebook and Twitter circles to like-minded people and only follow certain subreddits, blogs, and news sites, creating an echo chamber of sorts. In a sea of biased content, Wikipedia is one of the few online outlets that strives for neutrality. After 15 years in operation, it’s starting to see results.

Researchers at Harvard Business School evaluated almost 4,000 articles in Wikipedia’s online database against the same entries in Encyclopaedia Britannica to compare their biases. They focused on English-language articles about US politics, especially controversial topics, that appeared in both outlets in 2012.

“That is just not a recipe for coming to a conclusion,” Shane Greenstein, one of the study’s authors, said in an interview. “We were surprised that Wikipedia had not failed, had not fallen apart in the last several years.”

Greenstein and his co-author Feng Zhu categorized each article as “blue” or “red.” Drawing from research in political science, they identified terms that are idiosyncratic to each party. For instance, political scientists have identified that Democrats were more likely to use phrases such as “war in Iraq,” “civil rights,” and “trade deficit,” while Republicans used phrases such as “economic growth,” “illegal immigration,” and “border security.”…
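In practice, this kind of slant measure boils down to counting party-coded phrases in each article. The following minimal sketch illustrates that scoring step; the phrase lists and the score formula are hypothetical placeholders, not the measure Greenstein and Zhu actually used:

```python
# Minimal sketch of phrase-count slant scoring (hypothetical phrase lists,
# not the full lexicon or weighting used in the study).
import re

DEMOCRAT_PHRASES = ["war in iraq", "civil rights", "trade deficit"]
REPUBLICAN_PHRASES = ["economic growth", "illegal immigration", "border security"]

def count_phrases(text, phrases):
    """Count total occurrences of the given phrases in the text."""
    text = text.lower()
    return sum(len(re.findall(re.escape(p), text)) for p in phrases)

def slant_score(text):
    """Score in [-1, 1]: negative leans Democrat ('blue'), positive leans
    Republican ('red'); 0.0 when no coded phrase appears."""
    blue = count_phrases(text, DEMOCRAT_PHRASES)
    red = count_phrases(text, REPUBLICAN_PHRASES)
    total = blue + red
    return 0.0 if total == 0 else (red - blue) / total

article = "Coverage of illegal immigration and border security, and of civil rights."
print(slant_score(article))  # 0.33... -> slightly 'red' by this crude measure
```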

“In comparison to expert-based knowledge, collective intelligence does not aggravate the bias of online content when articles are substantially revised,” the authors wrote in the paper. “This is consistent with a best-case scenario in which contributors with different ideologies appear to engage in fruitful online conversations with each other, in contrast to findings from offline settings.”

More surprisingly, the authors found that the 2.8 million registered volunteer editors who were reviewing the articles also became less biased over time. “You can ask questions like ‘do editors with red tendencies tend to go to red articles or blue articles?’” Greenstein said. “You find a prevalence of opposites attract, and that was striking.” The researchers even identified the political stances of a number of anonymous editors based on their IP locations, and the trend held….(More)”

Learning Privacy Expectations by Crowdsourcing Contextual Informational Norms


at Freedom to Tinker: “The advent of social apps, smart phones and ubiquitous computing has brought a great transformation to our day-to-day life. The incredible pace with which new and disruptive services continue to emerge challenges our perception of privacy. To keep pace with this rapidly evolving cyber reality, we need to devise agile methods and frameworks for developing privacy-preserving systems that align with evolving users’ privacy expectations.

Previous efforts have tackled this with the assumption that privacy norms are provided through existing sources such as law, privacy regulations and legal precedents. They have focused on formally expressing privacy norms and devising a corresponding logic to enable automatic inconsistency checks and efficient enforcement of the logic.

However, because many of the existing regulations and privacy handbooks were enacted well before the Internet revolution took place, they often lag behind and do not adequately reflect the application of logic in modern systems. For example, the Family Educational Rights and Privacy Act (FERPA) was enacted in 1974, long before Facebook, Google and many other online applications were used in an educational context. More recent legislation faces similar challenges as novel services introduce new ways to exchange information, and consequently shape new, unconsidered information flows that can change our collective perception of privacy.

Crowdsourcing Contextual Privacy Norms

In our work, armed with the theory of Contextual Integrity (CI), we are exploring ways to uncover societal norms by leveraging advances in crowdsourcing technology.

In our recent paper, we present the methodology that we believe can be used to extract a societal notion of privacy expectations. The results can be used to fine-tune existing privacy guidelines as well as to get a better perspective on users’ expectations of privacy.

CI defines privacy as a collection of norms (privacy rules) that reflect appropriate information flows between different actors. Norms capture who shares what, with whom, in what role, and under which conditions. For example, while you are comfortable sharing your medical information with your doctor, you might be less inclined to do so with your colleagues.

We use CI as a proxy to reason about privacy in the digital world and as a gateway to understanding how people perceive privacy in a systematic way. Crowdsourcing is a great tool for this method. We are able to ask hundreds of people how they feel about a particular information flow, and then we can capture their input and map it directly onto the CI parameters. We used a simple template to write Yes-or-No questions to ask our crowdsourcing participants:

“Is it acceptable for the [sender] to share the [subject’s] [attribute] with [recipient] [transmission principle]?”

For example:

“Is it acceptable for the student’s professor to share the student’s record of attendance with the department chair if the student is performing poorly?”
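Mechanically, each survey item is just the template above filled in with one tuple of CI parameters. The sketch below illustrates that step under assumed parameter values (the senders, attributes, recipients and transmission principles are invented, and the subject is fixed to the student for brevity; these are not the study's actual vignettes):

```python
# Minimal sketch: generating Yes-or-No survey questions from Contextual
# Integrity parameter tuples. All parameter values are invented examples,
# not the study's actual vignettes; the subject is fixed to "the student".
from itertools import product

SENDERS = ["the student's professor", "the student's advisor"]
ATTRIBUTES = ["record of attendance", "grades"]
RECIPIENTS = ["the department chair", "a potential employer"]
PRINCIPLES = ["if the student is performing poorly", "if the student consents"]

TEMPLATE = ("Is it acceptable for {sender} to share the student's "
            "{attribute} with {recipient} {principle}?")

def generate_questions():
    """Yield one question per combination of CI parameters."""
    for sender, attribute, recipient, principle in product(
            SENDERS, ATTRIBUTES, RECIPIENTS, PRINCIPLES):
        yield TEMPLATE.format(sender=sender, attribute=attribute,
                              recipient=recipient, principle=principle)

for question in generate_questions():
    print(question)  # 2 x 2 x 2 x 2 = 16 candidate survey items
```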

In our experiments, we leveraged Amazon’s Mechanical Turk (AMT) to ask 450 turkers over 1400 such questions. Each question represents a specific contextual information flow that users can approve, disapprove or mark under the Doesn’t Make Sense category; the last category could be used when 1) the sender is unlikely to have the information, 2) the receiver would already have the information, or 3) the question is ambiguous….(More)”

Predicting judicial decisions of the European Court of Human Rights: a Natural Language Processing perspective


et al at PeerJ Computer Science: “Recent advances in Natural Language Processing and Machine Learning provide us with the tools to build predictive models that can be used to unveil patterns driving judicial decisions. This can be useful, for both lawyers and judges, as an assisting tool to rapidly identify cases and extract patterns which lead to certain decisions. This paper presents the first systematic study on predicting the outcome of cases tried by the European Court of Human Rights based solely on textual content. We formulate a binary classification task where the input of our classifiers is the textual content extracted from a case and the target output is the actual judgment as to whether there has been a violation of an article of the Convention on Human Rights. Textual information is represented using contiguous word sequences, i.e., N-grams, and topics. Our models can predict the court’s decisions with strong accuracy (79% on average). Our empirical analysis indicates that the formal facts of a case are the most important predictive factor. This is consistent with the theory of legal realism suggesting that judicial decision-making is significantly affected by the stimulus of the facts. We also observe that the topical content of a case is another important feature in this classification task and explore this relationship further by conducting a qualitative analysis….(More)”
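As a rough illustration of the kind of pipeline the abstract describes, contiguous word N-grams feeding a linear classifier, here is a minimal scikit-learn sketch; the case texts, labels and feature settings are placeholders rather than the paper's actual data or configuration:

```python
# Minimal sketch of an n-gram-based violation/no-violation classifier in the
# spirit of the abstract above. Case texts, labels and feature settings are
# placeholders, not the paper's actual data or setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

cases = [
    "The applicant complained of inhuman treatment during pre-trial detention ...",
    "The domestic courts afforded the applicant an adversarial, public hearing ...",
    # ... in practice, one document per case, built from the case text
]
labels = [1, 0]  # 1 = violation of the article found, 0 = no violation

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 4)),  # contiguous word sequences (N-grams)
    LinearSVC(),                          # linear classifier over those features
)
model.fit(cases, labels)

print(model.predict(["The applicant was held without judicial review ..."]))
# With a real corpus, accuracy would be estimated with cross-validation, e.g.:
# from sklearn.model_selection import cross_val_score
# cross_val_score(model, cases, labels, cv=10).mean()
```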

Civic Crowd Analytics: Making sense of crowdsourced civic input with big data tools


Paper by  that: “… examines the impact of crowdsourcing on a policymaking process by using a novel data analytics tool called Civic CrowdAnalytics, applying Natural Language Processing (NLP) methods such as concept extraction, word association and sentiment analysis. By drawing on data from a crowdsourced urban planning process in the City of Palo Alto in California, we examine the influence of civic input on the city’s Comprehensive City Plan update. The findings show that the impact of citizens’ voices depends on the volume and the tone of their demands. A higher demand with a stronger tone results in more policy changes. We also found an interesting and unexpected result: the city government in Palo Alto more or less mirrors the online crowd’s voice, while citizen representatives filter rather than mirror the crowd’s will. While NLP methods show promise in making the analysis of the crowdsourced input more efficient, there are several issues. The accuracy rates should be improved. Furthermore, there is still a considerable amount of human work in training the algorithm….(More)”
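To make the NLP step concrete, here is a deliberately crude sketch of concept extraction (frequent terms) and lexicon-based sentiment scoring over civic comments; the lexicon, stopword list and comments are invented for illustration and are far simpler than what Civic CrowdAnalytics uses:

```python
# Crude sketch of concept extraction (frequent terms) and lexicon-based
# sentiment scoring of civic comments. Lexicons and comments are placeholders.
from collections import Counter

POSITIVE = {"support", "great", "improve", "love"}
NEGATIVE = {"oppose", "traffic", "noise", "against"}
STOPWORDS = {"the", "a", "of", "to", "and", "in", "is", "for", "we", "are"}

comments = [
    "We support more bike lanes to improve downtown traffic",
    "Against the parking garage, the noise and traffic are already bad",
]

def tokenize(text):
    return [w.strip(".,!?").lower() for w in text.split()]

# "Concept extraction": the most frequent non-stopword terms across comments.
terms = Counter(w for c in comments for w in tokenize(c) if w not in STOPWORDS)
print(terms.most_common(5))

# Sentiment: positive minus negative lexicon hits per comment.
for c in comments:
    words = tokenize(c)
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    print(score, c)
```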

Essays on collective intelligence


Thesis by Yiftach Nagar: “This dissertation consists of three essays that advance our understanding of collective intelligence: how it works, how it can be used, and how it can be augmented. I combine theoretical and empirical work, spanning qualitative inquiry, lab experiments, and design, exploring how novel ways of organizing, enabled by advancements in information technology, can help us work better, innovate, and solve complex problems.

The first essay offers a collective sensemaking model to explain structurational processes in online communities. I draw upon Weick’s model of sensemaking as committed-interpretation, which I ground in a qualitative inquiry into Wikipedia’s policy discussion pages, in an attempt to explain how structuration emerges as interpretations are negotiated and then committed through conversation. I argue that the wiki environment provides conditions that help commitments form, strengthen and diffuse, and that this, in turn, helps explain trends of stabilization observed in previous research.

In the second essay, we characterize a class of semi-structured prediction problems, where patterns are difficult to discern, data are difficult to quantify, and changes occur unexpectedly. Making correct predictions under these conditions can be extremely difficult, and is often associated with high stakes. We argue that in these settings, combining predictions from humans and models can outperform predictions made by groups of people or by computers. In laboratory experiments, we combined human and machine predictions, and found the combined predictions more accurate and more robust than predictions made by groups of only people or only machines.
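One simple way to combine the two sources of predictions is a weighted blend of the crowd's average forecast and the model's forecast; the sketch below is an illustrative assumption, not the thesis's actual aggregation rule:

```python
# Minimal sketch of combining human and model predictions as a weighted
# average. The weights and numbers are illustrative placeholders.
def combine(human_probs, model_prob, w_human=0.5):
    """Blend the mean human forecast with the model's forecast."""
    crowd = sum(human_probs) / len(human_probs)
    return w_human * crowd + (1 - w_human) * model_prob

human_forecasts = [0.70, 0.55, 0.80]   # individual human probability estimates
model_forecast = 0.40                  # statistical model's estimate
print(combine(human_forecasts, model_forecast))  # about 0.54
```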

The third essay addresses a critical bottleneck in open-innovation systems: reviewing and selecting the best submissions, in settings where submissions are complex intellectual artifacts whose evaluation requires expertise. To aid expert reviewers, we offer a computational approach we developed and tested using data from the Climate CoLab – a large citizen science platform. Our models approximate expert decisions about the submissions with high accuracy, and their use can save review labor and accelerate the review process….(More)”

Show, Don’t Tell


on Alberto Cairo, Power BI & the rise of data journalism for Microsoft Stories: “From the election of Pope Francis to the passing of Nelson Mandela to Miley Cyrus’ MTV #twerk heard ’round the world, 2013 was full of big headlines and viral hits. Yet The New York Times’ top story of the year was the humble result of a vocabulary survey of 350,000 randomly selected Americans conducted by a then-intern at the paper.

Instead of presenting these findings in a written article, “How Y’all, Youse and You Guys Talk” achieved breakout success as an interactive data visualization. It asked readers 25 questions such as “How would you address a group of two or more people?” or “How do you pronounce ‘aunt’?” and then heat-mapped their responses to the most similar regional dialect in the U.S. The interactivity and colorful visuals transmuted survey data into a fun, insightful tour through the contours of contemporary American English.
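The matching step behind such a quiz can be sketched as comparing a reader's answers with per-region answer profiles and returning the closest region; the regions, questions and probabilities below are invented for illustration, not the survey's real data:

```python
# Minimal sketch of matching a reader's answers to the most similar regional
# dialect profile. Regions, questions and probabilities are invented examples.
REGIONAL_PROFILES = {
    "Northeast": {"group_address": {"you guys": 0.7, "y'all": 0.1},
                  "aunt":          {"ahnt": 0.6, "ant": 0.4}},
    "South":     {"group_address": {"you guys": 0.2, "y'all": 0.7},
                  "aunt":          {"ahnt": 0.1, "ant": 0.9}},
}

def best_region(answers):
    """Score each region by how likely it is to produce the reader's answers."""
    def score(profile):
        return sum(profile[q].get(a, 0.0) for q, a in answers.items())
    return max(REGIONAL_PROFILES, key=lambda r: score(REGIONAL_PROFILES[r]))

print(best_region({"group_address": "y'all", "aunt": "ant"}))  # "South"
```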

Visualization no longer just complements a written story. It is the story. In our increasingly data-driven world, visualization is becoming an essential tool for journalists from national papers to blogs with a staff of one.

I recently spent two days discussing the state of data journalism with Alberto Cairo, the Knight Chair of Visual Journalism at the School of Communication at the University of Miami. While he stressed the importance of data visualization for efficient communication and audience engagement, Cairo argued that “Above all else, visualizations — when done right — are a vehicle of clarification and truth.”…(More)”

Supporting Collaborative Political Decision Making: An Interactive Policy Process Visualization System


Paper by Tobias Ruppert et al: “The process of political decision making is often complex and tedious. The policy process consists of multiple steps, most of which are highly iterative. In addition, different stakeholder groups are involved in political decision making and contribute to the process. A series of textual documents accompanies the process. Examples are official documents, discussions, scientific reports, external reviews, newspaper articles, or economic white papers. Experts from the political domain report that this plethora of textual documents often exceeds their ability to keep track of the entire policy process. We present PolicyLine, a visualization system that supports different stakeholder groups in overview-and-detail tasks for large sets of textual documents in the political decision making process. In a longitudinal design study conducted together with domain experts in political decision making, we identified missing analytical functionality on the basis of a problem and domain characterization. In an iterative design phase, we created PolicyLine in close collaboration with the domain experts. Finally, we present the results of three evaluation rounds, and reflect on our collaborative visualization system….(More)”

‘Good Nudge Lullaby’: Choice Architecture and Default Bias Reinforcement


Thomas De Haan and Jona Linde in The Economic Journal: “Because people disproportionately follow defaults, both libertarian paternalists and marketers try to present options they want to promote as the default. However, setting certain defaults, and thereby influencing current decisions, may also affect choices in later, similar decisions. In this paper we explore experimentally whether the default bias can be reinforced by providing good defaults. We show that people who faced better defaults in the past are more likely to follow defaults than people who faced random defaults, hurting their later performance. This malleability of the default bias explains certain marketing practices and serves as an insight for libertarian paternalists….(More)”

Open parliament policy applied to the Brazilian Chamber of Deputies


Paper by  &   in The Journal of Legislative Studies: “…analyse the implementation of an open parliament policy that is taking place at the Chamber of Deputies, in accordance with the guidelines of the Open Government Partnership international programme (OGP), and in particular with the action plan of the Opening Parliament Work Group, one of the subgroups of OGP. The authors evaluate two blocks of initiatives for open parliaments executed by the Chamber in the last few years, namely digital participation in the legislative process and Transparency 2.0, in order to observe their impasses and the results obtained so far. In the first part the authors study the e-Democracy portal, and in the second part they focus on open data, collaborative activities to use those data (hackathons) and the creation of the Hacker Lab, a permanent space dedicated to open parliament practices. The analysis considers the initiatives the authors evaluated as part of the transformative and arena profiles of the Brazilian Parliament, according to Polsby’s classification, with exclusive characteristics…. (More)”

See also Hacking Parliament

Crowdsourcing and cellphone data could help guide urban revitalization


Science Magazine: “For years, researchers at the MIT Media Lab have been developing a database of images captured at regular distances around several major cities. The images are scored according to different visual characteristics — how safe the depicted areas look, how affluent, how lively, and the like…. Adjusted for factors such as population density and distance from city centers, the correlation between perceived safety and visitation rates was strong, but it was particularly strong for women and people over 50. The correlation was negative for people under 30, which means that males in their 20s were actually more likely to visit neighborhoods generally perceived to be unsafe than to visit neighborhoods perceived to be safe.

In the same paper, the researchers also identified several visual features that are highly correlated with judgments that a particular area is safe or unsafe. Consequently, the work could help guide city planners in decisions about how to revitalize declining neighborhoods….
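A back-of-the-envelope version of that adjusted analysis is a regression of visitation on perceived safety with density and distance as controls; the simulated numbers below are placeholders, not the researchers' data:

```python
# Minimal sketch of an adjusted association: regress visitation on perceived
# safety while controlling for population density and distance from the city
# centre. All data are simulated placeholders.
import numpy as np

n = 200
rng = np.random.default_rng(0)
safety = rng.uniform(0, 10, n)       # crowdsourced "perceived safety" score
density = rng.uniform(1, 50, n)      # population density covariate
distance = rng.uniform(0, 20, n)     # km from the city centre
visits = 5 * safety + 2 * density - 3 * distance + rng.normal(0, 10, n)

# OLS: visits ~ intercept + safety + density + distance
X = np.column_stack([np.ones(n), safety, density, distance])
coef, *_ = np.linalg.lstsq(X, visits, rcond=None)
print("adjusted association with perceived safety:", round(coef[1], 2))
```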

Jacobs’ theory, Hidalgo says, is that neighborhoods in which residents can continuously keep track of street activity tend to be safer; a corollary is that buildings with street-facing windows tend to create a sense of safety, since they imply the possibility of surveillance. Newman’s theory is an elaboration on Jacobs’, suggesting that architectural features that demarcate public and private spaces, such as flights of stairs leading up to apartment entryways or archways separating plazas from the surrounding streets, foster the sense that crossing a threshold will bring on closer scrutiny….(More)”