Why Some Teams Are Smarter Than Others


Article by Anita Woolley, Thomas W. Malone and Christopher Chabris in The New York Times: “…Psychologists have known for a century that individuals vary in their cognitive ability. But are some groups, like some people, reliably smarter than others?

Working with several colleagues and students, we set out to answer that question. In our first two studies, which we published with Alex Pentland and Nada Hashmi of M.I.T. in 2010 in the journal Science, we grouped 697 volunteer participants into teams of two to five members….

Instead, the smartest teams were distinguished by three characteristics.

First, their members contributed more equally to the team’s discussions, rather than letting one or two people dominate the group.

Second, their members scored higher on a test called Reading the Mind in the Eyes, which measures how well people can read complex emotional states from images of faces with only the eyes visible.

Finally, teams with more women outperformed teams with more men. Indeed, it appeared that it was not “diversity” (having equal numbers of men and women) that mattered for a team’s intelligence, but simply having more women. This last effect, however, was partly explained by the fact that women, on average, were better at “mindreading” than men.

In a new study that we published with David Engel and Lisa X. Jing of M.I.T…(More)”

Coop’s Citizen Sci Scoop: Try it, you might like it


Response by Caren Cooper at PLOS: “Margaret Mead, the world-famous anthropologist, said, “Never doubt that a small group of thoughtful, committed citizens can change the world; indeed, it’s the only thing that ever has.”
The sentiment rings true for citizen science.
Yet, recent news in the citizen science world has been headlined “Most participants in citizen science projects give up almost immediately.” This was based on a study of participation in seven different projects within the crowdsourcing hub called Zooniverse. Most participants tried a project once, very briefly, and never returned.
What’s unusual about Zooniverse projects is not the high turnover of quitters. Rather, it’s unusual that even early quitters do some important work. That’s a cleverly designed project. An ethical principle of Zooniverse is to not waste people’s time. The crowdsourcing tasks are pivotal to advancing research. They cannot be accomplished by computer algorithms or machines. They require crowds of people, each chipping in a tiny bit. What is remarkable is that the quitters matter at all….
An Internet rule of thumb is that only 1% (or less) of users add new content to sites like Wikipedia. Citizen science appears to operate on this dynamic, except instead of a core group adding existing knowledge for the crowd to use, a core group is involved in making new knowledge for the crowd to use….
In citizen science, a crowd can be four or a crowd can be hundreds of thousands. A citizen scientist is not a person who will participate in any project. They are individuals – gamers, birders, stargazers, gardeners, weather bugs, hikers, naturalists, and more – with particular interests and motivations.
As my grandfather said, “Try it, you might like it.” It’s fabulous that millions are trying it. Sooner or later, when participants and projects find one another, a good match translates into a job well done….(More)”.

Motivations for sustained participation in crowdsourcing: The role of talk in a citizen science case study


Paper by C.B. Jackson, C. Østerlund, G. Mugar and K.D.V. Hassman for the Proceedings of the Forty-eighth Hawai’i International Conference on System Sciences (HICSS-48): “The paper explores the motivations of volunteers in a large crowdsourcing project and contributes to our understanding of the motivational factors that lead to deeper engagement beyond initial participation. Drawing on the theory of legitimate peripheral participation (LPP) and the literature on motivation in crowdsourcing, we analyze interview and trace data from a large citizen science project. The analyses identify ways in which the technical features of the projects may serve as motivational factors leading participants towards sustained participation. The results suggest volunteers first engage in activities to support knowledge acquisition, later share knowledge with other volunteers, and finally increase participation in Talk through a punctuated process of role discovery…(More)”


New open access journal will publish across all disciplines


Claudia Lupp at Elsevier: “When it comes to publishing, there is no one-size-fits-all approach or format. In years gone by, getting published was largely limited to presenting research in a specialized field. But with the vast increase in research output – and more and more researchers collaborating across borders and disciplines – things are changing rapidly. While there is still a vital role for the traditional field-specific journal, researchers want more choices of where and how to publish their research. Journals that feature sound research across all disciplines significantly broaden those much-coveted publishing options.
To expand and refine that concept even further, Elsevier is preparing to collaborate with the research community to develop an open access journal covering all disciplines on a platform that will enable continual experimentation and innovation. Plans include improving the end-to-end publishing process and integrating our smart technologies to improve search and discovery.
The new journal will offer researchers a streamlined, simple and intuitive publishing platform that connects their research to the relevant communities. Articles will be assessed for sound research rather than their scope or impact….
We are building an online interface that provides authors with a step-by-step, quick and intuitive submission process. As part of a transparent publishing process, we will alert authors on the progress of their submitted papers at each stage.
To streamline the editorial process, we plan to use assets and technology developed by Elsevier. For example, by using data from Scopus and the technology behind it, we can quickly match papers to relevant editors and reviewers, significantly shortening peer review times….
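
The paper-to-editor matching described here is, at heart, a text-similarity problem. As a purely illustrative sketch (this is not Elsevier's actual pipeline; the reviewer profiles and submission text below are invented), candidate reviewers could be ranked by comparing a submission against the text of their prior publications:

```python
# Hypothetical sketch: rank reviewers by TF-IDF cosine similarity between a
# submitted abstract and each reviewer's publication history.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Invented reviewer profiles: concatenated abstracts of each reviewer's papers.
reviewer_profiles = {
    "reviewer_a": "bayesian inference variational methods graphical models",
    "reviewer_b": "archaeological survey gis remote sensing excavation",
    "reviewer_c": "citizen science crowdsourcing volunteer engagement motivation",
}

submission = "motivations for sustained participation in crowdsourced citizen science"

names = list(reviewer_profiles)
matrix = TfidfVectorizer().fit_transform(
    [reviewer_profiles[n] for n in names] + [submission]
)

# The last row is the submission; score it against every reviewer profile.
scores = cosine_similarity(matrix[-1], matrix[:-1]).ravel()
for name, score in sorted(zip(names, scores), key=lambda p: p[1], reverse=True):
    print(name, round(score, 3))
```

A production system would of course draw on richer signals than raw text overlap, such as citation links, subject classifications and conflict-of-interest checks.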
Once papers have been reviewed, edited and published, the goal is to bring this vast amount of information to readers and help them make sense of it for their own research. Every reputable journal aims to publish papers that are accurate and disseminate them to the right reader to support the advancement of science. But how do you do that effectively when there are more researchers and research papers than ever before?… (More)”

Businesses dig for treasure in open data


Lindsay Clark in ComputerWeekly: “Open data, a movement which promises access to vast swaths of information held by public bodies, has started getting its hands dirty, or rather its feet.
Before a spade goes in the ground, construction and civil engineering projects face a great unknown: what is down there? In the UK, should someone discover anything of archaeological importance, a project can be halted – sometimes for months – while researchers study the site and remove artefacts….
During an open innovation day hosted by the Science and Technologies Facilities Council (STFC), open data services and technology firm Democrata proposed that analytics could predict the likelihood of unearthing an archaeological find in any given location. This would help developers understand the likely risks to construction and would assist archaeologists in targeting digs more accurately. The idea was inspired by a presentation from the Archaeological Data Service in the UK at the event in June 2014.
The proposal won support from the STFC which, together with IBM, provided a nine-strong development team and access to the Hartree Centre’s supercomputer – a 131,000-core high-performance facility. For natural language processing of historic documents, the system uses two components of IBM’s Watson – the AI service which famously won the US TV quiz show Jeopardy. The system uses SPSS modelling software, the language R for algorithm development and Hadoop data repositories….
The proof of concept draws together data from the University of York’s archaeological data, the Department of the Environment, English Heritage, Scottish Natural Heritage, Ordnance Survey, Forestry Commission, Office for National Statistics, the Land Registry and others….The system analyses sets of indicators of archaeology, including historic population dispersal trends, specific geology, flora and fauna considerations, as well as proximity to a water source, a trail or road, standing stones and other archaeological sites. Earlier studies created a list of 45 indicators which was whittled down to seven for the proof of concept. The team used logistic regression to assess the relationship between input variables and come up with its prediction….”
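
The modelling step the team describes, logistic regression over a handful of site-level indicators, can be sketched compactly. The snippet below is a minimal illustration only, written in Python rather than the R/SPSS stack the project actually used; the indicator names, training data and labels are invented:

```python
# Minimal sketch of the prediction step described above: logistic regression
# over location-level indicators to estimate the probability of an
# archaeological find. Feature names and data are illustrative placeholders.
import numpy as np
from sklearn.linear_model import LogisticRegression

FEATURES = ["dist_to_water", "dist_to_road", "dist_to_standing_stones",
            "dist_to_known_site", "historic_pop_density", "geology_class",
            "flora_fauna_index"]  # seven indicators, echoing the whittled-down list

rng = np.random.default_rng(0)
X = rng.random((200, len(FEATURES)))             # placeholder training locations
y = (X[:, 3] + 0.5 * X[:, 0] < 0.6).astype(int)  # placeholder labels: known finds

model = LogisticRegression().fit(X, y)

candidate_site = rng.random((1, len(FEATURES)))
print("P(find) at candidate site:", round(model.predict_proba(candidate_site)[0, 1], 2))
```

The real system feeds far richer inputs (historic-document text mined with Watson, geology, proximity measures, population trends) into the same basic modelling idea.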

The Emerging Science of Human-Data Interaction


Emerging Technology from the arXiv: “The rapidly evolving ecosystem associated with personal data is creating an entirely new field of scientific study, say computer scientists. And this requires a much more powerful ethics-based infrastructure….
Now Richard Mortier at the University of Nottingham in the UK and a few pals say the increasingly complex, invasive and opaque use of data should be a call to arms to change the way we study data, interact with it and control its use. Today, they publish a manifesto describing how a new science of human-data interaction is emerging from this “data ecosystem” and say that it combines disciplines such as computer science, statistics, sociology, psychology and behavioural economics.
They start by pointing out that the long-standing discipline of human-computer interaction research has always focused on computers as devices to be interacted with. But our interaction with the cyber world has become more sophisticated as computing power has become ubiquitous, a phenomenon driven by the Internet but also through mobile devices such as smartphones. Consequently, humans are constantly producing and revealing data in all kinds of different ways.
Mortier and co say there is an important distinction between data that is consciously created and released such as a Facebook profile; observed data such as online shopping behaviour; and inferred data that is created by other organisations about us, such as preferences based on friends’ preferences.
This leads the team to identify three key themes associated with human-data interaction that they believe the communities involved with data should focus on.
The first of these is concerned with making data, and the analytics associated with it, both transparent and comprehensible to ordinary people. Mortier and co describe this as the legibility of data and say that the goal is to ensure that people are clearly aware of the data they are providing, the methods used to draw inferences about it and the implications of this.
Making people aware of the data being collected is straightforward but understanding the implications of this data collection process and the processing that follows is much harder. In particular, this could be in conflict with the intellectual property rights of the companies that do the analytics.
An even more significant factor is that the implications of this processing are not always clear at the time the data is collected. A good example is the way the New York Times tracked down an individual after her seemingly anonymized searches were published by AOL. It is hard to imagine that this individual had any idea that the searches she was making would later allow her identification.
The second theme is concerned with giving people the ability to control and interact with the data relating to them. Mortier and co describe this as “agency”. People must be allowed to opt in or opt out of data collection programs and to correct data if it turns out to be wrong or outdated and so on. That will require simple-to-use data access mechanisms that have yet to be developed.
The final theme builds on this to allow people to change their data preferences in future, an idea the team call “negotiability”. Something like this is already coming into force in the European Union where the Court of Justice has recently begun to enforce the “right to be forgotten”, which allows people to remove information from search results under certain circumstances….”
Ref: http://arxiv.org/abs/1412.6159  Human-Data Interaction: The Human Face of the Data-Driven Society

Mastering ’Metrics: The Path from Cause to Effect


Book by Joshua D. Angrist & Jörn-Steffen Pischke: “Applied econometrics, known to aficionados as ‘metrics, is the original data science. ‘Metrics encompasses the statistical methods economists use to untangle cause and effect in human affairs. Through accessible discussion and with a dose of kung fu–themed humor, Mastering ‘Metrics presents the essential tools of econometric research and demonstrates why econometrics is exciting and useful.
The five most valuable econometric methods, or what the authors call the Furious Five–random assignment, regression, instrumental variables, regression discontinuity designs, and differences in differences–are illustrated through well-crafted real-world examples (vetted for awesomeness by Kung Fu Panda’s Jade Palace). Does health insurance make you healthier? Randomized experiments provide answers. Are expensive private colleges and selective public high schools better than more pedestrian institutions? Regression analysis and a regression discontinuity design reveal the surprising truth. When private banks teeter, and depositors take their money and run, should central banks step in to save them? Differences-in-differences analysis of a Depression-era banking crisis offers a response. Could arresting O. J. Simpson have saved his ex-wife’s life? Instrumental variables methods instruct law enforcement authorities in how best to respond to domestic abuse….(More).”
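
For a feel of the arithmetic behind one of the Furious Five, here is a toy differences-in-differences calculation. The numbers are entirely invented and are not taken from the book; they simply show how the estimator nets out a trend shared by both groups:

```python
# Toy differences-in-differences (DiD) calculation. All numbers are invented.
# The estimator compares the before/after change in a "treated" group with the
# change in an untreated comparison group, netting out the shared trend.
means = {
    ("treated", "before"): 160,
    ("treated", "after"):  145,
    ("control", "before"): 165,
    ("control", "after"):  130,
}

change_treated = means[("treated", "after")] - means[("treated", "before")]  # -15
change_control = means[("control", "after")] - means[("control", "before")]  # -35

did_estimate = change_treated - change_control
print(f"DiD estimate of the treatment effect: {did_estimate:+d}")  # +20
```

In practice the same comparison is usually run as a regression with group, period and interaction terms, which also yields standard errors for the estimate.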

People around you control your mind: The latest evidence


In the Washington Post: “…That’s the power of peer pressure. In a recent working paper, Pedro Gardete looked at 65,525 transactions across 1,966 flights and more than 257,000 passengers. He parsed the data into thousands of mini-experiments such as this:

If someone beside you ordered a snack or a film, Gardete was able to see whether later you did, too. In this natural experiment, the person sitting directly in front of you was the control subject. Purchases were made on a touchscreen; that person wouldn’t have been able to see anything. If you bought something, and the person in front of you didn’t, peer pressure may have been the reason.
Because he had reservation data, Gardete could exclude people flying together, and he controlled for all kinds of other factors such as seat choice. This is purely the effect of a stranger’s choice — not just that, but a stranger whom you might be resenting because he is sitting next to you, and this is a plane.
By adding up thousands of these little experiments, Gardete, an assistant professor of marketing at Stanford, came up with an estimate. On average, people bought stuff 15 to 16 percent of the time. But if you saw someone next to you order something, your chances of buying something, too, jumped by 30 percent, or about four percentage points…
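
To make those percentages concrete, here is the back-of-the-envelope arithmetic; the exact baseline is not reported, so 15.5 percent is an assumption within the quoted 15 to 16 percent range:

```python
# Back-of-the-envelope check of the quoted figures. The precise baseline is
# not given in the article, so 15.5% is assumed (within the 15-16% range).
baseline_rate = 0.155        # average chance of buying anything
relative_lift = 0.30         # reported jump when a neighbour orders something

rate_after_neighbour_buys = baseline_rate * (1 + relative_lift)
extra_points = (rate_after_neighbour_buys - baseline_rate) * 100
print(f"{extra_points:.1f} extra percentage points")  # roughly 4-5 points
```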
The beauty of this paper is that it looks at social influences in a controlled situation. (What’s more of a trap than an airplane seat?) These natural experiments are hard to come by.
Economists and social scientists have long wondered about the power of peer pressure, but it’s one of the trickiest research problems….(More)”.

Uncle Sam Wants You…To Crowdsource Science


At Co-Labs: “It’s not just for the private sector anymore: Government scientists are embracing crowdsourcing. At a White House-sponsored workshop in late November, representatives from more than 20 different federal agencies gathered to figure out how to integrate crowdsourcing and citizen scientists into various government efforts. The workshop is part of a bigger effort with a lofty goal: Building a set of best practices for the thousands of citizens who are helping federal agencies gather data, from the Environmental Protection Agency (EPA) to NASA….

Perhaps the best known federal government crowdsourcing project is Nature’s Notebook, a collaboration between the U.S. Geological Survey and the National Park Service which asks ordinary citizens to take notes on plant and animal species during different times of year. These notes are then cleansed and collated into a massive database on animal and plant phenology that’s used for decision-making by national and local governments. The bulk of the observations, recorded through smartphone apps, are made by ordinary people who spend a lot of time outdoors….

Dozens of government agencies are now asking the public for help. The Centers for Disease Control and Prevention runs a student-oriented, Mechanical Turk-style “micro-volunteering” service called CDCology, the VA crowdsources design of apps for homeless veterans, while the National Weather Service distributes a mobile app called mPING that asks ordinary citizens to help fine-tune public weather reports by giving information on local conditions. The Federal Communications Commission’s Measuring Broadband America app, meanwhile, allows citizens to volunteer information on their Internet broadband speeds, and the Environmental Protection Agency’s Air Sensor Toolbox asks users to track local air pollution….
As of now, however, when it comes to crowdsourcing data for government scientific research, there’s no unified set of standards or best practices. This can lead to wild variations in how various agencies collect data and use it. For officials hoping to implement citizen science projects within government, the roadblocks to crowdsourcing include factors that crowdsourcing is intended to avoid: limited budgets, heavy bureaucracy, and superiors who are skeptical about the value of relying on the crowd for data.
Benforado and Shanley also pointed out that government agencies are subject to additional regulations, such as the Paperwork Reduction Act, which can make implementation of crowdsourcing projects more challenging than they would be in academia or the private sector… (More)”

Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency


At Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.

This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….

So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of way.



While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average,” may actually be detrimental to those in the minority group….
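
Hardt's point about averages is easy to reproduce with a toy model. In the sketch below (the data, group sizes and the single name-length feature are all invented), a classifier trained to maximize overall accuracy looks fine on average yet is almost always wrong for the minority culture whose real names happen to be long:

```python
# Illustrative sketch of the failure mode described above: a classifier that
# optimizes average accuracy can be systematically wrong for a minority group.
# Data, group sizes, and the single name-length feature are all invented.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

# Majority culture (95% of real users): real names are short.
# Minority culture (5% of real users): real names are long.
n_major, n_minor, n_fake = 950, 50, 1000
real_major = rng.normal(6, 1, n_major)    # name lengths, majority-culture real users
real_minor = rng.normal(14, 1, n_minor)   # name lengths, minority-culture real users
fake = rng.normal(11, 3, n_fake)          # name lengths, fake accounts

X = np.concatenate([real_major, real_minor, fake]).reshape(-1, 1)
y = np.concatenate([np.ones(n_major + n_minor), np.zeros(n_fake)])  # 1 = real name
is_minor_real = np.concatenate(
    [np.zeros(n_major), np.ones(n_minor), np.zeros(n_fake)]
).astype(bool)

clf = LogisticRegression().fit(X, y)
pred = clf.predict(X)

print("overall accuracy:            ", (pred == y).mean())
print("accuracy on minority 'real': ", (pred[is_minor_real] == y[is_minor_real]).mean())
```

The model does well on average because the majority dominates the training objective; the minority group pays for it, which is exactly the dynamic the article warns about.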

As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”