The Power to Decide


Special Report by Antonio Regalado in MIT Technology Review: “Back in 1956, an engineer and a mathematician, William Fair and Earl Isaac, pooled $800 to start a company. Their idea: a score to handicap whether a borrower would repay a loan.
It was all done with pen and paper. Income, gender, and occupation produced numbers that amounted to a prediction about a person’s behavior. By the 1980s the three-digit scores were calculated on computers and instead took account of a person’s actual credit history. Today, Fair Isaac Corp., or FICO, generates about 10 billion credit scores annually, calculating 50 times a year for many Americans.
This machinery hums in the background of our financial lives, so it’s easy to forget that the choice of whether to lend used to be made by a bank manager who knew a man by his handshake. Fair and Isaac understood that all this could change, and that their company didn’t merely sell numbers. “We sell a radically different way of making decisions that flies in the face of tradition,” Fair once said.
This anecdote suggests a way of understanding the era of “big data”—terabytes of information from sensors or social networks, new computer architectures, and clever software. But even supercharged data needs a job to do, and that job is always about a decision.
In this business report, MIT Technology Review explores a big question: how are data and the analytical tools to manipulate it changing decision making today? On Nasdaq, trading bots exchange a billion shares a day. Online, advertisers bid on hundreds of thousands of keywords a minute, in deals greased by heuristic solutions and optimization models rather than two-martini lunches. The number of variables and the speed and volume of transactions are just too much for human decision makers.
When there’s a person in the loop, technology takes a softer approach (see “Software That Augments Human Thinking”). Think of recommendation engines on the Web that suggest products to buy or friends to catch up with. This works because Internet companies maintain statistical models of each of us, our likes and habits, and use them to decide what we see. In this report, we check in with LinkedIn, which maintains the world’s largest database of résumés—more than 200 million of them. One of its newest offerings is University Pages, which crunches résumé data to offer students predictions about where they’ll end up working depending on what college they go to (see “LinkedIn Offers College Choices by the Numbers”).
These smart systems, and their impact, are prosaic next to what’s planned. Take IBM. The company is pouring $1 billion into its Watson computer system, the one that answered questions correctly on the game show Jeopardy! IBM now imagines computers that can carry on intelligent phone calls with customers, or provide expert recommendations after digesting doctors’ notes. IBM wants to provide “cognitive services”—computers that think, or seem to (see “Facing Doubters, IBM Expands Plans for Watson”).
Andrew Jennings, chief analytics officer for FICO, says automating human decisions is only half the story. Credit scores had another major impact. They gave lenders a new way to measure the state of their portfolios—and to adjust them by balancing riskier loan recipients with safer ones. Now, as other industries get exposed to predictive data, their approach to business strategy is changing, too. In this report, we look at one technique that’s spreading on the Web, called A/B testing. It’s a simple tactic—put up two versions of a Web page and see which one performs better (see “Seeking Edge, Websites Turn to Experiments” and “Startups Embrace a Way to Fail Fast”).
Until recently, such optimization was practiced only by the largest Internet companies. Now, nearly any website can do it. Jennings calls this phenomenon “systematic experimentation” and says it will be a feature of the smartest companies. They will have teams constantly probing the world, trying to learn its shifting rules and deciding on strategies to adapt. “Winners and losers in analytic battles will not be determined simply by which organization has access to more data or which organization has more money,” Jennings has said.

Of course, there’s danger in letting the data decide too much. In this report, Duncan Watts, a Microsoft researcher specializing in social networks, outlines an approach to decision making that avoids the dangers of gut instinct as well as the pitfalls of slavishly obeying data. In short, Watts argues, businesses need to adopt the scientific method (see “Scientific Thinking in Business”).
To do that, they have been hiring a highly trained breed of business skeptics called data scientists. These are the people who create the databases, build the models, reveal the trends, and, increasingly, author the products. And their influence is growing in business. This could be why data science has been called “the sexiest job of the 21st century.” It’s not because mathematics or spreadsheets are particularly attractive. It’s because making decisions is powerful…”
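The A/B testing tactic mentioned in the report excerpt above (put up two versions of a Web page and see which one performs better) can be sketched in a few lines. This is a minimal illustration, not the method used by any company named in the report: it assumes a standard two-proportion z-test, and the function name `ab_test` and the visitor and conversion counts are hypothetical.

```python
import math


def ab_test(conv_a, n_a, conv_b, n_b):
    """Compare conversion rates of page versions A and B.

    conv_a, conv_b: number of visitors who converted on each version.
    n_a, n_b: number of visitors shown each version.
    Returns (z, p_value) for a two-sided two-proportion z-test.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate under the null hypothesis that both versions perform equally.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # No variation at all (every visitor or no visitor converted).
        return 0.0, 1.0
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical traffic: 1,000 visitors per version.
z, p = ab_test(conv_a=100, n_a=1000, conv_b=130, n_b=1000)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests the difference between the two versions is unlikely to be chance alone. In practice the sample size and the stopping rule would need to be fixed before the experiment starts.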

How should we analyse our lives?


Gillian Tett in the Financial Times on the challenge of using the new form of data science: “A few years ago, Alex “Sandy” Pentland, a professor of computational social sciences at MIT Media Lab, conducted a curious experiment at a Bank of America call centre in Rhode Island. He fitted 80 employees with biometric devices to track all their movements, physical conversations and email interactions for six weeks, and then used a computer to analyse “some 10 gigabytes of behaviour data”, as he recalls.
The results showed that the workers were isolated from each other, partly because at this call centre, like others of its ilk, the staff took their breaks in rotation so that the phones were constantly manned. In response, Bank of America decided to change its system to enable staff to hang out together over coffee and swap ideas in an unstructured way. Almost immediately there was a dramatic improvement in performance. “The average call-handle time decreased sharply, which means that the employees were much more productive,” Pentland writes in his forthcoming book Social Physics. “[So] the call centre management staff converted the break structure of all their call centres to this new system and forecast a $15m per year productivity increase.”
When I first heard Pentland relate this tale, I was tempted to give a loud cheer on behalf of all long-suffering call centre staff and corporate drones. Pentland’s data essentially give credibility to a point that many people know instinctively: that it is horribly dispiriting – and unproductive – to have to toil in a tiny isolated cubicle by yourself all day. Bank of America deserves credit both for letting Pentland’s team engage in this people-watching – and for changing its coffee-break schedule in response.
But there is a bigger issue at stake here too: namely how academics such as Pentland analyse our lives. We have known for centuries that cultural and social dynamics influence how we behave but until now academics could usually only measure this by looking at micro-level data, which were often subjective. Anthropology (a discipline I know well) is a case in point: anthropologists typically study cultures by painstakingly observing small groups of people and then extrapolating this in a subjective manner.

Pentland and others like him are now convinced that the great academic divide between “hard” and “soft” sciences is set to disappear, since researchers these days can gather massive volumes of data about human behaviour with precision. Sometimes this information is volunteered by individuals, on sites such as Facebook; sometimes it can be gathered from the electronic traces – the “digital breadcrumbs” – that we all deposit (when we use a mobile phone, say) or deliberately collected with biometric devices like the ones used at Bank of America. Either way, it can enable academics to monitor and forecast social interaction in a manner we could never have dreamed of before. “Social physics helps us understand how ideas flow from person to person . . . and ends up shaping the norms, productivity and creative output of our companies, cities and societies,” writes Pentland. “Just as the goal of traditional physics is to understand how the flow of energy translates into change in motion, social physics seeks to understand how the flow of ideas and information translates into changes in behaviour.” …

But perhaps the most important point is this: whether you love or hate this new form of data science, the genie cannot be put back in the bottle. The experiments that Pentland and many others are conducting at call centres, offices and other institutions across America are simply the leading edge of a trend.

The only question now is whether these powerful new tools will be mostly used for good (to predict traffic queues or flu epidemics) or for more malevolent ends (to enable companies to flog needless goods, say, or for government control). Sadly, “social physics” and data crunching don’t offer any prediction on this issue, even though it is one of the dominant questions of our age.”

Algorithms and the Changing Frontier


A GMU School of Public Policy Research Paper by Hezekiah Agwara, Philip E. Auerswald, and Brian D. Higginbotham: “We first summarize the dominant interpretations of the “frontier” in the United States and predecessor colonies over the past 400 years: agricultural (1610s-1880s), industrial (1890s-1930s), scientific (1940s-1980s), and algorithmic (1990s-present). We describe the difference between the algorithmic frontier and the scientific frontier. We then propose that the recent phenomenon referred to as “globalization” is actually better understood as the progression of the algorithmic frontier, as enabled by standards that in turn have facilitated the interoperability of firm-level production algorithms. We conclude by describing implications of the advance of the algorithmic frontier for scientific discovery and technological innovation.”

MIT Crowdsources the Next Great (free) IQ Test


ThePsychReport: “Raven’s Matrices have long been a gold standard for psychologists needing to measure general intelligence. But the good ones, the ones scientists like to use, are too expensive for most research projects.

Christopher Chabris, associate professor of psychology at Union College, and David Engel, postdoctoral associate at MIT Sloan School of Management, think the public can help. They recently launched a campaign to crowdsource “the next great IQ test.” The Matrix Reasoning Challenge, created through MIT’s Center for Collective Intelligence with Anita Woolley and Tom Malone, calls on the public to design and submit matrix puzzles – 3×3 grids that ask subjects to complete a pattern by filling in a missing square.

Chabris says they aren’t trying to compete with commercially available tests used for diagnostic or clinical purposes, but rather want to provide a trustworthy and free alternative for scientists. Because these types of puzzles are nonverbal, culturally neutral, and objective, they have wide-ranging applications and are particularly useful when conducting research across various demographics. If this project is successful, a lot more scientists could do a lot more research.

A simple example of a matrix puzzle. Source: Matrix Reasoning Challenge

“Researchers typically don’t have that much money,” Chabris said. “They can’t afford pay-per-use tests. Sometimes they have no research budgets, or if they do, they’re not large enough for that kind of thing. Our real goal is to create something useful for researchers.”

Through the Matrix Reasoning Challenge, Chabris and Engel also hope to better understand how crowdsourcing can be used to problem-solve in social and cognitive sciences.

Social scientists already widely use crowdsourcing sites like Amazon’s Mechanical Turk to recruit participants for their studies, but the matrix project is different in that it seeks to tap into the public’s expertise to help solve scientific problems. Scientists in computer science and bioinformatics have been able to harness this expertise to yield some incredible results. Using TopCoder.com, NASA was able to find a more efficient way to deploy solar panels on the International Space Station. Harvard Medical School was able to develop better software for analyzing immune-system genes. With The Matrix Reasoning Challenge, Chabris and Engel are beginning to explore crowdsourcing’s potential in the social sciences.”

Needed: A New Generation of Game Changers to Solve Public Problems


Beth Noveck: “In order to change the way we govern, it is important to train and nurture a new generation of problem solvers who possess the multidisciplinary skills to become effective agents of change. That’s why we at the GovLab have launched The GovLab Academy with the support of the Knight Foundation.
In an effort to help people in their own communities become more effective at developing and implementing creative solutions to compelling challenges, The GovLab Academy is offering two new training programs:
1) An online platform with an unbundled and evolving set of topics, modules and instructors on innovations in governance, including themes such as big and open data and crowdsourcing and forthcoming topics on behavioral economics, prizes and challenges, open contracting and performance management for governance;
2) Gov 3.0: A curated and sequenced, 14-week mentoring and training program.
While the online platform is always freely available, Gov 3.0 begins on January 29, 2014, and we invite you to participate. Please forward this email to your networks and help us spread the word about the opportunity to participate.
Please consider applying (individuals or teams may apply), if you are:

  • an expert in communications, public policy, law, computer science, engineering, business or design who wants to expand your ability to bring about social change;

  • a public servant who wants to bring innovation to your job;

  • someone with an important idea for positive change but who lacks key skills or resources to realize the vision;

  • interested in joining a network of like-minded, purpose-driven individuals across the country; or

  • someone who is passionate about using technology to solve public problems.

The program includes live instruction and conversation every Wednesday from 5:00–6:30 PM EST for 14 weeks starting Jan 29, 2014. You will be able to participate remotely via Google Hangout.

Gov 3.0 will allow you to apply evolving technology to the design and implementation of effective solutions to public interest challenges. It will give you an overview of the most current approaches to smarter governance and help you improve your skills in collaboration, communication, and developing and presenting innovative ideas.

Over 14 weeks, you will develop a project and a plan for its implementation, including a long and short description, a presentation deck, a persuasive video and a project blog. Last term’s projects covered such diverse issues as post-Fukushima food safety, science literacy for high schoolers and prison reform for the elderly. In every case, the goal was to identify realistic strategies for making a difference quickly. You can read the entire Gov 3.0 syllabus here.

The program will include national experts and instructors in technology and governance both as guests and as mentors to help you design your project. Last term’s mentors included current and former officials from the White House and various state, local and international governments, academics from a variety of fields, and prominent philanthropists.

People who complete the program will have the opportunity to apply for a special fellowship to pursue their projects further.

Gov 3.0 was previously taught only on campus; we are now offering it in beta as an online program. This is not a MOOC. It is a mentoring-intensive coaching experience. To maximize the quality of the experience, enrollment is limited.

Please submit your application by January 22, 2014. Accepted applicants (individuals and teams) will be notified on January 24, 2014. We hope to expand the program in the future so please use the same form to let us know if you would like to be kept informed about future opportunities.”

Innovation by Competition: How Challenges and Competition Get the Most Out of the Crowd


Innocentive: “Crowdsourcing has become the 21st century’s alternative to problem solving in place of traditional employee-based strategies. It has become the modern solution to provide for needed services, content, and ideas. Crowdsourced ideas are paving the way for today’s organizations to tackle innovation challenges that confront them in today’s competitive global marketplace. To put it all in perspective, crowds used to be thought of as angry mobs. Today, crowds are more like friendly and helpful contributors. What an interesting juxtaposition, eh?
Case studies proving the effectiveness of crowdsourcing to conquer innovation challenges, particularly in the fields of science and engineering, abound. Despite the fact that success stories involving crowdsourcing are plentiful, very few firms are really putting its full potential to use. ALS and AIDS research have both made huge advances thanks to crowdsourcing, just to name a couple.
Biologists at the University of Washington were able to map the structure of an AIDS-related virus thanks to the collaboration involved with crowdsourcing. How did they do this? With the help of gamers playing a game designed to help get the information the University of Washington needed. It was a solution that remained unattainable for over a decade until enough top-notch scientific minds were expertly probed from around the world with effective crowdsourcing techniques.
Dr. Seward Rutkove discovered an ALS biomarker to accurately measure the progression of the disease in patients through the crowdsourcing tactics utilized in a prize contest by an organization named Prize4Life, who utilized our Challenge Driven Innovation approach to engage the crowd.
The truth is, the concept of crowdsourcing to innovate has been around for centuries. But, with the growing connectedness of the world due to sheer Internet access, the power and ability to effectively crowdsource has increased exponentially. It’s time for corporations to realize this, and stop relying on stale sources of innovation…”

EPA Launches New Citizen Science Website


Press Release: “The U.S. Environmental Protection Agency has revamped its Citizen Science website to provide new resources and success stories to assist the public in conducting scientific research and collecting data to better understand their local environment and address issues of concern. The website can be found at www.epa.gov/region2/citizenscience.
“Citizen Science is an increasingly important part of EPA’s commitment to using sound science and technology to protect people’s health and safeguard the environment,” said Judith A. Enck, EPA Regional Administrator. “The EPA encourages the public to use the new website as a tool in furthering their scientific investigations and developing solutions to pollution problems.”
The updated website now offers detailed information about air, water and soil monitoring, including recommended types of equipment and resources for conducting investigations. It also includes case studies and videotapes that showcase successful citizen science projects in New York and New Jersey, provides funding opportunities, quality assurance information and workshops and webinars.”

Bad Data


Bad Data is a site providing real-world examples of how not to prepare or provide data. It showcases the poorly structured, the mis-formatted, or the just plain ugly. Its primary purpose is to educate – though there may also be some aspect of entertainment.
As a side-product it also provides a source of good practice material for budding data wranglers (the repo in fact began as a place to keep practice data for Data Explorer).
New examples wanted and welcome – submit them here »

Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk


New paper by Catherine E. Schmitt-Sands and Richard J. Smith: “While the internet has created new opportunities for research, managing the increased complexity of relationships and knowledge also creates challenges. Amazon.com has a Mechanical Turk service that allows people to crowdsource simple tasks for a nominal fee. The online workers may be anywhere in North America or India and range in ability. Social science researchers are only beginning to use this service. While researchers have used crowdsourcing to find research subjects or classify texts, we used Mechanical Turk to conduct a policy scan of local government websites. This article describes the process used to train and ensure quality of the policy scan. It also examines choices in the context of research ethics.”

From funding agencies to scientific agency


New paper on “Collective allocation of science funding as an alternative to peer review”: “Publicly funded research involves the distribution of a considerable amount of money. Funding agencies such as the US National Science Foundation (NSF), the US National Institutes of Health (NIH) and the European Research Council (ERC) give billions of dollars or euros of taxpayers’ money to individual researchers, research teams, universities, and research institutes each year. Taxpayers accordingly expect that governments and funding agencies will spend their money prudently and efficiently.

Investing money to the greatest effect is not a challenge unique to research funding agencies and there are many strategies and schemes to choose from. Nevertheless, most funders rely on a tried and tested method in line with the tradition of the scientific community: the peer review of individual proposals to identify the most promising projects for funding. This method has been considered the gold standard for assessing the scientific value of research projects essentially since the end of the Second World War.

However, there is mounting critique of the use of peer review to direct research funding. High on the list of complaints is the cost, both in terms of time and money. In 2012, for example, NSF convened more than 17,000 scientists to review 53,556 proposals [1]. Reviewers generally spend a considerable time and effort to assess and rate proposals of which only a minority can eventually get funded. Of course, such a high rejection rate is also frustrating for the applicants. Scientists spend an increasing amount of time writing and submitting grant proposals. Overall, the scientific community invests an extraordinary amount of time, energy, and effort into the writing and reviewing of research proposals, most of which end up not getting funded at all. This time would be better invested in conducting the research in the first place.

Peer review may also be subject to biases, inconsistencies, and oversights. The need for review panels to reach consensus may lead to sub‐optimal decisions owing to the inherently stochastic nature of the peer review process. Moreover, in a period where the money available to fund research is shrinking, reviewers may tend to “play it safe” and select proposals that have a high chance of producing results, rather than more challenging and ambitious projects. Additionally, the structuring of funding around calls‐for‐proposals to address specific topics might inhibit serendipitous discovery, as scientists work on problems for which funding happens to be available rather than trying to solve more challenging problems.

The scientific community holds peer review in high regard, but it may not actually be the best possible system for identifying and supporting promising science. Many proposals have been made to reform funding systems, ranging from incremental changes to peer review—including careful selection of reviewers [2] and post‐hoc normalization of reviews [3]—to more radical proposals such as opening up review to the entire online population [4] or removing human reviewers altogether by allocating funds through an objective performance measure [5].

We would like to add another alternative inspired by the mathematical models used to search the internet for relevant information: a highly decentralized funding model in which the wisdom of the entire scientific community is leveraged to determine a fair distribution of funding. It would still require human insight and decision‐making, but it would drastically reduce the overhead costs and may alleviate many of the issues and inefficiencies of the proposal submission and peer review system, such as bias, “playing it safe”, or reluctance to support curiosity‐driven research.

Our proposed system would require funding agencies to give all scientists within their remit an unconditional, equal amount of money each year. However, each scientist would then be required to pass on a fixed percentage of their previous year’s funding to other scientists whom they think would make best use of the money (Fig 1). Every year, then, scientists would receive a fixed basic grant from their funding agency combined with an elective amount of funding donated by their peers. As a result of each scientist having to distribute a given percentage of their previous year’s budget to other scientists, money would flow through the scientific community. Scientists who are generally anticipated to make the best use of funding will accumulate more.”
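The redistribution rule described in the excerpt above can be sketched as a small simulation. This is an illustrative sketch under stated assumptions, not the paper's own implementation: the function name `simulate_funding`, the base grant of 100, the 50% pass-on fraction, and the `preferences` matrix (how each scientist splits their donations) are all hypothetical. "Funding" here means the amount a scientist receives in a year (base grant plus peer donations); how much remains to spend after passing money on is not modeled.

```python
def simulate_funding(base_grant, fraction, preferences, years):
    """Simulate yearly funding under the collective allocation rule.

    base_grant: equal, unconditional amount every scientist gets each year.
    fraction: share of the previous year's funding each scientist must pass on.
    preferences[i][j]: share of scientist i's donations that goes to scientist j
        (hypothetical input; each row sums to 1 and preferences[i][i] is 0).
    Returns the funding received by each scientist in the final simulated year.
    """
    n = len(preferences)
    # Year 0: nobody has donated yet, so everyone holds only the base grant.
    funding = [float(base_grant)] * n
    for _ in range(years):
        received = [0.0] * n
        for i in range(n):
            pool = fraction * funding[i]  # amount scientist i must pass on
            for j in range(n):
                received[j] += pool * preferences[i][j]
        # Each year: fixed base grant plus whatever peers donated.
        funding = [base_grant + received[j] for j in range(n)]
    return funding


# Three hypothetical scientists: 1 and 2 give everything to scientist 0,
# who splits donations evenly between them.
prefs = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
print(simulate_funding(100, 0.5, prefs, years=1))  # [200.0, 125.0, 125.0]
```

In this toy run the scientist favored by peers accumulates more funding than the others, which matches the dynamic the authors describe.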