
Stefaan Verhulst

New paper by Enric Junqué de Fortuny, David Martens, and Foster Provost in Big Data: “With the increasingly widespread collection and processing of “big data,” there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data—such as data on low-level human behavior—we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets—plus the skill to take advantage of them—potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.”
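
The abstract closes by pointing to a Bernoulli Naïve Bayes implementation built for massive, sparse data. As a rough illustration of why that combination scales (a minimal sketch, not the authors' code; the class name and Laplace smoothing choice are assumptions), the per-class log-likelihood can be folded into a single sparse matrix product so that scoring touches only the non-zero feature values:

```python
import numpy as np
from scipy import sparse

class SparseBernoulliNB:
    """Multivariate Bernoulli Naive Bayes sketch for binary sparse data.

    Not the paper's implementation: log P(x | c) is rewritten so that
    prediction reduces to one sparse matrix product over non-zero entries.
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing

    def fit(self, X, y):
        X = sparse.csr_matrix(X)
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        weights, offsets, log_prior = [], [], []
        for c in self.classes_:
            rows = X[y == c]
            n_c = rows.shape[0]
            ones = np.asarray(rows.sum(axis=0)).ravel()       # per-feature "on" counts
            p = (ones + self.alpha) / (n_c + 2.0 * self.alpha)
            weights.append(np.log(p) - np.log1p(-p))          # log p - log(1 - p)
            offsets.append(np.log1p(-p).sum())                # sum_j log(1 - p)
            log_prior.append(np.log(n_c / len(y)))
        self.W_ = np.vstack(weights)                          # (n_classes, n_features)
        self.b_ = np.asarray(offsets) + np.asarray(log_prior)
        return self

    def predict(self, X):
        # Sparse product: cost grows with non-zero entries, not feature count.
        scores = np.asarray(sparse.csr_matrix(X) @ self.W_.T) + self.b_
        return self.classes_[scores.argmax(axis=1)]
```

Because the constant term (the sum of log(1 - p) over all features) is precomputed per class, the cost of scoring grows with the number of non-zero entries rather than with the full, very high-dimensional feature space.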

Predictive Modeling With Big Data: Is Bigger Really Better?

John Podesta at the White House blog: “Last Friday, the President spoke to the American people, and the international community, about how to keep us safe from terrorism in a changing world while upholding America’s commitment to liberty and privacy that our values and Constitution require. Our national security challenges are real, but that is surely not the only space where changes in technology are altering the landscape and challenging conceptions of privacy.
That’s why in his speech, the President asked me to lead a comprehensive review of the way that “big data” will affect the way we live and work; the relationship between government and citizens; and how public and private sectors can spur innovation and maximize the opportunities and free flow of this information while minimizing the risks to privacy. I will be joined in this effort by Secretary of Commerce Penny Pritzker, Secretary of Energy Ernie Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Gene Sperling and other senior government officials.
I would like to explain a little bit more about the review, its scope, and what you can expect over the next 90 days.
We are undergoing a revolution in the way that information about our purchases, our conversations, our social networks, our movements, and even our physical identities are collected, stored, analyzed and used. The immense volume, diversity and potential value of data will have profound implications for privacy, the economy, and public policy. The working group will consider all those issues, and specifically how the present and future state of these technologies might motivate changes in our policies across a range of sectors.
When we complete our work, we expect to deliver to the President a report that anticipates future technological trends and frames the key questions that the collection, availability, and use of “big data” raise – both for our government, and the nation as a whole. It will help identify technological changes to watch, assess whether those changes are addressed by the U.S.’s current policy framework, and highlight where further government action, funding, research, and consideration may be required.
This is going to be a collaborative effort. The President’s Council of Advisors on Science and Technology (PCAST) will conduct a study to explore in-depth the technological dimensions of the intersection of big data and privacy, which will feed into this broader effort. Our working group will consult with industry, civil liberties groups, technologists, privacy experts, international partners, and other national and local government officials on the significance of and future for these technologies. Finally, we will be working with a number of think tanks, academic institutions, and other organizations around the country as they convene stakeholders to discuss these very issues and questions. Likewise, many abroad are analyzing and responding to the challenge and seizing the opportunity of big data. These discussions will help to inform our study.
While we don’t expect to answer all these questions, or produce a comprehensive new policy in 90 days, we expect this work to serve as the foundation for a robust and forward-looking plan of action. Check back on this blog for updates on how you can get involved in the debate and for status updates on our progress.”

Big Data and the Future of Privacy

Joel Gurin in Information Week: “At the GovLab at New York University, where I am senior adviser, we’re taking a different approach than McKinsey’s to understand the evolving value of government open data: We’re studying open data companies from the ground up. I’m now leading the GovLab’s Open Data 500 project, funded by the John S. and James L. Knight Foundation, to identify and examine 500 American companies that use government open data as a key business resource.
Our preliminary results show that government open data is fueling companies both large and small, across the country, and in many sectors of the economy, including health, finance, education, energy, and more. But it’s not always easy to use this resource. Companies that use government open data tell us it is often incomplete, inaccurate, or trapped in hard-to-use systems and formats.
It will take a thorough and extended effort to make government data truly useful. Based on what we are hearing and the research I did for my book, here are some of the most important steps the federal government can take, starting now, to make it easier for companies to add economic value to the government’s data.
1. Improve data quality
The Open Data Policy not only directs federal agencies to release more open data; it also requires them to release information about data quality. Agencies will have to begin improving the quality of their data simply to avoid public embarrassment. We can hope and expect that they will do some data cleanup themselves, demand better data from the businesses they regulate, or use creative solutions like turning to crowdsourcing for help, as USAID did to improve geospatial data on its grantees.
 
 

2. Keep improving open data resources
The government has steadily made Data.gov, the central repository of federal open data, more accessible and useful, including a significant relaunch last week. To the agency’s credit, the GSA, which administers Data.gov, plans to keep working to make this key website still better. As part of implementing the Open Data Policy, the administration has also set up Project Open Data on GitHub, the world’s largest community for open-source software. These resources will be helpful for anyone working with open data either inside or outside of government. They need to be maintained and continually improved.
3. Pass DATA
The Digital Accountability and Transparency Act would bring transparency to federal government spending at an unprecedented level of detail. The Act has strong bipartisan support. It passed the House with only one dissenting vote and was unanimously approved by a Senate committee, but still needs full Senate approval and the President’s signature to become law. DATA is also supported by technology companies who see it as a source of new open data they can use in their businesses. Congress should move forward and pass DATA as the logical next step in the work that the Obama administration’s Open Data Policy has begun.
4. Reform the Freedom of Information Act
Since it was passed in 1966, the federal Freedom of Information Act has gone through two major revisions, both of which strengthened citizens’ ability to access many kinds of government data. It’s time for another step forward. Current legislative proposals would establish a centralized web portal for all federal FOIA requests, strengthen the FOIA ombudsman’s office, and require agencies to post more high-interest information online before they receive formal requests for it. These changes could make more information from FOIA requests available as open data.
5. Engage stakeholders in a genuine way
Up to now, the government’s release of open data has largely been a one-way affair: Agencies publish datasets that they hope will be useful, without consulting the organizations and companies that want to use them. Other countries, including the UK, France, and Mexico, are building in feedback loops from data users to government data providers, and the US should, too. The Open Data Policy calls for agencies to establish points of contact for public feedback. At the GovLab, we hope that the Open Data 500 will help move that process forward. Our research will provide a basis for new, productive dialogue between government agencies and the businesses that rely on them.
6. Keep using federal challenges to encourage innovation
The federal Challenge.gov website applies the best principles of crowdsourcing and collective intelligence. Agencies should use this approach extensively, and should pose challenges using the government’s open data resources to solve business, social, or scientific problems. Other approaches to citizen engagement, including federally sponsored hackathons and the White House Champions of Change program, can play a similar role.
Through the Open Data Policy and other initiatives, the Obama administration has set the right goals. Now it’s time to implement and move toward what US CTO Todd Park calls “data liberation.” Thousands of companies, organizations, and individuals will benefit.”

How Government Can Make Open Data Work

Press Release: “The Ash Center for Democratic Governance and Innovation at the John F. Kennedy School of Government at Harvard University today announced the U.S. General Services Administration’s (GSA) Challenge.gov as a winner of the 2013 Innovations in American Government Award from a pool of more than 600 applicants.
GSA launched Challenge.gov in July 2010 in response to an Obama Administration memo tasking the agency with building a platform that allowed entrepreneurs, innovators, and the public to compete for prestige and prizes by providing the government with novel solutions to tough problems. Challenge.gov was developed in partnership with New York City-based ChallengePost, the leading platform for software competitions and hackathons. Since its launch, Challenge.gov has been used by 59 federal agencies to crowdsource solutions and has received 3.5 million visits from 220 countries and territories and more than 11,000 U.S. cities. Challenge.gov has conducted nearly 300 scientific, engineering, design, multimedia, ideation, and software challenges, resulting in unprecedented public-private partnerships….
Examples of Challenge.gov competitions include a Robocall Challenge that has blocked 84,000 computer-driven advertising phone calls so far, a Disability Employment Apps Challenge that sought innovative technology tools to improve employment opportunities and outcomes for people with disabilities, and the Blue Button for All Americans Contest that helps veterans access their health information.
Established in 1985 at Harvard University by the Ford Foundation, the Innovations in American Government Award Program has honored nearly 200 federal, state, local, and tribal government agencies. The Innovations Award Program provides concrete evidence that government can work to improve the quality of life of citizens. Many award-winning programs have been replicated across jurisdictions and policy areas, and some have served as harbingers of today’s reform strategies or as forerunners to state and federal legislation. By highlighting exemplary models of government’s innovative programs for more than 20 years, the Innovations Award Program drives continued progress and encourages research and teaching cases at Harvard University and other academic institutions worldwide. Nominations for the next Innovations in American Government Awards competition may be submitted at www.innovationsaward.harvard.edu.”

GSA’s Challenge.gov Earns Harvard Innovation Award

Allison Schrager: “I have a confession to make: I think behavioral economics is over-rated. Recently, Nobelist Robert Shiller called on economists to incorporate more psychology into their work. While there are certainly things economists can learn from psychology and other disciplines to enrich their understanding of the economy, this approach is not a revolution in economics. Often models that incorporate richer aspects of human behavior are the same models economists always use—they simply rationalize seemingly irrational behavior. Even if we can understand why people don’t always act rationally, it’s not clear if that can lead to better economic policy and regulation.

Mixing behavioral economics and policy raises two questions: should we change behavior, and if so, can we? Sometimes people make bad choices—they under-save, take on too much debt or risk. These behaviors appear irrational and lead to bad outcomes, which would seem to demand more regulation. But if these choices reflect individuals’ preferences and values, can we justify changing their behavior? Part of a free society is letting people make bad choices, as long as their irrational economic behavior doesn’t impose costs on others. For example: someone who under-saves may wind up dependent on taxpayers for financial support. High household debt has been associated with a weaker economy.

It’s been argued that irrational economic behavior merits regulation to encourage or force choices that will benefit both the individual and the economy as a whole. But the limits of these policies are apparent in a new OECD report on the application of behavioral economics to policy. The report gives examples of regulations adopted by different OECD countries that draw on insights from behavioral economics. Thus it’s disappointing that, with all that economists have learned from studying behavioral economics over the last ten years, the big changes in regulation seem limited to more transparent fee disclosure, a ban on automatically selling people more goods than they explicitly ask for, and standard disclosures of fees and energy use. These are certainly good policies. But is this a result of behavioral economics (helping consumers overcome behavioral biases that lead to sub-optimal choices) or is it simply requiring banks and merchants to be more honest?

Poor risk management and short-term thinking on Wall Street nearly took down the entire financial system. Can what we know about behavioral finance help regulate Wall Street? According to Shiller, markets are inefficient and misprice assets because of behavioral biases (over-confidence, under-reaction to news, home bias). This leads to speculative bubbles. But it’s not clear what financial regulation can do to curb this behavior. According to Gene Fama, Shiller’s co-laureate, who believes markets are rational (disclosure: I used to work at Dimensional Fund Advisors, where Fama is a consultant and shareholder), it’s not possible to systematically separate “irrational” behavior (which distorts prices) from healthy speculation, which aids price discovery. If speculators (who have an enormous financial interest) don’t know better, how can we expect regulators to?…

So far, the most promising use of behavioral economics has been in retirement saving. Automatically enrolling people into a company pension plan and raising their saving rates has been found to increase savings—especially among people not inclined to save. That is probably why the OECD report concedes behavioral economics has had the biggest impact in retirement saving….

The OECD report cites some other new policies based on behavioral science, like the 2009 CARD Act in America. Credit card statements used to list only the minimum required payment, which people may have interpreted as a suggested payment plan, and so wound up taking years to pay off their balance, incurring large fees. Now, in the US, statements must include how much it will cost to pay your balance within 36 months, as well as the time and cost required to repay your balance if you pay only the minimum. It’s still too early to see how this will affect behavior, but a 2013 study suggests it will offer modest savings to consumers, perhaps because the bias to undervalue the future still exists.
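
To make the arithmetic behind those two disclosures concrete, here is a toy calculation. The balance, APR, and minimum-payment rule below are illustrative assumptions, not any issuer's actual formula:

```python
# Hypothetical illustration of the two figures the CARD Act puts on a statement:
# payoff time and cost at the minimum payment vs. a fixed 36-month plan.

def minimum_payment_path(balance, apr, floor=25.0, pct=0.01):
    """Simulate paying an assumed minimum: 1% of balance plus interest, $25 floor."""
    r = apr / 12.0
    months, paid = 0, 0.0
    while balance > 0:
        interest = balance * r
        payment = max(floor, pct * balance + interest)
        payment = min(payment, balance + interest)   # final partial payment
        balance = balance + interest - payment
        paid += payment
        months += 1
    return months, paid

def fixed_term_payment(balance, apr, months=36):
    """Standard annuity payment that clears the balance in a fixed number of months."""
    r = apr / 12.0
    pmt = balance * r / (1.0 - (1.0 + r) ** -months)
    return pmt, pmt * months

if __name__ == "__main__":
    balance, apr = 3000.0, 0.18                      # made-up balance and APR
    m, total_min = minimum_payment_path(balance, apr)
    pmt36, total36 = fixed_term_payment(balance, apr)
    print(f"Minimum payments: {m} months (~{m/12:.0f} years), ${total_min:,.0f} paid in total")
    print(f"36-month plan:    ${pmt36:,.0f}/month, ${total36:,.0f} paid in total")
```

With these made-up numbers the minimum-payment path stretches over more than a decade and costs substantially more overall, which is exactly the contrast the statement disclosure is meant to surface.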

But what’s striking from the OECD report is, when it comes to behavioral biases that contributed to the financial crisis (speculation on housing, too much housing debt, under-estimating risk), few policies have used what we’ve learned.”

Don’t believe the hype about behavioral economics

World Bank Group: “The World Bank Group today launched a new open data tool that provides in-depth, comparative, and easily accessible data on education policies around the world. The Systems Approach for Better Education Results (SABER) web tool helps countries collect and analyze information on their education policies, benchmark themselves against other countries, and prioritize areas for reform, with the goal of ensuring that all children and youth go to school and learn….
To date, the Bank Group, through SABER, has analyzed more than 100 countries to guide more effective reforms and investments in education at all levels, from pre-primary to tertiary education and workforce development.
Through SABER, the Bank Group aims to improve education quality by supplying policymakers, civil society, school administrators, teachers, parents, and students with more, and more meaningful, data about key education policy areas, including early childhood development, student assessment, teachers, school autonomy and accountability, and workforce development, among others.
SABER helps countries improve their education systems in three ways:

  1. Providing new data on policies and institutions. SABER collects comparable country data on education policies and institutions that are publicly available at: http://worldbank.org/education/saber, allowing governments, researchers, and other stakeholders to measure and monitor progress.
  2. Benchmarking education policies and institutions. Each policy area is rated on a four-point scale, from “Latent” to “Emerging” to “Established” and “Advanced.” These ratings highlight a country’s areas of strength and weakness while promoting cross-country learning.
  3. Highlighting key policy choices. SABER data collection and analysis produce an objective snapshot of how well a country’s education system is performing in relation to global good practice. This helps highlight the most important policy choices to spur learning.”

New Open Data Tool Helps Countries Compare Progress on Education

FierceGovernmentIT: “A changing set of counter-nuclear proliferation problems requires a paradigm shift in monitoring that should include big data analytics and crowdsourcing, says a report from the Defense Science Board.
Much has changed since the Cold War when it comes to ensuring that nuclear weapons are subject to international controls, meaning that monitoring in support of treaties covering declared capabilities should be only one part of overall U.S. monitoring efforts, says the board in a January report (.pdf).
There are challenges related to covert operations, such as testing calibrated to fall below detection thresholds, and non-traditional technologies that present ambiguous threat signatures. Knowledge about how to make nuclear weapons is widespread and in the hands of actors who will give the United States or its allies limited or no access….
The report recommends using a slew of technologies including radiation sensors, but also exploitation of digital sources of information.
“Data gathered from the cyber domain establishes a rich and exploitable source for determining activities of individuals, groups and organizations needed to participate in either the procurement or development of a nuclear device,” it says.
Big data analytics could be used to take advantage of the proliferation of potential data sources including commercial satellite imaging, social media and other online sources.
The report notes that the proliferation of readily available commercial satellite imagery has created concerns about the introduction of more noise than genuine signal. “On balance, however, it is the judgment from the task force that more information from remote sensing systems, both commercial and dedicated national assets, is better than less information,” it says.
In fact, the ready availability of commercial imagery should spur the government’s ability to find weak signals “even within the most cluttered and noisy environments.”
Crowdsourcing also holds potential, although the report again notes that nuclear proliferation analysis by non-governmental entities “will constrain the ability of the United States to keep its options open in dealing with potential violations.” The distinction between gathering information and making political judgments “will erode.”
An effort by Georgetown University students (reported in the Washington Post in 2011) to use open source data analyzing the network of tunnels used in China to hide its missile and nuclear arsenal provides a proof-of-concept on how crowdsourcing can be used to augment limited analytical capacity, the report says – despite debate on the students’ work, which concluded that China’s arsenal could be many times larger than conventionally accepted…
For more:
download the DSB report, “Assessment of Nuclear Monitoring and Verification Technologies” (.pdf)
read the WaPo article on the Georgetown University crowdsourcing effort”

Use big data and crowdsourcing to detect nuclear proliferation, says DSB

Brian Wampler and Mike Touchton in the Washington Post: “Over the past 20 years, “participatory institutions” have spread around the world. Participatory institutions delegate decision-making authority directly to citizens, often in local politics, and have attracted widespread support. International organizations, such as the World Bank and USAID, promote citizen participation in hopes that it will generate more accountable governments, strengthen social networks, improve public services, and inform voters. Elected officials often support citizen participation because it provides them with the legitimacy necessary to alter spending patterns, develop new programs, mobilize citizens, or open murky policymaking processes to greater public scrutiny. Civil society organizations and citizens support participatory institutions because they get unprecedented access to policymaking venues, public budgets, and government officials.
But do participatory institutions actually achieve any of these beneficial outcomes?  In a new study of participatory institutions in Brazil, we find that they do.  In particular, we find that municipalities with participatory programs improve the lives of their citizens.
Brazil is a leading innovator in participatory institutions. Brazilian municipal governments can voluntarily adopt a program known as Participatory Budgeting. This program directly incorporates citizens into public meetings where citizens decide how to allocate public funds. The funding amounts can represent up to 100 percent of all new capital spending projects and generally fall between 5 and 15 percent of the total municipal budget.  This is not enough to radically change how cities spend limited resources, but it is enough to generate meaningful change. For example, the Brazilian cities of Belo Horizonte and Porto Alegre have each spent hundreds of millions of U.S. dollars over the past two decades on projects that citizens selected. Moreover, many Participatory Budgeting programs have an outsize impact because they focus resources on areas that have lower incomes and fewer public services.
Between 1990 and 2008, over 120 of Brazil’s largest 250 cities adopted Participatory Budgeting. In order to assess whether PB had an impact, we compared the number of cities that adopted Participatory Budgeting during each mayoral period to cities that did not adopt it, and accounted for a range of other factors that might distinguish these two groups of cities.
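The comparison described here is, in spirit, a regression of city-level outcomes on Participatory Budgeting adoption with controls. A minimal sketch on entirely synthetic data (the variables, sample size, and coefficients below are invented for illustration; the authors' actual estimation strategy is more careful) might look like this:

```python
# Synthetic sketch of an adoption-versus-outcome comparison with controls.
# All numbers are made up; only the shape of the analysis is illustrated.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 250  # hypothetical municipalities

log_income = rng.normal(9.0, 0.5, n)           # control: log income per capita
sanitation = rng.uniform(0.3, 0.95, n)         # control: share of households covered
pb_years = rng.choice([0, 0, 4, 8, 12], n)     # years using Participatory Budgeting

# Synthetic outcome: richer, better-served, longer-PB cities do better.
infant_mortality = (
    60 - 4.0 * log_income - 10.0 * sanitation - 0.4 * pb_years
    + rng.normal(0, 2.0, n)
)

df = pd.DataFrame(dict(infant_mortality=infant_mortality,
                       pb_years=pb_years,
                       log_income=log_income,
                       sanitation=sanitation))

# OLS with controls: the coefficient on pb_years plays the role of the PB effect.
fit = smf.ols("infant_mortality ~ pb_years + log_income + sanitation", df).fit()
print(fit.params["pb_years"], fit.bse["pb_years"])
```

The coefficient on pb_years is the quantity of interest once the control variables are held fixed; the study itself also accounts for political and economic factors not sketched here.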
The results are promising. Municipal governments that adopted Participatory Budgeting spent more on education and sanitation and saw infant mortality decrease as well. We estimate cities without PB to have infant mortality levels similar to Brazil’s mean. However, infant mortality drops by almost 20 percent for municipalities that have used PB for more than eight years — again, after accounting for other political and economic factors that might also influence infant mortality.  The evidence strongly suggests that the investment in these programs is paying important dividends. We are not alone in this conclusion: Sónia Gonçalves has reached similar conclusions about Participatory Budgeting in Brazil….
Our results also show that Participatory Budgeting’s influence strengthens over time, which indicates that its benefits do not merely result from governments making easy policy changes. Instead, Participatory Budgeting’s increasing impact indicates that governments, citizens, and civil society organizations are building new institutions that produce better forms of governance. These cities incorporate citizens at multiple moments of the policy process, allowing community leaders and public officials to exchange better information. The cities are also retraining policy experts and civil servants to better work with poor communities. Finally, public deliberation about spending priorities makes these city governments more transparent, which decreases corruption…”

Brazil let its citizens make decisions about city budgets. Here’s what happened.

New book by Claudio Cioffi-Revilla: “This reader-friendly textbook is the first work of its kind to provide a unified Introduction to Computational Social Science (CSS). Four distinct methodological approaches are examined in detail, namely automated social information extraction, social network analysis, social complexity theory and social simulation modeling. The coverage of these approaches is supported by a discussion of the historical context, as well as by a list of texts for further reading. Features: highlights the main theories of the CSS paradigm as causal explanatory frameworks that shed new light on the nature of human and social dynamics; explains how to distinguish and analyze the different levels of analysis of social complexity using computational approaches; discusses a number of methodological tools; presents the main classes of entities, objects and relations common to the computational analysis of social complexity; examines the interdisciplinary integration of knowledge in the context of social phenomena.”

Introduction to Computational Social Science: Principles and Applications

Special Report by Antonio Regalado in MIT Technology Review: “Back in 1956, an engineer and a mathematician, William Fair and Earl Isaac, pooled $800 to start a company. Their idea: a score to handicap whether a borrower would repay a loan.
It was all done with pen and paper. Income, gender, and occupation produced numbers that amounted to a prediction about a person’s behavior. By the 1980s the three-digit scores were calculated on computers and instead took account of a person’s actual credit history. Today, Fair Isaac Corp., or FICO, generates about 10 billion credit scores annually, recalculating them 50 times a year for many Americans.
This machinery hums in the background of our financial lives, so it’s easy to forget that the choice of whether to lend used to be made by a bank manager who knew a man by his handshake. Fair and Isaac understood that all this could change, and that their company didn’t merely sell numbers. “We sell a radically different way of making decisions that flies in the face of tradition,” Fair once said.
This anecdote suggests a way of understanding the era of “big data”—terabytes of information from sensors or social networks, new computer architectures, and clever software. But even supercharged data needs a job to do, and that job is always about a decision.
In this business report, MIT Technology Review explores a big question: how are data and the analytical tools to manipulate it changing decision making today? On Nasdaq, trading bots exchange a billion shares a day. Online, advertisers bid on hundreds of thousands of keywords a minute, in deals greased by heuristic solutions and optimization models rather than two-martini lunches. The number of variables and the speed and volume of transactions are just too much for human decision makers.
When there’s a person in the loop, technology takes a softer approach (see “Software That Augments Human Thinking”). Think of recommendation engines on the Web that suggest products to buy or friends to catch up with. This works because Internet companies maintain statistical models of each of us, our likes and habits, and use them to decide what we see. In this report, we check in with LinkedIn, which maintains the world’s largest database of résumés—more than 200 million of them. One of its newest offerings is University Pages, which crunches résumé data to offer students predictions about where they’ll end up working depending on what college they go to (see “LinkedIn Offers College Choices by the Numbers”).
These smart systems, and their impact, are prosaic next to what’s planned. Take IBM. The company is pouring $1 billion into its Watson computer system, the one that answered questions correctly on the game show Jeopardy! IBM now imagines computers that can carry on intelligent phone calls with customers, or provide expert recommendations after digesting doctors’ notes. IBM wants to provide “cognitive services”—computers that think, or seem to (see “Facing Doubters, IBM Expands Plans for Watson”).
Andrew Jennings, chief analytics officer for FICO, says automating human decisions is only half the story. Credit scores had another major impact. They gave lenders a new way to measure the state of their portfolios—and to adjust them by balancing riskier loan recipients with safer ones. Now, as other industries get exposed to predictive data, their approach to business strategy is changing, too. In this report, we look at one technique that’s spreading on the Web, called A/B testing. It’s a simple tactic—put up two versions of a Web page and see which one performs better (see “Seeking Edge, Websites Turn to Experiments” and “Startups Embrace a Way to Fail Fast”).
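As a toy version of that two-version comparison (the visitor and conversion counts below are hypothetical, and the two-proportion z-test is just one common way to read the result; production systems add sequential testing, segmentation, and guardrail metrics):

```python
# Compare conversion rates of two page variants with a two-proportion z-test.
from math import sqrt, erf

def two_proportion_ztest(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for the difference in rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))  # two-sided normal tail
    return z, p_value

# Hypothetical traffic: variant B converts 5.5% of visitors vs. 5.0% for variant A.
z, p = two_proportion_ztest(conv_a=500, n_a=10_000, conv_b=550, n_b=10_000)
print(f"lift: {550/10_000 - 500/10_000:.2%}, z = {z:.2f}, p = {p:.3f}")
```

With these particular counts the half-point lift is suggestive but not significant at conventional thresholds, which is why such experiments are usually run to a pre-committed sample size before a variant is declared the winner.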
Until recently, such optimization was practiced only by the largest Internet companies. Now, nearly any website can do it. Jennings calls this phenomenon “systematic experimentation” and says it will be a feature of the smartest companies. They will have teams constantly probing the world, trying to learn its shifting rules and deciding on strategies to adapt. “Winners and losers in analytic battles will not be determined simply by which organization has access to more data or which organization has more money,” Jennings has said.

Of course, there’s danger in letting the data decide too much. In this report, Duncan Watts, a Microsoft researcher specializing in social networks, outlines an approach to decision making that avoids the dangers of gut instinct as well as the pitfalls of slavishly obeying data. In short, Watts argues, businesses need to adopt the scientific method (see “Scientific Thinking in Business”).
To do that, they have been hiring a highly trained breed of business skeptics called data scientists. These are the people who create the databases, build the models, reveal the trends, and, increasingly, author the products. And their influence is growing in business. This could be why data science has been called “the sexiest job of the 21st century.” It’s not because mathematics or spreadsheets are particularly attractive. It’s because making decisions is powerful…”

The Power to Decide
