Book by Joshua D. Angrist & Jörn-Steffen Pischke: “Applied econometrics, known to aficionados as ‘metrics, is the original data science. ‘Metrics encompasses the statistical methods economists use to untangle cause and effect in human affairs. Through accessible discussion and with a dose of kung fu–themed humor, Mastering ‘Metrics presents the essential tools of econometric research and demonstrates why econometrics is exciting and useful.
The five most valuable econometric methods, or what the authors call the Furious Five – random assignment, regression, instrumental variables, regression discontinuity designs, and differences-in-differences – are illustrated through well-crafted real-world examples (vetted for awesomeness by Kung Fu Panda’s Jade Palace). Does health insurance make you healthier? Randomized experiments provide answers. Are expensive private colleges and selective public high schools better than more pedestrian institutions? Regression analysis and a regression discontinuity design reveal the surprising truth. When private banks teeter, and depositors take their money and run, should central banks step in to save them? Differences-in-differences analysis of a Depression-era banking crisis offers a response. Could arresting O. J. Simpson have saved his ex-wife’s life? Instrumental variables methods instruct law enforcement authorities in how best to respond to domestic abuse….(More).”
People around you control your mind: The latest evidence
Washington Post: “…That’s the power of peer pressure. In a recent working paper, Pedro Gardete looked at 65,525 transactions across 1,966 flights and more than 257,000 passengers. He parsed the data into thousands of mini-experiments such as this:
If someone beside you ordered a snack or a film, Gardete was able to see whether you later did, too. In this natural experiment, the person sitting directly in front of you was the control subject. Purchases were made on a touchscreen; that person wouldn’t have been able to see anything. If you bought something, and the person in front of you didn’t, peer pressure may have been the reason.
Because he had reservation data, Gardete could exclude people flying together, and he controlled for all kinds of other factors such as seat choice. This is purely the effect of a stranger’s choice — not just that, but a stranger whom you might be resenting because he is sitting next to you, and this is a plane.
By adding up thousands of these little experiments, Gardete, an assistant professor of marketing at Stanford, came up with an estimate. On average, people bought stuff 15 to 16 percent of the time. But if you saw someone next to you order something, your chances of buying something, too, jumped by 30 percent, or about four percentage points…
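To make that arithmetic concrete, here is a minimal, purely illustrative sketch (the purchase model and numbers below are made up for illustration, not Gardete's actual data or model): a roughly 15.5 percent baseline purchase rate, increased by 30 percent when the neighbor buys, works out to an extra four to five percentage points.

```python
import random

random.seed(0)

# Hypothetical purchase model: ~15.5% baseline purchase rate, with a ~30%
# relative increase (about 4-5 percentage points) when the seatmate buys.
def buys_something(neighbor_bought):
    base_rate = 0.155
    rate = base_rate * 1.30 if neighbor_bought else base_rate
    return random.random() < rate

n = 200_000
exposed = sum(buys_something(neighbor_bought=True) for _ in range(n)) / n
control = sum(buys_something(neighbor_bought=False) for _ in range(n)) / n

print(f"purchase rate, neighbor bought:      {exposed:.3f}")  # ~0.20
print(f"purchase rate, no neighbor purchase: {control:.3f}")  # ~0.155
print(f"difference (percentage points):      {100 * (exposed - control):.1f}")
```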
The beauty of this paper is that it looks at social influences in a controlled situation. (What’s more of a trap than an airplane seat?) These natural experiments are hard to come by.
Economists and social scientists have long wondered about the power of peer pressure, but it’s one of the trickiest research problems….(More)”.
Uncle Sam Wants You…To Crowdsource Science
Neal Ungerleider at Co-Labs: “It’s not just for the private sector anymore: Government scientists are embracing crowdsourcing. At a White House-sponsored workshop in late November, representatives from more than 20 different federal agencies gathered to figure out how to integrate crowdsourcing and citizen scientists into various government efforts. The workshop is part of a bigger effort with a lofty goal: Building a set of best practices for the thousands of citizens who are helping federal agencies gather data, from the Environmental Protection Agency (EPA) to NASA….Perhaps the best-known federal government crowdsourcing project is Nature’s Notebook, a collaboration between the U.S. Geological Survey and the National Park Service, which asks ordinary citizens to take notes on plant and animal species during different times of year. These notes are then cleansed and collated into a massive database on animal and plant phenology that’s used for decision-making by national and local governments. The bulk of the observations, recorded through smartphone apps, are made by ordinary people who spend a lot of time outdoors….Dozens of government agencies are now asking the public for help. The Centers for Disease Control and Prevention runs a student-oriented, Mechanical Turk-style “micro-volunteering” service called CDCology, the VA crowdsources design of apps for homeless veterans, while the National Weather Service distributes a mobile app called mPING that asks ordinary citizens to help fine-tune public weather reports by giving information on local conditions. The Federal Communications Commission’s Measuring Broadband America app, meanwhile, allows citizens to volunteer information on their Internet broadband speeds, and the Environmental Protection Agency’s Air Sensor Toolbox asks users to track local air pollution….
As of now, however, when it comes to crowdsourcing data for government scientific research, there’s no unified set of standards or best practices. This can lead to wild variations in how various agencies collect data and use it. For officials hoping to implement citizen science projects within government, the roadblocks to crowdsourcing include factors that crowdsourcing is intended to avoid: limited budgets, heavy bureaucracy, and superiors who are skeptical about the value of relying on the crowd for data.
Benforado and Shanley also pointed out that government agencies are subject to additional regulations, such as the Paperwork Reduction Act, which can make implementation of crowdsourcing projects more challenging than they would be in academia or the private sector… (More)”
Big Data, Machine Learning, and the Social Sciences: Fairness, Accountability, and Transparency
Medium: “…So why, then, does granular, social data make people uncomfortable? Well, ultimately—and at the risk of stating the obvious—it’s because data of this sort brings up issues regarding ethics, privacy, bias, fairness, and inclusion. In turn, these issues make people uncomfortable because, at least as the popular narrative goes, these are new issues that fall outside the expertise of those aggregating and analyzing big data. But the thing is, these issues aren’t actually new. Sure, they may be new to computer scientists and software engineers, but they’re not new to social scientists.
This is why I think the world of big data and those working in it — ranging from the machine learning researchers developing new analysis tools all the way up to the end-users and decision-makers in government and industry — can learn something from computational social science….
So, if technology companies and government organizations — the biggest players in the big data game — are going to take issues like bias, fairness, and inclusion seriously, they need to hire social scientists — the people with the best training in thinking about important societal issues. Moreover, it’s important that this hiring is done not just in a token, “hire one social scientist for every hundred computer scientists” kind of way, but in a serious, “creating interdisciplinary teams” kind of way.
While preparing for my talk, I read an article by Moritz Hardt, entitled “How Big Data is Unfair.” In this article, Moritz notes that even in supposedly large data sets, there is always proportionally less data available about minorities. Moreover, statistical patterns that hold for the majority may be invalid for a given minority group. He gives, as an example, the task of classifying user names as “real” or “fake.” In one culture — comprising the majority of the training data — real names might be short and common, while in another they might be long and unique. As a result, the classic machine learning objective of “good performance on average,” may actually be detrimental to those in the minority group….
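To see why averaging can hide this, here is a tiny illustrative calculation (the group shares and per-group accuracies are assumed numbers for illustration, not from Hardt's article): a classifier that serves the majority well can post a high overall accuracy even while performing poorly on the minority group.

```python
# Assumed composition of the data: 95% majority-culture names, 5% minority.
majority_share, minority_share = 0.95, 0.05

# Assumed per-group accuracies for a name classifier tuned to the majority's
# naming pattern (short, common names).
majority_accuracy = 0.97
minority_accuracy = 0.60

# "Good performance on average" blends the two, weighted by group size.
overall_accuracy = (majority_share * majority_accuracy
                    + minority_share * minority_accuracy)

print(f"overall accuracy:        {overall_accuracy:.3f}")   # ~0.95, looks fine on average
print(f"minority-group accuracy: {minority_accuracy:.3f}")  # 0.60, poorly served
```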
As an alternative, I would advocate prioritizing vital social questions over data availability — an approach more common in the social sciences. Moreover, if we’re prioritizing social questions, perhaps we should take this as an opportunity to prioritize those questions explicitly related to minorities and bias, fairness, and inclusion. Of course, putting questions first — especially questions about minorities, for whom there may not be much available data — means that we’ll need to go beyond standard convenience data sets and general-purpose “hammer” methods. Instead we’ll need to think hard about how best to instrument data aggregation and curation mechanisms that, when combined with precise, targeted models and tools, are capable of elucidating fine-grained, hard-to-see patterns….(More).”
MIT to Pioneer Science of Innovation
Irving Wladawsky-Berger in the Wall Street Journal: ““Innovation – identified by MIT economist and Nobel laureate Robert Solow as the driver of long-term, sustainable economic growth and prosperity – has been a hallmark of the Massachusetts Institute of Technology since its inception.” Thus starts The MIT Innovation Initiative: Sustaining and Extending a Legacy of Innovation, the preliminary report of a yearlong effort to define the innovation needed to address some of the world’s most challenging problems. Released earlier this month, the report was developed by the MIT Innovation Initiative, launched a year ago by MIT President Rafael Reif…. Its recommendations are focused on four key priorities.
Strengthen and expand idea-to-impact education and research. Students are asking for career preparation that enables them to make a positive difference early in their careers. Twenty percent of incoming students say that they want to launch a company or NGO during their undergraduate years…
The report includes a number of specific ideas-to-impact recommendations. In education, they include new undergraduate minor programs focused on the engineering, scientific, economic and social dimensions of innovation projects. In research, it calls for supplementing research activities with specific programs designed to extend the work beyond publication with practical solutions, including proof-of-concept grants.
Extend innovation communities. Conversations with students, faculty and other stakeholders uncovered that the process of engaging with MIT’s innovation programs and activities is somewhat fragmented. The report proposes tighter integration and improved coordination with three key types of communities:
- Students and postdocs with shared interests in innovation, including links to appropriate mentors;
- External partners, focused on linking the MIT groups more closely to corporate partners and entrepreneurs; and
- Global communities focused on linking MIT with key stakeholders in innovation hubs around the world.
Enhance innovation infrastructures. The report includes a number of recommendations for revitalizing innovation-centric infrastructures in four key areas…..
Pioneer the development of the Science of Innovation. In my opinion, the report’s most important and far-reaching recommendation calls for MIT to create a new Laboratory for Innovation Science and Policy –…”
Climaps
Climaps: “This website presents the results of the EU research project EMAPS, as well as its process: an experiment to use computation and visualization to harness the increasing availability of digital data and mobilize it for public debate. To do so, EMAPS gathered a team of social and data scientists, climate experts and information designers. It also reached out beyond the walls of Academia and engaged with the actors of the climate debate.
Geneticists Begin Tests of an Internet for DNA
Antonio Regalado in MIT Technology Review: “A coalition of geneticists and computer programmers calling itself the Global Alliance for Genomics and Health is developing protocols for exchanging DNA information across the Internet. The researchers hope their work could be as important to medical science as HTTP, the protocol created by Tim Berners-Lee in 1989, was to the Web.
One of the group’s first demonstration projects is a simple search engine that combs through the DNA letters of thousands of human genomes stored at nine locations, including Google’s server farms and the University of Leicester, in the U.K. According to the group, which includes key players in the Human Genome Project, the search engine is the start of a kind of Internet of DNA that may eventually link millions of genomes together.
The technologies being developed are application program interfaces, or APIs, that let different gene databases communicate. Pooling information could speed discoveries about what genes do and help doctors diagnose rare birth defects by matching children with suspected gene mutations to others who are known to have them.
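As a rough sketch of what such an API might look like from a client's point of view (the endpoint URL, parameter names, and response fields below are hypothetical placeholders, not the alliance's actual specification), a query could ask a remote genome database whether any of its genomes carry a particular variant, returning a yes/no answer rather than the underlying individual-level data:

```python
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint for a genome database that answers variant queries.
BEACON_URL = "https://genomes.example.org/query"

# Ask: does any genome in this database carry base "T" at this position?
params = {
    "assembly": "GRCh37",   # reference genome build
    "chromosome": "17",     # chromosome to search
    "position": 41246747,   # coordinate of the variant
    "alternateBase": "T",   # allele being asked about
}

request_url = BEACON_URL + "?" + urllib.parse.urlencode(params)
with urllib.request.urlopen(request_url) as response:
    result = json.load(response)

# A minimal response might just confirm existence, without revealing whose
# genome it is or exposing the consented data behind it.
print("variant present in this database:", result.get("exists"))
```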
The alliance was conceived two years ago at a meeting in New York of 50 scientists who were concerned that genome data was trapped in private databases, tied down by legal consent agreements with patients, limited by privacy rules, or jealously controlled by scientists to further their own scientific work. It styles itself after the World Wide Web Consortium, or W3C, a body that oversees standards for the Web.
“It’s creating the Internet language to exchange genetic information,” says David Haussler, scientific director of the genome institute at the University of California, Santa Cruz, who is one of the group’s leaders.
The group began releasing software this year. Its hope—as yet largely unrealized—is that any scientist will be able to ask questions about genome data possessed by other laboratories, without running afoul of technical barriers or privacy rules….(More)”
Just say no to digital hoarding
Dominic Basulto at the Washington Post: “We have become a nation of digital hoarders. We save everything, even stuff that we know, deep down, we’ll never need or be able to find. We save every e-mail, every photo, every file, every text message and every video clip. If we don’t have enough space on our mobile devices, we move it to a different storage device, maybe even a hard drive or a flash drive. Or, better yet, we just move it to “the cloud.”….
If this were simply a result of the exponential growth of information — the “information overload” — that would be one thing. That’s what technology is supposed to do for us – provide new ways of creating, storing and manipulating information. Innovation, from this perspective, can be viewed as technology’s frantic quest to keep up with society’s information needs.
But digital hoarding is about something much different – it’s about hoarding data for the sake of data. When Apple creates a new “Burst Mode” on the iPhone 5s, enabling you to rapidly save a series of up to 10 photos in succession – and you save all of them – is that not an example of hoarding? When you save every e-book, every movie and every TV season that you’ve “binge-watched” on your tablet or other digital device — isn’t that another symptom of being a digital hoarder? In the analog era, you would have donated used books to charity, hosted a garage sale to get rid of old albums you never listen to, or simply dumped these items in the trash.
You may not think you are a digital hoarder. You may think that the desire to save each and every photo, e-mail or file is something relatively harmless. Storage is cheap and abundant, right? You may watch a reality TV show such as “Hoarders” and think to yourself, “That’s not me.” But maybe it is you. (Especially if you still have those old episodes of “Hoarders” on your digital device.)
Unlike hoarding in the real world — where massive stacks of papers, books, clothing and assorted junk might physically obstruct your ability to move and signal to others that you need help – there are no obvious outward signs of being a digital hoarder. And, in fact, owning the newest, super-slim 128GB tablet capable of hoarding more information than anyone else strikes many as being progressive. However, if you are constantly increasing the size of your data plan or buying new digital devices with ever more storage capacity, you just might be a digital hoarder…
In short, innovation should be about helping us transform data into information. “Search” was perhaps the first major innovation that helped us transform data into information. The “cloud” is currently the innovation that has the potential to organize our data better and more efficiently, keeping it from clogging up our digital devices. The next big innovation may be “big data,” which claims that it can make sense of all the new data we’re creating. This may be either brilliant — helping us find the proverbial needle in the digital haystack — or disastrous — encouraging us to build bigger and bigger haystacks in the hope that there’s a needle in there somewhere… (More).”
Climate Resilience Toolkit
“Meeting the challenges of a changing climate
The U.S. Climate Resilience Toolkit provides scientific tools, information, and expertise to help people manage their climate-related risks and opportunities, and improve their resilience to extreme events. The site is designed to serve interested citizens, communities, businesses, resource managers, planners, and policy leaders at all levels of government.
A climate-smart approach to taking action
In response to the President’s Climate Action Plan and Executive Order to help the nation prepare for climate-related changes and impacts, U.S. federal government agencies gathered resources that can help people take action to build their climate resilience. The impacts of climate change—including higher temperatures, heavier downpours, more frequent and intense droughts, wildfires, and floods, and sea level rise—are affecting communities, businesses, and natural resources across the nation.
Now is the time to act. For some, taking a business-as-usual approach has become more risky than taking steps to build their climate resilience. People who recognize they are vulnerable to climate variability and change can work to reduce their vulnerabilities, and find win-win opportunities that simultaneously boost local economies, create new jobs, and improve the health of ecosystems. This is a climate-smart approach—investing in activities that build resilience and capacity while reducing risk.
What’s in the Toolkit? How can it help?
Using plain language, the U.S. Climate Resilience Toolkit helps people face climate problems and find climate opportunities. The site offers:
- Steps to Resilience—a five-step process you can follow to initiate, plan, and implement projects to become more resilient to climate-related hazards.
- Taking Action stories—real-world case studies describing climate-related risks and opportunities that communities and businesses face, steps they’re taking to plan and respond, and tools and techniques they’re using to improve resilience.
- A catalog of freely available Tools for accessing and analyzing climate data, generating visualizations, exploring climate projections, estimating hazards, and engaging stakeholders in resilience-building efforts.
- Climate Explorer—a visualization tool that offers maps of climate stressors and impacts as well as interactive graphs showing daily observations and long-term averages from thousands of weather stations.
- Topic narratives that explain how climate variability and change can impact particular regions of the country and sectors of society.
- Pointers to free, federally developed training courses that can build skills for using climate tools and data.
- Maps highlighting the locations of centers where federal and state agencies can provide regional climate information.
- The ability to Search the entire federal government’s climate science domain and filter results according to your interests.”
Opening Government: Designing Open Innovation Processes to Collaborate With External Problem Solvers
New paper by Ines Mergel in Social Science Computer Review: “Open government initiatives in the U.S. government focus on three main aspects: transparency, participation, and collaboration. Especially the collaboration mandate is relatively unexplored in the literature. In practice, government organizations recognize the need to include external problem solvers into their internal innovation creation processes. This is partly derived from a sense of urgency to improve the efficiency and quality of government service delivery. Another formal driver is the America Competes Act that instructs agencies to search for opportunities to meaningfully promote excellence in technology, education, and science. Government agencies are responding to these requirements by using open innovation (OI) approaches to invite citizens to crowdsource and peer produce solutions to public management problems. These distributed innovation processes occur at all levels of the U.S. government and it is important to understand what design elements are used to create innovative public management ideas. This article systematically reviews existing government crowdsourcing and peer production initiatives and shows that after agencies have defined their public management problem, they go through four different phases of the OI process: (1) idea generation through crowdsourcing, (2) incubation of submitted ideas with peer voting and collaborative improvements of favorite solutions, (3) validation with a proof of concept of implementation possibilities, and (4) reveal of the selected solution and the (internal) implementation of the winning idea. Participation and engagement are incentivized both with monetary and nonmonetary rewards, which lead to tangible solutions as well as intangible innovation outcomes, such as increased public awareness.”