Science Now: “Crowdsourcing is the latest research rage—Kickstarter to raise funding, screen savers that number-crunch, and games to find patterns in data—but most efforts have been confined to the virtual lab of the Internet. In a new twist, researchers have now crowdsourced their experiments by connecting players of a video game to an actual biochemistry lab. The game, called EteRNA, allows players to remotely carry out real experiments to verify their predictions of how RNA molecules fold. The first big result: a study published this week in the Proceedings of the National Academy of Sciences, bearing the names of more than 37,000 authors—only 10 of them professional scientists. “It’s pretty amazing stuff,” says Erik Winfree, a biophysicist at the California Institute of Technology in Pasadena.
Some see EteRNA as a sign of the future for science, not only for crowdsourcing citizen scientists but also for giving them remote access to a real lab. “Cloud biochemistry,” as some call it, isn’t just inevitable, Winfree says: It’s already here. DNA sequencing, gene expression testing, and many biochemical assays are already outsourced to remote companies, and any “wet lab” experiment that can be automated will be automated, he says. “Then the scientists can focus on the non-boring part of their work.”
EteRNA grew out of an online video game called Foldit. Created in 2008 by a team led by David Baker and Zoran Popović, a molecular biologist and computer scientist, respectively, at the University of Washington, Seattle, Foldit focuses on predicting the shape into which a string of amino acids will fold. By tweaking virtual strings, Foldit players can surpass the accuracy of the fastest computers in the world at predicting the structure of certain proteins. Two members of the Foldit team, Adrien Treuille and Rhiju Das, conceived of EteRNA back in 2009. “The idea was to make a version of Foldit for RNA,” says Treuille, who is now based at Carnegie Mellon University in Pittsburgh, Pennsylvania. Treuille’s doctoral student Jeehyung Lee developed the needed software, but then Das persuaded them to take it a giant step further: hooking players up directly to a real-world, robot-controlled biochemistry lab. After all, RNA can be synthesized and its folded-up structure determined far more cheaply and rapidly than protein can.
Lee went back to the drawing board, redesigning the game so that it had not only a molecular design interface like Foldit, but also a laboratory interface for designing RNA sequences for synthesis, keeping track of hypotheses for RNA folding rules, and analyzing data to revise those hypotheses. By 2010, Lee had a prototype game ready for testing. Das had the RNA wet lab ready to go at Stanford University in Palo Alto, California, where he is now a professor. All they lacked were players.
A message to the Foldit community attracted a few hundred players. Then in early 2011, The New York Times wrote about EteRNA and tens of thousands of players flooded in.
The game comes with a detailed tutorial and a series of puzzles involving known RNA structures. Only after winning 10,000 points do you unlock the ability to join EteRNA’s research team. There the goal is to design RNA sequences that will fold into a target structure. Each week, eight sequences are chosen by vote and sent to Stanford for synthesis and structure determination. The data that come back reveal how well the sequences’ true structures matched their targets. That way, Treuille says, “reality keeps score.” The players use that feedback to tweak a set of hypotheses: design rules for determining how an RNA sequence will fold.
Two years and hundreds of RNA structures later, the players of EteRNA have proven themselves to be a potent research team. Of the 37,000 who played, about 1000 graduated to participating in the lab for the study published today. (EteRNA now has 133,000 players, 4000 of them doing research.) They generated 40 new rules for RNA folding. For example, the players discovered that the junctions between different parts of the RNA structure—such as between a loop and an arm—are far more stable if enriched with guanines and cytosines, the strongest-bonding of the RNA base pairs. To see how well those rules describe reality, the humans then competed toe to toe against computers in a new series of RNA structure challenges. The researchers distilled the humans’ 40 rules into an algorithm called EteRNA Bot.”
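To make the idea of a player-derived design rule concrete, here is a minimal, purely illustrative sketch of how a rule like "enrich junctions with G-C pairs" might be scored against a candidate sequence. The function names, window size, and threshold are hypothetical; this is not EteRNA Bot's actual algorithm, only a sketch of how such rules can be expressed and checked in code.

```python
# Illustrative sketch only: a toy scorer for one player-derived design rule
# (GC-rich junctions). Rule names, thresholds, and scoring are hypothetical;
# this is not EteRNA Bot's actual implementation.

def gc_fraction(bases):
    """Fraction of bases that are G or C."""
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

def score_junction_rule(sequence, junction_positions, window=2, threshold=0.75):
    """Score how well a candidate RNA sequence satisfies the 'GC-rich junction' rule.

    junction_positions: indices where structural elements (loops, arms) meet.
    Returns the fraction of junctions whose surrounding window is GC-rich.
    """
    if not junction_positions:
        return 1.0
    satisfied = 0
    for pos in junction_positions:
        lo, hi = max(0, pos - window), min(len(sequence), pos + window + 1)
        if gc_fraction(sequence[lo:hi]) >= threshold:
            satisfied += 1
    return satisfied / len(junction_positions)

# Example: a made-up candidate sequence with junctions at positions 4 and 11.
candidate = "AUGGCCGAUACGCGAU"
print(score_junction_rule(candidate, [4, 11]))
```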
The Moneyball Effect: How smart data is transforming criminal justice, healthcare, music, and even government spending
TED: “When Anne Milgram became the Attorney General of New Jersey in 2007, she was stunned to find out just how little data was available on who was being arrested, who was being charged, who was serving time in jails and prisons, and who was being released. “It turns out that most big criminal justice agencies like my own didn’t track the things that matter,” she says in today’s talk, filmed at TED@BCG. “We didn’t share data, or use analytics, to make better decisions and reduce crime.”
Milgram’s idea for how to change this: “I wanted to moneyball criminal justice.”
Moneyball, of course, is the name of a 2011 movie starring Brad Pitt and the book it’s based on, written by Michael Lewis in 2003. The term refers to a practice adopted by the Oakland A’s general manager Billy Beane in 2002 — the organization began basing decisions not on star power or scout instinct, but on statistical analysis of measurable factors like on-base and slugging percentages. This worked exceptionally well. On a tiny budget, the Oakland A’s made it to the playoffs in 2002 and 2003, and — since then — nine other major league teams have hired sabermetric analysts to crunch these types of numbers.
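For readers unfamiliar with the statistics involved, here is a small sketch of the two measures named above, computed from their standard definitions; the sample stat line is invented purely for illustration.

```python
# Minimal sketch of on-base percentage (OBP) and slugging percentage (SLG),
# using their standard definitions; the sample season line is made up.

def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = times on base / plate appearances (standard definition)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)

def slugging_percentage(singles, doubles, triples, home_runs, at_bats):
    """SLG = total bases per at-bat."""
    total_bases = singles + 2 * doubles + 3 * triples + 4 * home_runs
    return total_bases / at_bats

# Hypothetical season line for one player
obp = on_base_percentage(hits=150, walks=70, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_percentage(singles=90, doubles=35, triples=5, home_runs=20, at_bats=500)
print(f"OBP={obp:.3f}  SLG={slg:.3f}  OPS={obp + slg:.3f}")
```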
Milgram is working hard to bring smart statistics to criminal justice. To hear the results she’s seen so far, watch this talk. And below, take a look at a few surprising sectors that are getting the moneyball treatment as well.
Moneyballing music. Last year, Forbes magazine profiled the firm Next Big Sound, a company using statistical analysis to predict how musicians will perform in the market. The idea is that — rather than relying on the instincts of A&R reps — past performance on Pandora, Spotify, Facebook, etc. can be used to predict future potential. The article reads, “For example, the company has found that musicians who gain 20,000 to 50,000 Facebook fans in one month are four times more likely to eventually reach 1 million. With data like that, Next Big Sound promises to predict album sales within 20% accuracy for 85% of artists, giving labels a clearer idea of return on investment.”
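A toy sketch of how such a finding could be turned into a screening heuristic is below; the 20,000-to-50,000 fans-per-month band and the fourfold multiplier come from the quoted article, while the baseline rate, function names, and sample data are invented.

```python
# Toy illustration of a data-derived screening heuristic. Only the fan-growth
# band and the "4x more likely" figure come from the article; everything else
# (base rate, names, sample artists) is hypothetical.

BASE_RATE = 0.01  # hypothetical baseline chance of eventually reaching 1M fans

def likelihood_of_breakout(monthly_fan_gain, base_rate=BASE_RATE):
    """Rough relative likelihood of eventually reaching 1 million Facebook fans."""
    if 20_000 <= monthly_fan_gain <= 50_000:
        return base_rate * 4  # per the quoted Next Big Sound finding
    return base_rate

for artist, gain in [("artist_a", 35_000), ("artist_b", 4_000)]:
    print(artist, likelihood_of_breakout(gain))
```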
Moneyballing human resources. In November, The Atlantic took a look at the practice of “people analytics” and how it’s affecting employers. (Billy Beane had something to do with this idea — in 2012, he gave a presentation at the TLNT Transform Conference called “The Moneyball Approach to Talent Management.”) The article describes how Bloomberg reportedly logs its employees’ keystrokes and the casino, Harrah’s, tracks employee smiles. It also describes where this trend could be going — for example, how a video game called Wasabi Waiter could be used by employers to judge potential employees’ ability to take action, solve problems and follow through on projects. The article looks at the ways these types of practices are disconcerting, but also how they could level an inherently unequal playing field. After all, the article points out that gender, race, age and even height biases have been demonstrated again and again in our current hiring landscape.
Moneyballing healthcare. Many have wondered: what about a moneyball approach to medicine? (See this call out via Common Health, this piece in Wharton Magazine or this op-ed on The Huffington Post from the President of the New York State Health Foundation.) In his TED Talk, “What doctors can learn from each other,” Stefan Larsson proposed an idea that feels like something of an answer to this question. In the talk, Larsson gives a taste of what can happen when doctors and hospitals measure their outcomes and share this data with each other: they are able to see which techniques are proving the most effective for patients and make adjustments. (Watch the talk for a simple way surgeons can make hip surgery more effective.) He imagines a continuous learning process for doctors — that could transform the healthcare industry to give better outcomes while also reducing cost.
Moneyballing government. This summer, John Bridgeland (the director of the White House Domestic Policy Council under President George W. Bush) and Peter Orszag (the director of the Office of Management and Budget in Barack Obama’s first term) teamed up to pen a provocative piece for The Atlantic called, “Can government play moneyball?” In it, the two write, “Based on our rough calculations, less than $1 out of every $100 of government spending is backed by even the most basic evidence that the money is being spent wisely.” The two explain how, for example, there are 339 federally funded programs for at-risk youth, the vast majority of which haven’t been evaluated for effectiveness. And while many of these programs might show great results, some that have been evaluated show troubling results. (For example, Scared Straight has been shown to increase criminal behavior.) Yet, some of these ineffective programs continue because a powerful politician champions them. While Bridgeland and Orszag show why Washington is so averse to making data-based appropriation decisions, the two also see the ship beginning to turn around. They applaud the Obama administration for a 2014 budget with an “unprecedented focus on evidence and results.” The pair also give a nod to the nonprofit Results for America, which advocates that for every $99 spent on a program, $1 be spent on evaluating it. The pair even suggest a “Moneyball Index” to encourage politicians not to support programs that don’t show results.
In any industry, figuring out what to measure, how to measure it and how to apply the information gleaned from those measurements is a challenge. Which of the applications of statistical analysis has you the most excited? And which has you the most terrified?”
Open Data and Clinical Trials
Editorial by Jeffrey M. Drazen, M.D., at NEJM.org: “In the fall of 2013, the Institute of Medicine (IOM) convened a committee, on which I serve, to examine the sharing of data in the setting of clinical trials. The committee is charged with reviewing current practices on data sharing in the context of randomized, controlled trials and with making recommendations for future data-sharing standards. Over the past few months, the committee has prepared a draft report that reviews current practices on data sharing and lays out a number of potential data-sharing models. Full details regarding the committee’s charge and the interim report are available at www.iom.edu/activities/research/sharingclinicaltrialdata.aspx….
Open-data advocates argue that all the study data should be available to anyone at the time the first report is published or even earlier. Others argue that to maintain an incentive for researchers to pursue clinical investigations and to give those who gathered the data a chance to prepare and publish further reports, there should be a period of some specified length during which the data gatherers would have exclusive access to the information. Since these researchers could always agree to collaborate with others who were not involved in the study in order to use the data to help answer a scientific question, the period of exclusivity would really apply only to noncollaborative use of the data. That is, there would be a defined period during which the data would not be available to those who wanted to perform their own analyses and draw conclusions that could, for example, provide them with a scientific or commercial competitive advantage over the researchers who had originally gathered the data or allow them to derive conclusions that are potentially at odds with those drawn in the original publication.
As members of a community that either produces or uses data, what approach do you think serves our community best? There is no need to reply to the Journal, but please read the interim report and let the IOM know how you feel about this and the many other critical issues related to data sharing that are reviewed in the document. The IOM is collecting comments until March 24, 2014, at www8.nationalacademies.org/cp/projectview.aspx?key=49578.”
How a New Science of Cities Is Emerging from Mobile Phone Data Analysis
MIT Technology Review: “Mobile phones have generated enormous insight into the human condition thanks largely to the study of the data they produce. Mobile phone companies record the time of each call, the caller and receiver IDs, as well as the locations of the cell towers involved, among other things.
The combined data from millions of people produces some fascinating new insights in the nature of our society. Anthropologists have crunched it to reveal human reproductive strategies, a universal law of commuting and even the distribution of wealth in Africa.
Today, computer scientists have gone one step further by using mobile phone data to map the structure of cities and how people use them throughout the day. “These results point towards the possibility of a new, quantitative classification of cities using high resolution spatio-temporal data,” say Thomas Louail at the Institut de Physique Théorique in Paris and a few pals.
They say their work is part of a new science of cities that aims to objectively measure and understand the nature of large population centers.
These guys begin with a database of mobile phone calls made by people in the 31 Spanish cities that have populations larger than 200,000. The data consists of the number of unique individuals using a given cell tower (whether making a call or not) for each hour of the day over almost two months…. The results reveal some fascinating patterns in city structure. For a start, every city undergoes a kind of respiration in which people converge into the center and then withdraw on a daily basis, almost like breathing. And this happens in all cities. This “suggests the existence of a single ‘urban rhythm’ common to all cities,” say Louail and co.
During the week, the number of phone users peaks at about midday and then again at about 6 p.m. During the weekend the numbers peak a little later: at 1 p.m. and 8 p.m. Interestingly, the second peak starts about an hour later in western cities, such as Sevilla and Cordoba.
The data also reveals that small cities, such as Salamanca and Vitoria, tend to have a single center that becomes busy during the day.
But it also shows that the number of hotspots increases with city size; so-called polycentric cities include Spain’s largest, such as Madrid, Barcelona, and Bilbao.
That could turn out to be useful for automatically classifying cities.
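As a rough illustration of the kind of aggregation such a classification would rest on, the sketch below computes an hourly activity profile (the "urban rhythm") and a crude hotspot count from records of unique users per cell tower per hour. The data layout, thresholds, and hotspot definition here are simplified stand-ins, not the authors' actual method.

```python
# Rough sketch of the aggregation described above, assuming records of
# (tower_id, hour_of_day, unique_users). The hotspot definition (top decile
# of towers at a given hour) is a simplified stand-in for the paper's method.

from collections import defaultdict

def city_rhythm(records):
    """Total unique users per hour of day -- the city's daily 'breathing' profile."""
    profile = defaultdict(int)
    for tower_id, hour, users in records:
        profile[hour] += users
    return dict(sorted(profile.items()))

def hotspots(records, hour, quantile=0.9):
    """Towers whose activity at a given hour is in the top decile."""
    activity = {t: u for t, h, u in records if h == hour}
    if not activity:
        return []
    cutoff = sorted(activity.values())[int(quantile * (len(activity) - 1))]
    return [t for t, u in activity.items() if u >= cutoff]

# Tiny made-up sample: (tower, hour, unique users)
sample = [("T1", 12, 900), ("T2", 12, 120), ("T3", 12, 850),
          ("T1", 18, 700), ("T2", 18, 650), ("T3", 18, 80)]
print(city_rhythm(sample))        # hourly profile
print(hotspots(sample, hour=12))  # midday hotspots
```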
There is a growing interest in the nature of cities, the way they evolve and how their residents use them. The goal of this new science is to make better use of these spaces, which more than 50 percent of the planet’s population inhabits. Louail and co show that mobile phone data clearly has an important role to play in this endeavor to better understand these complex giants.
Ref: arxiv.org/abs/1401.4540 : From Mobile Phone Data To The Spatial Structure Of Cities”
Mapping the ‘Space of Flows’
Paper by Reades J. and Smith D. A. in Regional Studies on the Geography of Global Business Telecommunications and Employment Specialization in the London Mega-City-Region: “Telecommunications has radically reshaped the way that firms organize industrial activity. And yet, because much of this technology – and the interactions that it enables – is invisible, the corporate ‘space of flows’ remains poorly mapped. This article combines detailed employment and telecoms usage data for the South-east of England to build a sector-by-sector profile of globalization at the mega-city-region scale. The intersection of these two datasets allows a new empirical perspective on industrial geography and regional structure to be developed.”
100 Data Innovations
New report by the Information Technology and Innovation Foundation (ITIF): “Businesses, government agencies, and non-profits in countries around the world are transforming virtually every facet of the economy and society through innovative uses of data. These changes, brought about by new technologies and techniques for collecting, storing, analyzing, disseminating, and visualizing data, are improving the quality of life for billions of individuals around the world, opening up new economic opportunities, and creating more efficient and effective governments. This list provides a sampling, in no particular order, of some of the most interesting and important contributions data-driven innovations have made in the past year. (Download)”
Sharing and Caring
Tom Slee: “A new wave of technology companies claims to be expanding the possibilities of sharing and collaboration, and is clashing with established industries such as hospitality and transit. These companies make up what is being called the “sharing economy”: they provide web sites and applications through which individual residents or drivers can offer to “share” their apartment or car with a guest, for a price.
The industries they threaten have long been subject to city-level consumer protection and zoning regulations, but sharing economy advocates claim that these rules are rendered obsolete by the Internet. Battle lines are being drawn between the new companies and city governments. Where’s a good leftist to stand in all of this?
To figure this out, we need to look at the nature of the sharing economy. Some would say it fits squarely into an ideology of unregulated free markets, as described recently by David Golumbia here in Jacobin. Others note that the people involved in American technology industries lean liberal. There’s also a clear Euro/American split in the sharing economy: while the Americans are entrepreneurial and commercial in the way they drive the initiative, the Europeans focus more on the civic, the collaborative, and the non-commercial.
The sharing economy invokes values familiar to many on the Left: decentralization, sustainability, community-level connectedness, and opposition to hierarchical and rigid regulatory regimes, seen most clearly in the movement’s bible What’s Mine is Yours: The Rise of Collaborative Consumption by Rachel Botsman and Roo Rogers. It’s the language of co-operatives and of civic groups.
There’s a definite green slant to the movement, too: ideas of “sharing rather than owning” make appeals to sustainability, and the language of sharing also appeals to anti-consumerist sentiments popular on the Left: property and consumption do not make us happy, and we should put aside the pursuit of possessions in favour of connections and experiences. All of which leads us to ideas of community: the sharing economy invokes images of neighbourhoods, villages, and “human-scale” interactions. Instead of buying from a mega-store, we get to share with neighbours.
These ideals have been around for centuries, but the Internet has given them a new slant. An influential line of thought emphasizes that the web lowers the “transaction costs” of group formation and collaboration. The key text is Yochai Benkler’s 2006 book The Wealth of Networks, which argues that the Internet brings with it an alternative style of economic production: networked rather than managed, self-organized rather than ordered. It’s a language associated strongly with both the Left (who see it as an alternative to monopoly capital), and the free-market libertarian right (who see it as an alternative to the state).
Clay Shirky’s 2008 book Here Comes Everybody popularized the ideas further, and in 2012 Steven Johnson announced the appearance of the “Peer Progressive” in his book Future Perfect. The idea of internet-enabled collaboration in the “real” world is a next step from online collaboration in the form of open source software, open government data, and Wikipedia, and the sharing economy is its manifestation.
As with all things technological, there’s an additional angle: the involvement of capital…”
Selected Readings on Big Data
The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of big data was originally published in 2014.
Big Data refers to the wide-scale collection, aggregation, storage, analysis and use of data. Government is increasingly in control of a massive amount of raw data that, when analyzed and put to use, can lead to new insights on everything from public opinion to environmental concerns. The burgeoning literature on Big Data argues that it generates value by: creating transparency; enabling experimentation to discover needs, expose variability, and improve performance; segmenting populations to customize actions; replacing/supporting human decision making with automated algorithms; and innovating new business models, products and services. The insights drawn from data analysis can also be visualized in a manner that passes along relevant information, even to those without the tech savvy to understand the data on its own terms (see The GovLab Selected Readings on Data Visualization).
Selected Reading List (in alphabetical order)
- Australian Government Information Management Office – The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report – a big data strategy report for Australian Government senior executives focused on the use of analytics to gain insight from publicly-held data.
- David Bollier – The Promise and Peril of Big Data – a report describing the promise (such as improving public health through analytics) and potential peril (such as powerful actors abusing large databases of personal information) of big data.
- Danah Boyd and Kate Crawford – Six Provocations for Big Data – a paper pushing back against unchecked faith in big data as a means for solving all problems.
- The Economist Intelligence Unit – Big Data and the Democratisation of Decisions – a report arguing that in both private and public contexts, greater access to big data can improve decision-making.
- James Manyika, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers – Big Data: The Next Frontier for Innovation, Competition, and Productivity – a report predicting the unprecedented impact of big data across practically all sectors and industries.
- The Partnership for Public Service and the IBM Center for The Business of Government – From Data to Decisions II: Building an Analytics Culture – a collection of strategies for government actors to improve their decision-making through data analysis.
- TechAmerica Foundation’s Federal Big Data Commission – Demystifying Big Data: A Practical Guide to Transforming the Business of Government – a big data framework for government actors focused on defining, assessing, planning, executing and reviewing.
- World Economic Forum – Big Data, Big Impact: New Possibilities for International Development – a report arguing that for big data to reach its greatest potential impact, government must take a leading role in increasing open access to useful data.
Annotated Selected Reading List (in alphabetical order)
Australian Government Information Management Office. The Australian Public Service Big Data Strategy: Improved Understanding through Enhanced Data-analytics Capability Strategy Report. August 2013. http://bit.ly/17hs2xY.
- This Big Data Strategy, produced for Australian Government senior executives with responsibility for delivering services and developing policy, is aimed at ingraining in government officials that the key to increasing the value of big data held by government is the effective use of analytics. Essentially, “the value of big data lies in [our] ability to extract insights and make better decisions.”
- This positions big data as a national asset that can be used to “streamline service delivery, create opportunities for innovation, identify new service and policy approaches as well as supporting the effective delivery of existing programs across a broad range of government operations.”
Bollier, David. The Promise and Peril of Big Data. The Aspen Institute, Communications and Society Program, 2010. http://bit.ly/1a3hBIA.
- This report captures insights from the 2009 Roundtable exploring uses of Big Data within a number of important consumer behavior and policy implication contexts.
- The report concludes that, “Big Data presents many exciting opportunities to improve modern society. There are incalculable opportunities to make scientific research more productive, and to accelerate discovery and innovation. People can use new tools to help improve their health and well-being, and medical care can be made more efficient and effective. Government, too, has a great stake in using large databases to improve the delivery of government services and to monitor for threats to national security.”
- However, “Big Data also presents many formidable challenges to government and citizens precisely because data technologies are becoming so pervasive, intrusive and difficult to understand. How shall society protect itself against those who would misuse or abuse large databases? What new regulatory systems, private-law innovations or social practices will be capable of controlling anti-social behaviors–and how should we even define what is socially and legally acceptable when the practices enabled by Big Data are so novel and often arcane?”
Boyd, Danah and Kate Crawford. “Six Provocations for Big Data.” A Decade in Internet Time: Symposium on the Dynamics of the Internet and Society. September 2011. http://bit.ly/1jJstmz.
- In this paper, Boyd and Crawford raise challenges to unchecked assumptions and biases regarding big data. The paper makes a number of assertions about the “computational culture” of big data and pushes back against those who consider big data to be a panacea.
- The authors’ provocations for big data are:
- Automating Research Changes the Definition of Knowledge
- Claims to Objectivity and Accuracy Are Misleading
- Big Data Is Not Always Better Data
- Not All Data Is Equivalent
- Just Because It Is Accessible Doesn’t Make It Ethical
- Limited Access to Big Data Creates New Digital Divides
The Economist Intelligence Unit. Big Data and the Democratisation of Decisions. October 2012. http://bit.ly/17MpH8L.
- This report from the Economist Intelligence Unit focuses on the positive impact of big data adoption in the private sector, but its insights can also be applied to the use of big data in governance.
- The report argues that innovation can be spurred by democratizing access to data, allowing a diversity of stakeholders to “tap data, draw lessons and make business decisions,” which in turn helps companies and institutions respond to new trends and intelligence at varying levels of decision-making power.
Manyika, James, Michael Chui, Brad Brown, Jacques Bughin, Richard Dobbs, Charles Roxburgh, and Angela Hung Byers. Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey & Company. May 2011. http://bit.ly/18Q5CSl.
- This report argues that big data “will become a key basis of competition, underpinning new waves of productivity growth, innovation, and consumer surplus,” and that “leaders in every sector will have to grapple with the implications of big data.”
- The report offers five broad ways in which using big data can create value:
- First, big data can unlock significant value by making information transparent and usable at much higher frequency.
- Second, as organizations create and store more transactional data in digital form, they can collect more accurate and detailed performance information on everything from product inventories to sick days, and therefore expose variability and boost performance.
- Third, big data allows ever-narrower segmentation of customers and therefore much more precisely tailored products or services.
- Fourth, sophisticated analytics can substantially improve decision-making.
- Finally, big data can be used to improve the development of the next generation of products and services.
The Partnership for Public Service and the IBM Center for The Business of Government. “From Data to Decisions II: Building an Analytics Culture.” October 17, 2012. https://bit.ly/2EbBTMg.
- This report discusses strategies for better leveraging data analysis to aid decision-making. The authors argue that, “Organizations that are successful at launching or expanding analytics program…systematically examine their processes and activities to ensure that everything they do clearly connects to what they set out to achieve, and they use that examination to pinpoint weaknesses or areas for improvement.”
- While the report features many strategies for government decision-makers, the central recommendation is that, “leaders incorporate analytics as a way of doing business, making data-driven decisions transparent and a fundamental approach to day-to-day management. When an analytics culture is built openly, and the lessons are applied routinely and shared widely, an agency can embed valuable management practices in its DNA, to the mutual benefit of the agency and the public it serves.”
TechAmerica Foundation’s Federal Big Data Commission. “Demystifying Big Data: A Practical Guide to Transforming the Business of Government.” 2013. http://bit.ly/1aalUrs.
- This report presents the key big data imperatives that government agencies must address, along with the challenges and opportunities posed by the growing volume of data and the value big data can provide. The discussion touches on the value of big data to business and organizational missions, and presents case-study examples of big data applications, their technical underpinnings, and public policy applications.
- The authors argue that new digital information, “effectively captured, managed and analyzed, has the power to change every industry including cyber security, healthcare, transportation, education, and the sciences.” To ensure that this opportunity is realized, the report proposes a detailed big data strategy framework with the following steps: define, assess, plan, execute and review.
World Economic Forum. “Big Data, Big Impact: New Possibilities for International Development.” 2012. http://bit.ly/17hrTKW.
- This report examines the potential for channeling the “flood of data created every day by the interactions of billions of people using computers, GPS devices, cell phones, and medical devices” into “actionable information that can be used to identify needs, provide services, and predict and prevent crises for the benefit of low-income populations.”
- The report argues that, “To realise the mutual benefits of creating an environment for sharing mobile-generated data, all ecosystem actors must commit to active and open participation. Governments can take the lead in setting policy and legal frameworks that protect individuals and require contractors to make their data public. Development organisations can continue supporting governments and demonstrating both the public good and the business value that data philanthropy can deliver. And the private sector can move faster to create mechanisms for sharing data that can benefit the public.”
Predictive Modeling With Big Data: Is Bigger Really Better?
New paper by Enric Junqué de Fortuny, David Martens, and Foster Provost in Big Data: “With the increasingly widespread collection and processing of “big data,” there is natural interest in using these data assets to improve decision making. One of the best understood ways to use data to improve decision making is via predictive analytics. An important, open question is: to what extent do larger data actually lead to better predictive models? In this article we empirically demonstrate that when predictive models are built from sparse, fine-grained data—such as data on low-level human behavior—we continue to see marginal increases in predictive performance even to very large scale. The empirical results are based on data drawn from nine different predictive modeling applications, from book reviews to banking transactions. This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets—plus the skill to take advantage of them—potentially can obtain substantial competitive advantage over institutions without such access or skill. Moreover, the results suggest that it is worthwhile for companies with access to such fine-grained data, in the context of a key predictive task, to gather both more data instances and more possible data features. As an additional contribution, we introduce an implementation of the multivariate Bernoulli Naïve Bayes algorithm that can scale to massive, sparse data.”
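For readers curious what this model family looks like in practice, here is a minimal sketch of multivariate Bernoulli Naïve Bayes applied to sparse binary features, using scikit-learn's off-the-shelf BernoulliNB on synthetic data. It is not the authors' scalable implementation, only an illustration of the sparse representation and the model.

```python
# Minimal sketch: multivariate Bernoulli Naive Bayes on sparse binary features.
# Synthetic data throughout; this only shows the representation and model family,
# not the paper's own scalable implementation.

import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import train_test_split

# 10,000 "instances" x 5,000 binary behavioral features, ~0.1% of cells active.
X = sparse_random(10_000, 5_000, density=0.001, format="csr", random_state=0)
X.data[:] = 1.0  # binarize: a behavior is either observed or not

# Toy target: above-average overall activity (purely synthetic, for the demo).
row_totals = np.asarray(X.sum(axis=1)).ravel()
y = (row_totals > row_totals.mean()).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = BernoulliNB().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```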
Big Data and the Future of Privacy
John Podesta at the White House blog: “Last Friday, the President spoke to the American people, and the international community, about how to keep us safe from terrorism in a changing world while upholding America’s commitment to liberty and privacy that our values and Constitution require. Our national security challenges are real, but that is surely not the only space where changes in technology are altering the landscape and challenging conceptions of privacy.
That’s why in his speech, the President asked me to lead a comprehensive review of the way that “big data” will affect the way we live and work; the relationship between government and citizens; and how public and private sectors can spur innovation and maximize the opportunities and free flow of this information while minimizing the risks to privacy. I will be joined in this effort by Secretary of Commerce Penny Pritzker, Secretary of Energy Ernie Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Gene Sperling and other senior government officials.
I would like to explain a little bit more about the review, its scope, and what you can expect over the next 90 days.
We are undergoing a revolution in the way that information about our purchases, our conversations, our social networks, our movements, and even our physical identities is collected, stored, analyzed and used. The immense volume, diversity and potential value of data will have profound implications for privacy, the economy, and public policy. The working group will consider all those issues, and specifically how the present and future state of these technologies might motivate changes in our policies across a range of sectors.
When we complete our work, we expect to deliver to the President a report that anticipates future technological trends and frames the key questions that the collection, availability, and use of “big data” raise – both for our government, and the nation as a whole. It will help identify technological changes to watch, assess whether those changes are addressed by the current U.S. policy framework, and highlight where further government action, funding, research and consideration may be required.
This is going to be a collaborative effort. The President’s Council of Advisors on Science and Technology (PCAST) will conduct a study to explore in-depth the technological dimensions of the intersection of big data and privacy, which will feed into this broader effort. Our working group will consult with industry, civil liberties groups, technologists, privacy experts, international partners, and other national and local government officials on the significance of and future for these technologies. Finally, we will be working with a number of think tanks, academic institutions, and other organizations around the country as they convene stakeholders to discuss these very issues and questions. Likewise, many abroad are analyzing and responding to the challenge and seizing the opportunity of big data. These discussions will help to inform our study.
While we don’t expect to answer all these questions, or produce a comprehensive new policy in 90 days, we expect this work to serve as the foundation for a robust and forward-looking plan of action. Check back on this blog for updates on how you can get involved in the debate and for status updates on our progress.”