Data for policy: when the haystack is made of needles. A call for contributions


Diana Vlad-Câlcic at the European Commission: “If policy-making is ‘whatever government chooses to do or not to do’ (Th. Dye), then how do governments actually decide? Evidence-based policy-making is not a new answer to this question, but it is constantly challenging both policy-makers and scientists to sharpen their thinking, their tools and their responsiveness. The European Commission has recognised this and has embedded an evidence-informed decision-making approach in its processes, namely through Impact Assessment, policy monitoring and evaluation.

With four parameters I can fit an elephant, and with five I can make him wiggle his trunk. (John von Neumann)

New data technologies raise the bar for advanced modelling, dynamic visualisation, real-time data flows and a variety of data sources, from sensors to cell phones to the Internet itself. An abundance of (big) data, a haystack made of needles: but do public administrations have the right tools and skills to exploit it? How much of it adds real value to established statistics and to scientific evidence? Are the high hopes and expectations partly just hype? And what lessons can we learn from experience?
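
Von Neumann's quip is a warning about flexible models: give a curve enough free parameters and it will reproduce almost any data, whether or not it has captured anything real. A minimal Python sketch (the sample points and polynomial degrees are illustrative assumptions, not anything drawn from the study) makes the point:

```python
import numpy as np

# Five illustrative observations (toy data, not real policy indicators).
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 2.1, 3.9, 8.2, 15.8])

# A straight line (2 parameters) summarises the trend but leaves residual error.
line = np.polyfit(x, y, deg=1)

# A degree-4 polynomial (5 parameters) passes through all five points exactly:
# zero in-sample residual, with no guarantee it generalises beyond the data.
wiggle = np.polyfit(x, y, deg=4)

for name, coeffs in [("straight line", line), ("degree-4 polynomial", wiggle)]:
    rss = float(np.sum((np.polyval(coeffs, x) - y) ** 2))
    print(f"{name}: {len(coeffs)} parameters, residual sum of squares = {rss:.4f}")
```

More parameters always drive the in-sample error towards zero; whether the added flexibility yields real evidence or merely wiggles the trunk is exactly the question public administrations face with abundant data.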

To explore these questions, the European Commission is launching a study with the Oxford Internet Institute, Technopolis and CEPS on ‘Data for policy: big data and other innovative data-driven approaches for evidence-informed policymaking’. As a first step, the study will collect examples of initiatives in public institutions at national and international level, where innovative data technologies contribute to the policy process. It will eventually develop case studies for EU policies.

Contribute to the collective reflection by sharing with us good practices and examples you have from other public administrations. You can also follow the study’s developments on Twitter at @data4policyEU.”

Big Data Is an Economic Justice Issue, Not Just a Privacy Problem


In the Huffington Post: “The control of personal data by “big data” companies is not just an issue of privacy but is becoming a critical issue of economic justice, argues a new report issued by the organization Data Justice, which itself is being publicly launched in conjunction with the report…

At the same time, big data is fueling economic concentration across our economy. As a handful of data platforms generate massive amounts of user data, the barriers to entry rise, since potential competitors have little data themselves to entice advertisers compared with the incumbents, who have both the concentrated processing power and the supply of user data to dominate particular sectors. With little competition, companies end up with little incentive to either protect user privacy or share the economic value of that user data with the consumers generating those profits.

The report argues for a threefold approach to making big data work for everyone in the economy, not just for the big data platforms’ shareholders:

  • First, regulators need to strengthen user control of their own data by both requiring explicit consent for all uses of the data and better informing users of how it’s being used and how companies profit from that data.
  • Second, regulators need to factor control of data into merger review, and to initiate antitrust actions against companies like Google where monopoly control of a sector like search advertising has been established.
  • Third, policymakers should restrict practices that harm consumers, including banning price discrimination where consumers are not informed of all discount options available and bringing the participation of big data platforms in marketing financial services under the regulation of the Consumer Financial Protection Bureau.

Data Justice itself has been founded as an organization “to promote public education and new alliances to challenge the danger of big data to workers, consumers and the public.” It will work to educate the public, policymakers and organizational allies on how big data is contributing to economic inequality. Its new website at datajustice.org is intended to bring together a wide range of resources highlighting the economic justice aspects of big data.”

States Use Big Data to Nab Tax Fraudsters


At Governing: “It’s tax season again. For most of us, that means undergoing the laborious and thankless task of assembling financial records and calculating taxes for state and federal returns. But for a small group of us, tax season is profit season. It’s the time of year when fraudsters busy themselves with stealing identities and electronically submitting fraudulent tax returns for refunds.
Nobody knows for sure just how much tax return fraud is committed, but the amount is rising fast. According to the U.S. Treasury, the number of identified fraudulent federal returns increased by 40 percent from 2011 to 2012, an increase of more than $4 billion. Ten years ago, New York state stopped refunds on 50,000 fraudulently filed tax returns. Last year, the number of stopped refunds was 250,000, according to Nonie Manion, executive deputy commissioner for the state’s Department of Taxation and Finance….
To combat the problem, state revenue and tax agencies are using software programs to sift through mounds of data and detect patterns that would indicate when a return is not valid. Just about every state with a tax fraud detection program already compares tax return data with information from other state agencies and private firms to spot incorrect mailing addresses and stolen identities. Because so many returns are filed electronically, fraud-spotting systems look for suspicious Internet protocol (IP) addresses. For example, tax auditors in New York noticed that similar IP addresses in Fort Lauderdale, Fla., were submitting a series of returns for refunds. When the state couldn’t match the returns with any employer data, they were flagged for further scrutiny and ultimately found to be fraudulent.
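
The screening logic described above (cross-referencing returns against employer records and watching for clusters of filings from the same IP address) can be sketched as a simple rule-based filter. The Python sketch below is an illustrative reconstruction, not the software any state actually runs; the field names and threshold are assumptions.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass
class TaxReturn:
    taxpayer_id: str      # hypothetical field names, for illustration only
    ip_address: str
    employer_id: str
    refund_claimed: float

def flag_suspicious(returns, known_employer_ids, ip_threshold=20):
    """Flag returns that cannot be matched to employer data or that share
    an IP address with an unusually large number of other filings."""
    filings_per_ip = Counter(r.ip_address for r in returns)
    flagged = []
    for r in returns:
        reasons = []
        if r.employer_id not in known_employer_ids:
            reasons.append("no matching employer record")
        if filings_per_ip[r.ip_address] >= ip_threshold:
            reasons.append("IP address shared by many filings")
        if reasons:
            flagged.append((r, reasons))
    return flagged
```

Real systems combine many more signals (mailing addresses, identity data from other agencies and private firms) and typically score returns probabilistically rather than applying hard thresholds.
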
High-tech analytics is one way states keep up in the war on fraud. Another is accurate data. The third is well-trained staff. But it takes time and money to put together the technology and the expertise to combat the growing sophistication of fraudsters….(More)”

US government and private sector developing ‘precrime’ system to anticipate cyber-attacks


Martin Anderson at The Stack: “The USA’s Office of the Director of National Intelligence (ODNI) is soliciting the involvement of the private and academic sectors in developing a new ‘precrime’ computer system capable of predicting cyber-incursions before they happen, based on the processing of ‘massive data streams from diverse data sets’ – including social media and possibly deanonymised Bitcoin transactions….
At its core, the predictive technologies to be developed in association with the private sector and academia over 3-5 years are charged with the mission ‘to invest in high-risk/high-payoff research that has the potential to provide the U.S. with an overwhelming intelligence advantage over our future adversaries’.
The R&D program is intended to generate completely automated, human-free prediction systems for four categories of event: unauthorised access, Denial of Service (DoS), malicious code, and scans and probes seeking access to systems.
The CAUSE project is an unclassified program, and participating companies and organisations will not be granted access to NSA intercepts. In any case, the scope of the project seems focused on the analysis of publicly available Big Data, including web searches and social media exchanges, trawling ungovernable avalanches of information in which clues to future maleficent actions are believed to be discernible.
Program manager Robert Rahmer says: “It is anticipated that teams will be multidisciplinary and might include computer scientists, data scientists, social and behavioral scientists, mathematicians, statisticians, content extraction experts, information theorists, and cyber-security subject matter experts having applied experience with cyber capabilities.”
Battelle, one of the concerns interested in participating in CAUSE, proposes employing Hadoop and Apache Spark to tackle the data mountain, and its preliminary proposal includes an intent to ‘de-anonymize Bitcoin sale/purchase activity to capture communication exchanges more accurately within threat-actor forums…’.
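
As an illustration of what ‘an approach to the data mountain’ with Spark might look like, the sketch below counts mentions of watch-listed terms per forum and day across a large collection of scraped posts. It is a purely hypothetical reconstruction: the input schema, file path and indicator list are assumptions, and nothing here reflects Battelle’s actual proposal.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Hypothetical input: newline-delimited JSON posts with fields
# {"forum": ..., "timestamp": ..., "body": ...} -- the schema is an assumption.
spark = SparkSession.builder.appName("indicator-counts").getOrCreate()
posts = spark.read.json("s3://example-bucket/forum-posts/*.json")

indicators = ["exploit", "0day", "botnet", "ransomware"]  # illustrative watch list

# Keep posts whose body mentions any watch-listed term, then count per forum/day.
mentions = F.lower(F.col("body")).rlike("|".join(indicators))
daily_counts = (
    posts.filter(mentions)
         .withColumn("day", F.to_date("timestamp"))
         .groupBy("forum", "day")
         .count()
         .orderBy(F.desc("count"))
)
daily_counts.show(20, truncate=False)
```

A spike in counts like these would be only one weak signal among many; turning such signals into usable ‘predictions’ of specific incidents is precisely the hard problem CAUSE is funding.
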
Identifying and categorising quality signal in the ‘white noise’ of Big Data is a central plank in CAUSE, and IARPA maintains several offices to deal with different aspects of it. Its pointedly named ‘Office for Anticipating Surprise’ frames the CAUSE project best, since it initiated it. The OAS is occupied with ‘Detecting and forecasting the emergence of new technical capabilities’, ‘Early warning of social and economic crises, disease outbreaks, insider threats, and cyber attacks’ and ‘Probabilistic forecasts of major geopolitical trends and rare events’.
Another concerned department, the Office of Incisive Analysis, is attempting to break down the ‘data static’ problem into manageable mission stages:
1) Large data volumes and varieties – “Providing powerful new sources of information from massive, noisy data that currently overwhelm analysts”
2) Social-Cultural and Linguistic Factors – “Analyzing language and speech to produce insights into groups and organizations.”
3) Improving Analytic Processes – “Dramatic enhancements to the analytic process at the individual and group level.”
The Office of Smart Collection develops ‘new sensor and transmission technologies’, with ‘Innovative approaches to gain access to denied environments’ as part of its core mission, while the Office of Safe and Secure Operations concerns itself with ‘Revolutionary advances in science and engineering to solve problems intractable with today’s computers’.
The CAUSE program, which attracted 150 developers, organisations, academics and private companies to the initial event, will announce specific figures about funding later in the year, and practice ‘predictions’ from participants will begin in the summer, in an accelerating and stage-managed program over five years….(More)”

Why Information Grows: The Evolution of Order, from Atoms to Economies


Forthcoming book: “In Why Information Grows, rising star César Hidalgo offers a radical interpretation of global economics. While economists often turn to measures like GDP or per-capita income, Hidalgo turns to information theory to explain the success or failure of a country’s economic performance. Through a radical rethinking of what the economy is, Hidalgo shows that natural constraints on our ability to accumulate knowledge, know-how and information explain the evolution of social and economic complexity. This is a rare tour de force, linking economics, sociology, physics, biology and information theory to explain the evolution of social and economic systems as a consequence of the physical embodiment of information in a world where knowledge is quite literally power.
César Hidalgo leads the Macro Connections group at the MIT Media Lab. A trained statistical physicist and an expert on networks and complex systems, he also has extensive experience in the field of economic development and has pioneered research on how big data impacts economic decision-making….(More)”

Platform lets patients contribute to their own medical records


Springwise: “Those with complex medical conditions often rely heavily on their own ability to communicate their symptoms in short — and sometimes stressful — healthcare visits. We have recently seen Ginger.io, a smartphone app which uses big data to improve communication between patients and clinicians between visits, and now OurNotes is a Commonwealth Fund grant-funded program that will enable patients to contribute to their own electronic medical records.
The scheme, currently being researched at Beth Israel Deaconess Medical Center in Boston and four other sites in the US, is part of a countrywide initiative called OpenNotes, which has already enabled five million patients to read their medical records online. Since an initial pilot scheme in 2012, OpenNotes has met with great success, improving communication between patients and doctors and making patients feel more in control of their healthcare and treatments.
The new OurNotes scheme is expected to have particular benefits for medically complex patients who have multiple chronic health conditions. It will enable patients to make notes ahead of an upcoming visit, listing topics and questions they want to cover. In turn, this gives doctors an opportunity to prepare for and research tricky or niche questions before meeting their patient…(More)”

Data-Driven Development Pathways for Progress


Report from the World Economic Forum: “Data is the lifeblood of sustainable development and holds tremendous potential for transformative positive change, particularly for lower- and middle-income countries. Yet despite the promise of a “Data Revolution”, progress is not a certainty. Lack of clarity on privacy and ethical issues, asymmetric power dynamics and an array of entangled societal and commercial risks threaten to hinder progress.
Written by the World Economic Forum Global Agenda Council on Data-Driven Development, this report serves to clarify how big data can be leveraged to address the challenges of sustainable development. It provides a blueprint for balancing competing tensions, with areas of focus that include addressing the data deficit of the Global South, establishing resilient governance and strengthening capacities at the community and individual level. (PDF)”

Unleashing the Power of Data to Serve the American People


Memorandum: Unleashing the Power of Data to Serve the American People
To: The American People
From: Dr. DJ Patil, Deputy U.S. CTO for Data Policy and Chief Data Scientist

….While there is a rich history of companies using data to their competitive advantage, the disproportionate beneficiaries of big data and data science have been Internet technologies like social media, search, and e-commerce. Yet transformative uses of data in other spheres are just around the corner. Precision medicine and other forms of smarter health care delivery, individualized education, and the “Internet of Things” (which refers to devices like cars or thermostats communicating with each other using embedded sensors linked through wired and wireless networks) are just a few of the ways in which innovative data science applications will transform our future.

The Obama administration has embraced the use of data to improve the operation of the U.S. government and the interactions that people have with it. On May 9, 2013, President Obama signed Executive Order 13642, which made open and machine-readable data the new default for government information. Over the past few years, the Administration has launched a number of Open Data Initiatives aimed at scaling up open data efforts across the government, helping make troves of valuable data — data that taxpayers have already paid for — easily accessible to anyone. In fact, I used data made available by the National Oceanic and Atmospheric Administration to improve numerical methods of weather forecasting as part of my doctoral work. So I know firsthand just how valuable this data can be — it helped get me through school!
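
For a concrete sense of what ‘open and machine-readable by default’ means in practice, here is a small sketch that searches the federal open data catalog for NOAA datasets. It assumes catalog.data.gov exposes the standard CKAN search endpoint; the query term and result handling are illustrative only.

```python
import json
import urllib.parse
import urllib.request

# Assumption: catalog.data.gov serves the standard CKAN action API.
BASE_URL = "https://catalog.data.gov/api/3/action/package_search"

def search_datasets(query, rows=5):
    """Return the titles of the first few catalog entries matching `query`."""
    url = BASE_URL + "?" + urllib.parse.urlencode({"q": query, "rows": rows})
    with urllib.request.urlopen(url) as response:
        payload = json.load(response)
    return [entry["title"] for entry in payload["result"]["results"]]

if __name__ == "__main__":
    for title in search_datasets("NOAA weather forecast"):
        print(title)
```

Whatever the exact interface, the point of the Open Data Initiatives is that queries and bulk downloads like this are possible at all, without negotiating access agency by agency.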

Given the substantial benefits that responsibly and creatively deployed data can provide to us and our nation, it is essential that we work together to push the frontiers of data science. Given the importance this Administration has placed on data, along with the momentum that has been created, now is a unique time to establish a legacy of data supporting the public good. That is why, after a long time in the private sector, I am returning to the federal government as the Deputy Chief Technology Officer for Data Policy and Chief Data Scientist.

Organizations are increasingly realizing that in order to maximize their benefit from data, they require dedicated leadership with the relevant skills. Many corporations, local governments, federal agencies, and others have already created such a role, which is usually called the Chief Data Officer (CDO) or the Chief Data Scientist (CDS). The role of an organization’s CDO or CDS is to help their organization acquire, process, and leverage data in a timely fashion to create efficiencies, iterate on and develop new products, and navigate the competitive landscape.

The Role of the First-Ever U.S. Chief Data Scientist

Similarly, my role as the U.S. CDS will be to responsibly source, process, and leverage data in a timely fashion to enable transparency, provide security, and foster innovation for the benefit of the American public, in order to maximize the nation’s return on its investment in data.

So what specifically am I here to do? As I start, I plan to focus on these four activities:

…(More)”

Choosing Not to Choose: Understanding the Value of Choice


New book by Cass Sunstein: “Our ability to make choices is fundamental to our sense of ourselves as human beings, and essential to the political values of freedom-protecting nations. Whom we love; where we work; how we spend our time; what we buy; such choices define us in the eyes of ourselves and others, and much blood and ink has been spilt to establish and protect our rights to make them freely.
Choice can also be a burden. Our cognitive capacity to research and make the best decisions is limited, so every active choice comes at a cost. In modern life the requirement to make active choices can often be overwhelming. So, across broad areas of our lives, from health plans to energy suppliers, many of us choose not to choose. By following our default options, we save ourselves the costs of making active choices. By setting those options, governments and corporations dictate the outcomes for when we decide by default. This is among the most significant ways in which they effect social change, yet we are just beginning to understand the power and impact of default rules. Many central questions remain unanswered: When should governments set such defaults, and when should they insist on active choices? How should such defaults be made? What makes some defaults successful while others fail?….
The onset of big data gives corporations and governments the power to make ever more sophisticated decisions on our behalf, defaulting us to buy the goods we predictably want, or vote for the parties and policies we predictably support. As consumers we are starting to embrace the benefits this can bring. But should we? What will be the long-term effects of limiting our active choices on our agency? And can such personalized defaults be imported from the marketplace to politics and the law? Confronting the challenging future of data-driven decision-making, Sunstein presents a manifesto for how personalized defaults should be used to enhance, rather than restrict, our freedom and well-being. (More)”

Ad hoc encounters with big data: Engaging citizens in conversations around tabletops


Morten Fjeld, Paweł Woźniak, Josh Cowls, Bonnie Nardi in First Monday: “The increasing abundance of data creates new opportunities for communities of interest and communities of practice. We believe that interactive tabletops will allow users to explore data in familiar places such as living rooms, cafés, and public spaces. We propose informal, mobile possibilities for future generations of flexible and portable tabletops. In this paper, we build upon current advances in sensing and in organic user interfaces to propose how tabletops in the future could encourage collaboration and engage users in socially relevant data-oriented activities. Our work focuses on the socio-technical challenges of future democratic deliberation. As part of our vision, we suggest switching from fixed to mobile tabletops and provide two examples of hypothetical interface types: TableTiles and Moldable Displays. We consider how tabletops could foster future civic communities, expanding modes of participation originating in the Greek Agora and in European notions of cafés as locales of political deliberation….(More)”