The trouble with Big Data? It’s called “recency bias”.


One of the problems with such a rate of information increase is that the present moment will always loom far larger than even the recent past. Imagine looking back over a photo album representing the first 18 years of your life, from birth to adulthood. Let’s say that you have two photos for your first two years. Assuming a rate of information increase matching that of the world’s data, you will have an impressive 2,000 photos representing the years six to eight; 200,000 for the years 10 to 12; and a staggering 200,000,000 for the years 16 to 18. That’s more than three photographs for every single second of those final two years.
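
A quick back-of-the-envelope check of that last claim, using only the album figures quoted above (a minimal illustrative sketch, not data from the original article):

```python
# Rough check of the photo-album analogy above (figures taken from the text, not real data).
SECONDS_PER_YEAR = 365.25 * 24 * 3600

photos_final_two_years = 200_000_000
photos_per_second = photos_final_two_years / (2 * SECONDS_PER_YEAR)
print(f"{photos_per_second:.1f} photos per second")  # ~3.2, i.e. "more than three per second"

# The same growth, stage by stage: each figure is roughly 100-1000x the previous one.
album = {"years 0-2": 2, "years 6-8": 2_000, "years 10-12": 200_000, "years 16-18": 200_000_000}
for period, count in album.items():
    print(f"{period}: {count:,} photos")
```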

The moment you start looking backwards to seek the longer view, you have far too much of the recent stuff and far too little of the old

This isn’t a perfect analogy with global data, of course. For a start, much of the world’s data increase is due to more sources of information being created by more people, along with far larger and more detailed formats. But the point about proportionality stands. If you were to look back over a record like the one above, or try to analyse it, the more distant past would shrivel into meaningless insignificance. How could it not, when so much less information is available about it?

Here’s the problem with much of the big data currently being gathered and analysed. The moment you start looking backwards to seek the longer view, you have far too much of the recent stuff and far too little of the old. Short-sightedness is built into the structure, in the form of an overwhelming tendency to over-estimate short-term trends at the expense of history.

To understand why this matters, consider the findings from social science about ‘recency bias’, which describes the tendency to assume that future events will closely resemble recent experience. It’s a version of what is also known as the availability heuristic: the tendency to base your thinking disproportionately on whatever comes most easily to mind. It’s also a universal psychological attribute. If the last few years have seen exceptionally cold summers where you live, for example, you might be tempted to state that summers are getting colder – or that your local climate may be cooling. In fact, you shouldn’t read anything whatsoever into the data. You would need to take a far, far longer view to learn anything meaningful about climate trends. In the short term, you’d be best not speculating at all – but who among us can manage that?
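
To make the point concrete, here is a minimal simulation sketch with assumed numbers (an invented warming trend of 0.01 degrees per year buried in year-to-year noise of 0.8 degrees): over short windows the noise dominates, and many five-year stretches look like cooling even though the long-run trend is upward.

```python
# Illustrative only: a slow warming trend hidden in noisy yearly data.
import random

random.seed(0)
TREND = 0.01       # assumed trend: +0.01 degrees per year
NOISE = 0.8        # assumed year-to-year variability (degrees)
years = 100
temps = [TREND * t + random.gauss(0, NOISE) for t in range(years)]

def slope(series):
    """Ordinary least-squares slope of a series against its index."""
    n = len(series)
    xs = range(n)
    mean_x, mean_y = (n - 1) / 2, sum(series) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, series))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var

print("full-century trend:", round(slope(temps), 3))            # typically close to +0.01
short_windows = [slope(temps[i:i + 5]) for i in range(0, years - 5, 5)]
cooling = sum(1 for s in short_windows if s < 0)
print(f"{cooling} of {len(short_windows)} five-year windows look like 'cooling'")
```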

Short-term analyses aren’t only invalid – they’re actively unhelpful and misleading

The same tends to be true of most complex phenomena in real life: stock markets, economies, the success or failure of companies, war and peace, relationships, the rise and fall of empires. Short-term analyses aren’t only invalid – they’re actively unhelpful and misleading. Just look at the legions of economists who lined up to pronounce events like the 2008 financial crisis unthinkable right up until it happened. The very notion that valid predictions could be made on that kind of scale was itself part of the problem.

It’s also worth remembering that novelty tends to be a dominant consideration when deciding what data to keep or delete. Out with the old and in with the new: that’s the digital trend in a world where search algorithms are intrinsically biased towards freshness, and where so-called link rot infests everything from Supreme Court decisions to entire social media services. A bias towards the present is structurally engrained in almost all the technology surrounding us, not least thanks to our habit of ditching most of our once-shiny machines after about five years.

What to do? This isn’t just a question of being better at preserving old data – although this wouldn’t be a bad idea, given just how little is currently able to last decades rather than years. More importantly, it’s about determining what is worth preserving in the first place – and what it means to cull information meaningfully in the name of knowledge.

What’s needed is something that I like to think of as “intelligent forgetting”: teaching our tools to become better at letting go of the immediate past in order to keep its larger continuities in view. It’s an act of curation akin to organising a photograph album – albeit with more maths….(More)

White House Challenges Artificial Intelligence Experts to Reduce Incarceration Rates


Jason Shueh at GovTech: “The U.S. spends $270 billion on incarceration each year, has a prison population of about 2.2 million and an incarceration rate that’s spiked 220 percent since the 1980s. But with the advent of data science, White House officials are asking experts for help.

On Tuesday, June 7, the White House Office of Science and Technology Policy’s Lynn Overmann, who also leads the White House Police Data Initiative, stressed the severity of the nation’s incarceration crisis while asking a crowd of data scientists and artificial intelligence specialists for aid.

“We have built a system that is too large, and too unfair and too costly — in every sense of the word — and we need to start to change it,” Overmann said, speaking at a Computing Community Consortium public workshop.

She argued that the U.S., the country with the highest number of incarcerated citizens in the world, is in need of systematic reforms, both in the data tools used to process alleged offenders and at the policy level, to ensure fair and measured sentences. As a longtime counselor, advisor and analyst for the Justice Department and at the city and state levels, Overmann said she has studied and witnessed an alarming number of issues involving bias and unwarranted punishments.

For instance, she said that statistically, while drug use is about equal between African-Americans and Caucasians, African-Americans are more likely to be arrested and convicted. They also receive longer prison sentences compared to Caucasian inmates convicted of the same crimes….

Data and digital tools can help curb such pitfalls by increasing efficiency, transparency and accountability, she said.

“We think these types of data exchanges [between officials and technologists] can actually be hugely impactful if we can figure out how to take this information and operationalize it for the folks who run these systems,” Overmann noted.

The opportunities to apply artificial intelligence and data analytics, she said, might include improving questions on parole screenings, analyzing police body camera footage, and applying analytics to criminal justice data for legislators and policy workers….

If the private sector is any indication, artificial intelligence and machine learning techniques could be used to interpret this new and vast supply of law enforcement data. In an earlier presentation, Eric Horvitz, the managing director at Microsoft Research, showcased how the company has applied artificial intelligence to vision and language to interpret live video content for the blind. The app, titled SeeingAI, can translate live video footage, captured from an iPhone or a pair of smart glasses, into instant audio messages for the visually impaired. Twitter’s live-streaming app Periscope has employed similar technology to guide users to the right content….(More)”

What Algorithmic Injustice Looks Like in Real Life


Julia Angwin, Jeff Larson, Surya Mattu & Lauren Kirchner at Pacific Standard: “Courtrooms across the nation are using computer programs to predict who will be a future criminal. The programs help inform decisions on everything from bail to sentencing. They are meant to make the criminal justice system fairer — and to weed out human biases.

ProPublica tested one such program and found that it’s often wrong — and biased against blacks.

We looked at the risk scores the program spit out for more than 7,000 people arrested in Broward County, Florida, in 2013 and 2014. We checked to see how many defendants were charged with new crimes over the next two years — the same benchmark used by the creators of the algorithm. Our analysis showed:

  • The formula was particularly likely to falsely flag black defendants as future criminals, wrongly labeling them this way at almost twice the rate as white defendants.
  • White defendants were mislabeled as low risk more often than black defendants.
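
As a hedged aside, the comparison described in the bullets above amounts to computing false-positive and false-negative rates separately for each group. A minimal sketch with invented records (not ProPublica's data or methodology):

```python
# Toy illustration of a group-wise error-rate comparison.
# The records below are invented; they are not ProPublica's data or code.
records = [
    # (group, predicted_high_risk, reoffended_within_two_years)
    ("black", True,  False), ("black", True,  True),  ("black", False, False),
    ("white", False, True),  ("white", False, False), ("white", True,  True),
]

def false_positive_rate(group):
    """Share of non-reoffenders in `group` who were nonetheless flagged high risk."""
    non_reoffenders = [r for r in records if r[0] == group and not r[2]]
    flagged = [r for r in non_reoffenders if r[1]]
    return len(flagged) / len(non_reoffenders) if non_reoffenders else float("nan")

def false_negative_rate(group):
    """Share of reoffenders in `group` who were labeled low risk."""
    reoffenders = [r for r in records if r[0] == group and r[2]]
    missed = [r for r in reoffenders if not r[1]]
    return len(missed) / len(reoffenders) if reoffenders else float("nan")

for g in ("black", "white"):
    print(g, round(false_positive_rate(g), 2), round(false_negative_rate(g), 2))
```

With these toy records the pattern in the bullets appears: non-reoffending black defendants are flagged at a higher rate, while reoffending white defendants are more often labeled low risk.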

What does that look like in real life? Here are five comparisons of defendants — one black and one white — who were charged with similar offenses but got very different scores.

Two Shoplifting Arrests

James Rivelli, 53: In August 2014, Rivelli allegedly shoplifted seven boxes of Crest Whitestrips from a CVS. An employee called the police. When the cops found Rivelli and pulled him over, they found the Whitestrips as well as heroin and drug paraphernalia in his car. He was charged with two felony counts and four misdemeanors for grand theft, drug possession, and driving with a suspended license and expired tags.

Past offenses: He had been charged with felony aggravated assault for domestic violence in 1996, felony grand theft also in 1996, and a misdemeanor theft in 1998. He also says that he was incarcerated in Massachusetts for felony drug trafficking.

COMPAS score: 3 — low

Subsequent offense: In April 2015, he was charged with two felony counts of grand theft in the 3rd degree for shoplifting about $1,000 worth of tools from a Home Depot.

He says: Rivelli says his crimes were fueled by drug use and he is now sober. “I’m surprised [my risk score] is so low,” Rivelli said in an interview in his mother’s apartment in April. “I spent five years in state prison in Massachusetts.”…(More)

Private Data and the Public Good


Gideon Mann’s remarks on the occasion of the Robert Kahn distinguished lecture at The City College of New York on 5/22/16: “…and opportunities about a specific aspect of this relationship, the broader need for computer science to engage with the real world. Right now, a key aspect of this relationship is being built around the risks and opportunities of the emerging role of data.

Ultimately, I believe that these relationships, between computer science and the real world, between data science and real problems, hold the promise to vastly increase our public welfare. And today, we, the people in this room, have a unique opportunity to debate and define a more moral data economy….

The hybrid research model proposes something different. The hybrid research model embeds, as it were, researchers as practitioners. The thought was always that you would be going about your regular run of business, would face a need to innovate to solve a crucial problem, and would do something novel. At that point, you might choose to work some extra time and publish a paper explaining your innovation. In practice, this model rarely works as expected. Tight deadlines mean the innovation that people do in their normal progress of business is incremental.

This model separates research from scientific publication, and shortens the time-window of research to what can be realized within a few years. For me, this always felt like a tremendous loss, with respect to the older so-called “ivory tower” research model. It didn’t seem at all clear how this kind of model would produce the sea change of thought engendered by Shannon’s work, nor did it seem that Claude Shannon would ever want to work there. This kind of environment would never support the freestanding wonder, like the robot mouse that Shannon worked on. Moreover, I always believed that crucial to research is publication and participation in the scientific community. Without this engagement, it feels like something different — innovation perhaps.

It is clear that the monopolistic environment that enabled AT&T to support this ivory tower research doesn’t exist anymore.

Now, the hybrid research model was one model of research at Google, but there is another model as well, the moonshot model as exemplified by Google X. Google X brought together focused research teams to drive research and development around a particular project — Google Glass and the self-driving car being two notable examples. Here the focus isn’t research, but building a new product, with research as potentially a crucial blocking issue. Since the goal of Google X is directly to develop a new product, by definition they don’t publish papers along the way, but they’re not as tied to short-term deliverables as the rest of Google is. However, they are again decidedly un-Bell-Labs like — a secretive, tightly focused, non-publishing group. DeepMind is a similarly constituted initiative — working, for example, on a best-in-the-world Go playing algorithm, with publications happening sparingly.

Unfortunately, both of these approaches, the hybrid research model and the moonshot model, stack the deck towards a particular kind of research — research that leads to relatively short-term products that generate corporate revenue. While this kind of research is good for society, it isn’t the only kind of research that we need. We urgently need research that is long-term, and that is undertaken even without a clear, immediate financial impact. In some sense this is a “tragedy of the commons”, where a shared public good (the commons) is not supported because everyone can benefit from it without giving back. Academic research is thus a non-rival, non-excludable good, and so will reasonably be underfunded. In certain cases, this takes on an ethical dimension — particularly in health care, where the choice of what diseases to study and address has a tremendous potential to affect human life. Should we research heart disease or malaria? This decision makes a huge impact on global human health, but is vastly informed by the potential profit from each of these various medicines….

Private Data means research is out of reach

The larger point that I want to make, is that in the absence of places where long-term research can be done in industry, academia has a tremendous potential opportunity. Unfortunately, it is actually quite difficult to do the work that needs to be done in academia, since many of the resources needed to push the state of the art are only found in industry: in particular data.

Of course, academia also lacks machine resources, but this is a simpler problem to fix — it’s a matter of money. Resources from the government could go toward enabling research groups to build their own data centers or to acquire computational resources from the market, e.g. Amazon. This is aided by the compute philanthropy that Google and Microsoft practice, granting compute cycles to academic organizations.

But the data problem is much harder to address. The data being collected and generated at private companies could enable amazing discoveries and research, but is impossible for academics to access. The lack of access to private company data has effects far more significant than simply inhibiting research. In particular, the consumer-level data collected by social networks and internet companies could do much more than ad targeting.

Just for public health — suicide prevention, addiction counseling, mental health monitoring — there is enormous potential in the use of our online behavior to aid the most needy, and academia and non-profits are set up to enable this work, while companies are not.

To give one example, anorexia and eating disorders are vicious killers. Twenty million women and 10 million men suffer from a clinically significant eating disorder at some time in their life, and sufferers of eating disorders have the highest mortality rate of any mental health disorder — with a jaw-dropping estimated mortality rate of 10%, both directly from injuries sustained by the disorder and from suicide resulting from the disorder.

Eating disorders are distinctive in that sufferers often seek out confirmatory information: blogs, images and pictures that glorify and validate what they see as “lifestyle” choices. Browsing behavior that seeks out images and guidance on how to starve yourself is a key indicator that someone is suffering. Tumblr, Pinterest and Instagram are places where people host and seek out this information. Tumblr has tried to help address this severe mental health issue by banning blogs that advocate for self-harm and by adding PSA announcements to searches for queries related to anorexia. But clearly this is not the be-all and end-all of the work that could be done to detect and assist people at risk of dying from eating disorders. Moreover, this data could also help us understand the nature of those disorders themselves…
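
As a purely hypothetical sketch of the kind of intervention described (not Tumblr's actual implementation; the term list and message below are placeholders), a search handler could match queries against a list of flagged terms and attach a PSA to the response:

```python
# Hypothetical sketch only: a PSA trigger for searches related to eating disorders.
# The term list, message, and search stub are placeholders, not any platform's real code.
PSA_TERMS = {"anorexia", "thinspo", "pro-ana"}          # illustrative terms
PSA_MESSAGE = "If you are struggling with an eating disorder, help is available."

def run_search(query: str) -> list:
    """Stand-in for the platform's real search backend (returns nothing here)."""
    return []

def handle_search(query: str) -> dict:
    """Return search results, attaching a PSA when the query matches flagged terms."""
    tokens = set(query.lower().split())
    response = {"query": query, "results": run_search(query)}
    if tokens & PSA_TERMS:
        response["psa"] = PSA_MESSAGE
    return response

print(handle_search("thinspo blogs"))
```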

There is probably a role for a data ombudsman within private organizations — someone to protect the interests of the public’s data inside an organization, rather like a ‘public editor’ at a newspaper. They would be there to protect and articulate the interests of the public, which probably means working on both sides: making sure a company’s data is used for public good where appropriate, and making sure the public’s ‘right’ to privacy is appropriately safeguarded (and probably making sure the public is informed when their data is compromised).

Next, we need a platform to enable collaboration around social good between companies, and between companies and academics. This platform would give trusted users access to a wide variety of data and speed the process of research.

Finally, I wonder if there is a way that government could support research sabbaticals inside of companies. Clearly, the opportunities for this research far outstrip what is currently being done…(more)”

Value and Vulnerability: The Internet of Things in a Connected State Government


Press release: “The National Association of State Chief Information Officers (NASCIO) today released a policy brief on the Internet of Things (IoT) in state government. The paper focuses on the different ways state governments are using IoT now and in the future, and the policy considerations involved.

“In NASCIO’s 2015 State CIO Survey, we asked state CIOs to what extent IoT was on their agenda. Just over half said they were in informal discussions; however, only one in five had moved to the formal discussion phase. We believe IoT needs to be a formal part of each state’s policy considerations,” explained NASCIO Executive Director Doug Robinson.

The paper encourages state CIOs to make IoT part of the enterprise architecture discussions on asset management and risk assessment and to develop an IoT roadmap.

“Cities and municipalities have been working toward the designation of ‘smart city’ for a while now,” said Darryl Ackley, cabinet secretary for the New Mexico Department of Information Technology and NASCIO president. “While states provide different services than cities, we are seeing a lot of activity around IoT to improve citizen services and we see great potential for growth. The more organized and methodical states can be about implementing IoT, the more successful and useful the outcomes.”

Read the policy brief at www.NASCIO.org/ValueAndVulnerability 

These Online Platforms Make Direct Democracy Possible


Tom Ladendorf in InTheseTimes: “….Around the world, organizations from political parties to cooperatives are experimenting with new modes of direct democracy made possible by the internet.

“The world has gone through extraordinary technological innovation,” says Agustín Frizzera of Argentina’s Net Party. “But governments and political institutions haven’t innovated enough.”

The founders of the four-year-old party have also built an online platform, DemocracyOS, that lets users discuss and vote on proposals being considered by their legislators.

Anyone can adopt the technology, but the Net Party uses it to let Buenos Aires residents debate City Council measures. A 2013 thread, for example, concerned a plan to require bars and restaurants to make bathrooms free and open to the public.

“I recognize the need for freely available facilities, but it is the state who should be offering this service,” reads the top comment, voted most helpful by users. Others argued that private bathrooms open the door to discrimination. Ultimately, 56.9 percent of participants supported the proposal, while 35.3 percent voted against and 7.8 percent abstained….
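
For illustration, the arithmetic behind such a proposal tally is simple; the vote counts below are invented to reproduce the reported shares:

```python
# Illustrative tally for a single proposal; the counts are invented, not DemocracyOS data.
votes = {"support": 569, "against": 353, "abstain": 78}

total = sum(votes.values())
for option, count in votes.items():
    print(f"{option}: {count / total:.1%}")
# support: 56.9%, against: 35.3%, abstain: 7.8% -- matching the shares reported above
```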

A U.S. company called PlaceAVote, launched in 2014, takes what it calls a more pragmatic approach. According to cofounder Job Melton, PlaceAVote’s goal is to “work within the system we have now and fix it from the inside out” instead of attempting the unlikely feat of building a third U.S. party.

Like the Net Party and its brethren, PlaceAVote offers an online tool that lets voters participate in decision making. Right now, the technology is in public beta at PlaceAVote.com, allowing users nationwide to weigh in on legislation before Congress….

But digital democracy has applications that extend beyond electoral politics. A wide range of groups are using web-based decision-making tools internally. The Mexican government, for example, has used DemocracyOS to gather citizen feedback on a data-protection law, and Brazilian civil society organizations are using it to encourage engagement with federal and municipal policy-making.

Another direct-democracy tool in wide use is Loomio, developed by a cooperative in New Zealand. Ben Knight, one of Loomio’s cofounders, sums up his experience with Occupy as one of “seeing massive potential of collective decision making, and then realizing how difficult it could be in person.” After failing to find an online tool to facilitate the process, the Loomio team created a platform that enables online discussion with a personal element: Votes are by name and voters can choose to “disagree” with or even “block” proposals. Provo, Utah, uses Loomio for public consultation, and a number of political parties use Loomio for local decision making, including the Brazilian Pirate Party, several regional U.K. Green Party chapters and Spain’s Podemos. Podemos has enthusiastically embraced digital-democracy tools for everything from its selection of European Parliament candidates to the creation of its party platform….(More)”

All European scientific articles to be freely accessible by 2020


EU Presidency: “All scientific articles in Europe must be freely accessible as of 2020. EU member states want to achieve optimal reuse of research data. They are also looking into a European visa for foreign start-up founders.

And, according to the new Innovation Principle, new European legislation must take account of its impact on innovation. These are the main outcomes of the meeting of the Competitiveness Council in Brussels on 27 May.

Sharing knowledge freely

Under the presidency of Netherlands State Secretary for Education, Culture and Science Sander Dekker, the EU ministers responsible for research and innovation decided unanimously to take these significant steps. Mr Dekker is pleased that these ambitions have been translated into clear agreements to maximise the impact of research. ‘Research and innovation generate economic growth and more jobs and provide solutions to societal challenges,’ the state secretary said. ‘And that means a stronger Europe. To achieve that, Europe must be as attractive as possible for researchers and start-ups to locate here and for companies to invest. That calls for knowledge to be freely shared. The time for talking about open access is now past. With these agreements, we are going to achieve it in practice.’

Open access

Open access means that scientific publications on the results of research supported by public and public-private funds must be freely accessible to everyone. That is not yet the case. The results of publicly funded research are currently not accessible to people outside universities and knowledge institutions. As a result, teachers, doctors and entrepreneurs do not have access to the latest scientific insights that are so relevant to their work, and universities have to take out expensive subscriptions with publishers to gain access to publications.

Reusing research data

From 2020, all scientific publications on the results of publicly funded research must be freely available. It must also be possible to optimally reuse research data. To achieve that, the data must be made accessible, unless there are well-founded reasons for not doing so, for example intellectual property rights or security or privacy issues….(More)”

An App to Save Syria’s Lost Generation? What Technology Can and Can’t Do


 in Foreign Affairs: ” In January this year, when the refugee and migrant crisis in Europe had hit its peak—more than a million had crossed into Europe over the course of 2015—the U.S. State Department and Google hosted a forum of over 100 technology experts. The goal was to “bridge the education gap for Syrian refugee children.” Speaking to the group assembled at Stanford University, Deputy Secretary of State Antony Blinken announced a $1.7 million prize “to develop a smartphone app that can help Syrian children learn how to read and improve their wellbeing.” The competition, known as EduApp4Syria, is being run by the Norwegian Agency for Development Cooperation (Norad) and is supported by the Australian government and the French mobile company Orange.

Less than a month later, a group called Techfugees brought together over 100 technologists for a daylong brainstorm in New York City focused exclusively on education solutions. “We are facing the largest refugee crisis since World War II,” said U.S. Ambassador to the United Nations Samantha Power to open the conference. “It is a twenty-first-century crisis and we need a twenty-first-century solution.” Among the more promising, according to Power, were apps that enable “refugees to access critical services,” new “web platforms connecting refugees with one another,” and “education programs that teach refugees how to code.”

For example, the nonprofit PeaceGeeks created the Services Advisor app for the UN Refugee Agency, which maps the location of shelters, food distribution centers, and financial services in Jordan….(More)”

Open data + increased disclosure = better public-private partnerships


David Bloomgarden and Georg Neumann at Fomin Blog: “The benefits of open and participatory public procurement are increasingly being recognized by international bodies such as the Group of 20 major economies, the Organisation for Economic Co-operation and Development, and multilateral development banks. Value for money, more competition, and better goods and services for citizens all result from increased disclosure of contract data. Greater openness is also an effective tool to fight fraud and corruption.

However, because public-private partnerships (PPPs) are planned over a long timeframe and involve a large number of groups, implementing greater levels of openness in disclosure is complicated. This complexity can be a challenge to good design. Finding a structured and transparent approach to managing PPP contract data is fundamental for a project to be accepted and used by its local community….

In open contracting, all data is disclosed during the public procurement process—from the planning stage, to the bidding and awarding of the contract, to the monitoring of the implementation. A global open source data standard is used to publish that data, which is already being implemented in countries as diverse as Canada, Paraguay, and Ukraine. Using open data throughout the contracting process provides opportunities to innovate in managing bids, fixing problems, and integrating feedback as needed. Open contracting contributes to the overall social and environmental sustainability of infrastructure investments.

In the case of Mexico’s airport, the project publishes details of awarded contracts, including visualizing the flow of funds and detailing the full amounts of awarded contracts and renewable agreements. Standardized, timely, and open data that follow global standards such as the Open Contracting Data Standard will make this information useful for analysis of value for money, cost-benefit, sustainability, and monitoring performance. Crucially, open contracting will shift the focus from the inputs into a PPP, to the outputs: the goods and services being delivered.
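
As a minimal sketch of that kind of monitoring analysis (the award records and field names below are invented and only loosely echo open contracting data; they are not the actual airport figures), one could aggregate awarded amounts by supplier to see how concentrated the flow of funds is:

```python
# Illustrative award records, loosely modeled on open contracting releases (invented values).
awards = [
    {"supplier": "Constructora A", "amount": 120_000_000, "currency": "MXN"},
    {"supplier": "Constructora B", "amount": 95_000_000,  "currency": "MXN"},
    {"supplier": "Constructora A", "amount": 30_000_000,  "currency": "MXN"},
]

# A first monitoring question: how concentrated is the awarded value across suppliers?
totals = {}
for award in awards:
    totals[award["supplier"]] = totals.get(award["supplier"], 0) + award["amount"]

grand_total = sum(totals.values())
for supplier, amount in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{supplier}: {amount:,} MXN ({amount / grand_total:.0%} of awarded value)")
```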

Benefits of open data for PPPs

We think that better and open data will lead to better PPPs. Here’s how:

1. Using user feedback to fix problems

The Brazilian state of Minas Gerais has been a leader in transparent PPP contracts with full proactive disclosure of the contract terms, as well as of other relevant project information—a practice that puts a government under more scrutiny but makes for better projects in the long run.

According to Marcos Siqueira, former head of the PPP Unit in Minas Gerais, “An adequate transparency policy can provide enough information to users so they can become contract watchdogs themselves.”

For example, a public-private contract was signed in 2014 to build a $300 million waste treatment plant for 2.5 million people in the metropolitan area of Belo Horizonte, the capital of Minas Gerais. As the team members conducted appraisals, they disclosed them on the Internet. In addition, the team held around 20 public meetings and identified all the stakeholders in the project. One notable result of the sharing and discussion of this information was the relocation of the facility to a less-populated area. When the project went to the bidding phase, it was much closer to the expectations of its various stakeholders.

2. Making better decisions on contracts and performance

Chile has been a leader in developing PPPs (which it refers to as concessions) for several decades, in a range of sectors: urban and inter-urban roads, seaports, airports, hospitals, and prisons. The country tops the list for the best enabling environment for PPPs in Latin America and the Caribbean, as measured by Infrascope, an index produced by the Economist Intelligence Unit and the Multilateral Investment Fund of the IDB Group.

Chile’s distinction is that it discloses information on performance of PPPs that are underway. The government’s Concessions Unit regularly publishes summaries of the projects during their different phases, including construction and operation. The reports are non-technical, yet include all the necessary information to understand the scope of the project…(More)”

Nudging – Possibilities, Limitations and Applications in European Law and Economics


Book edited by Mathis, Klaus and Tor, Avishalom: “This anthology provides an in-depth analysis and discusses the issues surrounding nudging and its use in legislation, regulation, and policy making more generally. The 17 essays in this anthology provide startling insights into the multifaceted debate surrounding the use of nudges in European Law and Economics.

Nudging is a tool aimed at altering people’s behaviour in a predictable way without forbidding any option or significantly changing economic incentives. It can be used to help people make better decisions, influencing human behaviour without forcing it, since people can opt out. Its use has sparked lively debates in academia as well as in the public sphere. This book explores who decides which behaviour is desired. It looks at whether or not the state has sufficient information for debiasing, and if there are clear-cut boundaries between paternalism, manipulation and indoctrination. The first part of this anthology discusses the foundations of nudging theory and the problems associated with it, as well as outlining possible solutions to the problems raised. The second part is devoted to the wide scope of applications of nudges, from contract law and tax law to health claim regulations, among others.

This volume is a result of the flourishing annual Law and Economics Conference held at the law faculty of the University of Lucerne. The conferences have been instrumental in establishing a strong and ever-growing Law and Economics movement in Europe, providing unique insights into the challenges faced by Law and Economics when applied in European legal traditions….(More)”