
Stefaan Verhulst

Statistics Canada: “As data and information take on a far more prominent role in Canada and, indeed, all over the world, data, databases and data science have become a staple of modern life. When the electricity goes out, Canadians are as much in search of their data feed as they are food and heat. Consumers are using more and more data that is embodied in the products they buy, whether those products are music, reading material, cars and other appliances, or a wide range of other goods and services. Manufacturers, merchants and other businesses depend increasingly on the collection, processing and analysis of data to make their production processes more efficient and to drive their marketing strategies.

The increasing use of and investment in all things data is driving economic growth, changing the employment landscape and reshaping how and from where we buy and sell goods. Yet the rapid rise in the use and importance of data is not well measured in the existing statistical system. Given the ‘lack of data on data’, Statistics Canada has initiated new research to produce a first set of estimates of the value of data, databases and data science. The development of these estimates benefited from collaboration with the Bureau of Economic Analysis in the United States and the Organisation for Economic Co-operation and Development.

In 2018, Canadian investment in data, databases and data science was estimated to be as high as $40 billion. This was greater than the annual investment in industrial machinery, transportation equipment, and research and development and represented approximately 12% of total non-residential investment in 2018….

Statistics Canada recently released a conceptual framework outlining how one might measure the economic value of data, databases and data science. Thanks to this new framework, the growing role of data in Canada can be measured through time. This framework is described in a paper that was released in The Daily on June 24, 2019 entitled “Measuring investments in data, databases and data science: Conceptual framework.” That paper describes the concept of an ‘information chain’ in which data are derived from everyday observations, databases are constructed from data, and data science creates new knowledge by analyzing the contents of databases….(More)”.

The value of data in Canada: Experimental estimates

Paper by Ayelet Sela: “Justice systems around the world are launching online courts and tribunals in order to improve access to justice, especially for self-represented litigants (SRLs). Online courts are designed to handhold SRLs throughout the process and empower them to make procedural and substantive decisions. To that end, they present SRLs with streamlined and simplified procedures and employ a host of user interface design and user experience strategies (UI/UX). Focusing on these features, the article analyzes online courts as digital choice environments that shape SRLs’ decisions, inputs and actions, and considers their implications on access to justice, due process and the impartiality of courts. Accordingly, the article begins to close the knowledge gap regarding choice architecture in online legal proceedings. 

Using examples from current online courts, the article considers how mechanisms such as choice overload, display, colorfulness, visual complexity, and personalization influence SRLs’ choices and actions. The analysis builds on research in cognitive psychology and behavioral economics that shows that subtle changes in the context in which decisions are made steer (nudge) people to choose a particular option or course of action. It is also informed by recent studies that capture the effect of digital choice architecture on users’ choices and behaviors in online settings. The discussion clarifies that seemingly naïve UI/UX features can strongly influence users of online courts, in a manner that may be at odds with their institutional commitment to impartiality and due process. Moreover, the article challenges the view that online court interfaces (and those of other online legal services, for that matter) should be designed to maximize navigability, intuitiveness and user-friendliness. It argues that these design attributes involve the risk of nudging SRLs to make uninformed, non-deliberate, and biased decisions, possibly infringing their autonomy and self-determination. Accordingly, the article suggests that choice architecture in online courts should aim to encourage reflective participation and informed decision-making. Specifically, its goal should be to improve SRLs’ ability to identify and consider options, and advance their own — inherently diverse — interests. In order to mitigate the abovementioned risks, the article proposes an initial evaluation framework, measures, and methodologies to support evidence-based and ethical choice architecture in online courts….(More)”.

E-Nudging Justice: The Role of Digital Choice Architecture in Online Courts

Paper by George Wyeth, Lee C. Paddock, Alison Parker, Robert L. Glicksman and Jecoliah Williams: “An increasingly sophisticated public, rapid changes in monitoring technology, the ability to process large volumes of data, and social media are increasing the capacity for members of the public and advocacy groups to gather, interpret, and exchange environmental data. This development has the potential to alter the government-centric approach to environmental governance; however, citizen science has had a mixed record in influencing government decisions and actions. This Article reviews the rapid changes that are going on in the field of citizen science and examines what makes citizen science initiatives impactful, as well as the barriers to greater impact. It reports on 10 case studies, and evaluates these to provide findings about the state of citizen science and recommendations on what might be done to increase its influence on environmental decision-making….(More)”.

The Impact of Citizen Environmental Science in the United States

Report by EY: “Unlocking the power of health care data to fuel innovation in medical research and improve patient care is at the heart of today’s health care revolution. When curated or consolidated into a single longitudinal dataset, patient-level records will trace a complete story of a patient’s demographics, health, wellness, diagnosis, treatments, medical procedures and outcomes. Health care providers need to recognize patient data for what it is: a valuable intangible asset desired by multiple stakeholders, a treasure trove of information.

Among the universe of providers holding significant data assets, the United Kingdom’s National Health Service (NHS) is the single largest integrated health care provider in the world. Its patient records cover the entire UK population from birth to death.

We estimate that the 55 million patient records held by the NHS today may have an indicative market value of several billion pounds to a commercial organization. We estimate also that the value of the curated NHS dataset could be as much as £5bn per annum and deliver around £4.6bn of benefit to patients per annum, in potential operational savings for the NHS, enhanced patient outcomes and generation of wider economic benefits to the UK….(More)”.

How we can place a value on health care data

Jonathan Zittrain in The New Yorker: “Like many medications, the wakefulness drug modafinil, which is marketed under the trade name Provigil, comes with a small, tightly folded paper pamphlet. For the most part, its contents—lists of instructions and precautions, a diagram of the drug’s molecular structure—make for anodyne reading. The subsection called “Mechanism of Action,” however, contains a sentence that might induce sleeplessness by itself: “The mechanism(s) through which modafinil promotes wakefulness is unknown.”

Provigil isn’t uniquely mysterious. Many drugs receive regulatory approval, and are widely prescribed, even though no one knows exactly how they work. This mystery is built into the process of drug discovery, which often proceeds by trial and error. Each year, any number of new substances are tested in cultured cells or animals; the best and safest of those are tried out in people. In some cases, the success of a drug promptly inspires new research that ends up explaining how it works—but not always. Aspirin was discovered in 1897, and yet no one convincingly explained how it worked until 1995. The same phenomenon exists elsewhere in medicine. Deep-brain stimulation involves the implantation of electrodes in the brains of people who suffer from specific movement disorders, such as Parkinson’s disease; it’s been in widespread use for more than twenty years, and some think it should be employed for other purposes, including general cognitive enhancement. No one can say how it works.

This approach to discovery—answers first, explanations later—accrues what I call intellectual debt. It’s possible to discover what works without knowing why it works, and then to put that insight to use immediately, assuming that the underlying mechanism will be figured out later. In some cases, we pay off this intellectual debt quickly. But, in others, we let it compound, relying, for decades, on knowledge that’s not fully known.

In the past, intellectual debt has been confined to a few areas amenable to trial-and-error discovery, such as medicine. But that may be changing, as new techniques in artificial intelligence—specifically, machine learning—increase our collective intellectual credit line. Machine-learning systems work by identifying patterns in oceans of data. Using those patterns, they hazard answers to fuzzy, open-ended questions. Provide a neural network with labelled pictures of cats and other, non-feline objects, and it will learn to distinguish cats from everything else; give it access to medical records, and it can attempt to predict a new hospital patient’s likelihood of dying. And yet, most machine-learning systems don’t uncover causal mechanisms. They are statistical-correlation engines. They can’t explain why they think some patients are more likely to die, because they don’t “think” in any colloquial sense of the word—they only answer. As we begin to integrate their insights into our lives, we will, collectively, begin to rack up more and more intellectual debt….(More)”.

The Hidden Costs of Automated Thinking

Paper by Harry Surden: “Much has been written recently about artificial intelligence (AI) and law. But what is AI, and what is its relation to the practice and administration of law? This article addresses those questions by providing a high-level overview of AI and its use within law. The discussion aims to be nuanced but also understandable to those without a technical background. To that end, I first discuss AI generally. I then turn to AI and how it is being used by lawyers in the practice of law, people and companies who are governed by the law, and government officials who administer the law. A key motivation in writing this article is to provide a realistic, demystified view of AI that is rooted in the actual capabilities of the technology. This is meant to contrast with discussions about AI and law that are decidedly futurist in nature…(More)”.

Artificial Intelligence and Law: An Overview

Paper by Tiago Peixoto et al.: “Benjamin Franklin famously once said that “nothing can be said to be certain, except death and taxes.” In developing countries, however, tax revenues are anything but certain. Madagascar is a prime example, with tax collection as a share of GDP at just under 11 percent. This is low even compared with countries of similar levels of economic development, and well below what the government could reasonably collect to fund much-needed public services, such as education, health and infrastructure.

Poor compliance by citizens who owe taxes remains a major reason for Madagascar’s low tax collection. Madagascar’s government has therefore made increasing tax revenue collection a high priority in its strategy for promoting sustainable economic growth and addressing poverty.

Reforming a tax system can take decades. But small measures, implemented with the help of technology, can help tax authorities improve compliance. Our team at the World Bank jointly conducted a field experiment with Madagascar’s Directorate General for Taxation to test whether simple text message reminders via mobile phones could increase compliance among late tax filers.

We took a group of 15,885 late-income-tax filers and randomly assigned some of them to receive a series of messages reminding them to file a tax declaration and emphasizing various reasons to pay taxes. Late tax filers were told that they could avoid a late penalty by meeting an extended deadline and were given the link to the tax filing website. 

The results of the experiment were significant. In the control group, only 7.2% of late filers filed a tax return by the extended deadline cited in the SMS messages. This increased to 9.8% in the treatment groups that received SMS reminders. This might not sound like much, but for every dollar spent sending text messages, the tax authority collected an additional 329 dollars in revenues, making the intervention highly cost-effective.

In fact, the return on this particular investment was 32,900 percent! Although this increase in revenue is relatively small in absolute terms—around $375,000—it could be automatically integrated into the tax system. It also suggests that messaging may hold a lot of promise for cost-effectively increasing tax receipts even in developing country contexts….(More)”.
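The cost-effectiveness figures above follow from simple arithmetic. A minimal sketch of the calculation, using only the numbers quoted in the post (the implied total messaging cost is derived, not stated in the source):

```python
# Back-of-the-envelope ROI calculation for the Madagascar SMS experiment.
# Figures from the post: every $1 spent on text messages returned $329 in
# additional revenue, for roughly $375,000 in extra collections overall.

additional_revenue = 375_000   # approximate extra revenue collected (USD)
revenue_per_dollar = 329       # additional revenue per $1 of messaging spend

# Implied total cost of sending the SMS reminders (not stated in the post).
messaging_cost = additional_revenue / revenue_per_dollar

# ROI expressed as revenue collected per dollar spent, as a percentage.
roi_percent = (additional_revenue / messaging_cost) * 100

print(f"Implied messaging cost: ${messaging_cost:,.0f}")
print(f"Return on investment: {roi_percent:,.0f}%")  # 32,900%
```

The implied messaging cost comes out to roughly $1,100, which illustrates why even a modest 2.6-percentage-point lift in filing rates can be highly cost-effective at this scale.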

How mobile text reminders earned Madagascar a 32,900% ROI in collecting unpaid taxes

Book edited by Michael A. Livermore and Daniel N. Rockmore: “In recent years, the digitization of legal texts, combined with developments in the fields of statistics, computer science, and data analytics, have opened entirely new approaches to the study of law. This volume explores the new field of computational legal analysis, an approach marked by its use of legal texts as data. The emphasis herein is work that pushes methodological boundaries, either by using new tools to study longstanding questions within legal studies or by identifying new questions in response to developments in data availability and analysis.

By using the text and underlying data of legal documents as the direct objects of quantitative statistical analysis, Law as Data introduces the legal world to the broad range of computational tools already proving themselves relevant to law scholarship and practice, and highlights the early steps in what promises to be an exciting new approach to studying the law….(More)”.

Law as Data: Computation, Text, and the Future of Legal Analysis

Blog post by Morgan Housel: “During the Vietnam War Secretary of Defense Robert McNamara tracked every combat statistic he could, creating a mountain of analytics and predictions to guide the war’s strategy.

Edward Lansdale, head of special operations at the Pentagon, once looked at McNamara’s statistics and told him something was missing.

“What?” McNamara asked.

“The feelings of the Vietnamese people,” Lansdale said.

That’s not the kind of thing a statistician pays attention to. But, boy, did it matter.

I believe in prediction. I think you have to in order to get out of bed in the morning.

But prediction is hard. Either you know that or you’re in denial about it.

A lot of the reason it’s hard is because the visible stuff that happens in the world is a small fraction of the hidden stuff that goes on inside people’s heads. The former is easy to overanalyze; the latter is easy to ignore.

This report describes 12 common flaws, errors, and misadventures that occur in people’s heads when predictions are made….(More)”.

The Psychology of Prediction

Priyanka Pulla in Nature: “Carl Malamud is on a crusade to liberate information locked up behind paywalls — and his campaigns have scored many victories. He has spent decades publishing copyrighted legal documents, from building codes to court records, and then arguing that such texts represent public-domain law that ought to be available to any citizen online. Sometimes, he has won those arguments in court. Now, the 60-year-old American technologist is turning his sights on a new objective: freeing paywalled scientific literature. And he thinks he has a legal way to do it.

Over the past year, Malamud has — without asking publishers — teamed up with Indian researchers to build a gigantic store of text and images extracted from 73 million journal articles dating from 1847 up to the present day. The cache, which is still being created, will be kept on a 576-terabyte storage facility at Jawaharlal Nehru University (JNU) in New Delhi. “This is not every journal article ever written, but it’s a lot,” Malamud says. It’s comparable to the size of the core collection in the Web of Science database, for instance. Malamud and his JNU collaborator, bioinformatician Andrew Lynn, call their facility the JNU data depot.

No one will be allowed to read or download work from the repository, because that would breach publishers’ copyright. Instead, Malamud envisages, researchers could crawl over its text and data with computer software, scanning through the world’s scientific literature to pull out insights without actually reading the text.

The unprecedented project is generating much excitement because it could, for the first time, open up vast swathes of the paywalled literature for easy computerized analysis. Dozens of research groups already mine papers to build databases of genes and chemicals, map associations between proteins and diseases, and generate useful scientific hypotheses. But publishers control — and often limit — the speed and scope of such projects, which typically confine themselves to abstracts, not full text. Researchers in India, the United States and the United Kingdom are already making plans to use the JNU store instead. Malamud and Lynn have held workshops at Indian government laboratories and universities to explain the idea. “We bring in professors and explain what we are doing. They get all excited and they say, ‘Oh gosh, this is wonderful’,” says Malamud.

But the depot’s legal status isn’t yet clear. Malamud, who contacted several intellectual-property (IP) lawyers before starting work on the depot, hopes to avoid a lawsuit. “Our position is that what we are doing is perfectly legal,” he says. For the moment, he is proceeding with caution: the JNU data depot is air-gapped, meaning that no one can access it from the Internet. Users have to physically visit the facility, and only researchers who want to mine for non-commercial purposes are currently allowed in. Malamud says his team does plan to allow remote access in the future. “The hope is to do this slowly and deliberately. We are not throwing this open right away,” he says….(More)”.

The plan to mine the world’s research papers
