Regulatory Technology – Replacing Law with Computer Code


LSE Legal Studies Working Paper by Eva Micheler and Anna Whaley: “Recently both the Bank of England and the Financial Conduct Authority have carried out experiments using new digital technology for regulatory purposes. The idea is to replace rules written in natural legal language with computer code and to use artificial intelligence for regulatory purposes.

This new way of designing public law is in line with the government’s vision for the UK to become a global leader in digital technology. It is also reflected in the FCA’s business plan.

The article reviews the technology and the advantages and disadvantages of combining the technology with regulatory law. It then informs the discussion from a broader public law perspective. It analyses regulatory technology through criteria developed in the mainstream regulatory discourse. It contributes to that discourse by anticipating problems that will arise as the technology evolves. In addition, the hope is to assist the government in avoiding mistakes that have occurred in the past and creating a better system from the start…(More)”.

Big Data Is Getting Bigger. So Are the Privacy and Ethical Questions.


Goldie Blumenstyk at The Chronicle of Higher Education: “…The next step in using “big data” for student success is upon us. It’s a little cool. And also kind of creepy.

This new approach goes beyond the tactics now used by hundreds of colleges, which depend on data collected from sources like classroom teaching platforms and student-information systems. It not only makes a technological leap; it also raises issues around ethics and privacy.

Here’s how it works: Whenever you log on to a wireless network with your cellphone or computer, you leave a digital footprint. Move from one building to another while staying on the same network, and that network knows how long you stayed and where you went. That data is collected continuously and automatically from the network’s various nodes.
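The mechanics described above are straightforward to sketch. The snippet below is a minimal, hypothetical illustration (invented device IDs, access-point names, and log format — not Degree Analytics' actual pipeline) of how dwell time per location can be derived from routine Wi-Fi association records:

```python
from datetime import datetime

# Hypothetical Wi-Fi association log: (device_id, access_point, timestamp).
# Campus networks record similar events automatically at each node.
log = [
    ("device-42", "library-ap",       "2018-09-10 09:00"),
    ("device-42", "library-ap",       "2018-09-10 09:40"),
    ("device-42", "student-union-ap", "2018-09-10 10:05"),
    ("device-42", "student-union-ap", "2018-09-10 11:15"),
]

def dwell_minutes(records):
    """Minutes each device spent per access point, measured
    between its first and last sighting there."""
    first, last = {}, {}
    for device, ap, ts in records:
        t = datetime.strptime(ts, "%Y-%m-%d %H:%M")
        key = (device, ap)
        first.setdefault(key, t)
        last[key] = t
    return {k: int((last[k] - first[k]).total_seconds() // 60) for k in first}

print(dwell_minutes(log))
# {('device-42', 'library-ap'): 40, ('device-42', 'student-union-ap'): 70}
```

Nothing here requires consent or even awareness from the device's owner — which is precisely the ethical tension the piece goes on to discuss.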

Now, with the help of a company called Degree Analytics, a few colleges are beginning to use location data collected from students’ cellphones and laptops as they move around campus. Some colleges are using it to improve the kind of advice they might send to students, like a text-message reminder to go to class if they’ve been absent.

Others see it as a tool for making decisions on how to use their facilities. St. Edward’s University, in Austin, Tex., used the data to better understand how students were using its computer-equipped spaces. It found that a renovated lounge, with relatively few computers but with Wi-Fi access and several comfy couches, was one of the most popular such sites on campus. Now the university knows it may not need to buy as many computers as it once thought.

As Gary Garofalo, a co-founder and chief revenue officer of Degree Analytics, told me, “the network data has very intriguing advantages” over the forms of data that colleges now collect.

Some of those advantages are obvious: If you’ve got automatic information on every person walking around with a cellphone, your dataset is more complete than if you need to extract it from a learning-management system or from the swipe-card readers some colleges use to track students’ activities. Many colleges now collect such data to determine students’ engagement with their coursework and campus activities.

Of course, the 24/7 reporting of the data is also what makes this approach seem kind of creepy….

I’m not the first to ask questions like this. A couple of years ago, a group of educators organized by Martin Kurzweil of Ithaka S+R and Mitchell Stevens of Stanford University issued a series of guidelines for colleges and companies to consider as they began to embrace data analytics. Among other principles, the guidelines highlighted the importance of being transparent about how the information is used, and ensuring that institutions’ leaders really understand what companies are doing with the data they collect. Experts at New America weighed in too.

I asked Kurzweil what he makes of the use of Wi-Fi information. Location tracking tends toward the “dicey” side of the spectrum, he says, though perhaps not as far out as using students’ social-media habits, health information, or what they check out from the library. The fundamental question, he says, is “how are they managing it?”… So is this the future? Benz, at least, certainly hopes so. Inspired by the Wi-Fi-based StudentLife research project at Dartmouth College and the experiences Purdue University is having with students’ use of its Forecast app, he’s in talks now with a research university about a project that would generate other insights that might be gleaned from students’ Wi-Fi-usage patterns….(More)

Informational Autocrats


Paper by Sergei M. Guriev and Daniel Treisman: “In recent decades, dictatorships based on mass repression have largely given way to a new model based on the manipulation of information. Instead of terrorizing citizens into submission, “informational autocrats” artificially boost their popularity by convincing the public they are competent.

To do so, they use propaganda and silence informed members of the elite by co-optation or censorship.

Using several sources – including a newly created dataset of authoritarian control techniques – we document a range of trends in recent autocracies that fit the theory: a decline in violence, efforts to conceal state repression, rejection of official ideologies, imitation of democracy, a perceptions gap between masses and elite, and the adoption by leaders of a rhetoric of performance rather than one aimed at inspiring fear….(More)”

Open Data Use Case: Using data to improve public health


Chris Willsher at ODX: “Studies have shown that a large majority of Canadians spend too much time in sedentary activities. According to the Health Status of Canadians report in 2016, only 2 out of 10 Canadian adults met the Canadian Physical Activity Guidelines. Increasing physical activity and healthy lifestyle behaviours can reduce the risk of chronic illnesses, which can decrease pressures on our health care system. And data can play a role in improving public health.

We are already seeing examples of a push to augment the role of data, with programs recently being launched at home and abroad. Canada and the US established an initiative in the spring of 2017 called the Healthy Behaviour Data Challenge. The goal of the initiative is to open up new methods for generating and using data to monitor health, specifically in the areas of physical activity, sleep, sedentary behaviour, or nutrition. The challenge recently wrapped up with winners being announced in late April 2018. Programs such as this provide incentive to the private sector to explore data’s role in measuring healthy lifestyles and raise awareness of the importance of finding new solutions.

In the UK, Sport England and the Open Data Institute (ODI) have collaborated to create the OpenActive initiative. It has set out to encourage both government and private sector entities to unlock data around physical activities so that others can utilize this information to ease the process of engaging in an active lifestyle. The goal is to “make it as easy to find and book a badminton court as it is to book a hotel room.” As of last fall, OpenActive counted more than 76,000 activities across 1,000 locations from their partner organizations. They have also developed a standard for activity data to ensure consistency among data sources, which eases the ability for developers to work with the data. Again, this initiative serves as a mechanism for open data to help address public health issues.

In Canada, we are seeing more open datasets that could be utilized to devise new solutions for generating higher rates of physical activity. A lot of useful information is available at the municipal level that can provide specifics around local infrastructure. Plus, there is data at the provincial and federal level that can provide higher-level insights useful to developing methods for promoting healthier lifestyles.

Information about cycling infrastructure seems to be relatively widespread among municipalities with a robust open data platform. As an example, the City of Toronto publishes map data of bicycle routes around the city. This information could be utilized to help citizens find the best bike route between two points. In addition, the city also publishes data on indoor, outdoor, and post-and-ring bicycle parking facilities that can identify where to securely lock your bike. Exploring data from proprietary sources, such as Strava, could further enhance an application by layering on popular cycling routes or allowing users to integrate their personal information. And algorithms could allow for the inclusion of data on comparable driving times, projected health benefits, or savings on automotive maintenance.
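The route-finding application imagined above is a standard shortest-path problem once the open map data is loaded as a graph. Below is a small, self-contained sketch using Dijkstra's algorithm over an invented network (the node names and distances are hypothetical, standing in for data like Toronto's bicycle-route files):

```python
import heapq

# Hypothetical graph built from open bicycle-route map data:
# nodes are intersections, edge weights are route distances in km.
bike_network = {
    "home":    {"park": 1.5, "main_st": 1.0},
    "park":    {"home": 1.5, "campus": 2.0},
    "main_st": {"home": 1.0, "campus": 3.0},
    "campus":  {"park": 2.0, "main_st": 3.0},
}

def best_route(graph, start, goal):
    """Dijkstra's shortest path: returns (distance_km, path)."""
    queue = [(0.0, start, [start])]
    seen = set()
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (dist + weight, nxt, path + [nxt]))
    return float("inf"), []

print(best_route(bike_network, "home", "campus"))
# (3.5, ['home', 'park', 'campus'])
```

A real application would build the graph from the published route geometry and could weight edges by safety or elevation rather than distance alone.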

The City of Calgary publishes data on park sports surfaces and recreation facilities that could potentially be incorporated into sports league applications. This would make it easier to display locations for upcoming games or to arrange pick-up games. Knowing where there are fields nearby that may be available for a last-minute soccer game could be useful in encouraging use of the facilities and generating more physical activity. Again, other data sources, such as weather, could be integrated with this information to provide a planning tool for organizing these activities….(More)”.

Predicting Public Interest Issue Campaign Participation on Social Media


Jungyun Won, Linda Hon, and Ah Ram Lee in the Journal of Public Interest Communication: “This study investigates what motivates people to participate in a social media campaign in the context of animal protection issues.

Structural equation modeling (SEM) tested a proposed research model with survey data from 326 respondents.

Situational awareness, participation benefits, and social ties influence were positive predictors of social media campaign participation intentions. Situational awareness also partially mediates the relationship between participation benefits and participation intentions, as well as between strong ties influence and participation intentions.

When designing social media campaigns, public interest communicators should raise situational awareness and emphasize participation benefits. Messages shared through social networks, especially via strong ties, also may be more effective than those posted only on official websites or social networking sites (SNSs)….(More)”.

Under what conditions is information empowering?


FeedbackLabs: “A 72% increase in students ceasing to abuse drugs. A 57 percentage point jump in vaccination rates. Fourteen percent higher odds of adults quitting smoking. The improvements in outcomes that people can achieve for themselves when armed with information can be striking.

Yet the above examples and many more show that information alone rarely empowers people to make their lives better. Information empowers when social and emotional factors induce people to reinterpret that information, and act on it. In this report, we draw on 44 real-life examples and 168 research papers from 10 fields to develop 7 general principles that seem to underlie information initiatives that successfully empower people. Principles 1, 2, and 3 speak to how information empowers through reinterpretation, and Principles 4 to 7 speak to how we can support that reinterpretation—and get people to act. Based on the 7 principles, we then provide a checklist of questions a team can use to increase the likelihood that their initiative will empower the people they seek to serve.

Throughout, we provide concrete illustrations from a wide range of fields to show how applying these principles in practice has led to substantially better outcomes. We also consider examples with outcomes we might consider to be negative. The 7 principles are broadly applicable to how information empowers people to perceive, make and act on choices—but they are agnostic about whether the outcomes of those choices are positive or negative.

The way that the principles are applied in one context may not always work in another. But from the context-specific evidence summarized in this report we have extrapolated a framework that can be applied more broadly—in both theory and practice, for both funders and implementers. Although many of the in-depth case studies presented stem from the US, the principles are based on a wide range of examples and evidence from around the world. We believe the framework we construct here is powerful and can be applied globally; but it’s also clear that much more remains to be understood, so we hope it also sparks ideas, experimentation, and new discoveries….(More)”.

Identifying Healthcare Fraud with Open Data


Paper by Xuan Zhang et al: “Health care fraud is a serious problem that impacts every patient and consumer. This fraudulent behavior causes excessive financial losses every year and causes significant patient harm. Healthcare fraud includes health insurance fraud, fraudulent billing of insurers for services not provided, and exaggeration of medical services. Identifying healthcare fraud is thus an urgent task for avoiding the abuse and waste of public funds. Existing methods in this research field usually use classified data from governments, which greatly compromises the generalizability and scope of application. This paper introduces a methodology to use publicly available data sources to identify potentially fraudulent behavior among physicians. The research involved data pairing of multiple datasets, selection of useful features, comparisons of classification models, and analysis of useful predictors. Our performance evaluation results clearly demonstrate the efficacy of the proposed method….(More)”.
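To make the pipeline concrete, here is a deliberately simplified sketch of the idea of mining open billing data for anomalies. The records, names, and the crude z-score rule are all invented for illustration — the paper itself compares proper classification models, not this heuristic:

```python
from statistics import mean, stdev

# Hypothetical records in the spirit of open payment data:
# (physician ID, specialty, total amount billed in a year).
billing = [
    ("D1", "cardiology", 120_000),
    ("D2", "cardiology", 115_000),
    ("D3", "cardiology", 480_000),  # stands out within its specialty
    ("D4", "podiatry",    60_000),
    ("D5", "podiatry",    62_000),
    ("D6", "podiatry",    64_000),
]

def flag_outliers(records, threshold=1.1):
    """Flag physicians billing more than `threshold` standard
    deviations above their specialty's mean -- a toy stand-in
    for the trained classifiers compared in the paper."""
    by_specialty = {}
    for doc, spec, amount in records:
        by_specialty.setdefault(spec, []).append((doc, amount))
    flagged = []
    for spec, docs in by_specialty.items():
        amounts = [a for _, a in docs]
        mu, sigma = mean(amounts), stdev(amounts)
        for doc, amount in docs:
            if sigma and (amount - mu) / sigma > threshold:
                flagged.append(doc)
    return flagged

print(flag_outliers(billing))  # ['D3']
```

Grouping by specialty before comparing matters: a cardiologist's normal billing would look anomalous against podiatrists, which is the kind of confound feature selection has to handle.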

Open innovation and the evaluation of internet-enabled public services in smart cities


Krassimira Paskaleva and Ian Cooper in Technovation: “This article is focused on public service innovation from an innovation management perspective. It presents research experience gained from a European project for managing social and technological innovation in the production and evaluation of citizen-centred internet-enabled services in the public sector.

It is based on six urban pilot initiatives, which sought to operationalise a new approach to co-producing and co-evaluating civic services in smart cities – commonly referred to as open innovation for smart city services. Research suggests that the evidence base underpinning this approach is not sufficiently robust to support claims being made about its effectiveness.

Instead, evaluation research of citizen-centred internet-enabled urban services is in its infancy and there are no tested methods or tools in the literature for supporting this approach.

The paper reports on the development and trialling of a novel Co-evaluation Framework, indicators and reporting categories, used to support the co-production of smart city services in an EU-funded project. Our point of departure is that innovation of services is a sub-set of innovation management that requires effective integration of technological with social innovation, supported by the right skills and capacities. The main skill sets needed for effective co-evaluation of open innovation services are the integration of stakeholder management with evaluation capacities.”

Big Data: the End of the Scientific Method?


Paper by S. Succi and P.V. Coveney at arXiv: “We argue that the boldest claims of Big Data are in need of revision and toning down, in view of a few basic lessons learned from the science of complex systems. We point out that, once the most extravagant claims of Big Data are properly discarded, a synergistic merging of BD with big theory offers considerable potential to spawn a new scientific paradigm capable of overcoming some of the major barriers confronted by the modern scientific method originating with Galileo. These obstacles are due to the presence of nonlinearity, nonlocality and hyperdimensions which one encounters frequently in multiscale modelling….(More)”.

We Need Transparency in Algorithms, But Too Much Can Backfire


Kartik Hosanagar and Vivian Jair at Harvard Business Review: “In 2013, Stanford professor Clifford Nass faced a student revolt. Nass’s students claimed that those in one section of his technology interface course received higher grades on the final exam than counterparts in another. Unfortunately, they were right: two different teaching assistants had graded the two different sections’ exams, and one had been more lenient than the other. Students with similar answers had ended up with different grades.

Nass, a computer scientist, recognized the unfairness and created a technical fix: a simple statistical model to adjust scores, where students got a certain percentage boost on their final mark when graded by a TA known to give grades that percentage lower than average. In the spirit of openness, Nass sent out emails to the class with a full explanation of his algorithm. Further complaints poured in, some even angrier than before. Where had he gone wrong?…
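Nass's fix, as described, is simple arithmetic. The sketch below illustrates the kind of adjustment in question — boost a score by the percentage its TA graded below the overall average. The function and all numbers are invented for illustration, not Nass's actual model:

```python
# A sketch of the adjustment described: if a TA's section average
# ran some percentage below the overall course average, scores from
# that TA get boosted by that same percentage.
def adjust(score, ta_section_mean, overall_mean):
    """Boost a score by the percentage its TA graded below average."""
    if ta_section_mean >= overall_mean:
        return score  # lenient or average TA: no boost needed
    deficit = (overall_mean - ta_section_mean) / overall_mean
    return score * (1 + deficit)

# A strict TA's section averaged 72 while the course averaged 80,
# i.e. 10% low -- so a 70 from that TA becomes a 77.
print(round(adjust(70, 72, 80), 2))
```

The model is easy to state in one line, which makes the backlash that followed its full disclosure all the more instructive.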

Kizilcec had in fact tested three levels of transparency: low and medium but also high, where the students got not only a paragraph explaining the grading process but also their raw peer-graded scores and how these were each precisely adjusted by the algorithm to get to a final grade. And this is where the results got more interesting. In the experiment, while medium transparency increased trust significantly, high transparency eroded it completely, to the point where trust levels were either equal to or lower than among students experiencing low transparency.

Making Modern AI Transparent: A Fool’s Errand?

What are businesses to take home from this experiment? It suggests that technical transparency – revealing the source code, inputs, and outputs of the algorithm – can build trust in many situations. But most algorithms in the world today are created and managed by for-profit companies, and many businesses regard their algorithms as highly valuable forms of intellectual property that must remain in a “black box.” Some lawmakers have proposed a compromise, suggesting that the source code be revealed to regulators or auditors in the event of a serious problem, and this adjudicator will assure consumers that the process is fair.

This approach merely shifts the burden of belief from the algorithm itself to the regulators. This may be a palatable solution in many arenas: for example, few of us fully understand financial markets, so we trust the SEC to take on oversight. But in a world where decisions large and small, personal and societal, are being handed over to algorithms, this becomes less acceptable.

Another problem with technical transparency is that it makes algorithms vulnerable to gaming. If an instructor releases the complete source code for an algorithm grading student essays, it becomes easy for students to exploit loopholes in the code:  maybe, for example, the algorithm seeks evidence that the students have done research by looking for phrases such as “according to published research.” A student might then deliberately use this language at the start of every paragraph in her essay.

But the biggest problem is that modern AI is making source code – transparent or not – less relevant compared with other factors in algorithmic functioning. Specifically, machine learning algorithms – and deep learning algorithms in particular – are usually built on just a few hundred lines of code. The algorithm’s logic is mostly learned from training data and is rarely reflected in its source code. Which is to say, some of today’s best-performing algorithms are often the most opaque. High transparency might involve getting our heads around reams and reams of data – and then still only being able to guess at what lessons the algorithm has learned from it.

This is where Kizilcec’s work becomes relevant – a way to embrace rather than despair over deep learning’s impenetrability. His work shows that users will not trust black box models, but they don’t need – or even want – extremely high levels of transparency. That means responsible companies need not fret over what percentage of source code to reveal, or how to help users “read” massive datasets. Instead, they should work to provide basic insights on the factors driving algorithmic decisions….(More)”