Open Data Use Case: Using data to improve public health


Chris Willsher at ODX: “Studies have shown that a large majority of Canadians spend too much time in sedentary activities. According to the Health Status of Canadians report in 2016, only 2 out of 10 Canadian adults met the Canadian Physical Activity Guidelines. Increasing physical activity and healthy lifestyle behaviours can reduce the risk of chronic illnesses, which can decrease pressures on our health care system. And data can play a role in improving public health.

We are already seeing examples of a push to augment the role of data, with programs recently launched at home and abroad. In the spring of 2017, Canada and the US established an initiative called the Healthy Behaviour Data Challenge. The goal of the initiative is to open up new methods for generating and using data to monitor health, specifically in the areas of physical activity, sleep, sedentary behaviour, and nutrition. The challenge recently wrapped up, with winners announced in late April 2018. Programs such as this give the private sector an incentive to explore data’s role in measuring healthy lifestyles and raise awareness of the importance of finding new solutions.

In the UK, Sport England and the Open Data Institute (ODI) have collaborated to create the OpenActive initiative. It has set out to encourage both government and private sector entities to unlock data around physical activities so that others can use this information to make it easier to engage in an active lifestyle. The goal is to “make it as easy to find and book a badminton court as it is to book a hotel room.” As of last fall, OpenActive counted more than 76,000 activities across 1,000 locations from their partner organizations. They have also developed a standard for activity data to ensure consistency among data sources, which makes it easier for developers to work with the data. Again, this initiative serves as a mechanism for open data to help address public health issues.

In Canada, we are seeing more open datasets that could be used to devise new solutions for generating higher rates of physical activity. A lot of useful information is available at the municipal level that can provide specifics around local infrastructure. Plus, there is data at the provincial and federal levels that can provide higher-level insights useful for developing methods to promote healthier lifestyles.

Information about cycling infrastructure seems to be relatively widespread among municipalities with a robust open data platform. As an example, the City of Toronto publishes map data of bicycle routes around the city. This information could be used to help citizens find the best bike route between two points. In addition, the city also publishes data on indoor, outdoor, and post-and-ring bicycle parking facilities that can identify where to securely lock your bike. Exploring data from proprietary sources, such as Strava, could further enhance an application by layering on popular cycling routes or allowing users to integrate their personal information. And algorithms could allow for the inclusion of data on comparable driving times, projected health benefits, or savings on automotive maintenance.
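To make this concrete, here is a minimal sketch of the kind of route-finding application such datasets enable, assuming a GeoJSON export of a cycling network; the file name, feature structure, and coordinates are illustrative assumptions, not the actual schema of Toronto’s open data.

```python
import json
import math

import networkx as nx


def haversine_km(a, b):
    """Great-circle distance between two (lon, lat) points, in kilometres."""
    lon1, lat1, lon2, lat2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


# Hypothetical export of a municipal cycling-network dataset (GeoJSON LineStrings).
with open("bikeways.geojson") as f:
    network = json.load(f)

# Build a graph: every consecutive pair of vertices on a route becomes a weighted edge.
graph = nx.Graph()
for feature in network["features"]:
    coords = feature["geometry"]["coordinates"]
    for start, end in zip(coords, coords[1:]):
        graph.add_edge(tuple(start), tuple(end), weight=haversine_km(start, end))


def nearest_node(point):
    """Snap an arbitrary (lon, lat) point to the closest node in the cycling network."""
    return min(graph.nodes, key=lambda n: haversine_km(n, point))


# Placeholder coordinates for the start and end of a trip.
origin = nearest_node((-79.3871, 43.6426))
destination = nearest_node((-79.4003, 43.6536))

route = nx.shortest_path(graph, origin, destination, weight="weight")
print(f"Suggested route passes through {len(route)} waypoints")
```

Layering on proprietary or personal data, as suggested above, would amount to adding further weights or annotations to the same graph.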

The City of Calgary publishes data on park sports surfaces and recreation facilities that could potentially be incorporated into sports league applications. This would make it easier to display locations for upcoming games or to arrange pick-up games. Knowing where there are fields nearby that may be available for a last-minute soccer game could be useful in encouraging use of the facilities and generating more physical activity. Again, other data sources, such as weather, could be integrated with this information to provide a planning tool for organizing these activities….(More)”.

Predicting Public Interest Issue Campaign Participation on Social Media


Jungyun Won, Linda Hon, and Ah Ram Lee in the Journal of Public Interest Communications: “This study investigates what motivates people to participate in a social media campaign in the context of animal protection issues.

Structural equation modeling (SEM) tested a proposed research model with survey data from 326 respondents.

Situational awareness, participation benefits, and social ties influence were positive predictors of social media campaign participation intentions. Situational awareness also partially mediates the relationship between participation benefits and participation intentions, as well as between strong ties influence and participation intentions.
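As a rough illustration of how such a structural model can be specified and estimated, the sketch below uses the semopy package; the variable names and path structure are inferred from this summary and are assumptions, not the authors’ actual measurement model.

```python
import pandas as pd
from semopy import Model  # one of several SEM libraries for Python

# Hypothetical survey data: one row per respondent, one column per composite score.
data = pd.read_csv("campaign_survey.csv")

# Structural paths: intentions predicted by awareness, benefits, and tie influence,
# with situational awareness as a partial mediator of benefits and strong-tie influence.
model_desc = """
situational_awareness ~ participation_benefits + strong_tie_influence
participation_intention ~ situational_awareness + participation_benefits + social_tie_influence
"""

model = Model(model_desc)
model.fit(data)
print(model.inspect())  # path coefficients, standard errors, and p-values
```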

When designing social media campaigns, public interest communicators should raise situational awareness and emphasize participation benefits. Messages shared through social networks, especially via strong ties, also may be more effective than those posted only on official websites or social networking sites (SNSs)….(More)”.

Identifying Healthcare Fraud with Open Data


Paper by Xuan Zhang et al: “Healthcare fraud is a serious problem that impacts every patient and consumer. This fraudulent behavior causes excessive financial losses every year and causes significant patient harm. Healthcare fraud includes health insurance fraud, fraudulent billing of insurers for services not provided, and exaggeration of medical services. Identifying healthcare fraud thus becomes an urgent task to avoid the abuse and waste of public funds. Existing methods in this research field usually use classified data from governments, which greatly compromises the generalizability and scope of application. This paper introduces a methodology to use publicly available data sources to identify potentially fraudulent behavior among physicians. The research involved data pairing of multiple datasets, selection of useful features, comparisons of classification models, and analysis of useful predictors. Our performance evaluation results clearly demonstrate the efficacy of the proposed method….(More)”.
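The paper’s own pipeline is not reproduced here, but a generic sketch of the workflow it describes – pairing public datasets, selecting features, and comparing classifiers – might look as follows; the file names, join key, feature columns, and model choices are illustrative assumptions.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Hypothetical public datasets: physician payment records and an exclusion (sanction) list.
payments = pd.read_csv("physician_payments.csv")
exclusions = pd.read_csv("excluded_providers.csv")

# Pair the datasets: a physician is labelled positive if they appear on the exclusion list.
data = payments.merge(exclusions.assign(fraud=1), on="npi", how="left").fillna({"fraud": 0})

features = ["total_payment_amount", "payment_count", "distinct_payers", "avg_claim_amount"]
X, y = data[features], data["fraud"]

# Compare candidate classification models with cross-validated AUC.
candidates = [
    ("logistic regression", LogisticRegression(max_iter=1000)),
    ("random forest", RandomForestClassifier(n_estimators=200)),
]
for name, clf in candidates:
    auc = cross_val_score(clf, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.3f}")
```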

Open innovation and the evaluation of internet-enabled public services in smart cities


Krassimira Paskaleva and Ian Cooper in Technovation: “This article is focused on public service innovation from an innovation management perspective. It presents research experience gained from a European project for managing social and technological innovation in the production and evaluation of citizen-centred internet-enabled services in the public sector.

It is based on six urban pilot initiatives, which sought to operationalise a new approach to co-producing and co-evaluating civic services in smart cities – commonly referred to as open innovation for smart city services. Research suggests that the evidence base underpinning this approach is not sufficiently robust to support claims being made about its effectiveness.

Instead, evaluation research on citizen-centred, internet-enabled urban services is in its infancy, and there are no tested methods or tools in the literature for supporting this approach.

The paper reports on the development and trialling of a novel Co-evaluation Framework, indicators, and reporting categories used to support the co-production of smart city services in an EU-funded project. Our point of departure is that innovation of services is a sub-set of innovation management that requires effective integration of technological with social innovation, supported by the right skills and capacities. The main skill sets needed for effective co-evaluation of open innovation services are the integration of stakeholder management with evaluation capacities.”

Big Data: the End of the Scientific Method?


Paper by S. Succi and P.V. Coveney at arXiv: “We argue that the boldest claims of Big Data are in need of revision and toning-down, in view of a few basic lessons learned from the science of complex systems. We point out that, once the most extravagant claims of Big Data are properly discarded, a synergistic merging of Big Data with big theory offers considerable potential to spawn a new scientific paradigm capable of overcoming some of the major barriers confronted by the modern scientific method originating with Galileo. These obstacles are due to the presence of nonlinearity, nonlocality, and hyperdimensions, which one encounters frequently in multiscale modelling….(More)”.

We Need Transparency in Algorithms, But Too Much Can Backfire


Kartik Hosanagar and Vivian Jair at Harvard Business Review: “In 2013, Stanford professor Clifford Nass faced a student revolt. Nass’s students claimed that those in one section of his technology interface course received higher grades on the final exam than counterparts in another. Unfortunately, they were right: two different teaching assistants had graded the two different sections’ exams, and one had been more lenient than the other. Students with similar answers had ended up with different grades.

Nass, a computer scientist, recognized the unfairness and created a technical fix: a simple statistical model to adjust scores, where students got a certain percentage boost on their final mark when graded by a TA known to give grades that percentage lower than average. In the spirit of openness, Nass sent out emails to the class with a full explanation of his algorithm. Further complaints poured in, some even angrier than before. Where had he gone wrong?…
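The article does not spell out Nass’s exact formula, but the per-TA percentage adjustment it describes could be sketched roughly as follows; the correction rule and the toy scores are assumptions for illustration.

```python
def adjust_scores(scores_by_ta):
    """Boost each student's mark by the percentage their TA grades below the class average.

    `scores_by_ta` maps a TA's name to the list of raw scores that TA assigned.
    A TA whose average sits 10% below the class-wide average gives each of their
    students a 10% boost; a lenient TA's students are pulled back correspondingly.
    """
    all_scores = [s for scores in scores_by_ta.values() for s in scores]
    overall_mean = sum(all_scores) / len(all_scores)

    adjusted = {}
    for ta, scores in scores_by_ta.items():
        ta_mean = sum(scores) / len(scores)
        deficit = (overall_mean - ta_mean) / overall_mean  # how far below average this TA grades
        adjusted[ta] = [round(s * (1 + deficit), 1) for s in scores]
    return adjusted


# Toy example: the strict section is boosted, the lenient one adjusted downward.
print(adjust_scores({"strict TA": [70, 75, 80], "lenient TA": [85, 90, 95]}))
```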

Kizilcec had in fact tested three levels of transparency: low and medium but also high, where the students got not only a paragraph explaining the grading process but also their raw peer-graded scores and how these were each precisely adjusted by the algorithm to get to a final grade. And this is where the results got more interesting. In the experiment, while medium transparency increased trust significantly, high transparency eroded it completely, to the point where trust levels were either equal to or lower than among students experiencing low transparency.

Making Modern AI Transparent: A Fool’s Errand?

What are businesses to take home from this experiment? It suggests that technical transparency – revealing the source code, inputs, and outputs of the algorithm – can build trust in many situations. But most algorithms in the world today are created and managed by for-profit companies, and many businesses regard their algorithms as highly valuable forms of intellectual property that must remain in a “black box.” Some lawmakers have proposed a compromise, suggesting that the source code be revealed to regulators or auditors in the event of a serious problem, and this adjudicator would assure consumers that the process is fair.

This approach merely shifts the burden of belief from the algorithm itself to the regulators. This may be a palatable solution in many arenas: for example, few of us fully understand financial markets, so we trust the SEC to take on oversight. But in a world where decisions large and small, personal and societal, are being handed over to algorithms, this becomes less acceptable.

Another problem with technical transparency is that it makes algorithms vulnerable to gaming. If an instructor releases the complete source code for an algorithm grading student essays, it becomes easy for students to exploit loopholes in the code: maybe, for example, the algorithm seeks evidence that the students have done research by looking for phrases such as “according to published research.” A student might then deliberately use this language at the start of every paragraph in her essay.

But the biggest problem is that modern AI is making source code – transparent or not – less relevant compared with other factors in algorithmic functioning. Specifically, machine learning algorithms – and deep learning algorithms in particular – are usually built on just a few hundred lines of code. The algorithm’s logic is mostly learned from training data and is rarely reflected in its source code. Which is to say, some of today’s best-performing algorithms are often the most opaque. High transparency might involve getting our heads around reams and reams of data – and then still only being able to guess at what lessons the algorithm has learned from it.
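A toy example makes the point concrete: the sketch below defines and trains a complete text classifier in roughly a dozen lines, yet nothing in that source code reveals how the model will decide – its logic sits in the learned weights. The dataset and model are illustrative choices, not drawn from the article.

```python
from sklearn.datasets import fetch_20newsgroups
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline

# The entire "algorithm" fits in a few lines of source code...
train = fetch_20newsgroups(subset="train", categories=["sci.med", "sci.space"])
model = make_pipeline(
    TfidfVectorizer(max_features=5000),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=300),
)
model.fit(train.data, train.target)

# ...but the decision logic is encoded in hundreds of thousands of learned parameters,
# none of which can be read off from the code above.
weights = model.named_steps["mlpclassifier"].coefs_
print(sum(w.size for w in weights), "learned weights")
```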

This is where Kizilcec’s work becomes relevant – a way to embrace rather than despair over deep learning’s impenetrability. His work shows that users will not trust black box models, but they don’t need – or even want – extremely high levels of transparency. That means responsible companies need not fret over what percentage of source code to reveal, or how to help users “read” massive datasets. Instead, they should work to provide basic insights on the factors driving algorithmic decisions….(More)”

What top technologies should the next generation know how to use?


Lottie Waters at Devex: “Technology provides some great opportunities for global development, and a promising future. But for the next generation of professionals to succeed, it’s vital they stay up to date with the latest tech, innovations, and tools.

In a recent report produced by Devex in collaboration with the United States Agency for International Development and DAI, some 86 percent of survey respondents believe the technology, skills, and approaches development professionals will be using in 10 years’ time will be significantly different to today’s.

In fact, “technology for development” is regarded as the sector that will see the most development progress, but is also cited as the one that will see the biggest changes in skills required, according to the survey.

“As different technologies develop, new possibilities will open up that we may not even be aware of yet. These opportunities will bring new people into the development sector and require those in it to be more agile in adapting technologies to meet development challenges,” said one survey respondent.

While “blockchain,” “artificial intelligence,” and “drones” may be the current buzzwords surrounding tech in global development, geographical information systems, or GIS, and big data are actually the top technologies respondents believe the next generation of development professionals should learn how to utilize.

So, how are these technologies currently being used in development, how might this change in the near future, and what will their impact be in the next 10 years? Devex spoke with experts in the field who are already integrating these technologies into their work to find out….(More)”

Doing good data science


Mike Loukides, Hilary Mason and DJ Patil at O’Reilly: “(This post is the first in a series on data ethics) The hard thing about being an ethical data scientist isn’t understanding ethics. It’s the junction between ethical ideas and practice. It’s doing good data science.

There has been a lot of healthy discussion about data ethics lately. We want to be clear: that discussion is good, and necessary. But it’s also not the biggest problem we face. We already have good standards for data ethics. The ACM’s code of ethics, which dates back to 1993, is clear, concise, and surprisingly forward-thinking; 25 years later, it’s a great start for anyone thinking about ethics. The American Statistical Association has a good set of ethical guidelines for working with data. So, we’re not working in a vacuum.

And, while there are always exceptions, we believe that most people want to be fair. Data scientists and software developers don’t want to harm the people using their products. There are exceptions, of course; we call them criminals and con artists. Defining “fairness” is difficult, and perhaps impossible, given the many crosscutting layers of “fairness” that we might be concerned with. But we don’t have to solve that problem in advance, and it’s not going to be solved in a simple statement of ethical principles, anyway.

The problem we face is different: how do we put ethical principles into practice? We’re not talking about an abstract commitment to being fair. Ethical principles are worse than useless if we don’t allow them to change our practice, if they don’t have any effect on what we do day-to-day. For data scientists, whether you’re doing classical data analysis or leading-edge AI, that’s a big challenge. We need to understand how to build the software systems that implement fairness. That’s what we mean by doing good data science.

Any code of data ethics will tell you that you shouldn’t collect data from experimental subjects without informed consent. But that code won’t tell you how to implement “informed consent.” Informed consent is easy when you’re interviewing a few dozen people in person for a psychology experiment. Informed consent means something different when someone clicks on an item in an online catalog (hello, Amazon), and ads for that item start following them around ad infinitum. Do you use a pop-up to ask for permission to use their choice in targeted advertising? How many customers would you lose? Informed consent means something yet again when you’re asking someone to fill out a profile for a social site, and you might (or might not) use that data for any number of experimental purposes. Do you pop up a consent form in impenetrable legalese that basically says “we will use your data, but we don’t know for what”? Do you phrase this agreement as an opt-out, and hide it somewhere on the site where nobody will find it?…

To put ethical principles into practice, we need space to be ethical. We need the ability to have conversations about what ethics means, what it will cost, and what solutions to implement. As technologists, we frequently share best practices at conferences, write blog posts, and develop open source technologies—but we rarely discuss problems such as how to obtain informed consent.

There are several facets to this space that we need to think about.

First, we need corporate cultures in which discussions about fairness, about the proper use of data, and about the harm that can be done by inappropriate use of data can take place. In turn, this means that we can’t rush products out the door without thinking about how they’re used. We can’t allow “internet time” to mean ignoring the consequences. Indeed, computer security has shown us the consequences of ignoring the consequences: many companies that have never taken the time to implement good security practices and safeguards are now paying with damage to their reputations and their finances. We need to do the same when thinking about issues like fairness, accountability, and unintended consequences….(More)”.

‘Mayor for a Day’ – Is Gamified Urban Management the Way Forward?


Paper by Gianluca Sgueo: “…aims at describing the use, exploring the potential – but also understanding the limits – of ‘gamification’ strategies in urban management. Commonly defined as the introduction of game-design elements into non-game contexts, with the former aimed at making the latter more fun, gamification is recognised among the technological paradigms that are shaping the evolution of public administrations.

The paper is divided in three sections.

SECTION I discusses the definition (and appropriateness) of gamification in urban management, and locates it conceptually at the crossroads between nudging, democratic innovations, and crowdsourcing.

SECTION II analyses the potential of gamified urban management. Four benefits are assessed: first, gamified urban management seems to encourage adaptation of policy-making to structural/societal changes; second, it offers local administrators a chance to (re-)gain citizens’ trust, and thus be perceived as legitimate; third, it adapts policy-making to budgetary challenges; fourth, it helps to efficiently tackle complex regulatory issues.

SECTION III of this paper turns to consider the risks associated with the use of gamification in urban management. The first consists of the obstacles faced by participatory rights within gamified policies; the second is defined as the ‘paradox of incentives’; the third relates to privacy issues. In the concluding section, this paper advances some proposals (or, alternatively, highlights valuable theoretical and empirical research efforts) aimed at addressing some of the most pressing threats posed by gamified urban management.

The main features of the case studies described in SECTIONS II and III are summarised in a table at the end of the paper….(More)”.

Algorithms are taking over – and woe betide anyone they class as a ‘deadbeat’


Zoe Williams at The Guardian: “The radical geographer and equality evangelist Danny Dorling tried to explain to me once why an algorithm could be bad for social justice.

Imagine if email inboxes became intelligent: your messages would be prioritised on arrival, so if the recipient knew you and often replied to you, you’d go to the top; I said that was fine. That’s how it works already. If they knew you and never replied, you’d go to the bottom, he continued. I said that was fair – it would teach me to stop annoying that person.

If you were a stranger, but typically other people replied to you very quickly – let’s say you were Barack Obama – you’d sail right to the top. That seemed reasonable. And if you were a stranger who others usually ignored, you’d fall off the face of the earth.

“Well, maybe they should get an allotment and stop emailing people,” I said.

“Imagine how angry those people would be,” Dorling said. “They already feel invisible and they [would] become invisible by design.”…
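Dorling’s thought experiment boils down to a very simple ranking rule; the sketch below shows the kind of reply-rate scoring he is describing (the field names, scores, and example senders are invented for illustration).

```python
from dataclasses import dataclass


@dataclass
class SenderHistory:
    emails_sent: int       # messages this sender has sent
    replies_received: int  # how many of those were ever answered


def priority(sender: SenderHistory) -> float:
    """Rank senders by how often they get replied to: the 'invisible by design' rule."""
    if sender.emails_sent == 0:
        return 0.5  # unknown sender: neutral score
    return sender.replies_received / sender.emails_sent


inbox = {
    "close colleague": SenderHistory(emails_sent=120, replies_received=110),
    "Barack Obama": SenderHistory(emails_sent=40, replies_received=39),
    "ignored stranger": SenderHistory(emails_sent=60, replies_received=0),
}

# The ignored stranger sinks to the bottom of every inbox, by design.
for name, history in sorted(inbox.items(), key=lambda kv: priority(kv[1]), reverse=True):
    print(f"{priority(history):.2f}  {name}")
```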

All our debates about the use of big data have centred on privacy, and all seem a bit distant: I care, in principle, whether or not Ocado knows what I bought on Amazon. But in my truest heart, I don’t really care whether or not my Frube vendor knows that I also like dystopian fiction of the 1970s.

I do, however, care that a program exists that will determine my eligibility for a loan by how often I call my mother. I care if landlords are using tools to rank their tenants by compliant behaviour, to create a giant, shared platform of desirable tenants, who never complain about black mould and greet each rent increase with a basket of muffins. I care if the police in Durham are using Experian credit scores to influence their custodial decisions, an example – as you may have guessed by its specificity – that is already real. I care that the same credit-rating company has devised a Mosaic score, which splits households into comically bigoted stereotypes: if your name is Liam and you are an “avid texter”, that puts you in “disconnected youth”, while if you’re Asha you’re in “crowded kaleidoscope”. It’s not a privacy issue so much as a profiling one, although, as anyone who has ever been the repeated victim of police stop-and-search could have told me years ago, these are frequently the same thing.

Privacy isn’t the right to keep secrets: it’s the right to be an individual, not a type; the right to make a choice that’s entirely your own; the right to be private….(More)”.