A rationale for data governance as an approach to tackle recurrent drawbacks in open data portals

Conference paper by Juan Ribeiro Reis et al: “Citizens and developers are gaining broad access to public data sources, made available in open data portals. These machine-readable datasets enable the creation of applications that help the population in several ways, giving them the opportunity to actively participate in governance processes, such as decision taking and policy-making.

While the number of open data portals grows over the years, researchers have been able to identify recurrent problems with the data they provide, such as lack of data standards, difficulty in data access and poor understandability. Such issues hinder the effective use of data. Several works in the literature propose different approaches to mitigate these issues, based on novel or well-known data management techniques.

However, there is a lack of general frameworks for tackling these problems. On the other hand, data governance has been applied in large companies to manage data problems, ensuring that data meets business needs and becomes an organizational asset. In this paper, we first highlight the main drawbacks pointed out in the literature for government open data portals. We then discuss how data governance can tackle many of the issues identified…(More)”.

Open Data Use Case: Using data to improve public health

Chris Willsher at ODX: “Studies have shown that a large majority of Canadians spend too much time in sedentary activities. According to the Health Status of Canadians report in 2016, only 2 out of 10 Canadian adults met the Canadian Physical Activity Guidelines. Increasing physical activity and healthy lifestyle behaviours can reduce the risk of chronic illnesses, which can decrease pressures on our health care system. And data can play a role in improving public health.

We are already seeing examples of a push to augment the role of data, with programs recently being launched at home and abroad. Canada and the US established an initiative in the spring of 2017 called the Healthy Behaviour Data Challenge. The goal of the initiative is to open up new methods for generating and using data to monitor health, specifically in the areas of physical activity, sleep, sedentary behaviour, or nutrition. The challenge recently wrapped up with winners being announced in late April 2018. Programs such as this provide incentive to the private sector to explore data’s role in measuring healthy lifestyles and raise awareness of the importance of finding new solutions.

In the UK, Sport England and the Open Data Institute (ODI) have collaborated to create the OpenActive initiative. It has set out to encourage both government and private sector entities to unlock data around physical activities so that others can utilize this information to ease the process of engaging in an active lifestyle. The goal is to “make it as easy to find and book a badminton court as it is to book a hotel room.” As of last fall, OpenActive counted more than 76,000 activities across 1,000 locations from their partner organizations. They have also developed a standard for activity data to ensure consistency among data sources, which eases the ability for developers to work with the data. Again, this initiative serves as a mechanism for open data to help address public health issues.

In Canada, we are seeing more open datasets that could be utilized to devise new solutions for generating higher rates of physical activity. A lot of useful information is available at the municipal level that can provide specifics around local infrastructure. Plus, there is data at the provincial and federal level that can provide higher-level insights useful to developing methods for promoting healthier lifestyles.

Information about cycling infrastructure seems to be relatively widespread among municipalities with a robust open data platform. As an example, the City of Toronto publishes map data of bicycle routes around the city. This information could be utilized to help citizens find the best bike route between two points. In addition, the city also publishes data on indoor, outdoor, and post-and-ring bicycle parking facilities that can identify where to securely lock your bike. Exploring data from proprietary sources, such as Strava, could further enhance an application by layering on popular cycling routes or allowing users to integrate their personal information. And algorithms could allow for the inclusion of data on comparable driving times, projected health benefits, or savings on automotive maintenance.

The City of Calgary publishes data on park sports surfaces and recreation facilities that could potentially be incorporated into sports league applications. This would make it easier to display locations for upcoming games or to arrange pick-up games. Knowing where there are fields nearby that may be available for a last minute soccer game could be useful in encouraging use of the facilities and generating more physical activity. Again, other data sources, such as weather, could be integrated with this information to provide a planning tool for organizing these activities….(More)”.
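
The idea mentioned above of finding the best bike route between two points can be illustrated with a shortest-path search. The following is only a minimal sketch, not anything taken from the ODX article or from the City of Toronto's data: the network, the intersection names "A" to "D", the distances, and the function name shortest_route are all hypothetical, and it assumes that published route data has already been converted into a graph of intersections connected by segments with a length in kilometres. It uses Dijkstra's algorithm from the Python standard library only.

```python
import heapq

# Hypothetical toy network: intersections are nodes, bike-route segments are
# edges weighted by length in km. Real open data would need to be converted
# into this shape first; the values here are invented for illustration.
BIKE_NETWORK = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 1.5, "D": 5.0},
    "C": {"A": 4.0, "B": 1.5, "D": 2.0},
    "D": {"B": 5.0, "C": 2.0},
}


def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_km, path), or (inf, []) if unreachable."""
    queue = [(0.0, start, [start])]   # priority queue ordered by distance so far
    best = {start: 0.0}               # shortest known distance to each node
    while queue:
        dist, node, path = heapq.heappop(queue)
        if node == goal:
            return dist, path
        if dist > best.get(node, float("inf")):
            continue                  # stale queue entry, a shorter way was found
        for neighbour, km in graph.get(node, {}).items():
            new_dist = dist + km
            if new_dist < best.get(neighbour, float("inf")):
                best[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour, path + [neighbour]))
    return float("inf"), []


if __name__ == "__main__":
    km, route = shortest_route(BIKE_NETWORK, "A", "D")
    print(km, route)
```

Weighting edges by something other than length, such as an estimated travel time or a comfort score, would reuse the same search unchanged; which weight is appropriate is a design decision for the application.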

Identifying Healthcare Fraud with Open Data

Paper by Xuan Zhang et al: “Health care fraud is a serious problem that impacts every patient and consumer. This fraudulent behavior causes excessive financial losses every year and causes significant patient harm. Healthcare fraud includes health insurance fraud, fraudulent billing of insurers for services not provided, and exaggeration of medical services, etc. To identify healthcare fraud thus becomes an urgent task to avoid the abuse and waste of public funds. Existing methods in this research field usually use classified data from governments, which greatly compromises the generalizability and scope of application. This paper introduces a methodology to use publicly available data sources to identify potentially fraudulent behavior among physicians. The research involved data pairing of multiple datasets, selection of useful features, comparisons of classification models, and analysis of useful predictors. Our performance evaluation results clearly demonstrate the efficacy of the proposed method….(More)”.
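
The abstract above mentions pairing multiple public datasets and selecting features before any classification takes place. The sketch below illustrates only that pairing step and a single derived feature. It is not the authors' pipeline: the field names (physician_id, total_billed, services, specialty), the sample records, and the functions pair_records and add_amount_per_service are all hypothetical and chosen purely for illustration.

```python
def pair_records(billing, registry):
    """Inner-join two lists of dicts on the shared 'physician_id' key."""
    registry_by_id = {row["physician_id"]: row for row in registry}
    paired = []
    for row in billing:
        match = registry_by_id.get(row["physician_id"])
        if match is not None:          # drop billing rows with no registry entry
            paired.append({**match, **row})
    return paired


def add_amount_per_service(rows):
    """Return copies of the rows with one derived feature added."""
    enriched = []
    for row in rows:
        new_row = dict(row)
        services = new_row["services"]
        new_row["amount_per_service"] = (
            new_row["total_billed"] / services if services else 0.0
        )
        enriched.append(new_row)
    return enriched


# Invented sample data standing in for two separately published datasets.
BILLING = [
    {"physician_id": 1, "total_billed": 1000.0, "services": 10},
    {"physician_id": 2, "total_billed": 900.0, "services": 3},
    {"physician_id": 3, "total_billed": 500.0, "services": 5},
]
REGISTRY = [
    {"physician_id": 1, "specialty": "cardiology"},
    {"physician_id": 2, "specialty": "dermatology"},
]

if __name__ == "__main__":
    for record in add_amount_per_service(pair_records(BILLING, REGISTRY)):
        print(record)
```

Rows prepared this way could then be passed to whichever classification models are being compared; the choice of join key and of features is where most of the domain judgment lies.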

Information Asymmetries, Blockchain Technologies, and Social Change

Reflections by Stefaan Verhulst on “the potential (and challenges) of Distributed Ledgers for “Market for Lemons” Conditions: We live in a data age, and it has become common to extol the transformative power of data and information. It is now conventional to assume that many of our most pressing public problems—everything from climate change to terrorism to mass migration—are amenable to a “data fix.”

The truth, though, is a little more complicated. While there is no doubt that data—when analyzed and used responsibly—holds tremendous potential, many factors affect whether, and to what extent, that potential will ultimately be fulfilled.

Our ability to address complex public problems using data depends vitally on how our respective data ecosystems are designed (as well as ongoing questions of representation in, power over, and stewardship of these ecosystems).

Flaws in our data ecosystem that prevent us from addressing problems, and that may also be responsible for many societal failures and inequalities, result from the fact that:

  • some actors have better access to data than others;
  • data is of poor quality (or even “fake”); contains implicit bias; and/or is not validated and thus not trusted;
  • only easily accessible data are shared and integrated (“open washing”) while important data remain carefully hidden or without resources for relevant research and analysis; and more generally that
  • even in an era of big and open data, information too often remains stove-piped, siloed, and generally difficult to access.

Several observers have pointed to the relationship between these information asymmetries and, for example, corruption, financial exclusion, global pandemics, forced mass migration, human rights abuses, and electoral fraud.

Consider the transaction costs, power inequities and other obstacles that result from such information asymmetries, namely:

–     At the individual level: too often someone who is trying to open a bank account (or sign up for new cell phone service) is unable to provide all the requisite information, such as credit history, proof of address or other confirmatory and trusted attributes of identity. As such, information asymmetries are in effect limiting this individual’s access to financial and communications services.

–     At the corporate level, a vast body of literature in economics has shown how uncertainty over the quality and trustworthiness of data can impose transaction costs, limit the development of markets for goods and services, or shut them down altogether. This is the well-known “market for lemons” problem made famous in a 1970 paper of the same name by George Akerlof.

–     At the societal or governance level, information asymmetries don’t just affect the efficiency of markets or social inequality. They can also incentivize unwanted behaviors that cause substantial public harm. Tyrants and corrupt politicians thrive on limiting their citizens’ access to information (e.g., information related to bank accounts, investment patterns or disbursement of public funds). Likewise, criminals operate and succeed in the information-scarce corners of the underground economy.

Blockchain technologies and Information Asymmetries

This is where blockchain comes in. At their core, blockchain technologies are a new type of disclosure mechanism that has the potential to address some of the information asymmetries listed above. There are many types of blockchain technologies, and while I use the blanket term ‘blockchain’ below for simplicity’s sake, the nuances between different types of blockchain technologies can greatly impact the character and likelihood of success of a given initiative.

By leveraging a shared and verified database of ledgers stored in a distributed manner, blockchain seeks to redesign information ecosystems in a more transparent, immutable, and trusted manner. Solving information asymmetries may be the real potential of blockchain, and this—much more than the current hype over virtual currencies—is the real reason to assess its potential….(More)”.

Evaluating Civic Open Data Standards

Renee Sieber and Rachel Bloom at SocArXiv Papers: “In many ways, a precondition to realizing the promise of open government data is the standardization of that data. Open data standards ensure interoperability, establish benchmarks in assessing whether governments achieve their goals in publishing open data, and can better ensure accuracy of the data. Interoperability enables the use of off-the-shelf software and can ease third party development of products that serve multiple locales.

Our project aims to determine which standards for civic data are “best” to open up government data. We began by disambiguating the multiple meanings of what constitutes a data standard by creating a standards stack.

The empirical research started by identifying twelve “high value” open datasets for which we found 22 data standards. A qualitative systematic review of the gray literature and standards documentation generated 18 evaluation metrics, which we grouped into four categories. We evaluated the metrics with civic data standards. Our goal is to identify and characterize types of standards and provide a systematic way to assess their quality…(More)”.

Is Open Data Working for Women in Africa?

Web Foundation: “Open data has the potential to change politics, economies and societies for the better by giving people more opportunities to engage in the decisions that affect their lives. But to reach the full potential of open data, it must be available to and used by all. Yet, across the globe — and in Africa in particular — there is a significant data gap.

This report — Is open data working for women in Africa — maps the current state of open data for women across Africa, with insights from country-specific research in Nigeria, Cameroon, Uganda and South Africa with additional data from a survey of experts in 12 countries across the continent.

Our findings show that, despite the potential for open data to empower people, it has so far changed little for women living in Africa.

Key findings

  • There is a closed data culture in Africa — Most countries lack an open culture and have legislation and processes that are not gender-responsive. Institutional resistance to disclosing data means few countries have open data policies and initiatives at the national level. In addition, gender equality legislation and policies are incomplete and failing to reduce gender inequalities. And overall, Africa lacks the cross-organisational collaboration needed to strengthen the open data movement.
  • There are barriers preventing women from using the data that is available — Cultural and social realities create additional challenges for women to engage with data and participate in the technology sector. 1GB of mobile data in Africa costs, on average, 10% of average monthly income. This high cost keeps women, who generally earn less than men, offline. Moreover, time poverty, the gender pay gap and unpaid labour create economic obstacles for women to engage with digital technology.
  • Key datasets to support the advocacy objectives of women’s groups are missing — Data on budget, health and crime are largely absent as open data. Nearly all datasets in sub-Saharan Africa (373 out of 375) are closed, and sex-disaggregated data, when available online, is often not published as open data. There are few open data policies to support opening up of key datasets and even when they do exist, they largely remain in draft form. With little investment in open data initiatives, good data management practices or for implementing Right To Information (RTI) reforms, improvement is unlikely.
  • There is no strong base of research on women’s access and use of open data — There is lack of funding, little collaboration and few open data champions. Women’s groups, digital rights groups and gender experts rarely collaborate on open data and gender issues. To overcome this barrier, multi-stakeholder collaborations are essential to develop effective solutions….(More)”.

Data infrastructure literacy

Paper by Jonathan Gray, Carolin Gerlitz and Liliana Bounegru at Big Data & Society: “A recent report from the UN makes the case for “global data literacy” in order to realise the opportunities afforded by the “data revolution”. Here and in many other contexts, data literacy is characterised in terms of a combination of numerical, statistical and technical capacities. In this article, we argue for an expansion of the concept to include not just competencies in reading and working with datasets but also the ability to account for, intervene around and participate in the wider socio-technical infrastructures through which data is created, stored and analysed – which we call “data infrastructure literacy”. We illustrate this notion with examples of “inventive data practice” from previous and ongoing research on open data, online platforms, data journalism and data activism. Drawing on these perspectives, we argue that data literacy initiatives might cultivate sensibilities not only for data science but also for data sociology, data politics as well as wider public engagement with digital data infrastructures. The proposed notion of data infrastructure literacy is intended to make space for collective inquiry, experimentation, imagination and intervention around data in educational programmes and beyond, including how data infrastructures can be challenged, contested, reshaped and repurposed to align with interests and publics other than those originally intended….(More)”

Open Data in Tourism

European Data Portal: “New technologies are rapidly changing the tourism industry. Data are central assets in management and marketing of tourism destinations and businesses. Data driven services became a prominent tool for tourists to plan their trips. The study “Utilizing open data in tourism” predicts great potential for Open Data to increase innovations and destination management. Several actors already use Open Data to provide services in the tourism industry, e.g. the open service called Helsinki Region Infoshare from the city of Helsinki. Malta and Montenegro, for example, are providing data sets on tourist expenditure, hotels, accommodation, restaurants, events, bicycle stations, heritage sites, or beaches.

But not only government organisations and companies use Open Data in tourism. User-generated content, such as reviews and comments spread via social networking services, supports tourists’ decision making. The study “You will like it!” analyses user-generated Open Data to predict tourists’ perception of sights or attractions. Thereby they are contributing to the process of predicting tourists’ future preferences, which has potential implications and benefits for the tourism industry.

Engage in the discourse of Open Data in tourism, for example on 18 July: the meeting “Linked Open Data im Tourismus” for destination marketing organizations (DMOs) takes place in Innsbruck to discuss possibilities and prerequisites for using Open Data in tourism. If you would rather try out using Open Data to plan your next weekend trip, visit the European Data Portal featured data article on “Use Open Data to prepare your holiday trip”….(More)”.

Microsoft Research Open Data

Microsoft Research Open Data: “… is a data repository that makes available datasets that researchers at Microsoft have created and published in conjunction with their research. You can browse available datasets and either download them or directly copy them to an Azure-based Virtual Machine or Data Science Virtual Machine. To the extent possible, we follow FAIR (findable, accessible, interoperable and reusable) data principles and will continue to push towards the highest standards for data sharing. We recognize that there are dozens of data repositories already in use by researchers and expect that the capabilities of this repository will augment existing efforts. Datasets are categorized by their primary research area. You can find links to research projects or publications with the dataset.

What is our goal?

Our goal is to provide a simple platform to Microsoft’s researchers and collaborators to share datasets and related research technologies and tools. The site has been designed to simplify access to these data sets, facilitate collaboration between researchers using cloud-based resources, and enable the reproducibility of research. We will continue to evolve and grow this repository and add features to it based on feedback from the community.

How did this project come to be?

Over the past few years, our team, based at Microsoft Research, has worked extensively with the research community to create cloud-based research infrastructure. We started this project as a prototype about a year ago and are excited to finally share it with the research community to support data-intensive research in the cloud. Because almost all research projects have a data component, there is real need for curated and meaningful datasets in the research community, not only in computer science but in interdisciplinary and domain sciences. We have now made several such datasets available for download or use directly on cloud infrastructure….(More)”.

My City Forecast: Urban planning communication tool for citizen with national open data

Paper by Y. Hasegawa, Y. Sekimoto, T. Seto, Y. Fukushima et al in Computers, Environment and Urban Systems: “In urban management, the importance of citizen participation is being emphasized more than ever before. This is especially true in countries where depopulation has become a major concern for urban managers and many local authorities are working on revising city master plans, often incorporating the concept of the “compact city.” In Japan, for example, the implementation of compact city plans means that each local government decides on how to designate residential areas and promotes citizens moving to these areas in order to improve budget effectiveness and the vitality of the city. However, implementing a compact city is possible in various ways. Given that there can be some designated withdrawal areas for budget savings, compact city policies can include disadvantages for citizens. At this turning point for urban structures, citizen–government mutual understanding and cooperation is necessary for every step of urban management, including planning.

Concurrently, along with the recent rapid growth of big data utilization and computer technologies, a new conception of cooperation between citizens and government has emerged. With emerging technologies based on civic knowledge, citizens have started to obtain the power to engage directly in urban management by obtaining information, thinking about their city’s problems, and taking action to help shape the future of their city themselves (Knight Foundation, 2013). This development is also supported by the open government data movement, which promotes the availability of government information online (Kingston, Carver, Evans, & Turton, 2000). CityDashboard is one well-known example of real-time visualization and distribution of urban information. CityDashboard, a web tool launched in 2012 by University College London, aggregates spatial data for cities around the UK and displays the data on a dashboard and a map. These new technologies are expected to enable both citizens and government to see their urban situation in an interface presenting an overhead view based on statistical information.

However, usage of statistics and governmental data is as yet limited in the actual process of urban planning…

To help improve this situation and increase citizen participation in urban management, we have developed a web-based urban planning communication tool using open government data for enhanced citizen–government cooperation. The main aim of the present research is to evaluate the effect of our system on users’ awareness of and attitude toward the urban situation. We have designed and developed an urban simulation system, My City Forecast (http://mycityforecast.net), that enables citizens to understand how their environment and region are likely to change as a result of urban management in the future (up to 2040)….(More)”.