Index: Open Data


By Alexandra Shaw, Michelle Winowatan, Andrew Young, and Stefaan Verhulst

The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on open data and was originally published in 2018.

Value and Impact

  • The projected year by which all EU28+ countries will have a fully operating open data portal: 2020

  • Expected growth of the European open data market between 2016 and 2020: 36.9%, reaching EUR 75.7 billion by 2020

Public Views on and Use of Open Government Data

  • Percentage of Americans who do not trust the federal government or social media sites to protect their data: approximately 50%

  • Key findings from The Economist Intelligence Unit report on Open Government Data Demand:

    • Percentage of respondents who say the key reason why governments open up their data is to create greater trust between the government and citizens: 70%

    • Percentage of respondents who say OGD plays an important role in improving lives of citizens: 78%

    • Percentage of respondents who say OGD helps with daily decision making, especially regarding transportation, education and the environment: 53%

    • Percentage of respondents who cite lack of awareness about OGD and its potential use and benefits as the greatest barrier to usage: 50%

    • Percentage of respondents who say they lack access to usable and relevant data: 31%

    • Percentage of respondents who think they don’t have sufficient technical skills to use open government data: 25%

    • Percentage of respondents who feel the number of OGD apps available is insufficient, indicating an opportunity for app developers: 20%

    • Percentage of respondents who say OGD has the potential to generate economic value and new business opportunity: 61%

    • Percentage of respondents who say they don’t trust governments to keep data safe, protected, and anonymized: 19%

Efforts and Involvement

  • Time that has passed since open government advocates convened to create a set of principles for open government data – the gathering that launched the open government data movement: 10 years

  • Participants in the Open Government Partnership today: 79 countries and 20 subnational governments

  • Percentage of “open data readiness” in Europe according to European Data Portal: 72%

    • Open data readiness consists of four indicators: presence of policy, national coordination, licensing norms, and use of data.

  • Number of U.S. cities with Open Data portals: 27

  • Number of governments who have adopted the International Open Data Charter: 62

  • Number of non-state organizations endorsing the International Open Data Charter: 57

  • Number of countries analyzed by the Open Data Index: 94

  • Number of Latin American countries that do not have open data portals as of 2017: 4 total – Belize, Guatemala, Honduras and Nicaragua

  • Number of cities participating in the Open Data Census: 39

Demand for Open Data

  • Open data demand measured by frequency of open government data use according to The Economist Intelligence Unit report:

    • Australia

      • Monthly: 15% of respondents

      • Quarterly: 22% of respondents

      • Annually: 10% of respondents

    • Finland

      • Monthly: 28% of respondents

      • Quarterly: 18% of respondents

      • Annually: 20% of respondents

    • France

      • Monthly: 27% of respondents

      • Quarterly: 17% of respondents

      • Annually: 19% of respondents

    • India

      • Monthly: 29% of respondents

      • Quarterly: 20% of respondents

      • Annually: 10% of respondents

    • Singapore

      • Monthly: 28% of respondents

      • Quarterly: 15% of respondents

      • Annually: 17% of respondents 

    • UK

      • Monthly: 23% of respondents

      • Quarterly: 21% of respondents

      • Annually: 15% of respondents

    • US

      • Monthly: 16% of respondents

      • Quarterly: 15% of respondents

      • Annually: 20% of respondents

  • Number of FOIA requests received in the US for fiscal year 2017: 818,271

  • Number of FOIA requests processed in the US for fiscal year 2017: 823,222

  • Distribution of FOIA requests in 2017 among the five agencies receiving the highest number of requests:

    • DHS: 45%

    • DOJ: 10%

    • NARA: 7%

    • DOD: 7%

    • HHS: 4%

Examining Datasets

  • Country with highest index score according to ODB Leaders Edition: Canada (76 out of 100)

  • Country with lowest index score according to ODB Leaders Edition: Sierra Leone (22 out of 100)

  • Proportion of datasets that are open in the top 30 governments according to ODB Leaders Edition: fewer than 1 in 5

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition: 19%

  • Average percentage of datasets that are open in the top 30 open data governments according to ODB Leaders Edition by sector/subject:

    • Budget: 30%

    • Companies: 13%

    • Contracts: 27%

    • Crime: 17%

    • Education: 13%

    • Elections: 17%

    • Environment: 20%

    • Health: 17%

    • Land: 7%

    • Legislation: 13%

    • Maps: 20%

    • Spending: 13%

    • Statistics: 27%

    • Trade: 23%

    • Transport: 30%

  • Percentage of countries that release data on government spending according to ODB Leaders Edition: 13%

  • Percentage of government data that is updated at regular intervals according to ODB Leaders Edition: 74%


  • Percentage of datasets classed as “open” in the 94 places worldwide analyzed by the Open Data Index: 11%

  • Percentage of open datasets in the Caribbean, according to Open Data Census: 7%

  • Number of companies whose data is available through OpenCorporates: 158,589,950

City Open Data

  • New York City

  • Singapore

    • Number of datasets published in Singapore: 1,480

    • Percentage of datasets with standardized format: 35%

    • Percentage of datasets made as raw as possible: 25%

  • Barcelona

    • Number of datasets published in Barcelona: 443

    • Open data demand in Barcelona measured by:

      • Number of unique sessions in the month of September 2018: 5,401

    • Quality of datasets published in Barcelona according to Tim Berners-Lee’s 5-star Open Data scheme: 3 stars

  • London

    • Number of datasets published in London: 762

    • Number of data requests since October 2014: 325

  • Bandung

    • Number of datasets published in Bandung: 1,417

  • Buenos Aires

    • Number of datasets published in Buenos Aires: 216

  • Dubai

    • Number of datasets published in Dubai: 267

  • Melbourne

    • Number of datasets published in Melbourne: 199

Sources

  • About OGP, Open Government Partnership. 2018.  

‘Do Not Track,’ the Privacy Tool Used by Millions of People, Doesn’t Do Anything


Kashmir Hill at Gizmodo: “When you go into the privacy settings on your browser, there’s a little option there to turn on the “Do Not Track” function, which will send an invisible request on your behalf to all the websites you visit telling them not to track you. A reasonable person might think that enabling it will stop a porn site from keeping track of what she watches, or keep Facebook from collecting the addresses of all the places she visits on the internet, or prevent third-party trackers she’s never heard of from following her from site to site. According to a recent survey by Forrester Research, a quarter of American adults use “Do Not Track” to protect their privacy. (Our own stats at Gizmodo Media Group show that 9% of visitors have it turned on.) We’ve got bad news for those millions of privacy-minded people, though: “Do Not Track” is like spray-on sunscreen, a product that makes you feel safe while doing little to actually protect you.

“Do Not Track,” as it was first imagined a decade ago by consumer advocates, was going to be a “Do Not Call” list for the internet, helping to free people from annoying targeted ads and creepy data collection. But only a handful of sites respect the request, the most prominent of which are Pinterest and Medium. (Pinterest won’t use offsite data to target ads to a visitor who’s elected not to be tracked, while Medium won’t send their data to third parties.) The vast majority of sites, including this one, ignore it….(More)”.
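
For readers curious about the mechanics, the “invisible request” described above is nothing more than a single HTTP header. The sketch below is illustrative only (it is not drawn from the article): it shows the header a browser attaches when the setting is on, and the kind of voluntary check a site would have to run in order to honour it; the helper function is hypothetical.

```python
# Illustrative sketch of the "Do Not Track" mechanism: the browser adds a
# "DNT: 1" header to every request; whether a site acts on it is voluntary.
import requests

# Client side: simulate a browser that has Do Not Track enabled.
# (A real browser adds this header automatically; here we add it by hand.)
response = requests.get("https://example.com", headers={"DNT": "1"})

# Server side: a site that chooses to respect the signal would branch on the
# header before loading analytics or ad trackers. (Hypothetical helper.)
def tracking_allowed(headers: dict) -> bool:
    """Return False when the visitor has sent the Do Not Track signal."""
    return headers.get("DNT") != "1"

print(tracking_allowed({"DNT": "1"}))   # False - visitor asked not to be tracked
print(tracking_allowed({}))             # True  - no preference expressed
```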

How pro-trust initiatives are taking over the Internet


Sara Fisher at Axios: “Dozens of new initiatives have launched over the past few years to address fake news and the erosion of faith in the media, creating a measurement problem of its own.

Why it matters: So many new efforts are launching simultaneously to solve the same problem that it’s become difficult to track which ones do what and which ones are partnering with each other….

To name a few:

  • The Trust Project, which is made up of dozens of global news companies, announced this morning that the number of journalism organizations using the global network’s “Trust Indicators” now totals 120, making it one of the larger global initiatives to combat fake news. Some of these groups (like NewsGuard) work with Trust Project and are a part of it.
  • News Integrity Initiative (Facebook, Craig Newmark Philanthropic Fund, Ford Foundation, Democracy Fund, John S. and James L. Knight Foundation, Tow Foundation, AppNexus, Mozilla and Betaworks)
  • NewsGuard (Longtime journalists and media entrepreneurs Steven Brill and Gordon Crovitz)
  • The Journalism Trust Initiative (Reporters Without Borders, Agence France Presse, the European Broadcasting Union and the Global Editors Network)
  • Internews (Longtime international non-profit)
  • Accountability Journalism Program (American Press Institute)
  • Trusting News (Reynolds Journalism Institute)
  • Media Manipulation Initiative (Data & Society)
  • Deepnews.ai (Frédéric Filloux)
  • Trust & News Initiative (Knight Foundation, Facebook and Craig Newmark, in affiliation with Duke University)
  • Our.News (Independently run)
  • WikiTribune (Wikipedia founder Jimmy Wales)

There are also dozens of fact-checking efforts being championed by different third parties, as well as efforts being built around blockchain and artificial intelligence.

Between the lines: Most of these efforts include some sort of mechanism for allowing readers to physically discern real journalism from fake news via some sort of badge or watermark, but that presents problems as well.

  • Attempts to flag or call out news as being real and valid have in the past been met with even stronger rejection by those who wish to discredit vetted media.
  • For example, Facebook said in December that it will no longer use “Disputed Flags” — red flags next to fake news articles — to identify fake news for users, because it found that “putting a strong image, like a red flag, next to an article may actually entrench deeply held beliefs – the opposite effect to what we intended.”…(More)”.

Study: Crowdsourced Hospital Ratings May Not Be Fair


Samantha Horton at WFYI: “Though many websites offer non-scientific ratings on a number of services, two Indiana University scientists say judging hospitals that way likely isn’t fair.

Their recently-released study compares the federal government’s Hospital Compare and crowdsourced sites such as Facebook, Yelp and Google. The research finds it’s difficult for people to accurately understand everything a hospital does, and that leads to biased ratings.

Patient experiences with food, amenities and bedside manner often align with federal government ratings. But IU professor Victoria Perez says judging quality of care and safety is much more nuanced, and people often get it wrong.

“About 20 percent of the hospitals rated best within a local market on social media were rated worst in that market by Hospital Compare in terms of patient health outcomes,” she says.

For the crowdsourced ratings to be more useful, Perez says people would have to know how to cross-reference them with a more reliable data source, such as Hospital Compare. But even that site can be challenging to navigate depending on what the consumer is looking for.

“If you have a condition-specific concern and you can see the clinical measure for a hospital that may be helpful,” says Perez. “But if your particular medical concern is not listed there, it might be hard to extrapolate from the ones that are listed or to know which ones you should be looking at.”

She says consumers would need more information about patient outcomes and other quality metrics to be able to reliably crowdsource a hospital on a site such as Google…(More)”.

Statistics and data science degrees: Overhyped or the real deal?


At The Conversation: “‘Data science’ is hot right now. The number of undergraduate degrees in statistics has tripled in the past decade, and as a statistics professor, I can tell you that it isn’t because freshmen love statistics.

Way back in 2009, economist Hal Varian of Google dubbed statistician the “next sexy job.” Since then, statistician, data scientist and actuary have topped various “best jobs” lists. Not to mention the enthusiastic press coverage of industry applications: Machine learning! Big data! AI! Deep learning!

But is it good advice? I’m going to voice an unpopular opinion for the sake of starting a conversation. Stats is indeed useful, but not in the way that the popular media – and all those online data science degree programs – seem to suggest….

While all the press tends to go to the sensationalist applications – computers that watch cat videos, anyone? – the data science boom reflects a broad increase in demand for data literacy, as a baseline requirement for modern jobs.

The “big data era” doesn’t just mean large amounts of data; it also means increased ease and ability to collect data of all types, in all walks of life. Although the big five tech companies – Google, Apple, Amazon, Facebook and Microsoft – represent about 10 percent of the U.S. market cap and dominate the public imagination, they employ only one-half of one percent of all employees.

Therefore, to be a true revolution, data science will need to infiltrate nontech industries. And it is. The U.S. has seen its impact on political campaigns. I myself have consulted in the medical devices sector. A few years back, Walmart held a data analysis competition as a recruiting tool. The need for people that can dig into the data and parse it is everywhere.

In a speech at the National Academy of Sciences in 2015, Steven “Freakonomics” Levitt related his insights about the need for data-savvy workers, based on his experience as a sought-after consultant in fields ranging from the airline industry to fast food….(More)”.

Text Analysis Systems Mine Workplace Emails to Measure Staff Sentiments


Alan Rothman at LLRX: “…For all of these good, bad or indifferent workplaces, a key question is whether any of the actions of management to engage the staff and listen to their concerns ever resulted in improved working conditions and higher levels of job satisfaction.

The answer is most often “yes”. Just having a say in, and some sense of control over, our jobs and workflows can indeed have a demonstrable impact on morale, camaraderie and the bottom line. As posited in the Hawthorne Effect, also termed the “Observer Effect”, this was first discovered during studies in the 1920s and 1930s when the management of a factory made improvements to the lighting and work schedules. In turn, worker satisfaction and productivity temporarily increased. This was not so much because there was more light, but rather because the workers sensed that management was paying attention to, and then acting upon, their concerns. The workers perceived they were no longer just cogs in a machine.

Perhaps, too, the Hawthorne Effect is in some ways the workplace equivalent of Heisenberg’s uncertainty principle in physics. To vastly oversimplify this slippery concept, the mere act of observing a subatomic particle can change its position.¹

Giving the processes of observation, analysis and change at the enterprise level a modern (but non-quantum) spin is a fascinating new article in the September 2018 issue of The Atlantic entitled What Your Boss Could Learn by Reading the Whole Company’s Emails, by Frank Partnoy. I highly recommend a click-through and full read if you have an opportunity. I will summarize and annotate it, and then, considering my own thorough lack of understanding of the basics of y=f(x), pose some of my own physics-free questions….

Today the text analytics business, like the work done by KeenCorp, is thriving. It has long been established as the processing behind email spam filters. Now it is finding other applications, including monitoring corporate reputations on social media and other sites.²

The finance industry is another growth sector, as investment banks and hedge funds scan a wide variety of information sources to locate “slight changes in language” that may point towards pending increases or decreases in share prices. Financial research providers are using artificial intelligence to mine “insights” from their own selections of news and analytical sources.

But is this technology effective?

In a paper entitled Lazy Prices, by Lauren Cohen (Harvard Business School and NBER), Christopher Malloy (Harvard Business School and NBER), and Quoc Nguyen (University of Illinois at Chicago), in a draft dated February 22, 2018, the researchers found that the share price of a company, in this case NetApp, measurably went down after the firm “subtly changed” the “descriptions of certain risks” in its 2010 annual report. Algorithms can detect such changes more quickly and effectively than humans. The company subsequently clarified in its 2011 annual report its “failure to comply” with reporting requirements in 2010. A highly skilled stock analyst “might have missed that phrase”, but once again it was captured by the researchers’ algorithms.

In the hands of a “skeptical investor”, this information might well have prompted questions about the differences between the 2010 and 2011 annual reports and, in turn, saved him or her a great deal of money. This detection was an early signal of a looming decline in NetApp’s stock. Half a year after the 2011 report’s publication, it was reported that the Syrian government had bought the company’s equipment and “used that equipment to spy on its citizens”, causing further declines.
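
As a rough illustration of the kind of comparison such algorithms automate, here is a simplified sketch that scores how similar two year-over-year risk-disclosure passages are and flags a noticeable drop in similarity. This is not the methodology of the Lazy Prices paper, and the sample passages are hypothetical wording, not quotes from any actual filing.

```python
# Simplified sketch: flag a filing whose risk language has drifted noticeably
# from the previous year's wording. Illustration only, not the Lazy Prices method.
from difflib import SequenceMatcher

def similarity(old_text: str, new_text: str) -> float:
    """Return a 0..1 similarity ratio between two disclosure passages."""
    return SequenceMatcher(None, old_text.split(), new_text.split()).ratio()

# Hypothetical risk-disclosure sentences from two consecutive annual reports.
risk_prior = "We believe we are in compliance with applicable export regulations."
risk_current = "We may not be in full compliance with certain export regulations."

score = similarity(risk_prior, risk_current)
if score < 0.8:  # threshold chosen arbitrarily for the example
    print(f"Risk language changed (similarity {score:.2f}) - worth a closer look")
```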

Now text analytics is being deployed at a new target: the composition of employees’ communications. Although it has been found that workers have no expectation of privacy in their workplaces, some companies remain reluctant to mine employee communications because of privacy concerns. Even so, companies are finding it harder to resist the “urge to mine employee information”, especially as text analysis systems continue to improve.

Among the evolving enterprise applications is use by human resources departments to assess overall employee morale. For example, Vibe is an app that scans through communications on Slack, a widely used enterprise platform. Vibe’s algorithm measures the positive and negative emotions of a work team and reports them in real time….(More)”.
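
To make the underlying idea concrete, here is a minimal sketch of message-level sentiment scoring using the off-the-shelf VADER analyzer from NLTK. It illustrates the general technique of aggregating per-message sentiment into a team-level mood score; it is not Vibe’s or KeenCorp’s proprietary algorithm, and the sample messages are invented.

```python
# Minimal sketch of team-level sentiment scoring over workplace messages,
# using NLTK's VADER analyzer. Illustrative only.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon download
analyzer = SentimentIntensityAnalyzer()

messages = [  # hypothetical chat messages
    "Great work on the release, team!",
    "I'm worried we keep slipping the deadline.",
    "The new process is confusing and nobody told us why.",
]

# VADER's "compound" score runs from -1 (very negative) to +1 (very positive).
scores = [analyzer.polarity_scores(m)["compound"] for m in messages]
team_mood = sum(scores) / len(scores)
print(f"Average team sentiment: {team_mood:+.2f}")
```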

Renovating democracy from the bottom up


Nathan Gardels at the Washington Post: “The participatory power of social media is a game changer for governance. It levels the playing field among amateurs and experts, peers and authorities and even challenges the legitimacy of representative government. Its arrival coincides with and reinforces the widespread distrust of elites across the Western world, ripening the historical moment for direct democracy.

For the first time, an Internet-based movement has come to power in a major country, Italy, under the slogan “Participate, don’t delegate!” All of the Five Star Movement’s parliamentarians, who rule the country in a coalition with the far-right League party, were nominated and elected to stand for office online. And they have appointed the world’s first minister for direct democracy, Riccardo Fraccaro.

In Rome this week, he explained the participatory agenda of Italy’s ruling coalition government to The WorldPost at a meeting of the Global Forum on Modern Direct Democracy. “Citizens must be granted the same possibility to actively intervene in the process of managing and administrating public goods as normally carried out by their elected representatives,” he enthused. “What we have witnessed in our democracy is a drift toward ‘partyocracy,’ in which a restricted circle of policymakers have been so fully empowered with decision-making capacity that they could virtually ignore and bypass the public will. The mere election of a representative every so many years is no longer sufficient to prevent this from happening. That is why our government will take the next step forward in order to innovate and enhance our democracy.”

Fraccaro went on: “Referenda, public petitions and the citizens’ ballot initiative are nothing other than the direct means available for the citizenry to submit laws that political parties are not willing to propose or to reject rules approved by political parties that are not welcome by the people. Our aim, therefore, is to establish the principles and practices of direct democracy alongside the system of representative government in order to give real, authentic sovereignty to the citizens.”

At the Rome forum, Deputy Prime Minister Luigi di Maio, a Five Star member, railed against the technocrats and banks he says are trying to frustrate the will of the people. He promised forthcoming changes in the Italian constitution to follow through on Fraccaro’s call for citizen-initiated propositions that will go to the public ballot if the legislature does not act on them.

The program that has so far emerged out of the government’s participatory agenda is a mixed bag. It includes everything from anti-immigrant and anti-vaccine policies to the expansion of digital networks and planting more trees. In a move that has unsettled the European Union authorities as well as Italy’s non-partisan, indirectly-elected president, the governing coalition last week proposed both a tax cut and the provision of a universal basic income — despite the fact that Italy’s long-term debt is already 130 percent of GDP.

The Italian experiment warrants close attention as a harbinger of things to come elsewhere. It reveals a paradox for governance in this digital age: the more participation there is, the greater the need for the counterbalance of impartial mediating practices and institutions that can process the cacophony of voices, sort out the deluge of contested information, dispense with magical thinking and negotiate fair trade-offs among the welter of conflicting interests….(More)”.

Social Media Use in Crisis and Risk Communication: Emergencies, Concerns and Awareness


Open Access Book edited by Harald Hornmoen and Klas Backholm: ” This book is about how different communicators – whether professionals, such as crisis managers, first responders and journalists, or private citizens and disaster victims – have used social media to communicate about risks and crises. It is also about how these very different actors can play a crucial role in mitigating or preventing crises. How can they use social media to strengthen their own and the public’s awareness and understanding of crises when they unfold? How can they use social media to promote resilience during crises and the ability to deal with the after-effects? Moreover, what can they do to avoid using social media in a manner that weakens the situation awareness of crisis workers and citizens, or obstructs effective emergency management?

The RESCUE (Researching Social Media and Collaborative Software Use in Emergency Situations) project, on which this book is based, has sought to enable a more efficient and appropriate use of social media among key communicators, such as journalists and government actors involved in crisis management. Through empirical studies, and by drawing on relevant theory, the collection aims to improve our understanding of how social media have been used in different types of risks and crises. Building on our empirical work, we provide research-based input into how social media can be used efficiently by different communicators in a way appropriate to the specific crisis and to the concerns of the public.

We address our questions by presenting new research-based knowledge on social media use during different crises: the terrorist attacks in Norway on 22 July 2011; the central European floods in Austria in 2013; and the West African Ebola outbreak in 2014. The social media platforms analysed include the most popular ones in the affected areas at the time of the crises: Twitter and Facebook. By addressing such different cases, the book will move the field of crisis communication in social media beyond individual studies towards providing knowledge which is valid across situations….(More)”.

Emerging Labour Market Data Sources towards Digital Technical and Vocational Education and Training (TVET)


Paper by Nikos Askitas, Rafik Mahjoubi, Pedro S. Martins, Koffi Zougbede for Paris21/OECD: “Experience from both technology and policy making shows that solutions for labour market improvements are simply choices of new, more tolerable problems. All data solutions supporting digital Technical and Vocational Education and Training (TVET) will have to incorporate a roadmap of changes rather than an unrealistic super-solution. The ideal situation is a world in which labour market participants engage in intelligent strategic behavior in an informed, fair and sophisticated manner.

Labour market data captures transactions within labour market processes. In order to successfully capture such data, we need to understand the specifics of these market processes. Designing an ecosystem of labour market matching facilitators and rules of engagement for contributing to a lean and streamlined Logistics Management and Information System (LMIS) is the best way to create Big Data with context relevance. This is in contrast with pre-existing Big Data captured by global job boards or social media for which relevance is limited by the technology access gap and its variations across the developing world.

Network effects occur in technology and job facilitation, as seen in the developed world. Managing and instigating the right network effects might be crucial to avoid fragmented stagnation and inefficiency. This is key to avoid throwing money behind wrong choices that do not gain traction.

A mixed mode approach is possibly the ideal approach for developing countries. Mixing offline and online elements correctly will be crucial in bridging the technology access gap and reaping the benefits of digitisation at the same time.

Properly incentivising the various entities is critical for progression, and more specifically the private sector, which is significantly more agile and inventive, has “skin in the game” and a long-term commitment to the conditions in the field, has intimate knowledge of how to close the technology gap, and brings a better understanding of the particular ambient context it is operating in. To summarise: Big Data starts small.

Managing expectations and creating incentives for the various stakeholders will be crucial in establishing digitally supported TVET. Developing the right business models will be crucial in the short term and beyond, and it will be the result of creating the right mix of technological and policy expertise with good knowledge of the situation on the ground….(More)”.

Crowdsourced social media data for disaster management: Lessons from the PetaJakarta.org project


R. I. Ogie, R. J. Clarke, H. Forehead and P. Perez in Computers, Environment and Urban Systems: “The application of crowdsourced social media data in flood mapping and other disaster management initiatives is a burgeoning field of research, but not one that is without challenges. In identifying these challenges and in making appropriate recommendations for future direction, it is vital that we learn from the past by taking a constructively critical appraisal of highly-praised projects in this field, which through real-world implementations have pioneered the use of crowdsourced geospatial data in modern disaster management. These real-world applications represent natural experiments, each with myriads of lessons that cannot be easily gained from computer-confined simulations.

This paper reports on lessons learnt from a 3-year implementation of a highly-praised project – the PetaJakarta.org project. The lessons presented derive from the key success factors and the challenges associated with the PetaJakarta.org project. To contribute to addressing some of the identified challenges, desirable characteristics of future social media-based disaster mapping systems are discussed. It is envisaged that the lessons and insights shared in this study will prove invaluable within the broader context of designing socio-technical systems for crowdsourcing and harnessing disaster-related information….(More)”.