Paper by Carson K. Leung et al: “As the urbanization of the world continues and the population of cities rises, the issue of how to effectively move all these people around the city becomes much more important. In order to use the limited space in a city most efficiently, many cities and their residents are increasingly looking towards public transportation as the solution. In this paper, we focus on the public bus system as the primary form of public transit. In particular, we examine open public transit data for the Canadian city of Winnipeg. We mine and conduct transportation analytics on data from before and during the coronavirus disease 2019 (COVID-19) situation. By discovering how often and when buses were reported to be too full to take on new passengers at bus stops, analysts can gain insight into which routes and destinations are the busiest. This information would help decision makers take appropriate actions (e.g., adding extra buses on the busiest routes). This results in a better and more convenient transit system and a step towards a smart city. Moreover, during the COVID-19 era, it brings the additional benefits of safer bus services and bus-waiting experiences while maintaining social distancing…(More)”.
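A minimal sketch of the kind of analysis the abstract describes, assuming the "bus too full" reports arrive as (route, timestamp) pairs; the route names, timestamps, and the `busiest_routes` helper are invented for illustration, not taken from the paper:

```python
from collections import Counter
from datetime import datetime

# Hypothetical pass-up records: (route, ISO timestamp) pairs standing in for
# the open "bus too full to board" reports discussed above.
pass_ups = [
    ("BLUE", "2020-02-03T08:15:00"),
    ("BLUE", "2020-02-03T08:40:00"),
    ("16", "2020-02-03T17:05:00"),
    ("BLUE", "2020-02-04T08:20:00"),
    ("16", "2020-02-04T17:30:00"),
    ("47", "2020-02-05T12:10:00"),
]

def busiest_routes(records, top=3):
    """Count pass-up events per (route, hour-of-day) to flag overloaded service."""
    counts = Counter()
    for route, ts in records:
        hour = datetime.fromisoformat(ts).hour
        counts[(route, hour)] += 1
    return counts.most_common(top)

print(busiest_routes(pass_ups))
# → [(('BLUE', 8), 3), (('16', 17), 2), (('47', 12), 1)]
```

Ranking (route, hour) pairs rather than routes alone is what lets a planner decide not just where but when to add buses.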
Personal data, public data, privacy & power: GDPR & company data
Open Corporates: “…there are three other aspects which are relevant when talking about access to EU company data.
Cargo-culting GDPR
The first is a tendency to take the complex and subtle legislation that is GDPR and use a poorly understood version of it in other legislation and regulation, even if that regulation is already covered by GDPR. This actually undermines the GDPR regime, prevents it from working effectively, and should be strongly resisted. In the tech world, such approaches are called ‘cargo-culting’.
Similarly, GDPR is often used as an excuse for not releasing company information as open data, even when the same data is being sold to third parties apparently without concern — if one is covered by GDPR, the other certainly should be.
Widened power asymmetries
The second issue is the unintended consequences of GDPR, specifically the way it increases asymmetries of power and agency. For example, something like the so-called Right To Be Forgotten takes very significant resources to implement, and so actually strengthens the position of the giant tech companies — for such companies, investing millions in large teams to decide who should and should not be given the Right To Be Forgotten is just a relatively small cost of doing business.
Another issue is the growth of a whole new industry dedicated to removing traces of people’s past from the internet (2), which is also increasing the asymmetries of power. The vast majority of people are not directors of companies, or beneficial owners, and it is only the relatively rich and powerful (including politicians and criminals) who can afford lawyers to stifle free speech, or remove parts of their past they would rather not be there, from business failures to associations with criminals.
OpenCorporates, for example, was threatened with a lawsuit from a member of one of the wealthiest families in Europe for reproducing a gazette notice from the Luxembourg official gazette (a publication that contains public notices). We refused to back down, believing we had a good case in law and in the public interest, and the other side gave up. But such so-called SLAPP suits are becoming increasingly common, and unlike in many US states, there are currently no defences in place in the EU to resist them, despite pressure from civil society to address this….
At the same time, the automatic assumption that all Personally Identifiable Information (PII), someone’s name for example, is private is highly problematic, confusing both citizens and policy makers, and further undermining democracies and fair societies. As an obvious case, it’s critical that we know the names of our elected representatives, and those in positions of power, otherwise we would have an opaque society where decisions are made by nameless individuals with opaque agendas and personal interests — such as a leader awarding a contract to their brother’s company, for example.
As the diagram below illustrates, there is some personally identifiable information that it’s strongly in the public interest to know. Take the director or beneficial owner of a company, for example, of course their details are PII — clearly you need to know their name (and other information too), otherwise what actually do you know about them, or the company (only that some unnamed individual has been given special protection under law to be shielded from the company’s debts and actions, and yet can benefit from its profits)?
On the other hand, much of the data which is truly about our privacy — the profiles, inferences and scores that companies store on us — is explicitly outside GDPR, if it doesn’t contain PII.
Hopefully, as awareness of the issues increases, we will develop a more nuanced, deeper understanding of privacy, such that successors to this legislation begin to rebalance it and case law starts to bring clarity to the ambiguities of the GDPR….(More)”.
A need for open public data standards and sharing in light of COVID-19
Lauren Gardner, Jeremy Ratcliff, Ensheng Dong and Aaron Katz at the Lancet: “The disjointed public health response to the COVID-19 pandemic has demonstrated one clear truth: the value of timely, publicly available data. The Johns Hopkins University (JHU) Center for Systems Science and Engineering’s COVID-19 dashboard exists to provide this information. What grew from a modest effort to track a novel cause of pneumonia in China quickly became a mainstay symbol of the pandemic, receiving over 1 billion hits per day within weeks of its creation, primarily driven by the general public seeking information on the emerging health crisis. Critically, the data supporting the visualisation were provided in a publicly accessible repository and eagerly adopted by policy makers and the research community for purposes of modelling and planning, as evidenced by the more than 1200 citations in the first 4 months of its publication. 6 months into the pandemic, the JHU COVID-19 dashboard still stands as the authoritative source of global COVID-19 epidemiological data.
Similar commendable efforts to facilitate public understanding of COVID-19 have since been introduced by various academic, industry, and public health entities. These costly and disparate efforts around the world were necessary to fill the gap left by the lack of an established infrastructure for real-time reporting and open data sharing during an ongoing public health crisis…
Although existing systems were in place to achieve such objectives, they were not empowered or equipped to fully meet the public’s expectation for timely open data at an actionable level of spatial resolution. Moving forward, it is imperative that a standardised reporting system for systematically collecting, visualising, and sharing high-quality data on emerging infectious and notifiable diseases in real time is established. The data should be made available at a spatial and temporal scale that is granular enough to prove useful for planning and modelling purposes. Additionally, a critical component of the proposed system is the democratisation of data; all collected information (observing necessary privacy standards) should be made publicly available immediately upon release, in machine-readable formats, and based on open data standards…(More)”. (See also https://data4covid19.org/)
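The authors stop short of specifying a format, but a single machine-readable record under such a standard might be sketched as follows; the field names, admin-level hierarchy, and figures are assumptions for illustration, not an existing schema:

```python
import json
from datetime import date

# Illustrative record for a standardised, machine-readable disease report:
# granular location (country / province / city) plus the date, so the data
# can feed planning and modelling directly. All values are invented.
record = {
    "disease": "COVID-19",
    "report_date": date(2020, 6, 1).isoformat(),
    "location": {"country": "CA", "admin1": "Manitoba", "admin2": "Winnipeg"},
    "confirmed": 300,
    "deaths": 7,
    "recovered": 290,
}

# Serialise with sorted keys so independently produced records diff cleanly.
print(json.dumps(record, sort_keys=True))
```

The point of the sketch is the spatial hierarchy: a record tied only to a country cannot support the local-level modelling the authors call for.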
How open data could tame Big Tech’s power and avoid a breakup
Patrick Leblond at The Conversation: “…Traditional antitrust approaches such as breaking up Big Tech firms and preventing potential competitor acquisitions are never-ending processes. Even if you break them up and block their ability to acquire other, smaller tech firms, Big Tech will start growing again because of network effects and their data advantage.
And how do we know when a tech firm is big enough to ensure competitive markets? What are the size or scope thresholds for breaking up firms or blocking mergers and acquisitions?
A small startup acquired for millions of dollars can be worth billions of dollars for a Big Tech acquirer once integrated in its ecosystem. A series of small acquisitions can result in a dominant position in one area of the digital economy. Knowing this, competition/antitrust authorities would potentially have to examine every tech transaction, however small.
Not only would this be administratively costly or burdensome on resources, but it would also be difficult for government officials to assess with some precision (and therefore legitimacy), the likely future economic impact of an acquisition in a rapidly evolving technological environment.
Open data access, level the playing field
Given that mass data collection is at the core of Big Tech’s power as gatekeepers to customers, a key solution is to open up data access for other firms so that they can compete better.
Anonymized data (to protect an individual’s privacy rights) about people’s behaviour, interests, views, etc., should be made available for free to anyone wanting to pursue a commercial or non-commercial endeavour. Data about a firm’s operations or performance would, however, remain private.
Using an analogy from the finance world, Big Tech firms act as insider traders. Stock market insiders often possess insider (or private) information about companies that the public does not have. Such individuals then have an incentive to profit by buying or selling shares in those companies before the public becomes aware of the information.
Big Tech’s incentives are no different from those of stock market insiders. They trade on exclusively available private information (data) to generate extraordinary profits.
Continuing the finance analogy, financial securities regulators forbid the use of inside or non-publicly available information for personal benefit. Individuals found to illegally use such information are punished with jail time and fines.
They also require companies to publicly report relevant information that affects or could significantly affect their performance. Finally, they oblige insiders to publicly report when they buy and sell shares in a company in which they have access to privileged information.
Transposing stock market insider trading regulation to Big Tech implies that data access and use should be monitored under an independent regulatory body — call it a Data Market Authority. Such a body would be responsible for setting and enforcing principles, rules and standards of behaviour among individuals and organizations in the data-driven economy.
For example, a Data Market Authority would require firms to publicly report how they acquire and use personal data. It would prohibit personal data hoarding by ensuring that data is easily portable from one platform, network or marketplace to another. It would also prohibit the buying and selling of personal data as well as protect individuals’ privacy by imposing penalties on firms and individuals in cases of non-compliance.
Data openly and freely available under a strict regulatory environment would likely be a better way to tame Big Tech’s power than breaking them up and having antitrust authorities approving every acquisition that they wish to make….(More)”.
Do FOI laws and open government data deliver as anti-corruption policies? Evidence from a cross-country study
Paper by Mária Žuffová: “In election times, political parties promise in their manifestos to pass reforms increasing access to government information to root out corruption and improve public service delivery. Scholars have already offered several fascinating explanations of why governments adopt transparency policies that constrain their choices. However, knowledge of their impacts is limited. Does greater access to information deliver on its promises as an anti-corruption policy? While some research has already addressed this question in relation to freedom of information laws, the emergence of new digital technologies enabled new policies, such as open government data. Its effects on corruption remain empirically underexplored due to its novelty and a lack of measurements. In this article, I provide the first empirical study of the relationship between open government data, relative to FOI laws, and corruption. I propose a theoretical framework, which specifies conditions necessary for FOI laws and open government data to affect corruption levels, and I test it on a novel cross-country dataset.
The results suggest that the effects of open government data on corruption are conditional upon the quality of media and internet freedom. Moreover, other factors, such as free and fair elections, independent and accountable judiciary, or economic development, are far more critical for tackling corruption than increasing access to information. These findings are important for policies. In particular, digital transparency reforms will not yield results in the anti-corruption fight unless robust provisions safeguarding media and internet freedom complement them….(More)”.
Scraping Court Records Data to Find Dirty Cops
Article by Lawsuit.org: “In the 2002 dystopian sci-fi film “Minority Report,” law enforcement can manage crime by “predicting” illegal behavior before it happens. While fiction, the plot is intriguing and contributes to the conversation on advanced crime-fighting technology. However, today’s world may not be far off.
Data’s growing role in our lives and wider access to artificial intelligence are changing the way we approach topics such as research, real estate, and law enforcement. In fact, recent investigative reporting has shown that “dozens of [American] cities” are now experimenting with predictive policing technology.
Despite the current controversy surrounding predictive policing, it seems to be a growing trend that has been met with little real resistance. We may be closer to policing that mirrors the frightening depictions in “Minority Report” than we ever thought possible.
Fighting Fire With Fire
In its current state, predictive policing is defined as:
“The usage of mathematical, predictive analytics, and other analytical techniques in law enforcement to identify potential criminal activity. Predictive policing methods fall into four general categories: methods for predicting crimes, methods for predicting offenders, methods for predicting perpetrators’ identities, and methods for predicting victims of crime.”
While it might not be possible to prevent predictive policing from being employed by the criminal justice system, perhaps there are ways we can create a more level playing field: One where the powers of big data analysis aren’t just used to predict crime, but also are used to police law enforcement themselves.
Below, we’ve provided a detailed breakdown of what this potential reality could look like when applied to one South Florida county’s public databases, along with information on how citizens and communities can use public data to better understand the behaviors of local law enforcement and even individual police officers….(More)”.
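The article does not publish its code, but the general approach it describes can be sketched with Python's standard library; the page layout, case and officer names, and the `cases_per_officer` helper below are entirely hypothetical, since real county court portals differ:

```python
from html.parser import HTMLParser

# Hypothetical excerpt of a public court-records search page; real portals
# differ, so this markup and its two-column layout are illustrative only.
SAMPLE_HTML = """
<table>
  <tr><td>Case 2019-001</td><td>Ofc. A. Smith</td></tr>
  <tr><td>Case 2019-002</td><td>Ofc. B. Jones</td></tr>
  <tr><td>Case 2019-003</td><td>Ofc. A. Smith</td></tr>
</table>
"""

class CellCollector(HTMLParser):
    """Collect the text of every <td> cell in document order."""
    def __init__(self):
        super().__init__()
        self.cells, self._in_td = [], False
    def handle_starttag(self, tag, attrs):
        self._in_td = (tag == "td")
    def handle_data(self, data):
        if self._in_td and data.strip():
            self.cells.append(data.strip())
    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False

def cases_per_officer(html):
    """Tally how many cases name each officer in the scraped table."""
    parser = CellCollector()
    parser.feed(html)
    counts = {}
    # Cells alternate case-number / officer-name in this toy layout.
    for officer in parser.cells[1::2]:
        counts[officer] = counts.get(officer, 0) + 1
    return counts

print(cases_per_officer(SAMPLE_HTML))
# → {'Ofc. A. Smith': 2, 'Ofc. B. Jones': 1}
```

Aggregating appearances per officer is the kind of simple count that, at scale, would let a community spot officers who show up in court records unusually often.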
Open Data from Authoritarian Regimes: New Opportunities, New Challenges
Paper by Ruth D. Carlitz and Rachael McLellan: “Data availability has long been a challenge for scholars of authoritarian politics. However, the promotion of open government data—through voluntary initiatives such as the Open Government Partnership and soft conditionalities tied to foreign aid—has motivated many of the world’s more closed regimes to produce and publish fine-grained data on public goods provision, taxation, and more. While this has been a boon to scholars of autocracies, we argue that the politics of data production and dissemination in these countries create new challenges.
Systematically missing or biased data may jeopardize research integrity and lead to false inferences. We provide evidence of such risks from Tanzania. The example also shows how data manipulation fits into the broader set of strategies that authoritarian leaders use to legitimate and prolong their rule. Comparing data released to the public on local tax revenues with verified internal figures, we find that the public data appear to significantly underestimate opposition performance. This can bias studies on local government capacity and risk parroting the party line in data form. We conclude by providing a framework that researchers can use to anticipate and detect manipulation in newly available data….(More)”.
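The paper's public-versus-verified comparison can be illustrated with a toy calculation; the district names, figures, and the 10% threshold below are invented for the sketch, not the authors' data or method:

```python
# Toy comparison of public vs. internally verified local tax revenues, in the
# spirit of the Tanzania example; all district names and figures are invented.
public = {"District A": 120, "District B": 85, "District C": 200}
internal = {"District A": 150, "District B": 90, "District C": 202}

def underreporting_ratio(public, internal):
    """Relative shortfall of the public figure against the verified one."""
    return {d: round(1 - public[d] / internal[d], 3) for d in internal}

flags = underreporting_ratio(public, internal)
# Flag districts whose public figure falls more than 10% short (a threshold
# chosen here purely for illustration).
suspicious = [d for d, r in flags.items() if r > 0.1]
print(suspicious)
# → ['District A']
```

Small discrepancies are expected from reporting noise; it is the systematic, politically patterned shortfalls (e.g. concentrated in opposition-held districts) that signal manipulation rather than error.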
EU Company Data: State of the Union 2020
Report by OpenCorporates: “… on access to company data in the EU. It’s completely revised, with more detail on the impact that the lack of access to this critical dataset has – on business, on innovation, on democracy, and society.
The results are still not great however:
- Average score is low: The average score across the EU in terms of access to company data is just 40 out of 100. This is better than the average score 8 years ago, which was just 23 out of 100, but still very low.
- Some major economies score badly: Some of the EU’s major economies continue to score very badly indeed, with Germany, for example, scoring just 15/100, Italy 10/100, and Spain 0/100.
- EU policies undermined: The report identifies 15 areas where the lack of open company data frustrates, impedes or otherwise has a negative impact on EU policy.
- Inequalities widened: The report also identifies how inequalities are further widened by poor access to this critical dataset, and how the recovery from COVID-19 will be hampered by it too.
On the plus side, the report also identifies the EU Open Data & PSI Directive passed last year as potentially game changing – but only if it is implemented fully, and there are significant doubts whether this will happen….(More)”
Characterizing Disinformation Risk to Open Data in the Post-Truth Era
Paper by Adrienne Colborne and Michael Smit: “Curated, labeled, high-quality data is a valuable commodity for tasks such as business analytics and machine learning. Open data is a common source of such data—for example, retail analytics draws on open demographic data, and weather forecast systems draw on open atmospheric and ocean data. Open data is released openly by governments to achieve various objectives, such as transparency, informing citizen engagement, or supporting private enterprise.
Critical examination of ongoing social changes, including the post-truth phenomenon, suggests the quality, integrity, and authenticity of open data may be at risk. We introduce this risk through various lenses, describe some of the types of risk we expect using a threat model approach, identify approaches to mitigate each risk, and present real-world examples of cases where the risk has already caused harm. As an initial assessment of awareness of this disinformation risk, we compare our analysis to perspectives captured during open data stakeholder consultations in Canada…(More)”.
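One mitigation in this spirit (our illustration, not necessarily one the paper proposes) is for releasing agencies to publish cryptographic checksums alongside datasets, so consumers can detect tampering between release and use; a minimal sketch, with both the dataset bytes and the "published" digest invented:

```python
import hashlib

# The dataset bytes below stand in for a downloaded open-data file; the
# "published" digest stands in for a checksum the agency posts out-of-band.
dataset = b"region,cases\nNorth,12\nSouth,7\n"
published_sha256 = hashlib.sha256(dataset).hexdigest()

def verify_integrity(data: bytes, expected_hex: str) -> bool:
    """Return True only if the data's SHA-256 digest matches the published one."""
    return hashlib.sha256(data).hexdigest() == expected_hex

print(verify_integrity(dataset, published_sha256))                 # True
print(verify_integrity(dataset + b"tampered", published_sha256))   # False
```

A checksum only defends against post-publication tampering; it does nothing against disinformation injected before release, which is why the paper's broader threat-model framing matters.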
Why open science is critical to combatting COVID-19
Article by the OECD: “…In January 2020, 117 organisations – including journals, funding bodies, and centres for disease prevention – signed a statement titled “Sharing research data and findings relevant to the novel coronavirus outbreak”, committing to provide immediate open access for peer-reviewed publications at least for the duration of the outbreak, to make research findings available via preprint servers, and to share results immediately with the World Health Organization (WHO). This was followed in March by the Public Health Emergency COVID-19 Initiative, launched by 12 countries at the level of chief science advisors or equivalent, calling for open access to publications and machine-readable access to data related to COVID-19, which resulted in an even stronger commitment by publishers.
The Open COVID Pledge was launched in April 2020 by an international coalition of scientists, lawyers, and technology companies, and calls on authors to make all intellectual property (IP) under their control available, free of charge, and without encumbrances to help end the COVID-19 pandemic, and reduce the impact of the disease….
Remaining challenges
While clinical, epidemiological and laboratory data about COVID-19 is widely available, including genomic sequencing of the pathogen, a number of challenges remain:
- Not all data is sufficiently findable, accessible, interoperable and reusable (FAIR).
- Sources of data tend to be dispersed; even though many pooling initiatives are under way, curation needs to be performed “on the fly”.
- Access to personal health records needs to be readily available, subject to the patient’s consent. Legislation aimed at fostering interoperability and avoiding information blocking has yet to be passed in many OECD countries. Access across borders is even more difficult under current data protection frameworks in most OECD countries.
- In order to achieve the dual objectives of respecting privacy while ensuring access to machine readable, interoperable and reusable clinical data, the Virus Outbreak Data Network (VODAN) proposes to create FAIR data repositories which could be used by incoming algorithms (virtual machines) to ask specific research questions.
- In addition, many issues arise around the interpretation of data – this can be illustrated by the widely followed epidemiological statistics. Typically, the statistics concern “confirmed cases”, “deaths” and “recoveries”. Each of these items seems to be treated differently in different countries, and is sometimes subject to methodological changes within the same country.
- Specific standards for COVID-19 data therefore need to be established, and this is one of the priorities of the UK COVID-19 Strategy. A working group within Research Data Alliance has been set up to propose such standards at an international level.
- In some cases it could be inferred that the transparency of the statistics may have guided governments to restrict testing in order to limit the number of “confirmed cases” and avoid the rapid rise of numbers. Lower testing rates can in turn reduce the efficiency of quarantine measures, lowering the overall efficiency of combating the disease….(More)”.