Global Fishing Watch: Pooling Data and Expertise to Combat Illegal Fishing


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “Global Fishing Watch, originally set up through a collaboration between Oceana, SkyTruth and Google, is an independent nonprofit organization dedicated to advancing responsible stewardship of our oceans through increased transparency in fishing activity and scientific research. Using big data processing and machine learning, Global Fishing Watch visualizes, tracks, and shares data about global fishing activity in near real time, for free, via its public map. To date, the platform tracks approximately 65,000 commercial fishing vessels globally. These insights have been used in a number of academic publications, ocean advocacy efforts, and law enforcement activities.

Data Collaborative Model: Based on the typology of data collaborative practice areas, Global Fishing Watch is an example of the data pooling model of data collaboration, specifically a public data pool. Public data pools co-mingle data assets from multiple data holders — including governments and companies — and make those shared assets available on the web. This approach enabled the data stewards and stakeholders involved in Global Fishing Watch to bring together multiple data streams from both public- and private-sector entities in a single location. This single point of access gives the public and relevant authorities user-friendly access to actionable, previously fragmented data that can drive efforts to improve fisheries compliance and combat illegal fishing around the world.
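
To make the pooling model concrete, consider a toy sketch (the feed names, fields and values below are our own illustration, not Global Fishing Watch’s actual schema or pipeline): each data holder publishes vessel positions under its own field names, and pooling amounts to normalizing the feeds into one shared schema while preserving provenance.

```python
import pandas as pd

# Hypothetical feeds from two data holders, each with its own schema.
public_feed = pd.DataFrame({
    "mmsi": [273345000, 412420000],          # vessel identifiers broadcast over AIS
    "lat": [59.92, 30.12],
    "lon": [10.75, 122.35],
    "ts": ["2019-06-01T00:00Z", "2019-06-01T00:05Z"],
})
private_feed = pd.DataFrame({
    "vessel_id": [224069000],
    "latitude": [43.36],
    "longitude": [-8.41],
    "timestamp": ["2019-06-01T00:02Z"],
})

def normalize(df, mapping, source):
    """Rename one holder's columns into the shared schema and tag provenance."""
    out = df.rename(columns=mapping)[["mmsi", "lat", "lon", "ts"]].copy()
    out["ts"] = pd.to_datetime(out["ts"])
    out["source"] = source                   # record which holder contributed each row
    return out

# The "public data pool": one table, one point of access, sources still traceable.
pool = pd.concat([
    normalize(public_feed, {}, "public-ais"),
    normalize(private_feed, {"vessel_id": "mmsi", "latitude": "lat",
                             "longitude": "lon", "timestamp": "ts"}, "private-vms"),
], ignore_index=True).sort_values("ts")
```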

Data Stewardship Approach: Global Fishing Watch also provides a clear illustration of the importance of data stewards. For instance, representatives from Google Earth Outreach, one of the data holders, played an important stewardship role in connecting and coordinating with SkyTruth and Oceana, two important nonprofit environmental actors that had been working separately prior to this initiative. The brokering of this partnership helped bring relevant data assets from the public and private sectors to bear in support of institutional efforts to address the stubborn challenge of illegal fishing.

Read the full case study here.”

Three Examples of Data Empowerment


Blog by Michael Cañares: “It was a humid December afternoon in Banda Aceh, a bustling city in northern Indonesia. Two women members of an education reform advocacy group were busy preparing infographics on how the city government was spending its education budget and how that spending affected the quality of service delivery in schools. The room was abuzz with questions and apprehension because the next day, the group would present to education department officials its analysis of data the group had been able to access for the first time. The analysis uncovered inefficiencies, poor school performance, and ineffective allocation of resources, among other issues.

While worried about how the officials would react, almost everyone in the room was cheerful. One advocate told me she found the whole process liberating; it was exhilarating, she said, to use government-published data to ask civil servants why the state of education in some schools was disappointing. “Armed with data, I am no longer afraid to speak my mind,” she added.

This was five years ago, but the memory has stuck with me. It was one of many experiences that inspired me to continue advocating for governments to publish data proactively, and searching for ways to use data to strengthen people’s voice on matters that are important to them.

Globally, there are many examples of how data has enabled people to advocate for their rights, demand better public services or hold governments to account. This blog post shares a few examples, focusing largely on how people are able to access and use data that shape their lives — the first dimension of how we characterize data empowerment….

Poverty Stoplight: People use their own data to improve their lives

Data Zetu: Giving borrowed data back to citizens

Check My School: Data-based community action to improve school performance…(More)”.

Taming the Beast: Harnessing Blockchains in Developing Country Governments


Paper by Raúl Zambrano: “Amid pressing demands to achieve critical sustainable development goals, governments in developing countries face the additional complex task of embracing new digital technologies such as blockchains. This paper develops a framework interlinking development, technology, and government institutions that policymakers and development practitioners could use to address such a conundrum. State capacity and democratic governance are introduced as drivers in the overall analysis. With this in hand, blockchain technology is revisited from the perspective of governments in the Global South, identifying in the process key traits and proposing a new typology. An overview of the status of blockchain deployments in the Global South follows, complemented by a closer look at country examples to distill trends, patterns and risks. The paper closes with a discussion of the findings, highlighting both challenges and opportunities for governments. It also provides basic guidance to development practitioners interested in enhancing current programming using blockchains as an enabler….(More)”
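
The paper’s focus is institutional rather than technical, but the key trait it revisits, tamper evidence in an append-only ledger, can be demonstrated in a few lines. The sketch below is a minimal hash chain (our own illustration, not any production blockchain and not the paper’s code): each block commits to its predecessor’s hash, so retroactively editing a record breaks verification.

```python
import hashlib
import json
import time

def make_block(data, prev_hash):
    """Build a block whose hash commits to its payload and its predecessor."""
    block = {"time": time.time(), "data": data, "prev": prev_hash}
    payload = json.dumps({k: block[k] for k in ("time", "data", "prev")},
                         sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

def verify(chain):
    """Recompute every hash; any retroactive edit breaks the chain."""
    for i, block in enumerate(chain):
        payload = json.dumps({k: block[k] for k in ("time", "data", "prev")},
                             sort_keys=True).encode()
        if block["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

# A hypothetical land-registry entry, a common e-government example.
chain = [make_block("genesis", "0" * 64)]
chain.append(make_block({"parcel": 42, "owner": "registry-entry-1"}, chain[-1]["hash"]))
assert verify(chain)
chain[1]["data"]["owner"] = "tampered"       # any edit after the fact...
assert not verify(chain)                     # ...is detected on verification
```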

What is My Data Worth?


Ruoxi Jia at the Berkeley Artificial Intelligence Research (BAIR) blog: “People give massive amounts of their personal data to companies every day, and these data are used to generate tremendous business value. Some economists and politicians argue that people should be paid for their contributions—but the million-dollar question is: by how much?

This article discusses methods proposed in our recent AISTATS and VLDB papers that attempt to answer this question in the machine learning context. This is joint work with David Dao, Boxin Wang, Frances Ann Hubis, Nezihe Merve Gurel, Nick Hynes, Bo Li, Ce Zhang, Costas J. Spanos, and Dawn Song, as well as a collaborative effort between UC Berkeley, ETH Zurich, and UIUC. More information about the work in our group can be found here.

What are the existing approaches to data valuation?

Various ad hoc data valuation schemes have been studied in the literature, and some of them have been deployed in existing data marketplaces. From a practitioner’s point of view, they can be grouped into three categories:

  • Query-based pricing attaches values to user-initiated queries. One simple example is to set the price based on the number of queries allowed during a time window. More sophisticated variants adjust the price to meet specific criteria, such as arbitrage avoidance. (A toy sketch of this category follows the list.)
  • Data attribute-based pricing constructs a price model that takes into account various parameters, such as data age, credibility, potential benefits, etc. The model is trained to match market prices released in public registries.
  • Auction-based pricing designs auctions that dynamically set the price based on bids offered by buyers and sellers.
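
As a worked toy example of the first category (the tariff structure and all numbers are our own invention, not a scheme from any real marketplace), query-based pricing can be as simple as a flat fee covering a query allowance per billing window, plus a per-query overage charge:

```python
import time
from dataclasses import dataclass, field

@dataclass
class QueryMeter:
    """Toy query-based tariff: a flat fee buys `included` queries per window;
    each additional query costs `overage`. All numbers are hypothetical."""
    flat_fee: float = 50.0            # subscription price for one window
    included: int = 1000              # queries covered by the flat fee
    overage: float = 0.10             # price per query beyond the allowance
    window_s: int = 86400             # billing window length (one day)
    _start: float = field(default_factory=time.time)
    _count: int = 0

    def record_query(self) -> None:
        if time.time() - self._start > self.window_s:
            self._start, self._count = time.time(), 0   # roll into a new window
        self._count += 1

    def bill(self) -> float:
        extra = max(0, self._count - self.included)
        return self.flat_fee + extra * self.overage
```

Arbitrage avoidance then adds constraints on top of such a tariff, e.g. ensuring that no combination of cheap queries reconstructs the answer to an expensive query for less than its list price.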

However, existing data valuation schemes do not take into account the following important desiderata:

  • Task-specificness: The value of data depends on the task it helps to fulfill. For instance, if Alice’s medical record indicates that she has disease A, then her data will be more useful to predict disease A as opposed to other diseases.
  • Fairness: The quality of data from different sources varies dramatically. In the worst-case scenario, adversarial data sources may even degrade model performance via data poisoning attacks. Hence, the data value should reflect the efficacy of data by assigning high values to data which can notably improve the model’s performance.
  • Efficiency: Practical machine learning tasks may involve thousands or billions of data contributors; thus, data valuation techniques should be capable of scaling up.

With the desiderata above, we now discuss a principled notion of data value and computationally efficient algorithms for data valuation….(More)”.
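
The principled notion the papers build on is the Shapley value from cooperative game theory: a data point’s value is its average marginal contribution to a task-specific utility (e.g. validation accuracy) over subsets of the other points, which addresses the task-specificness and fairness desiderata by construction. The exact value costs exponential time, so it is commonly estimated by sampling permutations. Below is a minimal Monte Carlo sketch (the helper names and the logistic-regression utility are our own assumptions, not the papers’ implementation, which is far more efficient):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def utility(idx, X_tr, y_tr, X_val, y_val):
    """Validation accuracy of a model trained on the subset `idx` of the data.
    Assumes NumPy arrays and non-negative integer class labels."""
    if len(idx) < 2 or len(np.unique(y_tr[idx])) < 2:
        # Too little data to fit a classifier: fall back to majority-class accuracy.
        return np.mean(y_val == np.bincount(y_val).argmax())
    model = LogisticRegression(max_iter=1000).fit(X_tr[idx], y_tr[idx])
    return model.score(X_val, y_val)

def monte_carlo_shapley(X_tr, y_tr, X_val, y_val, n_perms=200, seed=0):
    """Estimate each training point's Shapley value by sampling permutations.

    In each random ordering, a point's marginal contribution is the change in
    validation accuracy when it joins its predecessors; averaging over many
    orderings approximates the exact (exponential-cost) Shapley value.
    """
    rng = np.random.default_rng(seed)
    n = len(y_tr)
    values = np.zeros(n)
    empty = np.array([], dtype=int)
    for _ in range(n_perms):
        perm = rng.permutation(n)
        prev_score = utility(empty, X_tr, y_tr, X_val, y_val)
        for k, i in enumerate(perm):
            score = utility(perm[: k + 1], X_tr, y_tr, X_val, y_val)
            values[i] += score - prev_score
            prev_score = score
    return values / n_perms
```

The papers’ contribution is precisely to cut this cost down, among other techniques via group testing and, for nearest-neighbor models, an exact algorithm that avoids retraining altogether.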

Will Artificial Intelligence Eat the Law? The Rise of Hybrid Social-Ordering Systems


Paper by Tim Wu: “Software has partially or fully displaced many former human activities, such as catching speeders or flying airplanes, and proven itself able to surpass humans in certain contests, like Chess and Jeopardy. What are the prospects for the displacement of human courts as the centerpiece of legal decision-making?

Based on the case study of hate speech control on major tech platforms, particularly Twitter and Facebook, this Essay argues that the displacement of human courts remains a distant prospect, but that hybrid machine–human systems are the predictable future of legal adjudication, and that there lies some hope in that combination, if done well….(More)”.
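
To make the hybrid idea concrete, a schematic router might let the model decide only the clear-cut cases and queue the contested middle band for human adjudicators. The sketch below is our own illustration of the general pattern; the thresholds and names are invented and do not describe Twitter’s or Facebook’s actual pipeline:

```python
from dataclasses import dataclass

@dataclass
class Post:
    post_id: int
    text: str

def classify(post: Post) -> float:
    """Stand-in for a trained hate-speech model returning P(violation)."""
    return 0.5   # placeholder; a real system would score the text with an ML model

def route(post: Post, score: float, auto_remove=0.95, auto_keep=0.05):
    """Hybrid adjudication: machines handle the confident tails of the score
    distribution, humans review the ambiguous middle band."""
    if score >= auto_remove:
        return ("remove", "automatic")       # near-certain violation
    if score <= auto_keep:
        return ("keep", "automatic")         # near-certainly benign
    return ("escalate", "human-review")      # contested case: a person decides

post = Post(1, "example text")
decision, path = route(post, classify(post))
```

On this pattern, the contested middle band is where the court-like, judgment-heavy adjudication the Essay discusses would continue to reside.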

Trusted smart statistics: Motivations and principles


Paper by Fabio Ricciato et al.: “In this contribution we outline the concept of Trusted Smart Statistics as the natural evolution of official statistics in the new datafied world. Traditional data sources, namely survey and administrative data, nowadays represent a valuable but small portion of the global data stock, much of which is held in the private sector. The availability of new data sources is only one aspect of the global change that concerns official statistics. Other aspects, more subtle but no less important, include changes in perceptions, expectations, behaviours and relations between the stakeholders. The environment around official statistics has changed: statistical offices are no longer data monopolists, but one prominent species among many others in a larger (and complex) ecosystem. What was established in the traditional world of legacy data sources (in terms of regulations, technologies, practices, etc.) is no longer guaranteed to be sufficient with new data sources.

Trusted Smart Statistics is not about replacing existing sources and processes, but about augmenting them with new ones. Such augmentation, however, will not be merely incremental: the path towards Trusted Smart Statistics is not about tweaking some components of the legacy system but about building an entirely new system that will coexist with the legacy one. In this position paper we outline some key design principles for the new Trusted Smart Statistics system. Taken collectively, they picture a system where the smart and trust aspects enable and reinforce each other: a system that is more open towards external stakeholders (citizens, private companies, public authorities), with whom Statistical Offices will share computation, control, code, logs and of course final statistics, without necessarily sharing the raw input data….(More)”.
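
One concrete reading of “sharing computation, control, code, logs … without necessarily sharing the raw input data” is that the statistical office ships approved code to the data holder, and only aggregates plus an audit log ever leave the holder’s infrastructure. A minimal sketch under that reading (all names, thresholds and fields are our own assumptions, not the paper’s design):

```python
import hashlib
import json

def run_at_data_holder(records, code_id, min_group=20):
    """Runs inside the data holder's infrastructure: approved code comes in,
    only an aggregate statistic and an audit log go out; raw records never do."""
    total = len(records)
    if total < min_group:                    # suppress small, re-identifiable groups
        result = {"suppressed": True, "n": total}
    else:
        active = sum(1 for r in records if r.get("active"))
        result = {"share_active": round(active / total, 3), "n": total}
    log = {
        "code_id": code_id,                  # which vetted computation was executed
        "input_fingerprint": hashlib.sha256( # proves which data were used, without revealing them
            json.dumps(records, sort_keys=True).encode()).hexdigest(),
        "output": result,
    }
    return result, log

# Hypothetical mobile-network records held by a private operator.
records = [{"active": True}] * 15 + [{"active": False}] * 10
stats, audit_entry = run_at_data_holder(records, "mobility-indicator-v1")
```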

Towards adaptive governance in big data health research: implementing regulatory principles


Chapter by Alessandro Blasimme and Effy Vayena: “While data-enabled health care systems are in their infancy, biomedical research is rapidly adopting the big data paradigm. Digital epidemiology, for example, already employs data generated outside the public health care system – that is, data generated without the intent of using them for epidemiological research – to understand and prevent patterns of disease in populations (Salathé 2018). Precision medicine – pooling together genomic, environmental and lifestyle data – also represents a prominent example of how data integration can drive both fundamental and translational research in important medical domains such as oncology (D. C. Collins et al. 2017). All of this requires the collection, storage, analysis and distribution of massive amounts of personal information as well as the use of state-of-the-art data analytics tools to uncover health- and disease-related patterns.

The realization of the potential of big data in health evokes a necessary commitment to a sense of “continuity”, articulated in three distinct ways: a) from data generation to use (as in data-enabled learning health care); b) from research to clinical practice, e.g. the discovery of new mutations in the context of diagnostics; c) from health data strictly speaking (Vayena and Gasser 2016), e.g. clinical records, to data less obviously health-related, e.g. tweets used in digital epidemiology. These continuities run up against regulatory and governance approaches that were designed for clear data taxonomies, for a less blurred boundary between research and clinical practice, and for rules that focused mostly on data generation and less on their eventual and multiple uses.

The result is significant uncertainty about how responsible use of such large amounts of sensitive personal data could be fostered. In this chapter we focus on the uncertainties surrounding the use of biomedical big data in the context of health research. Are new criteria needed to review biomedical big data research projects? Do current mechanisms, such as informed consent, offer sufficient protection to research participants’ autonomy and privacy in this new context? Do existing oversight mechanisms ensure transparency and accountability in data access and sharing? What monitoring tools are available to assess how personal data are used over time? Is the equitable distribution of benefits accruing from such data uses considered, or can it be ensured? How is the public being involved – if at all – in decisions about creating and using large data repositories for research purposes? What role do IT (information technology) players, especially big ones, acquire in research? And what regulatory instruments do we have to ensure that such players do not undermine the independence of research?…(More)”.

Data in Society: Challenging Statistics in an Age of Globalisation


Book edited by Jeff Evans, Sally Ruane and Humphrey Southall: “Statistical data and evidence-based claims are increasingly central to our everyday lives. Critically examining ‘Big Data’, this book charts the recent explosion in sources of data, including those precipitated by global developments and technological change. It sets out changes and controversies related to data harvesting and construction, dissemination and data analytics by a range of private, governmental and social organisations in multiple settings.

Analysing the power of data to shape political debate, the presentation of ideas to us by the media, and issues surrounding data ownership and access, the authors suggest how data can be used to uncover injustices and to advance social progress…(More)”.

Responsible data sharing in a big data-driven translational research platform: lessons learned


Paper by S. Kalkman et al.: “The sharing of clinical research data is increasingly viewed as a moral duty [1]. Particularly in the context of making clinical trial data widely available, editors of international medical journals have labeled data sharing a highly efficient way to advance scientific knowledge [2,3,4]. The combination of ever larger datasets into so-called “Big Data” is considered to offer even greater benefits for science, medicine and society [5]. Several international consortia have now promised to build grand-scale, Big Data-driven translational research platforms to generate better scientific evidence regarding disease etiology, diagnosis, treatment and prognosis across various disease areas [6,7,8].

Despite anticipated benefits, large-scale sharing of health data is charged with ethical questions. Stakeholders have been urged to consider how to manage privacy and confidentiality issues, ensure valid informed consent, and determine who gets to decide about data access [9]. More fundamentally, new data sharing activities prompt questions about social justice and public trust [10]. To balance potential benefits and ethical considerations, data sharing platforms require guidance for the processes of interaction and decision-making. In the European Union (EU), legal norms specified for the sharing of personal data for health research, most notably those set out in the General Data Protection Regulation (GDPR) (EU 2016/679), remain open to interpretation and offer limited practical guidance to researchers [11,12,13]. Striking in this regard is that the GDPR itself stresses the importance of adherence to ethical standards when broad consent is put forward as a legal basis for the processing of personal data. For example, Recital 33 of the GDPR states that data subjects should be allowed to give “consent to certain areas of scientific research when in keeping with recognised ethical standards for scientific research” [14]. In fact, the GDPR actually encourages data controllers to establish self-regulating mechanisms, such as a code of conduct. To foster responsible and sustainable data sharing in translational research platforms, ethical guidance and governance are therefore necessary. Here, we define governance as ‘the processes of interaction and decision-making among the different stakeholders that are involved in a collective problem that lead to the creation, reinforcement, or reproduction of social norms and institutions’…(More)”.

Too much information? The new challenge for decision-makers


Daniel Winter at the Financial Times: “…Concern over technology’s capacity both to shrink the world and complicate it has grown steadily since the second world war — little wonder, perhaps, when the existential threats it throws up have expanded from nuclear weapons to encompass climate change (and any consequent geoengineering), gene editing and AI as well. The financial crisis of 2008, in which poorly understood investment instruments made economies totter, has added to the unease over our ability to make sense of things.

From preoccupying cold war planners, attempts to codify best practice in sense-making have gone on to exercise (often profitably) business academics and management consultants, and now draw large audiences online.

Blogs, podcasts and YouTube channels such as Rebel Wisdom and Future Thinkers aim to arm their followers with the tools they need to understand the world, and make the right decisions. Daniel Schmachtenberger is one such voice, whose interviews on YouTube and his podcast Civilization Emerging have reached hundreds of thousands of people.

“Due to increasing technological capacity — increasing population multiplied by increasing impact per person — we’re making more and more consequential choices with worse and worse sense-making to inform those choices,” he says in one video. “Exponential tech is leading to exponential disinformation.” Strengthening individuals’ ability to handle and filter information would go a long way towards improving the “information ecology”, Mr Schmachtenberger argues. People need to get used to handling complex information and should train themselves to be less distracted. “The impulse to say, ‘hey, make it really simple so everyone can get it’ and the impulse to say ‘[let’s] help people actually make sense of the world well’ are different things,” he says.

Of course, societies have long been accustomed to handling complexity. No one person can possibly memorise the entirety of US law or be an expert in every field of medicine. Libraries, databases, and professional and academic networks exist to aggregate expertise.

The increasing bombardment of data — the growing amount of evidence that can inform any course of action — pushes such systems to the limit, prompting people to offload the work to computers. Yet this only defers the problem. As AI becomes more sophisticated, its decision-making processes become more opaque. The choice as to whether to trust it — to let it run a self-driving car in a crowded town, say — still rests with us.

Far from being able to outsource all complex thinking to the cloud, Prof Guillén warns that leaders will need to be as skilled as ever at handling and critically evaluating information. It will be vital, he suggests, to build flexibility into the policymaking process.

“The feedback loop between the effects of the policy and how you need to recalibrate the policy in real time becomes so much faster and so much more unpredictable,” he says. “That’s the effect that complex policies produce.” A more piecemeal approach could better suit regulation in fast-moving fields, he argues, with shorter “bursts” of rulemaking, followed by analysis of the effects and then adjustments or additions where necessary.

Yet however adept policymakers become at dealing with a complex world, their task will at some point always resist simplification. That point is where the responsibility resides. Much as we may wish it otherwise, governance will always be as much an art as a science….(More)”.