Stop the Open Data Bus, We Want to Get Off


Paper by Chris Culnane, Benjamin I. P. Rubinstein, and Vanessa Teague: “The subject of this report is the re-identification of individuals in the Myki public transport dataset released as part of the Melbourne Datathon 2018. We demonstrate the ease with which we were able to re-identify ourselves, our co-travellers, and complete strangers; our analysis raises concerns about the nature and granularity of the data released, in particular the ability to identify vulnerable or sensitive groups…

This work highlights how a large number of passengers could be re-identified in the 2018 Myki data release, with detailed discussion of specific people. The implications of re-identification are potentially serious: ex-partners, one-time acquaintances, or other parties can determine places of home, work, times of travel, co-travelling patterns—presenting risk to vulnerable groups in particular…

In 2018 the Victorian Government released a large passenger centric transport dataset to a data science competition—the 2018 Melbourne Datathon. Access to the data was unrestricted, with a URL provided on the datathon’s website to download the complete dataset from an Amazon S3 Bucket. Over 190 teams continued to analyse the data through the 2 month competition period. The data consisted of touch on and touch off events for the Myki smart card ticketing system used throughout the state of Victoria, Australia. With such data, contestants would be able to apply retrospective analyses on an entire public transport system, explore suitability of predictive models, etc.

The Myki ticketing system is used across Victorian public transport: on trains, buses and trams. The dataset was a longitudinal dataset, consisting of touch on and touch off events from Week 27 in 2015 through to Week 26 in 2018. Each event contained a card identifier (cardId; not the actual card number), the card type, the time of the touch on or off, and various location information, for example a stop ID or route ID, along with other fields which we omit here for brevity. Events could be indexed by the cardId and as such, all the events associated with a single card could be retrieved. There are a total of 15,184,336 cards in the dataset—more than twice the 2018 population of Victoria. It appears that all touch on and off events for metropolitan trains and trams have been included, though other forms of transport such as intercity trains and some buses are absent. In total there are nearly 2 billion touch on and off events in the dataset.

No information was provided as to the de-identification that was performed on the dataset. Our analysis indicates that little to no de-identification took place on the bulk of the data, as will become evident in Section 3. The exception is the cardId, which appears to have been mapped in some way from the Myki Card Number. The exact mapping has not been discovered, although concerns remain as to its security effectiveness….(More)”.
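The excerpt notes that every event carries a cardId and that all events for one card can be retrieved together. The sketch below illustrates why that matters for linkage: a few touches known from outside the data can narrow the set of candidate cards. It is an illustration only, not the authors' method (their Section 3 is not part of this excerpt). The event tuples, the field layout, and the function names `index_by_card` and `cards_matching` are all hypothetical.

```python
from collections import defaultdict

# Made-up events for illustration: (card_id, timestamp, stop_id).
# The real dataset's field names and formats are not given in the excerpt.
EVENTS = [
    ("c1", "2018-03-01T08:02", "stopA"),
    ("c1", "2018-03-01T17:31", "stopB"),
    ("c2", "2018-03-01T08:03", "stopA"),
    ("c2", "2018-03-02T09:10", "stopC"),
    ("c3", "2018-03-01T17:30", "stopB"),
]


def index_by_card(events):
    """Group every (timestamp, stop) pair under its card identifier."""
    by_card = defaultdict(list)
    for card_id, timestamp, stop in events:
        by_card[card_id].append((timestamp, stop))
    return by_card


def cards_matching(events, known_touches):
    """Return card ids whose history contains every known touch.

    Each known touch is (timestamp_prefix, stop_id), e.g. an hour and a
    stop that an observer already knows about a traveller.
    """
    matches = []
    for card_id, history in index_by_card(events).items():
        if all(
            any(ts.startswith(prefix) and stop == wanted for ts, stop in history)
            for prefix, wanted in known_touches
        ):
            matches.append(card_id)
    return sorted(matches)


one_touch = cards_matching(EVENTS, [("2018-03-01T08", "stopA")])
two_touches = cards_matching(
    EVENTS, [("2018-03-01T08", "stopA"), ("2018-03-01T17", "stopB")]
)
print(one_touch, two_touches)
```

In this toy data one known touch leaves two candidate cards, and a second known touch leaves one. Once a single card is isolated, its whole history is exposed through the same cardId.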

Datafication and accountability in public health


Introduction to a special issue of Social Studies of Science by Klaus Hoeyer, Susanne Bauer, and Martyn Pickersgill: “In recent years and across many nations, public health has become subject to forms of governance that are said to be aimed at establishing accountability. In this introduction to a special issue, From Person to Population and Back: Exploring Accountability in Public Health, we suggest opening up accountability assemblages by asking a series of ostensibly simple questions that inevitably yield complicated answers: What is counted? What counts? And to whom, how and why does it count? Addressing such questions involves staying attentive to the technologies and infrastructures through which data come into being and are made available for multiple political agendas. Through a discussion of public health, accountability and datafication we present three key themes that unite the various papers as well as illustrate their diversity….(More)”.

What the Hack? – Towards a Taxonomy of Hackathons


Paper by Christoph Kollwitz and Barbara Dinter: “In order to master the digital transformation and to survive in global competition, companies face the challenge of improving transformation processes, such as innovation processes. However, the design of these processes poses a challenge, as the related knowledge is still largely in its infancy. A popular trend since the mid-2000s are collaborative development events, so-called hackathons, where people with different professional backgrounds work collaboratively on development projects for a defined period. While hackathons are a widespread phenomenon in practice and many field reports and individual observations exist, there is still a lack of holistic and structured representations of the new phenomenon in literature.

The paper at hand aims to develop a taxonomy of hackathons in order to illustrate their nature and underlying characteristics. For this purpose, a systematic literature review is combined with existing taxonomies or taxonomy-like artifacts (e.g. morphological boxes, typologies) from similar research areas in an iterative taxonomy development process. The results contribute to an improved understanding of the phenomenon hackathon and allow the more effective use of hackathons as a new tool in organizational innovation processes. Furthermore, the taxonomy provides guidance on how to apply hackathons for organizational innovation processes….(More)”.

Aliens in Europe. An open approach to involve more people in invasive species detection


Paper by Sven Schade et al: “Amplified by the phenomenon of globalisation, such as increased human mobility and the worldwide shipping of goods, we observe an increasing spread of animals and plants outside their native habitats. A few of these ‘aliens’ have negative impacts on their environment, including threats to local biodiversity, agricultural productivity, and human health. Our work addresses these threats, particularly within the European Union (EU), where a related legal framework has been established. We follow an open and participatory approach that allows more people to share their experiences of invasive alien species (IAS) in their surroundings. Over the past three years, we developed a mobile phone application, together with the underlying data management and validation infrastructure, which allows smartphone users to report a selected list of IAS. We put quality assurance and data integration mechanisms into place that allows the uptake of information into existing official systems in order to make it accessible to the relevant policy-making at EU level.

This article summarises our scientific methodology and technical approach, explains our decisions, and provides an outlook to the future of IAS monitoring involving citizens and utilising the latest technological advancements. Last but not least we emphasise on software design for reuse, within the domain of IAS monitoring, but also for supporting citizen science apps more generally. Whereas much could already be achieved, many scientific, technical and organizational challenges still remain to be addressed before data can be seamlessly shared and integrated. Here, we particularly highlight issues that emerge in an international setting, which involves many different stakeholders….(More)”.

Tackling Climate Change with Machine Learning


Paper by David Rolnick et al: “Climate change is one of the greatest challenges facing humanity, and we, as machine learning experts, may wonder how we can help. Here we describe how machine learning can be a powerful tool in reducing greenhouse gas emissions and helping society adapt to a changing climate. From smart grids to disaster management, we identify high impact problems where existing gaps can be filled by machine learning, in collaboration with other fields. Our recommendations encompass exciting research questions as well as promising business opportunities. We call on the machine learning community to join the global effort against climate change….(More)”.

Social Media and Polarization


Paper by Arthur Campbell, C. Matthew Leister and Yves Zenou: “Because of its impacts on democracy, there is an important debate on whether the recent trends towards greater use of social media increases or decreases (political) polarization. One challenge for understanding this issue is how social media affects the equilibrium prevalence of different types of media content. We address this issue by developing a model of a social media network where there are two types of news content: mass-market (mainstream news) and niche-market (biased or more “extreme” news) and two different types of individuals who have a preference for recommending one or other type of content. We find that social media will amplify the prevalence of mass-market content and may result in it being the only type of content consumed. Further, we find that greater connectivity and homophily in the social media network will concurrently increase the prevalence of the niche market content and polarization. We then study an extension where there are two lobbying agents that can and wish to influence the prevalence of each type of content. We find that the lobbying agent in favor of the niche content will invest more in lobbying activities. We also show that lobbying activity will tend to increase polarization, and that this effect is greatest in settings where polarization would be small absent of lobbying activity. Finally, we allow individuals to choose the degree of homophily amongst their connections and demonstrate that niche-market individuals exhibit greater homophily than the mass-market ones, and contribute more to polarization….(More)”.

Exploring the Relationship between Trust in Government and Citizen Participation


Paper by Yunsoo Lee and Hindy Lauer Schachter: “Theories of deliberative and stealth democracy offer different predictions on the relationship between trust in government and citizen participation. To help resolve the contradictory predictions, this study used the World Values Survey to examine the influence of trust in government on citizen participation. Regression analyses yielded mixed results. As deliberative democracy theory predicts, the findings showed that people who trust governmental institutions are more likely to vote and sign a petition. However, the data provided limited support for stealth democracy in that trust in government negatively affects the frequency of attending a demonstration….(More)”.

For Crowdsourcing to Work, Everyone Needs an Equal Voice


Joshua Becker and Edward “Ned” Smith in Harvard Business Review: “How useful is the wisdom of crowds? For years, it has been recognized as producing incredibly accurate predictions by aggregating the opinions of many people, allowing even amateur forecasters to beat the experts. The belief is that when large numbers of people make forecasts independently, their errors are uncorrelated and ultimately cancel each other out, which leads to more accurate final answers.

However, researchers and pundits have argued that the wisdom of crowds is extremely fragile, especially in two specific circumstances: when people are influenced by the opinions of others (because they lose their independence) and when opinions are distorted by cognitive biases (for example, strong political views held by a group).

In new research, we and our colleagues zeroed in on these assumptions and found that the wisdom of crowds is more robust than previously thought — it can even withstand the groupthink of similar-minded people. But there’s one important caveat: In order for the wisdom of crowds to retain its accuracy for making predictions, every member of the group must be given an equal voice, without any one person dominating. As we discovered, the pattern of social influence within groups — that is, who talks to whom and when — is the key determinant of the crowd’s accuracy in making predictions….(More)”.
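The error-cancellation argument in the excerpt can be sketched numerically. The example below is a toy simulation, not the authors' experiment: it assumes independent, normally distributed estimates around a made-up true value, and it models a "dominant voice" simply as one very large weight in a weighted mean. The names `crowd_estimate`, `TRUTH`, and the chosen numbers are assumptions for illustration.

```python
import random


def crowd_estimate(estimates, weights=None):
    """Weighted mean of the estimates; equal weights when none are given."""
    if weights is None:
        weights = [1.0] * len(estimates)
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


random.seed(0)
TRUTH = 100.0  # hypothetical quantity being forecast
estimates = [random.gauss(TRUTH, 20.0) for _ in range(1000)]

# Equal voice: independent errors partly cancel in the average.
equal_error = abs(crowd_estimate(estimates) - TRUTH)
mean_individual_error = sum(abs(e - TRUTH) for e in estimates) / len(estimates)

# One dominant voice: the aggregate mostly inherits that person's error.
weights = [1.0] * len(estimates)
weights[0] = 10_000.0
dominated_error = abs(crowd_estimate(estimates, weights) - TRUTH)

print(equal_error, mean_individual_error, dominated_error)
```

The equally weighted average can never be further from the truth than the average individual is, which follows from the triangle inequality. The article's claim about who talks to whom concerns network structure, which this sketch does not model.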

Trust and Mistrust in Americans’ Views of Scientific Experts


Report by the Pew Research Center: “In an era when science and politics often appear to collide, public confidence in scientists is on the upswing, and six-in-ten Americans say scientists should play an active role in policy debates about scientific issues, according to a new Pew Research Center survey.

The survey finds public confidence in scientists on par with confidence in the military. It also exceeds the levels of public confidence in other groups and institutions, including the media, business leaders and elected officials.

At the same time, Americans are divided along party lines in terms of how they view the value and objectivity of scientists and their ability to act in the public interest. And, while political divides do not carry over to views of all scientists and scientific issues, there are particularly sizable gaps between Democrats and Republicans when it comes to trust in scientists whose work is related to the environment.

Higher levels of familiarity with the work of scientists are associated with more positive and more trusting views of scientists regarding their competence, credibility and commitment to the public, the survey shows….(More)”.

What can the labor flow of 500 million people on LinkedIn tell us about the structure of the global economy?


Paper by Jaehyuk Park et al: “…One of the most popular concepts for policy makers and business economists to understand the structure of the global economy is “cluster”, the geographical agglomeration of interconnected firms such as Silicon Valley, Wall Street, and Hollywood. By studying those well-known clusters, we become to understand the advantage of participating in a geo-industrial cluster for firms and how it is related to the economic growth of a region.

However, the existing definition of geo-industrial cluster is not systematic enough to reveal the whole picture of the global economy. Often, after defining as a group of firms in a certain area, the geo-industrial clusters are considered as independent to each other. As we should consider the interaction between accounting team and marketing team to understand the organizational structure of a firm, the relationships among those geo-industrial clusters are the essential part of the whole picture….

In this new study, my colleagues and I at Indiana University — with support from LinkedIn — have finally overcome these limitations by defining geo-industrial clusters through labor flow and constructing a global labor flow network from LinkedIn’s individual-level job history dataset. Our access to this data was made possible by our selection as one of 11 teams selected to participate in the LinkedIn Economic Graph Challenge.

The transitioning of workers between jobs and firms — also known as labor flow — is considered central in driving firms towards geo-industrial clusters due to knowledge spillover and labor market pooling. In response, we mapped the cluster structure of the world economy based on labor mobility between firms during the last 25 years, constructing a “labor flow network.” 

To do this, we leverage LinkedIn’s data on professional demographics and employment histories from more than 500 million people between 1990 and 2015. The network, which captures approximately 130 million job transitions between more than 4 million firms, is the first-ever flow network of global labor.

The resulting “map” allows us to:

  • identify geo-industrial clusters systematically and organically using network community detection
  • verify the importance of region and industry in labor mobility
  • compare the relative importance between the two constraints in different hierarchical levels, and
  • reveal the practical advantage of the geo-industrial cluster as a unit of future economic analyses.
  • show a better picture of what industry in what region leads the economic growth of the industry or the region, at the same time
  • find out emerging and declining skills based on the representativeness of them in growing and declining geo-industrial clusters…(More)”.
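The idea of a labor flow network can be sketched as follows: job transitions are counted into weighted edges between firms, and firms joined by strong flows are grouped together. The excerpt does not say which community detection algorithm the authors used, so the grouping step here is a deliberately crude stand-in, namely connected components over edges at or above a threshold. The job histories, firm names, and the functions `flow_edges` and `clusters` are hypothetical.

```python
from collections import Counter, defaultdict

# Made-up job histories: each list is one person's firms in order.
HISTORIES = [
    ["A", "B", "C"],
    ["B", "A"],
    ["A", "B"],
    ["C", "D"],
    ["X", "Y"],
    ["Y", "X"],
    ["C", "X"],
]


def flow_edges(histories):
    """Count job transitions between each unordered pair of firms."""
    edges = Counter()
    for firms in histories:
        for src, dst in zip(firms, firms[1:]):
            if src != dst:
                edges[frozenset((src, dst))] += 1
    return edges


def clusters(edges, min_flow):
    """Group firms connected by edges with at least min_flow transitions.

    Connected components are a simplification, not a real community
    detection method.
    """
    neighbours = defaultdict(set)
    firms = set()
    for pair, count in edges.items():
        a, b = tuple(pair)
        firms.update(pair)
        if count >= min_flow:
            neighbours[a].add(b)
            neighbours[b].add(a)
    seen, groups = set(), []
    for firm in sorted(firms):
        if firm in seen:
            continue
        seen.add(firm)
        stack, group = [firm], []
        while stack:  # depth-first walk over strong edges
            node = stack.pop()
            group.append(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(group))
    return groups


edges = flow_edges(HISTORIES)
print(clusters(edges, min_flow=2))
```

With a threshold of two transitions, the toy data splits into the groups A and B, X and Y, and the singletons C and D. Lowering the threshold to one merges every firm into a single group, which shows how sensitive this simple approach is to the cutoff.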