The promise and perils of big gender data


Essay by Bapu Vaitla, Stefaan Verhulst, Linus Bengtsson, Marta C. González, Rebecca Furst-Nichols & Emily Courey Pryor in Special Issue on Big Data of Nature Medicine: “Women and girls are legally and socially marginalized in many countries. As a result, policymakers neglect key gendered issues such as informal labor markets, domestic violence, and mental health [1]. The scientific community can help push such topics onto policy agendas, but science itself is riven by inequality: women are underrepresented in academia, and gendered research is rarely a priority of funding agencies.

However, the critical importance of better gender data for societal well-being is clear. Mental health is a particularly striking example. Estimates from the Global Burden of Disease database suggest that depressive and anxiety disorders are the second leading cause of morbidity among females between 10 and 63 years of age [2]. But little is known about the risk factors that contribute to mental illness among specific groups of women and girls, the challenges of seeking care for depression and anxiety, or the long-term consequences of undiagnosed and untreated illness. A lack of data similarly impedes policy action on domestic and intimate-partner violence, early marriage, and sexual harassment, among many other topics.

‘Big data’ can help fill that gap. The massive amounts of information passively generated by electronic devices represent a rich portrait of human life, capturing where people go, the decisions they make, and how they respond to changes in their socio-economic environment. For example, mobile-phone data allow better understanding of health-seeking behavior as well as the dynamics of infectious-disease transmission [3]. Social-media platforms generate the world’s largest database of thoughts and emotions, information that, if leveraged responsibly, can be used to infer gendered patterns of mental health [4]. Remote sensors, especially satellites, can be used in conjunction with traditional data sources to increase the spatial and temporal granularity of data on women’s economic activity and health status [5].

But the risk of gendered algorithmic bias is a serious obstacle to the responsible use of big data. Data are not value free; they reproduce the conscious and unconscious attitudes held by researchers, programmers, and institutions. Consider, for example, the training datasets on which the interpretation of big data depends. Training datasets establish the association between two or more directly observed phenomena of interest—for example, the mental health of a platform user (typically collected through a diagnostic survey) and the semantic content of the user’s social-media posts. These associations are then used to develop algorithms that interpret big data streams. In the example here, the (directly unobserved) mental health of a large population of social-media users would be inferred from their observed posts….(More)”.
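The training-dataset idea described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the four labeled posts are invented, the function names `train` and `infer` are hypothetical, and the word-frequency scoring is a deliberately crude stand-in for whatever models researchers actually use. It shows only the two steps the passage names: learning an association from a small, directly observed sample, then applying it to posts whose authors' status was never observed.

```python
from collections import Counter


def tokenize(text):
    """Lowercase a post and split it on whitespace."""
    return text.lower().split()


def train(labeled_posts):
    """Learn one weight per word: its relative frequency in posts labeled 1
    minus its relative frequency in posts labeled 0."""
    counts = {0: Counter(), 1: Counter()}
    totals = {0: 0, 1: 0}
    for text, label in labeled_posts:
        words = tokenize(text)
        counts[label].update(words)
        totals[label] += len(words)
    vocab = set(counts[0]) | set(counts[1])
    return {
        w: counts[1][w] / max(totals[1], 1) - counts[0][w] / max(totals[0], 1)
        for w in vocab
    }


def infer(weights, text):
    """Return 1 if the summed word weights are positive, else 0.
    Words never seen in training contribute nothing."""
    score = sum(weights.get(w, 0.0) for w in tokenize(text))
    return 1 if score > 0 else 0


# Hypothetical training set: (post text, survey-based label).
# 1 = survey indicated symptoms, 0 = survey did not. Invented data.
training = [
    ("i feel hopeless and tired", 1),
    ("so tired cannot sleep again", 1),
    ("great day at the park", 0),
    ("happy with my new job", 0),
]

weights = train(training)

# Inference on posts from users who never took the survey.
print(infer(weights, "tired and hopeless today"))  # → 1
print(infer(weights, "happy day"))                 # → 0
```

Because the weights come entirely from the labeled sample, any skew in who was surveyed carries straight into the inferences, which is the bias risk the essay raises.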

Collective Intelligence in City Design


Idea by Helena Rong and Juncheng Yang: “We propose an interactive design engagement platform that facilitates a continuous conversation between developers, designers and end users, from the pre-design and planning phases all the way to post-occupancy. The platform adopts a citizen-centric, inclusive approach intended to build trust and invite end users of different ages, ethnicities, and social and economic backgrounds to participate actively in the design and development process. We aim to explore how collective intelligence through citizen engagement could be enabled by data to allow new collectives to emerge, confronting design as an iterative process involving scalable cooperation of different actors. As a result, design is a collaborative and conscious practice, not the product of the architect as a single mastermind. Rather, its agency is reinforced by a cooperative ideal involving institutions, enterprises and single individuals alike enabled by data science….(More)”

Crossing the Digital Divide: Applying Technology to the Global Refugee Crisis


Report by Shelly Culbertson, James Dimarogonas, Katherine Costello, and Serafina Lanna: “In the past two decades, the global population of forcibly displaced people has more than doubled, from 34 million in 1997 to 71 million in 2018. Amid this growing crisis, refugees and the organizations that assist them have turned to technology as an important resource, and technology can and should play an important role in solving problems in humanitarian settings. In this report, the authors analyze technology uses, needs, and gaps, as well as opportunities for better using technology to help displaced people and improving the operations of responding agencies. The authors also examine inherent ethical, security, and privacy considerations; explore barriers to the successful deployment of technology; and outline some tools for building a more systematic approach to such deployment. The study approach included a literature review, semi-structured interviews with stakeholders, and focus groups with displaced people in Colombia, Greece, Jordan, and the United States. The authors provide several recommendations for more strategically using and developing technology in humanitarian settings….(More)”.

Global Fishing Watch: Pooling Data and Expertise to Combat Illegal Fishing


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “Global Fishing Watch, originally set up through a collaboration between Oceana, SkyTruth and Google, is an independent nonprofit organization dedicated to advancing responsible stewardship of our oceans through increased transparency in fishing activity and scientific research. Using big data processing and machine learning, Global Fishing Watch visualizes, tracks, and shares data about global fishing activity in near-real time and for free via their public map. To date, the platform tracks approximately 65,000 commercial fishing vessels globally. These insights have been used in a number of academic publications, ocean advocacy efforts, and law enforcement activities.

Data Collaborative Model: Based on the typology of data collaborative practice areas, Global Fishing Watch is an example of the data pooling model of data collaboration, specifically a public data pool. Public data pools co-mingle data assets from multiple data holders — including governments and companies — and make those shared assets available on the web. This approach enabled the data stewards and stakeholders involved in Global Fishing Watch to bring together multiple data streams from both public- and private-sector entities in a single location. This single point of access provides the public and relevant authorities with user-friendly access to actionable, previously fragmented data that can drive efforts to address compliance in fisheries and illegal fishing around the world.

Data Stewardship Approach: Global Fishing Watch also provides a clear illustration of the importance of data stewards. For instance, representatives from Google Earth Outreach, one of the data holders, played an important stewardship role in seeking to connect and coordinate with SkyTruth and Oceana, two important nonprofit environmental actors who were working separately prior to this initiative. The brokering of this partnership helped to bring relevant data assets from the public and private sectors to bear in support of institutional efforts to address the stubborn challenge of illegal fishing.

Read the full case study here.”

Copy, Paste, Legislate


The Center for Public Integrity: “Do you know if a bill introduced in your statehouse — it might govern who can fix your shattered iPhone screen or whether you can still sue a pedophile priest years later — was actually written by your elected lawmakers? Use this new tool to find out.

Spoiler alert: The answer may well be no.

Thousands of pieces of “model legislation” are drafted each year by business organizations and special interest groups and distributed to state lawmakers for introduction. These copycat bills influence policymaking across the nation, state by state, often with little scrutiny. This news application was developed by the Center for Public Integrity as part of a year-long collaboration with USA TODAY and the Arizona Republic to bring the practice into the light….(More)”.

Dollars for Profs: How to Investigate Professors’ Conflicts of Interest


ProPublica: “When professors moonlight, the income may influence their research and policy views. Although most universities track this outside work, the records have rarely been accessible to the public, potentially obscuring conflicts of interest.

That changed last month when ProPublica launched Dollars for Profs, an interactive database that, for the first time ever, allows you to look up more than 37,000 faculty and staff disclosures from about 20 public universities and the National Institutes of Health.

We believe there are hundreds of stories in this database, and we hope to tell as many as possible. Already, we’ve revealed how the University of California’s weak monitoring of conflicts has allowed faculty members to underreport their outside income, potentially depriving the university of millions of dollars. In addition, using a database of NIH records, we found that health researchers have acknowledged a total of at least $188 million in financial conflicts of interest since 2012.

We hope journalists all over the country will look into the database and find more. Here are tips on how to dig into the disclosures for local education reporters, college newspaper journalists and anyone else who wants to hold academia accountable….(More)”.

Practical Knowledge: Sustaining Massively-Multiplayer Innovation


Paper by Amar Bhide: “Governments and universities are pouring money into more ‘practical’ research – ‘translational’ medicine and ‘evidence-based’ policies in education, public health and economic development, for instance. But just translating or applying science rarely produces practical advances – and an inflexible adherence to the methods of natural or social scientists can do more harm than good. Instead, I propose a general approach – and specific research topics – to advance practical knowledge and study its distinctive contemporary nature….(More)”

Governing the Plural City


Introduction by Ash Amin: “….More than 50% of the world’s population lives in cities, and this figure is expected to rise to 70% by 2050. World affairs and city affairs have become deeply enmeshed, and what goes on within cities – their economic productivity, environmental footprint, cultural practices, social wellbeing, and political stability – affects the world at large. They shape the weather and are the weathervane of our times, so getting them right matters. But what this involves and how far it is within reach is by no means clear….

Thus, while the international policy community may confidently call for cities to be made ‘inclusive, safe, resilient and sustainable’ in the way headlined in the UN’s 2030 Agenda for Sustainable Development, it tends to underestimate the challenges of achieving traction in a distributed, plural and often hidden force field. A number of pressing questions arise. Should state effort focus on comprehensive master plans and general infrastructures and services, or on strategic risks and vulnerabilities, while coordinating risks? What are the limits and limitations of state action, and how is the balance between the general and the specific or the communal and sectionalist to be found? What is the relationship between central authority plans and the communities who are to benefit, and how can neighbourhood knowledge and effort be supported amidst policy neglect or corporatist calculation? Is it possible to reconcile strategic and democratic goals in the twenty-first-century city of multiple logics, demands and actors?…(More)”.

Missions: A beginner's guide


UCL Institute for Innovation and Public Purpose: “…The 21st century is becoming increasingly defined by the need to respond to major issues facing society, the environment around us and the possibility of developing a prosperous equal economy. Sometimes referred to as ‘grand challenges’, these include climate change, ageing societies, preventative healthcare, and generating sustainable growth for the benefit of all.

Innovation has not just a rate but also a direction. How that direction is set (not just by the government but by different actors and socio-political forces) is a key aspect of IIPP’s work. But how should we decide which direction? We use the concept of public value as a way to think about which direction innovation and industrial policy take. Public value is value that is created collectively for a public purpose; this requires citizens to engage in defining purpose, nurturing capabilities and capacities, assessing the value created, and ensuring that societal value is distributed equitably…(More)”.

Federal Sources of Entrepreneurship Data: A Compendium


Compendium developed by Andrew Reamer: “The E.M. Kauffman Foundation has asked the George Washington Institute of Public Policy (GWIPP) to prepare a compendium of federal sources of data on self-employment, entrepreneurship, and small business development. The Foundation believes that the availability of useful, reliable federal data on these topics would enable robust descriptions and explanations of entrepreneurship trends in the United States and so help guide the development of effective entrepreneurship policies.


Achieving these ends first requires the identification and detailed description of available federal datasets, as provided in this compendium. Its contents include:

  • An overview and discussion of 18 datasets from four federal agencies, organized by two categories and five subcategories.
  • Tables providing information on each dataset, including:
    • scope of coverage of self-employed, entrepreneurs, and businesses;
    • data collection methods (nature of data source, periodicity, sampling frame, sample size);
    • dataset variables (owner characteristics, business characteristics and operations, geographic areas);
    • data release schedule; and
    • data access by format (including fixed tables, interactive tools, API, FTP download, public use microdata samples [PUMS], and confidential microdata).
  • For each dataset, examples of studies, if any, that use the data source to describe and explain trends in entrepreneurship.

The author’s aim is for the compendium to facilitate an assessment of the strengths and weaknesses of currently available federal datasets, discussion about how data availability and value can be improved, and implementation of desired improvements…(More)”