The promise and perils of big gender data


Essay by Bapu Vaitla, Stefaan Verhulst, Linus Bengtsson, Marta C. González, Rebecca Furst-Nichols & Emily Courey Pryor in Special Issue on Big Data of Nature Medicine: “Women and girls are legally and socially marginalized in many countries. As a result, policymakers neglect key gendered issues such as informal labor markets, domestic violence, and mental health1. The scientific community can help push such topics onto policy agendas, but science itself is riven by inequality: women are underrepresented in academia, and gendered research is rarely a priority of funding agencies.

However, the critical importance of better gender data for societal well-being is clear. Mental health is a particularly striking example. Estimates from the Global Burden of Disease database suggest that depressive and anxiety disorders are the second leading cause of morbidity among females between 10 and 63 years of age2. But little is known about the risk factors that contribute to mental illness among specific groups of women and girls, the challenges of seeking care for depression and anxiety, or the long-term consequences of undiagnosed and untreated illness. A lack of data similarly impedes policy action on domestic and intimate-partner violence, early marriage, and sexual harassment, among many other topics.

‘Big data’ can help fill that gap. The massive amounts of information passively generated by electronic devices represent a rich portrait of human life, capturing where people go, the decisions they make, and how they respond to changes in their socio-economic environment. For example, mobile-phone data allow better understanding of health-seeking behavior as well as the dynamics of infectious-disease transmission3. Social-media platforms generate the world’s largest database of thoughts and emotions—information that, if leveraged responsibly, can be used to infer gendered patterns of mental health4. Remote sensors, especially satellites, can be used in conjunction with traditional data sources to increase the spatial and temporal granularity of data on women’s economic activity and health status5.

But the risk of gendered algorithmic bias is a serious obstacle to the responsible use of big data. Data are not value free; they reproduce the conscious and unconscious attitudes held by researchers, programmers, and institutions. Consider, for example, the training datasets on which the interpretation of big data depends. Training datasets establish the association between two or more directly observed phenomena of interest—for example, the mental health of a platform user (typically collected through a diagnostic survey) and the semantic content of the user’s social-media posts. These associations are then used to develop algorithms that interpret big data streams. In the example here, the (directly unobserved) mental health of a large population of social-media users would be inferred from their observed posts….(More)”.

Tech groups cannot be allowed to hide from scrutiny


Marietje Schaake at the Financial Times: “Technology companies have governments over a barrel. Whether they are maximising traffic flow efficiency, matching pupils with their school preferences, trying to anticipate drought based on satellite and soil data, most governments heavily rely on critical infrastructure and artificial intelligence developed by the private sector. This growing dependence has profound implications for democracy.

An unprecedented information asymmetry is growing between companies and governments. We can see this in the long-running investigation into interference in the 2016 US presidential elections. Companies build voter registries, voting machines and tallying tools, while social media companies sell precisely targeted advertisements using information gleaned by linking data on friends, interests, location, shopping and search.

This has big privacy and competition implications, yet oversight is minimal. Governments, researchers and citizens risk being blindsided by the machine room that powers our lives and vital aspects of our democracies. Governments and companies have fundamentally different incentives on transparency and accountability.

While openness is the default and secrecy the exception for democratic governments, companies resist providing transparency about their algorithms and business models. Many of them actively prevent accountability, citing rules that protect trade secrets.

We must revisit these protections when they shield companies from oversight. There is a place for protecting proprietary information from commercial competitors, but the scope and context need to be clarified and balanced when they have an impact on democracy and the rule of law.

Regulators must act to ensure that those designing and running algorithmic processes do not abuse trade secret protections. Tech groups also use the EU’s General Data Protection Regulation to deny access to company information. Although the regulation was enacted to protect citizens against the mishandling of personal data, it is now being wielded cynically to deny scientists access to data sets for research. The European Data Protection Supervisor has intervened, but problems could recur. To mitigate concerns about the power of AI, provider companies routinely promise that the applications will be understandable, explainable, accountable, reliable, contestable, fair and — don’t forget — ethical.

Yet there is no way to test these subjective notions without access to the underlying data and information. Without clear benchmarks and information to match, proper scrutiny of the way vital data is processed and used will be impossible….(More)”.

How digital sleuths unravelled the mystery of Iran’s plane crash


Chris Stokel-Walker at Wired: “The video shows a faint glow in the distance, zig-zagging like a piece of paper caught in an underdraft, slowly meandering towards the horizon. Then there’s a bright flash and the trees in the foreground are thrown into shadow as Ukraine International Airlines flight PS752 hits the ground early on the morning of January 8, killing all 176 people on board.

At first, it seemed like an accident – engine failure was fingered as the cause – until the first video showing the plane seemingly on fire as it weaved to the ground surfaced. United States officials started to investigate, and a more complicated picture emerged. It appeared that the plane had been hit by a missile, corroborated by a second video that appears to show the moment the missile ploughs into the Boeing 737-800. While military and intelligence officials at governments around the world were conducting their inquiries in secret, a team of investigators were using open-source intelligence (OSINT) techniques to piece together the puzzle of flight PS752.

It’s not unusual nowadays for OSINT to lead the way in decoding key news events. When Sergei Skripal was poisoned, Bellingcat, an open-source intelligence website, tracked and identified his killers as they traipsed across London and Salisbury. They delved into military records to blow the cover of agents sent to kill. And in the days after the Ukraine Airlines plane crashed into the ground outside Tehran, Bellingcat and The New York Times have blown a hole in the supposition that the downing of the aircraft was an engine failure. The pressure – and the weight of public evidence – compelled Iranian officials to admit overnight on January 10 that the country had shot down the plane “in error”.

So how do they do it? “You can think of OSINT as a puzzle. To get the complete picture, you need to find the missing pieces and put everything together,” says Loránd Bodó, an OSINT analyst at Tech versus Terrorism, a campaign group. The team at Bellingcat and other open-source investigators pore over publicly available material. Thanks to our propensity to reach for our cameraphones at the sight of any newsworthy incident, video and photos are often available, posted to social media in the immediate aftermath of events. (The person who shot and uploaded the second video in this incident, of the missile appearing to hit the Boeing plane was a perfect example: they grabbed their phone after they heard “some sort of shot fired”.) “Open source investigations essentially involve the collection, preservation, verification, and analysis of evidence that is available in the public domain to build a picture of what happened,” says Yvonne McDermott Rees, a lecturer at Swansea University….(More)”.

The future is intelligent: Harnessing the potential of artificial intelligence in Africa


Youssef Travaly and Kevin Muvunyi at Brookings: “…AI in particular presents countless avenues for both the public and private sectors to optimize solutions to the most crucial problems facing the continent today, especially for struggling industries. For example, in health care, AI solutions can help scarce personnel and facilities do more with less by speeding initial processing, triage, diagnosis, and post-care follow up. Furthermore, AI-based pharmacogenomics applications, which focus on the likely response of an individual to therapeutic drugs based on certain genetic markers, can be used to tailor treatments. Considering the genetic diversity found on the African continent, it is highly likely that the application of these technologies in Africa will result in considerable advancement in medical treatment on a global level.

In agricultureAbdoulaye Baniré Diallo, co-founder and chief scientific officer of the AI startup My Intelligent Machines, is working with advanced algorithms and machine learning methods to leverage genomic precision in livestock production models. With genomic precision, it is possible to build intelligent breeding programs that minimize the ecological footprint, address changing consumer demands, and contribute to the well-being of people and animals alike through the selection of good genetic characteristics at an early stage of the livestock production process. These are just a few examples that illustrate the transformative potential of AI technology in Africa.

However, a number of structural challenges undermine rapid adoption and implementation of AI on the continent. Inadequate basic and digital infrastructure seriously erodes efforts to activate AI-powered solutions as it reduces crucial connectivity. (For more on strategies to improve Africa’s digital infrastructure, see the viewpoint on page 67 of the full report). A lack of flexible and dynamic regulatory systems also frustrates the growth of a digital ecosystem that favors AI technology, especially as tech leaders want to scale across borders. Furthermore, lack of relevant technical skills, particularly for young people, is a growing threat. This skills gap means that those who would have otherwise been at the forefront of building AI are left out, preventing the continent from harnessing the full potential of transformative technologies and industries.

Similarly, the lack of adequate investments in research and development is an important obstacle. Africa must develop innovative financial instruments and public-private partnerships to fund human capital development, including a focus on industrial research and innovation hubs that bridge the gap between higher education institutions and the private sector to ensure the transition of AI products from lab to market….(More)”.

Data Democracy


Book by Feras Batarseh and Ruixin Yang: “Data Democracy: At the Nexus of Artificial Intelligence, Software Development, and Knowledge Engineering provides a manifesto to data democracy. After reading the chapters of this book, you are informed and suitably warned! You are already part of the data republic, and you (and all of us) need to ensure that our data fall in the right hands. Everything you click, buy, swipe, try, sell, drive, or fly is a data point. But who owns the data? At this point, not you! You do not even have access to most of it. The next best empire of our planet is one who owns and controls the world’s best dataset. If you consume or create data, if you are a citizen of the data republic (willingly or grudgingly), and if you are interested in making a decision or finding the truth through data-driven analysis, this book is for you. A group of experts, academics, data science researchers, and industry practitioners gathered to write this manifesto about data democracy.

Key Features

  • The future of the data republic, life within a data democracy, and our digital freedoms
  • An in-depth analysis of open science, open data, open source software, and their future challenges
  • A comprehensive review of data democracy’s implications within domains such as: healthcare, space exploration, earth sciences, business, and psychology
  • The democratization of Artificial Intelligence (AI), and data issues such as: Bias, imbalance, context, and knowledge extraction
  • A systematic review of AI methods applied to software engineering problems…(More)”.

Crossing the Digital Divide: Applying Technology to the Global Refugee Crisis


Report by Shelly Culbertson, James Dimarogonas, Katherine Costello, and Serafina Lanna: “In the past two decades, the global population of forcibly displaced people has more than doubled, from 34 million in 1997 to 71 million in 2018. Amid this growing crisis, refugees and the organizations that assist them have turned to technology as an important resource, and technology can and should play an important role in solving problems in humanitarian settings. In this report, the authors analyze technology uses, needs, and gaps, as well as opportunities for better using technology to help displaced people and improving the operations of responding agencies. The authors also examine inherent ethical, security, and privacy considerations; explore barriers to the successful deployment of technology; and outline some tools for building a more systematic approach to such deployment. The study approach included a literature review, semi-structured interviews with stakeholders, and focus groups with displaced people in Colombia, Greece, Jordan, and the United States. The authors provide several recommendations for more strategically using and developing technology in humanitarian settings….(More)”.

How people decide what they want to know


Tali Sharot & Cass R. Sunstein in Nature: “Immense amounts of information are now accessible to people, including information that bears on their past, present and future. An important research challenge is to determine how people decide to seek or avoid information. Here we propose a framework of information-seeking that aims to integrate the diverse motives that drive information-seeking and its avoidance. Our framework rests on the idea that information can alter people’s action, affect and cognition in both positive and negative ways. The suggestion is that people assess these influences and integrate them into a calculation of the value of information that leads to information-seeking or avoidance. The theory offers a framework for characterizing and quantifying individual differences in information-seeking, which we hypothesize may also be diagnostic of mental health. We consider biases that can lead to both insufficient and excessive information-seeking. We also discuss how the framework can help government agencies to assess the welfare effects of mandatory information disclosure….(More)”.

Technology Can't Fix Algorithmic Injustice


Annette Zimmerman, Elena Di Rosa and Hochan Kima at Boston Review: “A great deal of recent public debate about artificial intelligence has been driven by apocalyptic visions of the future. Humanity, we are told, is engaged in an existential struggle against its own creation. Such worries are fueled in large part by tech industry leaders and futurists, who anticipate systems so sophisticated that they can perform general tasks and operate autonomously, without human control. Stephen Hawking, Elon Musk, and Bill Gates have all publicly expressed their concerns about the advent of this kind of “strong” (or “general”) AI—and the associated existential risk that it may pose for humanity. In Hawking’s words, the development of strong AI “could spell the end of the human race.”

These are legitimate long-term worries. But they are not all we have to worry about, and placing them center stage distracts from ethical questions that AI is raising here and now. Some contend that strong AI may be only decades away, but this focus obscures the reality that “weak” (or “narrow”) AI is already reshaping existing social and political institutions. Algorithmic decision making and decision support systems are currently being deployed in many high-stakes domains, from criminal justice, law enforcement, and employment decisions to credit scoring, school assignment mechanisms, health care, and public benefits eligibility assessments. Never mind the far-off specter of doomsday; AI is already here, working behind the scenes of many of our social systems.

What responsibilities and obligations do we bear for AI’s social consequences in the present—not just in the distant future? To answer this question, we must resist the learned helplessness that has come to see AI development as inevitable. Instead, we should recognize that developing and deploying weak AI involves making consequential choices—choices that demand greater democratic oversight not just from AI developers and designers, but from all members of society….(More)”.

Data as infrastructure? A study of data sharing legal regimes


Paper by Charlotte Ducuing: “The article discusses the concept of infrastructure in the digital environment, through a study of three data sharing legal regimes: the Public Sector Information Directive (PSI Directive), the discussions on in-vehicle data governance and the freshly adopted data sharing legal regime in the Electricity Directive.

While aiming to contribute to the scholarship on data governance, the article deliberately focuses on network industries. Characterised by the existence of physical infrastructure, they have a special relationship to digitisation and ‘platformisation’ and are exposed to specific risks. Adopting an explanatory methodology, the article exposes that these regimes are based on two close but different sources of inspiration, yet intertwined and left unclear. By targeting entities deemed ‘monopolist’ with regard to the data they create and hold, data sharing obligations are inspired from competition law and especially the essential facility doctrine. On the other hand, beneficiaries appear to include both operators in related markets needing data to conduct their business (except for the PSI Directive), and third parties at large to foster innovation. The latter rationale illustrates what is called here a purposive view of data as infrastructure. The underlying understanding of ‘raw’ data (management) as infrastructure for all to use may run counter the ability for the regulated entities to get a fair remuneration for ‘their’ data.

Finally, the article pleads for more granularity when mandating data sharing obligations depending upon the purpose. Shifting away from a ‘one-size-fits-all’ solution, the regulation of data could also extend to the ensuing context-specific data governance regime, subject to further research…(More)”.

Global Fishing Watch: Pooling Data and Expertise to Combat Illegal Fishing


Data Collaborative Case Study by Michelle Winowatan, Andrew Young, and Stefaan Verhulst: “

Global Fishing Watch, originally set up through a collaboration between Oceana, SkyTruth and Google, is an independent nonprofit organization dedicated to advancing responsible stewardship of our oceans through increased transparency in fishing activity and scientific research. Using big data processing and machine learning, Global Fishing Watch visualizes, tracks, and shares data about global fishing activity in near-real time and for free via their public map. To date, the platform tracks approximately 65,000 commercial fishing vessels globally. These insights have been used in a number of academic publications, ocean advocacy efforts, and law enforcement activities.

Data Collaborative Model: Based on the typology of data collaborative practice areas, Global Fishing Watch is an example of the data pooling model of data collaboration, specifically a public data pool. Public data pools co-mingle data assets from multiple data holders — including governments and companies — and make those shared assets available on the web. This approach enabled the data stewards and stakeholders involved in Global Fishing Watch to bring together multiple data streams from both public- and private-sector entities in a single location. This single point of access provides the public and relevant authorities with user-friendly access to actionable, previously fragmented data that can drive efforts to address compliance in fisheries and illegal fishing around the world.

Data Stewardship Approach: Global Fishing Watch also provides a clear illustration of the importance of data stewards. For instance, representatives from Google Earth Outreach, one of the data holders, played an important stewardship role in seeking to connect and coordinate with SkyTruth and Oceana, two important nonprofit environmental actors who were working separately prior to this initiative. The brokering of this partnership helped to bring relevant data assets from the public and private sectors to bear in support of institutional efforts to address the stubborn challenge of illegal fishing.

Read the full case study here.”