The promise and perils of big gender data


Essay by Bapu Vaitla, Stefaan Verhulst, Linus Bengtsson, Marta C. González, Rebecca Furst-Nichols & Emily Courey Pryor in the Nature Medicine special issue on big data: “Women and girls are legally and socially marginalized in many countries. As a result, policymakers neglect key gendered issues such as informal labor markets, domestic violence, and mental health [1]. The scientific community can help push such topics onto policy agendas, but science itself is riven by inequality: women are underrepresented in academia, and gendered research is rarely a priority of funding agencies.

However, the critical importance of better gender data for societal well-being is clear. Mental health is a particularly striking example. Estimates from the Global Burden of Disease database suggest that depressive and anxiety disorders are the second leading cause of morbidity among females between 10 and 63 years of age [2]. But little is known about the risk factors that contribute to mental illness among specific groups of women and girls, the challenges of seeking care for depression and anxiety, or the long-term consequences of undiagnosed and untreated illness. A lack of data similarly impedes policy action on domestic and intimate-partner violence, early marriage, and sexual harassment, among many other topics.

‘Big data’ can help fill that gap. The massive amounts of information passively generated by electronic devices represent a rich portrait of human life, capturing where people go, the decisions they make, and how they respond to changes in their socio-economic environment. For example, mobile-phone data allow better understanding of health-seeking behavior as well as the dynamics of infectious-disease transmission [3]. Social-media platforms generate the world’s largest database of thoughts and emotions—information that, if leveraged responsibly, can be used to infer gendered patterns of mental health [4]. Remote sensors, especially satellites, can be used in conjunction with traditional data sources to increase the spatial and temporal granularity of data on women’s economic activity and health status [5].

But the risk of gendered algorithmic bias is a serious obstacle to the responsible use of big data. Data are not value free; they reproduce the conscious and unconscious attitudes held by researchers, programmers, and institutions. Consider, for example, the training datasets on which the interpretation of big data depends. Training datasets establish the association between two or more directly observed phenomena of interest—for example, the mental health of a platform user (typically collected through a diagnostic survey) and the semantic content of the user’s social-media posts. These associations are then used to develop algorithms that interpret big data streams. In the example here, the (directly unobserved) mental health of a large population of social-media users would be inferred from their observed posts….(More)”.
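The inference step the excerpt describes can be sketched in a few lines. The snippet below is a minimal illustration only, not any platform's actual pipeline: the posts, the "at_risk"/"not_at_risk" labels, and the naive-Bayes-style scoring are all invented for the example, and real systems use far richer features and models.

```python
import math
from collections import Counter, defaultdict

# Toy training set: (post text, label obtained from a diagnostic survey).
# All posts and labels here are invented for illustration.
training = [
    ("feeling hopeless and exhausted again", "at_risk"),
    ("can't sleep, everything feels heavy", "at_risk"),
    ("great hike today with friends", "not_at_risk"),
    ("excited about the new job", "not_at_risk"),
]

def train(examples):
    """Count word frequencies per label (a minimal naive Bayes model)."""
    counts = defaultdict(Counter)
    labels = Counter()
    for text, label in examples:
        labels[label] += 1
        counts[label].update(text.lower().split())
    return counts, labels

def classify(text, counts, labels):
    """Score each label by log prior plus smoothed log likelihood of the words."""
    vocab = {w for c in counts.values() for w in c}
    total = sum(labels.values())
    best, best_score = None, float("-inf")
    for label in labels:
        score = math.log(labels[label] / total)
        n = sum(counts[label].values())
        for w in text.lower().split():
            # Laplace smoothing so unseen words don't zero out the score.
            score += math.log((counts[label][w] + 1) / (n + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

counts, labels = train(training)
print(classify("feels hopeless and heavy", counts, labels))  # prints "at_risk"
```

Once trained on the directly observed pairs, the same `classify` function is applied to the posts of users whose mental health was never surveyed, which is exactly where labeling bias in the training data propagates to the whole population.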

Tech groups cannot be allowed to hide from scrutiny


Marietje Schaake at the Financial Times: “Technology companies have governments over a barrel. Whether they are maximising traffic flow efficiency, matching pupils with their school preferences, or trying to anticipate drought based on satellite and soil data, most governments heavily rely on critical infrastructure and artificial intelligence developed by the private sector. This growing dependence has profound implications for democracy.

An unprecedented information asymmetry is growing between companies and governments. We can see this in the long-running investigation into interference in the 2016 US presidential elections. Companies build voter registries, voting machines and tallying tools, while social media companies sell precisely targeted advertisements using information gleaned by linking data on friends, interests, location, shopping and search.

This has big privacy and competition implications, yet oversight is minimal. Governments, researchers and citizens risk being blindsided by the machine room that powers our lives and vital aspects of our democracies. Governments and companies have fundamentally different incentives on transparency and accountability.

While openness is the default and secrecy the exception for democratic governments, companies resist providing transparency about their algorithms and business models. Many of them actively prevent accountability, citing rules that protect trade secrets.

We must revisit these protections when they shield companies from oversight. There is a place for protecting proprietary information from commercial competitors, but the scope and context need to be clarified and balanced when they have an impact on democracy and the rule of law.

Regulators must act to ensure that those designing and running algorithmic processes do not abuse trade secret protections. Tech groups also use the EU’s General Data Protection Regulation to deny access to company information. Although the regulation was enacted to protect citizens against the mishandling of personal data, it is now being wielded cynically to deny scientists access to data sets for research. The European Data Protection Supervisor has intervened, but problems could recur.

To mitigate concerns about the power of AI, provider companies routinely promise that the applications will be understandable, explainable, accountable, reliable, contestable, fair and — don’t forget — ethical.

Yet there is no way to test these subjective notions without access to the underlying data and information. Without clear benchmarks and information to match, proper scrutiny of the way vital data is processed and used will be impossible….(More)”.

How digital sleuths unravelled the mystery of Iran’s plane crash


Chris Stokel-Walker at Wired: “The video shows a faint glow in the distance, zig-zagging like a piece of paper caught in an updraft, slowly meandering towards the horizon. Then there’s a bright flash and the trees in the foreground are thrown into shadow as Ukraine International Airlines flight PS752 hits the ground early on the morning of January 8, killing all 176 people on board.

At first, it seemed like an accident – engine failure was fingered as the cause – until the first video surfaced showing the plane seemingly on fire as it weaved to the ground. United States officials started to investigate, and a more complicated picture emerged. It appeared that the plane had been hit by a missile, a suspicion corroborated by a second video that appears to show the moment the missile ploughs into the Boeing 737-800. While military and intelligence officials at governments around the world were conducting their inquiries in secret, a team of investigators was using open-source intelligence (OSINT) techniques to piece together the puzzle of flight PS752.

It’s not unusual nowadays for OSINT to lead the way in decoding key news events. When Sergei Skripal was poisoned, Bellingcat, an open-source intelligence website, tracked and identified his would-be killers as they traipsed across London and Salisbury. They delved into military records to blow the cover of the agents sent to kill him. And in the days after the Ukraine International Airlines plane crashed outside Tehran, Bellingcat and The New York Times blew a hole in the supposition that the aircraft was downed by engine failure. The pressure – and the weight of public evidence – compelled Iranian officials to admit overnight on January 10 that the country had shot down the plane “in error”.

So how do they do it? “You can think of OSINT as a puzzle. To get the complete picture, you need to find the missing pieces and put everything together,” says Loránd Bodó, an OSINT analyst at Tech versus Terrorism, a campaign group. The team at Bellingcat and other open-source investigators pore over publicly available material. Thanks to our propensity to reach for our cameraphones at the sight of any newsworthy incident, video and photos are often available, posted to social media in the immediate aftermath of events. (The person who shot and uploaded the second video in this incident, of the missile appearing to hit the Boeing plane, was a perfect example: they grabbed their phone after they heard “some sort of shot fired”.) “Open source investigations essentially involve the collection, preservation, verification, and analysis of evidence that is available in the public domain to build a picture of what happened,” says Yvonne McDermott Rees, a lecturer at Swansea University….(More)”.

Innovation labs and co-production in public problem solving


Paper by Michael McGann, Tamas Wells & Emma Blomkamp: “Governments are increasingly establishing innovation labs to enhance public problem solving. Despite the speed at which these new units are being established, they have only recently begun to receive attention from public management scholars. This study assesses the extent to which labs are enhancing strategic policy capacity through pursuing more collaborative and citizen-centred approaches to policy design. Drawing on original case study research of five labs in Australia and New Zealand, it examines the structure of labs’ relationships to government partners, and the extent and nature of their activities in promoting citizen participation in public problem solving….(More)”.

Lies, Deception and Democracy


Essay by Richard Bellamy: “This essay explores how far democracy is compatible with lies and deception, and whether it encourages or discourages their use by politicians. Neo-Kantian arguments, such as Newey’s, that lies and deception undermine individual autonomy and the possibility for consent go too far, given that no democratic process can be regarded as a plausible mechanism for achieving collective consent to state policies. However, they can be regarded as incompatible with a more modest account of democracy as a system of public equality among political equals.

On this view, the problem with lies and deception derives from their being instruments of manipulation and domination. Both can be distinguished from ‘spin’, with a working democracy being capable of uncovering them and so incentivising politicians to be truthful. Nevertheless, while lies and deception can be found out, bullshit and post-truth disregard and subvert truth respectively, and as such prove more pernicious as they admit of no standard whereby they might be challenged….(More)”.

Machine Learning, Big Data and the Regulation of Consumer Credit Markets: The Case of Algorithmic Credit Scoring


Paper by Nikita Aggarwal et al: “Recent advances in machine learning (ML) and Big Data techniques have facilitated the development of more sophisticated, automated consumer credit scoring models — a trend referred to as ‘algorithmic credit scoring’ in recognition of the increasing reliance on computer (particularly ML) algorithms for credit scoring. This chapter, which forms part of the 2018 collection of short essays ‘Autonomous Systems and the Law’, examines the rise of algorithmic credit scoring, and considers its implications for the regulation of consumer creditworthiness assessment and consumer credit markets more broadly.

The chapter argues that algorithmic credit scoring, and the Big Data and ML technologies underlying it, offer both benefits and risks for consumer credit markets. On the one hand, it could increase allocative efficiency and distributional fairness in these markets, by widening access to, and lowering the cost of, credit, particularly for ‘thin-file’ and ‘no-file’ consumers. On the other hand, algorithmic credit scoring could undermine distributional fairness and efficiency, by perpetuating discrimination in lending against certain groups and by enabling the more effective exploitation of borrowers.

The chapter considers how consumer financial regulation should respond to these risks, focusing on the UK/EU regulatory framework. As a general matter, it argues that the broadly principles- and conduct-based approach of UK consumer credit regulation provides the flexibility necessary for regulators and market participants to respond dynamically to these risks. However, this approach could be enhanced through the introduction of more robust product oversight and governance requirements for firms in relation to their use of ML systems and processes. Supervisory authorities could also themselves make greater use of ML and Big Data techniques in order to strengthen the supervision of consumer credit firms.

Finally, the chapter notes that cross-sectoral data protection regulation, recently updated in the EU under the GDPR, offers an important avenue to mitigate risks to consumers arising from the use of their personal data. However, further guidance is needed on the application and scope of this regime in the consumer financial context….(More)”.

The wisdom of crowds: What smart cities can learn from a dead ox and live fish


Portland State University: “In 1906, Francis Galton was at a country fair where attendees had the opportunity to guess the weight of a dead ox. Galton took the guesses of 787 fair-goers and found that the average guess was only one pound off the correct weight — even though individual guesses were often far off base.

This concept, known as “the wisdom of crowds” or “collective intelligence,” has been applied to many situations over the past century, from people estimating the number of jellybeans in a jar to predicting the winners of major sporting events — often with high rates of success. Whatever the problem, the average answer of the crowd seems to be an accurate solution.
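The averaging at the heart of the idea is easy to simulate. The sketch below assumes, purely for illustration, that individual guesses are unbiased with Gaussian noise (the real crowd's error structure was messier than this); under that assumption the crowd mean lands far closer to the truth than a typical individual guess does.

```python
import random
import statistics

random.seed(42)

TRUE_WEIGHT = 1198  # the ox's reported dressed weight in Galton's account

# Simulate 787 fair-goers whose guesses scatter around the truth.
# The 75 lb noise level is an assumption, not a historical figure.
guesses = [TRUE_WEIGHT + random.gauss(0, 75) for _ in range(787)]

crowd_estimate = statistics.mean(guesses)
crowd_error = abs(crowd_estimate - TRUE_WEIGHT)
typical_individual_error = statistics.mean(abs(g - TRUE_WEIGHT) for g in guesses)

print(f"crowd error: {crowd_error:.1f} lb; "
      f"typical individual error: {typical_individual_error:.1f} lb")
```

The design point is that averaging cancels independent errors: with n unbiased guesses the crowd mean's error shrinks roughly with the square root of n, which is why 787 mediocre guessers can collectively beat almost any one of them.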

But does this also apply to knowledge about systems, such as ecosystems, health care, or cities? Do we always need in-depth scientific inquiries to describe and manage them — or could we leverage crowds?

This question has fascinated Antonie J. Jetter, associate professor of Engineering and Technology Management, for many years. Now, there’s an answer. A recent study, co-authored by Jetter and published in Nature Sustainability, shows that diverse crowds of local natural resource stakeholders can collectively produce complex environmental models very similar to those of trained experts.

For this study, about 250 anglers, water guards and board members of German fishing clubs were asked to draw connections showing, from their own perspective, how ecological relationships influence the pike stock and how factors like nutrients and fishing pressure help determine the number of pike in a freshwater lake ecosystem. The individuals’ drawings — their so-called mental models — were then mathematically combined into a collective model representing their averaged understanding of the ecosystem and compared with the best scientific knowledge on the same subject.

The result is astonishing. If you combine the ideas from many individual anglers by averaging their mental models, the final outcome corresponds closely to the scientific knowledge of pike ecology — local knowledge of stakeholders produces results that are in no way inferior to lengthy and expensive scientific studies….(More)”.
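The combination step can be sketched as averaging weighted cause-effect edges across individual mental models. Everything below is invented for illustration: the factors, the weights, and the choice to count an edge missing from someone's model as zero; the study's own aggregation of fuzzy cognitive maps may differ in detail.

```python
# Each stakeholder's mental model: a map from (cause, effect) edges to a
# signed strength in [-1, 1]. Factors and weights are hypothetical examples.
angler_models = [
    {("nutrients", "prey_fish"): 0.8, ("prey_fish", "pike"): 0.7,
     ("fishing_pressure", "pike"): -0.6},
    {("nutrients", "prey_fish"): 0.5, ("prey_fish", "pike"): 0.9,
     ("fishing_pressure", "pike"): -0.8, ("water_clarity", "pike"): 0.2},
    {("prey_fish", "pike"): 0.6, ("fishing_pressure", "pike"): -0.7},
]

def aggregate(models):
    """Average each edge's weight across all models.

    Edges absent from a model contribute 0, so idiosyncratic links are
    diluted while widely shared links survive with strong weights.
    """
    edges = {e for m in models for e in m}
    n = len(models)
    return {e: sum(m.get(e, 0.0) for m in models) / n for e in edges}

collective = aggregate(angler_models)
print(collective[("prey_fish", "pike")])       # shared edge stays strong
print(collective[("water_clarity", "pike")])   # one-off edge is diluted
```

The averaging behaves like the ox-weight example transplanted onto graph edges: consensus relationships (prey fish feed pike, fishing pressure reduces pike) emerge with strong weights, while links only one angler drew fade toward zero.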

Collective Intelligence in City Design


Idea by Helena Rong and Juncheng Yang: “We propose an interactive design engagement platform which facilitates a continuous conversation between developers, designers and end users from the pre-design and planning phases all the way to post-occupancy. It adopts a citizen-centric, inclusion-oriented approach that builds trust and invites active participation from end users of different ages, ethnicities, and social and economic backgrounds in the design and development process. We aim to explore how collective intelligence through citizen engagement could be enabled by data, allowing new collectives to emerge and confronting design as an iterative process involving scalable cooperation among different actors. As a result, design is a collaborative and conscious practice, not born of the single mastermind of the architect. Rather, its agency is reinforced by a cooperative ideal involving institutions, enterprises and individuals alike, enabled by data science….(More)”

The Wild Wild West of Data Hoarding in the Federal Government


ActiveNavigation: “There is a strong belief, both in the public and private sector, that the worst thing you can do with a piece of data is to delete it. The government stores all sorts of data, from traffic logs to home ownership statistics. Data is obviously incredibly important to the Federal Government – but storing large amounts of it poses significant compliance and security risks – especially with the rise of nation-state hackers. As the risk of being breached continues to rise, why is the government not tackling its data storage problem head on?

The Myth of “Free” Storage

Storage is cheap, especially compared to 10-15 years ago. Cloud storage has made it easier than ever to store swaths of information, creating what some call “digital landfills.” However, the true cost of storage isn’t in the ones and zeros sitting on the server somewhere. It’s the business cost.

As information stores continue to grow, moving information to the correct place gets harder and harder for the Federal Government, not to mention more expensive. The U.S. Government has a duty to provide accurate, up-to-date information to its taxpayers – meaning that sharing “bad data” is not an option.

The Association for Information and Image Management (AIIM) reports that half of an organization’s retained data has no value. So far, in 2019, through our work with Federal Agencies, we have discovered that this number is, in fact, low. Over 66% of the data we’ve indexed has, by the client’s definition, fallen into that “junk” category. Eliminating junk data paves the way for greater accessibility, transparency and major financial savings. But what is “junk” data?

Redundant, Obsolete and Trivial (ROT) Data

Data is important – but if you can’t assign a value to it, it can become impossible to manage. Simply put, ROT data is digital information that an organization retains even though it has no business or legal value. To be efficient from both a cyber hygiene and a business perspective, the government needs to get better at purging its ROT data.

Again, purging data doesn’t just help with the hard cost of storage and backups, etc. For example, think about what needs to be done to answer a Freedom of Information Act (FOIA) request. You have a petabyte of data. You have at least a billion documents you need to funnel through to be able to respond to that FOIA request. By eliminating 50% of your ROT data, you probably have also reduced your FOIA response time by 50%.

Records and information governance, taken at face value, might seem fairly esoteric. It may not be as fun or as sexy as the new Space Force, but the reality is, the only way to know if the government is doing what it says is through records and information. You can’t answer a FOIA request if there’s no material. You can’t answer Congress if the material isn’t accurate. Being able to access timely, accurate information is critical. That’s why NARA is advocating a move to electronic records….(More)”.

The future is intelligent: Harnessing the potential of artificial intelligence in Africa


Youssef Travaly and Kevin Muvunyi at Brookings: “…AI in particular presents countless avenues for both the public and private sectors to optimize solutions to the most crucial problems facing the continent today, especially for struggling industries. For example, in health care, AI solutions can help scarce personnel and facilities do more with less by speeding initial processing, triage, diagnosis, and post-care follow-up. Furthermore, AI-based pharmacogenomics applications, which focus on the likely response of an individual to therapeutic drugs based on certain genetic markers, can be used to tailor treatments. Considering the genetic diversity found on the African continent, it is highly likely that the application of these technologies in Africa will result in considerable advancement in medical treatment on a global level.

In agriculture, Abdoulaye Baniré Diallo, co-founder and chief scientific officer of the AI startup My Intelligent Machines, is working with advanced algorithms and machine learning methods to leverage genomic precision in livestock production models. With genomic precision, it is possible to build intelligent breeding programs that minimize the ecological footprint, address changing consumer demands, and contribute to the well-being of people and animals alike through the selection of good genetic characteristics at an early stage of the livestock production process. These are just a few examples that illustrate the transformative potential of AI technology in Africa.

However, a number of structural challenges undermine rapid adoption and implementation of AI on the continent. Inadequate basic and digital infrastructure seriously erodes efforts to activate AI-powered solutions as it reduces crucial connectivity. (For more on strategies to improve Africa’s digital infrastructure, see the viewpoint on page 67 of the full report). A lack of flexible and dynamic regulatory systems also frustrates the growth of a digital ecosystem that favors AI technology, especially as tech leaders want to scale across borders. Furthermore, a lack of relevant technical skills, particularly among young people, is a growing threat. This skills gap means that those who would otherwise have been at the forefront of building AI are left out, preventing the continent from harnessing the full potential of transformative technologies and industries.

Similarly, the lack of adequate investments in research and development is an important obstacle. Africa must develop innovative financial instruments and public-private partnerships to fund human capital development, including a focus on industrial research and innovation hubs that bridge the gap between higher education institutions and the private sector to ensure the transition of AI products from lab to market….(More)”.