Creating a Machine Learning Commons for Global Development


Blog by Hamed Alemohammad: “Advances in sensor technology, cloud computing, and machine learning (ML) continue to converge to accelerate innovation in the field of remote sensing. However, fundamental tools and technologies still need to be developed to drive further breakthroughs and to ensure that the Global Development Community (GDC) reaps the same benefits that the commercial marketplace is experiencing. This process requires us to take a collaborative approach.

Data collaborative innovation — that is, a group of actors from different data domains working together toward common goals — might hold the key to finding solutions for some of the global challenges that the world faces. That is why Radiant.Earth is investing in new technologies such as Cloud Optimized GeoTiffs, SpatioTemporal Asset Catalogues (STAC), and ML. Our approach to advance ML for global development begins with creating open libraries of labeled images and algorithms. This initiative and others require — and, in fact, will thrive as a result of — using a data collaborative approach.
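As a rough illustration of why these two building blocks matter, the sketch below shows the cloud-native access pattern they enable: search a STAC catalogue for imagery, then stream a single window of a Cloud Optimized GeoTiff over HTTP instead of downloading whole scenes. It assumes the pystac-client and rasterio Python libraries; the catalogue URL, bounding box, and “visual” asset key are hypothetical placeholders, not a real Radiant.Earth endpoint.

```python
# Sketch of the cloud-native workflow COGs and STAC enable.
# Assumes the `pystac-client` and `rasterio` packages; the endpoint,
# bounding box, and asset key are illustrative placeholders.
import rasterio
from rasterio.windows import Window
from pystac_client import Client

# A STAC API lets clients search imagery by space and time without
# crawling archives (hypothetical endpoint).
catalog = Client.open("https://example.com/stac/v1")
search = catalog.search(
    bbox=[36.0, -1.5, 37.0, -0.5],          # area of interest
    datetime="2018-01-01/2018-03-31",        # time of interest
    max_items=1,
)
item = next(search.items())

# A Cloud Optimized GeoTiff supports HTTP range reads, so only the
# requested window is fetched rather than the full file.
with rasterio.open(item.assets["visual"].href) as src:
    tile = src.read(1, window=Window(0, 0, 512, 512))
    print(tile.shape, src.crs)
```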

“Data is only as valuable as the decisions it enables.”

This quote by Ion Stoica, professor of computer science at the University of California, Berkeley, may best describe the challenge facing those of us who work with geospatial information:

How can we extract greater insights and value from the unending tsunami of data that is before us, allowing for more informed and timely decision making?…(More).

From Texts to Tweets to Satellites: The Power of Big Data to Fill Gender Data Gaps


From the UN Foundation Blog: “Twitter posts, credit card purchases, phone calls, and satellites are all part of our day-to-day digital landscape.

Detailed data, known broadly as “big data” because of the massive amounts of passively collected and high-frequency information that such interactions generate, are produced every time we use one of these technologies. These digital traces have great potential and have already developed a track record for application in global development and humanitarian response.

Data2X has focused particularly on what big data can tell us about the lives of women and girls in resource-poor settings. Our research, released today in a new report, Big Data and the Well-Being of Women and Girls, demonstrates how four big data sources can be harnessed to fill gender data gaps and inform policy aimed at mitigating global gender inequality. Big data can complement traditional surveys and other data sources, offering a glimpse into dimensions of girls’ and women’s lives that have otherwise been overlooked and providing a level of precision and timeliness that policymakers need to make actionable decisions.

Here are three findings from our report that underscore the power and potential offered by big data to fill gender data gaps:

  1. Social media data can improve understanding of the mental health of girls and women.

Mental health conditions, from anxiety to depression, are thought to be significant contributors to the global burden of disease, particularly for young women, though precise data on mental health is sparse in most countries. However, research by Georgia Tech, commissioned by Data2X, finds that social media provides an accurate barometer of mental health status…

  2. Cell phone and credit card records can illustrate women’s economic and social patterns – and track impacts of shocks in the economy.

Our spending priorities and social habits often indicate economic status, and these activities can also expose economic disparities between women and men.

By compiling cell phone and credit card records, our research partners at MIT traced patterns of women’s expenditures, spending priorities, and physical mobility. The research found that women have less mobility diversity than men, live farther away from city centers, and report less total expenditure per capita…

  3. Satellite imagery can map rivers and roads, but it can also measure gender inequality.

Satellite imagery has the power to capture high-resolution, real-time data on everything from natural landscape features, like vegetation and river flows, to human infrastructure, like roads and schools. Research by our partners at the Flowminder Foundation finds that it is also able to measure gender inequality….(More)”.

Everything* You Always Wanted To Know About Blockchain (But Were Afraid To Ask)


Alice Meadows at the Scholarly Kitchen: “In this interview, Joris van Rossum (Director of Special Projects, Digital Science, and author of Blockchain for Research) and Martijn Roelandse (Head of Publishing Innovation, Springer Nature) discuss blockchain in scholarly communications, including the recently launched Peer Review Blockchain initiative….

How would you describe blockchain in one sentence?

Joris: Blockchain is a technology for decentralized, self-regulating data which can be managed and organized in a revolutionary new way: open, permanent, verified and shared, without the need of a central authority.

How does it work (in layman’s language!)?

Joris: In a regular database you need a gatekeeper to ensure that whatever is stored in it (financial transactions, but this could be anything) is valid. With blockchain, however, trust is not created by means of a curator, but through consensus mechanisms and cryptographic techniques. Consensus mechanisms clearly define what new information is allowed to be added to the datastore. With the help of a technology called hashing, it is not possible to change any existing data without this being detected by others. And through cryptography, the database can be shared without real identities being revealed. So blockchain technology removes the need for a middleman.
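To make the hashing point concrete, here is a toy sketch (ours, not any production blockchain) of how chaining hashes makes tampering detectable: each block stores the hash of its predecessor, so editing any earlier record invalidates every hash that follows.

```python
import hashlib
import json

def block_hash(data, prev_hash):
    # Deterministically hash a block's contents together with the
    # hash of the previous block.
    payload = json.dumps({"data": data, "prev_hash": prev_hash},
                         sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def append_block(chain, data):
    prev = chain[-1]["hash"] if chain else "0" * 64
    chain.append({"data": data, "prev_hash": prev,
                  "hash": block_hash(data, prev)})

def is_valid(chain):
    prev = "0" * 64
    for block in chain:
        # Every block must point at its predecessor and match its own hash.
        if block["prev_hash"] != prev:
            return False
        if block["hash"] != block_hash(block["data"], block["prev_hash"]):
            return False
        prev = block["hash"]
    return True

chain = []
append_block(chain, "Alice reviewed manuscript 123")
append_block(chain, "Bob reviewed manuscript 456")
print(is_valid(chain))                                 # True
chain[0]["data"] = "Mallory reviewed manuscript 123"   # tamper with history
print(is_valid(chain))                                 # False: the edit is detected
```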

How is this relevant to scholarly communication?

Joris: It’s very relevant. We’ve explored the possibilities and initiatives in a report published by Digital Science. The blockchain could be applied on several levels, which is reflected in a number of initiatives announced recently. For example, a cryptocurrency for science could be developed. This ‘bitcoin for science’ could introduce a monetary reward scheme to researchers, such as for peer review. Another relevant area, specifically for publishers, is digital rights management. The potential for this was picked up by this blog at a very early stage. Blockchain also allows publishers to easily integrate micropayments, thereby creating a potentially interesting business model alongside open access and subscriptions.

Moreover, blockchain as a datastore with no central owner where information can be stored pseudonymously could support the creation of a shared and authoritative database of scientific events. Here traditional activities such as publications and citations could be stored, along with currently opaque and unrecognized activities, such as peer review. A data store incorporating all scientific events would make science more transparent and reproducible, and allow for more comprehensive and reliable metrics….

How do you see developments in the industry regarding blockchain?

Joris: In the last couple of months we’ve seen the launch of many interesting initiatives, for example scienceroot.com, Pluto.network, and orvium.io. These are all ambitious projects incorporating many of the potential applications of blockchain in the industry, and to an extent they aim to disrupt the current ecosystem. Recently, artifacts.ai was announced, an interesting initiative that aims to allow researchers to permanently document every stage of the research process. However, we believe that traditional players, and not least publishers, should also look at how services to researchers can be improved using blockchain technology. There are challenges (e.g. around reproducibility and peer review) but that does not necessarily mean the entire ecosystem needs to be overhauled. In fact, in academic publishing we have a good track record of incorporating new technologies and using them to improve our role in scholarly communication. In other words, we should fix the system, not break it!

What is the Peer Review Blockchain initiative, and why did you join?

Martijn: The problems of research reproducibility, recognition of reviewers, and the rising burden of the review process as research volumes increase each year have led to a challenging landscape for scholarly communications. There is an urgent need for change to tackle these problems, which is why we joined this initiative: to take a step forward towards a fairer and more transparent ecosystem for peer review. The initiative aims to look at practical solutions that leverage the distributed registry and smart contract elements of blockchain technologies. Each of the parties can deposit peer review activity in the blockchain — depending on peer review type, either partially or fully encrypted — and subsequent activity is also deposited in the reviewer’s ORCID profile. These business transactions — depositing peer review activity against person x — will be verifiable and auditable, thereby increasing transparency and reducing the risk of manipulation. Through recordkeeping and the shared processes we will set up with other publishers, trust will increase.
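As a rough sketch of the kind of record such a scheme might deposit, consider the following. The field names, and the choice to store only a hash of the review when it is fully encrypted, are our illustrative assumptions, not the initiative’s actual specification; the ORCID iD shown is ORCID’s own documented example identifier.

```python
import hashlib
from datetime import datetime, timezone

def peer_review_record(reviewer_orcid, manuscript_doi, review_text,
                       fully_encrypted=True):
    # A verifiable fingerprint of the review is always deposited; the
    # review text itself is included only for open review types.
    digest = hashlib.sha256(review_text.encode()).hexdigest()
    record = {
        "reviewer_orcid": reviewer_orcid,   # links activity to an ORCID profile
        "manuscript_doi": manuscript_doi,
        "review_hash": digest,              # auditable; content stays private
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if not fully_encrypted:
        record["review_text"] = review_text
    return record

record = peer_review_record("0000-0002-1825-0097", "10.1000/example.123",
                            "Methods are sound; please clarify Table 2.")
print(record["review_hash"][:16])
```

Anyone holding the original review text can recompute the hash and verify that the deposited activity matches it, which is the auditability property Martijn describes, without the blockchain ever exposing the review itself.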

A separate trend we see is the broadening scope of research evaluation, which has prompted researchers to seek (more) recognition for their peer review work, beyond citations and altmetrics. At a later stage new applications could be built on top of the peer review blockchain….(More)”.

Replicating the Justice Data Lab in the USA: Key Considerations


Blog by Tracey Gyateng and Tris Lumley: “Since 2011, NPC has researched, supported and advocated for the development of impact-focussed Data Labs in the UK. The goal has been to unlock government administrative data so that organisations (primarily nonprofits) who provide a social service can understand the impact of their services on the people who use them.

So far, one of these Data Labs has been developed to measure reoffending outcomes (the Justice Data Lab), and others are currently being piloted for employment and education. Given our seven years of work in this area, we at NPC have decided to reflect on the key factors needed to create a Data Lab in our report: How to Create an Impact Data Lab. This blog outlines these factors, examines whether they are present in the USA, and asks what the next steps should be, drawing on the research undertaken with the Governance Lab…. Below we examine the key factors and to what extent they appear to be present within the USA.

Environment: A broad culture that supports impact measurement. Similar to the UK, nonprofits in the USA are increasingly measuring the impact they have had on the participants of their service and sharing the difficulties of undertaking robust, high quality evaluations.

Data: Individual person-level administrative data. A key difference between the two countries is that, in the USA, personal data on social services tends to be held at a local, rather than central, level. In the UK, social services data such as reoffending, education, and employment records are collated into central databases. In the USA, the federal government holds limited centrally collated personal data; instead, this data can be found at the state/city level….

A leading advocate: A Data Lab project team, and strong networks. Data Labs do not manifest by themselves. They require a lead agency to campaign with, and on behalf of, nonprofits to set out a persuasive case for their development. In the USA, we have developed a partnership with the Governance Lab to seek out opportunities where Data Labs can be established, but given the size of the country, there is scope for further collaborations and/or advocates to be identified and supported.

Customers: Identifiable organisations that would use the Data Lab. Initial discussions with several US nonprofits and academics indicate support for a Data Lab in their context. Broad consultation based on an agreed region and outcome(s) will be needed to fully assess the potential customer base.

Data owners: Engaged civil servants. Generating buy-in and persuading various stakeholders, including data owners, analysts and politicians, is a critical part of setting up a data lab. While the exact profiles of the right people to approach can only be assessed once a region and outcome(s) of interest have been chosen, there are encouraging signs, such as the passing of the Foundations for Evidence-Based Policymaking Act of 2017 in the House of Representatives, which, among other things, mandates the appointment of “Chief Evaluation Officers” in government departments, suggesting that there is bipartisan support for increased data-driven policy evaluation.

Legal and ethical governance: A legal framework for sharing data. In the UK, all personal data is subject to data protection legislation, which provides standardised governance for how personal data can be processed across the country and within the European Union. A universal data protection framework does not exist within the USA, therefore data sharing agreements between customers and government data-owners will need to be designed for the purposes of Data Labs, unless there are existing agreements that enable data sharing for research purposes. This will need to be investigated at the state/city level of a desired Data Lab.

Funding: Resource and support for driving the set-up of the Data Lab. Most of our policy lab case studies were funded by a mixture of philanthropy and government grants. It is expected that a similar mixed funding model will need to be created to establish Data Labs. One alternative is the model adopted by the Washington State Institute for Public Policy (WSIPP), which was created by the Washington State Legislature and is funded on a project basis, primarily by the state. Additionally, funding will be needed to enable advocates of a Data Lab to campaign for the service….(More)”.

Artificial Intelligence and the Need for Data Fairness in the Global South


Medium blog by Yasodara Cordova: “…The data collected by industry represents AI opportunities for governments to improve their services through innovation. Data-based intelligence promises to increase the efficiency of resource management by improving transparency, logistics, social welfare distribution — and virtually every government service. E-government enthusiasm took off with the realization of the possible applications, such as using AI to fight corruption by automating the fraud-tracking capabilities of cost-control tools. Controversially, the AI enthusiasm has spread to the distribution of social benefits, optimization of tax oversight and control, credit scoring systems, crime prediction systems, and other applications based on personal and sensitive data collection, especially in countries that do not have comprehensive privacy protections.

There are so many potential applications that society may operate very differently in ten years, when “datafication” has advanced beyond citizen data and into other applications such as energy and natural resource management. However, many countries in the Global South are not being given the necessary access to their own data.

Useful data are everywhere, but only some can take advantage of them. Beyond smartphones, data can be collected from IoT components in common spaces. Nor is data collection restricted to urban spaces; it includes rural technology like sensors installed in tractors. However, even when the information relates to issues of public importance in developing countries — like data taken from the road mesh or from vital resources like water and land — it stays hidden under contract rules, and public citizens cannot access, and therefore benefit from, it. This arrangement keeps the public uninformed about their country’s operations. The data collection and distribution frameworks are not built towards healthy partnerships between industry and government, preventing countries from realizing the potential outlined in the previous paragraph.

The data necessary for the development of better cities, public policies, and the common interest cannot be leveraged if kept in closed silos, yet access often costs more than is justifiable. Data are a primordial resource for all stages of new technology, especially tech adoption and integration, so the necessary long-term investment in innovation needs a common ground to start from. The mismatch between the pace of data collection among big established companies and among small, new, and local businesses will likely increase with time, assuming no regulation is introduced to provide equal access to collected data….

Currently, data independence remains restricted to discussions on the technological infrastructure that supports data extraction. Privacy discussions focus on personal data rather than the digital accumulation of strategic data in closed silos — a necessary discussion not yet addressed. The national interest in data is not being addressed within a framework of economic and social fairness. Access to data, from a policy-making standpoint, needs to find a balance between the extremes of public, open access and limited, commercial use.

A final but important note: the vast majority of social media platforms act like silos. APIs play an important role in corporate business models, where industry controls the data it collects without rewarding users, let alone offering them transparency. Negotiation of the specification of APIs to make data a common resource should be considered, for such an effort may align with citizens’ interests….(More)”.

Journalism and artificial intelligence


Notes by Charlie Beckett (at LSE’s Media Policy Project Blog): “…AI and machine learning are a big deal for journalism and news information, possibly as important as the other developments we have seen in the last 20 years such as online platforms, digital tools and social media. My 2008 book on how journalism was being revolutionised by technology was called SuperMedia because these technologies offered extraordinary opportunities to make journalism much more efficient and effective – but also to transform what we mean by news and how we relate to it as individuals and communities. Of course, that can be super good or super bad.

Artificial intelligence and machine learning can help the news media with its three core problems:

  1. The overabundance of information and sources that leave the public confused
  2. The credibility of journalism in a world of disinformation and falling trust and literacy
  3. The business model crisis – how can journalism become more efficient, avoiding duplication; be more engaged, adding value; and be relevant to individuals’ and communities’ need for quality, accurate information and informed, useful debate.

But like any technology they can also be used by bad people or for bad purposes: in journalism that can mean clickbait, misinformation, propaganda, and trolling.

Some caveats about using AI in journalism:

  1. Narratives are difficult to program. Trusted journalists are needed to understand and write meaningful stories.
  2. Artificial Intelligence needs human inputs. Skilled journalists are required to double check results and interpret them.
  3. Artificial Intelligence increases quantity, not quality. It’s still up to the editorial team and developers to decide what kind of journalism the AI will help create….(More)”.

A primer on political bots: Part one


Stuart W. Shulman et al at Data Driven Journalism: “The rise of political bots brings into sharp focus the role of automated social media accounts in today’s democratic civil society. Events during the Brexit referendum and the 2016 U.S. Presidential election revealed the scale of this issue for the first time to the majority of citizens and policy-makers. At the same time, the deployment of Russian-linked bots designed to promote pro-gun laws in the aftermath of the Florida school shooting demonstrates the state-sponsored, real-time readiness to shape, through information warfare, the dominant narratives on platforms such as Twitter. The regular news reports on these issues lead us to conclude that the foundations of democracy have become threatened by the presence of aggressive and socially disruptive bots, which aim to manipulate online political discourse.

While there is clarity on the various functions that bot accounts can be scripted to perform, as described below, the task of accurately defining this phenomenon and identifying bot accounts remains a challenge. At Texifter, we have endeavoured to bring nuance to this issue through a research project which explores the presence of automated accounts on Twitter. Initially, this project concerned itself with an attempt to identify bots which participated in online conversations around the prevailing cryptocurrency phenomenon. This article is the first in a series of three blog posts produced by the researchers at Texifter that outlines the contemporary phenomenon of Twitter bots….

Bots in their current iteration have a relatively short, albeit rapidly evolving, history. Bots were initially constructed with non-malicious intentions, and it wasn’t until the late 1990s, with the advent of Web 2.0, that they began to develop a more negative reputation. Although bots have been used maliciously in distributed denial-of-service (DDoS) attacks, spam emails, and mass identity theft, their purpose is not explicitly to incite mayhem.

Before the most recent political events, bots existed in chat rooms, operated as automated customer service agents on websites, and were a mainstay on dating websites. This familiar form of the bot is known to the majority of the general population as a “chatbot” – for instance, CleverBot was, and still is, a popular platform to talk to an “AI”. Another prominent example was Microsoft’s failed Twitter chatbot Tay, which made headlines in 2016 when “her” vocabulary and conversation functions were manipulated by Twitter users until “she” espoused neo-Nazi views, at which point “she” was deleted.

Image: XKCD Comic #632.

A Twitter bot is an account controlled by an algorithm or script, which is typically hosted on a cloud platform such as Heroku. Bots are typically, though not exclusively, scripted to conduct repetitive tasks. For example, there are bots that retweet content containing particular keywords, reply to new followers, and send direct messages to new followers, although they can also be used for more complex tasks such as participating in online conversations. Bot accounts make up between 9 and 15% of all active accounts on Twitter; however, they are predicted to account for a much greater percentage of total Twitter traffic. Twitter bots are generally not created with malicious intent; they are frequently used for online chatting or for raising the professional profile of a corporation – but their ability to pervade our online experience and shape political discourse warrants heightened scrutiny….(More)”.
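For illustration only, here is a minimal sketch of the retweet-by-keyword pattern described above, written against Tweepy’s v2 Client interface. The credentials are placeholders and the query is an example; this is not the script behind any particular bot, and running anything like it is subject to Twitter’s developer policies.

```python
import tweepy

# Placeholder credentials: a real bot needs keys from a Twitter
# developer account.
client = tweepy.Client(
    consumer_key="...", consumer_secret="...",
    access_token="...", access_token_secret="...",
)

# Find recent tweets matching a keyword, excluding retweets.
results = client.search_recent_tweets(
    query="cryptocurrency -is:retweet", max_results=10, user_auth=True)

# The repetitive task this bot is scripted for: retweet every match.
for tweet in results.data or []:
    client.retweet(tweet.id)
```

A script this short, deployed on a scheduler on a platform like Heroku, is all it takes to run an account that never sleeps, which is why bot traffic can so easily outpace the share of accounts that are bots.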

The Future Computed: Artificial Intelligence and its role in society


Brad Smith at the Microsoft Blog: “Today Microsoft is releasing a new book, The Future Computed: Artificial Intelligence and its role in society. The two of us have written the foreword for the book, and our teams collaborated to write its contents. As the title suggests, the book provides our perspective on where AI technology is going and the new societal issues it has raised.

On a personal level, our work on the foreword provided an opportunity to step back and think about how much technology has changed our lives over the past two decades and to consider the changes that are likely to come over the next 20 years. In 1998, we both worked at Microsoft, but on opposite sides of the globe. While we lived on separate continents and in quite different cultures, we shared similar experiences and daily routines which were managed by manual planning and movement. Twenty years later, we take for granted the digital world that was once the stuff of science fiction.

Technology – including mobile devices and cloud computing – has fundamentally changed the way we consume news, plan our day, communicate, shop and interact with our family, friends and colleagues. Two decades from now, what will our world look like? At Microsoft, we imagine that artificial intelligence will help us do more with one of our most precious commodities: time. By 2038, personal digital assistants will be trained to anticipate our needs, help manage our schedule, prepare us for meetings, assist as we plan our social lives, reply to and route communications, and drive cars.

Beyond our personal lives, AI will enable breakthrough advances in areas like healthcare, agriculture, education and transportation. It’s already happening in impressive ways.

But as we’ve witnessed over the past 20 years, new technology also inevitably raises complex questions and broad societal concerns. As we look to a future powered by a partnership between computers and humans, it’s important that we address these challenges head on.

How do we ensure that AI is designed and used responsibly? How do we establish ethical principles to protect people? How should we govern its use? And how will AI impact employment and jobs?

To answer these tough questions, technologists will need to work closely with government, academia, business, civil society and other stakeholders. At Microsoft, we’ve identified six ethical principles – fairness, reliability and safety, privacy and security, inclusivity, transparency, and accountability – to guide the cross-disciplinary development and use of artificial intelligence. The better we understand these or similar issues — and the more technology developers and users can share best practices to address them — the better served the world will be as we contemplate societal rules to govern AI.

We must also pay attention to AI’s impact on workers. What jobs will AI eliminate? What jobs will it create? If there has been one constant over 250 years of technological change, it has been the ongoing impact of technology on jobs — the creation of new jobs, the elimination of existing jobs and the evolution of job tasks and content. This too is certain to continue.

Some key conclusions are emerging….

The Future Computed is available here and additional content related to the book can be found here.”

Migration Data Portal


New portal managed and developed by IOM’s Global Migration Data Analysis Centre (GMDAC): “…aims to serve as a unique access point to timely, comprehensive migration statistics and reliable information about migration data globally. The site is designed to help policy makers, national statistics officers, journalists and the general public interested in the field of migration to navigate the increasingly complex landscape of international migration data, currently scattered across different organisations and agencies.

Especially in critical times, such as those faced today, it is essential to ensure that responses to migration are based on sound facts and accurate analysis. By making the evidence about migration issues accessible and easy to understand, the Portal aims to contribute to a more informed public debate….

The five main sections of the Portal are designed to help you quickly and easily find the data and information you need.

  • DATA – Our interactive world map visualizes international, publicly-available and internationally comparable migration data.
  • THEMES – Thematic overviews explain how various aspects of migration are measured and what the data sources are (including their strengths and weaknesses), and provide context and analysis of key migration data.
  • TOOLS – Migration data tools are regularly added to help you find the right tools, guidelines and manuals on how to collect, interpret and disseminate migration data.
  • SDGs & GCM – Reviews the migration-related targets in the Sustainable Development Goals (SDGs), how they are defined and measured, and provides information on the new Global Compact on Migration (GCM) and the migration data needed to support its implementation.
  • BLOG – Our blog and the Talking Migration Data video series provide a place for the migration data community to share their opinion on new developments and policy, new data or methods….(More)”.

SAM, the first A.I. politician on Messenger


From Digital Trends: “It’s said that all politicians are the same, but it seems safe to assume that you’ve never seen a politician quite like this. Meet SAM, heralded as the politician of the future. Unfortunately, you can’t exactly shake this politician’s hand, or have her kiss your baby. Rather, SAM is the world’s first Virtual Politician (and a female presence at that), “driven by the desire to close the gap between what voters want and what politicians promise, and what they actually achieve.”

The artificially intelligent chatbot is currently live on Facebook Messenger, though she is probably most helpful to those in New Zealand. After all, the bot’s website notes, “SAM’s goal is to act as a representative for all New Zealanders, and evolves based on voter input.” Capable of being reached by anyone at just about any time from anywhere, this may just be the single most accessible politician we’ve ever seen. But more importantly, SAM purports to be a true representative, claiming to analyze “everyone’s views [and] opinions, and impact of potential decisions.” This, the bot notes, could make for better policy for everyone….(More)”.