How to ensure that your data science is inclusive


Blog by Samhir Vasdev: “As a new generation of data scientists emerges in Africa, they will encounter relatively little trusted, accurate, and accessible data upon which to apply their skills. It’s time to acknowledge the limitations of the data sources upon which data science relies, particularly in lower-income countries.

The potential of data science to support, measure, and amplify sustainable development is undeniable. As public, private, and civic institutions around the world recognize the role that data science can play in advancing their growth, an increasingly robust array of efforts has emerged to foster data science in lower-income countries.

This phenomenon is particularly salient in Sub-Saharan Africa. There, foundations are investing millions into building data literacy and data science skills across the continent. Multilaterals and national governments are pioneering new investments into data science, artificial intelligence, and smart cities. Private and public donors are building data science centers to develop cohorts of local, indigenous data science talent. Local universities are launching graduate-level data science courses.

Despite this progress, among the hype surrounding data science rests an unpopular and inconvenient truth: As a new generation of data scientists emerges in Africa, they will encounter relatively little trusted, accurate, and accessible data that they can use for data science.

We hear promises of how data science can help teachers tailor curricula according to students’ performances, but many school systems don’t collect or track that performance data with enough accuracy and timeliness to perform those data science–enabled tweaks. We believe that data science can help us catch disease outbreaks early, but health care facilities often lack the specific data, like patient origin or digitized information, that is needed to discern those insights.

These fundamental data gaps invite the question: Precisely what data would we perform data science on to achieve sustainable development?…(More)”.

Timing Technology


Blog by Gwern Branwen: “Technological forecasts are often surprisingly prescient in terms of predicting that something was possible & desirable and what they predict eventually happens; but they are far less successful at predicting the timing, and almost always fail, with the success (and riches) going to another.

Why is their knowledge so useless? The right moment cannot be known exactly in advance, so attempts to forecast will typically be off by years or worse. For many claims, there is no way to invest in an idea except by going all in and launching a company, resulting in extreme variance in outcomes, even when the idea is good and the forecasts correct about the (eventual) outcome.

Progress can happen and can be foreseen long before, but the details and exact timing due to bottlenecks are too difficult to get right. Launching too early means failure, but being conservative & launching later is just as bad because regardless of forecasting, a good idea will draw overly-optimistic researchers or entrepreneurs to it like moths to a flame: all get immolated but the one with the dumb luck to kiss the flame at the perfect instant, who then wins everything, at which point everyone can see that the optimal time is past. All major success stories overshadow their long list of predecessors who did the same thing, but got unlucky. So, ideas can be divided into the overly-optimistic & likely doomed, or the fait accompli. On an individual level, ideas are worthless because so many others have them too—‘multiple invention’ is the rule, and not the exception.

This overall problem falls under the reinforcement learning paradigm, and successful approaches are analogous to Thompson sampling/posterior sampling: even an informed strategy can’t reliably beat random exploration which gradually shifts towards successful areas while continuing to take occasional long shots. Since people tend to systematically over-exploit, how is this implemented? Apparently by individuals acting suboptimally on the personal level, but optimally on the societal level by serving as random exploration.
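The Thompson-sampling analogy can be made concrete with a toy Beta-Bernoulli bandit. The sketch below is illustrative only (it is not from Gwern's post); the arm payoff rates and round count are arbitrary assumptions. Each "arm" stands for an idea whose true success rate is unknown; the strategy samples from each arm's posterior, backs the most promising draw, and updates, so effort drifts toward what works while long shots are still tried occasionally.

```python
import random

def thompson_sampling(true_rates, n_rounds=10000, seed=0):
    """Beta-Bernoulli Thompson sampling over a set of 'ideas' (arms).

    Each round: sample a plausible success rate from every arm's Beta
    posterior, pull the arm with the highest sample, observe a Bernoulli
    outcome, and update that arm's posterior. Exploration never stops
    entirely, but tapers off naturally as evidence accumulates.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    successes = [1] * n  # Beta(1, 1) uniform priors
    failures = [1] * n
    for _ in range(n_rounds):
        draws = [rng.betavariate(successes[i], failures[i]) for i in range(n)]
        arm = max(range(n), key=lambda i: draws[i])
        if rng.random() < true_rates[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
    # Pulls per arm (subtract the two pseudo-counts from the priors).
    return [successes[i] + failures[i] - 2 for i in range(n)]

# Three hypothetical 'ideas' with true success rates of 2%, 5%, and 10%.
pulls = thompson_sampling([0.02, 0.05, 0.10])
```

In a typical run the best arm ends up with the large majority of pulls, yet the weaker arms are still sampled throughout, which is the essay's point: a portfolio that keeps revisiting apparent failures outperforms both pure caution and all-in bets.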

A major benefit of R&D, then, is that ideas lie fallow until the ‘ripe time’ when they can be immediately exploited in previously-unpredictable ways; applied R&D or VC strategies should focus on maintaining diversity of investments, while continuing to flexibly revisit previous failures which forecasts indicate may have reached ‘ripe time’. This balances overall exploitation & exploration to progress as fast as possible, showing the usefulness of technological forecasting on a global level despite its uselessness to individuals….(More)”.

Ethical guidelines issued by engineers’ organization fail to gain traction


Blogpost by Nicolas Kayser-Bril: “In early 2016, the Institute of Electrical and Electronics Engineers, a professional association known as IEEE, launched a “global initiative to advance ethics in technology.” After almost three years of work and multiple rounds of exchange with experts on the topic, it released last April the first edition of Ethically Aligned Design, a 300-page treatise on the ethics of automated systems.

The general principles issued in the report focus on transparency, human rights and accountability, among other topics. As such, they are not very different from the 83 other ethical guidelines that researchers from the Health Ethics and Policy Lab of the Swiss Federal Institute of Technology in Zurich reviewed in an article published in Nature Machine Intelligence in September. However, one key aspect makes IEEE different from other think-tanks. With over 420,000 members, it is the world’s largest engineers’ association with roots reaching deep into Silicon Valley. Vint Cerf, one of Google’s Vice Presidents, is an IEEE “life fellow.”

Because the purpose of the IEEE principles is to serve as a “key reference for the work of technologists”, and because many technologists contributed to their conception, we wanted to know how three technology companies, Facebook, Google and Twitter, were planning to implement them.

Transparency and accountability

Principle number 5, for instance, requires that the basis of a particular automated decision be “discoverable”. On Facebook and Instagram, the reasons why a particular item is shown on a user’s feed are all but discoverable. Facebook’s “Why You’re Seeing This Post” feature explains that “many factors” are involved in the decision to show a specific item. The help page designed to clarify the matter fails to do so: many sentences there use opaque wording (users are told that “some things influence ranking”, for instance) and the basis of the decisions governing their newsfeeds is impossible to find.

Principle number 6 states that any autonomous system shall “provide an unambiguous rationale for all decisions made.” Google’s advertising systems do not provide an unambiguous rationale when explaining why a particular advert was shown to a user. A click on “Why This Ad” states that an “ad may be based on general factors … [and] information collected by the publisher” (our emphasis). Such vagueness is antithetical to the requirement for explicitness.

AlgorithmWatch sent detailed letters (which you can read below this article) with these examples and more, asking Google, Facebook and Twitter how they planned to implement the IEEE guidelines. This was in June. After a great many emails, phone calls and personal meetings, only Twitter answered. Google gave a vague comment and Facebook promised an answer which never came…(More)”

The weather data gap: How can mobile technology make smallholder farmers climate resilient?


Rishi Raithatha at GSMA: “In the new GSMA AgriTech report, Mobile Technology for Climate Resilience: The role of mobile operators in bridging the data gap, we explore how mobile network operators (MNOs) can play a bigger role in developing and delivering services to strengthen the climate resilience of smallholder farmers. By harnessing their own assets and data, MNOs can improve a broad suite of weather products that are especially relevant for farming communities. These include a variety of weather forecasts (daily, weekly, sub-seasonal and seasonal) and nowcasts, as real-time monitoring and one- to two-hour predictions are often used for Early Warning Systems (EWS) to prevent weather-related disasters. MNOs can also help strengthen the value proposition of other climate products, such as weather index insurance and decision agriculture.

Why do we need more weather data?

Agriculture is highly dependent on regional climates, especially in developing countries where farming is largely rain-fed. Smallholder farmers, who are responsible for the bulk of agricultural production in developing countries, are particularly vulnerable to changing weather patterns – especially given their reliance on natural resources and exclusion from social protection schemes. However, the use of climate adaptation approaches, such as localised weather forecasts and weather index insurance, can enhance smallholder farmers’ ability to withstand the risks posed by climate change and maintain agricultural productivity.

Ground-level measurements are an essential component of climate resilience products; the creation of weather forecasts and nowcasts starts with the analysis of ground, spatial and aerial observations. This involves the use of algorithms, weather models and current and historical observational weather data. Observational instruments, such as radar, weather stations and satellites, are necessary for measuring ground-level weather. However, National Hydrological and Meteorological Services (NHMSs) in developing countries often lack the capacity to generate accurate ground-level measurements beyond a few areas, resulting in gaps in local weather data.

While satellite data offers better resolution than before, and is more affordable and available to NHMSs, there is a need to complement it with ground-level measurements. This is especially true in the tropical and sub-tropical regions where most smallholder farmers live, where variable local weather patterns can lead to skewed averages from satellite data….(More).”

Why policy networks don’t work (the way we think they do)


Blog by James Georgalakis: “Is it who you know or what you know? The literature on evidence uptake and the role of communities of experts mobilised at times of crisis convinced me that a useful approach would be to map the social network that emerged around the UK-led mission to Sierra Leone so it could be quantitatively analysed. Despite the well-deserved plaudits for my colleagues at IDS and their partners in the London School of Hygiene and Tropical Medicine, the UK Department for International Development (DFID), the Wellcome Trust and elsewhere, I was curious to know why they had still met real resistance to some of their policy advice. This included the provision of home care kits for victims of the virus who could not access government or NGO-run Ebola Treatment Units (ETUs).

It seemed unlikely these challenges were related to poor communications. The timely provision of accessible research knowledge by the Ebola Response Anthropology Platform has been one of the most celebrated aspects of the mobilisation of anthropological expertise. This approach is now being replicated in the current Ebola response in the Democratic Republic of Congo (DRC).  Perhaps the answer was in the network itself. This was certainly indicated by some of the accounts of the crisis by those directly involved.

Social network analysis

I started by identifying the most important-looking policy interactions that took place between March 2014, prior to the UK assuming leadership of the Sierra Leone international response, and mid-2016, when West Africa was finally declared Ebola free. They had to be central to the efforts to coordinate the UK response and harness the use of evidence. I then looked for documents related to these events, a mixture of committee minutes, reports and correspondence, that could confirm who was an active participant in each. This analysis of secondary sources covered eight separate policy processes and produced a list of 129 individuals. However, I later removed a large UK conference that took place in early 2016 at which learning from the crisis was shared. It appeared that most delegates had no significant involvement in giving policy advice during the crisis. This reduced the network to 77….(More)”.

Three Big Things: The Most Important Forces Shaping the World


Essay by Morgan Housel: “An irony of studying history is that we often know exactly how a story ends, but have no idea where it began…

Nothing is as influential as World War II has been. But there are a few other Big Things worth paying attention to, because they’re the root influencer of so many other topics.

The three big ones that stick out are demographics, inequality, and access to information.

There are hundreds of forces shaping the world not mentioned here. But I’d argue that many, even most, are derivatives of those three.

Each of these Big Things will have a profound impact on the coming decades because they’re both transformational and ubiquitous. They impact nearly everyone, albeit in different ways. With that comes the reality that we don’t know exactly how their influence will unfold. No one in 1945 knew exactly how World War II would go on to shape the world, only that it would in extreme ways. But we can guess some of the likeliest changes.
3. Access to information closes gaps that used to create a social shield of ignorance.

Carole Cole disappeared in 1970 after running away from a juvenile detention center in Texas. She was 17.

A year later, the body of an unidentified murder victim was found in Louisiana. It was Carole, but Louisiana police had no idea. They couldn’t identify her. Carole’s disappearance went cold, as did the unidentified body.

Thirty-four years later Carole’s sister posted messages on Craigslist asking for clues into her sister’s disappearance. At nearly the same time, a sheriff’s department in Louisiana made a Facebook page asking for help identifying the Jane Doe body found 34 years before.

Six days later, someone connected the dots between the two posts.

What stumped detectives for almost four decades was solved by Facebook and Craigslist in less than a week.

This kind of stuff didn’t happen even 10 years ago. And we probably haven’t awoken to its full potential – good and bad.

The greatest innovation of the last generation has been the destruction of information barriers that used to keep strangers isolated from one another…(More)”

The Passion Economy and the Future of Work


Li Jin at Andreessen Horowitz: “The top-earning writer on the paid newsletter platform Substack earns more than $500,000 a year from reader subscriptions. The top content creator on Podia, a platform for video courses and digital memberships, makes more than $100,000 a month. And teachers across the US are bringing in thousands of dollars a month teaching live, virtual classes on Outschool and Juni Learning.

These stories are indicative of a larger trend: call it the “creator stack” or the “enterprization of consumer.” Whereas previously, the biggest online labor marketplaces flattened the individuality of workers, new platforms allow anyone to monetize unique skills. Gig work isn’t going anywhere—but there are now more ways to capitalize on creativity. Users can now build audiences at scale and turn their passions into livelihoods, whether that’s playing video games or producing video content. This has huge implications for entrepreneurship and what we’ll think of as a “job” in the future….(More)”.

Lessons Learned for New Office of Innovation


Blog by Catherine Tkachyk: “I have worked in a government innovation office for the last eight years in four different roles and two different communities. In that time, I’ve had numerous conversations on what works and doesn’t work for innovation in local government. Here’s what I’ve learned: starting an innovation office in government is hard. That is not a complaint; I love the work I do, but it comes with its own challenges. When you think about many of the services government provides: Police; Fire; Health and Human Services; Information Technology; Human Resources; Finance; etc., very few people question whether government should provide those services. They may question how they are provided, who is providing them, or how much they cost, but they don’t question the service. That’s not true for innovation offices. One of the first questions I get from people when they hear what I do is, “Why does government need an Office of Innovation?” My first answer is, “Do you like how government works? If not, then maybe there should be a group of people focused on fixing it.”

Over my career, I have come across a few lessons on how to start up an innovation office to give you the best chance for success. Some of these lessons come from listening to others, but many (probably too many) come from my own mistakes….(More)”.

Five Ethical Principles for Humanitarian Innovation


Peter Batali, Ajoma Christopher & Katie Drew in the Stanford Social Innovation Review: “…Based on this experience, UNHCR and CTEN developed a pragmatic, refugee-led, “good enough” approach to experimentation in humanitarian contexts. We believe a wide range of organizations, including grassroots community organizations and big-tech multinationals, can apply this approach to ensure that the people they aim to help hold the reins of the experimentation process.

1. Collaborate Authentically and Build Intentional Partnerships

Resource and information asymmetry are inherent in the humanitarian system. Refugees have long been constructed as “victims” in humanitarian response, waiting for “salvation” from heroic humanitarians. Researcher Matthew Zagor describes this construct as follows: “The genuine refugee … is the passive, coerced, patient refugee, the one waiting in the queue—the victim, anticipating our redemptive touch, defined by the very passivity which in our gaze both dehumanizes them, in that they lack all autonomy in our eyes, and romanticizes them as worthy in their potentiality.”

Such power dynamics make authentic collaboration challenging….

2. Avoid Technocratic Language

Communication can divide us or bring us together. Using exclusive or “expert” terminology (terms like “ideation,” “accelerator,” and “design thinking”) or language that reinforces power dynamics or assigns an outsider role (such as “experimenting on”) can alienate community participants. Organizations should aim to use inclusive language that everyone understands, as well as set a positive and realistic tone. Communication should focus on the need to co-develop solutions with the community, and the role that testing or trying something new can play….

3. Don’t Assume Caution Is Best

Research tells us that we feel more regret over actions that lead to negative outcomes than we do over inactions that lead to the same or worse outcomes. As a result, we tend to perceive and weigh action and inaction unequally. So while humanitarian organizations frequently consider the implications of our actions and the possible negative outcome for communities, we don’t always consider the implications of doing nothing. Is it ethical to continue an activity that we know isn’t as effective as it could be, when testing small and learning fast could reap real benefits? In some cases, taking a risk might, in fact, be the least risky path of action. We need to always ask ourselves, “Is it really ethical to do nothing?”…

4. Choose Experiment Participants Based on Values

Many humanitarian efforts identify participants based on their societal role, vulnerability, or other selection criteria. However, these methods often lead to challenges related to incentivization—the need to provide things like tea, transportation, or cash payments to keep participants engaged. Organizations should instead consider identifying participants who demonstrate the values they hope to promote—such as collaboration, transparency, inclusivity, or curiosity. These community members are well-poised to promote inclusivity, model positive behaviors, and engage participants across the diversity of your community….

5. Monitor Community Feedback and Adapt

While most humanitarian agencies know they need to listen and adapt after establishing communication channels, the process remains notoriously challenging. One reason is that community members don’t always share their feedback on experimentation formally; feedback sometimes comes from informal channels or even rumors. Yet consistent, real-time feedback is essential to experimentation. Listening is the pressure valve in humanitarian experimentation; it allows organizations to adjust or stop an experiment if the community flags a negative outcome….(More)”.

Why data from companies should be a common good


Paula Forteza at apolitical: “Better planning of public transport, protecting fish from intensive fishing, and reducing the number of people killed in car accidents: for these and many other public policies, data is essential.

Data applications are diverse, and their origins are equally numerous. But data is not exclusively owned by the public sector. Data can be produced by private actors such as mobile phone operators, as part of marine traffic or by inter-connected cars to give just a few examples.

The awareness around the potential of private data is increasing, as the proliferation of data partnerships between companies, governments, and local authorities shows. However, these partnerships represent only a very small fraction of what could be done.

The opening of public data, meaning that public data is made freely available to everyone, has been conducted on a wide scale in the last 10 years, pioneered by the US and UK, soon followed by France and many other countries. In 2015, France took a first step, as the government introduced the Digital Republic Bill which made data open by default and introduced the concept of public interest data. Due to a broad definition and low enforcement, the opening of private sector data is, nevertheless, still lagging behind.

The main arguments for opening private data are that it will allow better public decision-making and it could trigger a new way to regulate Big Tech. There is, indeed, a strong economic case for data sharing, because data is a non-rival good: the value of data does not diminish when shared. On the contrary, new uses can be designed and data can be enriched by aggregation, which could improve innovation for start-ups….

Why Europe needs a private data act

Data hardly knows any boundaries.

Some states are opening up private data, as France did in 2015 by creating a framework for “public interest data,” but the absence of a common international legal framework for private data sharing is a major obstacle to its development. To scale up, a European Private Data Act is needed.

This framework must acknowledge the legitimate interest of the private companies that collect and control data. Data can be their main source of income, or one they wish to develop, and this must be respected. Trade secrecy has to be protected too: data sharing is not open data.

Data can be shared with a limited and identified number of partners, and it does not always have to be free. Yet private interest must be aligned with the public good. The European Convention on Human Rights and the European Charter of Fundamental Rights acknowledge that some legitimate and proportional limitations can be set to the freedom of enterprise, which gives everyone the right to pursue their own profitable business.

The “Private Data Act” should contain several fundamental data sharing principles in line with those proposed by the European Commission in 2018: proportionality, “do no harm”, full respect of the GDPR, etc. It should also include guidelines on which data to share, how to appreciate the public interest, and in which cases data should be opened for free or how the pricing should be set.

Two methods can be considered:

  • Defining high-value datasets, as has been done for public data in the recent Open Data Directive, in areas like mobile communications, banking, transport, etc. This method is strong but is not flexible enough.
  • Alternatively, governments might define certain “public interest projects”. In doing so, governments could get access to specific data that is seen as a prerequisite to achieve the project. For example, understanding why bee mortality is increasing requires various data sources: concrete data on bee mortality from the beekeepers, data on crops and the use of pesticides from the farmers, weather data, etc. This method is more flexible and ensures that only the data needed for the project is shared.

Going ahead on open data and data sharing should be a priority for the upcoming European Commission and Parliament. Margrethe Vestager has been renewed as Competition Commissioner and Vice-President of the Commission and she already mentioned the opportunity to define access to data for newcomers in the digital market.

Public interest data is a new topic on the EU agenda and will probably become crucial in the near future….(More)”.