How Can We Overcome the Challenge of Biased and Incomplete Data?


Knowledge@Wharton: “Data analytics and artificial intelligence are transforming our lives. Be it in health care, in banking and financial services, or in times of humanitarian crises — data determine the way decisions are made. But often, the way data is collected and measured can result in biased and incomplete information, and this can significantly impact outcomes.  

In a conversation with Knowledge@Wharton at the SWIFT Institute Conference on the Impact of Artificial Intelligence and Machine Learning in the Financial Services Industry, Alexandra Olteanu, a post-doctoral researcher at Microsoft Research, U.S. and Canada, discussed the ethical and people considerations in data collection and artificial intelligence and how we can work towards removing the biases….

….Knowledge@Wharton: Bias is a big issue when you’re dealing with humanitarian crises, because it can influence who gets help and who doesn’t. When you translate that into the business world, especially in financial services, what implications do you see for algorithmic bias? What might be some of the consequences?

Olteanu: A good example is a new law in New York state, according to which insurance companies can now use social media to set the level of your premiums. But they could in fact end up using incomplete information. For instance, you might be buying your vegetables from the supermarket or a farmer’s market, but these retailers might not be tracking you on social media. So nobody knows that you are eating vegetables. On the other hand, a bakery that you visit might post something when you buy from there. Based on this, the insurance companies may conclude that you only eat cookies all the time. This shows how incomplete data can affect you….(More)”.

107 Years Later, The Titanic Sinking Helps Train Problem-Solving AI


Kiona N. Smith at Forbes: “What could the 107-year-old tragedy of the Titanic possibly have to do with modern problems like sustainable agriculture, human trafficking, or health insurance premiums? Data turns out to be the common thread. The modern world, for better or worse, increasingly turns to algorithms to look for patterns in data and make predictions based on those patterns. And the basic methods are the same whether the question they’re trying to answer is “Would this person survive the Titanic sinking?” or “What are the most likely routes for human trafficking?”

An Enduring Problem

Predicting survival at sea based on the Titanic dataset is a standard practice problem for aspiring data scientists and programmers. Here’s the basic challenge: feed your algorithm a portion of the Titanic passenger list, which includes some basic variables describing each passenger and their fate. From that data, the algorithm (if you’ve programmed it well) should be able to draw some conclusions about which variables made a person more likely to live or die on that cold April night in 1912. To test its success, you then give the algorithm the rest of the passenger list (minus the outcomes) and see how well it predicts their fates.
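The train-then-test loop described above can be sketched in a few lines of Python. The inline passenger rows and the simple survival-rate lookup table below are illustrative stand-ins, not the real Kaggle dataset or a production classifier:

```python
from collections import defaultdict

# (sex, passenger_class, survived) -- a tiny, made-up stand-in for the
# real passenger list, used only to illustrate the train/predict/score loop.
passengers = [
    ("female", 1, 1), ("male", 1, 0), ("female", 2, 1), ("male", 2, 0),
    ("female", 3, 1), ("male", 3, 0), ("female", 3, 0), ("male", 1, 1),
    ("female", 1, 1), ("male", 3, 0), ("female", 2, 1), ("male", 2, 0),
]
train, test = passengers[:8], passengers[8:]

# "Training": observed survival rate for each (sex, class) group.
totals = defaultdict(lambda: [0, 0])          # group -> [survivors, count]
for sex, pclass, survived in train:
    totals[(sex, pclass)][0] += survived
    totals[(sex, pclass)][1] += 1

def predict(sex, pclass):
    # Predict survival when the group's survival rate is at least 50%;
    # unseen groups default to "did not survive".
    survivors, count = totals.get((sex, pclass), (0, 1))
    return 1 if survivors / count >= 0.5 else 0

# "Testing": score predictions against the held-out passengers' fates.
correct = sum(predict(sex, pclass) == fate for sex, pclass, fate in test)
print(f"holdout accuracy: {correct / len(test):.2f}")  # -> 1.00 on this toy split
```

A real attempt would use the full passenger list and a proper model, but the shape of the exercise, fit on one slice of the data and evaluate on the rest, is exactly this.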

Online communities like Kaggle.com have held competitions to see who can develop the algorithm that predicts survival most accurately, and it’s also a common problem presented to university classes. The passenger list is big enough to be useful, but small enough to be manageable for beginners. There’s a simple set of outcomes — life or death — and around a dozen variables to work with, so the problem is simple enough for beginners to tackle but just complex enough to be interesting. And because the Titanic’s story is so famous, even more than a century later, the problem still resonates.

“It’s interesting to see that even in such a simple problem as the Titanic, there are nuggets,” said Sagie Davidovich, Co-Founder & CEO of SparkBeyond, who used the Titanic problem as an early test for SparkBeyond’s AI platform and still uses it as a way to demonstrate the technology to prospective customers….(More)”.

Can tracking people through phone-call data improve lives?


Amy Maxmen in Nature: “After an earthquake tore through Haiti in 2010, killing more than 100,000 people, aid agencies spread across the country to work out where the survivors had fled. But Linus Bengtsson, a graduate student studying global health at the Karolinska Institute in Stockholm, thought he could answer the question from afar. Many Haitians would be using their mobile phones, he reasoned, and those calls would pass through phone towers, which could allow researchers to approximate people’s locations. Bengtsson persuaded Digicel, the biggest phone company in Haiti, to share data from millions of call records from before and after the quake. Digicel replaced the names and phone numbers of callers with random numbers to protect their privacy.
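The pipeline Bengtsson relied on can be sketched roughly as follows. The record format, phone numbers, and tower IDs here are invented for illustration, and `secrets.token_hex` merely stands in for whatever pseudonymization Digicel actually applied:

```python
import secrets
from collections import defaultdict

# (phone_number, tower_id, day) -- made-up sample call-detail records
calls = [
    ("+509-1111", "PAP-01", "2010-01-11"),
    ("+509-1111", "LEO-03", "2010-01-13"),
    ("+509-2222", "PAP-01", "2010-01-11"),
    ("+509-2222", "PAP-02", "2010-01-13"),
    ("+509-3333", "PAP-01", "2010-01-13"),
]

# Pseudonymize: map each number to a random token, as Digicel did before
# sharing; in practice the mapping would then be discarded.
pseudonyms = {}
def pseudonym(number):
    if number not in pseudonyms:
        pseudonyms[number] = secrets.token_hex(8)
    return pseudonyms[number]

anonymized = [(pseudonym(n), tower, day) for n, tower, day in calls]

# Aggregate: distinct callers seen at each tower on each day, a rough
# proxy for where the population is located over time.
callers = defaultdict(set)
for pid, tower, day in anonymized:
    callers[(tower, day)].add(pid)

for (tower, day), ids in sorted(callers.items()):
    print(tower, day, len(ids))
```

Comparing such per-tower counts before and after a disaster is, in miniature, how a dip in Port-au-Prince’s population could show up in call records.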

Bengtsson’s idea worked. The analysis wasn’t completed or verified quickly enough to help people in Haiti at the time, but in 2012, he and his collaborators reported that the population of Haiti’s capital, Port-au-Prince, dipped by almost one-quarter soon after the quake, and slowly rose over the next 11 months. That result aligned with an intensive, on-the-ground survey conducted by the United Nations.

Humanitarians and researchers were thrilled. Telecommunications companies scrutinize call-detail records to learn about customers’ locations and phone habits and improve their services. Researchers suddenly realized that this sort of information might help them to improve lives. Even basic population statistics are murky in low-income countries where expensive household surveys are infrequent, and where many people don’t have smartphones, credit cards and other technologies that leave behind a digital trail, making remote-tracking methods used in richer countries too patchy to be useful.

Since the earthquake, scientists working under the rubric of ‘data for good’ have analysed calls from tens of millions of phone owners in Pakistan, Bangladesh, Kenya and at least two dozen other low- and middle-income nations. Humanitarian groups say that they’ve used the results to deliver aid. And researchers have combined call records with other information to try to predict how infectious diseases travel, and to pinpoint locations of poverty, social isolation, violence and more (see ‘Phone calls for good’)….(More)”.

Africa must reap the benefits of its own data


Tshilidzi Marwala at Business Insider: “Twenty-two years ago when I was a doctoral student in artificial intelligence (AI) at the University of Cambridge, I had to create all the AI algorithms I needed to understand the complex phenomena related to this field.

For starters, AI is computer software that performs intelligent tasks that normally require human beings, while an algorithm is a set of rules that instructs a computer to execute specific tasks. In that era, the ability to create AI algorithms was more important than the ability to acquire and use data.

Google has created an open-source library called TensorFlow, which contains implementations of many widely used AI algorithms. In this way Google wants people to develop applications (apps) using its software, with the payoff being that Google will collect data on any individual using the apps developed with TensorFlow.

Today, an AI algorithm is not a competitive advantage but data is. The World Economic Forum calls data the new “oxygen”, while Chinese AI specialist Kai-Fu Lee calls it the new “oil”.

Africa’s population is increasing faster than that of any other region in the world. The continent has a population of 1.3-billion people and a total nominal GDP of $2.3-trillion. This increase in population is in effect an increase in data, and if data is the new oil, it is akin to an increase in oil reserves.

Even oil-rich countries such as Saudi Arabia do not experience an increase in their oil reserves. How do we as Africans take advantage of this huge amount of data?

There are two categories of data in Africa: heritage and personal. Heritage data resides in society, whereas personal data resides in individuals. Heritage data includes data gathered from our languages, emotions and accents. Personal data includes health, facial and fingerprint data.

Facebook, Amazon, Apple, Netflix and Google are data companies. They trade data with advertisers, banks and political parties, among others. For example, the controversial firm Cambridge Analytica harvested Facebook data to influence the 2016 US presidential election, potentially contributing to Donald Trump’s victory.

The company Google collects language data to build an application called Google Translate that translates from one language to another. This app claims to cover African languages such as Zulu, Yoruba and Swahili. Google Translate is less effective in handling African languages than it is in handling European and Asian languages.

Now, how do we capitalise on our language heritage to create economic value? We need to build our own language database and create our own versions of Google Translate.

An important area is the creation of an African emotion database. Different cultures exhibit emotions differently, and emotion data is very important in areas such as the safety of cars and aeroplanes. If we can build a system that can read pilots’ emotions, this would enable us to establish whether a pilot is in a good state of mind to operate an aircraft, which would increase safety.

To capitalise on the African emotion database, we should create a data bank that captures emotions of African people in various parts of the continent, and then use this database to create AI apps to read people’s emotions. Mercedes-Benz has already implemented the “Attention Assist”, which alerts drivers to fatigue.

Another important area is the creation of an African health database. AI algorithms are able to diagnose diseases better than human doctors. However, these algorithms depend on the availability of data. To capitalise on this, we need to collect such data and use it to build algorithms that will be able to augment medical care….(More)”.

Airbnb and New York City Reach a Truce on Home-Sharing Data


Paris Martineau at Wired: “For much of the past decade, Airbnb and New York City have been embroiled in a high-profile feud. Airbnb wants legitimacy in its biggest market. City officials want to limit home-sharing platforms, which they argue exacerbate the city’s housing crisis and pose safety risks by allowing people to transform homes into illegal hotels.

Despite years of lawsuits, countersuits, lobbying campaigns, and failed attempts at legislation, progress on resolving the dispute has been incremental at best. The same could be said for many cities around the nation, as local government officials struggle to come to grips with the increasing popularity of short-term rental platforms like Airbnb, HomeAway, and VRBO in high-tourism areas.

In New York last week, there were two notable breaks in the logjam. On May 14, Airbnb agreed to give city officials partially anonymized host and reservation data for more than 17,000 listings. Two days later, a judge ordered Airbnb to turn over more detailed and nonanonymized information on dozens of hosts and hundreds of guests who have listed or stayed in more than a dozen buildings in Manhattan, Brooklyn, and Queens in the past seven years.

In both cases, the information will be used by investigators with the Mayor’s Office of Special Enforcement to identify hosts and property owners who may have broken the city’s notoriously strict short-term rental laws by listing residences on Airbnb as de facto hotels.

City officials originally subpoenaed Airbnb for the data—not anonymized—on the more than 17,000 listings in February. Mayor Bill de Blasio called the move an effort to force the company to “come clean about what they’re actually doing in this city.” The agreement outlining the data sharing was signed as a compromise on May 14, according to court records.

In addition to the 17,000 listings identified by the city, Airbnb will also share data on every listing rented through its platform between January 1, 2018, and February 18, 2019, that could have potentially violated New York’s short-term rental laws. The city prohibits rentals of an entire apartment or home for less than 30 days without the owner present in the unit, making many stays traditionally associated with services like Airbnb, HomeAway, and VRBO illegal. Only up to two guests are permitted in the short-term rental of an apartment or room, and they must be given “free and unobstructed access to every room and to each exit within the apartment,” meaning hosts can’t get around the ban on whole-apartment rentals by renting out three separate private rooms at once….(More)”.
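The rules in the paragraph above reduce to a few boolean checks. Here is a minimal sketch, with a hypothetical function name and deliberately simplified inputs; the actual law has edge cases this ignores:

```python
# Hedged sketch of New York's short-term rental rules as described above
# (hypothetical helper, not legal logic): whole-unit stays under 30 days
# without the host present are banned, and short-term stays are capped
# at two guests.
def violates_ny_short_term_rules(stay_nights, whole_unit, host_present, guests):
    if stay_nights >= 30:
        return False      # 30+ night rentals fall outside the ban
    if whole_unit and not host_present:
        return True       # whole-apartment short stay with host absent
    if guests > 2:
        return True       # short-term stays allow at most two guests
    return False

# Example listings
print(violates_ny_short_term_rules(3, True, False, 2))   # whole apt, 3 nights -> True
print(violates_ny_short_term_rules(45, True, False, 4))  # long-term lease -> False
print(violates_ny_short_term_rules(2, False, True, 2))   # hosted private room -> False
```

In effect, the data Airbnb hands over lets investigators evaluate checks like these against real reservations at scale.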

Companies That Rely On Census Data Worry Citizenship Question Will Hurt


Hansi Lo Wang at NPR: “Some critics of the citizenship question the Trump administration wants to add to the 2020 census are coming from a group that tends to stay away from politically heated issues — business leaders.

From longtime corporations like Levi Strauss & Co. to upstarts like Warby Parker, some companies say that including the question — “Is this person a citizen of the United States?” — could harm not only next year’s national head count, but also their bottom line.

How governments use census data is a common refrain in the lead-up to a constitutionally mandated head count of every person living in the U.S. The new population counts, gathered once a decade, are used to determine how congressional seats and Electoral College votes are distributed among the states. They also guide how hundreds of billions in federal tax dollars are spread around the country to fund public services.

What is often less visible is how the census data undergird decisions made by large and small businesses across the country. The demographic information the census collects — including the age, sex, race, ethnicity and housing status of all U.S. residents — informs business owners about who their existing and future customers are, which new products and services those markets may want and where to build new locations.

Weeks before the Supreme Court heard oral arguments over the citizenship question last month, more than two dozen companies and business groups filed a friend-of-the-court brief against the question. Its potential impact on the accuracy of census data, especially data about immigrants and people of color, is drawing concern from Lyft and Uber, as well as from Levi Strauss, Warby Parker and Univision.

“We don’t view this as a political situation at all,” says Christine Pierce, the senior vice president of data science at Nielsen — a major data analytics company in the business world that filed its own brief with the high court. “We see this as one that is around sound research and good science.”…(More)”.

Facebook releases a trio of maps to aid with fighting disease outbreaks


Sarah Perez at Techcrunch: “Facebook… announced a new initiative focused on using its data and technologies to help nonprofit organizations and universities working in public health better map the spread of infectious diseases around the world. Specifically, the company is introducing three new maps: population density maps with demographic estimates, movement maps and network coverage maps. These, says Facebook, will help the health partners to understand where people live, how they’re moving and if they have connectivity — all factors that can aid in determining how to respond to outbreaks, and where supplies should be delivered.

As Facebook explained, health organizations rely on information like this when planning public health campaigns. But much of the information they rely on is outdated, like older census data. In addition, information from more remote communities can be scarce.

By combining the new maps with other public health data, Facebook believes organizations will be better equipped to address epidemics.

The new high-resolution population density maps will estimate the number of people living within 30-meter grid tiles, and provide insights on demographics, including the number of children under five, the number of women of reproductive age, as well as young and elderly populations. These maps aren’t built using Facebook data, but are instead built by using Facebook’s AI capabilities with satellite imagery and census information.
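The underlying idea, often called dasymetric mapping, can be illustrated with a toy example: spread a census count over only those grid tiles where imagery classification found buildings. The grid and tract count below are invented, and Facebook’s actual AI pipeline is far more sophisticated than this sketch:

```python
# Toy dasymetric allocation: 1 = tile contains buildings (from a
# hypothetical satellite-imagery classifier), 0 = empty tile.
built_up = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
]
tract_population = 500  # census count for the whole area (made up)

# Spread the tract's population evenly across built-up tiles only.
tiles_with_buildings = sum(cell for row in built_up for cell in row)
per_tile = tract_population / tiles_with_buildings

density_map = [
    [per_tile if cell else 0.0 for cell in row] for row in built_up
]
print(density_map[0][0])  # -> 100.0 people in the first built-up tile
```

The same principle, refined with demographic breakdowns and much finer building detection, is what lets the maps estimate how many children under five or women of reproductive age live in each 30-meter tile.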

Movement maps, meanwhile, track aggregate data about Facebook users’ movements via their mobile phones (when location services are enabled). At scale, health partners can combine this with other data to predict where other outbreaks may occur next….(More)”.

Open data could have helped us learn from another mining dam disaster


Paulo A. de Souza Jr. at Nature: “The recent Brumadinho dam disaster in Brazil is an example of infrastructure failure with catastrophic consequences. Over 300 people were reported dead or missing, and nearly 400 more were rescued alive. The environmental impact is massive and difficult to quantify. The frequency of these disasters demonstrates that the current systems for monitoring integrity, and for alerting managers, authorities and the public to ongoing change in tailings, are in many cases not working as they should. There is also the need for adequate prevention procedures. Monitoring can be perfect, but without timely and appropriate action, it will be useless. Good management therefore requires quality data. Undisputedly, management practices of industrial sites, including audit procedures, must improve, and data and metadata available from preceding accidents should be better used. There is a rich literature available about the design, construction, operation, maintenance and decommissioning of tailings facilities. These include guidelines, standards, case studies, technical reports, consultancy and audit practices, and scientific papers. Regulation varies from country to country and in some cases, as in Australia and Canada, is controlled by individual state agencies. There are, however, few datasets that are shared with the wider technical and scientific community, particularly for prior incidents. Conspicuously lacking are comprehensive data related to the monitoring of large infrastructures such as mining dams.

Today, Scientific Data published a Data Descriptor presenting a dataset obtained from 54 laboratory experiments on the breaching of fluvial dikes because of flow overtopping. (Re)use of such data can help improve our understanding of fundamental processes underpinning industrial infrastructure collapse (e.g., fluvial dike breaching, mining dam failure), and assess the accuracy of numerical models for the prediction of such incidents. This is absolutely essential for better management of floods, mitigation of dam collapses, and similar accidents. The authors propose a framework that could exemplify how data involving similar infrastructure can be stored, shared, published, and reused…(More)”.

When to Use User-Centered Design for Public Policy


Stephen Moilanen at the Stanford Social Innovation Review: “Throughout Barack Obama’s presidency, technology company executives regularly sounded off on what, from their perspective, the administration might do differently. In 2010, Steve Jobs reportedly warned Obama that he likely wouldn’t win reelection, because his administration’s policies disadvantaged businesses like Apple. And in a speech at the 2016 Republican National Convention, Peter Thiel expressed his disapproval of the political establishment by quipping, “Instead of going to Mars, we have invaded the Middle East.”

Against this backdrop, one specific way Silicon Valley has tried to nudge Washington in a new direction is with respect to policy development. Specifically, leading technologists have begun encouraging policy makers to apply user-centered design (otherwise known as design thinking or human-centered design) to the public sector. The thinking goes that if government develops policy with users more squarely in mind, it might accelerate social progress rather than—as has often been the case—stifle it.

At a moment when fewer Americans than ever believe government is meeting their needs, a new approach that elevates the voices of citizens is long overdue. Even so, it would be misguided to view user-centered design as a cure-all for what ails the public sector. The approach holds great promise, but only in a well-defined set of circumstances.

User-Centered Design in the Public Policy Arena

The term “user-centered design” refers simply to a method of building products with an eye toward what users want and need.

To date, the approach has been applied primarily to the domain of for-profit start-ups. In recent months and years, however, supporters of user-centered design have sought to introduce it to other domains. A 2013 article authored by the head of a Danish design consultancy, for example, heralded the fact that “public sector design is on the rise.” And in the recent book Lean Impact, former Google executive and USAID official Ann-Mei Chang made an incisive and compelling case for why the social sector stands to benefit from this approach.

According to this line of thinking, we should be driving toward a world where government designs policy with an eye toward the individuals that stand to benefit from—or that could be hurt by—changes to public policy.

An Imperfect Fit

The merits of user-centered design in this context may seem self-evident. Yet it stands in stark contrast to how public sector leaders typically approach policy development. As leading design thinking theorist Jeanne Liedtka notes in her book Design Thinking for the Greater Good, “Innovation and design are [currently] the domain of experts, policy makers, planners and senior leaders. Everyone else is expected to step away.”

But while user-centered design has much to offer policy development, it does not map perfectly onto this new territory….(More)”.

San Francisco becomes the first US city to ban facial recognition by government agencies


Colin Lecher at The Verge: “In a first for a city in the United States, San Francisco has voted to ban its government agencies from using facial recognition technology.

The city’s Board of Supervisors voted eight to one to approve the proposal, set to take effect in a month, that would bar city agencies, including law enforcement, from using the tool. The ordinance would also require city agencies to get board approval for their use of surveillance technology, and set up audits of surveillance tech already in use. Other cities have approved similar transparency measures.

The plan, called the Stop Secret Surveillance Ordinance, was spearheaded by Supervisor Aaron Peskin. In a statement read ahead of the vote, Peskin said it was “an ordinance about having accountability around surveillance technology.”

“This is not an anti-technology policy,” he said, stressing that many tools used by law enforcement are still important to the city’s security. Still, he added, facial recognition is “uniquely dangerous and oppressive.”

The ban comes amid a broader debate over facial recognition, which can be used to rapidly identify people and has triggered new questions about civil liberties. Experts have raised specific concerns about the tools, as studies have demonstrated troubling instances of bias and high error rates.

Microsoft, which offers facial recognition tools, has called for some form of regulation for the technology — but how, exactly, to regulate the tool has been contested. Proposals have ranged from light regulation to full moratoriums. Legislation has largely stalled, however.

San Francisco’s decision will inevitably be used as an example as the debate continues and other cities and states decide whether and how to regulate facial recognition. Civil liberties groups like the ACLU of Northern California have already thrown their support behind the San Francisco plan, while law enforcement in the area has pushed back….(More)”.