Data in public health


Jeremy Berg in Science: “In 1854, physician John Snow helped curtail a cholera outbreak in a London neighborhood by mapping cases and identifying a central public water pump as the potential source. This event is considered by many to represent the founding of modern epidemiology. Data and analysis play an increasingly important role in public health today. This can be illustrated by examining the rise in the prevalence of autism spectrum disorders (ASDs), where data from varied sources highlight potential factors while ruling out others, such as childhood vaccines, facilitating wise policy choices…. A collaboration between the research community, a patient advocacy group, and a technology company (www.mss.ng) seeks to sequence the genomes of 10,000 well-phenotyped individuals from families affected by ASD, making the data freely available to researchers. Studies to date have confirmed that the genetics of autism are extremely complicated—a small number of genomic variations are closely associated with ASD, but many other variations have much lower predictive power. More than half of siblings, each of whom has ASD, have different ASD-associated variations. Future studies, facilitated by an open data approach, will no doubt help advance our understanding of this complex disorder….

A new data collection strategy was reported in 2013 to examine contagious diseases across the United States, including the impact of vaccines. Researchers digitized all available city and state notifiable disease data from 1888 to 2011, mostly from hard-copy sources. Information corresponding to nearly 88 million cases has been stored in a database that is open to interested parties without restriction (www.tycho.pitt.edu). Analyses of these data revealed that vaccine development and systematic vaccination programs have led to dramatic reductions in the number of cases. Overall, it is estimated that ∼100 million cases of serious childhood diseases have been prevented through these vaccination programs.

These examples illustrate how data collection and sharing through publication and other innovative means can drive research progress on major public health challenges. Such evidence, particularly on large populations, can help researchers and policy-makers move beyond anecdotes—which can be personally compelling, but often misleading—for the good of individuals and society….(More)”

Chile’s ‘Uber of Recycling’ Is Sparking a Recycling Revolution


Tomas Urbina at Motherboard: “In 2015, after finishing a soccer game in Chile’s capital, Santiago, engineering student Cristián Lara and his friends noticed an older man picking through a dumpster nearby. He was searching for anything that could be recycled, and loading it onto his bike.

“It looked like incredibly hard work,” Lara recalled. After talking to the man, they learned he had been doing the same work for 10 years and was still living in poverty.

The encounter gave Lara an idea. What if there was a way to connect the collector on the street directly to the massive waste streams that exist in Chile, and to the companies that pay decent money for recyclables?

“We knew we had to do something,” said 24-year-old Lara. That’s how a recycling app startup, called ReciclApp, was born. The app launched last August. Since then, the bearded young entrepreneur has been on a mission. Standing in their section of an open collaborative workspace on the fifth floor of the luminous new innovation centre at Santiago’s Catholic University, Lara let his glee shine through in his elevator pitch for the app.

“It’s the Uber of recycling,” he said.

It works like this: individuals, businesses, and institutions download the free app. Once they have cans, boxes or bottles to get rid of, they declare specific numbers in the app and choose a date and time period for pickup. From that data, the company creates and prints out routes for the collectors they work with. There are now an average of 200 collectors working with ReciclApp across Chile, and about 1,000 app users in the country.
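
The article does not publish ReciclApp’s internals, but a minimal sketch of the declare-then-route flow it describes might look like the following (Python; every class, field and number here is a hypothetical stand-in, not the company’s actual schema):

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class PickupDeclaration:
    """One request submitted through the app (all field names are hypothetical)."""
    user_id: str
    zone: str          # neighborhood or district used for routing
    cans: int
    boxes: int
    bottles: int
    pickup_date: str   # e.g. "2017-03-14"
    time_window: str   # e.g. "09:00-12:00"

def build_routes(declarations, collectors):
    """Group declarations by date and zone, then assign each group to a
    collector round-robin so every collector gets a printable route."""
    groups = defaultdict(list)
    for d in declarations:
        groups[(d.pickup_date, d.zone)].append(d)

    routes = defaultdict(list)
    for i, (key, stops) in enumerate(sorted(groups.items())):
        collector = collectors[i % len(collectors)]
        routes[collector].append({"date_zone": key, "stops": stops})
    return routes

# Example usage with made-up declarations
decls = [
    PickupDeclaration("u1", "Providencia", 30, 5, 12, "2017-03-14", "09:00-12:00"),
    PickupDeclaration("u2", "Providencia", 10, 2, 40, "2017-03-14", "14:00-17:00"),
    PickupDeclaration("u3", "Ñuñoa", 0, 8, 25, "2017-03-15", "09:00-12:00"),
]
print(build_routes(decls, ["collector_a", "collector_b"]))
```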

For collectors, it’s an efficient route with guaranteed recyclables, and they keep all the money they make. Lara’s team cuts out the middleman transporters who would previously take the material to large recycling companies. ReciclApp even has designated storage centres where collectors can leave material before a truck from large recyclers shows up….

Lara estimates that there are about 100,000 people trying to earn money from recycling in Chile. Those who work with ReciclApp have more than doubled their recycling earnings on average, from about $100 USD per month to $250 USD. But even that, Lara admitted, is a small gain when you consider Chile’s high cost of living….

ReciclApp intends to change that. “We’re going to start hiring waste collectors, so they’ll have a set wage, a schedule, and can earn extra income based on how much they collect and how many homes or businesses they visit,” said ReciclApp’s director of operations, 25-year-old Manuel Fonseca….

For Fuentes, 40, the biggest improvement is how she’s treated. “Families value us as workers now, not as the lady who asks for donations and picks through the garbage,” she said. “We spent too many years hidden in the shadows. I feel different now. I’m not embarrassed of my work the way I used to be.”….(More)”

Using Algorithms To Predict Gentrification


Tanvi Misra in CityLab: “I know it when I see it,” is as true for gentrification as it is for pornography. Usually, it’s when a neighborhood’s property values and demographics are already changing that the worries about displacement set in—rousing housing advocates and community organizers to action. But by that time, it’s often hard to pause, and put in safeguards for the neighborhood’s most vulnerable residents.

But what if there were an early warning system that detected where price appreciation or decline was about to occur? Predictive tools like this have been developed around the country, most notably by researchers in San Francisco. And their value is clear: city leaders and non-profits can pinpoint ahead of time where to preserve existing affordable housing, where to build more, and where to attract business investment. But these tools are often too academic or too obscure, which is why it’s not yet clear how they’re being used by policymakers and planners.

That’s the problem Ken Steif, at the University of Pennsylvania, is working to solve, in partnership with Alan Mallach, from the Center for Community Progress.

Mallach’s non-profit focuses on revitalizing distressed neighborhoods, particularly in “legacy cities.” These are towns like St. Louis, Flint, Dayton, and Baltimore that have experienced population loss and economic contraction in recent years, and suffer from property vacancies, blight, and unemployment. Mallach is interested in understanding which neighborhoods are likely to continue down that path, and which ones will do a 180-degree turn. Right now, he can make those predictions intuitively, based on his observations of neighborhood characteristics like housing stock, median income, and race. But an objective assessment can help confirm or deny his hypotheses.

That’s where Steif comes in. Having consulted with cities and non-profits on place-based data analytics, Steif has developed a number of algorithms that predict the movement of housing markets using expensive private data from entities like Zillow. Mallach suggested he try his algorithms on Census data, which is free and standardized.

The phenomenon he tested was ‘endogenous gentrification’—the idea that an increase in home prices moves from wealthy neighborhoods to less expensive ones in their vicinity, like a wave…. Steif used Census data from 1990 and 2000 to predict housing price change in 2010 in 29 big and small legacy cities. His algorithms took into account the relationship between a census tract’s median home price and those of the tracts around it, the proximity of census tracts to high-cost areas, and the spatial patterns in home price distribution. They also folded in variables like race, income and housing supply, among others.
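
Steif’s actual models are not published in the article, but the kind of spatial features it describes—a tract’s price relative to its neighbors and its proximity to high-cost areas, fed into a model alongside Census variables—can be sketched in a few lines. The toy example below (Python, with made-up prices for four tracts) illustrates the approach; it is not a reconstruction of his algorithms:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def spatial_features(prices, neighbors, high_cost_idx, dist):
    """Per-tract features of the kind the article describes: the tract's own
    price, its price relative to the mean of surrounding tracts, and its
    distance to the nearest high-cost tract. (The real work also folds in
    Census variables such as race, income and housing supply.)"""
    feats = []
    for i, p in enumerate(prices):
        nbr_mean = np.mean([prices[j] for j in neighbors[i]]) if neighbors[i] else p
        nearest_high = min(dist[i][j] for j in high_cost_idx)
        feats.append([p, p / nbr_mean, nearest_high])
    return np.array(feats)

# Toy data: 4 tracts, tract 0 is the high-cost one; the price "wave" reaches
# adjacent tract 1 first in the 2000 data.
prices_1990 = [300_000, 120_000, 110_000, 95_000]
prices_2000 = [360_000, 180_000, 130_000, 100_000]
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dist = [[0, 1, 2, 3], [1, 0, 1, 2], [2, 1, 0, 1], [3, 2, 1, 0]]

X = spatial_features(prices_1990, neighbors, [0], dist)
model = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, prices_2000)
X_2000 = spatial_features(prices_2000, neighbors, [0], dist)
print(model.predict(X_2000))   # rough next-decade projection under this toy setup
```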

After cross-checking the 2010 prediction with actual home prices, he projected the neighborhood change all the way to 2020. His algorithms were able to compute the speed and breadth of the wave of gentrification over time reasonably well, overall…(More)”.

Why Big Data Is a Big Deal for Cities


John M. Kamensky in Governing: “We hear a lot about “big data” and its potential value to government. But is it really fulfilling the high expectations that advocates have assigned to it? Is it really producing better public-sector decisions? It may be years before we have definitive answers to those questions, but new research suggests that it’s worth paying a lot of attention to.

University of Kansas Prof. Alfred Ho recently surveyed 65 mid-size and large cities to learn what is going on, on the front line, with the use of big data in making decisions. He found that big data has made it possible to “change the time span of a decision-making cycle by allowing real-time analysis of data to instantly inform decision-making.” This decision-making occurs in areas as diverse as program management, strategic planning, budgeting, performance reporting and citizen engagement.

Cities are natural repositories of big data that can be integrated and analyzed for policy- and program-management purposes. These repositories include data from public safety, education, health and social services, environment and energy, culture and recreation, and community and business development. They include both structured data, such as financial and tax transactions, and unstructured data, such as recorded sounds from gunshots and videos of pedestrian movement patterns. And they include data supplied by the public, such as the Boston residents who use a phone app to measure road quality and report problems.

These data repositories, Ho writes, are “fundamental building blocks,” but the challenge is to shift the ownership of data from separate departments to an integrated platform where the data can be shared.

There’s plenty of evidence that cities are moving in that direction and that they already are systematically using big data to make operational decisions. Among the 65 cities that Ho examined, he found that 49 have “some form of data analytics initiatives or projects” and that 30 have established “a multi-departmental team structure to do strategic planning for these data initiatives.”….The effective use of big data can lead to dialogs that cut across school-district, city, county, business and nonprofit-sector boundaries. But more importantly, it provides city leaders with the capacity to respond to citizens’ concerns more quickly and effectively….(More)”

Organizational crowdsourcing


Jeremy Morgan at Lippincott: “One of the most consequential insights from the study of organizational culture happens to have an almost irresistible grounding in basic common sense. When attempting to solve the challenges of today’s businesses, inviting a broad slice of an employee population yields more creative, actionable solutions than restricting the conversation to a small strategy or leadership team.

This recognition, that in order to uncover new business ideas and innovations, organizations must foster listening cultures and a meritocracy of best thinking, is fueling interest in organizational crowdsourcing — a discipline focused on employee connection, collaboration and ideation. Leaders at companies such as Roche, Bank of the West, Merck, Facebook and IBM, along with countless Silicon Valley companies for whom the “hackathon” is a major cultural event, have embraced employee crowdsourcing as a way to unlock organizational knowledge and promote empathy through technology.

The benefits of internal crowdsourcing are clear. First, it ensures that a company’s understanding of key change drivers and potential strategic priorities is grounded in the organization’s everyday reality and not in abstract hypotheses developed by a team of strategists. Second, employees inherently believe in and want to own the implementation of ideas that they generate through crowdsourcing. These are ideas born of the culture, for the culture, and are less likely to run aground on the rocks of employee indifference….

How can this be achieved through organizational crowdsourcing?

There is no out-of-the-box solution. Each campaign has to organically surface areas of focus for further inquiries, develop a framework and set of questions to guide participation and ignite conversations, and then analyze and communicate results in a way that helps bring solutions to life. But there are some key principles that will maximize the success of any crowdsourcing effort.

Obtaining insightful and actionable answers boils down to asking the questions at just the right altitude. If the questions are too broad and open-ended — “How can we make our workplace better?” — the usefulness of the feedback will suffer: you will likely hear responses like “juice bars” and “massage therapists.” If the questions are too narrow — “What kind of lighting do we need in our conference rooms?” — you limit the opportunity for people to use their creativity. However, the answers are likely to spark a conversation if people are asked, “How can we create spaces that allow us to generate ideas more effectively?” Conversation will flow to discussion of breaking down physical barriers in office design, building social “hubs” and investing in live events that allow employees from disparate geographies to meet in person and solve problems together.

On the technology side, crowdsourcing platforms such as Jive Software and UserVoice, among others, make it easy to bring large numbers of employees together to gather, build upon and prioritize new ideas and innovation efforts, from process simplification and product development to the transformation of customer experiences. Respondents can vote on other people’s suggestions and add comments.

By facilitating targeted conversations across time zones, geographies and corporate functions, crowdsourcing makes possible a new way of listening: of harnessing an organization’s collective wisdom to achieve action by a united and inspired employee population. It’s amazing to see the thoughtfulness, precision and energy unleashed by crowdsourcing efforts. People genuinely want to contribute to their company’s success if you open the doors and let them.

Taking a page from the Silicon Valley hackathon, organizational crowdsourcing campaigns are structured as events of limited duration focused on a specific challenge or business problem….(More)”

Dumpster diving made easier with food donation points


Springwise: “With food waste a substantial contributor to both environmental and social problems, communities around the world are trying to find ways to make better use of leftovers as well as reduce the overall production of unused foodstuffs. One of the biggest challenges in getting leftovers to the people who need them is the logistics of finding and connecting the relevant groups and transporting the food. Several on-demand apps, like this one that matches homeless shelters with companies that have leftover food, are taking the guesswork out of what to do with available food. And retailers are getting smarter, like this one in the United States, now selling produce that would previously have been rejected for aesthetic reasons only.

In Brazil, the Makers Society collective designed a campaign called Prato de Rua (Street Dish) to help link people in possession of edible leftovers with community members in need. The campaign centers on a sticker affixed to the side of city dumpsters requesting that donated food be left at those specific points. By providing a more organized approach to getting rid of leftover food, the collective hopes to help people think more carefully about what they are getting rid of and why. At the same time, the initiative helps people who would otherwise be forced to go through the contents of a dumpster for edible remains to access good food more safely and swiftly.

The campaign sticker is available for download for communities globally to take on and adapt the idea….(More)”

Unconscious gender bias in the Google algorithm


Interview in Metode with Londa Schiebinger, director of Gendered Innovations: “We were interested because the methods of sex and gender analysis are not in the university curriculum, yet they are very important. The first thing our group did was to develop those methods and we present twelve methods on the website. We knew it would be very important to create case studies or concrete examples where sex and gender analysis added something new to the research. One of my favorite examples is machine translation. If you look at Google Translate, which is the main one in the United States – SYSTRAN is the main one in Europe – we found that it defaults to the masculine pronoun. So does SYSTRAN. If I put an article about myself into Google Translate, it defaults to «he said» instead of «she said». So, in an article about one of my visits to Spain, it defaults to «he thinks, he says…» and, occasionally, «it wrote». We wondered why this happened and we found out: because Google Translate works on an algorithm, the problem is that «he said» appears on the web four times more often than «she said», so the machine gets it right if it chooses «he said». The algorithm is just set up for that. But, anyway, we found that there was a huge change in the English language from 1968 to the current time, and the proportion of «he said» and «she said» changed from 4-to-1 to 2-to-1. But, still, the translation does not take this into account. So we went to Google and we said «Hey, what is going on?» and they said «Oh, wow, we didn’t know, we had no idea!». So what we recognized is that there is an unconscious gender bias in the Google algorithm. They did not intend to do this at all, so now there are a lot of people who are trying to fix it….

How can you fix that?

Oh, well, this is the thing! …I think algorithms in general are a problem because if there is any kind of unconscious bias in the data, the algorithm just returns that to you. So even though Google has policies — company policies — to support gender equality, they had an unconscious bias in their product and they did not mean to. Now that they know about it, they can try to fix it….(More)”
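
The frequency default Schiebinger describes is easy to reproduce in miniature. The sketch below (Python) is not Google’s actual system — the phrases, counts and the Spanish source word are illustrative assumptions — but it shows how choosing the most frequent target phrase, with no information about the subject’s gender, always returns the majority form:

```python
from collections import Counter

# Hypothetical corpus counts standing in for web-scale statistics:
# "he said" historically appears roughly four times as often as "she said".
corpus_counts = Counter({("dijo", "he said"): 4, ("dijo", "she said"): 1})

def translate_phrase(source_phrase):
    """Pick the target phrase with the highest corpus frequency for this source.
    The gender of the sentence's actual subject never enters the decision,
    so the majority (masculine) form wins every time."""
    candidates = {tgt: n for (src, tgt), n in corpus_counts.items() if src == source_phrase}
    return max(candidates, key=candidates.get)

# Spanish "dijo" is gender-ambiguous ("he said" / "she said"), yet the
# frequency-based choice always resolves it to the masculine form.
print(translate_phrase("dijo"))  # -> "he said", even when the author is a woman
```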

Big data may be reinforcing racial bias in the criminal justice system


Laurel Eckhouse at the Washington Post: “Big data has expanded to the criminal justice system. In Los Angeles, police use computerized “predictive policing” to anticipate crimes and allocate officers. In Fort Lauderdale, Fla., machine-learning algorithms are used to set bond amounts. In states across the country, data-driven estimates of the risk of recidivism are being used to set jail sentences.

Advocates say these data-driven tools remove human bias from the system, making it more fair as well as more effective. But even as they have become widespread, we have little information about exactly how they work. Few of the organizations producing them have released the data and algorithms they use to determine risk.

We need to know more, because it’s clear that such systems face a fundamental problem: The data they rely on are collected by a criminal justice system in which race makes a big difference in the probability of arrest — even for people who behave identically. Inputs derived from biased policing will inevitably make black and Latino defendants look riskier than white defendants to a computer. As a result, data-driven decision-making risks exacerbating, rather than eliminating, racial bias in criminal justice.

Consider a judge tasked with making a decision about bail for two defendants, one black and one white. Our two defendants have behaved in exactly the same way prior to their arrest: They used drugs in the same amounts, committed the same traffic offenses, owned similar homes and took their two children to the same school every morning. But the criminal justice algorithms do not rely on all of a defendant’s prior actions to reach a bail assessment — just those actions for which he or she has been previously arrested and convicted. Because of racial biases in arrest and conviction rates, the black defendant is more likely to have a prior conviction than the white one, despite identical conduct. A risk assessment relying on racially compromised criminal-history data will unfairly rate the black defendant as riskier than the white defendant.

To make matters worse, risk-assessment tools typically evaluate their success in predicting a defendant’s dangerousness on rearrests — not on defendants’ overall behavior after release. If our two defendants return to the same neighborhood and continue their identical lives, the black defendant is more likely to be arrested. Thus, the tool will falsely appear to predict dangerousness effectively, because the entire process is circular: Racial disparities in arrests bias both the predictions and the justification for those predictions.
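
A toy simulation makes that circularity concrete. The sketch below (Python) is not any deployed risk tool, and the offense and arrest rates are illustrative assumptions: both groups offend at identical rates, but one is arrested twice as often, so a score built from arrest histories rates that group as riskier — and an evaluation based on rearrest appears to confirm it.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Identical underlying behavior for both groups (illustrative assumption).
offense = rng.random(n) < 0.30
group = rng.integers(0, 2, n)                  # 0 and 1 label the two groups
p_arrest = np.where(group == 1, 0.60, 0.30)    # biased enforcement, not behavior
arrested = offense & (rng.random(n) < p_arrest)

# A "risk score" built from arrest history simply inherits the enforcement gap...
apparent_risk = [arrested[group == g].mean() for g in (0, 1)]
print("apparent risk by group:", apparent_risk)      # group 1 looks ~2x riskier

# ...and evaluating it against *re-arrest* (generated by the same biased process)
# seems to confirm the score, even though underlying offending is identical.
rearrested = offense & (rng.random(n) < p_arrest)
print("rearrest rate by group:", [rearrested[group == g].mean() for g in (0, 1)])
```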

We know that a black person and a white person are not equally likely to be stopped by police: Evidence on New York’s stop-and-frisk policy, investigatory stops, vehicle searches and drug arrests shows that black and Latino civilians are more likely to be stopped, searched and arrested than whites. In 2012, a white attorney spent days trying to get himself arrested in Brooklyn for carrying graffiti stencils and spray paint, a Class B misdemeanor. Even when police saw him tagging the City Hall gateposts, they sped past him, ignoring a crime for which 3,598 people were arrested by the New York Police Department the following year.

Before adopting risk-assessment tools in the judicial decision-making process, jurisdictions should demand that any tool being implemented undergo a thorough and independent peer-review process. We need more transparency and better data to learn whether these risk assessments have disparate impacts on defendants of different races. Foundations and organizations developing risk-assessment tools should be willing to release the data used to build these tools to researchers to evaluate their techniques for internal racial bias and problems of statistical interpretation. Even better, with multiple sources of data, researchers could identify biases in data generated by the criminal justice system before the data is used to make decisions about liberty. Unfortunately, producers of risk-assessment tools — even nonprofit organizations — have not voluntarily released anonymized data and computational details to other researchers, as is now standard in quantitative social science research….(More)”.

How to Do Social Science Without Data


Neil Gross in the New York Times: With the death last month of the sociologist Zygmunt Bauman at age 91, the intellectual world lost a thinker of rare insight and range. Because his style of work was radically different from that of most social scientists in the United States today, his passing is an occasion to consider what might be gained if more members of our profession were to follow his example….

Weber saw bureaucracies as powerful, but dispiritingly impersonal. Mr. Bauman amended this: Bureaucracy can be inhuman. Bureaucratic structures had deadened the moral sense of ordinary German soldiers, he contended, which made the Holocaust possible. They could tell themselves they were just doing their job and following orders.

Later, Mr. Bauman turned his scholarly attention to the postwar and late-20th-century worlds, where the nature and role of all-encompassing institutions were again his focal point. Craving stability after the war, he argued, people had set up such institutions to direct their lives — more benign versions of Weber’s bureaucracy. You could go to work for a company at a young age and know that it would be a sheltering umbrella for you until you retired. Governments kept the peace and helped those who couldn’t help themselves. Marriages were formed through community ties and were expected to last.

But by the end of the century, under pressure from various sources, those institutions were withering. Economically, global trade had expanded, while in Europe and North America manufacturing went into decline; job security vanished. Politically, too, changes were afoot: The Cold War drew to an end, Europe integrated and politicians trimmed back the welfare state. Culturally, consumerism seemed to pervade everything. Mr. Bauman noted major shifts in love and intimacy as well, including a growing belief in the contingency of marriage and — eventually — the popularity of online dating.

In Mr. Bauman’s view, it all connected. He argued we were witnessing a transition from the “solid modernity” of the mid-20th century to the “liquid modernity” of today. Life had become freer, more fluid and a lot more risky. In principle, contemporary workers could change jobs whenever they got bored. They could relocate abroad or reinvent themselves through shopping. They could find new sexual partners with the push of a button. But there was little continuity.

Mr. Bauman considered the implications. Some thrived in this new atmosphere; the institutions and norms previously in place could be stultifying, oppressive. But could a transient work force come together to fight for a more equitable distribution of resources? Could shopping-obsessed consumers return to the task of being responsible, engaged citizens? Could intimate partners motivated by short-term desire ever learn the value of commitment?…(More)”

State of Open Corporate Data: Wins and Challenges Ahead


Sunlight Foundation: “For many people working to open data and reduce corruption, the past year could be summed up in two words: “Panama Papers.” The transcontinental investigation by a team from the International Consortium of Investigative Journalists (ICIJ) blew open the murky world of offshore company registration. It put corporate transparency high on the agenda of countries all around the world and helped lead to some notable advances in access to official company register data….

While most companies are created and operated for legitimate economic activity, there is a small percentage that aren’t. Entities involved in corruption, money laundering, fraud and tax evasion frequently use such companies as vehicles for their criminal activity. Global Witness’s “Idiot’s Guide to Money Laundering” shows how easy it is to use layer after layer of shell companies to hide the identity of the person who controls and benefits from the activities of the network. The World Bank’s “Puppet Masters” report found that over 70% of grand corruption cases, in fact, involved the use of offshore vehicles.

For years, OpenCorporates has advocated for company information to be in the public domain as open data, so it is usable and comparable. It was the public reaction to the Panama Papers, however, that made it clear that due diligence requires global data sets and that beneficial ownership registers are key for integrity and progress.

The call for accountability and action was clear from the aftermath of the leak. ICIJ, the journalists involved and advocates have called for tougher action on prosecutions and more transparency measures: open corporate registers and beneficial ownership registers. A series of workshops organized by the B20 showed that business also needed public beneficial ownership registers….

Last year the UK became the first country in the world to collect and publish who controls and benefits from companies in a structured format, and as open data. Just a few days later, we were able to add the information to OpenCorporates. The UK data, therefore, is one of a kind, and has been highly anticipated by transparency skeptics and advocates alike. So far, things are looking good. 15 other countries have committed to having a public beneficial ownership register, including Nigeria, Afghanistan, Germany, Indonesia, New Zealand and Norway. Denmark has announced its first public beneficial ownership data will be published in June 2017. It’s likely to be open data.

This progress isn’t limited to beneficial ownership. It is also being seen in the opening up of corporate registers. These are what OpenCorporates calls “core company data”. In 2016, more countries started releasing company registers as open data, including Japan (with over 4.4 million companies), Israel, Virginia, Slovenia, Texas, Singapore and Bulgaria. We’ve also had a great start to 2017, with France publishing its central company database as open data on January 5th.

As more states have embraced open data, the USA jumped from an average score of 19/100 to 30/100. Singapore rose from 0 to 20. The Slovak Republic went from 20 to 40. Bulgaria went from 35 to 90. Japan rose from 0 to 70 — the biggest increase of the year….(More)”