The risks of relying on robots for fairer staff recruitment


Sarah O’Connor at the Financial Times: “Robots are not just taking people’s jobs away, they are beginning to hand them out, too. Go to any recruitment industry event and you will find the air is thick with terms like “machine learning”, “big data” and “predictive analytics”.

The argument for using these tools in recruitment is simple. Robo-recruiters can sift through thousands of job candidates far more efficiently than humans. They can also do it more fairly. Since they do not harbour conscious or unconscious human biases, they will recruit a more diverse and meritocratic workforce.

This is a seductive idea but it is also dangerous. Algorithms are not inherently neutral just because they see the world in zeros and ones.

For a start, any machine learning algorithm is only as good as the training data from which it learns. Take the PhD thesis of academic researcher Colin Lee, released to the press this year. He analysed data on the success or failure of 441,769 job applications and built a model that could predict with 70 to 80 per cent accuracy which candidates would be invited to interview. The press release plugged this algorithm as a potential tool to screen a large number of CVs while avoiding “human error and unconscious bias”.

But a model like this would absorb any human biases at work in the original recruitment decisions. For example, the research found that age was the biggest predictor of being invited to interview, with the youngest and the oldest applicants least likely to be successful. You might think it fair enough that inexperienced youngsters do badly, but the routine rejection of older candidates seems like something to investigate rather than codify and perpetuate. Mr Lee acknowledges these problems and suggests it would be better to strip the CVs of attributes such as gender, age and ethnicity before using them….(More)”
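The mechanism is easy to demonstrate. Below is a minimal sketch (toy data and a generic classifier, not Mr Lee’s model) of how a screening algorithm trained on biased interview decisions simply learns the bias:

```python
# Minimal sketch, not Lee's actual model: a classifier trained on past
# interview decisions reproduces whatever bias those decisions contained.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
age = rng.integers(18, 70, n)   # applicant ages (toy, uniform)
skill = rng.normal(0, 1, n)     # a stand-in for genuine merit
# Hypothetical historical decisions: skilled mid-career candidates invited,
# older candidates routinely rejected -- an encoded age bias.
invited = ((skill > 0) & (age > 25) & (age < 50)).astype(int)

model = LogisticRegression().fit(np.column_stack([age, skill]), invited)

# Two candidates with identical skill but different ages: the trained model
# scores the older one markedly lower.
print(model.predict_proba([[30, 1.0], [62, 1.0]])[:, 1])
```

Stripping age from the inputs, as Lee suggests, removes the direct signal, although correlated attributes (graduation year, for instance) can reintroduce it.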

Technology can boost active citizenship – if it’s chosen well


In Taiwan, for instance, tech activists have built online databases to track political contributions and create channels for public participation in parliamentary debates. In South Africa, anti-corruption organisation Corruption Watch has used online and mobile platforms to gather public votes for Public Protector candidates.

But research I recently completed with partners in Africa and Europe suggests that few of these organisations may be choosing the right technological tools to make their initiatives work.

We interviewed people in Kenya and South Africa who are responsible for choosing technologies when implementing transparency and accountability initiatives. In many cases, they’re not choosing their tech well. They often only recognised in retrospect how important their technology choices were. Most would have chosen differently if they were put in the same position again.

Our findings challenge a common mantra which holds that technological failures are usually caused by people or strategies rather than technologies. It’s certainly true that human agency matters. However powerful technologies may seem, choices are made by people – not the machines they invent. But our research supports the idea that technology isn’t neutral. It suggests that sometimes the problem really is the tech….

So what should those working in civic technology do about improving tool selection? From our research, we developed six “rules” for better tool choices. These are:

  • first work out what you don’t know;
  • think twice before building a new tool;
  • get a second opinion;
  • try it before you buy it;
  • plan for failure; and
  • share what you learn.

Possibly the most important of these recommendations is to try or “trial” technologies before making a final selection. This might seem obvious. But it was rarely done in our sample….(More)”

Data and Democracy


(Free) book by Andrew Therriault:  “The 2016 US elections will be remembered for many things, but for those who work in politics, 2016 may be best remembered as the year that the use of data in politics reached its maturity. Through a collection of essays from leading experts in the field, this report explores how political data science helps to drive everything from overall strategy and messaging to individual voter contacts and advertising.

Curated by Andrew Therriault, former Director of Data Science for the Democratic National Committee, this illuminating report includes first-hand accounts from Democrats, Republicans, and members of the media. Tech-savvy readers will get a comprehensive account of how data analysis has prevailed over political instinct and experience, along with examples of the challenges these practitioners face.

Essays include:

  • The Role of Data in Campaigns—Andrew Therriault, former Director of Data Science for the Democratic National Committee
  • Essentials of Modeling and Microtargeting—Dan Castleman, cofounder and Director of Analytics at Clarity Campaign Labs, a leading modeler in Democratic politics
  • Data Management for Political Campaigns—Audra Grassia, Deputy Political Director for the Democratic Governors Association in 2014
  • How Technology Is Changing the Polling Industry—Patrick Ruffini, cofounder of Echelon Insights and Founder/Chairman of Engage, who was a digital strategist for President Bush in 2004 and for the Republican National Committee in 2006
  • Data-Driven Media Optimization—Alex Lundry, cofounder and Chief Data Scientist at Deep Root Analytics, a leading expert on media and voter analytics, electoral targeting, and political data mining
  • How (and Why) to Follow the Money in Politics—Derek Willis, ProPublica’s news applications developer, formerly with The New York Times
  • Digital Advertising in the Post-Obama Era—Daniel Scarvalone, Associate Director of Research and Data at Bully Pulpit Interactive (BPI), a digital marketer for the Democratic party
  • Election Forecasting in the Media—Natalie Jackson, Senior Polling Editor at The Huffington Post…(More)”

White House, Transportation Dept. want help using open data to prevent traffic crashes


Samantha Ehlinger in FedScoop: “The Transportation Department is looking for public input on how to better interpret and use data on fatal crashes after 2015 data revealed a startling 7.2 percent spike in traffic deaths that year.

Looking for new solutions that could prevent more deaths on the roads, the department released the 2015 open dataset on each fatal crash three months earlier than usual. With it, the department and the White House announced a call to action for people to use the dataset as a jumping-off point for a dialogue on how to prevent crashes, as well as to understand what might be causing the spike.

“What we’re ultimately looking for is getting more people engaged in the data … matching this with other publicly available data, or data that the private sector might be willing to make available, to dive in and to tell these stories,” said Bryan Thomas, communications director for the National Highway Traffic Safety Administration, to FedScoop.

One striking statistic was that “pedestrian and pedalcyclist fatalities increased to a level not seen in 20 years,” according to a DOT press release. …

“We want folks to be engaged directly with our own data scientists, so we can help people through the dataset and help answer their questions as they work their way through, bounce ideas off of us, etc.,” Thomas said. “We really want to be accessible in that way.”

He added that as ideas “come to fruition,” there will be opportunities to present what people have learned.

“It’s a very, very rich data set, there’s a lot of information there,” Thomas said. “Our own ability is, frankly, limited to investigate all of the questions that you might have of it. And so we want to get the public really diving in as well.”…
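For readers who take Thomas up on that invitation, a first pass over the data might look like the sketch below. The file path and column names are assumptions; check the FARS documentation for the actual 2015 layout.

```python
# Minimal sketch: a first look at NHTSA's FARS fatal-crash data with pandas.
import pandas as pd

# Hypothetical local export of the 2015 accident-level file.
accidents = pd.read_csv("FARS2015/accident.csv")

# Total fatalities by month, to see where the 2015 spike concentrates.
print(accidents.groupby("MONTH")["FATALS"].sum())

# States with the most fatalities -- a starting point for joining against
# other public data (weather, economic indicators, and so on).
print(accidents.groupby("STATE")["FATALS"].sum().nlargest(10))
```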

Here are the questions “worth exploring,” according to the call to action:

  • How might improving economic conditions around the country change how Americans are getting around? What models can we develop to identify communities that might be at a higher risk for fatal crashes?
  • How might climate change increase the risk of fatal crashes in a community?
  • How might we use studies of attitudes toward speeding, distracted driving, and seat belt use to better target marketing and behavioral change campaigns?
  • How might we monitor public health indicators and behavior risk indicators to target communities that might have a high prevalence of behaviors linked with fatal crashes (drinking, drug use/addiction, etc.)? What countermeasures should we create to address these issues?”…(More)”

Everyday ‘Placebo Buttons’ Create Semblance of Control


[Image: crosswalk buttons, by Peter Kazanjy]

Many of the seemingly disconnected everyday buttons you press may have something in common: it is quite possible that none of them does a thing to influence the world around you. Any perceived impact may simply be imaginary, a placebo effect giving you the illusion of control.

In the early 2000s, New York City transportation officials finally admitted what many had suspected: the majority of crosswalk buttons in the city are completely disconnected from the traffic light system. Thousands of these initially worked to request a signal change but most no longer do anything, even if their signage suggests otherwise.

Naturally, a number of street art projects have popped up around the humorous futility of pedestrians pressing placebo buttons.

Crosswalk buttons were originally introduced to NYC during the 1960s. At the time, there was less congestion and it made sense to leave green lights on for major thoroughfares until cross traffic came along … or until a pedestrian wanting to cross the street pushed a button.

Today, a combination of carefully orchestrated automation and higher traffic has made most of these buttons obsolete. Citywide, there are around 100 crosswalk buttons that still work in NYC but close to 1,000 more that do nothing at all. So why not take them down? Removing the remaining nonfunctional buttons would cost the city millions, a potential waste of already limited funds for civic infrastructure….(More)”

Democracy Is Getting A Reboot On The Blockchain


Adele Peters in FastCoExist: “In 2013, a group of activists in Buenos Aires attempted an experiment in what they called hacking democracy. Representatives from their new political party would promise to always vote on issues according to the will of citizens online. Using a digital platform, people could tell the legislator what to support, in a hybrid of a direct democracy and representation.

With 1.2% of the vote, the candidate they ran for a seat on the city council didn’t win. But the open-source platform they created for letting citizens vote, called Democracy OS, started getting attention around the world. In Buenos Aires, the government tried using it to get citizen feedback on local issues. Then, when the party attempted to run a candidate a second time, something happened that made them shift course. They were told they’d have to bribe a federal judge to participate.

“When you see that kind of corruption that you think happens in House of Cards—and you suddenly realize that House of Cards is happening all around you—it’s a very shocking thing,” says Santiago Siri, a programmer and one of the founders of the party, called Partido de la Red, or the Net Party. Siri started thinking about how technology could solve the fundamental problem of corruption—and about how democracy should work in the digital age.

The idea morphed into a Y Combinator-backed nonprofit called Democracy Earth Foundation. As the website explains:

The Internet transformed how we share culture, work together—and even fall in love—but governance has remained unchanged for over 200 years. With the rise of open-source software and peer-to-peer networks, political intermediation is no longer necessary. We are building a protocol with smart contracts that allows decentralized governance for any kind of organization.

Their new platform, which the team is working on now as part of the Fast Forward accelerator for tech nonprofits, starts by granting incorruptible identities to each citizen, and then records votes in a similarly incorruptible way.

“If you know anything about democracy, one of the simplest ways of subverting democracy is by faking identity,” says Siri. “This is about opening up the black box that can corrupt the system. In a democracy, that black box is who gets to count the votes, who gets to validate the identities that have the right to vote.”

While some experts argue that Internet voting isn’t secure enough to use yet, Democracy Earth’s new platform uses the blockchain—a decentralized, public ledger secured by encryption. Rather than recording votes in one place, everyone’s votes are recorded across a network of thousands of computers. The system can also validate identities in the same decentralized way….(More)”.
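The tamper-evidence idea behind such a ledger can be shown in a few lines. This is a minimal sketch of a hash-chained vote record, not Democracy Earth’s actual protocol, which adds decentralized identity validation, replication across many peers and consensus:

```python
# Minimal sketch of a hash-chained vote ledger: each block commits to the
# previous one via a SHA-256 hash, so altering any recorded vote is detectable.
import hashlib
import json
import time

def make_block(vote, prev_hash):
    block = {"vote": vote, "timestamp": time.time(), "prev_hash": prev_hash}
    payload = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(payload).hexdigest()
    return block

chain = [make_block({"voter": "id_001", "choice": "yes"}, prev_hash="0" * 64)]
chain.append(make_block({"voter": "id_002", "choice": "no"}, chain[-1]["hash"]))

def verify(chain):
    """Recompute every block's hash and check the links between blocks."""
    for i, block in enumerate(chain):
        payload = json.dumps({k: block[k] for k in ("vote", "timestamp", "prev_hash")},
                             sort_keys=True).encode()
        if block["hash"] != hashlib.sha256(payload).hexdigest():
            return False
        if i > 0 and block["prev_hash"] != chain[i - 1]["hash"]:
            return False
    return True

print(verify(chain))               # True
chain[0]["vote"]["choice"] = "no"  # tamper with a recorded vote
print(verify(chain))               # False -- the change breaks the chain
```

In a real deployment the chain is replicated across many independent nodes, so no single party (the “black box” Siri describes) controls the count.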

How Big Data Analytics is Changing Legal Ethics


Renee Knake at Bloomberg Law: “Big data analytics are changing how lawyers find clients, conduct legal research and discovery, draft contracts and court papers, manage billing and performance, predict the outcome of a matter, select juries, and more. Ninety percent of corporate legal departments, law firms, and government lawyers note that data analytics are applied in their organizations, albeit in limited ways, according to a 2015 survey. The Legal Services Corporation, the largest funder of civil legal aid for low-income individuals in the United States, recommended in 2012 that all states collect and assess data on case progress/outcomes to improve the delivery of legal services. Lawyers across all sectors of the market increasingly recognize how big data tools can enhance their work.

A growing literature advocates for businesses and governmental bodies to adopt data ethics policies, and many have done so. It is not uncommon to find data-use policies prominently displayed on company or government websites, or required as part of a click-through consent before gaining access to a mobile app or webpage. Data ethics guidelines can help avoid controversies, especially when analytics are used in potentially manipulative or exploitative ways. Consider, for example, Target’s data analytics that uncovered a teen’s pregnancy before her father did, or Orbitz’s data analytics that offered pricier hotels to Mac users. These are just two of numerous examples in recent years where companies faced criticism for how they used data analytics.

While some law firms and legal services organizations follow data-use policies or codes of conduct, many do not. Perhaps this is because the legal profession was not transformed as early or rapidly as other industries, or because until now, big data in legal was largely limited to e-discovery, where the data use is confined to the litigation and is subject to judicial oversight. Another reason may be that lawyers believe their rules of professional conduct provide sufficient guidance and protection. Unlike other industries, lawyers are governed by a special code of ethical obligations to clients, the justice system, and the public. In most states, this code is based in part upon the American Bar Association (ABA) Model Rules of Professional Conduct, though rules often vary from jurisdiction to jurisdiction. Several of the Model Rules are relevant to big data use. That said, the Model Rules are insufficient for addressing a number of fundamental ethical concerns.

At the moment, legal ethics for big data analytics is at best an incomplete mix of professional conduct rules and informal policies adopted by some, but not all law practices. Given the increasing prevalence of data analytics in legal services, lawyers and law students should be familiar not only with the relevant professional conduct rules, but also the ethical questions left unanswered. Listed below is a brief summary of both, followed by a proposed legal ethics agenda for data analytics. …

Questions Unanswered by Lawyer Ethics Rules 

Access/Ownership. Who owns the original data — the individual source or the holder of the pooled information? Who owns the insights drawn from its analysis? Who should receive access to the data compilation and the results?

Anonymity/Identity. Should all personally identifiable or sensitive information be removed from the data? What protections are necessary to respect individual autonomy? How should individuals be able to control and shape their electronic identity?

Consent. Should individuals affirmatively consent to use of their personal data? Or is it sufficient to provide notice, perhaps with an opt-out provision?

Privacy/Security. Should privacy be protected beyond the professional obligation of client confidentiality? How should data be secured? The ABA called upon private and public sector lawyers to implement cyber-security policies, including data use, in a 2012 resolution and produced a cyber-security handbook in 2013.

Process. How involved should lawyers be in the process of data collection and analysis? In the context of e-discovery, for example, a lawyer is expected to understand how documents are collected, produced, and preserved, or to work with a specialist. Should a similar level of knowledge be required for all forms of data analytics use?

Purpose. Why was the data first collected from individuals? What is the purpose for the current use? Is there a significant divergence between the original and secondary purposes? If so, is it necessary for the individuals to consent to the secondary purpose? How will unintended consequences be addressed?

Source. What is the source of the data? Did the lawyer collect it directly from clients, or is the lawyer relying upon a third-party source? Client-based data is, of course, subject to the lawyer’s professional conduct rules. Data from any source should be trustworthy, reasonable, timely, complete, and verifiable….(More)”

Why Zika, Malaria and Ebola should fear analytics


Frédéric Pivetta at Real Impact Analytics: “Big data is a hot business topic. It turns out to be an equally hot topic for the non-profit sector now that we know the vital role analytics can play in addressing public health issues and reaching sustainable development goals.

Big players like IBM just announced they will help fight Zika by analyzing social media, transportation and weather data, among other indicators. Telecom data takes it further by helping to predict the spread of disease, identifying isolated and fragile communities and prioritizing the actions of aid workers.

The power of telecom data

Human mobility contributes significantly to epidemic transmission into new regions. However, there are gaps in understanding human mobility because the data available from travel records is limited and often outdated. In some countries, these records are collected by health officials in hospitals or through occasional surveys.

Telecom data, constantly updated and covering a large portion of the population, is rich in mobility insights. But there are other benefits:

  • it’s recorded automatically (in the Call Detail Records, or CDRs), so that we avoid data collection and response bias.
  • it contains localization and time information, which is great for understanding human mobility.
  • it contains info on connectivity between people, which helps in understanding social networks.
  • it contains info on phone spending, which allows tracking of socio-economic indicators.

Aggregated and anonymized, mobile telecom data fills the public data gap without raising privacy concerns. Mixing it with other public data sources results in a very precise and reliable view of human mobility patterns, which is key to preventing epidemic spread.

Using telecom data to map epidemic risk flows

So how does it work? As in any other big data application, the challenge is to build the right predictive model, allowing decision-makers to take the most appropriate actions. In the case of epidemic transmission, the methodology typically includes five steps:

  • Identify mobility patterns relevant to each particular disease. For example, short-term trips for fast-spreading diseases like Ebola, or overnight trips for diseases like malaria, which is spread by mosquitoes that are active only at night. Such patterns can be deduced from the CDRs: we can find each user’s home location by looking at their most active night-time tower, then track calls to identify short- or long-term trips. Aggregating data by origin-destination pairs is useful as we look at intercity or interregional transmission flows, and it protects the privacy of individuals, as no one can be singled out from the aggregated data (a minimal sketch of this step follows the list).
  • Get data on epidemic incidence, typically from local organisations like national healthcare systems or, in case of emergency, from NGOs or dedicated emergency teams. This data should be aggregated at the same level of granularity as the CDRs.
  • Knowing how many travelers go from one place to another, for how long, and the disease incidence at origin and destination, build an epidemiological model that can account for the way and speed of transmission of the particular disease.
  • With an import/export scoring model, map epidemic risk flows and flag areas that are at risk of becoming the new hotspots because of human travel.
  • On that basis, prioritize and monitor public health measures, focusing on restraining mobility to and from hotspots. Mapping risk also allows prevention campaigns to be launched in the right places and the necessary infrastructure to be set up on time. Ultimately, the tool reduces public health risks and helps stem the epidemic.
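As referenced in the first step above, here is a minimal sketch of the home-tower and origin-destination logic on toy data. The record format (user, tower, hour) and the night-hour window are illustrative assumptions; real CDR schemas are richer.

```python
# Minimal sketch: infer each user's home tower from night-time activity,
# then count daytime sightings away from home as origin-destination trips.
from collections import Counter, defaultdict

# Toy CDR records as (user, tower, hour) tuples -- an assumed, simplified schema.
cdrs = [
    ("u1", "tower_A", 23), ("u1", "tower_A", 2), ("u1", "tower_B", 14),
    ("u2", "tower_C", 1), ("u2", "tower_C", 22), ("u2", "tower_A", 11),
]

NIGHT_HOURS = set(range(22, 24)) | set(range(0, 6))  # assumed 10pm-6am window

def home_towers(records):
    """Home-location proxy: the tower where each user is most active at night."""
    night = defaultdict(Counter)
    for user, tower, hour in records:
        if hour in NIGHT_HOURS:
            night[user][tower] += 1
    return {user: counts.most_common(1)[0][0] for user, counts in night.items()}

def od_flows(records, homes):
    """Aggregate daytime sightings away from home into origin-destination counts."""
    flows = Counter()
    for user, tower, hour in records:
        if user in homes and hour not in NIGHT_HOURS and tower != homes[user]:
            flows[(homes[user], tower)] += 1
    return flows

homes = home_towers(cdrs)
print(homes)                  # {'u1': 'tower_A', 'u2': 'tower_C'}
print(od_flows(cdrs, homes))  # aggregated flows: the input to the risk model
```

Feeding these aggregated flows, together with the incidence data from step two, into an epidemiological model is what turns raw CDRs into the risk maps described above.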

That kind of application works in a variety of epidemiological contexts, including Zika, Ebola, malaria, influenza and tuberculosis. No doubt the global boom in mobile data will prove extraordinarily helpful in fighting these fierce enemies….(More)”

Legal confusion threatens to slow data science


Simon Oxenham in Nature: “Knowledge from millions of biological studies encoded into one network — that is Daniel Himmelstein’s alluring description of Hetionet, a free online resource that melds data from 28 public sources on links between drugs, genes and diseases. But for a product built on public information, obtaining legal permissions has been surprisingly tough.

…Menche rapidly gave consent — but not everyone was so helpful. One research group never replied to Himmelstein, and three replied without clearing up the legal confusion. Ultimately, Himmelstein published the final version of Hetionet in July — minus one data set whose licence forbids redistribution, but including the three for which he still lacks clear permission to republish. The tangle shows that many researchers don’t understand that simply posting a data set publicly doesn’t mean others can legally republish it, says Himmelstein.

The confusion has the power to slow down science, he says, because researchers will be discouraged from combining data sets into more useful resources. It will also become increasingly problematic as scientists publish more information online. “Science is becoming more and more dependent on reusing data,” Himmelstein says….

Himmelstein is not convinced that he is legally in the clear — and feels that such uncertainty may deter other scientists from reproducing academic data. If a researcher launches a commercial product that is based on public data sets, he adds, the stakes of not having clear licensing are likely to rise. “I think these are largely untested waters, and most academics aren’t in the position to risk setting off a legal battle that will help clarify these issues,” he says….(More)”

Metric Power


Book by David Beer: This book examines the powerful and intensifying role that metrics play in ordering and shaping our everyday lives. Focusing upon the interconnections between measurement, circulation and possibility, the author explores the interwoven relations between power and metrics. He draws upon a wide range of interdisciplinary resources to place these metrics within their broader historical, political and social contexts. More specifically, he illuminates the various ways that metrics implicate our lives – from our work to our consumption and our leisure, through to our bodily routines and the financial and organisational structures that surround us. Unravelling the power dynamics that underpin and reside within the so-called big data revolution, he develops the central concept of Metric Power, along with a set of conceptual resources for thinking critically about the powerful role played by metrics in the social world today….(More)”