How to Crowdsource the Syrian Cease-Fire


Colum Lynch at Foreign Policy: “Can the wizards of Silicon Valley develop a set of killer apps to monitor the fragile Syria cease-fire without putting foreign boots on the ground in one of the world’s most dangerous countries?

They’re certainly going to try. The “cessation of hostilities” in Syria brokered by the United States and Russia last month has sharply reduced the levels of violence in the war-torn country and sparked a rare burst of optimism that it could lead to a broader cease-fire. But if the two sides lay down their weapons, the international community will face the challenge of monitoring the battlefield to ensure compliance without deploying peacekeepers or foreign troops. The emerging solution: using crowdsourcing, drones, satellite imaging, and other high-tech tools.

The high-level interest in finding a technological solution to the monitoring challenge was on full display last month at a closed-door meeting convened by the White House that brought together U.N. officials, diplomats, digital cartographers, and representatives of Google, DigitalGlobe, and other technology companies. Their assignment was to brainstorm ways of using high-tech tools to keep track of any future cease-fires from Syria to Libya and Yemen.

The off-the-record event came as the United States, the U.N., and other key powers struggle to find ways of enforcing cease-fires in places like Syria at a time when there is little political will to risk sending foreign forces or monitors to such dangerous places. The United States has already turned to high-tech systems like armed drones to wage war; it now wants to use similar systems to help enforce peace.

Take the Syria Conflict Mapping Project, a geomapping program developed by the Atlanta-based Carter Center, a nonprofit founded by former U.S. President Jimmy Carter and his wife, Rosalynn, to resolve conflict and promote human rights. The project has developed an interactive digital map that tracks military formations by government forces, Islamist extremists, and more moderate armed rebels in virtually every disputed Syrian town. It is now updating its technology to monitor cease-fires.

The project began in January 2012 with a single 25-year-old intern, Christopher McNaboe. McNaboe realized it was possible to track the state of the conflict by compiling disparate strands of publicly available information — including the shelling and aerial bombardment of towns and rebel positions — from YouTube, Twitter, and other social media sites. The project has since developed a mapping program using software provided by Palantir Technologies, a Palo Alto-based big data company that does contract work for U.S. intelligence and defense agencies, from the CIA to the FBI….

Walter Dorn, an expert on technology in U.N. peace operations who attended the White House event, said he had promoted what he calls a “coalition of the connected.”

The U.N. or other outside powers could start by tracking social media sites, including Twitter and YouTube, for reports of possible cease-fire violations. That information could then be verified by “seeded crowdsourcing” — that is, reaching out to networks of known advocates on the ground — and technological monitoring through satellite imagery or drones.
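A minimal sketch of how such a pipeline might begin, assuming reports have already been scraped into simple records; the field names, keywords, and thresholds below are hypothetical illustrations, not part of any actual monitoring system:

```python
# A hypothetical pipeline sketch: filter already-collected posts for
# violation-related keywords, then cluster them by town and hour so that
# clusters with several independent reports can be passed on for "seeded
# crowdsourcing" and satellite/drone verification.
from collections import defaultdict
from datetime import datetime

VIOLATION_KEYWORDS = {"shelling", "airstrike", "barrel bomb", "artillery"}

def flag_candidate_violations(posts, min_reports=3):
    """Group keyword-matching posts by (town, hour); return clusters with
    enough independent reports to be worth verifying."""
    clusters = defaultdict(list)
    for post in posts:
        text = post["text"].lower()
        if any(keyword in text for keyword in VIOLATION_KEYWORDS):
            hour = post["timestamp"].replace(minute=0, second=0, microsecond=0)
            clusters[(post["town"], hour)].append(post)
    return {key: reports for key, reports in clusters.items()
            if len(reports) >= min_reports}

posts = [
    {"text": "Heavy shelling reported near the market", "town": "Aleppo",
     "timestamp": datetime(2016, 3, 1, 14, 20)},
    # ... more posts collected from Twitter, YouTube descriptions, etc.
]
for (town, hour), reports in flag_candidate_violations(posts, min_reports=1).items():
    print(f"Possible violation in {town} around {hour}: {len(reports)} report(s)")
```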

Matthew McNabb, the founder of First Mile Geo, a start-up which develops geolocation technology that can be used to gather data in conflict zones, has another idea. McNabb, who also attended the White House event, believes “on-demand” technologies like SurveyMonkey, which provides users a form to create their own surveys, can be applied in conflict zones to collect data on cease-fire violations….(More)

It’s not big data that discriminates – it’s the people that use it


In The Conversation: “Data can’t be racist or sexist, but the way it is used can help reinforce discrimination. The internet means more data is collected about us than ever before and it is used to make automatic decisions that can hugely affect our lives, from our credit scores to our employment opportunities.

If that data reflects unfair social biases against sensitive attributes, such as our race or gender, the conclusions drawn from that data might also be based on those biases.

But this era of “big data” doesn’t need to entrench inequality in this way. If we build smarter algorithms to analyse our information and ensure we’re aware of how discrimination and injustice may be at work, we can actually use big data to counter our human prejudices.

This kind of problem can arise when computer models are used to make predictions in areas such as insurance, financial loans and policing. If members of a certain racial group have historically been more likely to default on their loans, or been more likely to be convicted of a crime, then the model can deem these people more risky. That doesn’t necessarily mean that these people actually engage in more criminal behaviour or are worse at managing their money. They may just be disproportionately targeted by police and sub-prime mortgage salesmen.
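A toy simulation of that dynamic, with entirely invented numbers: two groups behave identically, but one is observed more heavily, so a model trained on the recorded outcomes scores it as riskier:

```python
# Toy simulation, all numbers invented: two groups with the same underlying
# 10% rate of an outcome, but group 1 is observed twice as heavily, so it is
# recorded as "risky" more often and the trained model scores it higher.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 10_000
group = rng.integers(0, 2, n)                       # group label, 0 or 1
behaviour = rng.random(n) < 0.10                    # identical true base rate
detection = np.where(group == 1, 0.8, 0.4)          # group 1 watched 2x as much
recorded = behaviour & (rng.random(n) < detection)  # what the data actually shows

model = LogisticRegression().fit(group.reshape(-1, 1), recorded)
print(model.predict_proba([[0], [1]])[:, 1])  # group 1 gets roughly double the score
```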

Excluding sensitive attributes

Data scientist Cathy O’Neil has written about her experience of developing models for homeless services in New York City. The models were used to predict how long homeless clients would be in the system and to match them with appropriate services. She argues that including race in the analysis would have been unethical.

If the data showed white clients were more likely to find a job than black ones, the argument goes, then staff might focus their limited resources on those white clients that would more likely have a positive outcome. While sociological research has unveiled the ways that racial disparities in homelessness and unemployment are the result of unjust discrimination, algorithms can’t tell the difference between just and unjust patterns. And so datasets should exclude characteristics that may be used to reinforce the bias, such as race.

But this simple response isn’t necessarily the answer. For one thing, machine learning algorithms can often infer sensitive attributes from a combination of other, non-sensitive facts. People of a particular race may be more likely to live in a certain area, for example. So excluding those attributes may not be enough to remove the bias….
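A small synthetic sketch of that proxy effect: even with the sensitive attribute dropped from the features, a simple classifier can often recover it from a correlated stand-in such as neighbourhood (the data and the strength of the correlation below are invented for illustration):

```python
# Synthetic illustration of the proxy problem: the sensitive attribute is
# excluded from the features, yet a classifier recovers it from a correlated,
# "non-sensitive" feature (here, a made-up neighbourhood code).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
n = 5_000
race = rng.integers(0, 2, n)
# Invented correlation: the two groups live mostly in different neighbourhoods.
neighbourhood = np.where(race == 1,
                         rng.integers(0, 4, n),    # mostly codes 0-3
                         rng.integers(3, 10, n))   # mostly codes 3-9

X_train, X_test, y_train, y_test = train_test_split(
    neighbourhood.reshape(-1, 1), race, random_state=0)
clf = LogisticRegression().fit(X_train, y_train)
print("Accuracy recovering the excluded attribute:", clf.score(X_test, y_test))
# Well above the 50% chance level, so the bias can re-enter through the proxy.
```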

An enlightened service provider might, upon seeing the results of the analysis, investigate whether and how racism is a barrier to their black clients getting hired. Equipped with this knowledge they could begin to do something about it. For instance, they could ensure that local employers’ hiring practices are fair and provide additional help to those applicants more likely to face discrimination. The moral responsibility lies with those responsible for interpreting and acting on the model, not the model itself.

So the argument that sensitive attributes should be stripped from the datasets we use to train predictive models is too simple. Of course, collecting sensitive data should be carefully regulated because it can easily be misused. But misuse is not inevitable, and in some cases, collecting sensitive attributes could prove absolutely essential in uncovering, predicting, and correcting unjust discrimination. For example, in the case of homeless services discussed above, the city would need to collect data on ethnicity in order to discover potential biases in employment practices….(More)
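A minimal audit along those lines might compare outcome rates across groups once ethnicity is recorded; the column names and the 0.8 (“four-fifths”) threshold below are illustrative assumptions, not taken from the article:

```python
# Minimal audit sketch, assuming ethnicity *is* recorded alongside outcomes.
# The column names and the 0.8 ("four-fifths") threshold are illustrative
# assumptions, not taken from the article.
import pandas as pd

records = pd.DataFrame({
    "ethnicity":     ["white", "white", "white", "black", "black", "black"],
    "placed_in_job": [1,       1,       0,       0,       1,       0],
})

rates = records.groupby("ethnicity")["placed_in_job"].mean()
print(rates)
ratio = rates.min() / rates.max()
if ratio < 0.8:
    print(f"Disparate-impact ratio {ratio:.2f}: investigate employer practices")
```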

A new data viz tool shows what stories are being undercovered in countries around the world


Joseph Lichterman at NiemanLab: “It’s a common lament: Though the Internet provides us access to a nearly unlimited number of sources for news, most of us rarely venture beyond the same few sources or topics. And as news consumption shifts to our phones, people are using even fewer sources: On average, consumers access 1.52 trusted news sources on their phones, according to the 2015 Reuters Digital News Report, which studied news consumption across several countries.

To try and diversify people’s perspectives on the news, Jigsaw — the tech incubator, formerly known as Google Ideas, that’s run by Google’s parent company Alphabet — this week launched Unfiltered.News, an experimental site that uses Google News data to show users what topics are being underreported or are popular in regions around the world.


Unfiltered.News’ main data visualization shows which topics are most reported in countries around the world. A column on the right side of the page highlights stories that are being reported widely elsewhere in the world, but aren’t in the top 100 stories on Google News in the selected country. In the United States yesterday, five of the top 10 underreported topics, unsurprisingly, dealt with soccer. In China, Barack Obama was the most undercovered topic….(More)”
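A rough sketch of how an “underreported” list like that could be derived; the data structures are illustrative, since the actual Unfiltered.News pipeline over Google News data is not public:

```python
# Illustrative only: "underreported" topics as those prominent in global
# coverage but absent from a country's top-100 list. The real Unfiltered.News
# pipeline over Google News data is not public.
def underreported_topics(global_top, country_top_100, limit=10):
    covered = set(country_top_100)
    return [topic for topic in global_top if topic not in covered][:limit]

global_top = ["Champions League", "Barack Obama", "Syria cease-fire", "Zika virus"]
us_top_100 = ["Barack Obama", "Syria cease-fire", "Zika virus"]  # no soccer
print(underreported_topics(global_top, us_top_100))  # ['Champions League']
```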

Can Big Data Help Measure Inflation?


Bourree Lam in The Atlantic: “…As more and more people are shopping online, calculating this index has gotten more difficult, because there haven’t been any great ways of recording prices from the sites of disparate retailers. Data shared by retailers and compiled by the technology firm Adobe might help close this gap. The company is perhaps best known for its visual software, including Photoshop, but it has also become a provider of software and analytics for online retailers. Adobe is now aggregating the sales data that flows through its software for its Digital Price Index (DPI) project, an initiative that’s meant to answer some of the questions that have been dogging researchers now that e-commerce is such a big part of the economy.

The project, which tracks billions of online transactions and the prices of over a million products, was developed with the help of the economists Austan Goolsbee, the former chairman of Obama’s Council of Economic Advisers and a professor at the University of Chicago’s Booth School of Business, and Peter Klenow, a professor at Stanford University. “We’ve been excited to help them set up various measures of the digital economy, and of prices, and also to see what the Adobe data can teach us about some of the questions that everybody’s had about the CPI,” says Goolsbee. “People are asking questions like ‘How price sensitive is online commerce?’ ‘How much is it growing?’ ‘How substitutable is it for non-electronic commerce?’ A lot of issues you can address with this in a way that we haven’t really been able to do before.” These are some of the questions that the DPI has the potential to answer.
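As an illustration of the underlying idea, one standard way to turn matched transaction prices into an index is a Jevons-style geometric mean of price relatives; this is only a sketch of the general approach, and Adobe’s actual DPI methodology is more involved:

```python
# One standard way to turn matched product prices into an index: a Jevons-style
# geometric mean of price relatives for products seen in both periods. This is
# only a sketch of the general idea; Adobe's DPI methodology is more involved.
from math import exp, log

def jevons_index(prices_base, prices_current):
    """prices_*: dicts mapping product_id -> average price in that period."""
    matched = prices_base.keys() & prices_current.keys()
    if not matched:
        raise ValueError("no products observed in both periods")
    log_relatives = [log(prices_current[p] / prices_base[p]) for p in matched]
    return exp(sum(log_relatives) / len(log_relatives))

january  = {"tablet": 199.0, "toaster": 39.0, "headphones": 59.0}
february = {"tablet": 189.0, "toaster": 41.0, "headphones": 59.0}
print(f"Price index (January = 1.0): {jevons_index(january, february):.4f}")
```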

…While this new trove of data will certainly be helpful to economists and analysts looking at inflation, it surely won’t replace the CPI. Currently, the government sends hundreds of BLS employees out to stores around the country to collect price data. Online pricing is a small part of the BLS calculation, incorporated into its methodology as people increasingly report shopping from online retailers, but there’s a significant time lag. While it’s unlikely that the BLS would incorporate private sources of data into its inflation calculations, as e-commerce grows it might look to improve the way it includes online prices. Still, economists are optimistic about the potential of Adobe’s DPI. “I don’t think we know the digital economy as well as we should,” says Klenow, “and this data can help us eventually nail that better.”…(More)

Cities, Data, and Digital Innovation


Paper by Mark Kleinman: “Developments in digital innovation and the availability of large-scale data sets create opportunities for new economic activities and new ways of delivering city services while raising concerns about privacy. This paper defines the terms Big Data, Open Data, Open Government, and Smart Cities and uses two case studies – London (U.K.) and Toronto – to examine questions about using data to drive economic growth, improve the accountability of government to citizens, and offer more digitally enabled services. The paper notes that London has been one of a handful of cities at the forefront of the Open Data movement and has been successful in developing its high-tech sector, although it has so far been less innovative in the use of “smart city” technology to improve services and lower costs. Toronto has also made efforts to harness data, although it is behind London in promoting Open Data. Moreover, although Toronto has many assets that could contribute to innovation and economic growth, including a growing high-technology sector, world-class universities and research base, and its role as a leading financial centre, it lacks a clear narrative about how these assets could be used to promote the city. The paper draws some general conclusions about the links between data innovation and economic growth, and between open data and open government, as well as ways to use big data and technological innovation to ensure greater efficiency in the provision of city services…(More)

App turns smartphones into seismic monitors


Springwise: “MyShake is an app that enables anyone to contribute to a worldwide seismic network and help people prepare for earthquakes.

The sheer number of smartphones on the planet makes them excellent tools for collecting scientific data. We have already seen citizen scientists use their devices to help crowdsource big data about jellyfish and pollution. Now, MyShake is an Android app from UC Berkeley, which enables anyone to contribute to a worldwide seismic network and help reduce the effects of earthquakes.

To begin, users download the app and enable it to run silently in the background of their smartphone. The app monitors for movement that fits the vibrational profile of an earthquake and sends anonymous information to a central system whenever relevant. The crowdsourced data enables the system to confirm an impending quake and estimate its origin time, location and magnitude. Then, the app can send warnings to those in the network who are likely to be affected by the earthquake. MyShake makes use of the fact that the average smartphone can record earthquakes larger than magnitude five from within 10 km of the epicenter.
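A toy sketch of on-device shaking detection using a classic short-term-average / long-term-average (STA/LTA) trigger, a common baseline for this kind of detection; MyShake’s real classifier, which must also reject everyday phone movement, is more sophisticated than this:

```python
# Toy STA/LTA (short-term average / long-term average) trigger on accelerometer
# magnitude: a classic baseline for detecting a sudden jump in shaking energy.
# MyShake's real detector, which must reject everyday phone movement, is more
# sophisticated than this.
import numpy as np

def sta_lta_trigger(accel, sta_len=50, lta_len=500, threshold=4.0):
    """Return sample indices where short-term energy jumps well above the
    long-term background level, i.e. candidate shaking onsets."""
    onsets = []
    for i in range(lta_len, len(accel)):
        sta = np.mean(np.abs(accel[i - sta_len:i]))
        lta = np.mean(np.abs(accel[i - lta_len:i]))
        if lta > 0 and sta / lta > threshold:
            onsets.append(i)
    return onsets

# Simulated signal: quiet background followed by a burst of strong shaking.
rng = np.random.default_rng(42)
signal = np.concatenate([rng.normal(0, 0.01, 2000), rng.normal(0, 0.5, 200)])
detections = sta_lta_trigger(signal)
if detections:
    print(f"Candidate shaking at sample {detections[0]} -> send anonymous report")
```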


MyShake is free to download and the team hopes to launch an iPhone version in the future….(More)”

Political Behavior and Big Data


Special issue of the International Journal of Sociology: “Interest in the use of “big data” in the social sciences is growing dramatically. Yet, adequate methodological research on what constitutes such data, and about their validity, is lacking. Scholars face both opportunities and challenges inherent in this new era of unprecedented quantification of information, including that related to political actions and attitudes. This special issue of the International Journal of Sociology addresses recent uses of “big data,” its multiple meanings, and the potential that this may have in building a stronger understanding of political behavior. We present a working definition of “big data” and summarize the major issues involved in their use. While the papers in this volume deal with various problems – how to integrate “big data” sources with cross-national survey research, the methodological challenges involved in building cross-national longitudinal network data of country memberships in international nongovernmental organizations, methods of detecting and correcting for source selection bias in event data derived from news and other online sources, the challenges and solutions to ex post harmonization of international social survey data – they share a common viewpoint. To make good on the substantive promise of “big data,” scholars need to engage with their inherent methodological problems. At this date, scholars are only beginning to identify and solve them….(More)”

Big data, meet behavioral science


At Brookings: “America’s community colleges offer the promise of a more affordable pathway to a bachelor’s degree. Students can pay substantially less for the first two years of college, transfer to a four-year college or university, and still earn their diploma in the same amount of time. At least in theory. Most community college students—80 percent of them—enter with the intention to transfer, but only 20 percent actually do so within five years of entering college. This divide represents a classic case of what behavioralists call an intention-action gap.

Why would so many students who enter community colleges intending to transfer fail to actually do so? Put yourself in the shoes of a 20-something community college student. You’ve worked hard for the past couple years, earning credits and paying a lot less in tuition than you would have if you had enrolled immediately in a four-year college or university. But now you want to transfer, so that you can complete your bachelor’s degree. How do you figure out where to go? Ideally you’d probably like to find a college that would take most of your credits, where you’re likely to graduate from, and where the degree is going to count for something in the labor market. A college advisor could probably help you figure this out, but at many community colleges there are at least 1,000 other students assigned to your advisor, so you might have a hard time getting a quality meeting. Some states have articulation agreements between two- and four-year institutions that guarantee admission for students who complete certain course sequences and perform at a high enough level. But these agreements are often dense and inaccessible.

The combination of big data and behavioral insights has the potential to help students navigate these complex decisions and successfully follow through on their intentions. Big data analytic techniques allow us to identify concrete transfer pathways where students are positioned to succeed; behavioral insights ensure we communicate these options in a way that maximizes students’ engagement and responsiveness…. A growing body of innovative research has demonstrated that, by applying behavioral science insights to the way we communicate with students and families about the opportunities and resources available to them, we can help people navigate these complex decisions and experience better outcomes as a result. A combination of simplified information, reminders, and access to assistance has improved achievement and attainment up and down the education pipeline, nudging parents to practice early-literacy activities with their kids or check in with their high schoolers about missed assignments, and encouraging students to renew their financial aid for college….

These types of big data techniques are already being used in some education sectors. For instance, a growing number of colleges use predictive analytics to identify struggling students who need additional assistance, so faculty and administrators can intervene before the student drops out. But frequently there is insufficient attention, once the results of these predictive analyses are in hand, to how to communicate the information in a way that is likely to lead to behavior change among students or educators. And much of the predictive analytics work has been on the side of plugging leaks in the pipeline (e.g. preventing drop-outs from higher education), rather than on the side of proactively sending students and families personalized information about educational and career pathways where they are likely to flourish…(More)”
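A minimal sketch of that kind of predictive flagging, using synthetic data and invented features (GPA, absences, credits attempted); real early-warning systems draw on far richer records and more careful modelling:

```python
# Sketch of predictive flagging with synthetic data and invented features
# (GPA, absences, credits attempted); real early-warning systems use far
# richer records and more careful modelling.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(7)
n = 2_000
X_past = np.column_stack([
    rng.uniform(0, 4, n),      # GPA
    rng.integers(0, 30, n),    # missed class sessions
    rng.uniform(0, 18, n),     # credits attempted this term
])
# Synthetic labels: low GPA and many absences raise the dropout probability.
logit = -2.0 - 1.0 * X_past[:, 0] + 0.15 * X_past[:, 1]
dropped_out = rng.random(n) < 1 / (1 + np.exp(-logit))

model = LogisticRegression(max_iter=1000).fit(X_past, dropped_out)

current_students = np.array([[1.8, 12, 9], [3.6, 1, 15]])
risk = model.predict_proba(current_students)[:, 1]
for name, r in sorted(zip(["A", "B"], risk), key=lambda pair: -pair[1]):
    print(f"Student {name}: predicted dropout risk {r:.0%}")
# Advisors would then prioritise outreach to the highest-risk students.
```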

Ebola: A Big Data Disaster


Study by Sean Martin McDonald: “…undertaken with support from the Open Society Foundation, Ford Foundation, and Media Democracy Fund, explores the use of Big Data in the form of Call Detail Record (CDR) data in humanitarian crisis.

It discusses the challenges of digital humanitarian coordination in health emergencies like the Ebola outbreak in West Africa, and the marked tension in the debate around experimentation with humanitarian technologies and their impact on privacy. McDonald’s research draws on two primary legal and human rights frameworks, privacy and property, to question the impact of the unregulated use of CDRs on human rights. It also highlights how the diffusion of data science into the realm of international development constitutes a genuine opportunity to bring powerful new tools to the fight against crises and emergencies.

Analysing the risks of using CDRs to perform migration analysis and contact tracing without user consent, as well as the application of big data to disease surveillance, is an important entry point into the debate around the use of Big Data for development and humanitarian aid. The paper also raises crucial questions of legal significance about access to information, the limits of data sharing, and the proportionality of privacy invasions justified by the public good. These issues hold great relevance today, when big data and its emerging role in development, including its actual and potential uses as well as its harms, is under consideration across the world.
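For a sense of what CDR-based migration analysis involves, here is a minimal sketch that infers each pseudonymous subscriber’s modal daily location and counts day-to-day flows between districts; the field names are illustrative, and, as the paper argues, real deployments raise serious consent and privacy questions:

```python
# Sketch of CDR-based mobility analysis: take each pseudonymous subscriber's
# modal district per day, then count day-to-day flows between districts.
# Field names are illustrative; real pipelines involve far more cleaning and,
# as the paper argues, serious consent and privacy questions.
from collections import Counter, defaultdict

def daily_locations(cdrs):
    """cdrs: iterable of (subscriber_id, date, district) call records.
    Returns the modal district per subscriber per day."""
    counts = defaultdict(Counter)
    for subscriber, day, district in cdrs:
        counts[(subscriber, day)][district] += 1
    return {key: c.most_common(1)[0][0] for key, c in counts.items()}

def movement_flows(locations):
    """Count district-to-district transitions between consecutive days."""
    by_subscriber = defaultdict(dict)
    for (subscriber, day), district in locations.items():
        by_subscriber[subscriber][day] = district
    flows = Counter()
    for days in by_subscriber.values():
        ordered = sorted(days)
        for d1, d2 in zip(ordered, ordered[1:]):
            if days[d1] != days[d2]:
                flows[(days[d1], days[d2])] += 1
    return flows

cdrs = [("u1", "2014-09-01", "Monrovia"), ("u1", "2014-09-02", "Bong"),
        ("u2", "2014-09-01", "Monrovia"), ("u2", "2014-09-02", "Monrovia")]
print(movement_flows(daily_locations(cdrs)))  # Counter({('Monrovia', 'Bong'): 1})
```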

The paper highlights the absence of a dialogue around the significant legal risks posed by the collection, use, and international transfer of personally identifiable data and humanitarian information, and the grey areas around assumptions of public good. It calls for a critical discussion of the experimental nature of data modelling in emergency response, where the mismanagement of information can undermine the very human rights it is meant to protect….

See Sean Martin McDonald – “Ebola: A Big Data Disaster” (PDF).


A machine intelligence commission for the UK


Geoff Mulgan at NESTA: “This paper makes the case for creating a Machine Intelligence Commission – a new public institution to help the development of new generations of algorithms, machine learning tools and uses of big data, ensuring that the public interest is protected.

I argue that new institutions of this kind – which can interrogate, inspect and influence technological development – are a precondition for growing informed public trust. That trust will, in turn, be essential if we are to reap the full potential public and economic benefits from new technologies. The proposal draws on lessons from fields such as human fertilisation, biotech and energy, which have shown how trust can be earned, and how new industries can be grown.  It also draws on lessons from the mistakes made in fields like GM crops and personal health data, where lack of trust has impeded progress….(More)”