Algorithmic Accountability Reporting: On the Investigation of Black Boxes


New report by Nicholas Diakopoulos: “The past three years have seen a small profusion of websites, perhaps as many as 80, spring up to capitalize on the high interest that mug shot photos generate online.1 Mug shots are public record, artifacts of an arrest, and these websites collect, organize, and optimize the photos so that they’re found more easily online. Proponents of such sites argue that the public has a right to know if their neighbor, romantic date, or colleague has an arrest record. Still, mug shots are not proof of conviction; they don’t signal guilt.
Having one online is likely to result in a reputational blemish; having that photo ranked as the first result when someone searches for your name on Google turns that blemish into a garish reputational wound, festering in facile accessibility. Some of these websites are exploiting this, charging people to remove their photo from the site so that it doesn’t appear in online searches. It’s reputational blackmail. And remember, these people aren’t necessarily guilty of anything.
To crack down on the practice, states like Oregon, Georgia, and Utah have passed laws requiring these sites to take down the photos if the person’s record has been cleared. Some credit card companies have stopped processing payments for the seediest of the sites. Clearly both legal and market forces can help curtail this activity, but there’s another way to deal with the issue too: algorithms. Indeed, Google recently launched updates to its ranking algorithm that down-weight results from mug shot websites, basically treating them more as spam than as legitimate information sources.2 With a single knock of the algorithmic gavel, Google declared such sites illegitimate.
At the turn of the millennium, 14 years ago, Lawrence Lessig taught us that “code is law”—that the architecture of systems, and the code and algorithms that run them, can be powerful influences on liberty.3 We’re living in a world now where algorithms adjudicate more and more consequential decisions in our lives. It’s not just search engines either; it’s everything from online review systems to educational evaluations, the operation of markets to how political campaigns are run, and even how social services like welfare and public safety are managed. Algorithms, driven by vast troves of data, are the new power brokers in society.
As the mug shots example suggests, algorithmic power isn’t necessarily detrimental to people; it can also act as a positive force. The intent here is not to demonize algorithms, but to recognize that they operate with biases like the rest of us.4 And they can make mistakes. What we generally lack as a public is clarity about how algorithms exercise their power over us. With that clarity comes an increased ability to publicly debate and dialogue the merits of any particular algorithmic power. While legal codes are available for us to read, algorithmic codes are more opaque, hidden behind layers of technical complexity. How can we characterize the power that various algorithms may exert on us? And how can we better understand when algorithms might be wronging us? What should be the role of journalists in holding that power to account?
In the next section I discuss what algorithms are and how they encode power. I then describe the idea of algorithmic accountability, first examining how algorithms problematize and sometimes stand in tension with transparency. Next, I describe how reverse engineering can provide an alternative way to characterize algorithmic power by delineating a conceptual model that captures different investigative scenarios based on reverse engineering algorithms’ input-output relationships. I then provide a number of illustrative cases and methodological details on how algorithmic accountability reporting might be realized in practice. I conclude with a discussion about broader issues of human resources, legality, ethics, and transparency.”
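To make the input-output idea concrete, here is a minimal sketch of how an auditor might probe a black box by varying one attribute of the input and comparing the outputs it returns; the scoring function and attribute below are hypothetical placeholders, not anything taken from the report.

```python
# Minimal sketch of black-box input-output probing (hypothetical example, not
# a method taken from the report): vary one attribute of the input, hold the
# rest fixed, and compare the outputs the opaque system returns.
import random

def black_box_rank(query: str, site_is_mugshot: bool) -> float:
    """Stand-in for an opaque ranking algorithm we can only query, not read."""
    return random.random() * (0.3 if site_is_mugshot else 1.0)

def probe(n_trials: int = 1000) -> None:
    """Compare outputs across paired inputs that differ in a single attribute."""
    mugshot = [black_box_rank("john doe", True) for _ in range(n_trials)]
    other = [black_box_rank("john doe", False) for _ in range(n_trials)]
    print("mean score, mug shot sites:", sum(mugshot) / n_trials)
    print("mean score, other sites:   ", sum(other) / n_trials)

probe()
```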

Focus on Migration: A tech ‘wiki’ site could improve lives


Max Martin in SciDev: “Wikipedia is probably the best example of a website that allows users to share and edit information in real time. But several other sites based on the ‘wiki’ model provide a sharing platform specifically for technologies that could help improve lives in the developing world.
One such site, Appropedia, is aimed at collaborative solutions in sustainability, appropriate technology and poverty reduction. Appropedia has had 50 million hits since its 2006 inception and is getting a facelift that will allow it to reach more people.
Such a one-stop information point offers tremendous scope for informing people on the move about green, low-cost and locally owned technologies. A website like Appropedia could function as a clearing house for information on technologies that could make life easier for migrants who are forced to travel and live rough in poor settings — as long as the information is reliable.
For example, displaced people building new homes after a disaster has struck face many choices over the materials they use, as I’ve written previously. The wiki site could be a place for them to swap experiences and learn what has worked for others in different settings.
It could also host advice for people on the move about affordable transport, healthcare and humanitarian aid locations, plus tips for staying safe while travelling in unfamiliar territory and what to pack when camping out in the open.
It could also help channel relevant innovations from other settings to migrants. For example, some villagers in flood-prone areas of Bangladesh grow crops on ‘floating gardens’ made using bamboo-pole rafts lined with soil, water hyacinths and cow dung. [1] A local group in India’s frequently flooded Bihar state has shown how to make a life jacket using just plastic bottles, sticky tape, fast-drying cotton and thread. [2] Both of these concepts could be useful for other people affected by floods, and a dedicated wiki could help disseminate know-how and review the technologies’ safety, reliability and suitability for different locations.
Of course, an information wiki for migrants must offer reliable information. This could be achieved by involving a specialist agency or a consortium of humanitarian groups who could invite experts and local practitioners to review and edit posts.”

Tim Berners-Lee: we need to re-decentralise the web


Wired:  “Twenty-five years on from the web’s inception, its creator has urged the public to re-engage with its original design: a decentralised internet that at its very core, remains open to all.
Speaking with Wired editor David Rowan at an event launching the magazine’s March issue, Tim Berners-Lee said that although part of this is about keeping an eye on for-profit internet monopolies such as search engines and social networks, the greatest danger is the emergence of a balkanised web.
“I want a web that’s open, works internationally, works as well as possible and is not nation-based,” Berners-Lee told the audience… “What I don’t want is a web where the Brazilian government has every social network’s data stored on servers on Brazilian soil. That would make it so difficult to set one up.”
It’s the role of governments, startups and journalists to keep that conversation at the fore, he added, because the pace of change is not slowing — it’s going faster than ever before. For his part Berners-Lee drives the issue through his work at the Open Data Institute, World Wide Web Consortium and World Wide Web Foundation, but also as an MIT professor whose students are “building new architectures for the web where it’s decentralised”. On the issue of monopolies, Berners-Lee did say it’s concerning to be “reliant on big companies, and one big server”, something that stalls innovation, but that competition has historically resolved these issues and will continue to do so.
The kind of balkanised web he spoke about, as typified by Brazil’s home-soil servers argument or Iran’s emerging intranet, is partially being driven by revelations of NSA and GCHQ mass surveillance. The distrust that it has brewed, from a political level right down to the threat of self-censorship among ordinary citizens, threatens an open web and is, said Berners-Lee, a greater threat than censorship. Knowing the NSA may be breaking commercial encryption services could result in the emergence of more networks like China’s Great Firewall, to “protect” citizens. This, Berners-Lee suggested, is why we need a bit of anti-establishment push-back.”

Unbundling the nation state


The Economist on Government-to-government trade: “NIGERIAN pineapple for breakfast, Peruvian quinoa for lunch and Japanese sushi for dinner. Two centuries ago, when David Ricardo advocated specialisation and free trade, the notion that international exchange in goods and services could make such a cosmopolitan diet commonplace would have seemed fanciful.
Today another scenario may appear equally unlikely: a Norwegian government agency managing Algeria’s sovereign-wealth fund; German police overseeing security in the streets of Mumbai; and Dubai playing the role of the courthouse of the Middle East. Yet such outlandish possibilities are more than likely if a new development fulfils its promise. Ever more governments are trading with each other, from advising lawmakers to managing entire services. They are following businesses, which have long outsourced much of what they do. Is this the dawn of the government-to-government era?
Such “G2G” trade is not new, though the name may be. After the Ottoman empire defaulted on its debt in 1875 foreign lenders set up an “Ottoman Public Debt Administration”, its governing council packed with European government officials. At its peak it had 9,000 employees, more than the empire’s finance ministry. And the legacy of enforced G2G trade—colonialism, as it was known—is still visible even today. Britain’s Privy Council is the highest court of appeal for many Commonwealth countries. France provides a monetary-policy service to several west African nations by managing their currency, the CFA franc.
One reason G2G trade is growing is that it is a natural extension of the trend for governments to pinch policies from each other. “Policymaking now routinely occurs in comparative terms,” says Jamie Peck of the University of British Columbia, who refers to G2G advice as “fast policy”. Since the late 1990s Mexico’s pioneering policy to make cash benefits for poor families conditional on things like getting children vaccinated and sending them to school has been copied by almost 50 other countries….Budget cuts can provide another impetus for G2G trade. The Dutch army recently sold its Leopard II tanks and now sends tank crews to train with German forces. That way it will be able to reform its tank squadrons quickly if they are needed. Britain, with a ten-year gap between scrapping old aircraft-carriers and buying new ones, has sent pilots to train with the American marines on the F-35B, which will fly from both American and British carriers.

No one knows the size of the G2G market. Governments rarely publicise deals, not least because they fear looking weak. And there are formidable barriers to trade. The biggest is the “Westphalian” view of sovereignty, says Stephen Krasner of Stanford University: that states should run their own affairs without foreign interference. In 2004 Papua New Guinea’s parliament passed a delegation agreement modelled on RAMSI (the Regional Assistance Mission to Solomon Islands), but local elites opposed it and courts eventually declared it unconstitutional. Honduras attempted to create independent “charter cities”, a concept developed by Paul Romer of New York University (NYU), whose citizens would have had the right of appeal to the supreme court of Mauritius. But in 2012 this scheme, too, was deemed unconstitutional.
Critics fret about accountability and democratic legitimacy. The 2005 Paris Declaration on Aid Effectiveness, endorsed by governments and aid agencies, made much of the need for developing countries to design their own development strategies. And providers open themselves to reputational risk. British police, for instance, have trained Bahraini ones. A heavy-handed crackdown by local forces during the Arab spring reflected badly on their foreign teachers…
When San Francisco decided to install wireless control systems for its streetlights, it posted a “call for solutions” on Citymart, an online marketplace for municipal projects. In 2012 it found a Swiss firm, Paradox Engineering, which had built such systems for local cities. But though members often share ideas, says Sascha Haselmayer, Citymart’s founder, most still decide to implement their chosen policies themselves.
Weak government services are the main reason poor countries fail to catch up with rich ones, says Mr Romer. One response is for people in poorly run places to move to well governed ones. Better would be to bring efficient government services to them. In a recent paper with Brandon Fuller, also of NYU, Mr Romer argues that either response would bring more benefits than further lowering the barriers to trade in privately provided goods and services. Firms have long outsourced activities, even core ones, to others that do them better. It is time governments followed suit.”

"Natural Cities" Emerge from Social Media Location Data


Emerging Technology From the arXiv: “Nobody agrees on how to define a city. But the emergence of “natural cities” from social media data sets may change that, say computational geographers…
A city is a large, permanent human settlement. But try and define it more carefully and you’ll soon run into trouble. A settlement that qualifies as a city in Sweden may not qualify in China, for example. And the reasons why one settlement is classified as a town while another as a city can sometimes seem almost arbitrary.
City planners know this problem well.  They tend to define cities by administrative, legal or even historical boundaries that have little logic to them. Indeed, the same city can sometimes be defined in various different ways.
That causes all kinds of problems from counting the total population to working out who pays for the upkeep of the place.  Which definition do you use?
Now help may be at hand thanks to the work of Bin Jiang and Yufan Miao at the University of Gävle in Sweden. These guys have found a way to use people’s location recorded by social media to define the boundaries of so-called natural cities which have a close resemblance to real cities in the US.
Jiang and Miao began with a dataset from the Brightkite social network, which was active between 2008 and 2010. The site encouraged users to log in with their location details so that they could see other users nearby. So the dataset consists of almost 3 million locations in the US and the dates on which they were logged.
To start off, Jiang and Miao simply placed a dot on a map at the location of each login. They then connected these dots to their neighbours to form triangles that end up covering the entire mainland US.
Next, they calculated the size of each triangle on the map and plotted this size distribution, which turns out to follow a power law. So there are lots of tiny triangles but only a few large ones.
Finally, they calculated the average size of the triangles and then coloured in all those that were smaller than average. The coloured areas are “natural cities”, say Jiang and Miao.
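A rough sketch of that procedure, for readers who want to experiment with it, might look like the following; a standard Delaunay triangulation over synthetic points is assumed here, since the paper’s exact triangulation method and the Brightkite data are not reproduced.

```python
# Sketch of the "natural cities" procedure described above, on synthetic data
# (assumption: a Delaunay triangulation stands in for the paper's method).
import numpy as np
from scipy.spatial import Delaunay

# Toy stand-in for the ~3 million Brightkite login locations (x, y pairs).
rng = np.random.default_rng(0)
points = rng.random((10_000, 2)) ** 3 * 100  # skewed so some areas are denser

tri = Delaunay(points)  # connect each point to its neighbours into triangles

def triangle_area(p):
    (x1, y1), (x2, y2), (x3, y3) = p
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))

areas = np.array([triangle_area(points[s]) for s in tri.simplices])

# Keep only the triangles smaller than the average: their union approximates
# the dense clusters that Jiang and Miao call "natural cities".
small = tri.simplices[areas < areas.mean()]
print(f"{len(small)} of {len(areas)} triangles fall below the mean area")
```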
It’s easy to imagine that the resulting map of triangles is of little value. But to the evident surprise of the researchers, it produces a pretty good approximation of the cities in the US. “We know little about why the procedure works so well but the resulting patterns suggest that the natural cities effectively capture the evolution of real cities,” they say.
That’s handy because it suddenly gives city planners a way to study and compare cities on a level playing field. It allows them to see how cities evolve and change over time too. And it gives them a way to analyse how cities in different parts of the world differ.
Of course, Jiang and Miao will want to find out why this approach reveals city structures in this way. That’s still something of a puzzle but the answer itself may provide an important insight into the nature of cities (or at least into the nature of this dataset).
A few days ago, this blog wrote about how a new science of cities is emerging from the analysis of big data. This is another example, and we can expect to see more.
Ref:  http://arxiv.org/abs/1401.6756 : The Evolution of Natural Cities from the Perspective of Location-Based Social Media”

Shedding Light on Projects Through Contract Transparency


OpenAidMap: “In all countries, whether rich or poor, contracts are at the nexus of revenue generation, budget planning, resource management and the delivery of public goods. Open contracting refers to norms and practices for increased disclosure and participation in public contracting at all stages of the contracting process.
There are very good reasons for making procurement processes transparent. Public posting of tender notices and “requests for proposals” helps support free and fair competitive bidding – increasing citizen trust while also improving the likelihood of securing the best possible supplier. Once procurement is finished, public posting of contract awards gives important assurance for citizens, development partners, and competing companies that procurement processes are open and fair. Increasingly, open contracting – including procurement transparency through portals like this one – is becoming the norm for governments around the world. There is also a global initiative at work to establish a common standard for contracting data….
With so much momentum behind procurement transparency, there is an untapped opportunity to leverage data from public procurement processes to provide operational insight into activities. Procurement data can help answer two of the most important questions in project-level aid transparency: (1) Where are projects taking place? (2) How much money is being invested at each location?
Take an example from Nepal. Consulting the government’s aid management system yields some basic, but already useful, information about a particular transportation project. This type of information can be useful for anyone trying to assess patterns of transportation investment in the country or, for that matter, patterns of development partner financing….
Open contracting data have intrinsic value for transparency and accountability. However, they also have significant value for planners – even those who only care about getting greater insight into project activities. At the moment though, contracting data are too difficult to access. While contracting data are increasingly becoming available, they are often posted on stand-alone websites, in diverse data formats and without structured access. By standardizing around a core contracting data format and accessibility approach, we can unlock the potential to use contracting data at scale not only for transparency, but also as an effort-free addition to the arsenal of available data for project-level planning, coordination, and accountability. The utility could be even higher if combined with performance and results data.
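To illustrate the kind of record a common format could standardize, a single simplified contract entry might carry both the financial and geographic fields mentioned above; the field names below are purely illustrative and are not drawn from any actual contracting data standard.

```python
# Hypothetical, simplified contract record; the field names are illustrative
# only, not the schema of any existing open contracting standard.
contract = {
    "project": "Rural road rehabilitation",
    "supplier": "Example Construction Ltd",
    "amount": 1_250_000,
    "currency": "USD",
    "location": {"district": "Example District", "lat": 27.7, "lon": 85.3},
    "status": "awarded",
}

# If every award record carried location and amount, the two planning
# questions above ("where" and "how much") reduce to simple aggregation.
def total_by_district(records):
    totals = {}
    for r in records:
        district = r["location"]["district"]
        totals[district] = totals.get(district, 0) + r["amount"]
    return totals

print(total_by_district([contract]))
```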
When developing any public data standard, there are opportunities and risks. For open contracting data, there is a huge opportunity to make those data equally relevant for project planners as for those more purely interested in transparency and accountability. The pilot conducted by the Open Aid Partnership and AidData has explored this potential for overlap, yielding key insights that we hope can be used in the future development of an open and broadly relevant open contracting data standard.”

The Age of ‘Infopolitics’


Colin Koopman in the New York Times: “We are in the midst of a flood of alarming revelations about information sweeps conducted by government agencies and private corporations concerning the activities and habits of ordinary Americans. After the initial alarm that accompanies every leak and news report, many of us retreat to the status quo, quieting ourselves with the thought that these new surveillance strategies are not all that sinister, especially if, as we like to say, we have nothing to hide.
One reason for our complacency is that we lack the intellectual framework to grasp the new kinds of political injustices characteristic of today’s information society. Everyone understands what is wrong with a government’s depriving its citizens of freedom of assembly or liberty of conscience. Everyone (or most everyone) understands the injustice of government-sanctioned racial profiling or policies that produce economic inequality along color lines. But though nearly all of us have a vague sense that something is wrong with the new regimes of data surveillance, it is difficult for us to specify exactly what is happening and why it raises serious concern, let alone what we might do about it.
Our confusion is a sign that we need a new way of thinking about our informational milieu. What we need is a concept of infopolitics that would help us understand the increasingly dense ties between politics and information. Infopolitics encompasses not only traditional state surveillance and data surveillance, but also “data analytics” (the techniques that enable marketers at companies like Target to detect, for instance, if you are pregnant), digital rights movements (promoted by organizations like the Electronic Frontier Foundation), online-only crypto-currencies (like Bitcoin or Litecoin), algorithmic finance (like automated micro-trading) and digital property disputes (from peer-to-peer file sharing to property claims in the virtual world of Second Life). These are only the tip of an enormous iceberg that is drifting we know not where.
Surveying this iceberg is crucial because atop it sits a new kind of person: the informational person. Politically and culturally, we are increasingly defined through an array of information architectures: highly designed environments of data, like our social media profiles, into which we often have to squeeze ourselves. The same is true of identity documents like your passport and individualizing dossiers like your college transcripts. Such architectures capture, code, sort, fasten and analyze a dizzying number of details about us. Our minds are represented by psychological evaluations, education records, credit scores. Our bodies are characterized via medical dossiers, fitness and nutrition tracking regimens, airport security apparatuses. We have become what the privacy theorist Daniel Solove calls “digital persons.” As such we are subject to infopolitics (or what the philosopher Grégoire Chamayou calls “datapower,” the political theorist Davide Panagia “datapolitik” and the pioneering thinker Donna Haraway “informatics of domination”).
Today’s informational person is the culmination of developments stretching back to the late 19th century. It was in those decades that a number of early technologies of informational identity were first assembled. Fingerprinting was implemented in colonial India, then imported to Britain, then exported worldwide. Anthropometry — the measurement of persons to produce identifying records — was developed in France in order to identify recidivists. The registration of births, which has since become profoundly important for initiating identification claims, became standardized in many countries, with Massachusetts pioneering the way in the United States before a census initiative in 1900 led to national standardization. In the same era, bureaucrats visiting rural districts complained that they could not identify individuals whose names changed from context to context, which led to initiatives to universalize standard names. Once fingerprints, biometrics, birth certificates and standardized names were operational, it became possible to implement an international passport system, a social security number and all other manner of paperwork that tells us who someone is. When all that paper ultimately went digital, the reams of data about us became radically more accessible and subject to manipulation, which has made us even more informational.
We like to think of ourselves as somehow apart from all this information. We are real — the information is merely about us. But what is it that is real? What would be left of you if someone took away all your numbers, cards, accounts, dossiers and other informational prostheses? Information is not just about you — it also constitutes who you are….”

Google Hangouts vs Twitter Q&As: how the US and Europe are hacking traditional diplomacy


Wired (UK): “We’re not yet sure if diplomacy is going digital or just the conversations we’re having,” Moira Whelan, Deputy Assistant Secretary for Digital Strategy, US Department of State, admitted on stage at TedxStockholm. “Sometimes you just have to dive in, and we’re going to, but we’re not really sure where we’re going.”
The US has been at the forefront of digital diplomacy for many years now. President Obama was the first leader to sign up to Twitter, and has amassed the greatest number of followers among his peers at nearly 41 million. The account is, however, mainly run by his staff. It’s understandable, but demonstrates that there still remains a diplomatic disconnect in a country Whelan says knows it’s “ready, leading the conversation and on cutting edge”.
In Europe, Swedish Minister for Foreign Affairs Carl Bildt, on the other hand, carries out regular Q&As on the social network and is regarded as one of the most conversational leaders on Twitter and the best connected, according to annual survey Twiplomacy. Our own William Hague is chasing Bildt with close to 200,000 followers, and is the world’s second most connected Foreign Minister, while David Cameron is active on a daily basis with more than 570,000 followers. London was in fact the first place to host a “Diplohack”, an event where ambassadors are brought together with developers and others to hack traditional diplomacy, and Whelan travelled to Sweden to take part in the third European event, the Stockholm Initiative for Digital Diplomacy held 16-17 January in conjunction with TedxStockholm.
Nevertheless, Whelan, who has worked for the state for a decade, says the US is in the game and ready to try new things. Case in point being its digital diplomacy reaction to the crisis in Syria last year.
“In August 2013 we witnessed tragic events in Syria, and obviously the President of the United States and his security team jumped into action,” said Whelan. “We needed to bear witness and… very clearly saw the need for one thing — a Google+ Hangout.” With her tongue-in-cheek comment, Whelan was pointing out social media’s incredibly relevant role in communicating to the public what’s going on when crises hit, and in answering concerns and questions through it.
“We saw speeches and very disturbing images coming at us,” continued Whelan. “We heard leaders making impassioned speeches, and we ourselves had conversations about what we were seeing and how we needed to engage and inform; to give people the chance to engage and ask questions of us.
“We thought, clearly let’s have a Google+ Hangout. Three people joined us and Secretary John Kerry — Nicholas Kristof of the New York Times, executive editor of Syria Deeply, Lara Setrakian and Andrew Beiter, a teacher affiliated with the Holocaust Memorial Museum who specialises in how we talk about these topics with our children.”
In the run up to the Hangout, news of the event trickled out and soon Google was calling, asking if it could advertise the session at the bottom of other Hangouts, then on YouTube ads. “Suddenly 15,000 people were watching the Secretary live — that’s by far the largest number we’d seen. We felt we’d tapped into something, we knew we’d hit success at what was a challenging time. We were engaging the public and could join with them to communicate a set of questions. People want to ask questions and get very direct answers, and we know it’s a success. We’ve talked to Google about how we can replicate that. We want to transform what we’re doing to make that the norm.”
Secretary of State John Kerry is, Whelan told Wired.co.uk later, “game for anything” when it comes to social media — and having the department leader enthused at the prospect of taking digital diplomacy forward is obviously key to its success.
“He wanted us to get on Instagram and the unselfie meme during the Philippines crisis was his idea — an assistant had seen it and he held a paper in front of him with the URL to donate funds to Typhoon Haiyan victims,” Whelan told Wired.co.uk at the Stockholm diplohack.  “President Obama came in with a mandate that social media would be present and pronounced in all our departments.”
“[As] government changes and is more influenced away from old paper models and newspapers, suspenders and bow ties, and more into young innovators wanting to come in and change things,” Whelan continued, “I think it will change the way we work and help us get smarter.”

Use big data and crowdsourcing to detect nuclear proliferation, says DSB


FierceGovernmentIT: “A changing set of counter-nuclear proliferation problems requires a paradigm shift in monitoring that should include big data analytics and crowdsourcing, says a report from the Defense Science Board.
Much has changed since the Cold War when it comes to ensuring that nuclear weapons are subject to international controls, meaning that monitoring in support of treaties covering declared capabilities should be only one part of overall U.S. monitoring efforts, says the board in a January report (.pdf).
There are challenges related to covert operations, such as testing calibrated to fall below detection thresholds, and non-traditional technologies that present ambiguous threat signatures. Knowledge about how to make nuclear weapons is widespread and in the hands of actors who will give the United States or its allies limited or no access….
The report recommends using a slew of technologies including radiation sensors, but also exploitation of digital sources of information.
“Data gathered from the cyber domain establishes a rich and exploitable source for determining activities of individuals, groups and organizations needed to participate in either the procurement or development of a nuclear device,” it says.
Big data analytics could be used to take advantage of the proliferation of potential data sources including commercial satellite imaging, social media and other online sources.
The report notes that the proliferation of readily available commercial satellite imagery has created concerns about the introduction of more noise than genuine signal. “On balance, however, it is the judgment from the task force that more information from remote sensing systems, both commercial and dedicated national assets, is better than less information,” it says.
In fact, the ready availability of commercial imagery should be an impetus for improving governmental ability to find weak signals “even within the most cluttered and noisy environments.”
Crowdsourcing also holds potential, although the report again notes that nuclear proliferation analysis by non-governmental entities “will constrain the ability of the United States to keep its options open in dealing with potential violations.” The distinction between gathering information and making political judgments “will erode.”
An effort by Georgetown University students (reported in the Washington Post in 2011) to use open source data analyzing the network of tunnels used in China to hide its missile and nuclear arsenal provides a proof-of-concept on how crowdsourcing can be used to augment limited analytical capacity, the report says – despite debate on the students’ work, which concluded that China’s arsenal could be many times larger than conventionally accepted…
For more:
download the DSB report, “Assessment of Nuclear Monitoring and Verification Technologies” (.pdf)
read the WaPo article on the Georgetown University crowdsourcing effort”

Prospects for Online Crowdsourcing of Social Science Research Tasks: A Case Study Using Amazon Mechanical Turk


New paper by Catherine E. Schmitt-Sands and Richard J. Smith: “While the internet has created new opportunities for research, managing the increased complexity of relationships and knowledge also creates challenges. Amazon.com has a Mechanical Turk service that allows people to crowdsource simple tasks for a nominal fee. The online workers may be anywhere in North America or India and range in ability. Social science researchers are only beginning to use this service. While researchers have used crowdsourcing to find research subjects or classify texts, we used Mechanical Turk to conduct a policy scan of local government websites. This article describes the process used to train and ensure quality of the policy scan. It also examines choices in the context of research ethics.”
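As a rough, present-day illustration of how a website policy scan might be posted as a crowdsourced task, the sketch below creates a single HIT with the boto3 MTurk client; the task URL, reward and wording are assumptions for illustration and do not reproduce the task design or the (older) API the authors actually used.

```python
# Illustrative sketch only: posting one website-review HIT via boto3's MTurk
# client. The task URL, reward and wording are assumptions, not the authors'.
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint, so no real money is spent while experimenting.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

question_xml = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.org/policy-scan-form?site=https://www.example-city.gov</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Find a named policy on a local government website",
    Description="Visit the linked site and record whether the policy is posted.",
    Keywords="policy, website, data collection",
    Reward="0.10",                        # USD, passed as a string
    MaxAssignments=3,                     # several workers per site for quality checks
    AssignmentDurationInSeconds=15 * 60,
    LifetimeInSeconds=3 * 24 * 60 * 60,
    Question=question_xml,
)
print("HIT created:", hit["HIT"]["HITId"])
```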