Code of Conduct: Cyber Crowdsourcing for Good


Patrick Meier at iRevolution: “There is currently no unified code of conduct for digital crowdsourcing efforts in the development, humanitarian or human rights space. As such, we propose the following principles (displayed below) as a way to catalyze a conversation on these issues and to improve and/or expand this Code of Conduct as appropriate.
This initial draft was put together by Kate Chapman, Brooke Simons and myself. The link above points to this open, editable Google Doc. So please feel free to contribute your thoughts by inserting comments where appropriate. Thank you.
An organization that launches a digital crowdsourcing project must:

  • Provide clear volunteer guidelines on how to participate in the project so that volunteers are able to contribute meaningfully.
  • Test their crowdsourcing platform prior to any project or pilot to ensure that the system will not crash due to obvious bugs.
  • Disclose the purpose of the project, exactly which entities will be using and/or have access to the resulting data, to what end exactly, over what period of time and what the expected impact of the project is likely to be.
  • Disclose whether volunteer contributions to the project will or may be used as training data in subsequent machine learning research.
  • ….

An organization that launches a digital crowdsourcing project should:

  • Share as much of the resulting data with volunteers as possible without violating data privacy or the principle of Do No Harm.
  • Enable volunteers to opt out of having their contributions used in subsequent machine learning research, i.e., provide digital volunteers with the option of having their contributions withheld from such studies.
  • … “

Seattle Launches Sweeping, Ethics-Based Privacy Overhaul


From the Privacy Advisor: “The City of Seattle this week launched a citywide privacy initiative aimed at providing greater transparency into the city’s data collection and use practices.
To that end, the city has convened a group of stakeholders, the Privacy Advisory Committee, comprising various government departments, to look at the ways the city is using data collected from practices as common as utility bill payments and renewing pet licenses, or during the administration of emergency services like police and fire. By this summer, the committee will deliver to the City Council suggested principles and a “privacy statement” to provide direction on privacy practices citywide.
In addition, the city has partnered with the University of Washington, where Jan Whittington, assistant professor of urban design and planning and associate director at the Center for Information Assurance and Cybersecurity, has been given a $50,000 grant to look at open data, privacy and digital equity and how municipal data collection could harm consumers.
Responsible for all things privacy in this progressive city is Michael Mattmiller, who was hired to the position of chief technology officer (CTO) for the City of Seattle in June. Before his current gig, he worked as a senior strategist in enterprise cloud privacy for Microsoft. He said it’s an exciting time to be at the helm of the office because there’s momentum, there’s talent and there’s intention.
“We’re at this really interesting time where we have a City Council that strongly cares about privacy … We have a new police chief who wants to be very good on privacy … We also have a mayor who is focused on the city being an innovative leader in the way we interact with the public,” he said.
In fact, some City Council members have taken it upon themselves to meet with various groups and coalitions. “We have a really good, solid environment we think we can leverage to do something meaningful,” Mattmiller said….
Armbruster said the end goal is to create policies that will hold weight over time.
“I think when looking at privacy principles, from an ethical foundation, the idea is to create something that will last while technology dances around us,” she said, adding the principles should answer the question, “What do we stand for as a city and how do we want to move forward? So any technology that falls into our laps, we can evaluate and tailor or perhaps take a pass on as it falls under our ethical framework.”
The bottom line, Mattmiller said, is making a decision that says something about Seattle and where it stands.
“How do we craft a privacy policy that establishes who we want to be as a city and how we want to operate?” Mattmiller asked.”

The Creepy New Wave of the Internet


Review by Sue Halpern in the New York Review of Books:

 
…So here comes the Internet’s Third Wave. In its wake jobs will disappear, work will morph, and a lot of money will be made by the companies, consultants, and investment banks that saw it coming. Privacy will disappear, too, and our intimate spaces will become advertising platforms—last December Google sent a letter to the SEC explaining how it might run ads on home appliances—and we may be too busy trying to get our toaster to communicate with our bathroom scale to notice. Technology, which allows us to augment and extend our native capabilities, tends to evolve haphazardly, and the future that is imagined for it—good or bad—is almost always historical, which is to say, naive.”

A World That Counts: Mobilising a Data Revolution for Sustainable Development


Executive Summary of the Report by the UN Secretary-General’s Independent Expert Advisory Group on a Data Revolution for Sustainable Development (IEAG): “Data are the lifeblood of decision-making and the raw material for accountability. Without high-quality data providing the right information on the right things at the right time, designing, monitoring and evaluating effective policies becomes almost impossible.
New technologies are leading to an exponential increase in the volume and types of data available, creating unprecedented possibilities for informing and transforming society and protecting the environment. Governments, companies, researchers and citizen groups are in a ferment of experimentation, innovation and adaptation to the new world of data, a world in which data are bigger, faster and more detailed than ever before. This is the data revolution.
Some are already living in this new world. But too many people, organisations and governments are excluded because of lack of resources, knowledge, capacity or opportunity. There are huge and growing inequalities in access to data and information and in the ability to use it.
Data needs improving. Despite considerable progress in recent years, whole groups of people are not being counted and important aspects of people’s lives and environmental conditions are still not measured. For people, this can lead to the denial of basic rights, and for the planet, to continued environmental degradation. Too often, existing data remain unused because they are released too late or not at all, not well-documented and harmonized, or not available at the level of detail needed for decision-making.
As the world embarks on an ambitious project to meet new Sustainable Development Goals (SDGs), there is an urgent need to mobilise the data revolution for all people and the whole planet in order to monitor progress, hold governments accountable and foster sustainable development. More diverse, integrated, timely and trustworthy information can lead to better decision-making and real-time citizen feedback. This in turn enables individuals, public and private institutions, and companies to make choices that are good for them and for the world they live in.
This report sets out the main opportunities and risks presented by the data revolution for sustainable development. Seizing these opportunities and mitigating these risks requires active choices, especially by governments and international institutions. Without immediate action, gaps between developed and developing countries, between information-rich and information-poor people, and between the private and public sectors will widen, and risks of harm and abuses of human rights will grow.

An urgent call for action: Key recommendations

The strong leadership of the United Nations (UN) is vital for the success of this process. The Independent Expert Advisory Group (IEAG), established in August 2014, offers the UN Secretary-General several key recommendations for actions to be taken in the near future, summarised below:

  1. Develop a global consensus on principles and standards: The disparate worlds of public, private and civil society data and statistics providers need to be urgently brought together to build trust and confidence among data users. We propose that the UN establish a process whereby key stakeholders create a “Global Consensus on Data”, to adopt principles concerning legal, technical, privacy, geospatial and statistical standards which, among other things, will facilitate openness and information exchange and promote and protect human rights.
  2. Share technology and innovations for the common good: To create mechanisms through which technology and innovation can be shared and used for the common good, we propose to create a global “Network of Data Innovation Networks”, to bring together the organisations and experts in the field. This would: contribute to the adoption of best practices for improving the monitoring of SDGs, identify areas where common data-related infrastructures could address capacity problems and improve efficiency, encourage collaborations, identify critical research gaps and create incentives to innovate.
  3. New resources for capacity development: Improving data is a development agenda in its own right, and can improve the targeting of existing resources and spur new economic opportunities. Existing gaps can only be overcome through new investments and the strengthening of capacities. A new funding stream to support the data revolution for sustainable development should be endorsed at the “Third International Conference on Financing for Development”, in Addis Ababa in July 2015. An assessment will be needed of the scale of investments, capacity development and technology transfer that is required, especially for low income countries; and proposals developed for mechanisms to leverage the creativity and resources of the private sector. Funding will also be needed to implement an education program aimed at improving people’s, infomediaries’ and public servants’ capacity and data literacy to break down barriers between people and data.
  4. Leadership for coordination and mobilisation: A UN-led “Global Partnership for Sustainable Development Data” is proposed, to mobilise and coordinate the actions and institutions required to make the data revolution serve sustainable development, promoting several initiatives, such as:
    • A “World Forum on Sustainable Development Data” to bring together the whole data ecosystem to share ideas and experiences for data improvements, innovation, advocacy and technology transfer. The first Forum should take place at the end of 2015, once the SDGs are agreed;
    • A “Global Users Forum for Data for SDGs”, to ensure feedback loops between data producers and users and to help the international community set priorities and assess results;
    • Brokering key global public-private partnerships for data sharing.
  5. Exploit some quick wins on SDG data: Establishing an “SDGs data lab” to support the development of a first wave of SDG indicators, developing an SDG analysis and visualisation platform using the most advanced tools and features for exploring data, and building a dashboard from diverse data sources on “the state of the world”.

Never again should it be possible to say “we didn’t know”. No one should be invisible. This is the world we want – a world that counts.”

OpenUp Corporate Data while Protecting Privacy


Article by Stefaan G. Verhulst and David Sangokoya, (The GovLab) for the OpenUp? Blog: “Consider a few numbers: By the end of 2014, the number of mobile phone subscriptions worldwide is expected to reach 7 billion, nearly equal to the world’s population. More than 1.82 billion people communicate on some form of social network, and almost 14 billion sensor-laden everyday objects (trucks, health monitors, GPS devices, refrigerators, etc.) are now connected and communicating over the Internet, creating a steady stream of real-time, machine-generated data.
Much of the data generated by these devices is today controlled by corporations. These companies are in effect “owners” of terabytes of data and metadata. Companies use this data to aggregate, analyze, and track individual preferences, provide more targeted consumer experiences, and add value to the corporate bottom line.
At the same time, even as we witness a rapid “datafication” of the global economy, access to data is emerging as an increasingly critical issue, essential to addressing many of our most important social, economic, and political challenges. While the rise of the Open Data movement has opened up over a million datasets around the world, much of this openness is limited to government (and, to a lesser extent, scientific) data. Access to corporate data remains extremely limited. This is a lost opportunity. If corporate data—in the form of Web clicks, tweets, online purchases, sensor data, call data records, etc.—were made available in a de-identified and aggregated manner, researchers, public interest organizations, and third parties would gain greater insights on patterns and trends that could help inform better policies and lead to greater public good (including combatting Ebola).
Corporate data sharing holds tremendous promise. But its potential—and limitations—are also poorly understood. In what follows, we share early findings of our efforts to map this emerging open data frontier, along with a set of reflections on how to safeguard privacy and other citizen and consumer rights while sharing. Understanding the practice of shared corporate data—and assessing the associated risks—is an essential step in increasing access to socially valuable data held by businesses today. This is a challenge certainly worth exploring during the forthcoming OpenUp conference!
Understanding and classifying current corporate data sharing practices
Corporate data sharing remains very much a fledgling field. There has been little rigorous analysis of different ways or impacts of sharing. Nonetheless, our initial mapping of the landscape suggests there have been six main categories of activity—i.e., ways of sharing—to date:…
Assessing risks of corporate data sharing
Although shared corporate data offers several benefits for researchers, public interest organizations, and other companies, risks do exist, especially regarding personally identifiable information (PII). When aggregated, PII can serve to help understand trends and broad demographic patterns. But if PII is inadequately scrubbed and aggregated data is linked to specific individuals, this can lead to identity theft, discrimination, profiling, and other violations of individual freedom. It can also lead to significant legal ramifications for corporate data providers….”
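The post itself does not specify how such de-identification and aggregation would be carried out; purely as an illustration, the Python sketch below replaces a direct identifier with a salted one-way hash and then releases only aggregate counts above a suppression threshold. The record fields, the salt handling and the threshold of 10 are assumptions made for the example, not details from the article.

```python
import hashlib
from collections import Counter

# Illustrative records; the field names are assumptions, not taken from the article.
records = [
    {"user_id": "alice@example.com", "city": "Lagos",    "purchases": 3},
    {"user_id": "bob@example.com",   "city": "Lagos",    "purchases": 1},
    {"user_id": "carol@example.com", "city": "Freetown", "purchases": 2},
]

SALT = "replace-with-a-secret-salt"   # kept private by the data holder
MIN_CELL_SIZE = 10                    # suppress groups smaller than this (arbitrary choice)

def pseudonymise(user_id: str) -> str:
    """Replace a direct identifier with a salted one-way hash."""
    return hashlib.sha256((SALT + user_id).encode()).hexdigest()[:12]

# Step 1: drop direct identifiers, keeping only pseudonyms.
deidentified = [
    {"pid": pseudonymise(r["user_id"]), "city": r["city"], "purchases": r["purchases"]}
    for r in records
]

# Step 2: release only aggregates, suppressing cells small enough to single people out.
counts = Counter(r["city"] for r in deidentified)
released = {city: n for city, n in counts.items() if n >= MIN_CELL_SIZE}
print(released)  # small cells (here, all of them) are withheld
```

Hashing alone is a weak defence against re-identification, which is why the release step publishes only sufficiently large aggregates rather than row-level data.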

The collision between big data and privacy law


Paper by Stephen Wilson in the Australian Journal of Telecommunications and the Digital Economy: “We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have about us. The digital economy is awash with data. It’s a new and endlessly re-useable raw material, increasingly left behind by ordinary people going about their lives online. Many information businesses proceed on the basis that raw data is up for grabs; if an entrepreneur is clever enough to find a new vein of it, they can feel entitled to tap it in any way they like. However, some tacit assumptions underpinning today’s digital business models are naive. Conventional data protection laws, older than the Internet, limit how Personal Information is allowed to flow. These laws turn out to be surprisingly powerful in the face of ‘Big Data’ and the ‘Internet of Things’. On the other hand, orthodox privacy management was not framed for new Personal Information being synthesised tomorrow from raw data collected today. This paper seeks to bridge a conceptual gap between data analytics and privacy, and sets out extended Privacy Principles to better deal with Big Data.”

Urban Observatory Is Snapping 9,000 Images A Day Of New York City


FastCo-Exist: “Astronomers have long built observatories to capture the night sky and beyond. Now researchers at NYU are borrowing astronomy’s methods and turning their cameras towards Manhattan’s famous skyline.
NYU’s Center for Urban Science and Progress has been running what’s likely the world’s first “urban observatory” of its kind for about a year. From atop a tall building in downtown Brooklyn (NYU won’t say its address, due to security concerns), two cameras—one regular one and one that captures infrared wavelengths—take panoramic images of lower and midtown Manhattan. One photo is snapped every 10 seconds. That’s 8,640 images a day, or more than 3 million since the project began (or about 50 terabytes of data).
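The quoted figures are easy to sanity-check; here is a back-of-the-envelope sketch, where the per-image size at the end is an inference from the stated totals rather than a number given in the article.

```python
SECONDS_PER_DAY = 24 * 60 * 60           # 86,400
CAPTURE_INTERVAL_S = 10                  # one panoramic photo every 10 seconds

images_per_day = SECONDS_PER_DAY // CAPTURE_INTERVAL_S
print(images_per_day)                    # 8640, matching the article

days_running = 365                       # "for about a year"
total_images = images_per_day * days_running
print(total_images)                      # ~3.15 million, consistent with "more than 3 million"

total_bytes = 50e12                      # "about 50 terabytes"
print(total_bytes / total_images / 1e6)  # roughly 16 MB per image (an inference, not stated)
```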

“The real power of the urban observatory is that you have this synoptic imaging. By synoptic imaging, I mean these large swaths of the city,” says the project’s chief scientist Gregory Dobler, a former astrophysicist at Harvard University and the University of California, Santa Barbara, who now heads the 15-person observatory team at NYU.
Dobler’s team is collaborating with New York City officials on the project, which is now expanding to set up stations that study other parts of Manhattan and Brooklyn. Its major goal is to discover information about the urban landscape that can’t be seen at other scales. Such data could lead to applications like tracking which buildings are leaking energy (with the infrared camera), or measuring occupancy patterns of buildings at night, or perhaps detecting releases of toxic chemicals in an emergency.
The video above is an example. The top panel cycles through a one-minute slice of observatory images. The bottom panel is an analysis of the same images in which everything that remains static in each image is removed, such as buildings, trees, and roads. What’s left is an imprint of everything in flux within the scene—the clouds, the cars on the FDR Drive, the boat moving down the East River, and, importantly, a plume of smoke that puffs out of a building.
“Periodically, a building will burp,” says Dobler. “It’s hard to see the puffs of smoke . . . but we can isolate that plume and essentially identify it.” (As Dobler has done by highlighting it in red in the top panel).
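The article does not describe the team's actual pipeline, but the step it sketches (remove everything static, keep everything in flux) is classic background subtraction. Below is a minimal NumPy illustration, in which the synthetic frames, the median background model and the fixed threshold are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a short stack of grayscale frames (6 frames, 100x100 pixels).
frames = rng.normal(100.0, 2.0, size=(6, 100, 100))  # static scene plus sensor noise
frames[4:, 40:45, 60:70] += 50.0                      # a transient "plume" in the last two frames

# Model the static background as the per-pixel median over time,
# then keep only what deviates from it (clouds, cars, boats, smoke plumes).
background = np.median(frames, axis=0)
residual = frames - background

THRESHOLD = 10.0                                      # arbitrary detection threshold
moving = np.abs(residual) > THRESHOLD
print([int(mask.sum()) for mask in moving])           # pixels "in flux" in each frame
```

A production pipeline would use more robust background modelling and calibration, but the principle is the same: model what is static and examine the residual.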
To the natural privacy concerns about this kind of program, Dobler emphasizes that the pictures are only from an 8-megapixel camera (the same found in the iPhone 6) and aren’t clear enough to see inside a window or make out individuals. As a further privacy safeguard, the images are analyzed to only look at “aggregate” measures—such as the patterns of nighttime energy usage—rather than specific buildings. “We’re not really interested in looking at a given building, and saying, hey, these guys are particular offenders,” he says. (He also says the team is not looking at uses for the data in security applications.) However, Dobler was not able to answer a question as to whether the project’s partners at city agencies are able to access data analysis for individual buildings….”

Law is Code: A Software Engineering Approach to Analyzing the United States Code


New Paper by William Li, Pablo Azar, David Larochelle, Phil Hill & Andrew Lo: “The agglomeration of rules and regulations over time has produced a body of legal code that no single individual can fully comprehend. This complexity produces inefficiencies, makes the processes of understanding and changing the law difficult, and frustrates the fundamental principle that the law should provide fair notice to the governed. In this article, we take a quantitative, unbiased, and software-engineering approach to analyze the evolution of the United States Code from 1926 to today. Software engineers frequently face the challenge of understanding and managing large, structured collections of instructions, directives, and conditional statements, and we adapt and apply their techniques to the U.S. Code over time. Our work produces insights into the structure of the U.S. Code as a whole, its strengths and vulnerabilities, and new ways of thinking about individual laws. For example, we identify the first appearance and spread of important terms in the U.S. Code like “whistleblower” and “privacy.” We also analyze and visualize the network structure of certain substantial reforms, including the Patient Protection and Affordable Care Act (PPACA) and the Dodd-Frank Wall Street Reform and Consumer Protection Act, and show how the interconnections of references can increase complexity and create the potential for unintended consequences. Our work is a timely illustration of computational approaches to law as the legal profession embraces technology for scholarship, to increase efficiency, and to improve access to justice.”
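The authors built their own tooling, but the two ideas highlighted in the abstract, tracking when a term first appears across editions of the Code and treating cross-references as a network, can be sketched in a few lines of Python. The toy section texts and the use of the networkx library below are illustrative assumptions, not the authors' code or data.

```python
import re
import networkx as nx

# Toy stand-ins for the text of U.S. Code sections in two editions (illustrative only).
editions = {
    1980: {"5 USC 1201": "Merit Systems Protection Board; establishment...",
           "5 USC 2302": "Prohibited personnel practices..."},
    1990: {"5 USC 1201": "Merit Systems Protection Board; establishment...",
           "5 USC 1213": "Provisions relating to whistleblower disclosures; see section 2302..."},
}

# Idea 1: find the first edition in which a term appears.
term = "whistleblower"
first_year = next((year for year in sorted(editions)
                   if any(term in text.lower() for text in editions[year].values())), None)
print(term, "first appears in the", first_year, "edition of this toy corpus")

# Idea 2: treat cross-references as a directed graph, as in software dependency analysis.
graph = nx.DiGraph()
for sections in editions.values():
    for sec_id, text in sections.items():
        graph.add_node(sec_id)
        for target in re.findall(r"section (\d+)", text):
            graph.add_edge(sec_id, f"5 USC {target}")

print(graph.number_of_nodes(), "sections,", graph.number_of_edges(), "cross-reference(s)")
```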

Crowd-Sourcing Corruption: What Petrified Forests, Street Music, Bath Towels and the Taxman Can Tell Us About the Prospects for Its Future


Paper by Dieter Zinnbauer: “This article seeks to map out the prospects of crowd-sourcing technologies in the area of corruption-reporting. A flurry of initiatives and concomitant media hype in this area has led to exuberant hopes that the end of impunity is not such a distant possibility any more – at least not for the most blatant, ubiquitous and visible forms of administrative corruption, such as bribes and extortion payments that, on average, almost a quarter of citizens report facing year in, year out in their daily lives in so many countries around the world (Transparency International 2013).
Only with hindsight will we be able to tell if these hopes were justified. However, a closer look at an interdisciplinary body of literature on corruption and social mobilisation can help shed some interesting light on these questions and offer a fresh perspective on the potential of social-media-based crowd-sourcing for better governance and less corruption. So far the potential of crowd-sourcing has mainly been approached from a technology-centred perspective. Where challenges are identified, pondered, and worked upon, they are primarily technical and managerial in nature, ranging from issues of privacy protection and fighting off hacker attacks to challenges of data management, information validation or fundraising.
In contrast, short shrift is being paid to insights from a substantive, multi-disciplinary and growing body of literature on how corruption works, how it can be fought and more generally how observed logics of collective action and social mobilisation interact with technological affordances and condition the success of these efforts.
This imbalanced debate is not really surprising as it seems to follow the trajectory of the hype-and-bust cycle that we have seen in the public debate for a variety of other technology applications. From electronic health cards to smart government, to intelligent transport systems, all these and many other highly ambitious initiatives start with technology-centric visions of transformational impact. However, over time – with some hard lessons learnt and large sums spent – they all arrive at a more pragmatic and nuanced view on how social and economic forces shape the implementation of such technologies and require a more shrewd design approach, in order to make it more likely that potential actually translates into impact….”

Ebola and big data: Call for help


The Economist: “With at least 4,500 people dead, public-health authorities in west Africa and worldwide are struggling to contain Ebola. Borders have been closed, air passengers screened, schools suspended. But a promising tool for epidemiologists lies unused: mobile-phone data.
When people make mobile-phone calls, the network generates a call data record (CDR) containing such information as the phone numbers of the caller and receiver, the time of the call and the tower that handled it—which gives a rough indication of the device’s location. This information provides researchers with an insight into mobility patterns. Indeed phone companies use these data to decide where to build base stations and thus improve their networks, and city planners use them to identify places to extend public transport.
But perhaps the most exciting use of CDRs is in the field of epidemiology. Until recently the standard way to model the spread of a disease relied on extrapolating trends from census data and surveys. CDRs, by contrast, are empirical, immediate and updated in real time. You do not have to guess where people will flee to or move. Researchers have used them to map malaria outbreaks in Kenya and Namibia and to monitor the public response to government health warnings during Mexico’s swine-flu epidemic in 2009. Models of population movements during a cholera outbreak in Haiti following the earthquake in 2010 used CDRs and provided the best estimates of where aid was most needed.
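The excerpt describes a CDR as recording the caller, the time of the call and the handling tower; the sketch below shows one way such records could be turned into the kind of aggregated, anonymised mobility flows epidemiologists use. The record layout, the hashing step and the toy data are assumptions made for illustration, not details from the article.

```python
import hashlib
from collections import Counter

# Toy call data records: (caller, timestamp, tower_id). The layout is illustrative.
cdrs = [
    ("+231555001", "2014-10-01T08:00", "MONROVIA_3"),
    ("+231555001", "2014-10-01T18:00", "KAKATA_1"),
    ("+231555002", "2014-10-02T09:00", "MONROVIA_3"),
    ("+231555002", "2014-10-02T20:00", "GBARNGA_2"),
]

def pseudonym(msisdn: str) -> str:
    """One-way hash so analysts never handle raw phone numbers (illustrative safeguard)."""
    return hashlib.sha256(msisdn.encode()).hexdigest()[:10]

# Group each subscriber's records, sort by time, and count tower-to-tower transitions.
by_user: dict[str, list[tuple[str, str]]] = {}
for caller, timestamp, tower in cdrs:
    by_user.setdefault(pseudonym(caller), []).append((timestamp, tower))

flows = Counter()
for events in by_user.values():
    events.sort()
    for (_, origin), (_, destination) in zip(events, events[1:]):
        if origin != destination:
            flows[(origin, destination)] += 1

# Only aggregate flows between tower areas are released, never individual trajectories.
print(flows)
```

Releasing only such aggregate flows, rather than anyone's individual trajectory, is the sort of anonymisation and aggregation the closing paragraph argues can address the privacy worry.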
Doing the same with Ebola would be hard: in west Africa most people do not own a phone. But CDRs are nevertheless better than simulations based on stale, unreliable statistics. If researchers could track population flows from an area where an outbreak had occurred, they could see where it would be likeliest to break out next—and therefore where they should deploy their limited resources. Yet despite months of talks, and the efforts of the mobile-network operators’ trade association and several smaller UN agencies, telecoms firms have not let researchers use the data (see article).
One excuse is privacy, which is certainly a legitimate worry, particularly in countries fresh from civil war, or where tribal tensions exist. But the phone data can be anonymised and aggregated in a way that alleviates these concerns. A bigger problem is institutional inertia. Big data is a new field. The people who grasp the benefits of examining mobile-phone usage tend to be young and lack the clout to free the data for research use.”