Resource Guide to Data Governance and Security


National Neighborhood Indicators Partnership (NNIP): “Any organization that collects, analyzes, or disseminates data should establish formal systems to manage data responsibly, protect confidentiality, and document data files and procedures. In doing so, organizations will build a reputation for integrity and facilitate appropriate interpretation and data sharing, factors that contribute to an organization’s long-term sustainability.

To help groups improve their data policies and practices, this guide assembles lessons from the experiences of partners in the National Neighborhood Indicators Partnership network and similar organizations. The guide presents advice and annotated resources for the three parts of a data governance program: protecting privacy and human subjects, ensuring data security, and managing the data life cycle. While applicable for non-sensitive data, the guide is geared for managing confidential data, such as data used in integrated data systems or Pay-for-Success programs….(More)”.

Is the Government More Entrepreneurial Than You Think?


 Freakonomics Radio (Podcast): We all know the standard story: our economy would be more dynamic if only the government would get out of the way. The economist Mariana Mazzucato says we’ve got that story backward. She argues that the government, by funding so much early-stage research, is hugely responsible for big successes in tech, pharma, energy, and more. But the government also does a terrible job in claiming credit — and, more important, getting a return on its investment….

Quote:

MAZZUCATO: “…And I’ve been thinking about this especially around the big data and the kind of new questions around privacy with Facebook, etc. Instead of having a situation where all the data basically gets captured, which is citizens’ data, by companies which then, in some way, we have to pay into in terms of accessing these great new services — whether they’re free or not, we’re still indirectly paying. We should have the data in some sort of public repository because it’s citizens’ data. The technology itself was funded by the citizens. What would Uber be without GPS, publicly financed? What would Google be without the Internet, publicly financed? So, the tech was financed from the state, the citizens; it’s their data. Why not completely reverse the current relationship and have that data in a public repository which companies actually have to pay into to get access to it under certain strict conditions which could be set by an independent advisory council?… (More)”

How Smart Should a City Be? Toronto Is Finding Out


Laura Bliss at CityLab: “A data-driven “neighborhood of the future” masterminded by a Google corporate sibling, the Quayside project could be a milestone in digital-age city-building. But after a year of scandal in Silicon Valley, questions about privacy and security remain…

Quayside was billed as “the world’s first neighborhood built from the internet up,” according to Sidewalk Labs’ vision plan, which won the RFP to develop this waterfront parcel. The startup’s pitch married “digital infrastructure” with an utopian promise: to make life easier, cheaper, and happier for Torontonians.

Everything from pedestrian traffic and energy use to the fill-height of a public trash bin and the occupancy of an apartment building could be counted, geo-tagged, and put to use by a wifi-connected “digital layer” undergirding the neighborhood’s physical elements. It would sense movement, gather data, and send information back to a centralized map of the neighborhood. “With heightened ability to measure the neighborhood comes better ways to manage it,” stated the winning document. “Sidewalk expects Quayside to become the most measurable community in the world.”

“Smart cities are largely an invention of the private sector—an effort to create a market within government,” Wylie wrote in Canada’s Globe and Mail newspaper in December 2017. “The business opportunities are clear. The risks inherent to residents, less so.” A month later, at a Toronto City Council meeting, Wylie gave a deputation asking officials to “ensure that the data and data infrastructure of this project are the property of the city of Toronto and its residents.”

In this case, the unwary Trojans would be Waterfront Toronto, the nonprofit corporation appointed by three levels of Canadian government to own, manage, and build on the Port Lands, 800 largely undeveloped acres between downtown and Lake Ontario. When Waterfront Toronto gave Sidewalk Labs a green light for Quayside in October, the startup committed $50 million to a one-year consultation, which was recently extended by several months. The plan is to submit a final “Master Innovation and Development Plan” by the end of this year.

That somewhat Orwellian vision of city management had privacy advocates and academics concerned from the the start. Bianca Wylie, the co-founder of the technology advocacy group Tech Reset Canada, has been perhaps the most outspoken of the project’s local critics. For the last year, she’s spoken up at public fora, written pointed op-edsand Medium posts, and warned city officials of what she sees as the “Trojan horse” of smart city marketing: private companies that stride into town promising better urban governance, but are really there to sell software and monetize citizen data.

But there has been no guarantee about who would own the data at the core of its proposal—much of which would ostensibly be gathered in public space. Also unresolved is the question of whether this data could be sold. With little transparency about what that means from the company or its partner, some Torontonians are wondering what Waterfront Toronto—and by extension, the public—is giving away….(More)”.

Decentralisation: the next big step for the world wide web


Zoë Corbyn at The Observer: “The decentralised web, or DWeb, could be a chance to take control of our data back from the big tech firms. So how does it work and when will it be here?...What is the decentralised web? 
It is supposed to be like the web you know but without relying on centralised operators. In the early days of the world wide web, which came into existence in 1989, you connected directly with your friends through desktop computers that talked to each other. But from the early 2000s, with the advent of Web 2.0, we began to communicate with each other and share information through centralised services provided by big companies such as Google, Facebook, Microsoft and Amazon. It is now on Facebook’s platform, in its so called “walled garden”, that you talk to your friends. “Our laptops have become just screens. They cannot do anything useful without the cloud,” says Muneeb Ali, co-founder of Blockstack, a platform for building decentralised apps. The DWeb is about re-decentralising things – so we aren’t reliant on these intermediaries to connect us. Instead users keep control of their data and connect and interact and exchange messages directly with others in their network.

Why do we need an alternative? 
With the current web, all that user data concentrated in the hands of a few creates risk that our data will be hacked. It also makes it easier for governments to conduct surveillance and impose censorship. And if any of these centralised entities shuts down, your data and connections are lost. Then there are privacy concerns stemming from the business models of many of the companies, which use the private information we provide freely to target us with ads. “The services are kind of creepy in how much they know about you,” says Brewster Kahle, the founder of the Internet Archive. The DWeb, say proponents, is about giving people a choice: the same services, but decentralised and not creepy. It promises control and privacy, and things can’t all of a sudden disappear because someone decides they should. On the DWeb, it would be harder for the Chinese government to block a site it didn’t like, because the information can come from other places.

How does the DWeb work that is different? 

There are two big differences in how the DWeb works compared to the world wide web, explains Matt Zumwalt, the programme manager at Protocol Labs, which builds systems and tools for the DWeb. First, there is this peer-to-peer connectivity, where your computer not only requests services but provides them. Second, how information is stored and retrieved is different. Currently we use http and https links to identify information on the web. Those links point to content by its location, telling our computers to find and retrieve things from those locations using the http protocol. By contrast, DWeb protocols use links that identify information based on its content – what it is rather than where it is. This content-addressed approach makes it possible for websites and files to be stored and passed around in many ways from computer to computer rather than always relying on a single server as the one conduit for exchanging information. “[In the traditional web] we are pointing to this location and pretending [the information] exists in only one place,” says Zumwalt. “And from this comes this whole monopolisation that has followed… because whoever controls the location controls access to the information.”…(More)”.

The Known Known


Book Review by Sue Halpern in The New York Review of Books of The Known Citizen: A History of Privacy in Modern America by Sarah E. Igo; Habeas Data: Privacy vs. the Rise of Surveillance Tech by Cyrus Farivar;  Beyond Abortion: Roe v. Wade and the Battle for Privacy by Mary Ziegler; Privacy’s Blueprint: The Battle to Control the Design of New Technologies by Woodrow Hartzog: “In 1999, when Scott McNealy, the founder and CEO of Sun Microsystems, declared, “You have zero privacy…get over it,” most of us, still new to the World Wide Web, had no idea what he meant. Eleven years later, when Mark Zuckerberg said that “the social norms” of privacy had “evolved” because “people [had] really gotten comfortable not only sharing more information and different kinds, but more openly and with more people,” his words expressed what was becoming a common Silicon Valley trope: privacy was obsolete.

By then, Zuckerberg’s invention, Facebook, had 500 million users, was growing 4.5 percent a month, and had recently surpassed its rival, MySpace. Twitter had overcome skepticism that people would be interested in a zippy parade of 140-character posts; at the end of 2010 it had 54 million active users. (It now has 336 million.) YouTube was in its fifth year, the micro-blogging platform Tumblr was into its third, and Instagram had just been created. Social media, which encouraged and relied on people to share their thoughts, passions, interests, and images, making them the Web’s content providers, were ascendant.

Users found it empowering to bypass, and even supersede, the traditional gatekeepers of information and culture. The social Web appeared to bring to fruition the early promise of the Internet: that it would democratize the creation and dissemination of knowledge. If, in the process, individuals were uploading photos of drunken parties, and discussing their sexual fetishes, and pulling back the curtain on all sorts of previously hidden personal behaviors, wasn’t that liberating, too? How could anyone argue that privacy had been invaded or compromised or effaced when these revelations were voluntary?

The short answer is that they couldn’t. And they didn’t. Users, who in the early days of social media were predominantly young, were largely guileless and unconcerned about privacy. In a survey of sixty-four of her students at Rochester Institute of Technology in 2006, Susan Barnes found that they “wanted to keep information private, but did not seem to realize that Facebook is a public space.” When a random sample of young people was asked in 2007 by researchers from the Pew Research Center if “they had any concerns about publicly posted photos, most…said they were not worried about risks to their privacy.” (This was largely before Facebook and other tech companies began tracking and monetizing one’s every move on- and offline.)

In retrospect, the tendencies toward disclosure and prurience online should not have been surprising….(More)”.

Long Term Info-structure


Long Now Foundation Seminar by Juan Benet: “We live in a spectacular time,”…”We’re a century into our computing phase transition. The latest stages have created astonishing powers for individuals, groups, and our species as a whole. We are also faced with accumulating dangers — the capabilities to end the whole humanity experiment are growing and are ever more accessible. In light of the promethean fire that is computing, we must prevent bad outcomes and lock in good ones to build robust foundations for our knowledge, and a safe future. There is much we can do in the short-term to secure the long-term.”

“I come from the front lines of computing platform design to share a number of new super-powers at our disposal, some old challenges that are now soluble, and some new open problems. In this next decade, we’ll need to leverage peer-to-peer networks, crypto-economics, blockchains, Open Source, Open Services, decentralization, incentive-structure engineering, and so much more to ensure short-term safety and the long-term flourishing of humanity.”

Juan Benet is the inventor of the InterPlanetary File System (IPFS)—a new protocol which uses content-addressing to make the web faster, safer, and more open—and the creator of Filecoin, a cryptocurrency-incentivized storage market….(More + Video)”

Making a Smart City a Fairer City: Chicago’s Technologists Address Issues of Privacy, Ethics, and Equity, 2011-2018


Case study by Gabriel Kuris and Steven S. Strauss at Innovations for Successful Societies: “In 2011, voters in Chicago elected Rahm Emanuel, a 51-year-old former Chicago congressman, as their new mayor. Emanuel inherited a city on the upswing after years of decline but still marked by high rates of crime and poverty, racial segregation, and public distrust in government. The Emanuel administration hoped to harness the city’s trove of digital data to improve Chicagoans’ health, safety, and quality of life. During the next several years, Chief Data Officer Brett Goldstein and his successor Tom Schenk led innovative uses of city data, ranging from crisis management to the statistical targeting of restaurant inspections and pest extermination. As their teams took on more-sophisticated projects that predicted lead-poisoning risks and Escherichia coli outbreaks and created a citywide network of ambient sensors, the two faced new concerns about normative issues like privacy, ethics, and equity. By 2018, Chicago had won acclaim as a smarter city, but was it a fairer city? This case study discusses some of the approaches the city developed to address those challenges and manage the societal implications of cutting-edge technologies….(More)”.

Protecting the Confidentiality of America’s Statistics: Adopting Modern Disclosure Avoidance Methods at the Census Bureau


John Abowd at US Census: “…Throughout our history, we have been leaders in statistical data protection, which we call disclosure avoidance. Other statistical agencies use the terms “disclosure limitation” and “disclosure control.” These terms are all synonymous. Disclosure avoidance methods have evolved since the censuses of the early 1800s, when the only protection used was simply removing names. Executive orders, and a series of laws modified the legal basis for these protections, which were finally codified in the 1954 Census Act (13 U.S.C. Sections 8(b) and 9). We have continually added better and stronger protections to keep the data we publish anonymous and underlying records confidential.

However, historical methods cannot completely defend against the threats posed by today’s technology. Growth in computing power, advances in mathematics, and easy access to large, public databases pose a significant threat to confidentiality. These forces have made it possible for sophisticated users to ferret out common data points between databases using only our published statistics. If left unchecked, those users might be able to stitch together these common threads to identify the people or businesses behind the statistics as was done in the case of the Netflix Challenge.

The Census Bureau has been addressing these issues from every feasible angle and changing rapidly with the times to ensure that we protect the data our census and survey respondents provide us. We are doing this by moving to a new, advanced, and far more powerful confidentiality protection system, which uses a rigorous mathematical process that protects respondents’ information and identity in all of our publications.

The new tool is based on the concept known in scientific and academic circles as “differential privacy.” It is also called “formal privacy” because it provides provable mathematical guarantees, similar to those found in modern cryptography, about the confidentiality protections that can be independently verified without compromising the underlying protections.

“Differential privacy” is based on the cryptographic principle that an attacker should not be able to learn any more about you from the statistics we publish using your data than from statistics that did not use your data. After tabulating the data, we apply carefully constructed algorithms to modify the statistics in a way that protects individuals while continuing to yield accurate results. We assume that everyone’s data are vulnerable and provide the same strong, state-of-the-art protection to every record in our database.

The Census Bureau did not invent the science behind differential privacy. However, we were the first organization anywhere to use it when we incorporated differential privacy into the OnTheMap application in 2008. It was used in this event to protect block-level residential population data. Recently, Google, Apple, Microsoft, and Uber have all followed the Census Bureau’s lead, adopting differentially privacy systems as the standard for protecting user data confidentiality inside their browsers (Chrome), products (iPhones), operating systems (Windows 10), and apps (Uber)….(More)”.

Origin Privacy: Protecting Privacy in the Big-Data Era


Paper by Helen Nissenbaum, Sebastian Benthall, Anupam Datta, Michael Carl Tschantz, and Piot Mardziel: “Machine learning over big data poses challenges for our conceptualization of privacy. Such techniques can discover surprising and counteractive associations that take innocent looking data and turns it into important inferences about a person. For example, the buying carbon monoxide monitors has been linked to paying credit card bills, while buying chrome-skull car accessories predicts not doing so. Also, Target may have used the buying of scent-free hand lotion and vitamins as a sign that the buyer is pregnant. If we take pregnancy status to be private and assume that we should prohibit the sharing information that can reveal that fact, then we have created an unworkable notion of privacy, one in which sharing any scrap of data may violate privacy.

Prior technical specifications of privacy depend on the classification of certain types of information as private or sensitive; privacy policies in these frameworks limit access to data that allow inference of this sensitive information. As the above examples show, today’s data rich world creates a new kind of problem: it is difficult if not impossible to guarantee that information does notallow inference of sensitive topics. This makes information flow rules based on information topic unstable.

We address the problem of providing a workable definition of private data that takes into account emerging threats to privacy from large-scale data collection systems. We build on Contextual Integrity and its claim that privacy is appropriate information flow, or flow according to socially or legally specified rules.

As in other adaptations of Contextual Integrity (CI) to computer science, the parameterization of social norms in CI is translated into a logical specification. In this work, we depart from CI by considering rules that restrict information flow based on its origin and provenance, instead of on it’s type, topic, or subject.

We call this concept of privacy as adherence to origin-based rules Origin Privacy. Origin Privacy rules can be found in some existing data protection laws. This motivates the computational implementation of origin-based rules for the simple purpose of compliance engineering. We also formally model origin privacy to determine what security properties it guarantees relative to the concerns that motivate it….(More)”.

Sharing the benefits: How to use data effectively in the public sector


Report by Sarah Timmis, Luke Heselwood and Eleonora Harwich (for Reform UK): “This report demonstrates the potential of data sharing to transform the delivery of public services and improve outcomes for citizens. It explores how government can overcome various challenges to ‘get data right’ and enable better use of personal data within and between public-sector organisations.

Ambition meets reality

Government is set on using data more effectively to help deliver better public services. Better use of data can improve the design, efficiency and outcomes of services. For example, sharing data digitally between GPs and hospitals can enable early identification of patients most at risk of hospital admission, which has reduced admissions by up to 30 per cent in Somerset. Bristol’s Homeless Health Service allows access to medical, psychiatric, social and prison data, helping to provide a clearer picture of the complex issues facing the city’s homeless population. However, government has not yet created a clear data infrastructure, which would allow data to be shared across multiple public services, meaning efforts on the ground have not always delivered results.

The data: sticking points

Several technical challenges must be overcome to create the right data infrastructure. Individual pieces of data must be presented in standard formats to enable sharing within and across services. Data quality can be improved at the point of data collection, through better monitoring of data quality and standards within public-sector organisations and through data-curation-processes. Personal data also needs to be presented in a given format so linking data is possible in certain instances to identify individuals. Interoperability issues and legacy systems act as significant barriers to data linking. The London Metropolitan Police alone use 750 different systems, many of which are incompatible. Technical solutions, such as Application Programming Interfaces (APIs) can be overlaid on top of legacy systems to improve interoperability and enable data sharing. However, this is only possible with the right standards and a solid new data model. To encourage competition and improve interoperability in the longer term, procurement rules should make interoperability a prerequisite for competing companies, allowing customers to integrate their choices of the most appropriate products from different vendors.

Building trustworthiness

The ability to share data at scale through the internet has brought new threats to the security and privacy of personal information that amplifies the need for trust between government and citizens and across government departments. Currently, just 9 per cent of people feel that the Government has their best interests at heart when data sharing, and only 15 per cent are confident that government organisations would deal well with a cyber-attack. Considering attitudes towards data sharing are time and context dependent, better engagement with citizens and clearer explanations of when and why data is used can help build confidence. Auditability is also key to help people and organisations track how data is used to ensure every interaction with personal data is auditable, transparent and secure. …(More)”.