Data-Intensive Approaches To Creating Innovation For Sustainable Smart Cities


Science Trends: “Located at the complex intersection of economic development and environmental change, cities play a central role in our efforts to move towards sustainability. Reducing air and water pollution, improving energy efficiency while securing energy supply, and minimizing vulnerabilities to disruptions and disturbances are interconnected and pose a formidable challenge, with their dynamic interactions changing in highly complex and unpredictable manners….

The Beijing City Lab demonstrates the usefulness of open urban data in mapping urbanization with a fine spatiotemporal scale and reflecting social and environmental dimensions of urbanization through visualization at multiple scales.

The basic principle of open data will generate significant opportunities for promoting inter-disciplinary and inter-organizational research, producing new data sets through the integration of different sources, avoiding duplication of research, facilitating the verification of previous results, and encouraging citizen scientists and crowdsourcing approaches. Open data also is expected to help governments promote transparency, citizen participation, and access to information in policy-making processes.

Despite this significant potential, however, numerous challenges remain in facilitating innovation for urban sustainability through open data. The scope and amount of data collected and shared are still limited, and quality control, error monitoring, and cleaning of open data are indispensable for securing the reliability of analysis. Also, the organizational and legal frameworks of data-sharing platforms are often not well defined or established, and it is critical to address interoperability between various data standards, the balance between open and proprietary data, and normative and legal issues such as data ownership, personal privacy, confidentiality, law enforcement, and the maintenance of public safety and national security….

These findings are described in the article entitled Facilitating data-intensive approaches to innovation for sustainability: opportunities and challenges in building smart cities, published in the journal Sustainability Science. This work was led by Masaru Yarime from the City University of Hong Kong….(More)”.

Government data: How open is too open?


Sharon Fisher at HPE: “The notion of “open government” appeals to both citizens and IT professionals seeking access to freely available government data. But is there such a thing as data access being too open? Governments may want to be transparent, yet they need to avoid releasing personally identifiable information.

There’s no question that open government data offers many benefits. It gives citizens access to the data their taxes paid for, enables government oversight, and powers the applications developed by government, vendors, and citizens that improve people’s lives.

However, data breaches and concerns about the amount of data that government is collecting make some people wonder: When is it too much?

“As we think through the big questions about what kind of data a state should collect, how it should use it, and how to disclose it, these nuances become not some theoretical issue but a matter of life and death to some people,” says Alexander Howard, deputy director of the Sunlight Foundation, a Washington nonprofit that advocates for open government. “There are people in government databases where the disclosure of their [physical] location is the difference between a life-changing day and Wednesday.”

Open data supporters point out that much of this data has been considered a public record all along and tout the value of its use in analytics. But having personal data aggregated in a single place that is accessible online—as opposed to, say, having to go to an office and physically look up each record—makes some people uneasy.

Privacy breaches, wholesale

“We’ve seen a real change in how people perceive privacy,” says Michael Morisy, executive director at MuckRock, a Cambridge, Massachusetts, nonprofit that helps media and citizens file public records requests. “It’s been driven by a long-standing concept in transparency: practical obscurity.” Even if something was technically a public record, effort needed to be expended to get one’s hands on it. That amount of work might be worth it for, say, someone running for office, but on the whole, private citizens didn’t have to worry. Things are different now, says Morisy. “With Google, and so much data being available at the click of a mouse or the tap of a phone, what was once practically obscure is now instantly available.”

People are sometimes also surprised to find out that public records can contain their personally identifiable information (PII), such as addresses, phone numbers, and even Social Security numbers. That may be on purpose or because someone failed to redact the data properly.

That’s had consequences. Over the years, there have been a number of incidents in which PII from public records, including addresses, was used to harass and sometimes even kill people. For example, in 1989, Rebecca Schaeffer was murdered by a stalker who learned her address from the Department of Motor Vehicles. Other examples of harassment via driver’s license numbers include thieves who tracked down the address of owners of expensive cars and activists who sent anti-abortion literature to women who had visited health clinics that performed abortions.

In response, in 1994, Congress enacted the Driver’s Privacy Protection Act to restrict the sale of such data. More recently, the state of Idaho passed a law protecting the identity of hunters who shot wolves, because the hunters were being harassed by wolf supporters. Similarly, the state of New York allowed concealed pistol permit holders to make their name and address private after a newspaper published an online interactive map showing the names and addresses of all handgun permit holders in Westchester and Rockland counties….(More)”.

New York City moves to create accountability for algorithms


Lauren Kirchner at ArsTechnica: “The algorithms that play increasingly central roles in our lives often emanate from Silicon Valley, but the effort to hold them accountable may have another epicenter: New York City. Last week, the New York City Council unanimously passed a bill to tackle algorithmic discrimination—the first measure of its kind in the country.

The algorithmic accountability bill, waiting to be signed into law by Mayor Bill de Blasio, establishes a task force that will study how city agencies use algorithms to make decisions that affect New Yorkers’ lives, and whether any of the systems appear to discriminate against people based on age, race, religion, gender, sexual orientation, or citizenship status. The task force’s report will also explore how to make these decision-making processes understandable to the public.

The bill’s sponsor, Council Member James Vacca, said he was inspired by ProPublica’s investigation into racially biased algorithms used to assess the criminal risk of defendants….

A previous, more sweeping version of the bill had mandated that city agencies publish the source code of all algorithms being used for “targeting services” or “imposing penalties upon persons or policing” and make them available for “self-testing” by the public. At a hearing at City Hall in October, representatives from the mayor’s office expressed concerns that this mandate would threaten New Yorkers’ privacy and the government’s cybersecurity.

The bill was one of two moves the City Council made last week concerning algorithms. On Thursday, the committees on health and public safety held a hearing on the city’s forensic methods, including controversial tools that the chief medical examiner’s office crime lab has used for difficult-to-analyze samples of DNA.

As a ProPublica/New York Times investigation detailed in September, an algorithm created by the lab for complex DNA samples has been called into question by scientific experts and former crime lab employees.

The software, called the Forensic Statistical Tool, or FST, has never been adopted by any other lab in the country….(More)”.

Normative Challenges of Identification in the Internet of Things: Privacy, Profiling, Discrimination, and the GDPR


Paper by Sandra Wachter: “In the Internet of Things (IoT), identification and access control technologies provide essential infrastructure to link data between a user’s devices with unique identities, and provide seamless and linked up services. At the same time, profiling methods based on linked records can reveal unexpected details about users’ identity and private life, which can conflict with privacy rights and lead to economic, social, and other forms of discriminatory treatment. A balance must be struck between identification and access control required for the IoT to function and user rights to privacy and identity. Striking this balance is not an easy task because of weaknesses in cybersecurity and anonymisation techniques.

The EU General Data Protection Regulation (GDPR), set to come into force in May 2018, may provide essential guidance to achieve a fair balance between the interests of IoT providers and users. Through a review of academic and policy literature, this paper maps the inherent tension between privacy and identifiability in the IoT.

It focuses on four challenges: (1) profiling, inference, and discrimination; (2) control and context-sensitive sharing of identity; (3) consent and uncertainty; and (4) honesty, trust, and transparency. The paper will then examine the extent to which several standards defined in the GDPR will provide meaningful protection for privacy and control over identity for users of IoT. The paper concludes that in order to minimise the privacy impact of the conflicts between data protection principles and identification in the IoT, GDPR standards urgently require further specification and implementation into the design and deployment of IoT technologies….(More)”.

Research reveals de-identified patient data can be re-identified


Vanessa Teague, Chris Culnane and Ben Rubinstein in PhysOrg: “In August 2016, Australia’s federal Department of Health published medical billing records of about 2.9 million Australians online. These records came from the Medicare Benefits Scheme (MBS) and the Pharmaceutical Benefits Scheme (PBS) containing 1 billion lines of historical health data from the records of around 10 per cent of the population.

These longitudinal records were de-identified, a process intended to prevent a person’s identity from being connected with information, and were made public on the government’s open data website as part of its policy on accessible public data.

We found that patients can be re-identified, without decryption, through a process of linking the unencrypted parts of the dataset with known information about the individual.

Our findings replicate those of similar studies of other de-identified datasets:

  • A few mundane facts taken together often suffice to isolate an individual.
  • Some patients can be identified by name from publicly available information.
  • Decreasing the precision of the data, or perturbing it statistically, makes re-identification gradually harder at a substantial cost to utility.

The first step is examining a patient’s uniqueness according to medical procedures such as childbirth. Some individuals are unique given public information, and many patients are unique given a few basic facts, such as year of birth or the date a baby was delivered….
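
The uniqueness test described above can be sketched in a few lines of Python. This is our own illustration with made-up records, not the authors' actual code or the real MBS/PBS data: count how many records share each combination of "mundane facts" (quasi-identifiers); any record whose combination occurs exactly once can be singled out by someone who knows those facts.

```python
from collections import Counter

# Hypothetical de-identified records: names and IDs are gone, but each
# row still carries quasi-identifiers such as year of birth and the
# date of a medical procedure (here, a delivery).
records = [
    {"id": "A", "birth_year": 1975, "delivery_date": "2014-03-02"},
    {"id": "B", "birth_year": 1975, "delivery_date": "2014-03-02"},
    {"id": "C", "birth_year": 1962, "delivery_date": "2013-11-19"},
    {"id": "D", "birth_year": 1988, "delivery_date": "2015-07-30"},
]

def uniqueness_rate(rows, keys):
    """Fraction of rows whose quasi-identifier combination is unique."""
    combos = Counter(tuple(r[k] for k in keys) for r in rows)
    unique = sum(1 for r in rows
                 if combos[tuple(r[k] for k in keys)] == 1)
    return unique / len(rows)

# C and D are unique on (birth_year, delivery_date): anyone who knows
# those two facts about them can isolate their full billing history.
rate = uniqueness_rate(records, ["birth_year", "delivery_date"])
```

The same function shows the linkage risk in the second step: run it with the quasi-identifiers a bank or pharmacy already holds, and every unique row is a patient that organisation could re-identify.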

The second step is examining uniqueness according to the characteristics of commercial datasets we know of but cannot access directly. There are high uniqueness rates that would allow linking with a commercial pharmaceutical dataset, and with the billing data available to a bank. This means that ordinary people, not just the prominent ones, may be easily re-identifiable by their bank or insurance company…

These de-identification methods were bound to fail, because they were trying to achieve two inconsistent aims: the protection of individual privacy and publication of detailed individual records. De-identification is very unlikely to work for other rich datasets in the government’s care, like census data, tax records, mental health records, penal information and Centrelink data.

While the ambition of making more data more easily available to facilitate research, innovation and sound public policy is a good one, there is an important technical and procedural problem to solve: there is no good solution for publishing sensitive complex individual records that protects privacy without substantially degrading the usefulness of the data.

Some data can be safely published online, such as information about government, aggregations of large collections of material, or data that is differentially private. For sensitive, complex data about individuals, a much more controlled release in a secure research environment is a better solution. The Productivity Commission recommends a “trusted user” model, and techniques like dynamic consent also give patients greater control and visibility over their personal information….(More)”.
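
To make the differential-privacy contrast concrete, here is a minimal sketch of the Laplace mechanism for releasing a noisy count. It is an illustration under our own assumptions (the function names are ours, and real releases require careful budget accounting), not a description of any Australian government system: noise calibrated to the privacy parameter epsilon masks any one individual's presence, which works for aggregates but not for publishing row-level records.

```python
import math
import random

def laplace_scale(sensitivity, epsilon):
    # A counting query changes by at most `sensitivity` (= 1) when one
    # person is added or removed; the noise scale grows as epsilon shrinks.
    return sensitivity / epsilon

def dp_count(true_count, epsilon, sensitivity=1.0, rng=random):
    """Release a count with Laplace noise calibrated to epsilon."""
    scale = laplace_scale(sensitivity, epsilon)
    # Sample Laplace(0, scale) by inverting its CDF at a uniform draw.
    u = rng.random() - 0.5
    noise = -scale * (1 if u >= 0 else -1) * math.log(1 - 2 * abs(u))
    return true_count + noise
```

A smaller epsilon (stronger privacy) means a larger scale and noisier counts, which is exactly the precision-versus-utility trade-off the authors note: safe for large aggregates, useless for a table of individual billing records.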

Accelerating the Sharing of Data Across Sectors to Advance the Common Good


Paper by Robert M. Groves and Adam Neufeld: “The public pays for and provides an incredible amount of data to governments and companies. Yet much of the value of this data is being wasted, remaining in silos rather than being shared to enhance the common good—whether it’s helping governments to stop opioid addiction or helping companies predict and meet the demand for electric or autonomous vehicles.

  • Many companies and governments are interested in sharing more of their data with each other; however, right now the process of sharing is very time consuming and can pose great risks, since it often involves sharing full data sets with another entity.
  • We need intermediaries to design safe environments to facilitate data sharing in the low-trust and politically sensitive context of companies and governments. These safe environments would exist outside the government, be transparent to the public, and use modern technologies and techniques to allow only statistical uses of data through temporary linkages in order to minimize the risk to individuals’ privacy.
  • Governments must lead the way in sharing more data by re-evaluating laws that limit sharing of data, and must embrace new technologies that could allow the private sector to receive at least some value from many sensitive data sets. By decreasing the cost and risks of sharing data, more data will be freed from their silos, and we will move closer to what we deserve—that our data are used for the greatest societal benefit….(More)”.

Sharing is Daring: An Experiment on Consent, Chilling Effects and a Salient Privacy Nudge


Paper by Yoan Hermstrüwer and Stephan Dickert in the International Review of Law and Economics: “Privacy law rests on the assumption that government surveillance may increase the general level of conformity and thus generate a chilling effect. In a study that combines elements of a lab and a field experiment, we show that salient and incentivized consent options are sufficient to trigger this behavioral effect. Salient ex ante consent options may lure people into giving up their privacy and increase their compliance with social norms – even when the only immediate risk of sharing information is mere publicity on a Google website. A right to be forgotten (right to deletion), however, seems to reduce neither privacy valuations nor chilling effects. In spite of low deletion costs people tend to stick with a retention default. The study suggests that consent architectures may play out on social conformity rather than on consent choices and privacy valuations. Salient notice and consent options may not merely empower users to make an informed consent decision. Instead, they can trigger the very effects that privacy law intends to curb….(More)”.

Transatlantic Data Privacy


Paul M. Schwartz and Karl-Nikolaus Peifer in Georgetown Law Journal: “International flows of personal information are more significant than ever, but differences in transatlantic data privacy law imperil this data trade. The resulting policy debate has led the EU to set strict limits on transfers of personal data to any non-EU country—including the United States—that lacks sufficient privacy protections. Bridging the transatlantic data divide is therefore a matter of the greatest significance.

In exploring this issue, this Article analyzes the respective legal identities constructed around data privacy in the EU and the United States. It identifies profound differences in the two systems’ images of the individual as bearer of legal interests. The EU has created a privacy culture around “rights talk” that protects its “data subjects.” In the EU, moreover, rights talk forms a critical part of the postwar European project of creating the identity of a European citizen. In the United States, in contrast, the focus is on a “marketplace discourse” about personal information and the safeguarding of “privacy consumers.” In the United States, data privacy law focuses on protecting consumers in a data marketplace.

This Article uses its models of rights talk and marketplace discourse to analyze how the EU and United States protect their respective data subjects and privacy consumers. Although the differences are great, there is still a path forward. A new set of institutions and processes can play a central role in developing mutually acceptable standards of data privacy. The key documents in this regard are the General Data Protection Regulation, an EU-wide standard that becomes binding in 2018, and the Privacy Shield, an EU–U.S. treaty signed in 2016. These legal standards require regular interactions between the EU and United States and create numerous points for harmonization, coordination, and cooperation. The GDPR and Privacy Shield also establish new kinds of governmental networks to resolve conflicts. The future of international data privacy law rests on the development of new understandings of privacy within these innovative structures….(More)”.

Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers


Leslie Harris at the Future of Privacy Forum: “Data has become the currency of the modern economy. A recent study projects the global volume of data to grow from about 0.8 zettabytes (ZB) in 2009 to more than 35 ZB in 2020, most of it generated within the last two years and held by the corporate sector.

As the cost of data collection and storage becomes cheaper and computing power increases, so does the value of data to the corporate bottom line. Powerful data science techniques, including machine learning and deep learning, make it possible to search, extract and analyze enormous sets of data from many sources in order to uncover novel insights and engage in predictive analysis. Breakthrough computational techniques allow complex analysis of encrypted data, making it possible for researchers to protect individual privacy, while extracting valuable insights.

At the same time, these newfound data sources hold significant promise for advancing scholarship, supporting evidence-based policymaking and more robust government statistics, and shaping more impactful social policies and interventions. But because most of this data is held by the private sector, it is rarely available for these purposes, posing what many have argued is a serious impediment to scientific progress.

A variety of reasons have been posited for the reluctance of the corporate sector to share data for academic research. Some have suggested that the private sector doesn’t realize the value of their data for broader social and scientific advancement. Others suggest that companies have no “chief mission” or public obligation to share. But most observers describe the challenge as complex and multifaceted. Companies face a variety of commercial, legal, ethical, and reputational risks that serve as disincentives to sharing data for academic research, with privacy – particularly the risk of reidentification – an intractable concern. For companies, striking the right balance between the commercial and societal value of their data, the privacy interests of their customers, and the interests of academics presents a formidable dilemma.

To be sure, there is evidence that some companies are beginning to share for academic research. For example, a number of pharmaceutical companies are now sharing clinical trial data with researchers, and a number of individual companies have taken steps to make data available as well. What is more, companies are also increasingly providing open or shared data for other important “public good” activities, including international development, humanitarian assistance and better public decision-making. Some are contributing to data collaboratives that pool data from different sources to address societal concerns. Yet, it is still not clear whether and to what extent this “new era of data openness” will accelerate data sharing for academic research.

Today, the Future of Privacy Forum released a new study, Understanding Corporate Data Sharing Decisions: Practices, Challenges, and Opportunities for Sharing Corporate Data with Researchers. In this report, we aim to contribute to the literature by seeking the “ground truth” from the corporate sector about the challenges they encounter when they consider making data available for academic research. We hope that the impressions and insights gained from this first look at the issue will help formulate further research questions, inform the dialogue between key stakeholders, and identify constructive next steps and areas for further action and investment….(More)”.

Ethical questions in data journalism and the power of online discussion


David Craig, Stan Ketterer and Mohammad Yousuf at Data Driven Journalism: “One common element uniting data journalism projects, across different stories and locations, is the ethical challenges they present.

As scholars and practitioners of data journalism have pointed out, the main issues include flawed data, misrepresentation from a lack of context, and privacy concerns. Contributors have discussed the ethics of data journalism on this site in posts about topics such as the use of pervasive data, transparency about editorial processes in computational journalism, and best practices for doing data journalism ethically.

Our research project looked at similar ethical challenges by examining journalists’ discussion of the controversial handling of publicly accessible gun permit data in two communities in the United States. The cases are not new, but the issues they raise persist and point to opportunities – both to learn from online discussion of ethical issues and to ask a wide range of ethical questions about data journalism.

The cases

Less than two weeks after the 2012 shooting deaths of 20 children and six staff members at Sandy Hook Elementary School in Newtown, Connecticut, a journalist at The Journal News in White Plains, New York, wrote a story about the possible expansion of publicly accessible gun permit data. The article was accompanied by three online maps with the locations of gun permit holders. The clickable maps of a two-county area in the New York suburbs also included the names and addresses of the gun permit holders. The detailed maps with personal information prompted a public outcry both locally and nationally, mainly involving privacy and safety concerns, and were subsequently taken down.

Although the 2012 case prompted the greatest attention, another New York newspaper reporter’s Freedom of Information request for a gun permit database for three counties sparked an earlier public outcry in 2008. The Glens Falls Post-Star’s editor published an editorial in response. “We here at The Post-Star find ourselves in the unusual position of responding to the concerns of our readers about something that has not even been published in our newspaper or Web site,” the editorial began. The editor said the request “drew great concern from members of gun clubs and people with gun permits in general, a concern we totally understand.”

Both of these cases prompted discussion among journalists, including participants in NICAR-L, the listserv of the National Institute for Computer-Assisted Reporting, whose subscribers include data journalists from major news organizations in the United States and around the world. Our study examined the content of three discussion threads with a total of 119 posts that focused mainly on ethical issues.

Key ethical issues

Several broad ethical issues, and specific themes related to those issues, appeared in the discussion.

1. Freedom versus responsibility and journalistic purpose…

2. Privacy and verification…

3. Consequences…

….(More)”

See also: David Craig, Stan Ketterer and Mohammad Yousuf, “To Post or Not to Post: Online Discussion of Gun Permit Mapping and the Development of Ethical Standards in Data Journalism,” Journalism & Mass Communication Quarterly