Balancing Act: Innovation vs. Privacy in the Age of Data Portability


Thursday, July 12, 2018 @ 2 MetroTech Center, Brooklyn, NY 11201

RSVP here.

The ability of people to move or copy data about themselves from one service to another — data portability — has been hailed as a way of increasing competition and driving innovation. In areas such as the Open Banking initiative in the United Kingdom, data portability is already well underway and spreading. The entry into force of the GDPR in Europe has also elevated the issue among companies and individuals alike. But recent online security breaches and other instances of personal data being transferred surreptitiously out of private companies (e.g., Cambridge Analytica’s appropriation of Facebook data) highlight how data portability can also undermine people’s privacy.

The GovLab at the NYU Tandon School of Engineering is pleased to present Jeni Tennison, CEO of the Open Data Institute, for its next Ideas Lunch, where she will discuss how data portability has been regulated in the UK and Europe, and what governments, businesses and people need to do to strike the balance between its risks and benefits.

Jeni Tennison is the CEO of the Open Data Institute. She gained her PhD from the University of Nottingham then worked as an independent consultant, specialising in open data publishing and consumption, before joining the ODI in 2012. Jeni was awarded an OBE for services to technology and open data in the 2014 New Year Honours.

Before joining the ODI, Jeni was the technical architect and lead developer for legislation.gov.uk. She contributed to the early linked-data work on data.gov.uk, including helping to engineer new standards for publishing statistics as linked data. She continues her work within the UK’s public sector as a member of the Open Standards Board.

Jeni also works on international web standards. She served on the W3C’s Technical Architecture Group from 2011 to 2015, and in 2014 she began co-chairing the W3C’s CSV on the Web Working Group. She also sits on the advisory boards of the Open Contracting Partnership and the Data Transparency Lab.

Twitter handle: @JeniT

We Need to Save Ignorance From AI


Christina Leuker and Wouter van den Bos in Nautilus:  “After the fall of the Berlin Wall, East German citizens were offered the chance to read the files kept on them by the Stasi, the much-feared Communist-era secret police service. To date, it is estimated that only 10 percent have taken the opportunity.

In 2007, James Watson, the co-discoverer of the structure of DNA, asked that he not be given any information about his APOE gene, one allele of which is a known risk factor for Alzheimer’s disease.

Most people tell pollsters that, given the choice, they would prefer not to know the date of their own death—or even the future dates of happy events.

Each of these is an example of willful ignorance. Socrates may have made the case that the unexamined life is not worth living, and Hobbes may have argued that curiosity is mankind’s primary passion, but many of our oldest stories actually describe the dangers of knowing too much. From Adam and Eve and the tree of knowledge to Prometheus stealing the secret of fire, they teach us that real-life decisions need to strike a delicate balance between choosing to know, and choosing not to.

But what if a technology came along that shifted this balance unpredictably, complicating how we make decisions about when to remain ignorant? That technology is here: It’s called artificial intelligence.

AI can find patterns and make inferences using relatively little data. Only a handful of Facebook likes are necessary to predict your personality, race, and gender, for example. Another computer algorithm claims it can distinguish between homosexual and heterosexual men with 81 percent accuracy, and homosexual and heterosexual women with 71 percent accuracy, based on their picture alone. An algorithm named COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) can predict criminal recidivism from data like juvenile arrests, criminal records in the family, education, social isolation, and leisure activities with 65 percent accuracy….
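To make that mechanism concrete, here is a minimal sketch — synthetic data and a plain logistic regression, not the models used in the studies cited above — of how even a short binary vector of “likes” can be enough to train a trait classifier:

```python
# A minimal sketch (synthetic data, not the cited studies' models) of how
# a handful of binary "like" signals can predict a personal trait.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_users, n_pages = 5000, 50          # each column: did user like page j?
X = rng.integers(0, 2, size=(n_users, n_pages))

# Synthetic ground truth: the trait correlates with five specific likes.
w = np.zeros(n_pages)
w[:5] = [1.5, -1.2, 0.9, 1.1, -0.8]
y = rng.binomial(1, 1 / (1 + np.exp(-(X @ w - 0.75))))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"held-out accuracy: {clf.score(X_te, y_te):.2f}")
```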

Recently, though, the psychologist Ralph Hertwig and legal scholar Christoph Engel have published an extensive taxonomy of motives for deliberate ignorance. They identified two sets of motives that have particular relevance to the need for ignorance in the face of AI.

The first set of motives revolves around impartiality and fairness. Simply put, knowledge can sometimes corrupt judgment, and we often choose to remain deliberately ignorant in response. For example, peer reviews of academic papers are usually anonymous. Insurance companies in most countries are not permitted to know all the details of their clients’ health before they enroll; they only know general risk factors. This type of consideration is particularly relevant to AI, because AI can produce highly prejudicial information….(More)”.

Personal Data v. Big Data: Challenges of Commodification of Personal Data


Maria Bottis and George Bouchagiar in the Open Journal of Philosophy: “Any firm today may, at little or no cost, build its own infrastructure to process personal data for commercial, economic, political, technological or any other purposes. Society has, therefore, turned into a privacy-unfriendly environment. The processing of personal data is essential for multiple economically and socially useful purposes, such as health care, education or terrorism prevention. But firms view personal data as a commodity, as a valuable asset, and heavily invest in processing it for private gains. This article studies the potential to subject personal data to trade secret rules, so as to ensure the users’ control over their data without limiting the data’s free movement, and examines some positive scenarios of attributing commercial value to personal data….(More)”.

Against the Dehumanisation of Decision-Making – Algorithmic Decisions at the Crossroads of Intellectual Property, Data Protection, and Freedom of Information


Paper by Guido Noto La Diega: “Nowadays, algorithms can decide if one can get a loan, is allowed to cross a border, or must go to prison. Artificial intelligence techniques (above all, natural language processing and machine learning) enable private and public decision-makers to analyse big data in order to build profiles, which are used to make decisions in an automated way.

This work presents ten arguments against algorithmic decision-making. These revolve around the concepts of ubiquitous discretionary interpretation, holistic intuition, algorithmic bias, the three black boxes, psychology of conformity, power of sanctions, civilising force of hypocrisy, pluralism, empathy, and technocracy.

The lack of transparency of the algorithmic decision-making process does not stem merely from the characteristics of the relevant techniques used, which can make it impossible to access the rationale of the decision. It depends also on the abuse of and overlap between intellectual property rights (the “legal black box”). In the US, nearly half a million patented inventions concern algorithms; more than 67% of the algorithm-related patents were issued over the last ten years and the trend is increasing.

To counter the increased monopolisation of algorithms by means of intellectual property rights (with trade secrets leading the way), this paper presents three legal routes that enable citizens to ‘open’ the algorithms.

First, copyright and patent exceptions, as well as trade secrets, are discussed.

Second, the GDPR is critically assessed. In principle, data controllers are not allowed to use algorithms to take decisions that have legal effects on the data subject’s life or similarly significantly affect them. However, when they are allowed to do so, the data subject still has the right to obtain human intervention, to express their point of view, and to contest the decision. Additionally, the data controller shall provide meaningful information about the logic involved in the algorithmic decision.

Third, this paper critically analyses the first known case of a court using the access right under the freedom of information regime to grant an injunction to release the source code of the computer program that implements an algorithm.

Only an integrated approach – one that takes into account intellectual property, data protection, and freedom of information – may provide the citizen affected by an algorithmic decision with an effective remedy, as required by the Charter of Fundamental Rights of the EU and the European Convention on Human Rights….(More)”.

Ghost Cities: Built but Never Inhabited


Civic Data Design Lab at UrbanNext: “Ghost Cities are vacant neighborhoods, and sometimes whole cities, that were built but never inhabited. Their existence is a physical manifestation of Chinese overdevelopment in real estate and the dependence on housing as an investment strategy. Little data exists establishing the location and extent of these Ghost Cities in China. MIT’s Civic Data Design Lab developed a model using data scraped from Chinese social media sites and Baidu (Chinese Google Maps) to create one of the first maps identifying the locations of Chinese Ghost Cities….

Quantifying the extent and location of Ghost Cities is complicated by the fact that the Chinese government keeps a tight hold on data about the sales and occupancy of buildings. Even local planners may have a hard time acquiring it. The Civic Data Design Lab developed a model to identify Ghost Cities based on the idea that amenities (grocery stores, hair salons, restaurants, schools, retail, etc.) are the mark of a healthy community, and that a lack of amenities might indicate locations where no one lives. Given the lack of openly available data in China, data was scraped from Chinese social media and websites, including Dianping (Chinese Yelp), Amap (Chinese MapQuest), Fang (Chinese Zillow), and Baidu (Chinese Google Maps), using openly accessible Application Programming Interfaces (APIs).

Using data scraped from social media sites in Chengdu and Shenyang, the model was tested on 300 m x 300 m grid cells marking residential locations. Each grid cell was given an amenity accessibility score based on the distance to and clustering of nearby amenities. Residential areas with a cluster of low scores were marked as Ghost Cities. The results were ground-truthed through site visits, documenting the locations with aerial photography from drones and interviews with local stakeholders.
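As an illustration of the approach described above — not the Civic Data Design Lab’s actual code — the sketch below scores synthetic grid cells by nearby amenities and flags the lowest-scoring ones; the 1 km radius and lowest-decile cutoff are assumptions:

```python
# Illustrative sketch of the amenity-scoring idea (synthetic coordinates;
# the 1 km radius and lowest-decile cutoff are assumed, not the Lab's).
import numpy as np

rng = np.random.default_rng(1)
cells = rng.uniform(0, 9000, size=(400, 2))   # residential cell centroids (m)
pois = rng.uniform(0, 9000, size=(1500, 2))   # scraped amenity locations (m)

def amenity_score(cell, pois, radius=1000.0):
    """Proximity-weighted count of amenities within walking distance."""
    d = np.linalg.norm(pois - cell, axis=1)
    near = d[d < radius]
    return float(np.sum(1.0 - near / radius))  # nearer amenities weigh more

scores = np.array([amenity_score(c, pois) for c in cells])
ghost = scores < np.percentile(scores, 10)     # cluster of low scores
print(f"{ghost.sum()} of {len(cells)} cells flagged as candidate ghost areas")
```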

The model worked well at documenting under-utilized residential locations in these Chinese cities, picking up everything from vacant housing and stalled construction to abandoned older residential locations, creating the first data set that marks risk in the Chinese real estate market. The research shows that data available through social media can help locate and estimate risk in the Chinese real estate market. Perhaps more importantly, however, identifying where these areas are concentrated can help city planners, developers and local citizens make better investment decisions and address the risk created by these under-utilized developments….(More)”.

When Technology Gets Ahead of Society


Tarun Khanna at Harvard Business Review: “Drones, originally developed for military purposes, weren’t approved for commercial use in the United States until 2013. When that happened, it was immediately clear that they could be hugely useful to a whole host of industries—and almost as quickly, it became clear that regulation would be a problem. The new technology raised multiple safety and security issues, there was no consensus on who should write rules to mitigate those concerns, and the knowledge needed to develop the rules didn’t yet exist in many cases. In addition, the little flying robots made a lot of people nervous.

Such regulatory, logistical, and social barriers to adopting novel products and services are very common. In fact, technology routinely outstrips society’s ability to deal with it. That’s partly because tech entrepreneurs are often insouciant about the legal and social issues their innovations birth. Although electric cars are subsidized by the federal government, Tesla has run afoul of state and local regulations because it bypasses conventional dealers to sell directly to consumers. Facebook is only now facing up to major regulatory concerns about its use of data, despite being massively successful with users and advertisers.

It’s clear that even as innovations bring unprecedented comfort and convenience, they also threaten old ways of regulating industries, running a business, and making a living. This has always been true. Thus early cars weren’t allowed to go faster than horses, and some 19th-century textile workers used sledgehammers to attack the industrial machinery they feared would displace them. New technology can even upend social norms: Consider how dating apps have transformed the way people meet.

Entrepreneurs, of course, don’t really care that the problems they’re running into are part of a historical pattern. They want to know how they can manage—and shorten—the period between the advent of a technology and the emergence of the rules and new behaviors that allow society to embrace its possibilities.

Interestingly, the same institutional murkiness that pervades nascent industries such as drones and driverless cars is something I’ve also seen in developing countries. And strange though this may sound, I believe that tech entrepreneurs can learn a lot from businesspeople who have succeeded in the world’s emerging markets.

Entrepreneurs in Brazil or Nigeria know that it’s pointless to wait for the government to provide the institutional and market infrastructure their businesses need, because that will simply take too long. They themselves must build support structures to compensate for what Krishna Palepu and I have referred to in earlier writings as “institutional voids.” They must create the conditions that will allow them to create successful products or services.

Tech-forward entrepreneurs in developed economies may want to believe that it’s not their job to guide policy makers and the public—but the truth is that nobody else can play that role. They may favor hardball tactics, getting ahead by evading rules, co-opting regulators, or threatening to move overseas. But in the long term, they’d be wiser to use soft power, working with a range of partners to co-create the social and institutional fabric that will support their growth—as entrepreneurs in emerging markets have done.…(More)”.

Wikipedia vandalism could thwart hoax-busting on Google, YouTube and Facebook


Daniel Funke at Poynter: “For a brief moment, the California Republican Party supported Nazism. At least, that’s what Google said.

That’s because someone vandalized the Wikipedia page for the party on May 31 to list “Nazism” alongside ideologies like “Conservatism,” “Market liberalism” and “Fiscal conservatism.” The mistake was removed from search results, with Google clarifying to Vice News that the search engine had failed to catch the vandalism in the Wikipedia entry….

Google has long drawn upon the online encyclopedia for appending basic information to search results. According to the edit log for the California GOP page, someone added “Nazism” to the party’s ideology section around 7:40 UTC on May 31. The edit was removed within a minute, but it appears Google’s algorithm scraped the page just in time for the fake.
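For readers who want to see that edit history themselves, the sketch below pulls a page’s recent revision log from the public MediaWiki API — the same log that records the vandal edit being reverted within a minute. The endpoint and parameters are real; the page title and output formatting are just an example.

```python
# Sketch: pull a page's recent revision log via the public MediaWiki API.
import requests

def recent_revisions(title, limit=20):
    resp = requests.get(
        "https://en.wikipedia.org/w/api.php",
        params={
            "action": "query",
            "prop": "revisions",
            "titles": title,
            "rvprop": "timestamp|user|comment",
            "rvlimit": limit,
            "format": "json",
        },
        headers={"User-Agent": "revision-log-demo/0.1"},  # courtesy header
        timeout=10,
    )
    resp.raise_for_status()
    page = next(iter(resp.json()["query"]["pages"].values()))
    return page.get("revisions", [])

for rev in recent_revisions("California Republican Party"):
    print(rev.get("timestamp"), rev.get("user"), "-", rev.get("comment", "")[:60])
```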

“Sometimes people vandalize public information sources, like Wikipedia, which can impact the information that appears in search,” a Google spokesperson told Poynter in an email. “We have systems in place that catch vandalism before it impacts search results, but occasionally errors get through, and that’s what happened here.”…

According to Google, more than 99.9 percent of Wikipedia edits that show up in Knowledge Panels, which display basic information about searchable keywords at the top of results, aren’t vandalism. The user who authored the original edit to the California GOP’s page did not use a user profile, making them hard to track down.

That’s a common tactic among people who vandalize Wikipedia pages, a practice the nonprofit has documented extensively. But given the volume of edits that are made on Wikipedia — about 10 per second, with 600 new pages per day — and the fact that Facebook and YouTube are now pulling from them to provide more context to posts, the potential for and effect of abuse is high….(More)”.

Ontario is trying a wild experiment: Opening access to its residents’ health data


Dave Gershgorn at Quartz: “The world’s most powerful technology companies have a vision for the future of healthcare. You’ll still go to your doctor’s office, sit in a waiting room, and explain your problem to someone in a white coat. But instead of relying solely on their own experience and knowledge, your doctor will consult an algorithm that’s been trained on the symptoms, diagnoses, and outcomes of millions of other patients. Instead of a radiologist reading your x-ray, a computer will be able to detect minute differences and instantly identify a tumor or lesion. Or at least that’s the goal.

AI systems like these, currently under development by companies including Google and IBM, can’t read textbooks and journals, attend lectures, and do rounds—they need millions of real life examples to understand all the different variations between one patient and another. In general, AI is only as good as the data it’s trained on, but medical data is exceedingly private—most developed countries have strict health data protection laws, such as HIPAA in the United States….

These approaches, which favor companies with considerable resources, are pretty much the only way to get large troves of health data in the US, because the American health system is so fragmented. Healthcare providers keep personal files on each of their patients, and can only transmit them to other accredited healthcare workers at the patient’s request. There’s no single place where all health data exists. It’s more secure, but less efficient for analysis and research.

Ontario, Canada, might have a solution, thanks to its single-payer healthcare system. All of Ontario’s health data exists in a few enormous caches under government control. (After all, the government needs to keep track of all the bills it’s paying.) Similar structures exist elsewhere in Canada, such as in Quebec, but Toronto, which has become a major hub for AI research, wants to lead the charge in providing this data to businesses.

Until now, the only people allowed to study this data were government organizations or researchers who partnered with the government to study disease. But Ontario has now entrusted the MaRS Discovery District—a cross between a tech incubator and WeWork—to build a platform for approved companies and researchers to access this data, dubbed Project Spark. The project, initiated by MaRS and Canada’s University Health Network, began exploring how to share this data after both organizations expressed interest to the government about giving broader health data access to researchers and companies looking to build healthcare-related tools.

Project Spark’s goal is to create an API, or a way for developers to request information from the government’s data cache. This could be used to create an app for doctors to access the full medical history of a new patient. Ontarians could access their health records at any time through similar software, and catalog health issues as they occur. Or researchers, like the ones trying to build AI to assist doctors, could request a different level of access that provides anonymized data on Ontarians who meet certain criteria. If you wanted to study every Ontarian who had Alzheimer’s disease over the last 40 years, that data would only be authorization and a few lines of code away.
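To give a concrete sense of what “a few lines of code away” might look like, here is a purely hypothetical sketch. Project Spark’s actual API was not public at the time of writing, so the endpoint, parameters, and token below are invented for illustration only.

```python
# Purely hypothetical: Project Spark's real API is not public, so this
# endpoint, its parameters, and the token are invented for illustration.
import requests

API_BASE = "https://api.projectspark.example/v1"   # placeholder host

def fetch_anonymized_cohort(condition, since_year, token):
    """Request de-identified records for patients matching a condition."""
    resp = requests.get(
        f"{API_BASE}/cohorts",
        params={"condition": condition, "since_year": since_year},
        headers={"Authorization": f"Bearer {token}"},  # access is gated
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. every Ontarian diagnosed with Alzheimer's in the last 40 years:
# cohort = fetch_anonymized_cohort("alzheimers-disease", 1978, "<granted-key>")
```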

There are currently 100 companies lined up to get access to the data, which comprises health records from Ontario’s 14 million residents. (MaRS won’t say who the companies are.) …(More)”

AI Nationalism


Blog by Ian Hogarth: “The central prediction I want to make and defend in this post is that continued rapid progress in machine learning will drive the emergence of a new kind of geopolitics; I have been calling it AI Nationalism. Machine learning is an omni-use technology that will come to touch all sectors and parts of society.

The transformation of both the economy and the military by machine learning will create instability at the national and international level, forcing governments to act. AI policy will become the single most important area of government policy. An accelerated arms race will emerge between key countries, and we will see increased protectionist state action to support national champions, block takeovers by foreign firms and attract talent. I use the example of Google, DeepMind and the UK as a specific example of this issue.

This arms race will potentially speed up the pace of AI development and shorten the timescale for getting to AGI. Although there will be many common aspects to this techno-nationalist agenda, there will also be important state-specific policies. There is a difference between predicting that something will happen and believing it is a good thing: nationalism is a dangerous path, particularly when the international order and international norms are in flux as a result. In the concluding section, I discuss how a period of AI Nationalism might transition to one of global cooperation in which AI is treated as a global public good….(More)”.

Big Data and AI – A transformational shift for government: So, what next for research?


Irina Pencheva, Marc Esteve and Slava Jankin Mikhaylov in Public Policy and Administration: “Big Data and artificial intelligence will have a profound transformational impact on governments around the world. It is therefore important for scholars to provide useful analysis on the topic to public managers and policymakers. This study offers an in-depth review of the Policy and Administration literature on the role of Big Data and advanced analytics in the public sector. It provides an overview of the key themes in the research field, namely the application and benefits of Big Data throughout the policy process, the challenges to its adoption, and the resulting implications for the public sector. It is argued that research on the subject is still nascent and more should be done to ensure that the theory adds real value to practitioners. A critical assessment of the strengths and limitations of the existing literature is developed, and a future research agenda to address these gaps and enrich our understanding of the topic is proposed…(More)”.