Data: The Lever to Promote Innovation in the EU


Blog Post by Juan Murillo Arias: “…But in order for data to truly become a lever that fosters innovation for the benefit of society as a whole, we must understand and address the following factors:

1. Disconnected, dispersed sources. As users of digital services (transportation, finance, telecommunications, news or entertainment) we leave a different digital footprint for each service that we use. These footprints, which are different facets of the same polyhedron, can even be contradictory on occasion. For this reason, they must be seen as complementary. Analysts should be aware that they must cross-reference data sources from different origins in order to create a reliable picture of our preferences; otherwise we will be basing decisions on partial or biased information. How many times do we receive advertising for items we have already purchased, or tourist destinations where we have already been? And this is just one example from digital marketing. When scoring financial solvency, or monitoring health, the more complete the digital picture of the person is, the more accurate the diagnosis will be.

Furthermore, from the user’s standpoint, proper management of their entire, dispersed digital footprint is a challenge. Perhaps centralized consent management would be very beneficial. In the financial world, the PSD2 regulations have already forced banks to open this information to other banks if customers so desire. The purpose is to foster competition and facilitate portability, but this opening up has also enabled the development of new information-aggregation services that are very useful to financial services users. It would be ideal if this step of breaking down barriers and moving toward a more transparent market took place simultaneously in all sectors in order to avoid possible distortions to competition and, by extension, consumer harm. Customer consent would thus open the door to building a more accurate picture of our preferences.

2. The public and private sectors’ asymmetric capacity to gather data. This is related to citizens using public services less frequently than private services through the new digital channels. However, governments could benefit from the information possessed by private companies. These anonymous, aggregated data can help to ensure a more dynamic public management. Even personal data could open the door to customized education or healthcare at an individual level. In order to analyze all of this, the European Commission has created a working group of 23 experts. The purpose is to come up with a series of recommendations regarding the best legal, technical and economic framework to encourage this information transfer across sectors.

3. The lack of incentives for companies and citizens to encourage the reuse of their data. The reality today is that most companies use these sources solely for internal purposes. Only a few have decided to explore data sharing through different models (for academic research or for the development of commercial services). As a result of this and other factors, the public sector largely continues using the survey method to gather information instead of reading the digital footprint citizens produce. Multiple studies have demonstrated that this digital footprint would be useful for describing socioeconomic dynamics and monitoring the evolution of official statistical indicators. However, these studies have rarely gone on to become pilot projects, due to the lack of incentives that would make this new activity sustainable for a private company opening its data to the public sector, or to society in general.

4. Limited commitment to the diversification of services. Another barrier is the fact that information-based product development is somewhat removed from the type of services that the main data generators (telecommunications, banks, commerce, electricity, transportation, etc.) traditionally provide. Therefore, these data-based initiatives are not part of their main business and are more closely tied to companies’ innovation areas, where exploratory proofs of concept are often not consolidated into a new line of business.

5. Bidirectionality. Data should also flow from the public sector to the rest of society. The first regulatory framework was created for this purpose. Although it is still very recent (the PSI Directive on the re-use of public sector data was passed in 2013), it is currently being revised in an attempt to foster the consolidation of an open data ecosystem that emanates from the public sector as well. On the one hand it would enable greater transparency, and on the other, the development of solutions to improve multiple fields in which public actors are key, such as the environment, transportation and mobility, health, education, justice and the planning and execution of public works. Special emphasis will be placed on high-value data sets, such as statistical or geospatial data — data with tremendous potential to accelerate the emergence of a wide variety of information-based products and services that add value. The Commission will begin working with the Member States to identify these data sets.

In its report, Creating Value through Open Data, the European Data Portal estimates that government agencies making their data accessible will inject an extra €65 billion into the EU economy this year.

6. The commitment to analytical training and financial incentives for innovation. These are the key factors that have given rise to the digital unicorns that have emerged, more so in the U.S. and China than in Europe….(More)”

Protection of health-related data: new guidelines


Press Release: “The Council of Europe has issued a set of guidelines to its 47 member states urging them to ensure, in law and practice, that the processing of health-related data is done in full respect of human rights, notably the right to privacy and data protection.

With the development of new technological tools in the health sector the volume of health-related data processed has grown exponentially showing the need for guidance for health administrations and professionals.

In a Recommendation, applicable to both the public and private sectors, the Council of Europe’s Committee of Ministers calls on governments to transmit these guidelines to health-care systems and to actors processing health-related data, in particular health-care professionals and data protection officers.

The recommendation contains a set of principles to protect health-related data incorporating the novelties introduced in the updated Council of Europe data protection convention, known as “Convention 108+”, opened for signature in October 2018.

The Committee of Ministers underlines that health-related data should be protected by appropriate security measures that take into account the latest technological developments, the data’s sensitive nature and the assessment of potential risks. Protection measures should be incorporated by design into any information system which processes health-related data.

The recommendation contains guidance on various issues, including the legitimate basis for processing health-related data (notably consent by the data subject), data concerning unborn children, health-related genetic data, the sharing of health-related data by professionals, and the storage of data.

The guidelines list a number of rights of data subjects, crucially the transparency of data processing. They also contain a number of principles that should be respected when data are processed for scientific research, when they are collected by mobile devices or when they are transferred across borders….(More)”.

New York City ‘Open Data’ Paves Way for Innovative Technology


Leo Gringut at the International Policy Digest: “The philosophy behind “Open Data for All” turns on the idea that easy access to government data offers everyday New Yorkers the chance to grow and innovate: “Data is more than just numbers – it’s information that can create new opportunities and level the playing field for New Yorkers. It’s the illumination that changes frameworks, the insight that turns impenetrable issues into solvable problems.” Fundamentally, the newfound accessibility of City data is revolutionizing NYC business. According to Albert Webber, Program Manager for Open Data, City of New York, a key part of his job is “to engage the civic technology community that we have, which is very strong, very powerful in New York City.”

Fundamentally, Open Data is a game-changer for hundreds of New York companies, from startups to corporate giants, all of whom rely on data for their operations. The effect is set to be particularly profound in New York City’s most important economic sector: real estate. Seeking to transform the real estate and construction market in the City, valued at a record-setting $1 trillion in 2016, companies have been racing to develop tools that will harness the power of Open Data to streamline bureaucracy and management processes.

One such technology is the Citiscape app. Developed by a passionate team of real estate experts with more than 15 years of experience in the field, the app assembles data from the Department of Buildings (DOB) and the Environmental Control Board (ECB) into one easy-to-navigate interface. According to Citiscape Chief Operational Officer Olga Khaykina, the secret is in the app’s simplicity, which puts every aspect of project management at the user’s fingertips. “We made DOB and ECB just one tap away,” said Khaykina. “You’re one tap away from instant and accurate updates and alerts from the DOB that will keep you informed about any changes to an ongoing project. One tap away from organized and cloud-saved projects, including accessible and coordinated interaction with all team members through our in-app messenger. And one tap away from uncovering technical information about any building in NYC, just by entering its address.” Gone are the days of continuously refreshing the DOB website in hopes of an update on a minor complaint or a status change regarding your project; Citiscape does the busywork so you can focus on your project.

The Citiscape team emphasized that, without access to Open Data, this project would have been impossible….(More)”.

Fearful of fake news blitz, U.S. Census enlists help of tech giants


Nick Brown at Reuters: “The U.S. Census Bureau has asked tech giants Google, Facebook and Twitter to help it fend off “fake news” campaigns it fears could disrupt the upcoming 2020 count, according to Census officials and multiple sources briefed on the matter.

The push, the details of which have not been previously reported, follows warnings from data and cybersecurity experts dating back to 2016 that right-wing groups and foreign actors may borrow the “fake news” playbook from the last presidential election to dissuade immigrants from participating in the decennial count, the officials and sources told Reuters.

The sources, who asked not to be named, said evidence included increasing chatter on platforms like “4chan” by domestic and foreign networks keen to undermine the survey. The census, they said, is a powerful target because it shapes U.S. election districts and the allocation of more than $800 billion a year in federal spending.

Ron Jarmin, the Deputy Director of the Census Bureau, confirmed the bureau was anticipating disinformation campaigns, and was enlisting the help of big tech companies to fend off the threat.

“We expect that (the census) will be a target for those sorts of efforts in 2020,” he said.

Census Bureau officials have held multiple meetings with tech companies since 2017 to discuss ways they could help, including as recently as last week, Jarmin said.

So far, the bureau has gotten initial commitments from Alphabet Inc’s Google, Twitter Inc and Facebook Inc to help quash disinformation campaigns online, according to documents summarizing some of those meetings reviewed by Reuters.

But neither Census nor the companies have said how advanced any of the efforts are….(More)”.

How the NYPD is using machine learning to spot crime patterns


Colin Wood at StateScoop: “Civilian analysts and officers within the New York City Police Department are using a unique computational tool that spots patterns in crime data and is helping close cases.

A collection of machine-learning models, which the department calls Patternizr, was first deployed in December 2016, but the department only revealed the system last month when its developers published a research paper in the INFORMS Journal on Applied Analytics. Drawing on 10 years of historical data about burglary, robbery and grand larceny, the tool is the first of its kind to be used by law enforcement, the developers wrote.

The NYPD hired 100 civilian analysts in 2017 to use Patternizr. It’s also available to all officers through the department’s Domain Awareness System, a citywide network of sensors, databases, devices, software and other technical infrastructure. Researchers told StateScoop the tool has generated leads on several cases that traditionally would have stretched officers’ memories and traditional evidence-gathering abilities.

Connecting similar crimes into patterns is a crucial part of gathering evidence and eventually closing in on an arrest, said Evan Levine, the NYPD’s assistant commissioner of data analytics and one of Patternizr’s developers. Taken independently, each crime in a string of crimes may not yield enough evidence to identify a perpetrator, but the work of finding patterns is slow and each officer only has a limited amount of working knowledge surrounding an incident, he said.

“The goal here is to alleviate all that kind of busywork you might have to do to find hits on a pattern,” said Alex Chohlas-Wood, a Patternizr researcher and deputy director of the Computational Policy Lab at Stanford University.

The knowledge of individual officers is limited in scope by dint of the NYPD’s organizational structure. The department divides New York into 77 precincts, and a person who commits crimes across precincts, which often have arbitrary boundaries, is often more difficult to catch because individual beat officers are typically focused on a single neighborhood.

There’s also a lot of data to sift through. In 2016 alone, about 13,000 burglaries, 15,000 robberies and 44,000 grand larcenies were reported across the five boroughs.

Levine said that last month, police used Patternizr to spot a pattern of three knife-point robberies around a Bronx subway station. It would have taken police much longer to connect those crimes manually, Levine said.

The software works by an analyst feeding it a “seed” case, which is then compared against a database of hundreds of thousands of crime records that Patternizr has already processed. The tool generates a “similarity score” and returns a rank-ordered list and a map. Analysts can read a few details of each complaint before examining the seed complaint and similar complaints in a detailed side-by-side view, or filtering results….(More)”.
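The seed-and-rank workflow described above can be illustrated with a toy sketch. This is not the NYPD’s actual model (which trains separate machine-learning models per crime type on many more features); the attribute names, distance/time scales, and weights below are all illustrative assumptions. The idea is simply that each candidate complaint is scored for similarity to the seed in space, time, and modus operandi, and the results are returned rank-ordered:

```python
from dataclasses import dataclass
from math import exp, hypot


@dataclass
class Complaint:
    complaint_id: str
    crime_type: str       # e.g. "robbery"
    x: float              # projected location coordinates, meters
    y: float
    day: int              # days since some epoch
    weapon: str           # categorical M.O. attribute (illustrative)


def similarity(seed: Complaint, other: Complaint,
               dist_scale: float = 2000.0, time_scale: float = 30.0) -> float:
    """Toy similarity score in [0, 1]: nearer in space and time,
    and matching categorical attributes, means a higher score.
    Weights and scales are illustrative, not Patternizr's."""
    if seed.crime_type != other.crime_type:
        return 0.0
    spatial = exp(-hypot(seed.x - other.x, seed.y - other.y) / dist_scale)
    temporal = exp(-abs(seed.day - other.day) / time_scale)
    weapon_match = 1.0 if seed.weapon == other.weapon else 0.0
    return 0.4 * spatial + 0.3 * temporal + 0.3 * weapon_match


def rank_candidates(seed: Complaint, database: list, top_k: int = 10):
    """Score every other complaint against the seed and return the
    top_k most similar, highest score first."""
    scored = [(similarity(seed, c), c)
              for c in database if c.complaint_id != seed.complaint_id]
    scored.sort(key=lambda sc: sc[0], reverse=True)
    return scored[:top_k]
```

An analyst-facing system would then let a human review the top-ranked complaints side by side, as the article describes; the score only surfaces candidates, it does not establish a pattern by itself.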

Big Data in the U.S. Consumer Price Index: Experiences & Plans


Paper by Crystal G. Konny, Brendan K. Williams, and David M. Friedman: “The Bureau of Labor Statistics (BLS) has generally relied on its own sample surveys to collect the price and expenditure information necessary to produce the Consumer Price Index (CPI). The burgeoning availability of big data has created a proliferation of information that could lead to methodological improvements and cost savings in the CPI. The BLS has undertaken several pilot projects in an attempt to supplement and/or replace its traditional field collection of price data with alternative sources. In addition to cost reductions, these projects have demonstrated the potential to expand sample size, reduce respondent burden, obtain transaction prices more consistently, and improve price index estimation by incorporating real-time expenditure information—a foundational component of price index theory that has not been practical until now. In the CPI, we use the term alternative data to refer to any data not collected through traditional field collection procedures by CPI staff, including third-party datasets, corporate data, and data collected through web scraping or retailer APIs. We review how the CPI program is adapting to work with alternative data, followed by discussion of the three main sources of alternative data under consideration by the CPI, with a description of research and other steps taken to date for each source. We conclude with some words about future plans… (More)”.

AI Ethics: Seven Traps


Blog Post by Annette Zimmermann and Bendert Zevenbergen: “… In what follows, we outline seven ‘AI ethics traps’. In doing so, we hope to provide a resource for readers who want to understand and navigate the public debate on the ethics of AI better, who want to contribute to ongoing discussions in an informed and nuanced way, and who want to think critically and constructively about ethical considerations in science and technology more broadly. Of course, not everybody who contributes to the current debate on AI Ethics is guilty of endorsing any or all of these traps: the traps articulate extreme versions of a range of possible misconceptions, formulated in a deliberately strong way to highlight the ways in which one might prematurely dismiss ethical reasoning about AI as futile.

1. The reductionism trap:

“Doing the morally right thing is essentially the same as acting in a fair way. (or: transparent, or egalitarian, or <substitute any other value>). So ethics is the same as fairness (or transparency, or equality, etc.). If we’re being fair, then we’re being ethical.”

Even though the problem of algorithmic bias and its unfair impact on decision outcomes is an urgent problem, it does not exhaust the ethical problem space. As important as algorithmic fairness is, it is crucial to avoid reducing ethics to a fairness problem alone. Instead, it is important to pay attention to how the ethically valuable goal of optimizing for a specific value like fairness interacts with other important ethical goals. Such goals could include—amongst many others—the goal of creating transparent and explainable systems which are open to democratic oversight and contestation, the goal of improving the predictive accuracy of machine learning systems, the goal of avoiding paternalistic infringements of autonomy rights, or the goal of protecting the privacy interests of data subjects. Sometimes, these different values may conflict: we cannot always optimize for everything at once. This makes it all the more important to adopt a sufficiently rich, pluralistic view of the full range of relevant ethical values at stake—only then can one reflect critically on what kinds of ethical trade-offs one may have to confront.

2. The simplicity trap:

“In order to make ethics practical and action-guiding, we need to distill our moral framework into a user-friendly compliance checklist. After we’ve decided on a particular path of action, we’ll go through that checklist to make sure that we’re being ethical.”

Given the high visibility and urgency of ethical dilemmas arising in the context of AI, it is not surprising that there are more and more calls to develop actionable AI ethics checklists. For instance, a 2018 draft report by the European Commission’s High-Level Expert Group on Artificial Intelligence specifies a preliminary ‘assessment list’ for ‘trustworthy AI’. While the report plausibly acknowledges that such an assessment list must be context-sensitive and that it is not exhaustive, it nevertheless identifies a list of ten fixed ethical goals, including privacy and transparency. But can and should ethical values be articulated in a checklist in the first place? It is worth examining this underlying assumption critically. After all, a checklist implies a one-off review process: on that view, developers or policy-makers could determine whether a particular system is ethically defensible at a specific moment in time, and then move on without confronting any further ethical concerns once the checklist criteria have been satisfied. But ethical reasoning cannot be a static one-off assessment: it requires an ongoing process of reflection, deliberation, and contestation. Simplicity is good—but the willingness to reconsider simple frameworks, when required, is better. Setting a fixed ethical agenda ahead of time risks obscuring new ethical problems that may arise at a later point in time, or ongoing ethical problems that become apparent to human decision-makers only later.

3. The relativism trap:

“We all disagree about what is morally valuable, so it’s pointless to imagine that there is a universal baseline that we can use in order to evaluate moral choices. Nothing is objectively morally good: things can only be morally good relative to each person’s individual value framework.”

Public discourse on the ethics of AI frequently produces little more than an exchange of personal opinions or institutional positions. In light of pervasive moral disagreement, it is easy to conclude that ethical reasoning can never stand on firm ground: it always seems to be relative to a person’s views and context. But this does not mean that ethical reasoning about AI and its social and political implications is futile: some ethical arguments about AI may ultimately be more persuasive than others. While it may not always be possible to determine ‘the one right answer’, it is often possible to identify at least some paths of action that are clearly wrong, and some paths of action that are comparatively better (if not optimal all things considered). If that is the case, comparing the respective merits of ethical arguments can be action-guiding for developers and policy-makers, despite the presence of moral disagreement. Thus, it is possible and indeed constructive for AI ethics to welcome value pluralism, without collapsing into extreme value relativism.

4. The value alignment trap:

“If relativism is wrong (see #3), there must be one morally right answer. We need to find that right answer, and ensure that everyone in our organisation acts in alignment with that answer. If our ethical reasoning leads to moral disagreement, that means that we have failed.”…(More)”.

What you don’t know about your health data will make you sick


Jeanette Beebe at Fast Company: “Every time you shuffle through a line at the pharmacy, every time you try to get comfortable in those awkward doctor’s office chairs, every time you scroll through the web while you’re put on hold with a question about your medical bill, take a second to think about the person ahead of you and behind you.

Chances are, at least one of you is being monitored by a third party like data analytics giant Optum, which is owned by UnitedHealth Group, Inc. Since 1993, it’s captured medical data—lab results, diagnoses, prescriptions, and more—from 150 million Americans. That’s almost half of the U.S. population.

“They’re the ones that are tapping the data. They’re in there. I can’t remove them from my own health insurance contracts. So I’m stuck. It’s just part of the system,” says Joel Winston, an attorney who specializes in privacy and data protection law.

Healthcare providers can legally sell their data to a now-dizzyingly vast spread of companies, who can use it to make decisions, from designing new drugs to pricing your insurance rates to developing highly targeted advertising.

It’s written in the fine print: You don’t own your medical records. Well, except if you live in New Hampshire. It’s the only state that mandates its residents own their medical data. In 21 states, the law explicitly says that healthcare providers own these records, not patients. In the rest of the country, it’s up in the air.

Every time you visit a doctor or a pharmacy, your record grows. The details can be colorful: using sources like Milliman’s IntelliScript and ExamOne’s ScriptCheck, a fuller picture of you emerges. Your interactions with the health care system, your medical payments, your prescription drug purchase history. And the market for the data is surging.

Its buyers and sharers—pharma giants, insurers, credit reporting agencies, and other data-hungry companies or “fourth parties” (like Facebook)—say that these massive health data sets can improve healthcare delivery and fuel advances in so-called “precision medicine.”

Still, this glut of health data has raised alarms among privacy advocates, who say many consumers are in the dark about how much of their health-related info is being gathered and mined….

Gardner argued that traditional health data systems—electronic health records and electronic medical records—are less than ideal, given the “rigidity of the vendors and the products” and the way our data is owned and secured. Don’t count on them being around much longer, she predicted, “beyond the next few years.”

The future, Gardner suggested, is a system that runs on blockchain, which she defined for the committee as “basically a secure, visible, irrefutable ledger of transactions and ownership.” Still, a recent analysis of over 150 white papers revealed most healthcare blockchain projects “fall somewhere between half-baked and overly optimistic.”

As larger companies like IBM sign on, the technology may be edging closer to reality. Last year, Proof Work outlined a HIPAA-compliant system that manages patients’ medical histories over time, from acute care in the hospital to preventative checkups. The goal is to give these records to patients on their phones, and to create a “democratized ecosystem” to solve interoperability between patients, healthcare providers, insurance companies, and researchers. Similar proposals from blockchain-focused startups like Health Bank and Humanity.co would help patients store and share their health information securely—and sell it to researchers, too….(More)”.

Catch Me Once, Catch Me 218 Times


Josh Kaplan at Slate: “…It was 2010, and the San Diego County Sheriff’s Department had recently rolled out a database called GraffitiTracker—software also used by police departments in Denver and Los Angeles County—and over the previous year, they had accumulated a massive set of images that included a couple hundred photos with his moniker. Painting over all Kyle’s handiwork, prosecutors claimed, had cost the county almost $100,000, and that sort of damage came with life-changing consequences. Ultimately, he made a plea deal: one year of incarceration, five years of probation, and more than $87,000 in restitution.

Criticism of police technology often gets mired in the complexities of the algorithms involved—the obscurity of machine learning, the feedback loops, the potentials for racial bias and error. But GraffitiTracker can tell us a lot about data-driven policing in part because the concept is so simple. Whenever a public works crew goes to clean up graffiti, before they paint over it, they take a photo and put it in the county database. Since taggers tend to paint the same moniker over and over, now whenever someone is caught for vandalism, police can search the database for their pseudonym and get evidence of all the graffiti they’ve ever done.

In San Diego County, this has radically changed the way that graffiti is prosecuted and has pumped up the punishment for taggers—many of whom are minors—to levels otherwise unthinkable. The results have been lucrative. In 2011, the first year San Diego started using GraffitiTracker countywide (a few San Diego jurisdictions already had it in place), the amount of restitution received for graffiti jumped from about $170,000 to almost $800,000. Roughly $300,000 of that came from juvenile cases. For the jurisdictions that weren’t already using GraffitiTracker, the jump was even more stark: The annual total went from $45,000 to nearly $400,000. In these cities, the average restitution ordered in adult cases went from $1,281 to $5,620, and at the same time, the number of cases resulting in restitution tripled. (San Diego has said it makes prosecuting vandalism easier.)

Almost a decade later, San Diego County and other jurisdictions are still using GraffitiTracker, yet it’s received very little media attention, despite the startling consequences for vandalism prosecution. But its implications extend far beyond tagging. GraffitiTracker presaged a deeper problem with law enforcement’s ability to use technology to connect people to crimes that, as Deputy District Attorney Melissa Ocampo put it to me, “they thought they got away with.”…(More)”.

Seeing, Naming, Knowing


Essay by Nora N. Khan for Brooklyn Rail: “…. Throughout this essay, I use “machine eye” as a metaphor for the unmoored orb, a kind of truly omnidirectional camera (meaning, a camera that can look in every direction and vector that defines the dimensions of a sphere), and as a symbolic shorthand for the sum of four distinct realms in which automated vision is deployed as a service. (Vision as a Service, reads the selling tag for a new AI surveillance camera company).10 Those four general realms are: 

1. Massive AI systems fueled by the public’s flexible datasets of their personal images, creating a visual culture entirely out of digitized images. 

2. Facial recognition technologies and neural networks improving atop their databases. 

3. The advancement of predictive policing to sort people by types. 

4. The combination of location-based tracking, license plate-reading, and heat sensors to render skein-like, live, evolving maps of people moving, marked as likely to do X.

Though we live the results of its seeing, and its interpretation of its seeing, for now I would hold off on blaming ourselves for this situation. We are, after all, the living instantiations of a few thousand years of such violent seeing globally, enacted through imperialism, colonialism, caste stratification, nationalist purges, internal class struggle, and all the evolving theory to support and galvanize the above. Technology simply recasts, concentrates, and amplifies these “tendencies.” They can be hard to see at first because the eye’s seeing seems innocuous, and is designed to seem so. It is a direct expression of the ideology of software, which reflects its makers’ desires. These makers are lauded as American pioneers, innovators, genius-heroes living in the Bay Area in the late 1970s, vibrating at a highly specific frequency, the generative nexus of failed communalism and an emerging Californian Ideology. That seductive ideology has been exported all over the world, and we are only now contending with its impact.

Because the workings of machine visual culture are so remote from our sense perception, and because it so acutely determines our material (economic, social), and affective futures, I invite you to see underneath the eye’s outer glass shell, its holder, beyond it, to the grid that organizes its “mind.” That mind simulates a strain of ideology about who exactly gets to gather data about those on that grid below, and how that data should be mobilized to predict the movements and desires of the grid dwellers. This mind, a vast computational regime we are embedded in, drives the machine eye. And this computational regime has specific values that determine what is seen, how it is seen, and what that seeing means….(More)”.