The privacy threat posed by detailed census data

Gillian Tett at the Financial Times: “Wilbur Ross suffered the political equivalent of a small(ish) black eye last month: a federal judge blocked the US commerce secretary’s attempts to insert a question about citizenship into the 2020 census and accused him of committing “egregious” legal violations.

The Supreme Court has agreed to hear the administration’s appeal in April. But while this high-profile fight unfolds, there is a second, less noticed, census issue about data privacy emerging that could have big implications for businesses (and citizens). Last weekend John Abowd, the Census Bureau’s chief scientist, told an academic gathering that statisticians had uncovered shortcomings in the protection of personal data in past censuses. There is no public evidence that anyone has actually used these weaknesses to hack records, and Mr Abowd insisted that the bureau is using cutting-edge tools to fight back. But, if nothing else, this revelation shows the mounting problem around data privacy. Or, as Mr Abowd noted: “These developments are sobering to everyone.” These flaws are “not just a challenge for statistical agencies or internet giants,” he added, but affect any institution engaged in internet commerce and “bioinformatics”, as well as commercial lenders and non-profit survey groups. Bluntly, this includes most companies and banks.

The crucial problem revolves around what is known as “re-identification” risk. When companies and government institutions amass sensitive information about individuals, they typically protect privacy in two ways: they hide the full data set from outside eyes or they release it in an “anonymous” manner, stripped of identifying details. The Census Bureau does both: it is required by law to publish detailed data and protect confidentiality. Since 1990, it has tried to resolve these contradictory mandates by using “household-level swapping” — moving some households from one geographic location to another to generate enough uncertainty to prevent re-identification. This used to work. But today there are so many commercially available data sets and computers are so powerful that it is possible to re-identify “anonymous” data by combining data sets. …
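The linkage attack described above is simple enough to sketch. The following is a minimal illustration with entirely invented data (the names, ZIP codes, and diagnoses are hypothetical): a release stripped of names can still be re-identified by joining it to a public dataset, such as a voter roll, on shared quasi-identifiers.

```python
import pandas as pd

# Hypothetical "anonymized" release: names stripped, but quasi-identifiers kept.
anonymous = pd.DataFrame({
    "zip": ["02138", "02139", "02139"],
    "birth_date": ["1945-07-31", "1962-01-15", "1980-03-02"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "diabetes", "asthma"],  # sensitive attribute
})

# Hypothetical public dataset (e.g. a voter roll) that does include names.
public = pd.DataFrame({
    "name": ["Jane Doe", "John Roe"],
    "zip": ["02138", "02139"],
    "birth_date": ["1945-07-31", "1962-01-15"],
    "sex": ["F", "M"],
})

# Joining on the shared quasi-identifiers re-attaches names to diagnoses.
reidentified = anonymous.merge(public, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```

Two of the three “anonymous” records are re-identified by an exact join alone; real attacks tolerate fuzzier matches, which is why swapping a few households no longer generates enough uncertainty.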

Thankfully, statisticians think there is a solution. The Census Bureau now plans to use a technique known as “differential privacy” which would introduce “noise” into the public statistics, using complex algorithms. This technique is expected to create just enough statistical fog to protect personal confidentiality in published data — while also preserving information in an encrypted form that statisticians can later unscramble, as needed. Companies such as Google, Microsoft and Apple have already used variants of this technique for several years, seemingly successfully. However, nobody has employed this system on the scale that the Census Bureau needs — or in relation to such a high-stakes event. And the idea has sparked some controversy because some statisticians fear that even “differential privacy” tools can be hacked — and others fret it makes data too “noisy” to be useful….(More)”.
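The core idea behind the “noise” is the Laplace mechanism: calibrate random noise to how much one person can change a statistic. The sketch below is only that core idea, not the Census Bureau’s actual (far more elaborate) disclosure-avoidance system; the counts and epsilon values are illustrative.

```python
import numpy as np

def laplace_count(true_count: int, epsilon: float, rng=None) -> float:
    """Release a count with Laplace noise calibrated to sensitivity 1.

    Adding or removing one person changes a counting query by at most 1,
    so noise drawn from Laplace(0, 1/epsilon) yields
    epsilon-differential privacy for that query.
    """
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

rng = np.random.default_rng(0)
# Smaller epsilon -> more noise -> stronger privacy, less accuracy.
print(laplace_count(1200, epsilon=0.1, rng=rng))   # quite noisy
print(laplace_count(1200, epsilon=10.0, rng=rng))  # close to 1200
```

The controversy in the excerpt maps directly onto the `epsilon` parameter: set it low and small-area statistics become too “noisy” to be useful; set it high and the privacy protection thins out.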

Tomorrow’s Data Heroes

Article by Florian Gröne, Pierre Péladeau, and Rawia Abdel Samad: “Telecom companies are struggling to find a profitable identity in today’s digital sphere. What about helping customers control their information?…

By 2025, Alex had had enough. There no longer seemed to be any distinction between her analog and digital lives. Everywhere she went, every purchase she completed, and just about every move she made, from exercising at the gym to idly surfing the Web, triggered a vast flow of data. That in turn meant she was bombarded with personalized advertising messages, targeted more and more eerily to her. As she walked down the street, messages appeared on her phone about the stores she was passing. Ads popped up on her all-purpose tablet–computer–phone pushing drugs for minor health problems she didn’t know she had — until the symptoms appeared the next day. Worse, she had recently learned that she was being reassigned at work. An AI machine had mastered her current job by analyzing her use of the firm’s productivity software.

It was as if the algorithms of global companies knew more about her than she knew herself — and they probably did. How was it that her every action and conversation, even her thoughts, added to the store of data held about her? After all, it was her data: her preferences, dislikes, interests, friendships, consumer choices, activities, and whereabouts — her very identity — that was being collected, analyzed, profited from, and even used to manage her. All these companies seemed to be making money buying and selling this information. Why shouldn’t she gain some control over the data she generated, and maybe earn some cash by selling it to the companies that had long collected it free of charge?

So Alex signed up for the “personal data manager,” a new service that promised to give her control over her privacy and identity. It was offered by her U.S.-based connectivity company (in this article, we’ll call it DigiLife, but it could be one of many former telephone companies providing Internet services in 2025). During the previous few years, DigiLife had transformed itself into a connectivity hub: a platform that made it easier for customers to join, manage, and track interactions with media and software entities across the online world. Thanks to recently passed laws regarding digital identity and data management, including the “right to be forgotten,” the DigiLife data manager was more than window dressing. It laid out easy-to-follow choices that all Web-based service providers were required by law to honor….

Today, in 2019, personal data management applications like the one Alex used exist only in nascent form, and consumers have yet to demonstrate that they trust these services. Nor can they yet profit by selling their data. But the need is great, and so is the opportunity for companies that fulfill it. By 2025, the total value of the data economy as currently structured will rise to more than US$400 billion, and by monetizing the vast amounts of data they produce, consumers can potentially recapture as much as a quarter of that total.

Given the critical role of telecom operating companies within the digital economy — the central position of their data networks, their networking capabilities, their customer relationships, and their experience in government affairs — they are in a good position to seize this business opportunity. They might not do it alone; they are likely to form consortia with software companies or other digital partners. Nonetheless, for legacy connectivity companies, providing this type of service may be the most sustainable business option. It may also be the best option for the rest of us, as we try to maintain control in a digital world flooded with our personal data….(More)”.

Responsible AI for conservation

Oliver Wearn, Robin Freeman and David Jacoby in Nature: “Machine learning (ML) is revolutionizing efforts to conserve nature. ML algorithms are being applied to predict the extinction risk of thousands of species, assess the global footprint of fisheries, and identify animals and humans in wildlife sensor data recorded in the field. These efforts have recently been given a huge boost with support from the commercial sector. New initiatives, such as Microsoft’s AI for Earth and Google’s AI for Social Good, are bringing new resources and new ML tools to bear on some of the biggest challenges in conservation. In parallel to this, the open data revolution means that global-scale, conservation-relevant datasets can be fed directly to ML algorithms from open data repositories, such as Google Earth Engine for satellite data or Movebank for animal tracking data. Added to these will be Wildlife Insights, a Google-supported platform for hosting and analysing wildlife sensor data that launches this year. With new tools and a proliferation of data comes a bounty of new opportunities, but also new responsibilities….(More)”

Weather Service prepares to launch prediction model many forecasters don’t trust

Jason Samenow in the Washington Post: “In a month, the National Weather Service plans to launch its “next generation” weather prediction model with the aim of “better, more timely forecasts.” But many meteorologists familiar with the model fear it is unreliable.

The introduction of a model that forecasters lack confidence in matters, considering the enormous impact that weather has on the economy, valued at around $485 billion annually.

The Weather Service announced Wednesday that the model, known as the GFS-FV3 (FV3 stands for Finite-Volume Cubed-Sphere dynamical core), is “tentatively” set to become the United States’ primary forecast model on March 20, pending tests. It is an update to the current version of the GFS (Global Forecast System), popularly known as the American model, which has existed in various forms for more than 30 years….

A concern is that if forecasters cannot rely on the FV3, they will be left to rely only on the European model for their predictions without a credible alternative for comparisons. And they’ll also have to pay large fees for the European model data. Whereas model data from the Weather Service is free, the European Center for Medium-Range Weather Forecasts, which produces the European model, charges for access.

But there is an alternative perspective, which is that forecasters will just need to adjust to the new model and learn to account for its biases. That is, a little short-term pain is worth the long-term potential benefits as the model improves….

The Weather Service’s parent agency, the National Oceanic and Atmospheric Administration, recently entered an agreement with the National Center for Atmospheric Research to increase collaboration between forecasters and researchers in improving forecast modeling.

In addition, President Trump recently signed into law the Weather Research and Forecast Innovation Act Reauthorization, which establishes the NOAA Earth Prediction Innovation Center, aimed at further enhancing prediction capabilities. But even while NOAA develops relationships and infrastructure to improve the Weather Service’s modeling, the question remains whether the FV3 can meet the forecasting needs of the moment. Until the problems identified are addressed, its introduction could represent a step back in U.S. weather prediction despite a well-intended effort to leap forward…(More)”.

Should Libraries Be the Keepers of Their Cities’ Public Data?

Linda Poon at CityLab: “In recent years, dozens of U.S. cities have released pools of public data. It’s an effort to improve transparency and drive innovation, and done well, it can succeed at both: Governments, nonprofits, and app developers alike have eagerly gobbled up that data, hoping to improve everything from road conditions to air quality to food delivery.

But what often gets lost in the conversation is the idea of how public data should be collected, managed, and disseminated so that it serves everyone—rather than just a few residents—and so that people’s privacy and data rights are protected. That’s where librarians come in.

“As far as how private and public data should be handled, there isn’t really a strong model out there,” says Curtis Rogers, communications director for the Urban Library Council (ULC), an association of leading libraries across North America. “So to have the library as the local institution that is the most trusted, and to give them that responsibility, is a whole new paradigm for how data could be handled in a local government.”

In fact, librarians have long been advocates of digital inclusion and literacy. That’s why, last month, ULC launched a new initiative to give public libraries a leading role in a future with artificial intelligence. They kicked it off with a working group meeting in Washington, D.C., where representatives from libraries in cities like Baltimore, Toronto, Toledo, and Milwaukee met to exchange ideas on how to achieve that through education and by taking on a larger role in data governance.

It’s a broad initiative, and Rogers says they are still in the beginning stages of determining what that role will ultimately look like. But the group will discuss how data should be organized and managed, hash out the potential risks of artificial intelligence, and eventually develop a field-wide framework for how libraries can help drive equitable public data policies in cities.

Already, individual libraries are involved with their city’s data. Chattanooga Public Library (which wasn’t part of the working group, but is a member of ULC) began hosting the city’s open data portal in 2014, turning a traditionally print-centered institution into a community data hub. Since then, the portal has added more than 280 data sets and garnered hundreds of thousands of page views, according to a report for the 2018 fiscal year….

The Toronto Public Library is also in a unique position because it may soon sit inside one of North America’s “smartest” cities. Last month, the city’s board of trade published a 17-page report titled “BiblioTech,” calling for the library to oversee data governance for all smart city projects.

It’s a grand example of just how big the potential is for public libraries. Ryan says the proposal remains just that at the moment, and there are no details yet on what such a model would even look like. She adds that they were not involved in drafting the proposal, and were only asked to provide feedback. But the library is willing to entertain the idea.

Such ambitions would be a large undertaking in the U.S., however, especially for smaller libraries that are already understaffed and under-resourced. According to ULC’s survey of its members, only 23 percent of respondents said they have a staff person designated as the AI lead. A little over a quarter said they even have AI-related educational programming, and just 15 percent report being part of any local or national initiative.

Debbie Rabina, a professor of library science at Pratt Institute in New York, also cautions that putting libraries in charge of data governance has to be carefully thought out. It’s one thing for libraries to teach data literacy and privacy, and to help cities disseminate data. But to go further than that—to have libraries collecting and owning data and to have them assessing who can and can’t use the data—can lead to ethical conflicts and unintended consequences that could erode the public’s trust….(More)”.

Bureaucracy vs. Democracy

Philip Howard in The American Interest: “…For 50 years since the 1960s, modern government has been rebuilt on what I call the “philosophy of correctness.” The person making the decision must be able to demonstrate its correctness by compliance with a precise rule or metric, or by objective evidence in a trial-type proceeding. All day long, Americans are trained to ask themselves, “Can I prove that what I’m about to do is legally correct?”

In the age of individual rights, no one talks about the rights of institutions. But the disempowerment of institutional authority in the name of individual rights has led, ironically, to the disempowerment of individuals at every level of responsibility. Instead of striding confidently toward their goals, Americans tiptoe through legal minefields. In virtually every area of social interaction—schools, healthcare, business, public agencies, public works, entrepreneurship, personal services, community activities, nonprofit organizations, churches and synagogues, candor in the workplace, children’s play, speech on campus, and more—studies and reports confirm all the ways that sensible choices are prevented, delayed, or skewed by overbearing regulation, by an overemphasis on objective metrics, or by legal fear of violating someone’s alleged rights.

A Three-Part Indictment of Modern Bureaucracy

Reformers have promised to rein in bureaucracy for 40 years, and it’s only gotten more tangled. Public anger at government has escalated at the same time, and particularly in the past decade.  While there’s a natural reluctance to abandon a bureaucratic structure that is well-intended, public anger is unlikely to be mollified until there is change, and populist solutions do not bode well for the future of democracy.  Overhauling operating structures to permit practical governing choices would re-energize democracy as well as relieve the pressures Americans feel from Big Brother breathing down their necks.

Viewed in hindsight, the operating premise of modern bureaucracy was utopian and designed to fail. Here’s the three-part indictment of why we should abandon it.

1. The Economic Dysfunction of Modern Bureaucracy

Regulatory programs are indisputably wasteful, and frequently extract costs that exceed benefits. The total cost of compliance is high, about $2 trillion for federal regulation alone….

2. Bureaucracy Causes Cognitive Overload

The complex tangle of bureaucratic rules impairs a human’s ability to focus on the actual problem at hand. The phenomenon of the unhelpful bureaucrat, famously depicted in fiction by Dickens, Balzac, Kafka, Gogol, Heller, and others, has generally been characterized as a cultural flaw of the bureaucratic personality. But studies of cognitive overload suggest that the real problem is that people who are thinking about rules actually have diminished capacity to think about solving problems. This overload not only impedes drawing on what Daniel Kahneman calls “system 2” thinking (questioning assumptions and reflecting on long-term implications); it also impedes access to what he calls “system 1” thinking (drawing on instincts and heuristics to make intuitive judgments)….

3. Bureaucracy Subverts the Rule of Law

The purpose of law is to enhance freedom. By prohibiting bad conduct, such as crime or pollution, law liberates each of us to focus our energies on accomplishment instead of self-protection. Societies that protect property rights and the sanctity of contracts enjoy far greater economic opportunity and output than those that do not enforce the rule of law….(More)”.

The Big (data) Bang: Opportunities and Challenges for Compiling SDG Indicators

Steve MacFeely at Global Policy: “Official statisticians around the world are faced with the herculean task of populating the Sustainable Development Goals global indicator framework. As traditional data sources appear to be insufficient, statisticians are naturally considering whether big data can contribute anything useful. While the statistical possibilities appear to be theoretically endless, in practice big data also present some enormous challenges and potential pitfalls: legal; ethical; technical; and reputational. This paper examines the opportunities and challenges presented by big data for compiling indicators to support Agenda 2030….(More)”.

Facebook could be forced to share data on effects to the young

Nicola Davis at The Guardian: “Social media companies such as Facebook and Twitter could be required by law to share data with researchers to help examine potential harms to young people’s health and identify who may be at risk.

Surveys and studies have previously suggested a link between the use of devices and networking sites and an increase in problems among teenagers and younger children ranging from poor sleep to bullying, mental health issues and grooming.

However, high quality research in the area is scarce: among the conundrums that need to be looked at are matters of cause and effect, the size of any impacts, and the importance of the content of material accessed online.

According to a report by the Commons science and technology committee on the effects of social media and screen time among young people, companies should be compelled to protect users and legislation was needed to enable access to data for high quality studies to be carried out.

The committee noted that the government had failed to commission such research and had instead relied on requesting reviews of existing studies. This was despite a 2017 green paper that set out a consultation process on a UK internet safety strategy.

“We understand [social media companies’] eagerness to protect the privacy of users but sharing data with bona fide researchers is the only way society can truly start to understand the impact, both positive and negative, that social media is having on the modern world,” said Norman Lamb, the Liberal Democrat MP who chairs the committee. “During our inquiry, we heard that social media companies had openly refused to share data with researchers who are keen to examine patterns of use and their effects. This is not good enough.”

Prof Andrew Przybylski, the director of research at the Oxford Internet Institute, said the issue of good quality research was vital, adding that many people’s perception of the effect of social media is largely rooted in hype.

“Social media companies must participate in open, robust, and transparent science with independent scientists,” he said. “Their data, which we give them, is both their most valuable resource and it is the only means by which we can effectively study how these platforms affect users.”…(More)”

Privacy concerns collide with the public interest in data

Gillian Tett in the Financial Times: “Late last year Statistics Canada — the agency that collects government figures — launched an innovation: it asked the country’s banks to supply “individual-level financial transactions data” for 500,000 customers to allow it to track economic trends. The agency argued this was designed to gather better figures for the public interest. However, it tipped the banks into a legal quandary. Under Canadian law (as in most western countries) companies are required to help StatsCan by supplying operating information. But data privacy laws in Canada also say that individual bank records are confidential. When the StatsCan request leaked out, it sparked an outcry — forcing the agency to freeze its plans. “It’s a mess,” a senior Canadian banker says, adding that the laws “seem contradictory”.

Corporate boards around the world should take note. In the past year, executive angst has exploded about the legal and reputational risks created when private customer data leak out, either by accident or in a cyber hack. Last year’s Facebook scandals have been a hot debating topic among chief executives at this week’s World Economic Forum in Davos, as has the EU’s General Data Protection Regulation. However, there is another important side to this Big Data debate: must companies provide private digital data to public bodies for statistical and policy purposes? Or to put it another way, it is time to widen the debate beyond emotive privacy issues to include the public interest and policy needs. The issue has received little public debate thus far, except in Canada. But it is becoming increasingly important.

Companies are sitting on a treasure trove of digital data that offers valuable real-time signals about economic activity. This information could be even more significant than existing statistics, because they struggle to capture how the economy is changing. Take Canada. StatsCan has hitherto tracked household consumption by following retail sales statistics, supplemented by telephone surveys. But consumers are becoming less willing to answer their phones, which undermines the accuracy of surveys, and consumption of digital services cannot be easily tracked. ...
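The statistical appeal is easy to see in miniature. A minimal sketch, using invented card-transaction records of the kind a bank holds: individual rows are aggregated away into a monthly spend-by-category series, which is the sort of real-time consumption signal StatsCan was after (the customers, dates, and amounts below are all hypothetical).

```python
import pandas as pd

# Hypothetical individual-level transaction records held by a bank.
transactions = pd.DataFrame({
    "customer_id": [1, 1, 2, 2, 3],
    "date": pd.to_datetime(
        ["2019-01-05", "2019-02-10", "2019-01-20", "2019-02-02", "2019-02-14"]),
    "category": ["groceries", "travel", "groceries", "dining", "groceries"],
    "amount": [120.0, 450.0, 95.5, 60.0, 110.0],
})

# Aggregate away the individual: monthly spend per category is the kind of
# statistic an agency wants, without publishing any one customer's records.
monthly = (transactions
           .groupby([transactions["date"].dt.to_period("M"), "category"])["amount"]
           .sum())
print(monthly)
```

Of course, the Canadian controversy was not about the aggregate output but about the agency holding the individual-level input at all, which is exactly the tension the excerpt describes.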

But the biggest data collections sit inside private companies. Big groups know this, and some are trying to respond. Google has created its own measures to track inflation, which it makes publicly available. JPMorgan and other banks crunch customer data and publish reports about general economic and financial trends. Some tech groups are even starting to volunteer data to government bodies. LinkedIn has offered to provide anonymised data on education and employment to municipal and city bodies in America and beyond, to help them track local trends; the group says this is in the public interest for policy purposes, as “it offers a different perspective” than official data sources. But it is one thing for LinkedIn to offer anonymised data when customers have signed consent forms permitting the transfer of data; it is quite another for banks (or other companies) who have operated with strict privacy rules. If nothing else, the StatsCan saga shows there urgently needs to be more public debate, and more clarity, around these rules. Consumer privacy issues matter (a lot). But as corporate data mountains grow, we will need to ask whether we want to live in a world where Amazon and Google — and Mastercard and JPMorgan — know more about economic trends than central banks or finance ministries. Personally, I would say “no”. But sooner or later politicians will need to decide on their priorities in this brave new Big Data world; the issue cannot be simply left to the half-hidden statisticians….(More)”.

This Startup Is Challenging Google Maps—and It Needs You

Aarian Marshall at Wired: “A whole lifetime in New York City, and Christiana Ting didn’t realize just how many urgent care facilities there were until the app told her to start looking for them. “They were giving extra points for medical offices, and I found them, I think, on every block,” she says. “I’m not sure what that says about the neighborhood where I work.”

Ting was one of 761 New Yorkers who downloaded, played with, and occasionally became obsessed with an app called MapNYC this fall, vying for their share of an 8-bitcoin prize (worth about $50,000 at the time). The month-long contest, run by a new mapping startup called StreetCred, was really an experiment. StreetCred’s main research question: How do you convince regular people to build and verify mapping data?

It turns out that the maps that guide you to the nearest Arby’s, or help your Lyft driver find your house, don’t just materialize. “I took mapping for granted until I started the competition,” Ting says, even though she pulls up Google Maps at least twice a day. “But it’s such an inconvenience if the info on the map is wrong, especially in a place like New York, that’s changing all the time.”

For regular folk, detailed, reliable mapping info is helpful. For businesses, it can be crucial. Some want to be found when a map user searches for the nearest sandwich shop. Others use products that rely on base maps—think Uber, the Weather Channel, your car’s navigation system—and require up-to-date location data. “One of the huge challenges to any geographic database is its currency,” says Renee Sieber, a geographer who studies participatory mapping at McGill University. That is to say, yesterday’s map is no good to anybody doing business today.

StreetCred sees that as an opportunity. “There’s a lot of companies, none of whom I can name, who have location data, and that data needs improvement,” says Randy Meech, CEO of the small startup. (Meech’s last open-source mapping company, a Samsung subsidiary called Mapzen, shut down in January.) Maybe a client found a data set online or purchased one from another company. Either way, it’s static, and that means it’s only a matter of time before it fails to represent reality.

Google Maps, the giant in this space, has created its extensive database through years of web scraping, Street View roaming, purchasing and collecting satellite data, and both paying and asking volunteers to verify that the businesses it identifies are still in the same place. But the company doesn’t provide all of its specific location or “point of interest” data to developers—where that Thai restaurant is, or where the hiking trail starts, or where the hospital parking lot is located. When it and other mapping services like HERE Technologies, TomTom, and Foursquare do offer that intel, it can be pricey. StreetCred wants to make that info free for customers who don’t need that much data and cheaper for those that do….(More)”.