Wikipedia vandalism could thwart hoax-busting on Google, YouTube and Facebook


Daniel Funke at Poynter: “For a brief moment, the California Republican Party supported Nazism. At least, that’s what Google said.

That’s because someone vandalized the Wikipedia page for the party on May 31 to list “Nazism” alongside ideologies like “Conservatism,” “Market liberalism” and “Fiscal conservatism.” The mistake was removed from search results, with Google clarifying to Vice News that the search engine had failed to catch the vandalism in the Wikipedia entry….

Google has long drawn upon the online encyclopedia for appending basic information to search results. According to the edit log for the California GOP page, someone added “Nazism” to the party’s ideology section around 7:40 UTC on May 31. The edit was removed within a minute, but it appears Google’s algorithm scraped the page just in time for the fake.
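Because Wikipedia’s revision history is public, the one-minute lifespan of an edit like this can be checked directly. The sketch below (not Google’s system) pulls a page’s recent revisions from the MediaWiki API and flags edits that were replaced within a minute, a rough proxy for quickly reverted vandalism; the threshold and the heuristic are illustrative assumptions.

```python
import requests
from datetime import datetime

API = "https://en.wikipedia.org/w/api.php"
HEADERS = {"User-Agent": "edit-log-demo/0.1 (illustrative sketch)"}

def recent_revisions(title, limit=50):
    """Fetch a page's most recent revisions (newest first) from the MediaWiki API."""
    params = {
        "action": "query", "format": "json", "formatversion": 2,
        "prop": "revisions", "titles": title,
        "rvlimit": limit, "rvprop": "ids|timestamp|user|comment",
    }
    page = requests.get(API, params=params, headers=HEADERS, timeout=30).json()["query"]["pages"][0]
    return page["revisions"]

def short_lived_edits(revisions, max_seconds=60):
    """Flag revisions that were replaced within `max_seconds` of being made,
    a rough proxy for vandalism that editors reverted almost immediately."""
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    flagged = []
    # revisions[0] is the newest; each older revision was replaced by the one before it in the list
    for newer, older in zip(revisions, revisions[1:]):
        lifetime = (datetime.strptime(newer["timestamp"], fmt)
                    - datetime.strptime(older["timestamp"], fmt)).total_seconds()
        if lifetime <= max_seconds:
            flagged.append((older["revid"], older.get("user"), lifetime))
    return flagged

for revid, user, secs in short_lived_edits(recent_revisions("California Republican Party")):
    print(f"revision {revid} by {user!r} lasted only {secs:.0f} seconds")
```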

“Sometimes people vandalize public information sources, like Wikipedia, which can impact the information that appears in search,” a Google spokesperson told Poynter in an email. “We have systems in place that catch vandalism before it impacts search results, but occasionally errors get through, and that’s what happened here.”…

According to Google, more than 99.9 percent of Wikipedia edits that show up in Knowledge Panels, which display basic information about searchable keywords at the top of results, aren’t vandalism. The user who authored the original edit to the California GOP’s page did not use a user profile, making them hard to track down.

That’s a common tactic among people who vandalize Wikipedia pages, a practice the nonprofit has documented extensively. But given the volume of edits that are made on Wikipedia — about 10 per second, with 600 new pages per day — and the fact that Facebook and YouTube are now pulling from Wikipedia to provide more context to posts, the potential for and effect of abuse is high….(More)”.

Ontario is trying a wild experiment: Opening access to its residents’ health data


Dave Gershgorn at Quartz: “The world’s most powerful technology companies have a vision for the future of healthcare. You’ll still go to your doctor’s office, sit in a waiting room, and explain your problem to someone in a white coat. But instead of relying solely on their own experience and knowledge, your doctor will consult an algorithm that’s been trained on the symptoms, diagnoses, and outcomes of millions of other patients. Instead of a radiologist reading your x-ray, a computer will be able to detect minute differences and instantly identify a tumor or lesion. Or at least that’s the goal.

AI systems like these, currently under development by companies including Google and IBM, can’t read textbooks and journals, attend lectures, and do rounds—they need millions of real life examples to understand all the different variations between one patient and another. In general, AI is only as good as the data it’s trained on, but medical data is exceedingly private—most developed countries have strict health data protection laws, such as HIPAA in the United States….

These approaches, which favor companies with considerable resources, are pretty much the only way to get large troves of health data in the US because the American health system is so disparate. Healthcare providers keep personal files on each of their patients, and can only transmit them to other accredited healthcare workers at the patient’s request. There’s no single place where all health data exists. It’s more secure, but less efficient for analysis and research.

Ontario, Canada, might have a solution, thanks to its single-payer healthcare system. All of Ontario’s health data exists in a few enormous caches under government control. (After all, the government needs to keep track of all the bills it’s paying.) Similar structures exist elsewhere in Canada, such as Quebec, but Toronto, which has become a major hub for AI research, wants to lead the charge in providing this data to businesses.

Until now, the only people allowed to study this data were government organizations or researchers who partnered with the government to study disease. But Ontario has now entrusted the MaRS Discovery District—a cross between a tech incubator and WeWork—to build a platform for approved companies and researchers to access this data, dubbed Project Spark. The project, initiated by MaRS and Canada’s University Health Network, began exploring how to share this data after both organizations expressed interest to the government about giving broader health data access to researchers and companies looking to build healthcare-related tools.

Project Spark’s goal is to create an API, or a way for developers to request information from the government’s data cache. This could be used to create an app for doctors to access the full medical history of a new patient. Ontarians could access their health records at any time through similar software, and catalog health issues as they occur. Or researchers, like the ones trying to build AI to assist doctors, could request a different level of access that provides anonymized data on Ontarians who meet certain criteria. If you wanted to study every Ontarian who had Alzheimer’s disease over the last 40 years, that data would only be authorization and a few lines of code away.
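Project Spark has not published technical documentation, so any code here is necessarily hypothetical. The sketch below imagines what “a few lines of code away” might look like for an approved researcher requesting anonymized cohort data; the host, endpoint, parameters, and field names are all invented for illustration.

```python
import requests

# Hypothetical sketch only: Project Spark has not published an API, so the
# host, endpoint, parameters, and field names below are invented.
BASE_URL = "https://health-data.example/v1"  # placeholder host (.example is reserved for examples)
TOKEN = "..."                                # access would require approval and data-sharing agreements

def cohort_query(condition, years, end_year=2018):
    """Request anonymized, aggregate records for an approved research cohort."""
    resp = requests.get(
        f"{BASE_URL}/cohorts",
        headers={"Authorization": f"Bearer {TOKEN}"},
        params={
            "condition": condition,
            "from_year": end_year - years,
            "to_year": end_year,
            "fields": "age_band,sex,region,year_of_diagnosis",  # de-identified fields only
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()

# e.g. every Ontarian diagnosed with Alzheimer's disease over the last 40 years
records = cohort_query("alzheimers_disease", years=40)
```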

There are currently 100 companies lined up to get access to the data, which comprises health records from Ontario’s 14 million residents. (MaRS won’t say who the companies are.) …(More)”

Data Stewards: Data Leadership to Address 21st Century Challenges


Post by Stefaan Verhulst: “…Over the last two years, we have focused on the opportunities (and challenges) surrounding what we call “data collaboratives.” Data collaboratives are an emerging form of public-private partnership, in which information held by companies (or other entities) is shared with the public sector, civil society groups, research institutes and international organizations. …

For all its promise, the practice of data collaboratives remains ad hoc and limited. In part, this is a result of the lack of a well-defined, professionalized concept of data stewardship within corporations that has a mandate to explore ways to harness the potential of their data towards positive public ends.

Today, each attempt to establish a cross-sector partnership built on the analysis of private-sector data requires significant and time-consuming efforts, and businesses rarely have personnel tasked with undertaking such efforts and making relevant decisions.

As a consequence, the process of establishing data collaboratives and leveraging privately held data for evidence-based policy making and service delivery is onerous, generally one-off, not informed by best practices or any shared knowledge base, and prone to dissolution when the champions involved move on to other functions.

By establishing data stewardship as a corporate function, recognized and trusted within corporations as a valued responsibility, and by creating the methods and tools needed for responsible data-sharing, the practice of data collaboratives can become regularized, predictable, and de-risked….

To take stock of current practice and to scope needs and opportunities, we held a small yet in-depth kick-off event at the offices of the Cloudera Foundation in San Francisco on May 8, 2018, attended by representatives from LinkedIn, Facebook, Uber, Mastercard, DigitalGlobe, Cognizant, Streetlight Data, the World Economic Forum, and NetHope — among others.

Four Key Takeaways

The discussions were varied and wide-ranging.

Several reflected on the risks involved — including the risks of NOT sharing or collaborating on privately held data that could improve people’s lives (and on some occasions save lives).

Others warned that the window of opportunity to increase the practice of data collaboratives may be closing — given new regulatory requirements and other barriers that may disincentivize corporations from engaging with third parties around their data.

Ultimately, four key takeaways emerged. These areas — at the nexus of opportunities and challenges — are worth considering further, because they help us better understand both the potential and limitations of data collaboratives….(More)”

Technology and satellite companies open up a world of data


Gabriel Popkin at Nature: “In the past few years, technology and satellite companies’ offerings to scientists have increased dramatically. Thousands of researchers now use high-resolution data from commercial satellites for their work. Thousands more use cloud-computing resources provided by big Internet companies to crunch data sets that would overwhelm most university computing clusters. Researchers use the new capabilities to track and visualize forest and coral-reef loss; monitor farm crops to boost yields; and predict glacier melt and disease outbreaks. Often, they are analysing much larger areas than has ever been possible — sometimes even encompassing the entire globe. Such studies are landing in leading journals and grabbing media attention.

Commercial data and cloud computing are not panaceas for all research questions. NASA and the European Space Agency carefully calibrate the spectral quality of their imagers and test them with particular types of scientific analysis in mind, whereas the aim of many commercial satellites is to take good-quality, high-resolution pictures for governments and private customers. And no company can compete with Landsat’s free, publicly available, 46-year archive of images of Earth’s surface. For commercial data, scientists must often request images of specific regions taken at specific times, and agree not to publish raw data. Some companies reserve cloud-computing assets for researchers with aligned interests such as artificial intelligence or geospatial-data analysis. And although companies publicly make some funding and other resources available for scientists, getting access to commercial data and resources often requires personal connections. Still, by choosing the right data sources and partners, scientists can explore new approaches to research problems.

Mapping poverty

Joshua Blumenstock, an information scientist at the University of California, Berkeley (UCB), is always on the hunt for data he can use to map wealth and poverty, especially in countries that do not conduct regular censuses. “If you’re trying to design policy or do anything to improve living conditions, you generally need data to figure out where to go, to figure out who to help, even to figure out if the things you’re doing are making a difference.”

In a 2015 study, he used records from mobile-phone companies to map Rwanda’s wealth distribution (J. Blumenstock et al. Science 350, 1073–1076; 2015). But to track wealth distribution worldwide, patching together data-sharing agreements with hundreds of these companies would have been impractical. Another potential information source — high-resolution commercial satellite imagery — could have cost him upwards of US$10,000 for data from just one country….

Use of commercial images can also be restricted. Scientists are free to share or publish most government data or data they have collected themselves. But they are typically limited to publishing only the results of studies of commercial data, and at most a limited number of illustrative images.

Many researchers are moving towards a hybrid approach, combining public and commercial data, and running analyses locally or in the cloud, depending on need. Weiss still uses his tried-and-tested ArcGIS software from Esri for studies of small regions, and jumps to Earth Engine for global analyses.
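As a flavor of what jumping to Earth Engine for a global-scale analysis can involve, here is a minimal sketch in the Earth Engine Python API that totals forest loss over one country using the Landsat-derived Hansen global forest-change layers. It assumes an authenticated Earth Engine account, and the dataset and boundary asset IDs are recalled from the public catalog, so treat them as indicative rather than exact.

```python
# Assumes an authenticated Earth Engine account (ee.Authenticate() already run).
# Asset IDs change between dataset versions, so treat the names here as indicative.
import ee

ee.Initialize()

# Hansen et al. global forest-change layers (30 m, derived from Landsat)
gfc = ee.Image("UMD/hansen/global_forest_change_2017_v1_5")

# Convert binary forest-loss pixels into area, in hectares
loss_ha = gfc.select("loss").multiply(ee.Image.pixelArea()).divide(1e4)

# Sum forest loss within one country's borders (FAO GAUL administrative boundaries)
countries = ee.FeatureCollection("FAO/GAUL/2015/level0")
drc = countries.filter(ee.Filter.eq("ADM0_NAME", "Democratic Republic of the Congo"))

stats = loss_ha.reduceRegion(
    reducer=ee.Reducer.sum(),
    geometry=drc.geometry(),
    scale=30,          # native Landsat resolution; coarser scales run much faster
    maxPixels=1e13,
)
print("Estimated forest loss (ha):", stats.getInfo()["loss"])
```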

The new offerings herald a shift from an era when scientists had to spend much of their time gathering and preparing data to one in which they’re thinking about how to use them. “Data isn’t an issue any more,” says Roy. “The next generation is going to be about what kinds of questions are we going to be able to ask?”…(More)”.

Democracy doomsday prophets are missing this critical shift


Bruno Kaufmann and Joe Mathews in the Washington Post: “The new conventional wisdom seems to be that electoral democracy is in decline. But this ignores another widespread trend: direct democracy at the local and regional level is booming, even as disillusion with representative government at the national level grows.

Today, 113 of the world’s 117 democratic countries offer their citizens legally or constitutionally established rights to bring forward a citizens’ initiative, referendum or both. And since 1980, roughly 80 percent of countries worldwide have had at least one nationwide referendum or popular vote on a legislative or constitutional issue.

Of all the nationwide popular votes in the history of the world, more than half have taken place in the past 30 years. As of May 2018, almost 2,000 nationwide popular votes on substantive issues have taken place, with 1,059 in Europe, 191 in Africa, 189 in Asia, 181 in the Americas and 115 in Oceania, based on our research.

That is just at the national level. Other major democracies — Germany, the United States and India — do not permit popular votes on substantive issues nationally but support robust direct democracy at the local and regional levels. The number of local votes on issues has so far defied all attempts to count them — they run into the tens of thousands.

This robust democratization, at least when it comes to direct legislation, provides a context that’s generally missing when doomsday prophets suggest that democracy is dying by pointing to authoritarian-leaning leaders like Turkish President Recep Tayyip Erdogan, Russian President Vladimir Putin, Hungarian Prime Minister Viktor Orbán, Philippine President Rodrigo Duterte and U.S. President Donald Trump.

Indeed, the two trends — the rise of populist authoritarianism in some nations and the rise of local and direct democracy in some areas — are related. Frustration is growing with democratic systems at national levels, and yes, some people become more attracted to populism. But some of that frustration is channeled into positive energy — into making local democracy more democratic and direct.

Cities from Seoul to San Francisco are hungry for new and innovative tools that bring citizens into processes of deliberation that allow the people themselves to make decisions and feel invested in government actions. We’ve seen local governments embrace participatory budgeting, participatory planning, citizens’ juries and a host of experimental digital tools in service of that desired mix of greater public deliberation and more direct public action….(More).”

Data Governance in the Digital Age


Centre for International Governance Innovation: “Data is being hailed as “the new oil.” The analogy seems appropriate given the growing amount of data being collected, and the advances made in its gathering, storage, manipulation and use for commercial, social and political purposes.

Big data and its application in artificial intelligence, for example, promise to transform the way we live and work — and will generate considerable wealth in the process. But data’s transformative nature also raises important questions about how the benefits are shared, about privacy, public security, openness and democracy, and about the institutions that will govern the data revolution.

The delicate interplay between these considerations means that they have to be treated jointly, and at every level of the governance process, from local communities to the international arena. This series of essays by leading scholars and practitioners, which is also published as a special report, will explore topics including the rationale for a data strategy, the role of a data strategy for Canadian industries, and policy considerations for domestic and international data governance…

The essays in the series cover the rationale of a data strategy; the role of a data strategy for Canadian industries; balancing privacy and commercial values; domestic policy for data governance; international policy considerations; and an epilogue.

Skills for a Lifetime


Nate Silver’s commencement address at Kenyon College: “….Power has shifted toward people and companies with a lot of proficiency in data science.

I obviously don’t think that’s entirely a bad thing. But it’s by no means entirely a good thing, either. You should still inherently harbor some suspicion of big, powerful institutions and their potentially self-serving and short-sighted motivations. Companies and governments that are capable of using data in powerful ways are also capable of abusing it.

What worries me the most, especially at companies like Facebook and at other Silicon Valley behemoths, is the idea that using data science allows one to remove human judgment from the equation. For instance, in announcing a recent change to Facebook’s News Feed algorithm, Mark Zuckerberg claimed that Facebook was not “comfortable” trying to come up with a way to determine which news organizations were most trustworthy; rather, the “most objective” solution was to have readers vote on trustworthiness instead. Maybe this is a good idea and maybe it isn’t — but what bothered me was the notion that Facebook could avoid responsibility for its algorithm by outsourcing the judgment to its readers.

I also worry about this attitude when I hear people use terms such as “artificial intelligence” and “machine learning” (instead of simpler terms like “computer program”). Phrases like “machine learning” appeal to people’s notion of a push-button solution — meaning, push a button, and the computer does all your thinking for you, no human judgment required.

But the reality is that working with data requires lots of judgment. First, it requires critical judgment — and experience — when drawing inferences from data. And second, it requires moral judgment in deciding what your goals are and in establishing boundaries for your work.

Let’s talk about that first type of judgment — critical judgment. The more experience you have in working with different data sets, the more you’ll realize that the correct interpretation of the data is rarely obvious, and that the obvious-seeming interpretation isn’t always correct. Sometimes changing a single assumption or a single line of code can radically change your conclusion. In the 2016 U.S. presidential election, for instance, there was a series of models that all used almost exactly the same inputs — but they ranged from giving Trump roughly a one-in-three chance of winning the presidency (that was FiveThirtyEight’s model) to as low as one chance in 100, based on fairly subtle aspects of how each algorithm was designed….(More)”.
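A toy Monte Carlo makes Silver’s point concrete: hold the polls fixed and change only the assumption about whether polling errors are correlated across states, and the underdog’s estimated chance of winning moves dramatically. This is not FiveThirtyEight’s model or anyone else’s; every number below is invented.

```python
import numpy as np

rng = np.random.default_rng(0)
# Fifty hypothetical states; polls show the underdog trailing by about 3 points on average.
STATE_LEANS = rng.normal(loc=-3.0, scale=4.0, size=50)

def win_probability(correlated_errors, n_sims=20_000):
    wins = 0
    for _ in range(n_sims):
        if correlated_errors:
            national = rng.normal(0.0, 3.0)            # one polling error shared by every state
            state = rng.normal(0.0, 2.0, size=50)      # plus smaller state-specific noise
        else:
            national = 0.0                             # no shared error component
            state = rng.normal(0.0, np.hypot(3.0, 2.0), size=50)  # same total variance, independent
        margins = STATE_LEANS + national + state
        wins += (margins > 0).sum() > 25               # underdog carries a majority of states
    return wins / n_sims

print("Correlated polling errors:  ", win_probability(True))
print("Independent polling errors: ", win_probability(False))
```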

Plunging response rates to household surveys worry policymakers


The Economist: “Response rates to surveys are plummeting all across the rich world. Last year only around 43% of households contacted by the British government responded to the Labour Force Survey (LFS), down from 70% in 2001. In America the share of households responding to the Current Population Survey (CPS) has fallen from 94% to 85% over the same period. The rest of Europe and Canada have seen similar trends.

Poor response rates drain budgets, as it takes surveyors more effort to hunt down interviewees. And a growing reluctance to give interviewers information threatens the quality of the data. Politicians often complain about inaccurate election polls. Increasingly misleading economic surveys would be even more disconcerting.

Household surveys derive their power from randomness. Since it is impractical to get every citizen to complete a long questionnaire regularly, statisticians interview what they hope is a representative sample instead. But some types are less likely to respond than others—people who live in flats not houses, for example. A study by Christopher Bollinger of the University of Kentucky and three others matched data from the CPS with social-security records and found that poorer and very rich households were more likely to ignore surveyors than middle-income ones. Survey results will be skewed if the types who do not answer are different from those who do, or if certain types of people are more loth to answer some questions, or more likely to fib….
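A small simulation illustrates why differential response matters: if middle-income households answer more often than the poorest and richest ones (echoing the Bollinger et al. finding), the respondents’ average income drifts away from the true population average even with an enormous sample. All figures are invented.

```python
import numpy as np

rng = np.random.default_rng(42)

population = rng.lognormal(mean=10.5, sigma=0.6, size=1_000_000)  # "true" household incomes
percentile = population.argsort().argsort() / population.size     # 0 = poorest, ~1 = richest

# Response propensity peaks for middle-income households: 0.9 in the middle, 0.4 at both extremes
response_prob = 0.9 - np.abs(percentile - 0.5)
responded = rng.random(population.size) < response_prob

print(f"True mean income:      {population.mean():,.0f}")
print(f"Respondent mean:       {population[responded].mean():,.0f}")
print(f"Overall response rate: {responded.mean():.0%}")
```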

Statisticians have been experimenting with methods of improving response rates: new ways to ask questions, or shorter questionnaires, for example. Payment raises response rates, and some surveys offer more money for the most reluctant interviewees. But such persistence can have drawbacks. One study found that more frequent attempts to contact interviewees raised the average response rate, but lowered the average quality of answers.

Statisticians have also been exploring supplementary data sources, including administrative data. Such statistics come with two big advantages. One is that administrative data sets can include many more people and observations than is practical in a household survey, giving researchers the statistical power to run more detailed studies. Another is that governments already collect them, so they can offer huge cost savings over household surveys. For instance, Finland’s 2010 census, which was based on administrative records rather than surveys, cost its government just €850,000 ($1.1m) to produce. In contrast, America’s government spent $12.3bn on its 2010 census, roughly 200 times as much on a per-person basis.
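That per-person comparison holds up on the back of an envelope, assuming round 2010 populations of roughly 5.4 million for Finland and 309 million for the United States:

```python
finland_per_person = 1.1e6 / 5.4e6      # ≈ $0.20 per person
us_per_person      = 12.3e9 / 309e6     # ≈ $40 per person
print(round(us_per_person / finland_per_person))  # ≈ 195, i.e. roughly 200 times as much
```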

Recent advances in computing mean that vast data sets are no longer too unwieldy for use by researchers. However, in many rich countries (those in Scandinavia are exceptions), socioeconomic statistics are collected by several agencies, meaning that researchers who want to combine, say, health records with tax data, face formidable bureaucratic and legal challenges.

Governments in English-speaking countries are especially keen to experiment. In January HMRC, the British tax authority, started publishing real-time tax data as an “experimental statistic” to be compared with labour-market data from household surveys. Two-fifths of Canada’s main statistical agency’s programmes are based at least in part on administrative records. Last year, Britain passed the Digital Economy Act, which will give its Office for National Statistics (ONS) the right to requisition data from other departments and from private sources for statistics-and-research purposes. America is exploring using such data as part of its 2020 census.

Administrative data also have their limitations (see article). They are generally not designed to be used in statistical analyses. A data set on income taxes might be representative of the population receiving benefits or earning wages, but not the population as a whole. Most important, some things are not captured in administrative records, such as well-being, informal employment and religious affiliation….(More)”.

Most Maps of the New Ebola Outbreak Are Wrong


Ed Yong in The Atlantic: “Almost all the maps of the outbreak zone that have thus far been released contain mistakes of this kind. Different health organizations all seem to use their own maps, most of which contain significant discrepancies. Things are roughly in the right place, but their exact positions can be off by miles, as can the boundaries between different regions.

Sinai, a cartographer at UCLA, has been working with the Ministry of Health to improve the accuracy of the Congo’s maps, and flew over on Saturday at their request. For each health zone within the outbreak region, Sinai compiled a list of the constituent villages, plotted them using the most up-to-date sources of geographical data, and drew boundaries that include these places and no others. The maps at the top of this piece show the before (left) and after (right) images….
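The general workflow described here (plot a zone’s known villages, then draw a boundary that contains them and nothing else) can be sketched with standard open-source GIS tools. The coordinates below are invented placeholders, and a bare convex hull stands in for the careful, multi-source boundary work the real mapping involves.

```python
import geopandas as gpd
from shapely.geometry import Point, MultiPoint

# Invented placeholder coordinates (longitude, latitude); not real village locations
villages = {
    "Village A": (18.12, -0.19),
    "Village B": (18.25, -0.31),
    "Village C": (18.01, -0.40),
    "Village D": (18.18, -0.05),
}

points = [Point(lon, lat) for lon, lat in villages.values()]
gdf = gpd.GeoDataFrame({"name": list(villages)}, geometry=points, crs="EPSG:4326")

# Simplest possible zone boundary: the convex hull of the member villages,
# padded slightly so no village sits exactly on the edge
zone_boundary = MultiPoint(points).convex_hull.buffer(0.02)

print("Villages inside the drawn boundary:",
      int(gdf.within(zone_boundary).sum()), "of", len(gdf))
```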

Consider Bikoro, the health zone where the outbreak may have originated, and where most cases are found. Sinai’s redrawn boundary for Bikoro is roughly similar to the one on current maps, but with critical differences. Notably, existing maps place the village of Ikoko Impenge—one of the epicenters of the outbreak—outside the Bikoro health zone, when it actually lies within the zone.

 “These visualizations are important for communicating the reality on the ground to all levels of the health hierarchy, and to international partners who don’t know the country,” says Mathias Mossoko, the head of disease surveillance data in DRC.

“It’s really important for the outbreak response to have real and accurate data,” adds Bernice Selo, who leads the cartographic work from the Ministry of Health’s command center in Kinshasa. “You need to know exactly where the villages are, where the health facilities are, where the transport routes and waterways are. All of this helps you understand where the outbreak is, where it’s moving, how it’s moving. You can see which villages have the highest risk.”

To be clear, there’s no evidence that these problems are hampering the response to the current outbreak. It’s not like doctors are showing up in the middle of the forest, wondering why they’re in the wrong place. “Everyone on the ground knows where the health zones start and end,” says Sinai. “I don’t think this will make or break the response. But you surely want the most accurate data.”

It feels unusual to not have this information readily at hand, especially in an era when digital maps are so omnipresent and so supposedly truthful. If you search for San Francisco on Google Maps, you can be pretty sure that what comes up is actually where San Francisco is. On Google Street View, you can even walk along a beach at the other end of the world.

But the Congo is a massive country—a quarter the size of the United States with considerably fewer resources. Until very recently, it hasn’t had the resources to get accurate geolocalized data. Instead, the boundaries of the health zones and their constituent “health areas,” as well as the position of specific villages, towns, rivers, hospitals, clinics, and other landmarks, are often based on local knowledge and hand-drawn maps. Here’s an example, which I saw when I visited the National Institute for Biomedical Research in March. It does the job, but it’s clearly not to scale….(More)”.

Blockchain as a force for good: How this technology could transform the sharing economy


Aaron Fernando at Shareable: “The volatility in the price of cryptocurrencies doesn’t matter to restaurateur Helena Fabiankovic, who started Baba’s Pierogies in Brooklyn with her partner Robert in 2015. Yet she and her business are already positioned to reap the real-world benefits of the technology that underpins these digital currencies — the blockchain — and they will be at the forefront of a sustainable, community-based peer-to-peer energy revolution because of it.

So what does a restaurateur have to do with the blockchain and local energy? Fabiankovic is one of the early participants in the Brooklyn Microgrid, a project of the startup LO3 Energy that uses a combination of innovative technologies — blockchain and smart meters — to operate a virtual microgrid in the borough of Brooklyn in New York City, New York. This microgrid enables residents to buy and sell green energy directly to their neighbors at much better rates than if they only interacted with centralized utility providers.

Just as we don’t pay much attention to the critical infrastructure that powers our digital world and exists just out of sight — from the Automated Clearing House (ACH), which undergirds our financial system, to the undersea cables that enable the Internet to be globally useful — blockchain is likely to change our lives in ways that will eventually be invisible. In the sharing economy, we have traditionally just used existing infrastructure and built platforms and services on top of it. Considering that those undersea cables are owned by private companies with their own motives and that the locations of ACH data centers are heavily classified, there is a lot to be desired in terms of transparency, resilience, and independence from self-interested third parties. That’s where open-source, decentralized infrastructure of the blockchain for the sharing economy offers much promise and potential.

In the case of Brooklyn Microgrid, which is part of an emerging model for shared energy use via the blockchain, this decentralized infrastructure would allow residents like Fabiankovic to save money and make sustainable choices. Shared ownership and community financing for green infrastructure like solar panels is part of the model. “Everyone can pay a different amount and you can get a proportional amount of energy that’s put off by the panel, based on how much that you own,” says Scott Kessler, director of business development at LO3. “It’s really just a way of crowdfunding an asset.”
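The proportional-ownership arithmetic Kessler describes is simple enough to sketch directly. The dollar amounts and panel output below are invented, and this is not LO3 Energy’s actual accounting; it only shows how stakes translate into energy credits.

```python
# Invented figures; not LO3 Energy's actual accounting
contributions = {"Helena": 600.0, "Robert": 300.0, "Neighbor C": 100.0}  # dollars paid toward a shared panel
panel_output_kwh = 250.0                                                 # panel output for the billing period

total = sum(contributions.values())
for owner, paid in contributions.items():
    share = paid / total
    print(f"{owner}: {share:.0%} stake -> {share * panel_output_kwh:.1f} kWh credited")
```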

The type of blockchain used by the Brooklyn Microgrid makes it possible to collect and communicate data from smart meters every second, so that the price of electricity can be updated in real time and users will still transact with each other using U.S. dollars. The core idea of the Brooklyn Microgrid is to utilize a tailored blockchain to align energy consumption with energy production, and to do this with rapidly-updated price information that then changes behavior around energy….(More)
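As a very rough illustration of aligning consumption with production, the toy settlement pass below nets out surplus and deficit meters for one interval and clears what it can locally before falling back to the utility. The readings, prices, and matching rule are all invented; the real market design is considerably more involved.

```python
from collections import deque

# Invented readings and prices for one settlement interval:
# positive kWh = surplus fed in by a producer, negative = consumption to cover
readings = {"bakery": -1.2, "rooftop_a": 0.8, "rooftop_b": 0.9, "apartment": -0.3}
local_price, utility_price = 0.11, 0.19   # dollars per kWh, illustrative only

sellers = deque((m, kwh) for m, kwh in readings.items() if kwh > 0)
buyers = deque((m, -kwh) for m, kwh in readings.items() if kwh < 0)

# Match surpluses to deficits locally first, then fall back to the utility
while sellers and buyers:
    seller, supply = sellers[0]
    buyer, demand = buyers[0]
    traded = min(supply, demand)
    print(f"{buyer} buys {traded:.2f} kWh from {seller} at ${local_price}/kWh")
    if supply - traded > 1e-9:
        sellers[0] = (seller, supply - traded)
    else:
        sellers.popleft()
    if demand - traded > 1e-9:
        buyers[0] = (buyer, demand - traded)
    else:
        buyers.popleft()

for buyer, remaining in buyers:
    print(f"{buyer} buys {remaining:.2f} kWh from the utility at ${utility_price}/kWh")
for seller, remaining in sellers:
    print(f"{seller} sells {remaining:.2f} kWh back to the grid")
```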