Google's fact-checking bots build vast knowledge bank


Hal Hodson in the New Scientist: “The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts.

GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.

The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.

Knowledge Vault is a type of “knowledge base” – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type “Where was Madonna born” into Google, for example, the place given is pulled from Google’s existing knowledge base.

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far. So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as “confident facts”, to which Google’s model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
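The two figures above imply that roughly one extracted fact in six clears the “confident” bar: 271 million out of 1.6 billion is about 16.9 per cent. Below is a minimal sketch of that filtering step. It assumes each fact is stored as a subject-predicate-object triple with a model-assigned probability. The `Fact` class, its field names and the example probabilities are hypothetical, and the sketch does not model the cross-referencing that produces the probabilities.

```python
from dataclasses import dataclass

# "Confident facts" are those given MORE than a 90 per cent chance of being true.
CONFIDENT_THRESHOLD = 0.9


@dataclass(frozen=True)
class Fact:
    """A hypothetical subject-predicate-object triple with a truth probability."""
    subject: str
    predicate: str
    obj: str
    probability: float  # model's estimated chance that the triple is true


def confident_facts(facts: list[Fact]) -> list[Fact]:
    """Keep only facts whose probability strictly exceeds the threshold."""
    return [f for f in facts if f.probability > CONFIDENT_THRESHOLD]


if __name__ == "__main__":
    facts = [
        Fact("Madonna", "place_of_birth", "Bay City, Michigan", 0.97),  # invented score
        Fact("Madonna", "place_of_birth", "Detroit", 0.35),              # invented score
    ]
    print([f.obj for f in confident_facts(facts)])  # ['Bay City, Michigan']
    print(f"{271e6 / 1.6e9:.1%} of extracted facts are confident")  # 16.9%
```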

“It’s a hugely impressive thing that they are pulling off,” says Fabian Suchanek, a data scientist at Télécom ParisTech in France.

Google’s Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.

Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it’s only going to get bigger. As well as the ability to analyse text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages.

Tom Austin, a technology analyst at Gartner in Boston, says that the world’s biggest technology companies are racing to build similar vaults. “Google, Microsoft, Facebook, Amazon and IBM are all building them, and they’re tackling these enormous problems that we would never even have thought of trying 10 years ago,” he says.

The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin…”

Station display shows waiting commuters the best train carriage to get on


Springwise: “When a train arrives at a station, travelers often aren’t spread evenly along the platform but are huddled in the same spot. This is annoying for both commuters and operators because some carriages fill up while others are left empty, and boarding takes longer. In the Netherlands, the NS Reisplanner Xtra app has already offered train users a way to find a seat using their smartphone. Now the Dutch design agency Edenspiekermann has developed a platform-length LED display that provides real-time information on carriage crowdedness and other details.
Created for rail infrastructure manager ProRail and train operator NS with the help of design researchers STBY, the service consists of a 180-meter-long color LED strip that spans the length of the platform. The display aims to give commuters all the information they need to know where to wait to board the right carriage. Numbers show whether a carriage is first or standard class, and the exact positions of the doors are also marked. Symbols show the carriages that are best for bikes, buggies, wheelchairs and large luggage, as well as quiet carriages. The boards also work with infrared sensors located on each train that detect how full each carriage is. A green strip means there are seats available, a yellow strip indicates that the carriage is fairly crowded and a red strip means it’s full.
Website: www.edenspiekermann.com
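The colour coding amounts to sorting each carriage’s measured load into three bands. Below is a minimal sketch. It assumes the sensors report a passenger count per carriage and that the bands are defined as the fraction of seats taken. The article does not say where the boundaries lie, so the 70 and 100 per cent cut-offs and all names below are hypothetical.

```python
from enum import Enum


class Strip(Enum):
    GREEN = "seats available"
    YELLOW = "fairly crowded"
    RED = "full"


# Hypothetical cut-offs, expressed as the fraction of seats occupied.
CROWDED_FROM = 0.7
FULL_FROM = 1.0


def strip_colour(passengers: int, seats: int) -> Strip:
    """Map one carriage's occupancy to the colour shown on the platform strip."""
    if seats <= 0:
        raise ValueError("seats must be positive")
    occupancy = passengers / seats
    if occupancy >= FULL_FROM:
        return Strip.RED
    if occupancy >= CROWDED_FROM:
        return Strip.YELLOW
    return Strip.GREEN


def platform_strip(carriages: list[tuple[int, int]]) -> list[Strip]:
    """carriages: (passengers, seats) per carriage, in platform order."""
    return [strip_colour(p, s) for p, s in carriages]


if __name__ == "__main__":
    print([s.name for s in platform_strip([(10, 80), (60, 80), (90, 80)])])
    # ['GREEN', 'YELLOW', 'RED']
```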

EU-funded tool to help our brain deal with big data


EU Press Release: “Every single minute, the world generates 1.7 million billion bytes of data, equal to 360,000 DVDs. How can our brain deal with increasingly big and complex datasets? EU researchers are developing an interactive system which not only presents data the way you like it, but also changes the presentation constantly in order to prevent brain overload. The project could enable students to study more efficiently or journalists to cross check sources more quickly. Several museums in Germany, the Netherlands, the UK and the United States have already shown interest in the new technology.
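The DVD comparison holds if “1.7 million billion” is read as 1.7 × 10^15 bytes. Dividing by 360,000 gives about 4.7 × 10^9 bytes per disc, which is the nominal capacity of a single-layer DVD. The same arithmetic is shown below, with a per-day figure added purely as an illustration.

```python
# "1.7 million billion bytes" per minute, i.e. 1.7 x 10^15 bytes.
BYTES_PER_MINUTE = 1.7e15
DVD_EQUIVALENT = 360_000

bytes_per_dvd = BYTES_PER_MINUTE / DVD_EQUIVALENT
bytes_per_day = BYTES_PER_MINUTE * 60 * 24

print(f"{bytes_per_dvd / 1e9:.1f} GB per DVD")    # 4.7 GB per DVD
print(f"{bytes_per_day / 1e18:.1f} EB per day")   # 2.4 EB per day
```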

Data is everywhere: it can either be created by people or generated by machines, such as sensors gathering climate information, satellite imagery, digital pictures and videos, purchase transaction records, GPS signals, etc. This information is a real gold mine. But it is also challenging: today’s datasets are so huge and complex to process that they require new ideas, tools and infrastructures.

Researchers within CEEDs (@ceedsproject) are transposing big data into an interactive environment to allow the human mind to generate new ideas more efficiently. They have built what they are calling an eXperience Induction Machine (XIM) that uses virtual reality to enable a user to ‘step inside’ large datasets. This immersive multi-modal environment – located at Pompeu Fabra University in Barcelona – also contains a panoply of sensors that allow the system to present the information in the right way to the user, constantly tailored to their reactions as they examine the data. These reactions – such as gestures, eye movements or heart rate – are monitored by the system and used to adapt the way in which the data is presented.

Jonathan Freeman, Professor of Psychology at Goldsmiths, University of London and coordinator of CEEDs, explains: ‘The system recognises when participants are getting fatigued or overloaded with information, and it adapts accordingly. Either it simplifies the visualisations to reduce the cognitive load, keeping the user less stressed and more able to focus, or it guides the person to areas of the data representation that are not as heavy in information.’

Neuroscientists were the first group the CEEDs researchers tried their machine on (BrainX3). It took the typically huge datasets generated in this scientific discipline and animated them with visual and sound displays. By providing subliminal clues, such as flashing arrows, the machine guided the neuroscientists to areas of the data that were potentially more interesting to each person. First pilots have already demonstrated the power of this approach in gaining new insights into the organisation of the brain….”

How Thousands Of Dutch Civil Servants Built A Virtual 'Government Square' For Online Collaboration


Federico Guerrini at Forbes: “Democracy needs a reboot, or as the founders of DemocracyOS, an open source platform for political debate, say, “a serious upgrade”. They are not alone in trying to change the way citizens and governments communicate with each other. Not long ago, I covered on this blog a Greek platform, VouliWatch, which aims to boost civic engagement following the model of similar initiatives in countries like Germany, France and Austria, all running on software called Parliament Watch.
Other decision-making tools used by activists and organizations trying to reduce the distance between people and their representatives include Liquid Feedback and Airesis. But the quest for disintermediation doesn’t concern only the relationship between governments and citizens: it’s changing the way public organisations work internally as well. Civil servants are starting to develop and use their own internal “social networks” to exchange ideas, discuss issues and collaborate on projects.
This is happening in the Netherlands: thousands of civil servants from all government organizations have built their own “intranet” using Pleio (“government square”, in Dutch), a platform that runs on the open source networking engine Elgg.
It all started in 2010, thanks to the work of four founders, Davied van Berlo, Harrie Custers, Wim Essers and Marcel Ziemerink. Growth has been steady, and Pleio now counts some 75,000 users spread across about 800 subsites. The nice thing about the platform is that it is modular: subscribers can collaborate in a group and then start a subgroup to go into more depth with a smaller team. To learn a little more about this unique experience, I reached out to van Berlo, who kindly answered a few questions. Check the interview below.
Where did the Pleio idea come from? Were you inspired by other experiences?

The idea came mainly from the developments around us: the whole web 2.0 movement at the time. This has shown us the power of platforms to connect people, bring them together and let them cooperate. I noticed that civil servants were looking for ways of collaborating across organisational borders and many were using the new online tools. That’s why I started the Civil Servant 2.0 network, so they could exchange ideas and experiences in this new way of working.
However, these tools are not always the ideal solution. For one, they’re commercial, which can get in the way of the public goals we work for. They’re often American, so other laws and practices apply. You can’t change them or add to them. Usually you have to get another tool (and login) for different functionalities. And they were outright forbidden by some government agencies. I noticed there was a need for a platform where different tools were integrated, where people from different organisations and outside government could work together and where all information would remain in the Netherlands and in the hands of the original owner. Since there was no such platform we started one of our own….”

France: Report of the “Open Data in Health” Commission


“The ‘open data in health’ Commission, which met from November 2013 to May 2014, was tasked with debating the issues and proposals concerning access to health data, in a pluralistic setting that brought together the stakeholders.
This report, submitted on 9 July 2014 to Marisol Touraine, Minister of Social Affairs and Health, sets out the Commission’s work and discussions:

  • An overview of the current situation (part 1): definitions of concepts, the state of the law, an overview of governance, a presentation of access to SNIIRAM and PMSI data, a map of health data, and lessons learned from experiences abroad;
  • The stakes for the future (part 2);
  • The actions to be taken (part 3): data to be released as open data, guidelines on data that could allow re-identification, and data relating to health professionals and health facilities.

This report was adopted by consensus by all members of the Commission, who share strong common expectations.”
Final report of the open data commission (PDF, 1 MB) [09/07/2014] [updated: 09/07/2014]

Forget The Wisdom of Crowds; Neurobiologists Reveal The Wisdom Of The Confident


Emerging Technology From the arXiv: “Way back in 1906, the English polymath Francis Galton visited a country fair in which 800 people took part in a contest to guess the weight of a slaughtered ox. After the fair, he collected the guesses and calculated their average, which turned out to be 1208 pounds. To Galton’s surprise, this was within 1 per cent of the true weight of 1198 pounds.
This is one of the earliest examples of a phenomenon that has come to be known as the wisdom of the crowd. The idea is that the collective opinion of a group of individuals can be better than a single expert opinion.
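The “within 1 per cent” claim can be checked from the figures given. The 10-pound gap is about 0.83 per cent of the true weight.

```python
crowd_average = 1208  # pounds, the average of the fair-goers' guesses as reported
true_weight = 1198    # pounds

relative_error = abs(crowd_average - true_weight) / true_weight
print(f"{relative_error:.2%}")  # 0.83%
```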
This phenomenon is commonplace today on websites such as Reddit in which users vote on the importance of particular stories and the most popular are given greater prominence.
However, anyone familiar with Reddit will know that the collective opinion isn’t always wise. In recent years, researchers have spent a significant amount of time and effort teasing apart the factors that make crowds stupid. One important factor turns out to be the way members of a crowd influence each other.
It turns out that if a crowd offers a wide range of independent estimates, then it is more likely to be wise. But if members of the crowd are influenced in the same way, for example by each other or by some external factor, then they tend to converge on a biased estimate. In this case, the crowd is likely to be stupid.
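A toy simulation shows why the way influence is shared matters. Independent errors largely cancel when averaged, but an error common to the whole crowd does not. This is an illustrative model only, not the one used in the research described here. The noise levels, crowd size and function names are assumptions, and the 734 km true value is borrowed from the border-length task described below.

```python
import random
import statistics

TRUE_VALUE = 734.0  # km; the border-length task described below


def crowd_average(n: int, noise_sd: float, shared_sd: float, rng: random.Random) -> float:
    """Average n guesses; each guess = truth + one shared error + a private error."""
    shared_error = rng.gauss(0.0, shared_sd)  # identical for every member (common influence)
    guesses = [TRUE_VALUE + shared_error + rng.gauss(0.0, noise_sd) for _ in range(n)]
    return statistics.mean(guesses)


def typical_error(shared_sd: float, trials: int = 200, n: int = 800,
                  noise_sd: float = 200.0, seed: int = 1) -> float:
    """Mean absolute error of the crowd average over many simulated crowds."""
    rng = random.Random(seed)
    errors = [abs(crowd_average(n, noise_sd, shared_sd, rng) - TRUE_VALUE)
              for _ in range(trials)]
    return statistics.mean(errors)


if __name__ == "__main__":
    print(f"independent crowd: {typical_error(0.0):.1f} km off on average")
    print(f"influenced crowd:  {typical_error(150.0):.1f} km off on average")
```

With purely private errors, the average's error shrinks roughly with the square root of the crowd size. The shared error, by contrast, passes straight through to the average no matter how many people take part.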
Today, Gabriel Madirolas and Gonzalo De Polavieja at the Cajal Institute in Madrid, Spain, say they have found a way to analyse the answers from a crowd that allows them to remove this kind of bias and so settle on a wiser answer.
The theory behind their work is straightforward. Their idea is that some people are more strongly influenced by additional information than others who are confident in their own opinion. So identifying these more strongly influenced people and separating them from the independent thinkers creates two different groups. The group of independent thinkers is then more likely to give a wise estimate. Or put another way, ignore the wisdom of the crowd in favour of the wisdom of the confident.
So how do you identify confident thinkers? Madirolas and De Polavieja began by studying data from an earlier set of experiments in which groups of people were given tasks such as estimating the length of the border between Switzerland and Italy, the correct answer being 734 kilometres.
After one task, some groups were shown the combined estimates of other groups before beginning their second task. These experiments clearly showed how this information biased the answers from these groups in their second task.
Madirolas and De Polavieja then set about creating a mathematical model of how individuals incorporate this extra information. They assume that each person comes to a final estimate based on two pieces of information: first, their own independent estimate of the length of the border and second, the earlier combined estimate revealed to the group. Each individual decides on a final estimate depending on the weighting they give to each piece of information.
Those people who are heavily biased give a strong weighting to the additional information whereas people who are confident in their own estimate give a small or zero weighting to the additional information.
Madirolas and De Polavieja then take each person’s behaviour and fit it to this model to reveal how independent their thinking has been.
That allows them to divide the groups into independent thinkers and biased thinkers. Taking the collective opinion of the independent thinkers then gives a much more accurate estimate of the length of the border.
“Our results show that, while a simple operation like the mean, median or geometric mean of a group may not allow groups to make good estimations, a more complex operation taking into account individuality in the social dynamics can lead to a better collective intelligence,” they say.
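The weighting idea can be sketched in a few lines. This is a deliberately simplified version, not Madirolas and De Polavieja’s fitted model. It assumes each person’s final answer is a straight mix, final = (1 - w) × own + w × social, and solves for each person’s weight w from a single before-and-after pair. Anyone with w at or below a hypothetical cut-off of 0.2 is treated as confident. The class, the function names and the example answers are invented.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass
class Response:
    own: float    # the person's independent first estimate
    final: float  # their estimate after seeing the group's combined answer


def social_weight(r: Response, social: float) -> Optional[float]:
    """Solve final = (1 - w) * own + w * social for w, clamped to [0, 1].

    Returns None when own == social, because the shown answer then gives no
    way to tell how much weight the person placed on it.
    """
    if social == r.own:
        return None
    w = (r.final - r.own) / (social - r.own)
    return min(1.0, max(0.0, w))


def wisdom_of_the_confident(responses: list[Response], social: float,
                            max_weight: float = 0.2) -> Optional[float]:
    """Average the final answers of people who mostly ignored the social information."""
    confident = []
    for r in responses:
        w = social_weight(r, social)
        if w is not None and w <= max_weight:
            confident.append(r.final)
    return mean(confident) if confident else None


if __name__ == "__main__":
    shown = 1200.0  # a badly biased combined estimate shown to the group (invented)
    answers = [Response(700, 710), Response(760, 750), Response(740, 745),
               Response(600, 1100), Response(900, 1150)]
    print(mean(r.final for r in answers))           # 891: plain mean, dragged upward
    print(wisdom_of_the_confident(answers, shown))  # 735: confident subgroup only
```

In this invented example, the plain mean is pulled towards the biased figure that was shown, while the confident subgroup stays close to the true 734 km.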

Ref: arxiv.org/abs/1406.7578: Wisdom of the Confident: Using Social Interactions to Eliminate the Bias in Wisdom of the Crowds”

Are the Authoritarians Winning?


Michael Ignatieff reviews several books in the New York Review of Books: “In the 1930s travelers returned from Mussolini’s Italy, Stalin’s Russia, and Hitler’s Germany praising the hearty sense of common purpose they saw there, compared to which their own democracies seemed weak, inefficient, and pusillanimous.
Democracies today are in the middle of a similar period of envy and despondency. Authoritarian competitors are aglow with arrogant confidence. In the 1930s, Westerners went to Russia to admire Stalin’s Moscow subway stations; today they go to China to take the bullet train from Beijing to Shanghai, and just as in the 1930s, they return wondering why autocracies can build high-speed railroad lines seemingly overnight, while democracies can take forty years to decide they cannot even begin. The Francis Fukuyama moment—when in 1989 Westerners were told that liberal democracy was the final form toward which all political striving was directed—now looks like a quaint artifact of a vanished unipolar moment.
For the first time since the end of the cold war, the advance of democratic constitutionalism has stopped. The army has staged a coup in Thailand and it’s unclear whether the generals will allow democracy to take root in Burma. For every African state, like Ghana, where democratic institutions seem secure, there is a Mali, a Côte d’Ivoire, and a Zimbabwe, where democracy is in trouble.
In Latin America, democracy has sunk solid roots in Chile, but in Mexico and Colombia it is threatened by violence, while in Argentina it struggles to shake off the dead weight of Peronism. In Brazil, the millions who took to the streets last June to protest corruption seem to have had no impact on the cronyism in Brasília. In the Middle East, democracy has a foothold in Tunisia, but in Syria there is chaos; in Egypt, plebiscitary authoritarianism rules; and in the monarchies, absolutism is ascendant.
In Europe, the policy elites keep insisting that the remedy for their continent’s woes is “more Europe” while a third of their electorate is saying they want less of it. From Hungary to Holland, including in France and the UK, the anti-European right gains ground by opposing the European Union generally and immigration in particular. In Russia the democratic moment of the 1990s now seems as distant as the brief constitutional interlude between 1905 and 1914 under the tsar….
It is not at all apparent that “governance innovation,” a bauble Micklethwait and Wooldridge chase across three continents, watching innovators at work making government more efficient in Chicago, Sacramento, Singapore, and Stockholm, will do the trick. The problem of the liberal state is not that it lacks modern management technique, good software, or different schemes to improve the “interface” between the bureaucrats and the public. By focusing on government innovation, Micklethwait and Wooldridge assume that the problem is improving the efficiency of government. But what is required is both more radical and more traditional: a return to constitutional democracy itself, to courts and regulatory bodies that are freed from the power of money and the influence of the powerful; to legislatures that cease to be circuses and return to holding the executive branch to public account while cooperating on measures for which there is a broad consensus; to elected chief executives who understand that they are not entertainers but leaders….”
Books reviewed:

Reforming Taxation to Promote Growth and Equity

a white paper by Joseph Stiglitz
Roosevelt Institute, 28 pp., May 28, 2014; available at rooseveltinstitute.org

Digital Government: Turning the Rhetoric into Reality


Miguel Carrasco and Peter Goss at BCG Perspectives: “Getting better—but still plenty of room for improvement: that’s the current assessment by everyday users of their governments’ efforts to deliver online services. The public sector has made good progress, but most countries are not moving nearly as quickly as users would like. Many governments have made bold commitments, and a few countries have determined to go “digital by default.” Most are moving more modestly, often overwhelmed by complexity and slowed by bureaucratic skepticism over online delivery as well as by a lack of digital skills. Developing countries lead in the rate of online usage, but they mostly trail developed nations in user satisfaction.
Many citizens—accustomed to innovation in such sectors as retailing, media, and financial services—wish their governments would get on with it. Of the services that can be accessed online, many only provide information and forms, while users are looking to get help and transact business. People want to do more. Digital interaction is often faster, easier, and more efficient than going to a service center or talking on the phone, but users become frustrated when the services do not perform as expected. They know what good online service providers offer. They have seen a lot of improvement in recent years, and they want their governments to make even better use of digital’s capabilities.
Many governments are already well on the way to improving digital service delivery, but there is often a gap between rhetoric and reality. There is no shortage of government policies and strategies relating to “digital first,” “e-government,” and “gov2.0,” in addition to digital by default. But governments need more than a strategy. “Going digital” requires leadership at the highest levels, investments in skills and human capital, and cultural and behavioral change. Based on BCG’s work with numerous governments and new research into the usage of, and satisfaction with, government digital services in 12 countries, we see five steps that most governments will want to take:

1. Focus on value. Put the priority on services with the biggest gaps between their importance to constituents and constituents’ satisfaction with digital delivery. In most countries, this will mean services related to health, education, social welfare, and immigration.

2. Adopt service design thinking. Governments should walk in users’ shoes. What does someone encounter when he or she goes to a government service website—plain language or bureaucratic legalese? How easy is it for the individual to navigate to the desired information? How many steps does it take to do what he or she came to do? Governments can make services easy to access and use by, for example, requiring users to register once and establish a digital credential, which can be used in the future to access online services across government.

3. Lead users online, keep users online. Invest in seamless end-to-end capabilities. Most government-service sites need to advance from providing information to enabling users to transact their business in its entirety, without having to resort to printing out forms or visiting service centers.

4. Demonstrate visible senior-leadership commitment. Governments can signal—to both their own officials and the public—the importance and the urgency that they place on their digital initiatives by where they assign responsibility for the effort.

5. Build the capabilities and skills to execute. Governments need to develop or acquire the skills and capabilities that will enable them to develop and deliver digital services.
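Step 1 reduces to a simple ranking. Score each service for its importance and for satisfaction with its digital delivery, then sort by the gap between the two. Below is a minimal sketch. The scoring scale, the service list and every number in it are hypothetical, and BCG’s own survey method is not reproduced here.

```python
def priority_order(scores: dict[str, tuple[float, float]]) -> list[str]:
    """Sort services by importance minus digital satisfaction, largest gap first.

    scores maps a service name to (importance, satisfaction) on one shared scale.
    """
    gaps = {service: imp - sat for service, (imp, sat) in scores.items()}
    return sorted(gaps, key=gaps.get, reverse=True)


if __name__ == "__main__":
    survey = {  # invented 0-10 scores for illustration only
        "health": (9.0, 5.5),
        "tax filing": (6.0, 7.0),
        "education": (8.5, 6.0),
        "immigration": (7.5, 4.5),
    }
    print(priority_order(survey))
    # ['health', 'immigration', 'education', 'tax filing']
```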

This report examines the state of government digital services through the lens of Internet users surveyed in Australia, Denmark, France, Indonesia, the Kingdom of Saudi Arabia, Malaysia, the Netherlands, Russia, Singapore, the United Arab Emirates (UAE), the UK, and the U.S. We investigated 37 different government services. (See Exhibit 1.)…”

Opening Public Transportation Data in Germany


Thesis by Stefan Kaufmann: “Open data has been recognized as a valuable resource, and public institutions, including in Germany, have taken to publishing their data under open licenses. However, German public transit agencies are still reluctant to publish their schedules as open data. Also, two widely used data exchange formats in German transit planning are proprietary, with no documentation publicly available. Through this work, one of the proprietary formats was reverse-engineered, and a process for transforming it into the open GTFS schedule format was developed. This process allowed a partnering transit operator to publish its schedule as open data. In addition, a survey of German transit authorities and operators evaluated how prevalent the various transit data exchange formats are and what reservations exist about open transit data. The survey brought to light a series of issues that act as obstacles to opening up transit data. Addressing the issues found through this work, and partnering with open-minded transit authorities to further develop transit data publishing processes, can serve as a foundation for wider publication of open transit data in Germany.”
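The last step of such a conversion, writing the GTFS files, can be sketched with Python’s standard csv module. The sketch assumes the proprietary input has already been parsed into simple dictionaries; the thesis’s reverse-engineered format is not modelled. The input layout, function names, stop names and coordinates are all hypothetical. Only stops.txt and stop_times.txt are produced. A complete feed also needs files such as agency.txt, routes.txt and trips.txt, plus service calendar information.

```python
import csv
import io

STOPS_FIELDS = ["stop_id", "stop_name", "stop_lat", "stop_lon"]
STOP_TIMES_FIELDS = ["trip_id", "arrival_time", "departure_time", "stop_id", "stop_sequence"]


def gtfs_time(minutes_after_midnight: int) -> str:
    """Format as HH:MM:SS; GTFS allows hours past 23 for trips running after midnight."""
    if minutes_after_midnight < 0:
        raise ValueError("time must not be negative")
    hours, minutes = divmod(minutes_after_midnight, 60)
    return f"{hours:02d}:{minutes:02d}:00"


def to_csv(rows: list[dict], fieldnames: list[str]) -> str:
    """Serialise rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_gtfs(stops: dict, trips: dict) -> dict[str, str]:
    """stops: {stop_id: (name, lat, lon)}; trips: {trip_id: [(stop_id, minutes), ...]}.

    Arrival and departure are assumed equal because the toy input has one time per call.
    """
    stop_rows = [
        {"stop_id": sid, "stop_name": name, "stop_lat": lat, "stop_lon": lon}
        for sid, (name, lat, lon) in stops.items()
    ]
    time_rows = []
    for trip_id, calls in trips.items():
        for seq, (stop_id, minutes) in enumerate(calls, start=1):
            t = gtfs_time(minutes)
            time_rows.append({"trip_id": trip_id, "arrival_time": t, "departure_time": t,
                              "stop_id": stop_id, "stop_sequence": seq})
    return {"stops.txt": to_csv(stop_rows, STOPS_FIELDS),
            "stop_times.txt": to_csv(time_rows, STOP_TIMES_FIELDS)}


if __name__ == "__main__":
    feed = to_gtfs(
        {"S1": ("Central Station", 52.0, 8.0), "S2": ("Market Square", 52.01, 8.02)},
        {"T1": [("S1", 23 * 60 + 50), ("S2", 24 * 60 + 5)]},  # a late trip past midnight
    )
    print(feed["stop_times.txt"])
```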

Every citizen a scientist? An EU project tries to change the face of research


Project News from the European Commission: “SOCIENTIZE builds on the concept of ‘Citizen Science’, in which thousands of volunteers, teachers, researchers and developers pool their skills, time and resources to advance scientific research. Thanks to open source tools developed under the project, participants can help scientists collect data – which will then be analysed by professional researchers – or even perform tasks that require human cognition or intelligence like image classification or analysis.

Every citizen can be a scientist
The project helps usher in new advances in everything from astronomy to social science.
‘One breakthrough is our increased capacity to reproduce, analyse and understand complex issues thanks to the engagement of large groups of volunteers,’ says Mr Fermin Serrano Sanz, researcher at the University of Zaragoza and Project Coordinator of SOCIENTIZE. ‘And everyone can be a neuron in our digitally-enabled brain.’
But how can ordinary citizens help with such extraordinary science? The key, says Mr Serrano Sanz, is in harnessing the efforts of thousands of volunteers to collect and classify data. ‘We are already gathering huge amounts of user-generated data from the participants using their mobile phones and surrounding knowledge,’ he says.
For example, the experiment ‘SavingEnergy@Home’ asks users to submit data about the temperatures in their homes and neighbourhoods in order to build up a clearer picture of temperatures in cities across the EU, while in Spain, GripeNet.es asks citizens to report when they catch the flu in order to monitor outbreaks and predict possible epidemics.
Many Hands Make Light Work
But citizens can also help analyse data. Even the most advanced computers are not very good at recognising things like sun spots or cells, whereas people can tell the difference between living and dying cells very easily after only brief training.
The SOCIENTIZE projects ‘Sun4All’ and ‘Cell Spotting’ ask volunteers to label images of solar activity and cancer cells from an application on their phone or computer. With Cell Spotting, for instance, participants can observe cell cultures being studied with a microscope in order to determine their state and the effectiveness of medicines. Analysing this data would take years and cost hundreds of thousands of euros if left to a small team of scientists – but with thousands of volunteers helping the effort, researchers can make important breakthroughs quickly and more cheaply than ever before.
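Any single volunteer’s label can be wrong, so a common approach in citizen-science projects is to show the same image to several people and combine their answers. Below is a minimal majority-vote sketch. The text does not say whether Sun4All or Cell Spotting aggregate labels this way, and the thresholds and label names are assumptions.

```python
from collections import Counter
from typing import Optional


def consensus_label(labels: list[str], min_votes: int = 3,
                    min_agreement: float = 0.6) -> Optional[str]:
    """Return the most common label if enough volunteers agree on it.

    Returns None when there are too few votes or no label reaches the
    agreement threshold, signalling that the image needs more reviewers.
    """
    if len(labels) < min_votes:
        return None
    label, count = Counter(labels).most_common(1)[0]
    return label if count / len(labels) >= min_agreement else None


if __name__ == "__main__":
    print(consensus_label(["living", "living", "dying", "living"]))  # living
    print(consensus_label(["living", "dying", "living", "dying"]))   # None
```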
But in addition to bringing citizens closer to science, SOCIENTIZE also brings science closer to citizens. On 12-14 June, the project participated in the SONAR festival with ‘A Collective Music Experiment’ (CME). ‘Two hundred people joined professional DJs and created musical patterns using a web tool; participants shared their creations and re-used other parts in real time. The activity in the festival also included a live show of RdeRumba and Mercadal playing the amateurs’ rhythms,’ Mr Serrano Sanz explains.
The experiment – which will be presented in a mini-documentary to raise awareness about citizen science – is expected to help understand other innovation processes observed in emergent social, technological, economic or political transformations. ‘This kind of event brings together a really diverse set of participants. The diversity does not only enrich the data; it improves the dialogue between professionals and volunteers. As a result, we see some new and innovative approaches to research.’
The EUR 0.7 million project brings together 6 partners from 4 countries: Spain (University of Zaragoza and TECNARA), Portugal (Museu da Ciência-Coimbra, MUSC; Universidade de Coimbra), Austria (Zentrum für Soziale Innovation) and Brazil (Universidade Federal de Campina Grande, UFCG).
SOCIENTIZE will end in October 2014 after bringing together 12,000 citizens in different phases of research activities over 24 months.”