Google's fact-checking bots build vast knowledge bank


Hal Hodson in the New Scientist: “The search giant is automatically building Knowledge Vault, a massive database that could give us unprecedented access to the world’s facts

GOOGLE is building the largest store of knowledge in human history – and it’s doing so without any human help. Instead, Knowledge Vault autonomously gathers and merges information from across the web into a single base of facts about the world, and the people and objects in it.

The breadth and accuracy of this gathered knowledge is already becoming the foundation of systems that allow robots and smartphones to understand what people ask them. It promises to let Google answer questions like an oracle rather than a search engine, and even to turn a new lens on human history.

Knowledge Vault is a type of “knowledge base” – a system that stores information so that machines as well as people can read it. Where a database deals with numbers, a knowledge base deals with facts. When you type “Where was Madonna born” into Google, for example, the place given is pulled from Google’s existing knowledge base.
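
To make the database/knowledge-base distinction concrete, here is a minimal Python sketch of facts stored as subject-predicate-object triples; the schema and the lookup are invented for illustration and are not Google's actual representation.

```python
# Toy knowledge base: facts stored as (subject, predicate) -> object triples.
FACTS = {
    ("Madonna", "born_in"): "Bay City, Michigan",
    ("Madonna", "occupation"): "singer",
    ("Bay City, Michigan", "located_in"): "United States",
}

def answer(subject: str, predicate: str) -> str:
    # A parsed question like "Where was Madonna born?" reduces to a
    # (subject, predicate) pair, and the answer is a direct lookup.
    return FACTS.get((subject, predicate), "unknown")

print(answer("Madonna", "born_in"))  # -> Bay City, Michigan
```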

This existing base, called Knowledge Graph, relies on crowdsourcing to expand its information. But the firm noticed that growth was stalling; humans could only take it so far. So Google decided it needed to automate the process. It started building the Vault by using an algorithm to automatically pull in information from all over the web, using machine learning to turn the raw data into usable pieces of knowledge.

Knowledge Vault has pulled in 1.6 billion facts to date. Of these, 271 million are rated as “confident facts”, to which Google’s model ascribes a more than 90 per cent chance of being true. It does this by cross-referencing new facts with what it already knows.
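
The article does not describe Google's scoring model, but the idea of cross-referencing a new fact against existing knowledge can be sketched roughly as follows; only the 90 per cent threshold comes from the article, and the fusion rule and weights are invented for illustration.

```python
def fact_confidence(extractor_scores, prior_agreement):
    """Illustrative fusion of two signals:
    extractor_scores: confidences of the web extractors that found the fact;
    prior_agreement: 0..1 measure of how well the fact fits existing knowledge.
    """
    # Probability that at least one extractor is correct,
    # treating the extractors as independent.
    p_missed = 1.0
    for s in extractor_scores:
        p_missed *= 1.0 - s
    p_extracted = 1.0 - p_missed
    # Blend with agreement against the existing knowledge base (weights invented).
    return 0.6 * p_extracted + 0.4 * prior_agreement

CONFIDENT = 0.90  # the article's cut-off for a "confident fact"

p = fact_confidence([0.8, 0.6], prior_agreement=0.95)
print(round(p, 3), p > CONFIDENT)  # -> 0.932 True
```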

“It’s a hugely impressive thing that they are pulling off,” says Fabian Suchanek, a data scientist at Télécom ParisTech in France.

Google’s Knowledge Graph is currently bigger than the Knowledge Vault, but it only includes manually integrated sources such as the CIA Factbook.

Knowledge Vault offers Google fast, automatic expansion of its knowledge – and it’s only going to get bigger. As well as analysing text on a webpage for facts to feed its knowledge base, Google can also peer under the surface of the web, hunting for hidden sources of data such as the figures that feed Amazon product pages.

Tom Austin, a technology analyst at Gartner in Boston, says that the world’s biggest technology companies are racing to build similar vaults. “Google, Microsoft, Facebook, Amazon and IBM are all building them, and they’re tackling these enormous problems that we would never even have thought of trying 10 years ago,” he says.

The potential of a machine system that has the whole of human knowledge at its fingertips is huge. One of the first applications will be virtual personal assistants that go way beyond what Siri and Google Now are capable of, says Austin…”

Cell-Phone Data Might Help Predict Ebola’s Spread


David Talbot at MIT Technology Review: “A West African mobile carrier has given researchers access to data gleaned from cell phones in Senegal, providing a window into regional population movements that could help predict the spread of Ebola. The current outbreak is so far known to have killed at least 1,350 people, mainly in Liberia, Guinea, and Sierra Leone.
The model created using the data is not meant to lead to travel restrictions, but rather to offer clues about where to focus preventive measures and health care. Indeed, efforts to restrict people’s movements, such as Senegal’s decision to close its border with Guinea this week, remain extremely controversial.
Orange Telecom made “an exceptional authorization in support of Ebola control efforts,” according to Flowminder, the Swedish nonprofit that analyzed the data. “If there are outbreaks in other countries, this might tell what places connected to the outbreak location might be at increased risk of new outbreaks,” says Linus Bengtsson, a medical doctor and cofounder of Flowminder, which builds models of population movements using cell-phone data and other sources.
The data from Senegal was gathered in 2013 from 150,000 phones before being anonymized and aggregated. This information had already been given to a number of researchers as part of a data analysis challenge planned for 2015, and the carrier chose to authorize its release to Flowminder as well, to help address the Ebola crisis.
The new model helped Flowminder build a picture of the overall travel patterns of people across West Africa. In addition to using data from Senegal, researchers used an earlier data set from Ivory Coast, which Orange had released two years ago as part of a similar conference (see “Released: A Trove of Data-Mining Research from Phones” and “African Bus Routes Redrawn Using Cell-Phone Data”). The model also includes data about population movements from more conventional sources, including surveys.
Separately, Flowminder has produced an animation of the epidemic’s spread since March, based on records of when and where people died of the disease….”
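
Flowminder's actual pipeline is not described in the article, but the core aggregation step can be sketched as follows; the record fields and the data are hypothetical, showing how anonymized call-detail records reduce to origin-destination flows between regions.

```python
from collections import Counter

# Hypothetical anonymized call-detail records: (subscriber_id, day, tower region),
# already stripped of names and phone numbers.
records = [
    ("a1", 1, "Dakar"), ("a1", 2, "Thies"),
    ("b2", 1, "Dakar"), ("b2", 2, "Dakar"),
    ("c3", 1, "Thies"), ("c3", 2, "Saint-Louis"),
]

def od_flows(records):
    """Count region-to-region moves per subscriber between consecutive sightings."""
    by_user = {}
    for user, day, region in records:
        by_user.setdefault(user, []).append((day, region))
    flows = Counter()
    for sightings in by_user.values():
        sightings.sort()
        for (_, a), (_, b) in zip(sightings, sightings[1:]):
            if a != b:
                flows[(a, b)] += 1
    return flows

print(od_flows(records))  # -> Counter({('Dakar', 'Thies'): 1, ('Thies', 'Saint-Louis'): 1})
```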

Our future government will work more like Amazon


Michael Case in The Verge: “There is a lot of government in the United States. Several hundred federal agencies, 535 voting members in two houses of Congress, more than 90,000 state and local governments, and over 20 million Americans involved in public service.

We say we have a government for and by the people. But the way American government conducts its day-to-day business does not feel like anything we, the people weaned on the internet, would design in 2014. Most interactions with the US government don’t resemble anything else we’re used to in our daily lives….

But if the government is ever going to completely retool itself to provide sensible services to a growing, aging, diversifying American population, it will have to do more than bring in a couple of innovators and throw data at the public. At the federal level, these kinds of adjustments will require new laws to change the way money is allocated to executive branch agencies so they can coordinate the purchase and development of a standard set of tools. State and local governments will have to agree on standard tools and data formats as well so that the mayor of Anchorage can collaborate with the governor of Delaware.

Technology is the answer to a lot of American government’s current operational shortcomings. Not only are the tools and systems most public servants use outdated and suboptimal, but the organizations and processes themselves have also calcified around similarly out-of-date thinking. So the real challenge won’t be designing cutting-edge software or high-tech government facilities — it’s going to be conjuring the will to overcome decades of old thinking. It’s going to be convincing the employees of more than 90,000 governments to learn new skills, coaxing a bitterly divided Congress to collaborate on something scary, and finding a way to convince a timid and distracted White House to put its name on risky investments that won’t show benefits for many years.

But! If we can figure out a way for governments across the country to perform their basic functions and provide often life-saving services, maybe we can move on to chase even more elusive government tech unicorns. Imagine voting from your smartphone, having your taxes calculated and filed automatically with a few online confirmations, or filing for your retirement at a friendly tablet kiosk at your local government outpost. Government could — feasibly — be not only more effective, but also a pleasure to interact with someday. Someday.”

Big Data: Google Searches Predict Unemployment in Finland


Paper by Joonas Tuhkuri: “There are over 3 billion Google searches globally every day. This report examines whether Google search queries can be used to predict the present and the near-future unemployment rate in Finland. Predicting the present and the near future is of interest because official records of the state of the economy are published with a delay. To assess the information contained in Google search queries, the report compares a simple predictive model of unemployment to a model that adds a variable, the Google Index, formed from Google data. In addition, cross-correlation analysis and Granger-causality tests are performed. Compared to a simple benchmark, Google search queries improve prediction of the present by 10%, measured by mean absolute error. Moreover, predictions using search terms perform 39% better than the benchmark for near-future unemployment three months ahead. Google search queries also tend to improve prediction accuracy around turning points. The results suggest that Google searches contain useful information about the present and the near-future unemployment rate in Finland.”
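
As a sketch of the comparison the abstract describes (the data, the model form, and the construction of the Google Index below are all invented for illustration; the paper's exact specification may differ), a simple autoregressive benchmark can be fit against an augmented model and the two compared by mean absolute error:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical monthly series: unemployment rate and a Google search index.
T = 120
google = rng.normal(size=T)
unemp = np.zeros(T)
for t in range(1, T):
    unemp[t] = 0.8 * unemp[t - 1] + 0.5 * google[t] + rng.normal(scale=0.3)

def fit_predict(X_train, y_train, X_test):
    """Ordinary least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    beta, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_test)), X_test]) @ beta

split = 90
y_lag = unemp[:-1]   # benchmark regressor: last month's rate
y_now = unemp[1:]    # target: this month's rate
g_now = google[1:]   # the Google Index is available in (near) real time

bench = fit_predict(y_lag[:split, None], y_now[:split], y_lag[split:, None])
X = np.column_stack([y_lag, g_now])
aug = fit_predict(X[:split], y_now[:split], X[split:])

def mae(pred):
    return np.mean(np.abs(pred - y_now[split:]))

print(f"benchmark MAE: {mae(bench):.3f}, with Google Index: {mae(aug):.3f}")
```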

Crowd-Sourced, Gamified Solutions to Geopolitical Issues


Gamification Corp: “Daniel Green, co-founder and CTO of Wikistrat, spoke at GSummit 2014 on an intriguing topic: How Gamification Motivates All Age Groups: Or How to Get Retired Generals to Play Games Alongside Students and Interns.

Wikistrat, a crowdsourced consulting company, leverages a worldwide network of experts from various industries to solve some of the world’s geopolitical problems through the power of gamification. The company also treats fun, training, mentorship, and networking as core concepts.

Dan (@wsdan) spoke with TechnologyAdvice host Clark Buckner about Wikistrat’s work, origins, what clients can expect from working with Wikistrat, and how gamification correlates with big data and business intelligence. Listen to the podcast and read the summary below:

Wikistrat aims to solve a common problem that governments and organizations face when generating strategies: “groupthink.” Such entities can devise a diverse set of strategies, but they tend to settle on the most popular answer.

To break groupthink, Wikistrat carries out geopolitical simulations built around “collaborative competition.” The process involves:

  • Securing analysts: Wikistrat recruits a diverse group of analysts who are experts in certain fields and located in different strategic places.

  • Competing with ideas: These analysts are placed in an online environment where, rather than competing head-to-head, one analyst contributes an idea and other analysts then create two or three more ideas that build on it.

  • Breaking groupthink: The competition now becomes purely about ideas. People champion the ideas they care about rather than arguing with other analysts. That’s how Wikistrat breaks groupthink and helps its clients discover ideas they may never have considered before.

Gamification occurs when analysts create different scenarios for a specific angle or question the client raises. Wikistrat’s global analyst coverage is broad enough that the company touts having at least one expert in every country. It accomplished this by allowing anyone—not just four-star generals—to register as an analyst. However, applicants must submit a resume and a writing sample, as well as pass a face-to-face interview….”

Beyond just politics: A systematic literature review of online participation


Paper by Christoph Lutz, Christian Pieter Hoffmann, and Miriam Meckel in First Monday: “This paper presents a systematic literature review of the current state of research on online participation. The review draws on four databases and is guided by the application of six topical search terms. The analysis strives to differentiate distinct forms of online participation and to identify salient discourses within each research field. We find that research on online participation is highly segregated into specific sub-discourses that reflect disciplinary boundaries. Research on online political participation and civic engagement is identified as the most prominent and extensive research field. Yet research on other forms of participation, such as cultural, business, education and health participation, provides distinct perspectives and valuable insights. We outline both field-specific and common findings and derive propositions for future research.”

America in Decay


Francis Fukuyama in Foreign Affairs: “…Institutions are “stable, valued, recurring patterns of behaviour”, as Huntington put it, the most important function of which is to facilitate collective action. Without some set of clear and relatively stable rules, human beings would have to renegotiate their interactions at every turn. Such rules are often culturally determined and vary across different societies and eras, but the capacity to create and adhere to them is genetically hard-wired into the human brain. A natural tendency to conformism helps give institutions inertia and is what has allowed human societies to achieve levels of social cooperation unmatched by any other animal species.
The very stability of institutions, however, is also the source of political decay. Institutions are created to meet the demands of specific circumstances, but then circumstances change and institutions fail to adapt. One reason is cognitive: people develop mental models of how the world works and tend to stick to them, even in the face of contradictory evidence. Another reason is group interest: institutions create favored classes of insiders who develop a stake in the status quo and resist pressures to reform.
In theory, democracy, and particularly the Madisonian version of democracy that was enshrined in the US Constitution, should mitigate the problem of such insider capture by preventing the emergence of a dominant faction or elite that can use its political power to tyrannize over the country. It does so by spreading power among a series of competing branches of government and allowing for competition among different interests across a large and diverse country.
But Madisonian democracy frequently fails to perform as advertised. Elite insiders typically have superior access to power and information, which they use to protect their interests. Ordinary voters will not get angry at a corrupt politician if they don’t know that money is being stolen in the first place. Cognitive rigidities or beliefs may also prevent social groups from mobilizing in their own interests. For example, in the United States, many working-class voters support candidates promising to lower taxes on the wealthy, despite the fact that such tax cuts will arguably deprive them of important government services.
Furthermore, different groups have different abilities to organize to defend their interests. Sugar producers and corn growers are geographically concentrated and focused on the prices of their products, unlike ordinary consumers or taxpayers, who are dispersed and for whom the prices of these commodities are only a small part of their budgets. Given institutional rules that often favor special interests (such as the fact that Florida and Iowa, where sugar and corn are grown, are electoral swing states), those groups develop an outsized influence over agricultural and trade policy. Similarly, middle-class groups are usually much more willing and able to defend their interests, such as the preservation of the home mortgage tax deduction, than are the poor. This makes such universal entitlements as Social Security or health insurance much easier to defend politically than programs targeting the poor only.
Finally, liberal democracy is almost universally associated with market economies, which tend to produce winners and losers and amplify what James Madison termed the “different and unequal faculties of acquiring property.” This type of economic inequality is not in itself a bad thing, insofar as it stimulates innovation and growth and occurs under conditions of equal access to the economic system. It becomes highly problematic, however, when the economic winners seek to convert their wealth into unequal political influence. They can do so by bribing a legislator or a bureaucrat, that is, on a transactional basis, or, what is more damaging, by changing the institutional rules to favor themselves — for example, by closing off competition in markets they already dominate, tilting the playing field ever more steeply in their favor.
Political decay thus occurs when institutions fail to adapt to changing external circumstances, either out of intellectual rigidities or because of the power of incumbent elites to protect their positions and block change. Decay can afflict any type of political system, authoritarian or democratic. And while democratic political systems theoretically have self-correcting mechanisms that allow them to reform, they also open themselves up to decay by legitimating the activities of powerful interest groups that can block needed change.
This is precisely what has been happening in the United States in recent decades, as many of its political institutions have become increasingly dysfunctional. A combination of intellectual rigidity and the power of entrenched political actors is preventing those institutions from being reformed. And there is no guarantee that the situation will change much without a major shock to the political order….”

Out in the Open: This Man Wants to Turn Data Into Free Food (And So Much More)


In Wired: “Let’s say your city releases a list of all trees planted on its public property. It would be a godsend—at least in theory. You could filter the data into a list of all the fruit and nut trees in the city, transfer it into an online database, and create a smartphone app that helps anyone find free food.

Such is the promise of “open data”: the massive troves of public information our governments now post to the net. The hope is that, if governments share enough of this data with the world at large, hackers and entrepreneurs will find a way of putting it to good use. But although so much of this government data is now available, the revolution hasn’t exactly happened.
In far too many cases, the data just sits there on a computer server, unseen and unused. Sometimes, no one knows about the data, or no one knows what to do with it. Other times, the data is just too hard to work with. If you’re building that free food app, how do you update your database when the government releases a new version of the spreadsheet? And if you let people report corrections to the data, how do you contribute that data back to the city?
These are the sorts of problems that obsess 25-year-old software developer Max Ogden, and they’re the reason he built Dat, a new piece of open source software that seeks to restart the open data revolution. Basically, Dat is a way of synchronizing data between two or more sources, tracking any changes to that data, and handling transformations from one data format to another. The aim is a simple one: Ogden wants to make it easier for governments to share their data with a world of software developers.
That’s just the sort of thing that government agencies are looking for, says Waldo Jaquith, the director of the US Open Data Institute, the non-profit that is now hosting Dat…
Git is a piece of software originally written by Linux creator Linus Torvalds. It keeps track of code changes and makes it easier to integrate code submissions from outside developers. Ogden realized what developers needed wasn’t a GitHub for data, but a Git for data. And that’s what Dat is.
Instead of CouchDB, Dat relies on a lightweight, open-source data storage system from Google called LevelDB. The rest of the software was written in JavaScript by Ogden and a growing number of collaborators, which lets them keep things minimal and easily run the software on multiple operating systems, including Windows, Linux, and Mac OS X….”
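
Dat itself is Node.js software, so the following Python sketch shows only the underlying “Git for data” idea: a store that logs every change and ships just the unseen changes to a replica. None of the names or methods here belong to Dat's real API.

```python
class ChangeLogStore:
    """Toy versioned key-value store: every write is appended to a change log,
    and a replica catches up by replaying only the changes it hasn't seen."""

    def __init__(self):
        self.data = {}
        self.log = []  # list of (sequence_number, key, value)

    def put(self, key, value):
        self.log.append((len(self.log), key, value))
        self.data[key] = value

    def changes_since(self, seq):
        # The diff to ship to a replica, instead of re-sending the whole dataset.
        return self.log[seq:]

    def apply(self, changes):
        for seq, key, value in changes:
            self.log.append((seq, key, value))
            self.data[key] = value

city = ChangeLogStore()
city.put("tree/41", {"species": "apple", "lat": 45.52})
city.put("tree/42", {"species": "walnut", "lat": 45.53})

app = ChangeLogStore()                    # the free-food app's copy
app.apply(city.changes_since(0))          # initial full sync
city.put("tree/41", {"species": "apple", "lat": 45.521})  # a correction upstream
app.apply(city.changes_since(len(app.log)))               # sync only the new change

print(app.data["tree/41"])  # the replica now holds the corrected record
```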