A computational algorithm for fact-checking


Kurzweil News: “Computers can now do fact-checking for any body of knowledge, according to Indiana University network scientists, writing in an open-access paper published June 17 in PLoS ONE.

Using factual information from summary infoboxes from Wikipedia as a source, they built a “knowledge graph” with 3 million concepts and 23 million links between them. A link between two concepts in the graph can be read as a simple factual statement, such as “Socrates is a person” or “Paris is the capital of France.”

In the first use of this method, IU scientists created a simple computational fact-checker that assigns “truth scores” to statements concerning history, geography and entertainment, as well as random statements drawn from the text of Wikipedia. In multiple experiments, the automated system consistently matched the assessment of human fact-checkers in terms of the humans’ certitude about the accuracy of these statements.

Dealing with misinformation and disinformation

In what the IU scientists describe as an “automatic game of trivia,” the team applied their algorithm to answer simple questions related to geography, history, and entertainment, including statements that matched states or nations with their capitals, presidents with their spouses, and Oscar-winning film directors with the movie for which they won the Best Picture awards. The majority of tests returned highly accurate truth scores.

Lastly, the scientists used the algorithm to fact-check excerpts from the main text of Wikipedia, which were previously labeled by human fact-checkers as true or false, and found a positive correlation between the truth scores produced by the algorithm and the answers provided by the fact-checkers.

Significantly, the IU team found their computational method could even assess the truthfulness of statements about information not directly contained in the infoboxes. For example, it could assess the statement that Steve Tesich, the Serbian-American screenwriter of the classic Hoosier film “Breaking Away,” graduated from IU, even though that information is not specifically addressed in the infobox about him.

Using multiple sources to improve accuracy and richness of data

“The measurement of the truthfulness of statements appears to rely strongly on indirect connections, or ‘paths,’ between concepts,” said Giovanni Luca Ciampaglia, a postdoctoral fellow at the Center for Complex Networks and Systems Research in the IU Bloomington School of Informatics and Computing, who led the study….

“These results are encouraging and exciting. We live in an age of information overload, including abundant misinformation, unsubstantiated rumors and conspiracy theories whose volume threatens to overwhelm journalists and the public. Our experiments point to methods to abstract the vital and complex human task of fact-checking into a network analysis problem, which is easy to solve computationally.”
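The path-based idea described above can be illustrated with a small sketch. It is not the IU team's implementation: the scoring rule (one divided by one plus the summed log-degree of the intermediate concepts on the cheapest path) is an assumption chosen to penalize paths through very generic concepts, and the function names `build_graph` and `truth_score` and the toy links are hypothetical.

```python
import heapq
import math
from collections import defaultdict


def build_graph(links):
    """Build an undirected adjacency map from (concept, concept) links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def truth_score(graph, source, target):
    """Score a statement that relates `source` to `target`.

    Assumed rule (illustrative only): 1 / (1 + sum of log(degree) over the
    intermediate nodes of the cheapest path). A direct link scores 1.0 and
    a pair with no connecting path scores 0.0.
    """
    if source not in graph or target not in graph:
        return 0.0
    if source == target:
        return 1.0
    # Dijkstra search: stepping onto an intermediate node costs log(degree);
    # stepping onto the target itself costs nothing.
    best = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            return 1.0 / (1.0 + cost)
        if cost > best.get(node, math.inf):
            continue  # stale heap entry
        for nxt in graph[node]:
            step = 0.0 if nxt == target else math.log(len(graph[nxt]))
            new_cost = cost + step
            if new_cost < best.get(nxt, math.inf):
                best[nxt] = new_cost
                heapq.heappush(heap, (new_cost, nxt))
    return 0.0


# Toy links for illustration only, not data from the study.
toy_links = [
    ("Socrates", "person"),
    ("Plato", "person"),
    ("Paris", "France"),
    ("Paris", "city"),
]
toy_graph = build_graph(toy_links)
direct = truth_score(toy_graph, "Paris", "France")     # linked directly
indirect = truth_score(toy_graph, "Socrates", "Plato")  # linked via "person"
unrelated = truth_score(toy_graph, "Socrates", "Paris")  # no path at all
```

In this toy graph a directly linked pair gets the maximum score, a pair connected only through a shared concept gets a lower but nonzero score, and a pair with no connecting path gets zero, which mirrors the idea that indirect paths carry weaker evidence than direct links.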

Expanding the knowledge base

Although the experiments were conducted using Wikipedia, the IU team’s method does not assume any particular source of knowledge. The scientists aim to conduct additional experiments using knowledge graphs built from other sources of human knowledge, such as Freebase, the open-knowledge base built by Google, and note that multiple information sources could be used together to account for different belief systems….(More)”

Harnessing the Crowd to Solve Healthcare


PSFK Labs: “While being sick is never a good situation to be in, the majority of people can still take solace in the fact that modern medicine will be able to diagnose their problem and get them on the path to a quick recovery. For a small percentage of patients, however, simply finding out what ails them can be a challenge. Despite countless visits to specialists and mounting costs, these individuals can struggle for years to find out any reliable information about their illness.

This is only exacerbated by the fact that in a heavily regulated industry like healthcare, words like “personalization,” “transparency” and “collaboration” are near impossibilities, leaving these patients locked into a system that can’t care for them. Enter CrowdMed, an online platform that uses the combined knowledge of its community to overcome these obstacles, getting people the answers and treatment they need.

…we spoke with Jared Heyman, the company’s founder, to understand how the crowd can deliver unprecedented efficiencies to a system sorely in need of them…. “CrowdMed harnesses the wisdom of crowds to solve the world’s most difficult medical cases online. Let’s say that you’ve been bouncing from doctor to doctor, but don’t yet have a definitive diagnosis or treatment plan. You can submit your case on our site by answering an in‑depth patient questionnaire, uploading relevant medical records, diagnostic test results or even medical images. We expose your case to our community of currently over 15,000 medical detectives. These are people mostly with medical backgrounds who enjoy solving these challenges.

We have about a 70 percent success rate, bringing patients closer to a direct diagnosis or cure and we do so in a very small fraction of the time and cost of what it would take through the traditional medical system….

Every entrepreneur builds upon the tools and technologies that preceded them. I think that CrowdMed needed the Internet. It needed Facebook. It needed Wikipedia. It needed Quora, and other companies or products that have proven that you can trust in the wisdom of the crowd. I think we’re built upon the shoulders of these other companies.

We looked at all these other companies that have proven the value of social networks through crowdsourcing, and that’s inspired us to do what we do. It’s been instructive for us in the best way to do it, and it’s also prepared society, psychologically and culturally, for what we’re doing. All these things were important….(More)”

When Guarding Student Data Endangers Valuable Research


Susan M. Dynarski in the New York Times: “There is widespread concern over threats to privacy posed by the extensive personal data collected by private companies and public agencies.

Some of the potential danger comes from the government: The National Security Agency has swept up the telephone records of millions of people, in what it describes as a search for terrorists. Other threats are posed by hackers, who have exploited security gaps to steal data from retail giants like Target and from the federal Office of Personnel Management.

Resistance to data collection was inevitable — and it has been particularly intense in education.

Privacy laws have already been strengthened in some states, and multiple bills now pending in state legislatures and in Congress would tighten the security and privacy of student data. Some of this proposed legislation is so broadly written, however, that it could unintentionally choke off the use of student data for its original purpose: assessing and improving education. This data has already exposed inequities, allowing researchers and advocates to pinpoint where poor, nonwhite and non-English-speaking children have been educated inadequately by their schools.

Data gathering in education is indeed extensive: Across the United States, large, comprehensive administrative data sets now track the academic progress of tens of millions of students. Educators parse this data to understand what is working in their schools. Advocates plumb the data to expose unfair disparities in test scores and graduation rates, building cases to target more resources for the poor. Researchers rely on this data when measuring the effectiveness of education interventions.

To my knowledge there has been no large-scale, Target-like theft of private student records, probably because students’ test scores don’t have the market value of consumers’ credit card numbers. Parents’ concerns have mainly centered not on theft, but on the sharing of student data with third parties, including education technology companies. Last year, parents resisted efforts by the tech start-up InBloom to draw data on millions of students into the cloud and return it to schools as teacher-friendly “data dashboards.” Parents were deeply uncomfortable with a third party receiving and analyzing data about their children.

In response to such concerns, some pending legislation would scale back the authority of schools, districts and states to share student data with third parties, including researchers. Perhaps the most stringent of these proposals, sponsored by Senator David Vitter, a Louisiana Republican, would effectively end the analysis of student data by outside social scientists. This legislation would have banned recent prominent research documenting the benefits of smaller classes, the value of excellent teachers and the varied performance of charter schools.

Under current law, education agencies can share data with outside researchers only to benefit students and improve education. Collaborations with researchers allow districts and states to tap specialized expertise that they otherwise couldn’t afford. The Boston public school district, for example, has teamed up with early-childhood experts at Harvard to plan and evaluate its universal prekindergarten program.

In one of the longest-standing research partnerships, the University of Chicago works with the Chicago Public Schools to improve education. Partnerships like Chicago’s exist across the nation, funded by foundations and the United States Department of Education. In one initiative, a Chicago research consortium compiled reports showing high school principals that many of the seniors they had sent off to college swiftly dropped out without earning a degree. This information spurred efforts to improve high school counseling and college placement.

Specific, tailored information in the hands of teachers, principals or superintendents empowers them to do better by their students. No national survey could have told Chicago’s principals how their students were doing in college. Administrative data can provide this information, cheaply and accurately…(More)”

‘Beating the news’ with EMBERS: Forecasting Civil Unrest using Open Source Indicators


Paper by Naren Ramakrishnan et al: “We describe the design, implementation, and evaluation of EMBERS, an automated, 24×7 continuous system for forecasting civil unrest across 10 countries of Latin America using open source indicators such as tweets, news sources, blogs, economic indicators, and other data sources. Unlike retrospective studies, EMBERS has been making forecasts into the future since Nov 2012 which have been (and continue to be) evaluated by an independent T&E team (MITRE). Of note, EMBERS has successfully forecast the uptick and downtick of incidents during the June 2013 protests in Brazil. We outline the system architecture of EMBERS, individual models that leverage specific data sources, and a fusion and suppression engine that supports trading off specific evaluation criteria. EMBERS also provides an audit trail interface that enables the investigation of why specific predictions were made along with the data utilized for forecasting. Through numerous evaluations, we demonstrate the superiority of EMBERS over baserate methods and its capability to forecast significant societal happenings….(More)”

Big Data’s Impact on Public Transportation


InnovationEnterprise: “Getting around any big city can be a real pain. Traffic jams seem to be a constant complaint, and simply getting to work can turn into a chore, even on the best of days. With more people than ever before flocking to the world’s major metropolitan areas, the issues of crowding and inefficient transportation only stand to get much worse. Luckily, the traditional methods of managing public transportation could be on the verge of changing thanks to advances in big data. While big data use cases have been a part of the business world for years now, city planners and transportation experts are quickly realizing how valuable it can be when making improvements to city transportation. That hour-long commute may no longer be something travelers will have to worry about in the future.

In much the same way that big data has transformed businesses around the world by offering greater insight into the behavior of their customers, it can also provide a deeper look at travelers. Like retail customers, commuters have certain patterns they like to keep to when on the road or riding the rails. Travelers also have their own motivations and desires, and getting to the heart of their actions is all part of what big data analytics is about. By analyzing these actions and the factors that go into them, transportation experts can gain a better understanding of why people choose certain routes or why they prefer one method of transportation over another. Based on these findings, planners can then figure out where to focus their efforts and respond to the needs of millions of commuters.

Gathering the accurate data needed to make knowledgeable decisions regarding city transportation can be a challenge in itself, especially considering how many people commute to work in a major city. New methods of data collection have made that effort easier and a lot less costly. One way that’s been implemented is through the gathering of call data records (CDR). From regular transactions made from mobile devices, information about location, time, and duration of an action (like a phone call) can give data scientists the necessary details on where people are traveling to, how long it takes them to get to their destination, and other useful statistics. The valuable part of this data is the sample size, which provides a much bigger picture of the transportation patterns of travellers.
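As a rough illustration of how call data records might be turned into travel statistics, the sketch below groups records by user, orders them in time, and treats each change of location between consecutive records as a trip. The record layout `(user_id, timestamp, cell_id)` and the function names are assumptions made for this example, not a real operator format, and the elapsed time between two records is only an upper bound on the actual travel time.

```python
from collections import defaultdict
from datetime import datetime


def trips_from_cdr(records):
    """Derive trips from (user_id, iso_timestamp, cell_id) tuples.

    Returns a list of (user_id, origin_cell, destination_cell, minutes),
    one entry per change of cell between consecutive records of a user.
    """
    by_user = defaultdict(list)
    for user, timestamp, cell in records:
        by_user[user].append((datetime.fromisoformat(timestamp), cell))

    trips = []
    for user, events in by_user.items():
        events.sort()  # chronological order per user
        for (t0, c0), (t1, c1) in zip(events, events[1:]):
            if c0 != c1:  # the user was seen at a new location
                minutes = (t1 - t0).total_seconds() / 60
                trips.append((user, c0, c1, minutes))
    return trips


def mean_minutes_by_route(trips):
    """Average elapsed minutes for each (origin, destination) pair."""
    totals = defaultdict(lambda: [0.0, 0])
    for _, origin, destination, minutes in trips:
        totals[(origin, destination)][0] += minutes
        totals[(origin, destination)][1] += 1
    return {route: total / count for route, (total, count) in totals.items()}


# Made-up records for illustration only.
sample_records = [
    ("u1", "2015-06-01T08:00:00", "A"),
    ("u1", "2015-06-01T08:30:00", "B"),
    ("u1", "2015-06-01T08:45:00", "B"),
    ("u2", "2015-06-01T09:00:00", "A"),
    ("u2", "2015-06-01T09:50:00", "B"),
]
sample_trips = trips_from_cdr(sample_records)
route_means = mean_minutes_by_route(sample_trips)
```

With millions of such records, the same aggregation would give planners per-route volumes and typical durations; a real pipeline would also need anonymization and a rule for filtering out long stays that are not travel.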

That’s not the only way cities are using big data to improve public transportation though. Melbourne in Australia has long been considered one of the world’s best cities for public transit, and much of that is thanks to big data. With big data and ad hoc analysis, Melbourne’s acclaimed tram system can automatically reconfigure routes in response to sudden problems or challenges, such as a major city event or natural disaster. Data is also used in this system to fix problems before they turn serious. Sensors located in equipment like tram cars and tracks can detect when maintenance is needed on a specific part. Crews are quickly dispatched to repair what needs fixing, and the tram system continues to run smoothly. This is similar to the idea of the Internet of Things, wherein embedded sensors collect data that is then analyzed to identify problems and improve efficiency.

Sao Paulo, Brazil is another city that sees the value of using big data for its public transportation. The city’s efforts concentrate on improving the management of its bus fleet. With big data collected in real time, the city can get a more accurate picture of just how many people are riding the buses, which routes are on time, how drivers respond to changing conditions, and many other factors. Based on this information, Sao Paulo can optimize its operations, providing added vehicles where demand is genuine whilst finding which routes are the most efficient. Without big data analytics, this process would have taken a very long time and would likely be hit-or-miss in terms of accuracy, but now, big data provides more certainty in a shorter amount of time….(More)”

Civic open data at a crossroads: Dominant models and current challenges


Renee E. Sieber and Peter A. Johnson in Government Information Quarterly: “As open data becomes more widely provided by government, it is important to ask questions about the future possibilities and forms that government open data may take. We present four models of open data as they relate to changing relations between citizens and government. These models include: a status quo ‘data over the wall’ form of government data publishing, a form of ‘code exchange’, with government acting as an open data activist, open data as a civic issue tracker, and participatory open data. These models represent multiple end points that can be currently viewed from the unfolding landscape of government open data. We position open data at a crossroads, with significant concerns of the conflicting motivations driving open data, the shifting role of government as a service provider, and the fragile nature of open data within the government space. We emphasize that the future of open data will be driven by the negotiation of the ethical-economic tension that exists between provisioning governments, citizens, and private sector data users….(More)”

The Climatologist’s Almanac


Clara Chaisson at onEarth: “Forget your weather app with its five- or even ten-day forecasts—a supercomputer at NASA has just provided us with high-resolution climate projections through the end of the century. The massive new 11-terabyte data set combines historical daily temperatures and precipitation measurements with climate simulations under two greenhouse gas emissions scenarios. The project spans from 1950 to 2100, but users can easily zero in on daily timescales for their own locales—which is precisely the point.

The projections can be found on Amazon for free for all to see and plan by. The space agency hopes that developing nations and poorer communities that may not have any spare supercomputers lying around will use the info to predict and prepare for climate change. …(More)”

Why open data should be central to Fifa reform


Gavin Starks in The Guardian: “Over the past two weeks, Fifa has faced mounting pressure to radically improve its transparency and governance in the wake of corruption allegations. David Cameron has called for reforms including expanding the use of open data.

Open data is information made available by governments, businesses and other groups for anyone to read, use and share. Data.gov.uk was launched as the home of UK open government data in January 2010 and now has almost 21,000 published datasets, including on government spending.

Allowing citizens to freely access data related to the institutions that govern them is essential to a well-functioning democratic society. It is the first step towards holding leaders to account for failures and wrongdoing.

Fifa has a responsibility for the shared interests of millions of fans around the world. Football’s popularity means that Fifa’s governance has wide-ranging implications for society, too. This is particularly true of decisions about hosting the World Cup, which is often tied to large-scale government investment in infrastructure and even extends to law-making. Brazil spent up to £10bn hosting the 2014 World Cup and had to legalise the sale of beer at matches.

Following Sepp Blatter’s resignation, Fifa will gather its executive committee in July to plan for a presidential election, expected to take place in mid-December. Open data should form the cornerstone of any prospective candidate’s manifesto. It can help Fifa make better spending decisions, ensure partners deliver value for money, and restore the trust of the international football community.

Fifa’s lengthy annual financial report gives summaries of financial expenditure, budgeted at £184m for operations and governance alone in 2016, but individual transactions are not published. Publishing spending data incentivises better spending decisions. If all Fifa’s outgoings – which totalled around £3.5bn between 2011 and 2014 – were made open, it would encourage much more efficiency….(More)”

Exploring Open Energy Data in Urban Areas


The World Bank: “…Energy efficiency – using less energy input to deliver the same level of service – has been described by many as the ‘first fuel’ of our societies. However, lack of adequate data to accurately predict and measure energy efficiency savings, particularly at the city level, has limited the realization of its promise over the past two decades.

Why Open Energy Data?

Open Data can be a powerful tool to reduce information asymmetry in markets, increase transparency and help achieve local economic development goals. Several sectors like transport, public sector management and agriculture have started to benefit from Open Data practices. Energy markets are often characterized by less-than-optimal conditions with high system inefficiencies, misaligned incentives and low levels of transparency. As such, the sector has a lot to potentially gain from embracing Open Data principles.

The United States is a leader in this field with its ‘Energy Data’ initiative. This initiative makes data easy to find, understand and apply, helping to fuel a clean energy economy. For example, the Energy Information Administration’s (EIA) open application programming interface (API) has more than 1.2 million time series of data and is frequently visited by users from the private sector, civil society and media. In addition, the Green Button initiative is empowering American citizens to have access to their own energy usage data, and OpenEI.org is an Open Energy Information platform to help people find energy information, share their knowledge and connect to other energy stakeholders.

Introducing the Open Energy Data Assessment

To address this data gap in emerging and developing countries, the World Bank is conducting a series of Open Energy Data Assessments in urban areas. The objective is to identify important energy-related data, raise awareness of the benefits of Open Data principles and improve the flow of data between traditional energy stakeholders and others interested in the sector.

The first cities we assessed were Accra, Ghana and Nairobi, Kenya. Both are among the fastest-growing cities in the world, with dynamic entrepreneurial and technology sectors, and both are capitals of countries with an ongoing National Open Data Initiative. The two cities have also been selected to be part of the Negawatt Challenge, a World Bank international competition supporting technology innovation to solve local energy challenges.

The ecosystem approach

The starting point for the exercise was to consider the urban energy sector as an ecosystem, comprised of data suppliers, data users, key datasets, a legal framework, funding mechanisms, and ICT infrastructure. The methodology that we used adapted the established World Bank Open Data Readiness Assessment (ODRA), which highlights valuable connections between data suppliers and data demand. The assessment showcases how to match pressing urban challenges with the opportunity to release and use data to address them, creating a longer-term commitment to the process. Mobilizing key stakeholders to provide quick, tangible results is also key to this approach….(More) …See also World Bank Open Government Data Toolkit.”