Out in the Open: This Man Wants to Turn Data Into Free Food (And So Much More)


In Wired: “Let’s say your city releases a list of all trees planted on its public property. It would be a godsend—at least in theory. You could filter the data into a list of all the fruit and nut trees in the city, transfer it into an online database, and create a smartphone app that helps anyone find free food.

Such is the promise of “open data”—the massive troves of public information our governments now post to the net. The hope is that, if governments share enough of this data with the world at large, hackers and entrepreneurs will find a way of putting it to good use. But although so much of this government data is now available, the revolution hasn’t exactly happened.
In far too many cases, the data just sits there on a computer server, unseen and unused. Sometimes, no one knows about the data, or no one knows what to do with it. Other times, the data is just too hard to work with. If you’re building that free food app, how do you update your database when the government releases a new version of the spreadsheet? And if you let people report corrections to the data, how do you contribute that data back to the city?
These are the sorts of problems that obsess 25-year-old software developer Max Ogden, and they’re the reason he built Dat, a new piece of open source software that seeks to restart the open data revolution. Basically, Dat is a way of synchronizing data between two or more sources, tracking any changes to that data, and handling transformations from one data format to another. The aim is a simple one: Ogden wants to make it easier for governments to share their data with a world of software developers.
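To make that concrete: at its core, Dat treats a dataset the way Git treats source code, as an append-only log of changes that replicas can pull from. The snippet below is a minimal, illustrative sketch of that idea in Python — the record layout and function names are invented for the example and are not Dat’s actual format or API.

```python
# Illustrative sketch of "Git for data"-style syncing: each change to a row is
# appended to a log, and a replica catches up by pulling entries it hasn't seen.
# This is a toy model of the idea behind Dat, not its real wire protocol.

def append_change(log, row_id, row):
    """Record a new version of a row in an append-only change log."""
    log.append({"seq": len(log) + 1, "row_id": row_id, "row": row})

def sync(source_log, replica_log, replica_state):
    """Copy changes the replica hasn't seen and apply them to its local table."""
    last_seen = len(replica_log)
    for entry in source_log[last_seen:]:
        replica_log.append(entry)
        replica_state[entry["row_id"]] = entry["row"]

# Example: the city updates its tree inventory; a food-finder app syncs the diff.
city_log, app_log, app_table = [], [], {}
append_change(city_log, "tree-42", {"species": "apple", "lat": 45.52, "lon": -122.68})
append_change(city_log, "tree-42", {"species": "apple", "status": "removed"})
sync(city_log, app_log, app_table)
print(app_table["tree-42"])   # the app sees the latest version of the row
```

Because every change carries a sequence number, a consumer can resume syncing from wherever it left off when the city publishes a new version of the spreadsheet, which is exactly the problem the free-food app above runs into.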
That’s just the sort of thing that government agencies are looking for, says Waldo Jaquith, the director of the US Open Data Institute, the non-profit that is now hosting Dat…
Git is a piece of software originally written by Linux creator Linus Torvalds. It keeps track of code changes and makes it easier to integrate code submissions from outside developers. Ogden realized what developers needed wasn’t a GitHub for data, but a Git for data. And that’s what Dat is.
Instead of CouchDB, Dat relies on a lightweight, open-source data storage system from Google called LevelDB. The rest of the software was written in JavaScript by Ogden and his growing number of collaborators, which enables them to keep things minimal and easily run the software on multiple operating systems, including Windows, Linux and Mac OS X….”
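LevelDB itself is only an embedded, sorted key-value store; the versioning, syncing, and format handling sit on top of it in Dat’s JavaScript code. For readers who have not used LevelDB, here is a minimal sketch through the Python binding plyvel (the Node.js equivalent is the levelup/level package); the keys and values are made up for illustration.

```python
# Minimal LevelDB usage via the plyvel Python binding (pip install plyvel).
# Dat's Node.js code talks to LevelDB through similar put/get/iterate calls.
import plyvel

db = plyvel.DB("/tmp/tree-inventory", create_if_missing=True)

# Store each dataset row under a key; LevelDB keeps keys sorted, which makes
# range scans over a common prefix cheap.
db.put(b"tree:000042", b'{"species": "apple", "lat": 45.52}')
db.put(b"tree:000043", b'{"species": "walnut", "lat": 45.53}')

print(db.get(b"tree:000042"))                      # fetch a single row
for key, value in db.iterator(prefix=b"tree:"):    # scan all rows in the prefix
    print(key, value)

db.close()
```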

Technology’s Crucial Role in the Fight Against Hunger


Crowdsourcing, predictive analytics and other new tools could go far toward finding innovative solutions for America’s food insecurity.

National Geographic recently sent three photographers to explore hunger in the United States. It was an effort to give a face to a very troubling statistic: Even today, one-sixth of Americans do not have enough food to eat. Fifty million people in this country are “food insecure” — having to make daily trade-offs among paying for food, housing or medical care — and 17 million of them skip at least one meal a day to get by. When choosing what to eat, many of these individuals must weigh smaller quantities of higher-quality food against larger quantities of less-nutritious processed foods, the consumption of which often leads to expensive health problems down the road.
This is an extremely serious, but not easily visible, social problem. Nor does the challenge it poses become any easier when poorly designed public-assistance programs continue to count the sauce on a pizza as a vegetable. The deficiencies caused by hunger increase the likelihood that a child will drop out of school, lowering her lifetime earning potential. In 2010 alone, food insecurity cost America $167.5 billion, a figure that includes lost economic productivity, avoidable health-care expenses and social-services programs.
As much as we need specific policy innovations if we are to eliminate hunger in America, food insecurity is just one of many extraordinarily complex and interdependent “systemic” problems facing us that would benefit from the application of technology, not just to identify innovative solutions but to implement them as well. In addition to laudable policy initiatives by such states as Illinois and Nevada, which have made hunger a priority, or Arkansas, which suffers the greatest level of food insecurity but which is making great strides at providing breakfast to schoolchildren, we can — we must — bring technology to bear to create a sustained conversation between government and citizens to engage more Americans in the fight against hunger.

Identifying who is genuinely in need cannot be done as well by a centralized government bureaucracy — even one with regional offices — as it can through a distributed network of individuals and organizations able to pinpoint with on-the-ground accuracy where the demand is greatest. Just as Ushahidi uses crowdsourcing to help locate and identify disaster victims, it should be possible to leverage the crowd to spot victims of hunger. As it stands, attempts to eradicate so-called food deserts are often built around developing solutions for residents rather than with residents. Strategies to date tend to focus on the introduction of new grocery stores or farmers’ markets but with little input from or involvement of the citizens actually affected.

Applying predictive analytics to newly available sources of public as well as private data, such as that regularly gathered by supermarkets and other vendors, could also make it easier to offer coupons and discounts to those most in need. In addition, analyzing nonprofits’ tax returns, which are legally open and available to all, could help map where the organizations serving those in need leave gaps that need to be closed by other efforts. The Governance Lab recently brought together U.S. Department of Agriculture officials with companies that use USDA data in an effort to focus on strategies supporting a White House initiative to use climate-change and other open data to improve food production.
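To make the “predictive analytics” suggestion concrete, the toy sketch below fits a simple model to made-up household features and ranks who looks most likely to be food insecure, so coupons or outreach can be targeted. Every number, feature, and threshold here is hypothetical; a real program would also have to grapple with consent, bias, and privacy.

```python
# Toy example of targeting assistance with predictive analytics.
# All data here is synthetic; features and labels are invented for illustration.
from sklearn.linear_model import LogisticRegression

# Each row: [household_size, income_thousands, miles_to_nearest_grocery]
X_train = [
    [4, 18, 6.0], [2, 55, 1.2], [5, 22, 4.5],
    [1, 70, 0.8], [3, 28, 3.1], [6, 15, 7.2],
]
y_train = [1, 0, 1, 0, 0, 1]   # 1 = known food-insecure household (synthetic)

model = LogisticRegression().fit(X_train, y_train)

# Score new households and flag the highest-risk ones for coupons or outreach.
candidates = [[3, 20, 5.0], [2, 60, 1.0]]
for features, risk in zip(candidates, model.predict_proba(candidates)[:, 1]):
    if risk > 0.5:
        print(f"prioritize outreach: {features} (risk {risk:.2f})")
```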

Such innovative uses of technology, which put citizens at the center of the service-delivery process and streamline the delivery of government support, could also speed the delivery of benefits, thus reducing both costs and, every bit as important, the indignity of applying for assistance.

Being open to new and creative ideas from outside government through brainstorming and crowdsourcing exercises using social media can go beyond simply improving the quality of the services delivered. Some of these ideas, such as those arising from exciting new social-science experiments involving the use of incentives for “nudging” people to change their behaviors, might even lead them to purchase more healthful food.

Further, new kinds of public-private collaborative partnerships could create the means for people to produce their own food. Both new kinds of financing arrangements and new apps for managing the shared use of common real estate could make more community gardens possible. Similarly, with the kind of attention, convening and funding that government can bring to an issue, new neighbor-helping-neighbor programs — where, for example, people take turns shopping and cooking for one another to ease the burden of time spent away from work — could be scaled up.

Then, too, advances in citizen engagement and oversight could make it more difficult for lawmakers to cave to the pressures of lobbying groups that push for subsidies for those crops, such as white potatoes and corn, that result in our current large-scale reliance on less-nutritious foods. At the same time, citizen scientists reporting data through an app would be able to do a much better job than government inspectors in reporting what is and is not working in local communities.

As a society, we may not yet be able to banish hunger entirely. But if we commit to using new technologies and mechanisms of citizen engagement widely and wisely, we could vastly reduce its power to do harm.

What Cars Did for Today’s World, Data May Do for Tomorrow’s


Quentin Hardy in the New York Times: “New technology products head at us constantly. There’s the latest smartphone, the shiny new app, the hot social network, even the smarter thermostat.

As great (or not) as all these may be, each thing is a small part of a much bigger process that’s rarely admired. They all belong inside a world-changing ecosystem of digital hardware and software, spreading into every area of our lives.

Thinking about what is going on behind the scenes is easier if we consider the automobile, also known as “the machine that changed the world.” Cars succeeded through the widespread construction of highways and gas stations. Those things created a global supply chain of steel plants and refineries. Seemingly unrelated things, including suburbs, fast food and drive-time talk radio, arose from that success.

Today’s dominant industrial ecosystem is relentlessly acquiring and processing digital information. It demands newer and better ways of collecting, shipping, and processing data, much the way cars needed better road building. And it’s spinning out its own unseen businesses.

A few recent developments illustrate the new ecosystem. General Electric plans to announce Monday that it has created a “data lake” method of analyzing sensor information from industrial machinery in places like railroads, airlines, hospitals and utilities. G.E. has been putting sensors on everything it can for a couple of years, and now it is out to read all that information quickly.

The company, working with an outfit called Pivotal, said that in the last three months it has looked at information from 3.4 million miles of flights by 24 airlines using G.E. jet engines. G.E. said it figured out things like possible defects 2,000 times as fast as it could before.
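G.E. has not published how its “data lake” analysis works internally, but the general pattern — pool raw sensor readings and flag the ones that stray from a machine’s own baseline — can be illustrated with a small, self-contained sketch. The readings, window, and threshold below are invented for the example.

```python
# Toy illustration of flagging possible defects in pooled sensor data:
# compare each new reading against the machine's own trailing baseline.
# Numbers are invented; real systems operate on billions of readings.
from statistics import mean, stdev

def flag_anomalies(readings, window=20, threshold=3.0):
    """Yield (index, value) where a reading deviates from the trailing window by > threshold sigma."""
    for i in range(window, len(readings)):
        baseline = readings[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma and abs(readings[i] - mu) > threshold * sigma:
            yield i, readings[i]

exhaust_temp = [612 + (i % 5) * 0.4 for i in range(200)]   # normal variation
exhaust_temp[150] = 655                                     # injected spike
for idx, value in flag_anomalies(exhaust_temp):
    print(f"possible defect signature at sample {idx}: {value}")
```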

The company has to, since it’s getting so much more data. “In 10 years, 17 billion pieces of equipment will have sensors,” said William Ruh, vice president of G.E. software. “We’re only one-tenth of the way there.”

It hardly matters if Mr. Ruh is off by five billion or so. Billions of humans are already augmenting that number with their own packages of sensors, called smartphones, fitness bands and wearable computers. Almost all of that will get uploaded someplace too.

Shipping that data creates challenges. In June, researchers at the University of California, San Diego announced a method of engineering fiber optic cable that could make digital networks run 10 times faster. The idea is to get more parts of the system working closer to the speed of light, without involving the “slow” processing of electronic semiconductors.

“We’re going from millions of personal computers and billions of smartphones to tens of billions of devices, with and without people, and that is the early phase of all this,” said Larry Smarr, director of the California Institute for Telecommunications and Information Technology, located inside U.C.S.D. “A gigabit a second was fast in commercial networks, now we’re at 100 gigabits a second. A terabit a second will come and go. A petabit a second will come and go.”

In other words, Mr. Smarr thinks commercial networks will eventually be 10,000 times as fast as today’s best systems. “It will have to grow, if we’re going to continue what has become our primary basis of wealth creation,” he said.

Add computation to collection and transport. Last month, U.C. Berkeley’s AMP Lab, created two years ago for research into new kinds of large-scale computing, spun out a company called Databricks, which uses new kinds of software to offer fast data analysis on a rental basis. Databricks plugs into the one million-plus computer servers inside the global system of Amazon Web Services, and will soon work inside similar-size megacomputing systems from Google and Microsoft.
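The software in question grew out of the AMP Lab’s Apache Spark project, which Databricks commercializes as a hosted service. The minimal PySpark sketch below shows the kind of distributed aggregation such systems make fast; the input records and field layout are invented for the example.

```python
# Minimal PySpark sketch of distributed aggregation of the sort Databricks hosts.
# Requires a Spark installation (pip install pyspark); records/fields are invented.
from pyspark import SparkContext

sc = SparkContext("local[*]", "flight-sensor-rollup")

# Parse (engine_id, reading) pairs and compute a per-engine average in parallel.
lines = sc.parallelize([
    "engine-1,612.3", "engine-1,613.1", "engine-2,598.7", "engine-2,601.2",
])
pairs = lines.map(lambda ln: ln.split(",")).map(lambda p: (p[0], (float(p[1]), 1)))
totals = pairs.reduceByKey(lambda a, b: (a[0] + b[0], a[1] + b[1]))
averages = totals.mapValues(lambda s: s[0] / s[1])

print(averages.collect())   # e.g. [('engine-1', 612.7), ('engine-2', 599.95)]
sc.stop()
```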

It was the second company out of the AMP Lab this year. The first, called Mesosphere, enables a kind of pooling of computing services, building the efficiency of even million-computer systems….”

Monitoring Arms Control Compliance With Web Intelligence


Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
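The Sandia/Recorded Future pipeline itself is not public, but the basic move it describes — pull dates and event mentions out of free text and arrange them on a timeline so signals like hospitalizations and deaths can be tracked over time — is easy to sketch. The regular expression, keyword list, and sample sentences below are simplified stand-ins, not the linguistic algorithms the team actually used.

```python
# Simplified illustration of extracting temporal signals from unstructured text:
# find (date, event-keyword) pairs and bucket them into a timeline.
# Real systems use far richer linguistic models; this is a stand-in sketch.
import re
from collections import defaultdict

EVENT_TERMS = ("hospitalized", "died", "outbreak", "infection")
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

reports = [
    "2013-04-02: two patients hospitalized in Shanghai with H7N9 infection.",
    "2013-04-05: officials report one of the patients died.",
    "2013-04-05: new outbreak cluster suspected in Jiangsu.",
]

timeline = defaultdict(list)
for text in reports:
    date_match = DATE_RE.search(text)
    if not date_match:
        continue
    for term in EVENT_TERMS:
        if term in text.lower():
            timeline[date_match.group(1)].append(term)

for date in sorted(timeline):
    print(date, timeline[date])   # a crude signal of how the event is evolving
```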
For H7N9 we found that open source social media were the first to report the outbreak and give ongoing updates.  The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting.  For the Shanghai pig die-off the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the pig die-off as had been originally speculated. Open source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….
To read the full paper, please click here.”

The infrastructure Africa really needs is better data reporting


Data reporting on the continent is sketchy. Just look at the recent GDP revisions of large countries. How is it that Nigeria’s April GDP recalculation catapulted it ahead of South Africa, making it the largest economy in Africa overnight? Or that Kenya’s economy is actually 20% larger (paywall) than previously thought?

Indeed, countries in Africa get noticeably bad scores on the World Bank’s Bulletin Board on Statistical Capacity, an index of data reporting integrity.

Bad data is not simply the result of inconsistencies or miscalculations: African governments have an incentive to produce statistics that overstate their economic development.

A recent working paper from the Center for Global Development (CGD) shows how politics influence the statistics released by many African countries…

But in the long run, dodgy statistics aren’t good for anyone. They “distort the way we understand the opportunities that are available,” says Amanda Glassman, one of the CGD report’s authors. US firms have pledged $14 billion in trade deals at the summit in Washington. No doubt they would like to know whether high school enrollment promises to create a more educated workforce in a given country, or whether its people have been immunized for viruses.

Overly optimistic indicators also distort how a government decides where to focus its efforts. If school enrollment appears to be high, why implement programs intended to increase it?

The CGD report suggests increased funding to national statistical agencies, and making sure that they are wholly independent from their governments. President Obama is talking up $7 billion in investments in African agriculture. But unless cash and attention are given to improving statistical integrity, he may never know whether that investment has borne fruit.”

The Data Act's Unexpected Benefit


Adam Mazmanian at FCW: “The Digital Accountability and Transparency Act sets an aggressive schedule for creating governmentwide financial standards. The first challenge belongs to the Treasury Department and the Office of Management and Budget. They must come up with a set of common data elements for financial information that will cover just about everything the government spends money on and every entity it pays in order to give oversight bodies and government watchdogs a top-down view of federal spending from appropriation to expenditure. Those data elements are scheduled for completion by May 2015, one year after the act’s passage.
Two years after those standards are in place, agencies will be required to report their financial information following Data Act guidelines. The government currently supports more than 150 financial management systems but lacks a common data dictionary, so there are not necessarily agreed-upon definitions of how to classify and track government programs and types of expenditures.
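A common data dictionary is, at bottom, an agreed record schema plus validation rules that every agency’s submissions must pass before publication. The sketch below shows one way such a record and check could look; the field names are hypothetical and are not the data elements Treasury and OMB ultimately defined.

```python
# Illustrative sketch of a standardized spending record and a validation check.
# Field names are hypothetical, not the actual Data Act data elements.
from dataclasses import dataclass

@dataclass
class SpendingRecord:
    awarding_agency: str
    appropriation_account: str
    recipient_name: str
    obligation_amount: float
    fiscal_year: int

def validate(record: SpendingRecord) -> list:
    """Return a list of problems that would block governmentwide reporting."""
    problems = []
    if record.obligation_amount < 0:
        problems.append("negative obligation amount")
    if not (2000 <= record.fiscal_year <= 2100):
        problems.append("implausible fiscal year")
    if not record.recipient_name.strip():
        problems.append("missing recipient name")
    return problems

rec = SpendingRecord("USDA", "12-3500", "Example Vendor LLC", 125000.0, 2015)
print(validate(rec) or "record conforms to the (hypothetical) standard")
```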
“As far as systems today and how we can get there, they don’t necessarily map in the way that the act described,” U.S. CIO Steven VanRoekel said in June. “It’s going to be a journey to get to where the act aspires for us to be.”
However, an Obama administration initiative to encourage agencies to share financial services could be part of the solution. In May, OMB and Treasury designated four financial shared-services providers for government agencies: the Agriculture Department’s National Finance Center, the Interior Department’s Interior Business Center, the Transportation Department’s Enterprise Services Center and Treasury’s Administrative Resource Center.
There are some synergies between shared services and data standardization, but shared financial services alone will not guarantee Data Act compliance, especially considering that the government expects the migration to take 10 to 15 years. Nevertheless, the discipline required under the Data Act could boost agency efforts to prepare financial data when it comes time to move to a shared service….”

The Quiet Revolution: Open Data Is Transforming Citizen-Government Interaction


Maury Blackman at Wired: “The public’s trust in government is at an all-time low. This is not breaking news.
But what if I told you that just this past May, President Obama signed into law a bill that passed Congress with unanimous support. A bill that could fundamentally transform the way citizens interact with their government. This legislation could also create an entirely new, trillion-dollar industry right here in the U.S. It could even save lives.
On May 9th, the Digital Accountability and Transparency Act of 2014 (DATA Act) became law. There were very few headlines, no Rose Garden press conference.
I imagine most of you have never heard of the DATA Act. The bill with the nerdy name has the potential to revolutionize government. It requires federal agencies to make their spending data available in standardized, publicly accessible formats. Supporters of the legislation included Tea Partiers and the most liberal Democrats. But the bill only scratches the surface of what’s possible.
So What’s the Big Deal?
On his first day in Office, President Obama signed a memorandum calling for a more open and transparent government. The President wrote, “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” This was followed by the creation of Data.gov, a one-stop shop for all government data. The site does not just include financial data, but also a wealth of other information related to education, public safety, climate and much more—all available in open and machine-readable format. This has helped fuel an international movement.
Tech-minded citizens are building civic apps to bring government into the digital age; reporters are now able to connect the dots more easily, not to mention the billions of taxpayer dollars saved. And last year the President took us a step further. He signed an Executive Order making open government data the default option.
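Data.gov’s catalog runs on the open-source CKAN software, so its holdings can be searched programmatically as well as browsed. The short sketch below queries the standard CKAN search API on catalog.data.gov for food-related datasets; the query term is just an example, and the endpoint shown is CKAN’s generic one rather than anything specific to this article.

```python
# Query the Data.gov catalog (a CKAN instance) for datasets about food access.
# Uses CKAN's standard search API; the query term is just an example.
import requests

resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "food access", "rows": 5},
    timeout=30,
)
resp.raise_for_status()

for dataset in resp.json()["result"]["results"]:
    print(dataset["title"])
    for resource in dataset.get("resources", []):
        print("  ", resource.get("format"), resource.get("url"))
```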
Cities and states have followed Washington’s lead with similar open data efforts on the local level. In San Francisco, the city’s Human Services Agency has partnered with Promptly, a text-message notification service that alerts food stamp recipients (CalFresh) when they are at risk of being disenrolled from the program. This service is incredibly beneficial, because most recipients do not realize their status has changed until they are in the grocery store checkout line, trying to buy food for their family.
Other products and services created using open data do more than just provide an added convenience—they actually have the potential to save lives. The PulsePoint mobile app sends text messages to citizens trained in CPR when someone in walking distance is experiencing a medical emergency that may require CPR. The app is currently available in almost 600 cities in 18 states, which is great. But shouldn’t a product this valuable be available to every city and state in the country?…”

Unleashing Climate Data to Empower America’s Agricultural Sector


Secretary Tom Vilsack and John P. Holdren at the White House Blog: “Today, in a major step to advance the President’s Climate Data Initiative, the Obama administration is inviting leaders of the technology and agricultural sectors to the White House to discuss new collaborative steps to unleash data that will help ensure our food system is resilient to the effects of climate change.

More intense heat waves, heavier downpours, and severe droughts and wildfires out west are already affecting the nation’s ability to produce and transport safe food. The recently released National Climate Assessment makes clear that these kinds of impacts are projected to become more severe over this century.

Food distributors, agricultural businesses, farmers, and retailers need accessible, usable data, tools, and information to ensure the effectiveness and sustainability of their operations – from water availability, to timing of planting and harvest, to storage practices, and more.

Today’s convening at the White House will include formal commitments by a host of private-sector companies and nongovernmental organizations to support the President’s Climate Data Initiative by harnessing climate data in ways that will increase the resilience of America’s food system and help reduce the contribution of the nation’s agricultural sector to climate change.

Microsoft Research, for instance, will grant 12 months of free cloud-computing resources to winners of a national challenge to create a smartphone app that helps farmers increase the resilience of their food production systems in the face of weather variability and climate change; the Michigan Agri-Business Association will soon launch a publicly available web-based mapping tool for use by the state’s agriculture sector; and the U.S. dairy industry will test and pilot four new modules – energy, feed, nutrient, and herd management – on the data-driven Farm Smart environmental-footprint calculation tool by the end of 2014. These are just a few among dozens of exciting commitments.

And the federal government is also stepping up. Today, anyone can log onto climate.data.gov and find new features that make data about the risks of climate change to food production, delivery, and nutrition accessible and usable – including current and historical data from the Census of Agriculture on production, supply, and distribution of agricultural products, and data on climate-change-related risks such as storms, heat waves, and drought.

These steps are a direct response to the President’s call for all hands on deck to generate further innovation to help prepare America’s communities and businesses for the impacts of climate change.

We are delighted about the steps being announced by dozens of collaborators today, and we can’t wait to see what further tools, apps, and services are developed as the Administration and its partners continue to unleash data to make America’s agriculture enterprise stronger and more resilient than ever before.

Read a fact sheet about all of today’s Climate Data Initiative commitments here.”

Open Data for economic growth: the latest evidence


Andrew Stott at the Worldbank OpenData Blog: “One of the key policy drivers for Open Data has been to drive economic growth and business innovation. There’s a growing amount of evidence and analysis not only for the total potential economic benefit but also for some of the ways in which this is coming about. This evidence is summarised and reviewed in a new World Bank paper published today.
There’s a range of studies that suggest that the potential prize from Open Data could be enormous – including an estimate of $3-5 trillion a year globally from McKinsey Global Institute and an estimate of $13 trillion cumulative over the next 5 years in the G20 countries.  There are supporting studies of the value of Open Data to certain sectors in certain countries – for instance $20 billion a year to Agriculture in the US – and of the value of key datasets such as geospatial data.  All these support the conclusion that the economic potential is at least significant – although with a range from “significant” to “extremely significant”!
At least some of this benefit is already being realised by new companies that have sprung up to deliver new, innovative, data-rich services and by older companies improving their efficiency by using open data to optimise their operations. Five main business archetypes have been identified – suppliers, aggregators, enrichers, application developers and enablers. What’s more, there are at least four companies which did not exist ten years ago, which are driven by Open Data, and which are each now valued at around $1 billion or more. Somewhat surprisingly, the drive to exploit Open Data is coming from outside the traditional “ICT sector” – although the ICT sector is supplying many of the tools required.
It’s also becoming clear that if countries want to maximise their gain from Open Data the role of government needs to go beyond simply publishing some data on a website. Governments need to be:

  • Suppliers – of the data that businesses need
  • Leaders – making sure that municipalities, state owned enterprises and public services operated by the private sector also release important data
  • Catalysts – nurturing a thriving ecosystem of data users, coders and application developers and incubating new, data-driven businesses
  • Users – using Open Data themselves to overcome the barriers to using data within government and innovating new ways to use the data they collect to improve public services and government efficiency.

Nevertheless, most of the evidence for big economic benefits for Open Data comes from the developed world. So on Wednesday the World Bank is holding an open seminar to examine critically “Can Open Data Boost Economic Growth and Prosperity” in developing countries. Please join us and join the debate!”

Selected Readings on Sentiment Analysis


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.

Sentiment Analysis is a field of Computer Science that uses techniques from natural language processing, computational linguistics, and machine learning to predict subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although it is technically a subfield focusing on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).

The rise of Web 2.0 and increased information flow has led to growing interest in Sentiment Analysis — especially as applied to social networks and media. Events causing large spikes in media — such as the 2012 Presidential Election Debates — are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.
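As a concrete, deliberately crude illustration of the technique the readings below treat rigorously, here is a toy lexicon-based polarity scorer: count positive and negative cue words and report the balance. Production systems rely on machine-learned models rather than a hand-built word list like this one.

```python
# Toy lexicon-based sentiment scorer: a deliberately crude illustration of the
# basic idea behind Sentiment Analysis; the readings below describe the
# machine-learned approaches used in practice.
POSITIVE = {"support", "agree", "good", "benefit", "win", "strong"}
NEGATIVE = {"oppose", "disagree", "bad", "harm", "lose", "weak"}

def polarity(text: str) -> float:
    """Return a score in [-1, 1]; negative values indicate an opposing/negative tone."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)

print(polarity("I strongly support this plan; the benefits are good."))   #  1.0
print(polarity("I oppose the measure, it will harm small farms."))        # -1.0
```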

Annotated Selected Reading List (in alphabetical order)

Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP

  • Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
  • Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature.

Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv

  • This project aims to expand on existing solutions used for automatic sentiment analysis on text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results for enhancing the ease of understanding the debates and for showing underlying trends. Finally, it evaluates proposed techniques on an existing debate system for social networking.

Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm

  • In this paper, the authors propose a method for the task of identifying the general positions of users in online debates, i.e., support or oppose the main topic of an online debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their remarks within the debate. The supporting or opposing remarks are made by directly replying to the opinion, or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
  • A prior study has shown that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy for the identification task than methods based solely on the contents of the remarks. In this paper, it is shown that incorporating the textual content of the remarks into the link-based method can yield higher accuracy in the identification task.

Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and trends in information retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD

  • This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn

  • Social networking sites provide users a virtual community interaction platform to share their thoughts, life experiences and opinions. Online debate forum is one such platform where people can take a stance and argue in support or opposition of debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
  • This paper aims to summarize online debates by extracting highly topic relevant and sentiment rich sentences. The proposed approach takes into account topic relevant, document relevant and sentiment based features to capture topic opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, which employs a set of metrics and a software package to compare an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system. This system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic directed sentiment features are most important to generate effective debate summaries.

Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx

  • Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the authors describe a new possible domain for argumentation mining: debates in open online collaboration communities.
  • Based on their experience with manual annotation of arguments in debates, the authors propose argumentation mining as the basis for three kinds of support tools: for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.