What Cars Did for Today’s World, Data May Do for Tomorrow’s


Quentin Hardy in the New York Times: “New technology products head at us constantly. There’s the latest smartphone, the shiny new app, the hot social network, even the smarter thermostat.

As great (or not) as all these may be, each thing is a small part of a much bigger process that’s rarely admired. They all belong inside a world-changing ecosystem of digital hardware and software, spreading into every area of our lives.

Thinking about what is going on behind the scenes is easier if we consider the automobile, also known as “the machine that changed the world.” Cars succeeded through the widespread construction of highways and gas stations. Those things created a global supply chain of steel plants and refineries. Seemingly unrelated things, including suburbs, fast food and drive-time talk radio, arose in the success.

Today’s dominant industrial ecosystem is relentlessly acquiring and processing digital information. It demands newer and better ways of collecting, shipping, and processing data, much the way cars needed better road building. And it’s spinning out its own unseen businesses.

A few recent developments illustrate the new ecosystem. General Electric plans to announce Monday that it has created a “data lake” method of analyzing sensor information from industrial machinery in places like railroads, airlines, hospitals and utilities. G.E. has been putting sensors on everything it can for a couple of years, and now it is out to read all that information quickly.

The company, working with an outfit called Pivotal, said that in the last three months it has looked at information from 3.4 million miles of flights by 24 airlines using G.E. jet engines. G.E. said it figured out things like possible defects 2,000 times as fast as it could before.

The company has to, since it’s getting so much more data. “In 10 years, 17 billion pieces of equipment will have sensors,” said William Ruh, vice president of G.E. software. “We’re only one-tenth of the way there.”

It hardly matters if Mr. Ruh is off by five billion or so. Billions of humans are already augmenting that number with their own packages of sensors, called smartphones, fitness bands and wearable computers. Almost all of that will get uploaded someplace too.

Shipping that data creates challenges. In June, researchers at the University of California, San Diego announced a method of engineering fiber optic cable that could make digital networks run 10 times faster. The idea is to get more parts of the system working closer to the speed of light, without involving the “slow” processing of electronic semiconductors.

“We’re going from millions of personal computers and billions of smartphones to tens of billions of devices, with and without people, and that is the early phase of all this,” said Larry Smarr, director of the California Institute for Telecommunications and Information Technology, located inside U.C.S.D. “A gigabit a second was fast in commercial networks; now we’re at 100 gigabits a second. A terabit a second will come and go. A petabit a second will come and go.”

In other words, Mr. Smarr thinks commercial networks will eventually be 10,000 times as fast as today’s best systems. “It will have to grow, if we’re going to continue what has become our primary basis of wealth creation,” he said.
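Smarr's progression can be sanity-checked with a quick back-of-the-envelope calculation; this sketch assumes nothing beyond the standard SI prefixes:

```python
# Back-of-the-envelope check of the network-speed progression described above.
# SI prefixes: giga = 1e9, tera = 1e12, peta = 1e15 (bits per second).
giga, tera, peta = 1e9, 1e12, 1e15

today = 100 * giga   # "now we're at 100 gigabits a second"
future = 1 * peta    # "a petabit a second will come and go"

speedup = future / today
print(f"A petabit/s network is {speedup:,.0f}x today's 100 Gb/s systems")
# → A petabit/s network is 10,000x today's 100 Gb/s systems
```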

Add computation to collection and transport. Last month, U.C. Berkeley’s AMP Lab, created two years ago for research into new kinds of large-scale computing, spun out a company called Databricks, which uses new kinds of software for fast data analysis on a rental basis. Databricks plugs into the one million-plus computer servers inside the global system of Amazon Web Services, and will soon work inside similar-size megacomputing systems from Google and Microsoft.

It was the second company out of the AMP Lab this year. The first, called Mesosphere, enables a kind of pooling of computing services, building the efficiency of even million-computer systems….”

Monitoring Arms Control Compliance With Web Intelligence


Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
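The Sandia/RF algorithm itself is not published in this excerpt, but the general idea of extracting temporal signals from unstructured text can be illustrated with a toy sketch. The regex, event terms, and sample sentences below are illustrative assumptions, not the actual system:

```python
import re
from collections import defaultdict

# Toy illustration of temporal-signal extraction: find date mentions in free
# text, then group co-occurring event keywords under each date.
DATE_RE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
EVENT_TERMS = ("outbreak", "hospitalization", "fatality", "die-off")

def temporal_signals(documents):
    """Map each date mentioned in the documents to the event terms near it."""
    timeline = defaultdict(set)
    for doc in documents:
        dates = DATE_RE.findall(doc)
        terms = {t for t in EVENT_TERMS if t in doc.lower()}
        for d in dates:
            timeline[d] |= terms
    return dict(timeline)

docs = [
    "2013-03-31: China reports an H7N9 outbreak; first hospitalization confirmed.",
    "2013-04-02: a second fatality is reported in Shanghai.",
]
print(temporal_signals(docs))
```

A real system would add entity resolution and cross-document association, but the output (a date-indexed structure of events) is the kind of multidimensional organization the excerpt describes.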
For H7N9 we found that open source social media were the first to report the outbreak and give ongoing updates.  The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting.  For the Shanghai pig die-off the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the pig die-off as had been originally speculated. Open source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….
To read the full paper, please click here.”

The infrastructure Africa really needs is better data reporting


Data reporting on the continent is sketchy. Just look at the recent GDP revisions of large countries. How is it that Nigeria’s April GDP recalculation catapulted it ahead of South Africa, making it the largest economy in Africa overnight? Or that Kenya’s economy is actually 20% larger (paywall) than previously thought?

Indeed, countries in Africa get noticeably bad scores on the World Bank’s Bulletin Board on Statistical Capacity, an index of data reporting integrity.

Bad data is not simply the result of inconsistencies or miscalculations: African governments have an incentive to produce statistics that overstate their economic development.

A recent working paper from the Center for Global Development (CGD) shows how politics influence the statistics released by many African countries…

But in the long run, dodgy statistics aren’t good for anyone. They “distort the way we understand the opportunities that are available,” says Amanda Glassman, one of the CGD report’s authors. US firms have pledged $14 billion in trade deals at the summit in Washington. No doubt they would like to know whether high school enrollment promises to create a more educated workforce in a given country, or whether its people have been immunized for viruses.

Overly optimistic indicators also distort how a government decides where to focus its efforts. If school enrollment appears to be high, why implement programs intended to increase it?

The CGD report suggests increasing funding to national statistical agencies and making sure that they are wholly independent of their governments. President Obama is talking up $7 billion in investment in African agriculture. But unless cash and attention are given to improving statistical integrity, he may never know whether that investment has borne fruit”

The Data Act's unexpected benefit


Adam Mazmanian at FCW: “The Digital Accountability and Transparency Act sets an aggressive schedule for creating governmentwide financial standards. The first challenge belongs to the Treasury Department and the Office of Management and Budget. They must come up with a set of common data elements for financial information that will cover just about everything the government spends money on and every entity it pays in order to give oversight bodies and government watchdogs a top-down view of federal spending from appropriation to expenditure. Those data elements are scheduled for completion by May 2015, one year after the act’s passage.
Two years after those standards are in place, agencies will be required to report their financial information following Data Act guidelines. The government currently supports more than 150 financial management systems but lacks a common data dictionary, so there are not necessarily agreed-upon definitions of how to classify and track government programs and types of expenditures.
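A common data dictionary is, at bottom, a shared schema that every financial system reports against. A minimal sketch of what that might look like; the field names and rules here are hypothetical illustrations, not the actual Treasury/OMB data elements:

```python
# Hypothetical sketch of a governmentwide "common data element" definition:
# every agency financial system would validate spending records against one
# shared schema before reporting them.
SPENDING_SCHEMA = {
    "awarding_agency": str,  # which agency spent the money
    "recipient_name": str,   # the entity that was paid
    "award_amount": float,   # dollars, appropriation to expenditure
    "fiscal_year": int,
}

def validate_record(record):
    """Return a list of schema violations for one spending record."""
    errors = []
    for field, ftype in SPENDING_SCHEMA.items():
        if field not in record:
            errors.append(f"missing field: {field}")
        elif not isinstance(record[field], ftype):
            errors.append(f"{field}: expected {ftype.__name__}")
    return errors

record = {"awarding_agency": "USDA", "recipient_name": "Acme Corp",
          "award_amount": 125000.0, "fiscal_year": 2015}
print(validate_record(record))  # an empty list means the record conforms
# → []
```

The point of the Data Act is precisely that today's 150-plus systems have no single `SPENDING_SCHEMA` to agree on.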
“As far as systems today and how we can get there, they don’t necessarily map in the way that the act described,” U.S. CIO Steven VanRoekel said in June. “It’s going to be a journey to get to where the act aspires for us to be.”
However, an Obama administration initiative to encourage agencies to share financial services could be part of the solution. In May, OMB and Treasury designated four financial shared-services providers for government agencies: the Agriculture Department’s National Finance Center, the Interior Department’s Interior Business Center, the Transportation Department’s Enterprise Services Center and Treasury’s Administrative Resource Center.
There are some synergies between shared services and data standardization, but shared financial services alone will not guarantee Data Act compliance, especially considering that the government expects the migration to take 10 to 15 years. Nevertheless, the discipline required under the Data Act could boost agency efforts to prepare financial data when it comes time to move to a shared service….”

The Quiet Revolution: Open Data Is Transforming Citizen-Government Interaction


Maury Blackman at Wired: “The public’s trust in government is at an all-time low. This is not breaking news.
But what if I told you that just this past May, President Obama signed into law a bill that passed Congress with unanimous support. A bill that could fundamentally transform the way citizens interact with their government. This legislation could also create an entirely new, trillion-dollar industry right here in the U.S. It could even save lives.
On May 9th, the Digital Accountability and Transparency Act of 2014 (DATA Act) became law. There were very few headlines, no Rose Garden press conference.
I imagine most of you have never heard of the DATA Act. The bill with the nerdy name has the potential to revolutionize government. It requires federal agencies to make their spending data available in standardized, publicly accessible formats. Supporters of the legislation included Tea Partiers and the most liberal Democrats. But the bill only scratches the surface of what’s possible.
So What’s the Big Deal?
On his first day in Office, President Obama signed a memorandum calling for a more open and transparent government. The President wrote, “Openness will strengthen our democracy and promote efficiency and effectiveness in Government.” This was followed by the creation of Data.gov, a one-stop shop for all government data. The site does not just include financial data, but also a wealth of other information related to education, public safety, climate and much more—all available in open and machine-readable format. This has helped fuel an international movement.
Tech-minded citizens are building civic apps to bring government into the digital age; reporters can now connect the dots more easily, not to mention the billions of taxpayer dollars saved. And last year the President took us a step further. He signed an Executive Order making open government data the default option.
Cities and states have followed Washington’s lead with similar open data efforts on the local level. In San Francisco, the city’s Human Services Agency has partnered with Promptly, a text message notification service that alerts food stamp recipients (CalFresh) when they are at risk of being disenrolled from the program. This service is incredibly beneficial, because most recipients do not realize their status has changed until they are in the grocery store checkout line, trying to buy food for their family.
Other products and services created using open data do more than just provide an added convenience—they actually have the potential to save lives. The PulsePoint mobile app sends text messages to citizens trained in CPR when someone in walking distance is experiencing a medical emergency that may require CPR. The app is currently available in almost 600 cities in 18 states, which is great. But shouldn’t a product this valuable be available to every city and state in the country?…”

Unleashing Climate Data to Empower America’s Agricultural Sector


Secretary Tom Vilsack and John P. Holdren at the White House Blog: “Today, in a major step to advance the President’s Climate Data Initiative, the Obama administration is inviting leaders of the technology and agricultural sectors to the White House to discuss new collaborative steps to unleash data that will help ensure our food system is resilient to the effects of climate change.

More intense heat waves, heavier downpours, and severe droughts and wildfires out west are already affecting the nation’s ability to produce and transport safe food. The recently released National Climate Assessment makes clear that these kinds of impacts are projected to become more severe over this century.

Food distributors, agricultural businesses, farmers, and retailers need accessible, useable data, tools, and information to ensure the effectiveness and sustainability of their operations – from water availability, to timing of planting and harvest, to storage practices, and more.

Today’s convening at the White House will include formal commitments by a host of private-sector companies and nongovernmental organizations to support the President’s Climate Data Initiative by harnessing climate data in ways that will increase the resilience of America’s food system and help reduce the contribution of the nation’s agricultural sector to climate change.

Microsoft Research, for instance, will grant 12 months of free cloud-computing resources to winners of a national challenge to create a smartphone app that helps farmers increase the resilience of their food production systems in the face of weather variability and climate change; the Michigan Agri-Business Association will soon launch a publicly available web-based mapping tool for use by the state’s agriculture sector; and the U.S. dairy industry will test and pilot four new modules – energy, feed, nutrient, and herd management – on the data-driven Farm Smart environmental-footprint calculation tool by the end of 2014. These are just a few among dozens of exciting commitments.

And the federal government is also stepping up. Today, anyone can log onto climate.data.gov and find new features that make data about the risks of climate change to food production, delivery, and nutrition accessible and usable – including current and historical data from the Census of Agriculture on production, supply, and distribution of agricultural products, and data on climate-change-related risks such as storms, heat waves, and drought.

These steps are a direct response to the President’s call for all hands on deck to generate further innovation to help prepare America’s communities and business for the impacts of climate change.

We are delighted about the steps being announced by dozens of collaborators today, and we can’t wait to see what further tools, apps, and services are developed as the Administration and its partners continue to unleash data to make America’s agriculture enterprise stronger and more resilient than ever before.

Read a fact sheet about all of today’s Climate Data Initiative commitments here.

Open Data for economic growth: the latest evidence


Andrew Stott at the Worldbank OpenData Blog: “One of the key policy drivers for Open Data has been to drive economic growth and business innovation. There’s a growing amount of evidence and analysis not only for the total potential economic benefit but also for some of the ways in which this is coming about. This evidence is summarised and reviewed in a new World Bank paper published today.
There’s a range of studies that suggest that the potential prize from Open Data could be enormous – including an estimate of $3-5 trillion a year globally from McKinsey Global Institute and an estimate of $13 trillion cumulative over the next 5 years in the G20 countries.  There are supporting studies of the value of Open Data to certain sectors in certain countries – for instance $20 billion a year to Agriculture in the US – and of the value of key datasets such as geospatial data.  All these support the conclusion that the economic potential is at least significant – although with a range from “significant” to “extremely significant”!
At least some of this benefit is already being realised by new companies that have sprung up to deliver new, innovative, data-rich services and by older companies improving their efficiency by using open data to optimise their operations. Five main business archetypes have been identified – suppliers, aggregators, enrichers, application developers and enablers. What’s more, there are at least four companies which did not exist ten years ago, which are driven by Open Data, and which are each now valued at around $1 billion or more. Somewhat surprisingly, the drive to exploit Open Data is coming from outside the traditional “ICT sector” – although the ICT sector is supplying many of the tools required.
It’s also becoming clear that if countries want to maximise their gain from Open Data the role of government needs to go beyond simply publishing some data on a website. Governments need to be:

  • Suppliers – of the data that business need
  • Leaders – making sure that municipalities, state owned enterprises and public services operated by the private sector also release important data
  • Catalysts – nurturing a thriving ecosystem of data users, coders and application developers and incubating new, data-driven businesses
  • Users – using Open Data themselves to overcome the barriers to using data within government and innovating new ways to use the data they collect to improve public services and government efficiency.

Nevertheless, most of the evidence for big economic benefits for Open Data comes from the developed world. So on Wednesday the World Bank is holding an open seminar to examine critically “Can Open Data Boost Economic Growth and Prosperity” in developing countries. Please join us and join the debate!

Selected Readings on Sentiment Analysis


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.

Sentiment Analysis is a field of Computer Science that uses techniques from natural language processing, computational linguistics, and machine learning to predict subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although it is technically a subfield focusing on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).

The rise of Web 2.0 and increased information flow has led to an increase in interest towards Sentiment Analysis — especially as applied to social networks and media. Events causing large spikes in media — such as the 2012 Presidential Election Debates — are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.
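As a drastically simplified illustration of the technique itself, a lexicon-based scorer classifies text by counting opinion-bearing words. The word lists below are illustrative assumptions; real systems use the machine-learning methods surveyed in the readings that follow:

```python
# Minimal lexicon-based sentiment scorer: count positive vs. negative cue
# words. Production sentiment analysis uses NLP and machine learning, not
# bare word counts; this only shows the shape of the problem.
POSITIVE = {"good", "great", "excellent", "win", "support"}
NEGATIVE = {"bad", "poor", "terrible", "lose", "oppose"}

def sentiment(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

print(sentiment("a great debate performance"))         # → positive
print(sentiment("a terrible night for the campaign"))  # → negative
```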

Annotated Selected Reading List (in alphabetical order)

Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP

  • Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
  • Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature.
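Choi et al.'s detector is a trained system, but the underlying signal (hedge-cue density) can be sketched in a few lines. The hedge lexicon and example sentences here are illustrative assumptions, not the paper's feature set:

```python
# Toy hedge-density measure: fraction of tokens that are hedging cue words.
# Choi et al. train a classifier over richer features; this only illustrates
# the signal their approach builds on.
HEDGES = {"may", "might", "suggest", "suggests", "possibly", "appears", "likely"}

def hedge_density(text):
    tokens = text.lower().split()
    return sum(t in HEDGES for t in tokens) / len(tokens)

scientific = "these results suggest the protein may possibly bind the receptor"
popular = "scientists have proven that the protein binds the receptor"
print(hedge_density(scientific) > hedge_density(popular))  # → True
```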

Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv

  • This project aims to expand on existing solutions used for automatic sentiment analysis on text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results for enhancing the ease of understanding the debates and for showing underlying trends. Finally, it evaluates proposed techniques on an existing debate system for social networking.

Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm

  • In this paper, the authors propose a method for the task of identifying the general positions of users in online debates, i.e., support or oppose the main topic of an online debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their remarks within the debate. The supporting or opposing remarks are made by directly replying to the opinion, or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
  • A prior study has shown that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy for the identification task than methods based solely on the contents of the remarks. In this paper, it is shown that incorporating the textual content of the remarks into the link-based method can yield higher accuracy in the identification task.

Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD

  • This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn

  • Social networking sites provide users a virtual community interaction platform to share their thoughts, life experiences and opinions. An online debate forum is one such platform where people can take a stance and argue in support of or in opposition to debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
  • This paper aims to summarize online debates by extracting highly topic-relevant and sentiment-rich sentences. The proposed approach takes into account topic-relevant, document-relevant and sentiment-based features to capture topic opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, which employs a set of metrics and a software package to compare an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system. This system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic-directed sentiment features are most important for generating effective debate summaries.
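In its simplest form, ROUGE-1 recall, the metric reduces to unigram overlap with a human-written reference; a minimal sketch, with made-up example sentences:

```python
from collections import Counter

def rouge1_recall(candidate, reference):
    """ROUGE-1 recall: fraction of reference unigrams covered by the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Clipped counts: each reference word is credited at most as often as it
    # appears in the candidate.
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values())

ref = "the debate summary captures both supporting and opposing opinions"
cand = "summary captures supporting and opposing opinions from the debate"
print(round(rouge1_recall(cand, ref), 2))  # → 0.89 (8 of 9 reference words)
```

The full ROUGE package adds n-gram, skip-gram, and longest-common-subsequence variants, but they all follow this recall-against-reference pattern.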

Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx

  • Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the authors describe a new possible domain for argumentation mining: debates in open online collaboration communities.
  • Based on our experience with manual annotation of arguments in debates, the authors propose argumentation mining as the basis for three kinds of support tools, for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.

The People’s Platform


Book Review by Tim Wu in the New York Times: “Astra Taylor is a documentary filmmaker who has described her work as the “steamed broccoli” in our cultural diet. Her last film, “Examined Life,” depicted philosophers walking around and talking about their ideas. She’s the kind of creative person who was supposed to benefit when the Internet revolution collapsed old media hierarchies. But two decades since that revolution began, she’s not impressed: “We are at risk of starving in the midst of plenty,” Taylor writes. “Free culture, like cheap food, incurs hidden costs.” Instead of serving as the great equalizer, the web has created an abhorrent cultural feudalism. The creative masses connect, create and labor, while Google, Facebook and Amazon collect the cash.
Taylor’s thesis is simply stated. The pre-Internet cultural industry, populated mainly by exploitative conglomerates, was far from perfect, but at least the ancien régime felt some need to cultivate cultural institutions, and to pay for talent at all levels. Along came the web, which swept away hierarchies — as well as paychecks, leaving behind creators of all kinds only the chance to be fleetingly “Internet famous.” And anyhow, she says, the web never really threatened to overthrow the old media’s upper echelons, whether defined as superstars, like Beyoncé, big broadcast television shows or Hollywood studios. Instead, it was the cultural industry’s middle classes that have been wiped out and replaced by new cultural plantations ruled over by the West Coast aggregators.
It is hard to know if the title, “The People’s Platform,” is aspirational or sarcastic, since Taylor believes the classless aura of the web masks an unfair power structure. “Open systems can be starkly inegalitarian,” she says, arguing that the web is afflicted by what the feminist scholar Jo Freeman termed a “tyranny of structurelessness.” Because there is supposedly no hierarchy, elites can happily deny their own existence. (“We just run a platform.”) But the effects are real: The web has reduced professional creators to begging for scraps of attention from a spoiled public, and forced creators to be their own brand.

The tech industry might be tempted to dismiss Taylor’s arguments as merely a version of typewriter manufacturers’ complaints circa 1984, but that would be a mistake. “The People’s Platform” should be taken as a challenge by the new media that have long claimed to be improving on the old order. Can they prove they are capable of supporting a sustainable cultural ecosystem, in a way that goes beyond just hosting parties at the Sundance Film Festival?
We see some of this in the tech firms that have begun to pay for original content, as with Netflix’s investments in projects like “Orange Is the New Black.” It’s also worth pointing out that the support of culture is actually pretty cheap. Consider the nonprofit ProPublica, which employs investigative journalists, and has already won two Pulitzers, all on a budget of just over $10 million a year. That kind of money is a rounding error for much of Silicon Valley, where losing billions on bad acquisitions is routinely defended as “strategic.” If Google, Apple, Facebook and Amazon truly believe they’re better than the old guard, let’s see it.”
See : THE PEOPLE’S PLATFORM. Taking Back Power and Culture in the Digital Age By Astra Taylor, 276 pp. Metropolitan Books/Henry Holt & Company.

Incentivizing Peer Review


In Wired, on “The Last Obstacle for Open Access Science”: “The Galapagos Islands’ Charles Darwin Foundation runs on an annual operating budget of about $3.5 million. With this money, the center conducts conservation research, enacts species-saving interventions, and provides educational resources about the fragile island ecosystems. As a science-based enterprise whose work would benefit greatly from the latest research findings on ecological management, evolution, and invasive species, there’s one glaring hole in the Foundation’s budget: the $800,000 it would cost per year for subscriptions to leading academic journals.
According to Richard Price, founder and CEO of Academia.edu, this episode is symptomatic of a larger problem. “A lot of research centers” – NGOs, academic institutions in the developing world – “are just out in the cold as far as access to top journals is concerned,” says Price. “Research is being commoditized, and it’s just another aspect of the digital divide between the haves and have-nots.”
 
Academia.edu is a key player in the movement toward open access scientific publishing, with over 11 million participants who have uploaded nearly 3 million scientific papers to the site. It’s easy to understand Price’s frustration with the current model, in which academics donate their time to review articles, pay for the right to publish articles, and pay for access to articles. According to Price, journals charge an average of $4000 per article: $1500 for production costs (reformatting, designing), $1500 to orchestrate peer review (labor costs for hiring editors, administrators), and $1000 of profit.
“If there were no legacy in the scientific publishing industry, and we were looking at the best way to disseminate and view scientific results,” proposes Price, “things would look very different. Our vision is to build a complete replacement for scientific publishing,” one that would allow budget-constrained organizations like the CDF full access to information that directly impacts their work.
But getting to a sustainable new world order requires a thorough overhaul of the academic publishing industry. The alternative vision – of “open science” – has two key properties: the uninhibited sharing of research findings, and a new peer review system that incorporates the best of the scientific community’s feedback. Several groups have made progress on the former, but the latter has proven particularly difficult given the current incentive structure. The currency of scientific research is the number of papers you’ve published and their citation counts – the number of times other researchers have referred to your work in their own publications. The emphasis is on the creation of new knowledge – a worthy goal, to be sure – but substantial contributions to the quality, packaging, and contextualization of that knowledge in the form of peer review go largely unrecognized. As a result, researchers view their role as reviewers as a chore, a time-consuming task required to sustain the ecosystem of research dissemination.
“Several experiments in this space have tried to incorporate online comment systems,” explains Price, “and the result is that putting a comment box online and expecting high quality comments to flood in is just unrealistic. My preference is to come up with a system where you’re just as motivated to share your feedback on a paper as you are to share your own findings.” In order to make this lofty aim a reality, reviewers’ contributions would need to be recognized. “You need something more nuanced, and more qualitative,” says Price. “For example, maybe you gather reputation points from your community online.” Translating such metrics into tangible benefits up the food chain – hirings, tenure decisions, awards – is a broader community shift that will no doubt take time.
A more iterative peer review process could allow the community to better police faulty methods by crowdsourcing their evaluation. “90% of scientific studies are not reproducible,” claims Price, a problem exacerbated by the strong bias toward positive results. Journals may be unlikely to publish methodological refutations, but a flurry of well-supported comments attached to a paper online could convince the researchers to marshal more convincing evidence. Typically, this sort of feedback cycle takes years….”