Selected Readings on Sentiment Analysis


The Living Library’s Selected Readings series seeks to build a knowledge base on innovative approaches for improving the effectiveness and legitimacy of governance. This curated and annotated collection of recommended works on the topic of sentiment analysis was originally published in 2014.

Sentiment Analysis is a field of computer science that uses techniques from natural language processing, computational linguistics, and machine learning to identify and extract subjective meaning from text. The term opinion mining is often used interchangeably with Sentiment Analysis, although opinion mining is technically a subfield focused on the extraction of opinions (the umbrella under which sentiment, evaluation, appraisal, attitude, and emotion all lie).
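To give a concrete (if highly simplified) sense of what these techniques do, the sketch below scores text with a small polarity lexicon; the word lists and scoring rule are illustrative assumptions, not a production approach:

```python
# Minimal lexicon-based sentiment sketch (illustrative only; real systems use
# trained classifiers and far larger lexicons).
POSITIVE = {"good", "great", "excellent", "support", "agree", "love"}
NEGATIVE = {"bad", "terrible", "poor", "oppose", "disagree", "hate"}

def polarity(text: str) -> str:
    """Return 'positive', 'negative', or 'neutral' for a piece of text."""
    words = [w.strip(".,!?") for w in text.lower().split()]
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(polarity("I strongly support this excellent proposal"))  # -> positive
```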

The rise of Web 2.0 and increased information flow has led to growing interest in Sentiment Analysis, especially as applied to social networks and media. Events that cause large spikes in media activity, such as the 2012 Presidential Election Debates, are especially ripe for analysis. Such analyses raise a variety of implications for the future of crowd participation, elections, and governance.

Annotated Selected Reading List (in alphabetical order)

Choi, Eunsol et al. “Hedge detection as a lens on framing in the GMO debates: a position paper.” Proceedings of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics 13 Jul. 2012: 70-79. http://bit.ly/1wweftP

  • Understanding the ways in which participants in public discussions frame their arguments is important for understanding how public opinion is formed. This paper adopts the position that it is time for more computationally-oriented research on problems involving framing. In the interests of furthering that goal, the authors propose the following question: In the controversy regarding the use of genetically-modified organisms (GMOs) in agriculture, do pro- and anti-GMO articles differ in whether they choose to adopt a more “scientific” tone?
  • Prior work on the rhetoric and sociology of science suggests that hedging may distinguish popular-science text from text written by professional scientists for their colleagues. The paper proposes a detailed approach to studying whether hedge detection can be used to understand scientific framing in the GMO debates, and provides corpora to facilitate this study. Some of the preliminary analyses suggest that hedges occur less frequently in scientific discourse than in popular text, a finding that contradicts prior assertions in the literature. (A toy illustration of hedge-cue counting appears after this entry.)
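The paper’s corpora and annotation scheme are not reproduced here; the toy baseline below simply counts hedge cues per 100 tokens, with a cue list that is an illustrative assumption rather than the authors’ lexicon:

```python
# Toy hedge-detection baseline: hedge cues per 100 tokens.
# The cue list is illustrative; Choi et al. use annotated corpora and richer features.
HEDGE_CUES = {"may", "might", "could", "suggest", "suggests", "possibly",
              "appear", "appears", "likely", "perhaps", "indicate", "indicates"}

def hedge_rate(text: str) -> float:
    """Hedge cues per 100 tokens, a crude proxy for how hedged a document is."""
    tokens = [t.strip(".,;:!?") for t in text.lower().split()]
    if not tokens:
        return 0.0
    return 100.0 * sum(t in HEDGE_CUES for t in tokens) / len(tokens)

scientific = "The data suggest that the transgene may reduce yield in some conditions."
popular = "GMOs destroy the environment and ruin farmers."
print(hedge_rate(scientific), hedge_rate(popular))  # the first text hedges far more
```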

Michael, Christina, Francesca Toni, and Krysia Broda. “Sentiment analysis for debates.” (Unpublished MSc thesis). Department of Computing, Imperial College London (2013). http://bit.ly/Wi86Xv

  • This project aims to expand on existing solutions for automatic sentiment analysis of text in order to capture support/opposition and agreement/disagreement in debates. In addition, it looks at visualizing the classification results to make debates easier to understand and to reveal underlying trends. Finally, it evaluates the proposed techniques on an existing debate system for social networking.

Murakami, Akiko, and Rudy Raymond. “Support or oppose?: classifying positions in online debates from reply activities and opinion expressions.” Proceedings of the 23rd International Conference on Computational Linguistics: Posters 23 Aug. 2010: 869-875. https://bit.ly/2Eicfnm

  • In this paper, the authors propose a method for identifying the general positions of users in online debates, i.e., whether they support or oppose the main topic of the debate, by exploiting local information in their remarks within the debate. An online debate is a forum where each user posts an opinion on a particular topic while other users state their positions by posting their own remarks. Supporting or opposing remarks are made either by replying directly to the opinion or indirectly to other remarks (to express local agreement or disagreement), which makes the task of identifying users’ general positions difficult.
  • A prior study showed that a link-based method, which completely ignores the content of the remarks, can achieve higher accuracy on this identification task than methods based solely on the remarks’ contents. This paper shows that incorporating the textual content of the remarks into the link-based method yields higher accuracy still (a rough sketch of this hybrid idea follows below).
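The summary does not spell out the paper’s exact model; the following sketch illustrates only the general idea of combining reply-link structure with textual content. The word lists, data structures, and the assumption that a reply tends to oppose its parent are illustrative, not taken from Murakami and Raymond:

```python
# Illustrative combination of textual and reply-link signals for stance classification.
# This is NOT the paper's method; it only sketches how the two cues can be mixed.
SUPPORT_WORDS = {"agree", "support", "right", "exactly"}
OPPOSE_WORDS = {"disagree", "oppose", "wrong", "nonsense"}

def text_score(remark: str) -> int:
    words = [w.strip(".,!?") for w in remark.lower().split()]
    return sum(w in SUPPORT_WORDS for w in words) - sum(w in OPPOSE_WORDS for w in words)

def classify(remarks, replies_to, stances):
    """remarks: user -> remark text; replies_to: user -> user replied to (or None);
    stances: dict of already-known stances (e.g. the thread starter), updated in place."""
    for user, remark in remarks.items():
        if user in stances:
            continue
        score = text_score(remark)
        parent = replies_to.get(user)
        if score == 0 and parent in stances:
            # Weak textual signal: fall back on the link cue and assume a reply
            # locally opposes the stance of the remark it answers.
            stances[user] = "oppose" if stances[parent] == "support" else "support"
        else:
            stances[user] = "support" if score >= 0 else "oppose"
    return stances

remarks = {"alice": "I completely agree with this.",
           "bob": "That argument is just wrong.",
           "carol": "Interesting point."}
replies_to = {"alice": "op", "bob": "op", "carol": "bob"}
print(classify(remarks, replies_to, {"op": "support"}))
# carol's text is neutral, so her stance comes from opposing bob's
```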

Pang, Bo, and Lillian Lee. “Opinion mining and sentiment analysis.” Foundations and Trends in Information Retrieval 2.1-2 (2008): 1-135. http://bit.ly/UaCBwD

  • This survey covers techniques and approaches that promise to directly enable opinion-oriented information-seeking systems. Its focus is on methods that seek to address the new challenges raised by sentiment-aware applications, as compared to those that are already present in more traditional fact-based analysis. It includes material on summarization of evaluative text and on broader issues regarding privacy, manipulation, and economic impact that the development of opinion-oriented information-access services gives rise to. To facilitate future work, a discussion of available resources, benchmark datasets, and evaluation campaigns is also provided.

Ranade, Sarvesh et al. “Online debate summarization using topic directed sentiment analysis.” Proceedings of the Second International Workshop on Issues of Sentiment Discovery and Opinion Mining 11 Aug. 2013: 7. http://bit.ly/1nbKtLn

  • Social networking sites provide users with a virtual community interaction platform to share their thoughts, life experiences, and opinions. Online debate forums are one such platform, where people can take a stance and argue in support of or opposition to debate topics. An important feature of such forums is that they are dynamic and grow rapidly. In such situations, effective opinion summarization approaches are needed so that readers need not go through the entire debate.
  • This paper aims to summarize online debates by extracting highly topic-relevant and sentiment-rich sentences. The proposed approach takes into account topic-relevant, document-relevant, and sentiment-based features to capture topic-opinionated sentences. ROUGE (Recall-Oriented Understudy for Gisting Evaluation, a set of metrics and a software package for comparing an automatically produced summary or translation against human-produced ones) scores are used to evaluate the system; a minimal illustration of the underlying computation follows this entry. The system significantly outperforms several baseline systems and shows improvement over the state-of-the-art opinion summarization system. The results verify that topic-directed sentiment features are the most important for generating effective debate summaries.
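For readers unfamiliar with the metric, ROUGE-N is essentially n-gram recall against a human-written reference; the snippet below computes ROUGE-1 recall only and omits the precision, F-score, and longest-common-subsequence variants that the full ROUGE package reports:

```python
from collections import Counter

def rouge_1_recall(candidate: str, reference: str) -> float:
    """Unigram recall: fraction of reference words (with multiplicity) also in the candidate."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    if not ref:
        return 0.0
    overlap = sum(min(cand[w], count) for w, count in ref.items())
    return overlap / sum(ref.values())

print(rouge_1_recall("the debate favors stricter gun laws",
                     "most participants favor stricter gun control laws"))  # ~0.43
```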

Schneider, Jodi. “Automated argumentation mining to the rescue? Envisioning argumentation and decision-making support for debates in open online collaboration communities.” http://bit.ly/1mi7ztx

  • Argumentation mining, a relatively new area of discourse analysis, involves automatically identifying and structuring arguments. Following a basic introduction to argumentation, the authors describe a new possible domain for argumentation mining: debates in open online collaboration communities.
  • Based on the authors’ experience with manual annotation of arguments in debates, they propose argumentation mining as the basis for three kinds of support tools: for authoring more persuasive arguments, finding weaknesses in others’ arguments, and summarizing a debate’s overall conclusions.

What ‘urban physics’ could tell us about how cities work


Ruth Graham at the Boston Globe: “What does a city look like? If you’re walking down the street, perhaps it looks like people and storefronts. Viewed from higher up, patterns begin to emerge: A three-dimensional grid of buildings divided by alleys, streets, and sidewalks, nearly flat in some places and scraping the sky in others. Pull back far enough, and the city starts to look like something else entirely: a cluster of molecules.

At least, that’s what it looks like to Franz-Josef Ulm, an engineering professor at the Massachusetts Institute of Technology. Ulm has built a career as an expert on the properties, patterns, and environmental potential of concrete. Taking a coffee break at MIT’s Stata Center late one afternoon, he and a colleague were looking at a large aerial photograph of a city when they had a “eureka” moment: “Hey, doesn’t that look like a molecular structure?”
With colleagues, Ulm began analyzing cities the way you’d analyze a material, looking at factors such as the arrangement of buildings, each building’s center of mass, and how they’re ordered around each other. They concluded that cities could be grouped into categories: Boston’s structure, for example, looks a lot like an “amorphous liquid.” Seattle is another liquid, and so is Los Angeles. Chicago, which was designed on a grid, looks like glass, he says; New York resembles a highly ordered crystal.
So far Ulm and his fellow researchers have presented their work at conferences, but it has not yet been published in a scientific journal. If the analogy does hold up, Ulm hopes it will give planners a new tool to understand a city’s structure, its energy use, and possibly even its resilience to climate change.
Ulm calls his new work “urban physics,” and it places him among a number of scientists now using the tools of physics to analyze the practically infinite amount of data that cities produce in the 21st century, from population density to the number of patents produced to energy bill charges. Physicist Marta González, Ulm’s colleague at MIT, recently used cellphone data to analyze traffic patterns in Boston with unprecedented complexity, for example. In 2012, a theoretical physicist was named founding director of New York University’s Center for Urban Science and Progress, whose research is devoted to “urban informatics”; one of its first projects is helping to create the country’s first “quantified community” on the West Side of Manhattan.
In Ulm’s case, he and his colleagues have used freely available data, including street layouts and building coordinates, to plot the structures of 12 cities and analogize them to existing complex materials. In physics, an “order parameter” is a number between 0 and 1 that describes how atoms are arranged in relationship to other atoms nearby; Ulm applies this idea to city layouts. Boston, he says, has an “order parameter” of .52, equivalent to that of a liquid like water. This means its structure is notably disordered, which may have something to do with how it developed. “Boston has grown organically,” he said. “The city, in the way its buildings are organized today, carries that information from its historical evolution.”…
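The article does not give Ulm’s formula, so the sketch below only conveys the flavor of such a calculation: a generic, tetratic-style order parameter over building orientations (an assumption for illustration, not necessarily Ulm’s definition) returns 1 for a perfectly aligned grid and values near 0 for a scattered, “liquid-like” layout. The angle data are invented:

```python
import cmath
import math

def orientation_order_parameter(angles_deg):
    """Magnitude of the mean of exp(4i*theta) over building orientations, treating
    directions 90 degrees apart as equivalent (as in a rectangular street grid).
    1.0 = perfectly grid-aligned; values near 0 = orientations scattered.
    A generic physics-style order parameter, not necessarily Ulm's definition."""
    phases = [cmath.exp(4j * math.radians(a)) for a in angles_deg]
    return abs(sum(phases) / len(phases))

grid_city = [0, 90, 0, 90, 0, 90]          # buildings aligned to two perpendicular axes
organic_city = [12, 75, 33, 158, 101, 49]  # orientations scattered, as in an organically grown core

print(round(orientation_order_parameter(grid_city), 2))     # -> 1.0
print(round(orientation_order_parameter(organic_city), 2))  # -> close to 0
```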

Generative Emergence: A New Discipline of Organizational, Entrepreneurial, and Social Innovation


New book by Benyamin Lichtenstein: “Culminating more than 30 years of research into evolution, complexity science, organizing and entrepreneurship, this book provides insights to scholars who are increasingly using emergence to explain social phenomena. In addition to providing the first comprehensive definition and framework for understanding emergence, it is the first publication of data from a year-long experimental study of emergence in high-potential ventures—a week-by-week longitudinal analysis of their processes based on over 750 interviews and 1000 hours of on-site observation. These data, combined with reports from over a dozen other studies, confirm the dynamics of the five-phase model in multiple contexts…

Key insights from the book include:

  • Findings that show a major difference between an aspiration that generates a purposive drive for generative emergence and a performance-driven crisis that sparks organizational change and transformation. This difference has important implications for studies of entrepreneurship, innovation, and social change.
  • A definition of emergence based on 100+ years of work in philosophy and philosophy of science, evolutionary studies, sociology, and organization science.
  • The most inclusive review of complexity science yet published, intended to help reinvigorate and legitimize those methods in the social sciences.
  • The Dynamic States Model—a new approach for understanding the non-linear growth and development of new ventures.
  • In-depth examinations of more than twenty well-known emergence studies, to reveal their shared dynamics and underlying drivers.
  • Proposals for applying the five-phase model—as a logic of emergence—to social innovation, organizational leadership, and entrepreneurial development.”

Crowd-Sourced Augmented Realities: Social Media and the Power of Digital Representation


Pre-publication version of a chapter by Matthew Zook, Mark Graham and Andrew Boulton in S. Mains, J. Cupples, and C. Lukinbeal, Mediated Geographies/Geographies of Media. Springer Science International Handbooks in Human Geography, (Forthcoming): “A key and distinguishing feature of society today is that it is increasingly documented by crowd-sourced social media discourse about public experiences. Much of this social media content is geo-referenced and exists in layers of information draped over the physical world, invisible to the naked eye but accessible to a range of digital (and often mobile) devices. When we access these information layers, they mediate the mundane practices of everyday life (e.g., What or who is nearby? How do I move from point A to B?) through the creation of augmented realities, i.e., unstable, context-dependent representations of places brought temporarily into being by combining the space of material and virtual experience.
These augmented realities, as particular representations of locations, places and events, are vigorously promoted or contested and thus become important spots in which power is exercised, much in the same way that maps have long had power to reinforce or challenge the status quo. However, because many of the processes and practices behind the creation of augmented realities are unseen, their power is often overlooked in the process of representation or place-making. This paper highlights the points at which power acts and demonstrates that all representations of place – including augmented realities derived from social media – are products of, and productive of, social relationships and associated power relations.”
Building upon a case study of Abbottabad, Pakistan, after the raid on Osama bin Laden’s compound, the authors construct a four-part typology of the power relations emerging from social practices that enact augmented realities. These include: Distributed power, the complex and socially/spatially distributed authorship of user-generated geospatial content; Communication power, the ways in which particular representations gain prominence (language is a particularly key variable); Code power, the autonomy of software code to regulate actions, mediate content, or order representations in particular ways; and Timeless power, the ways in which digital representations of place reconfigure temporal relationships, particularly sequence and duration, between people and events.

Business Models That Take Advantage of Open Data Opportunities


Mark Boyd at ProgrammableWeb: “At last week’s OKFestival in Berlin, Kat Borlongan and Chloé Bonnet from Parisian open data startup Five By Five moderated an interactive speed-geek session to examine how startups are building viability using open data and open data APIs. The picture that emerged revealed a variety of composite approaches being used, with all those presenting having just one thing in common: a commitment to fostering ecosystems that will allow other startups to build alongside them.
The OKFestival—hosted by the Open Knowledge Foundation—brought together more than 1,000 participants from around the globe working on various aspects of the open data agenda: the use of corporate data, open science research, government open data and crowdsourced data projects.
In a session held on the first day of the event, Borlongan facilitated an interactive workshop to help would-be entrepreneurs understand how startups are building business models that take advantage of open data opportunities to create sustainable, employment-generating businesses.
Citing research from the McKinsey Institute that calculates the value of open data to be worth $3 trillion globally, Borlongan said: “So the understanding of the open data process is usually: We throw open data over the wall, then we hold a hackathon, and then people will start making products off it, and then we make the $3 trillion.”
Borlongan argued that it is actually a “blurry identity to be an open data startup” and encouraged participants to unpack, with each of the startups presenting, exactly how income can be generated and a viable business built in this space.
Jeni Tennison, from the U.K.’s Open Data Institute (which supports 15 businesses in its Startup Programme), categorizes two types of business models:

  1. Businesses that publish (but do not sell) open data.
  2. Businesses built on top of using open data.

Businesses That Publish but Do Not Sell Open Data

At the Open Data Institute, Tennison is investigating the possibility of an open address database that would provide street address data for every property in the U.K. She describes three types of business models that could be created by projects that generated and published such data:
Freemium: In this model, the bulk data of open addresses could be made available freely, “but if you want an API service, then you would pay for it.” Tennison pointed to lots of opportunities also to degrade the freemium-level data—for example, having it available in bulk but not at a particularly granular level (unless you pay for it), or by provisioning reuse on a share-only basis, but you would pay if you wanted the data for corporate use cases (similar to how OpenCorporates sells access to its data).
Cross-subsidy: In this approach, the data would be available, and the opportunities to generate income would come from providing extra services, like consultancy or white labeling data services alongside publishing the open data.
Network: In this business model, value is created by generating a network effect around the core business interest, which may not be the open data itself. As an example, Tennison suggested that if a post office or delivery company were to create the open address database, it might be interested in encouraging private citizens to collaboratively maintain or crowdsource the quality of the data. The revenue generated by this open data would then come from reductions in the cost of delivery services as the data’s accuracy improved.

Businesses Built on Top of Open Data

Six startups working in unique ways to make use of available open data also presented their business models to OKFestival attendees: Development Seed, Mapbox, OpenDataSoft, Enigma.io, Open Bank API, and Snips.

Startup: Development Seed
What it does: Builds solutions for development, public health and citizen democracy challenges by creating open source tools and utilizing open data.
Open data API focus: Regularly uses open data APIs in its projects. For example, it worked with the World Bank to create a data visualization website built on top of the World Bank API.
Type of business model: Consultancy, but it has also created new businesses out of the products developed as part of its work, most notably Mapbox (see below).

Startup: Enigma.io
What it does: Open data platform with advanced discovery and search functions.
Open data API focus: Provides the Enigma API to allow programmatic access to all data sets and some analytics from the Enigma platform.
Type of business model: SaaS including a freemium plan with no degradation of data and with access to API calls; some venture funding; some contracting services to particular enterprises; creating new products in Enigma Labs for potential later sale.

Startup: Mapbox
What it does: Enables users to design and publish maps based on crowdsourced OpenStreetMap data.
Open data API focus: Uses OpenStreetMap APIs to draw data into its map-creation interface; provides the Mapbox API to allow programmatic creation of maps using Mapbox web services.
Type of business model: SaaS including freemium plan; some tailored contracts for big map users such as Foursquare and Evernote.

Startup: Open Bank Project
What it does: Creates an open source API for use by banks.
Open data API focus: Its core product is to build an API so that banks can use a standard, open source API tool when creating applications and web services for their clients.
Type of business model: Contract license with tiered SLAs depending on the number of applications built using the API; IT consultancy projects.

Startup: OpenDataSoft
What it does: Provides an open data publishing platform so that cities, governments, utilities and companies can publish their own data portal for internal and public use.
Open data API focus: It’s able to route data sources into the portal from a publisher’s APIs; provides automatic API-creation tools so that any data set uploaded to the portal is then available as an API.
Type of business model: SaaS model with freemium plan, pricing by number of data sets published and number of API calls made against the data, with free access for academic and civic initiatives.

Startup: Snips
What it does: Predictive modeling for smart cities.
Open data API focus: Channels some open and client proprietary data into its modeling algorithm calculations via API; provides a predictive modeling API for clients’ use to programmatically generate solutions based on their data.
Type of business model: Creating one B2C app product for sale as a revenue-generation product; individual contracts with cities and companies to solve particular pain points, such as using predictive modeling to help a post office company better manage staff rosters (matched to sales needs) and a consultancy project to create a visualization mapping tool that can predict the risk of car accidents for a city….”

The Quiet Movement to Make Government Fail Less Often


in The New York Times: “If you wanted to bestow the grandiose title of “most successful organization in modern history,” you would struggle to find a more obviously worthy nominee than the federal government of the United States.

In its earliest stirrings, it established a lasting and influential democracy. Since then, it has helped defeat totalitarianism (more than once), established the world’s currency of choice, sent men to the moon, built the Internet, nurtured the world’s largest economy, financed medical research that saved millions of lives and welcomed eager immigrants from around the world.

Of course, most Americans don’t think of their government as particularly successful. Only 19 percent say they trust the government to do the right thing most of the time, according to Gallup. Some of this mistrust reflects a healthy skepticism that Americans have always had toward centralized authority. And the disappointing economic growth of recent decades has made Americans less enamored of nearly every national institution.

But much of the mistrust really does reflect the federal government’s frequent failures – and progressives in particular will need to grapple with these failures if they want to persuade Americans to support an active government.

When the federal government is good, it’s very, very good. When it’s bad (or at least deeply inefficient), it’s the norm.

The evidence is abundant. Of the 11 large programs for low- and moderate-income people that have been subject to rigorous, randomized evaluation, only one or two show strong evidence of improving most beneficiaries’ lives. “Less than 1 percent of government spending is backed by even the most basic evidence of cost-effectiveness,” writes Peter Schuck, a Yale law professor, in his new book, “Why Government Fails So Often,” a sweeping history of policy disappointments.

As Mr. Schuck puts it, “the government has largely ignored the ‘moneyball’ revolution in which private-sector decisions are increasingly based on hard data.”

And yet there is some good news in this area, too. The explosion of available data has made evaluating success – in the government and the private sector – easier and less expensive than it used to be. At the same time, a generation of data-savvy policy makers and researchers has entered government and begun pushing it to do better. They have built on earlier efforts by the Bush and Clinton administrations.

The result is a flowering of experiments to figure out what works and what doesn’t.

New York City, Salt Lake City, New York State and Massachusetts have all begun programs to link funding for programs to their success: The more effective they are, the more money they and their backers receive. The programs span child care, job training and juvenile recidivism.

The approach is known as “pay for success,” and it’s likely to spread to Cleveland, Denver and California soon. David Cameron’s conservative government in Britain is also using it. The Obama administration likes the idea, and two House members – Todd Young, an Indiana Republican, and John Delaney, a Maryland Democrat – have introduced a modest bill to pay for a version known as “social impact bonds.”

The White House is also pushing for an expansion of randomized controlled trials to evaluate government programs. Such trials, Mr. Schuck notes, are “the gold standard” for any kind of evaluation. Using science as a model, researchers randomly select some people to enroll in a government program and others not to enroll. The researchers then study the outcomes of the two groups….”
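To make the mechanics Mr. Schuck describes concrete, a randomized evaluation reduces to random assignment followed by a comparison of average outcomes between the two groups; the simulation below uses entirely made-up numbers for a hypothetical job-training program:

```python
import random
import statistics

random.seed(42)  # reproducible illustration

def outcome(treated: bool) -> float:
    """Hypothetical earnings outcome; the simulated program adds a $1,500 average effect."""
    return random.gauss(30000, 5000) + (1500 if treated else 0)

# Randomly assign 1,000 hypothetical participants to treatment or control.
assignments = [random.random() < 0.5 for _ in range(1000)]
treated_outcomes = [outcome(True) for a in assignments if a]
control_outcomes = [outcome(False) for a in assignments if not a]

estimated_effect = statistics.mean(treated_outcomes) - statistics.mean(control_outcomes)
print(f"Estimated program effect: ${estimated_effect:,.0f}")  # close to the true $1,500
```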

No silver bullet: De-identification still doesn’t work


Arvind Narayanan and Edward W. Felten: “Paul Ohm’s 2009 article Broken Promises of Privacy spurred a debate in legal and policy circles on the appropriate response to computer science research on re-identification techniques. In this debate, the empirical research has often been misunderstood or misrepresented. A new report by Ann Cavoukian and Daniel Castro is full of such inaccuracies, despite its claims of “setting the record straight.” In a response to this piece, Ed Felten and I point out eight of our most serious points of disagreement with Cavoukian and Castro. The thrust of our arguments is that (i) there is no evidence that de-identification works either in theory or in practice and (ii) attempts to quantify its efficacy are unscientific and promote a false sense of security by assuming unrealistic, artificially constrained models of what an adversary might do. Specifically, we argue that:

  1. There is no known effective method to anonymize location data, and no evidence that it’s meaningfully achievable.
  2. Computing re-identification probabilities based on proof-of-concept demonstrations is silly.
  3. Cavoukian and Castro ignore many realistic threats by focusing narrowly on a particular model of re-identification.
  4. Cavoukian and Castro concede that de-identification is inadequate for high-dimensional data. But nowadays most interesting datasets are high-dimensional.
  5. Penetrate-and-patch is not an option.
  6. Computer science knowledge is relevant and highly available.
  7. Cavoukian and Castro apply different standards to big data and re-identification techniques.
  8. Quantification of re-identification probabilities, which permeates Cavoukian and Castro’s arguments, is a fundamentally meaningless exercise.

Data privacy is a hard problem. Data custodians face a choice between roughly three alternatives: sticking with the old habit of de-identification and hoping for the best; turning to emerging technologies like differential privacy that involve some trade-offs in utility and convenience; and using legal agreements to limit the flow and use of sensitive data. These solutions aren’t fully satisfactory, either individually or in combination, nor is any one approach the best in all circumstances. Change is difficult. When faced with the challenge of fostering data science while preventing privacy risks, the urge to preserve the status quo is understandable. However, this is incompatible with the reality of re-identification science. If a “best of both worlds” solution exists, de-identification is certainly not that solution. Instead of looking for a silver bullet, policy makers must confront hard choices.”
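For readers unfamiliar with the trade-off mentioned above, differential privacy adds calibrated noise to query answers rather than attempting to de-identify the underlying records; below is a minimal sketch of the Laplace mechanism for a counting query, with parameters chosen purely for illustration:

```python
import numpy as np

def private_count(true_count: int, epsilon: float) -> float:
    """Laplace mechanism: a counting query has sensitivity 1, so adding
    Laplace noise with scale 1/epsilon satisfies epsilon-differential privacy."""
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

true_count = 1234                              # e.g., records matching some condition
print(private_count(true_count, epsilon=0.1))  # more noise, stronger privacy
print(private_count(true_count, epsilon=1.0))  # less noise, weaker privacy
```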

Incentivizing Peer Review


in Wired on “The Last Obstacle for Open Access Science”: “The Galapagos Islands’ Charles Darwin Foundation runs on an annual operating budget of about $3.5 million. With this money, the center conducts conservation research, enacts species-saving interventions, and provides educational resources about the fragile island ecosystems. As a science-based enterprise whose work would benefit greatly from the latest research findings on ecological management, evolution, and invasive species, there’s one glaring hole in the Foundation’s budget: the $800,000 it would cost per year for subscriptions to leading academic journals.
According to Richard Price, founder and CEO of Academia.edu, this episode is symptomatic of a larger problem. “A lot of research centers” – NGOs, academic institutions in the developing world – “are just out in the cold as far as access to top journals is concerned,” says Price. “Research is being commoditized, and it’s just another aspect of the digital divide between the haves and have-nots.”
 
Academia.edu is a key player in the movement toward open access scientific publishing, with over 11 million participants who have uploaded nearly 3 million scientific papers to the site. It’s easy to understand Price’s frustration with the current model, in which academics donate their time to review articles, pay for the right to publish articles, and pay for access to articles. According to Price, journals charge an average of $4,000 per article: $1,500 for production costs (reformatting, designing), $1,500 to orchestrate peer review (labor costs for hiring editors, administrators), and $1,000 of profit.
“If there were no legacy in the scientific publishing industry, and we were looking at the best way to disseminate and view scientific results,” proposes Price, “things would look very different. Our vision is to build a complete replacement for scientific publishing,” one that would allow budget-constrained organizations like the CDF full access to information that directly impacts their work.
But getting to a sustainable new world order requires a thorough overhaul of the academic publishing industry. The alternative vision – of “open science” – has two key properties: the uninhibited sharing of research findings, and a new peer review system that incorporates the best of the scientific community’s feedback. Several groups have made progress on the former, but the latter has proven particularly difficult given the current incentive structure. The currency of scientific research is the number of papers you’ve published and their citation counts – the number of times other researchers have referred to your work in their own publications. The emphasis is on the creation of new knowledge – a worthy goal, to be sure – but substantial contributions to the quality, packaging, and contextualization of that knowledge in the form of peer review go largely unrecognized. As a result, researchers view their role as reviewers as a chore, a time-consuming task required to sustain the ecosystem of research dissemination.
“Several experiments in this space have tried to incorporate online comment systems,” explains Price, “and the result is that putting a comment box online and expecting high quality comments to flood in is just unrealistic. My preference is to come up with a system where you’re just as motivated to share your feedback on a paper as you are to share your own findings.” In order to make this lofty aim a reality, reviewers’ contributions would need to be recognized. “You need something more nuanced, and more qualitative,” says Price. “For example, maybe you gather reputation points from your community online.” Translating such metrics into tangible benefits up the food chain – hirings, tenure decisions, awards – is a broader community shift that will no doubt take time.
A more iterative peer review process could allow the community to better police faulty methods by crowdsourcing their evaluation. “90% of scientific studies are not reproducible,” claims Price; a problem that is exacerbated by the strong bias toward positive results. Journals may be unlikely to publish methodological refutations, but a flurry of well-supported comments attached to a paper online could convince the researchers to marshal more convincing evidence. Typically, this sort of feedback cycle takes years….”

Do We Choose Our Friends Because They Share Our Genes?


Rob Stein at NPR: “People often talk about how their friends feel like family. Well, there’s some new research out that suggests there’s more to that than just a feeling. People appear to be more like their friends genetically than they are to strangers, the research found.
“The striking thing here is that friends are actually significantly more similar to one another than we were expecting,” says  James Fowler, a professor of medical genetics at the University of California, San Diego, who conducted the study with Nicholas A. Christakis, a social scientist at Yale University.
In fact, the study in Monday’s issue of the Proceedings of the National Academy of Sciences found that friends are as genetically similar as fourth cousins.
“It’s as if they shared a great- great- great-grandparent in common,” Fowler told Shots.
Some of the genes that friends were most likely to have in common involve smell. “We tend to smell things the same way that our friends do,” Fowler says. The study involved nearly 2,000 adults.
This suggests that as humans evolved, the ability to tolerate and be drawn to certain smells may have influenced where people hung out. Today we might call this the Starbucks effect.
“You may really love the smell of coffee. And you’re drawn to a place where other people have been drawn to who also love the smell of coffee,” Fowler says. “And so that might be the opportunity space for you to make friends. You’re all there together because you love coffee and you make friends because you all love coffee.”…”

The open data imperative


Paper by Geoffrey Boulton in Insights: the UKSG journal: “The information revolution of recent decades is a world historical event that is changing the lives of individuals, societies and economies and with major implications for science, research and learning. It offers profound opportunities to explore phenomena that were hitherto beyond our power to resolve, and at the same time is undermining the process whereby concurrent publication of scientific concept and evidence (data) permitted scrutiny, replication and refutation and that has been the bedrock of scientific progress and of ‘self-correction’ since the inception of the first scientific journals in the 17th century. Open publication, release and sharing of data are vital habits that need to be redefined and redeveloped for the modern age by the research community if it is to exploit technological opportunities, maintain self-correction and maximize the contribution of research to human understanding and welfare.”