New paper by Microsoft Research (Omar Alonso, Catherine C. Marshall, and Marc Najork): “Twitter has evolved into a significant communication nexus, coupling personal and highly contextual utterances with local news, memes, celebrity gossip, headlines, and other microblogging subgenres. If we take Twitter as a large and varied dynamic collection, how can we predict which tweets will be interesting to a broad audience in advance of lagging social indicators of interest such as retweets? The telegraphic form of tweets, coupled with the subjective notion of interestingness, makes it difficult for human judges to agree on which tweets are indeed interesting.
In this paper, we address two questions: Can we develop a reliable strategy that results in high-quality labels for a collection of tweets, and can we use this labeled collection to predict a tweet’s interestingness?
To answer the first question, we performed a series of studies using crowdsourcing to reach a diverse set of workers who served as a proxy for an audience with variable interests and perspectives. This method allowed us to explore different labeling strategies, including varying the judges, the labels they applied, the datasets, and other aspects of the task.
To address the second question, we used crowdsourcing to assemble a set of tweets rated as interesting or not; we scored these tweets using textual and contextual features; and we used these scores as inputs to a binary classifier. We were able to achieve moderate agreement (kappa = 0.52) between the best classifier and the human assessments, a figure which reflects the challenges of the judgment task.”
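For readers unfamiliar with the kappa statistic, here is a minimal sketch of how chance-corrected agreement between a classifier and human labels can be computed. It assumes Cohen's kappa for two raters, which the excerpt does not specify, and the label lists are hypothetical toy data, not the paper's.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of items on which both raters agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if both raters labelled at random with their own label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (observed - expected) / (1 - expected)


# Hypothetical toy data: 1 = interesting, 0 = not interesting.
human = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
classifier = [1, 0, 0, 0, 1, 0, 1, 1, 0, 1]
kappa = cohens_kappa(human, classifier)
print(f"kappa = {kappa:.2f}")
```

On the commonly used Landis and Koch scale, values between 0.41 and 0.60 are read as moderate agreement, which is the band the reported 0.52 falls into.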
Defining Open Data
“As the open data movement grows, and even more governments and organisations sign up to open data, it becomes ever more important that there is a clear and agreed definition for what “open data” means if we are to realise the full benefits of openness, and avoid the risks of creating incompatibility between projects and splintering the community.
Open can apply to information from any source and about any topic. Anyone can release their data under an open licence for free use by and benefit to the public. Although we may think mostly about government and public sector bodies releasing public information such as budgets or maps, or researchers sharing their results data and publications, any organisation can open information (corporations, universities, NGOs, startups, charities, community groups and individuals).
Read more about different kinds of data in our one-page introduction to open data.
There is open information in transport, science, products, education, sustainability, maps, legislation, libraries, economics, culture, development, business, design, finance …. So the explanation of what open means applies to all of these information sources and types. Open may also apply both to data – big data and small data – and to content, like images, text and music!
So here we set out clearly what open means, and why this agreed definition is vital for us to collaborate, share and scale as open data and open content grow and reach new communities.
What is Open?
The full Open Definition provides a precise definition of what open data is. There are 2 important elements to openness:
- Legal openness: you must be allowed to get the data legally, to build on it, and to share it. Legal openness is usually provided by applying an appropriate (open) license which allows for free access to and reuse of the data, or by placing data into the public domain.
- Technical openness: there should be no technical barriers to using that data. For example, providing data as printouts on paper (or as tables in PDF documents) makes the information extremely difficult to work with. So the Open Definition has various requirements for “technical openness,” such as requiring that data be machine readable and available in bulk.”…
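To make the "machine readable" requirement concrete, here is a small sketch of what it buys you. The budget table is hypothetical and stands in for any open dataset published as CSV. The same figures locked in a PDF table would need manual re-keying before this one-line total could be computed.

```python
import csv
import io

# Hypothetical budget data as it might be published in a machine-readable CSV file.
raw = """department,year,amount
health,2013,1200
education,2013,950
transport,2013,800
"""

# Parse the rows into dictionaries keyed by the header line.
rows = list(csv.DictReader(io.StringIO(raw)))

# Because the data is structured, reuse is a single expression.
total = sum(int(row["amount"]) for row in rows)
print(f"{len(rows)} departments, total budget {total}")
```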
The role of task difficulty in the effectiveness of collective intelligence
New article by Christian Wagner: “The article presents a framework and empirical investigation to demonstrate the role of task difficulty in the effectiveness of collective intelligence. The research contends that collective intelligence, a form of community engagement to address problem solving tasks, can be superior to individual judgment and choice, but only when the addressed tasks are in a range of appropriate difficulty, which we label the “collective range”. Outside of that difficulty range, collectives will perform about as poorly as individuals for high difficulty tasks, or only marginally better than individuals for low difficulty tasks. An empirical investigation with subjects randomly recruited online supports our conjecture. Our findings qualify prior research on the strength of collective intelligence in general and offer preliminary insights into the mechanisms that enable individuals and collectives to arrive at good solutions. Within the framework of digital ecosystems, the paper argues that collective intelligence has more survival strength than individual intelligence, with highest sustainability for tasks of medium difficulty.”
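The idea of a "collective range" can be illustrated with a textbook majority-vote model (the Condorcet jury setting). This is a stand-in chosen for illustration, not Wagner's framework. It assumes a binary-choice task, independent voters, and that task difficulty can be summarised by the probability `p` that one individual answers correctly. The group size and the three values of `p` are arbitrary.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent voters is correct (n odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd group size to avoid ties")
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))


group_size = 11  # arbitrary illustrative group size
gains = {}
# p = 0.50: hard task (individuals guess); 0.65: medium; 0.95: easy.
for p in (0.50, 0.65, 0.95):
    collective = majority_accuracy(p, group_size)
    gains[p] = collective - p  # advantage of the collective over one individual
    print(f"individual {p:.2f} -> collective {collective:.3f} (gain {gains[p]:+.3f})")
```

In this toy model the collective gains nothing when individuals are guessing, gains little when the task is already easy, and gains most in between, which mirrors the qualitative pattern the abstract describes.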
A New Kind of Economy is Born – Social Decision-Makers Beat the "Homo Economicus"
A new paper by Dirk Helbing: “The Internet and Social Media change our way of decision-making. We are no longer the independent decision makers we used to be. Instead, we have become networked minds, social decision-makers, more than ever before. This has several fundamental implications. First of all, our economic theories must change, and second, our economic institutions must be adapted to support the social decision-maker, the “homo socialis”, rather than tailored to the perfect egoist, known as “homo economicus”….
Such developments will eventually create a participatory market society. “Prosumers”, i.e. co-producing consumers, the new “makers” movement, and the sharing economy are some examples illustrating this. Just think of the success of Wikipedia, OpenStreetMap or GitHub. OpenStreetMap now provides the most up-to-date maps of the world, thanks to more than 1 million volunteers.
This is just the beginning of a new era, where production and public engagement will more and more happen in a bottom-up way through fluid “projects”, where people can contribute as leaders (“entrepreneurs”) or participants. A new intellectual framework is emerging, and a creative and participatory era is ahead.
The paradigm shift towards participatory bottom-up self-regulation may be bigger than the paradigm shift from a geocentric to a heliocentric worldview. If we build the right institutions for the information society of the 21st century, we will finally be able to mitigate some very old problems of humanity. “Tragedies of the commons” are just one of them. After so many centuries, they are still plaguing us, but this needn’t be.”
Social media analytics for future oriented policy making
New paper by Verena Grubmüller, Katharina Götsch, and Bernhard Krieger: “Research indicates that evidence-based policy making is most successful when public administrators refer to diversified information portfolios. With the rising prominence of social media in the last decade, this paper argues that governments can benefit from integrating this publicly available, user-generated data through the technique of social media analytics (SMA). There are already several initiatives set up to predict future policy issues, e.g. for the policy fields of crisis mitigation or migrant integration insights. The authors analyse these endeavours and their potential for providing more efficient and effective public policies. Furthermore, they scrutinise the challenges to governmental SMA usage, in particular with regard to legal and ethical aspects. Reflecting the latter, this paper provides forward-looking recommendations on how these technologies can best be used for future policy making in a legally and ethically sound manner.”
Undefined By Data: A Survey of Big Data Definitions
Using Participatory Crowdsourcing in South Africa to Create a Safer Living Environment
“The study illustrates how participatory crowdsourcing (specifically humans as sensors) can be used as a Smart City initiative focusing on public safety by illustrating what is required to contribute to the Smart City, and developing a roadmap in the form of a model to assist decision making when selecting an optimal crowdsourcing initiative. Public safety data quality criteria were developed to assess and identify the problems affecting data quality.
This study is guided by design science methodology and applies three driving theories: the Data Information Knowledge Action Result (DIKAR) model, the characteristics of a Smart City, and a credible Data Quality Framework. Four critical success factors were developed to ensure high quality public safety data is collected through participatory crowdsourcing utilising voice technologies.”
Digital Participation – The Case of the Italian 'Dialogue with Citizens'
New paper by Gianluca Sgueo presented at Democracy and Technology – Europe in Tension from the 19th to the 21st Century – Sorbonne Paris, 2013: “This paper focuses on the initiative named “Dialogue With Citizens” that the Italian Government introduced in 2012. The Dialogue was an entirely web-based experiment of participatory democracy aimed, first, at informing citizens through documents and in-depth analysis and, second, at answering their questions and requests. During the year and a half of the initiative’s life, roughly 90,000 people wrote (approximately 5,000 messages/month). Additionally, almost 200,000 participated in a number of public online consultations that the government launched in concomitance with the adoption of crucial decisions (i.e. the spending review national program).
From the analysis of this experiment of participatory democracy three questions can be raised. (1) How can a public institution maximize the profits of participation and minimize its costs? (2) How can public administrations manage the (growing) expectations of the citizens once they become accustomed to participation? (3) Is online participatory democracy going to develop further, and why?
In order to fully answer such questions, the paper proceeds as follows: it will initially provide a general overview of online public participation both at the central and the local level. It will then discuss the “Dialogue with Citizens” and a selected number of online public consultations led by the Italian government in 2012. The conclusions will develop a theoretical framework for reflection on the peculiarities and problems of web participation.”
Mobile phone data are a treasure-trove for development
Paul van der Boor and Amy Wesolowski in SciDevNet: “Each of us generates streams of digital information, a digital ‘exhaust trail’ that provides real-time information to guide decisions that affect our lives. For example, Google informs us about traffic by using both its ‘My Location’ feature on mobile phones and third-party databases to aggregate location data. BBVA, one of Spain’s largest banks, analyses transactions such as credit card payments as well as ATM withdrawals to find out when and where peak spending occurs. This type of data harvest is of great value. But, often, there is so much data that its owners lack the know-how to process it and fail to realise its potential value to policymakers.
Meanwhile, many countries, particularly in the developing world, have a dearth of information. In resource-poor nations, the public sector often lives in an analogue world where piles of paper impede operations and policymakers are hindered by uncertainty about their own strengths and capabilities. Nonetheless, mobile phones have quickly pervaded the lives of even the poorest: 75 per cent of the world’s 5.5 billion mobile subscriptions are in emerging markets. These people are also generating digital trails of anything from their movements to mobile phone top-up patterns. It may seem that putting this information to use would take vast analytical capacity. But using relatively simple methods, researchers can analyse existing mobile phone data, especially in poor countries, to improve decision-making.
Think of existing, available data as low-hanging fruit that we — two graduate students — could analyse in less than a month. This is not a test of data-scientist prowess, but more a way of saying that anyone could do it.
There are three areas that should be ‘low-hanging fruit’ in terms of their potential to dramatically improve decision-making in information-poor countries: coupling healthcare data with mobile phone data to predict disease outbreaks; using mobile phone money transactions and top-up data to assess economic growth; and predicting travel patterns after a natural disaster using historical movement patterns from mobile phone data to design robust response programmes.
Another possibility is using call-data records to analyse urban movement to identify traffic congestion points. Nationally, this can be used to prioritise infrastructure projects such as road expansion and bridge building.
The information that these analyses could provide would be lifesaving — not just informative or revenue-increasing, like much of this work currently performed in developed countries.
But some work of high social value is being done. For example, different teams of European and US researchers are trying to estimate the links between mobile phone use and regional economic development. They are using various techniques, such as merging night-time satellite imagery from NASA with mobile phone data to create behavioural fingerprints. They have found that this may be a cost-effective way to understand a country’s economic activity and, potentially, guide government spending.
Another example is given by researchers (including one of this article’s authors) who have analysed call-data records from subscribers in Kenya to understand malaria transmission within the country and design better strategies for its elimination. [1]
In this study, published in Science, the location data of the mobile phones of more than 14 million Kenyan subscribers was combined with national malaria prevalence data. After identifying the sources and sinks of malaria parasites and overlaying these with phone movements, analysis was used to identify likely transmission corridors. UK scientists later used similar methods to create different epidemic scenarios for Côte d’Ivoire.”
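The source-and-sink idea can be sketched in a few lines. This is a deliberately crude illustration and not the method of the study: it assumes each trip carries parasites with a probability equal to the prevalence in the region of origin, and the region names, prevalence values and trip counts are all hypothetical.

```python
# Hypothetical regional malaria prevalence (share of residents infected).
prevalence = {"lake": 0.30, "coast": 0.10, "capital": 0.01}

# Hypothetical trip counts between regions, as might be derived from call-data records.
trips = {
    ("lake", "capital"): 1000,
    ("capital", "lake"): 900,
    ("coast", "capital"): 500,
    ("capital", "coast"): 600,
    ("lake", "coast"): 100,
    ("coast", "lake"): 100,
}


def net_export(prevalence, trips):
    """Expected infected trips leaving each region minus those arriving."""
    net = {region: 0.0 for region in prevalence}
    for (origin, destination), count in trips.items():
        carried = count * prevalence[origin]  # expected infected travellers on this route
        net[origin] += carried
        net[destination] -= carried
    return net


net = net_export(prevalence, trips)
for region, value in sorted(net.items(), key=lambda item: -item[1]):
    role = "source" if value > 0 else "sink"
    print(f"{region:8s} {value:+8.1f}  {role}")
```

Regions with a positive balance export more expected infections than they import (sources), and regions with a negative balance are net importers (sinks). The routes with the largest `carried` values would be the candidate transmission corridors.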
Prizes and Productivity: How Winning the Fields Medal Affects Scientific Output
New NBER working paper by George J. Borjas and Kirk B. Doran: “Knowledge generation is key to economic growth, and scientific prizes are designed to encourage it. But how does winning a prestigious prize affect future output? We compare the productivity of Fields medalists (winners of the top mathematics prize) to that of similarly brilliant contenders. The two groups have similar publication rates until the award year, after which the winners’ productivity declines. The medalists begin to “play the field,” studying unfamiliar topics at the expense of writing papers. It appears that tournaments can have large post-prize effects on the effort allocation of knowledge producers.”