New portal to crowdsource captions, transcripts of old photos, national archives


Irene Tham at The Straits Times: “Wanted: history enthusiasts to caption old photographs and transcribe handwritten manuscripts that contain a piece of Singapore’s history.

They are invited to contribute to an upcoming portal that will carry some 3,000 unidentified photographs dating back to the late 1800s, and 3,000 pages of Straits Settlements records, including letters written during Sir Stamford Raffles’ administration of Singapore.

These are collections from the Government and individuals waiting to be “tagged” on the new portal – The Citizen Archivist Project at www.nas.gov.sg/citizenarchivist….

Without tagging – such as by photo captioning and digital transcription – these records cannot be searched. There are over 140,000 photos and about one million pages of Straits Settlements Records in total that cannot be searched today.

These records date back to the 1800s, and include letters written during Sir Stamford Raffles’ administration in Singapore.

“The key challenge is that they were written in elaborate cursive penmanship which is not machine-readable,” said Dr Yaacob Ibrahim, Minister for Communications and Information, adding that the knowledge and wisdom of the public can be tapped to make these documents more accessible.

Mr Arthur Fong (West Coast GRC) had asked how the Government could get young people interested in history, and Dr Yaacob said this initiative was something they would enjoy.

Portal users must first log in using their existing Facebook, Google or National Library Board accounts. Contributions will be saved in users’ profiles, automatically created upon signing in.

Transcript contributions on the portal work much like Wikipedia: contributed text will appear on the portal immediately.

However, the National Archives will take up to three days to review photo caption contributions. Approved captions will be uploaded on its website at www.nas.gov.sg/archivesonline….(More)”

On the importance of being negative


Stephen Curry in The Guardian: “The latest paper from my group, published just over a week ago in the open access journal PeerJ, reports an unusual result. It was not the result we were looking for because it was negative: our experiment failed.

Nevertheless I am pleased with the paper – negative results matter. Their value lies in mapping out blind alleys, warning other investigators not to waste their time or at least to tread carefully. The only trouble is, it can be hard to get them published.

The scientific literature has long been skewed by a preponderance of positive results, largely because journals are keen to nurture their reputations for publishing significant, exciting research – new discoveries that change the way we think about the world. They have tended to look askance at manuscripts reporting beautiful hypotheses undone by the ugly fact of experimental failure. Scientific reporting inverts the traditional values of news media: good news sells. This tendency is reinforced within academic culture because our reward mechanisms are so strongly geared to publication in the most prestigious journals. In the worst cases it can foster fraudulent or sloppy practices by scientists and journals. A complete record of reporting positive and negative results is at the heart of the AllTrials campaign to challenge the distortion of clinical trials for commercial gain….

Normally that would have been that. Our data would have sat on the computer hard-drive till the machine decayed to obsolescence and was thrown out. But now it’s easier to publish negative results, so we did. The change has come about because of the rise of online publishing through open access, which aims to make research freely available on the internet.

The most significant change is the emergence of new titles from nimble-footed publishers aiming to leverage the reduced costs of publishing digitally rather than on paper. They have created open access journals that judge research only on its originality and competency; in contrast to more traditional outlets, no attempt is made to pre-judge significance. These journals include titles such as PLOS ONE (the originator of the concept), F1000 Research, ScienceOpen, and Scientific Reports, as well as new pre-print servers, such as PeerJ Preprints or bioRxiv, which are seeking to emulate the success of the arXiv that has long served physics and maths researchers.

As far as I know, these outlets were not designed specifically for negative results, but the shift in review criteria – and their lower costs – has opened up new opportunities, and negative results are now creeping out of the laboratory in greater numbers. PLOS ONE has recently started to highlight collections of papers reporting negative findings; Elsevier, one of the more established publishers, has evidently sensed an opportunity and just launched a new journal dedicated to negative results in the plant sciences….(More)”

How Open Is University Data?


Daniel Castro at GovTech: “Many states now support open data, or data that’s made freely available without restriction in a nonproprietary, machine-readable format, to increase government transparency, improve public accountability and participation, and unlock opportunities for civic innovation. To date, 10 states have adopted open data policies, via executive order or legislation, and 24 states have built open data portals. But while many agencies have joined the open data movement, state colleges and universities have largely ignored this opportunity. To remedy this, policymakers should consider how to extend open data policies to state colleges and universities.

There are many potential benefits of open data for higher education. First, it can help prospective students and their parents better understand the value of different degree programs. One way to control rising higher ed costs is to create more informed consumers. The feds are already pushing for such changes. President Obama and Education Secretary Arne Duncan called for schools to make more information publicly available about the costs of obtaining a college degree, and the White House launched the College Scorecard, an online tool to compare data about the average tuition cost, size of loan payments and loan default rate for different schools.

But students deserve more detailed information. Prospective students should be able to decide where to attend and what to study based on historical data like program costs, percentage of students completing the program and how long they take to do so, and what kind of earning power they have after graduating.

Second, open data can aid better fiscal oversight and accountability of university operations. In 2014, states provided about $76 billion in support for higher ed, yet few colleges and universities have adopted open data policies to increase the transparency of their budgets. Contrast this with California cities like Oakland, Palo Alto and Los Angeles, which created online tools to let others explore and visualize their budgets. Additional oversight, including from the public, could help reduce fraud, waste and abuse in higher education, save taxpayers money and create more opportunities for public participation in state budgeting.

Third, open data can be a valuable resource for producing innovations that make universities a better place to work and study. Large campuses are basically small cities, and many cities have found open data useful for improving public safety and optimizing transportation services. Universities hold much untapped data: course catalogs, syllabi, bus schedules, campus menus, campus directories, faculty evaluations, etc. Creating portals to release these data sets and building application programming interfaces to access this information would give developers direct access to data that students, faculty, alumni and other stakeholders could use to build apps and services to improve the college experience….(More)”
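By way of illustration only (a hypothetical sketch, not a description of any existing campus API), even a small machine-readable course catalog would let a developer build a simple course finder in a few lines:

```python
# Hypothetical sketch: what an open course-catalog feed could enable.
# The JSON shape and field names are invented for illustration.
import json

sample_catalog = json.loads("""[
  {"code": "CS 101", "title": "Intro to Programming", "cost_per_credit": 350, "seats_open": 12},
  {"code": "HIST 210", "title": "Urban History", "cost_per_credit": 310, "seats_open": 0},
  {"code": "STAT 250", "title": "Applied Statistics", "cost_per_credit": 330, "seats_open": 4}
]""")

# List sections a student could still register for, cheapest first.
open_sections = sorted(
    (c for c in sample_catalog if c["seats_open"] > 0),
    key=lambda c: c["cost_per_credit"],
)
for course in open_sections:
    print(f'{course["code"]}: {course["title"]} ({course["seats_open"]} seats open)')
```

The same pattern extends to bus schedules, campus menus, and directories once those data sets are published in open, machine-readable formats.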

Tweets Can Predict Health Insurance Exchange Enrollment


PennMedicine: “An increase in Twitter sentiment (the positivity or negativity of tweets) is associated with an increase in state-level enrollment in the Affordable Care Act’s (ACA) health insurance marketplaces — a phenomenon that points to use of the social media platform as a real-time gauge of public opinion and provides a way for marketplaces to quickly identify enrollment changes and emerging issues. Although Twitter has previously been used to measure public perception on a range of health topics, this study, led by researchers at the Perelman School of Medicine at the University of Pennsylvania and published online in the Journal of Medical Internet Research, is the first to examine its relationship with enrollment in the new national health insurance marketplaces.

The study examined 977,303 ACA and “Obamacare”-related tweets — along with those directed toward the Twitter handle for HealthCare.gov and the 17 state-based marketplace Twitter accounts — in March 2014, then tested the correlation between Twitter sentiment and marketplace enrollment by state. Tweet sentiment was determined using the National Research Council (NRC) sentiment lexicon, which contains more than 54,000 words with corresponding sentiment weights ranging from positive to negative. For example, the word “excellent” has a positive sentiment weight, and is more positive than the word “good,” but the word “awful” is negative. Using this lexicon, researchers found that a .10 increase in the sentiment of tweets was associated with a nine percent increase in health insurance marketplace enrollment at the state level. While a .10 increase may seem small, these numbers indicate a significant correlation between Twitter sentiment and enrollment, based on the continuum of sentiment scores observed across nearly one million tweets.
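To make the mechanics concrete, here is a minimal sketch of lexicon-based tweet scoring of the kind described above. The lexicon entries, weights, and tweets are invented for illustration; the real NRC lexicon contains more than 54,000 weighted words, and the study's state-level analysis involves more than a simple average.

```python
# Illustrative lexicon-based sentiment scoring; all weights and tweets
# below are hypothetical stand-ins for the NRC lexicon and real data.
import re
from statistics import mean

SENTIMENT_LEXICON = {
    "excellent": 0.9,   # strongly positive
    "good": 0.6,        # positive, but less so than "excellent"
    "affordable": 0.5,
    "confusing": -0.4,
    "awful": -0.8,      # strongly negative
}

def tweet_sentiment(text):
    """Average the weights of lexicon words found in a tweet; 0 if none match."""
    words = re.findall(r"[a-z']+", text.lower())
    weights = [SENTIMENT_LEXICON[w] for w in words if w in SENTIMENT_LEXICON]
    return mean(weights) if weights else 0.0

tweets = [
    "Signing up on the marketplace was excellent and affordable",
    "The enrollment website is awful and confusing",
]
state_sentiment = mean(tweet_sentiment(t) for t in tweets)
print(f"mean state-level sentiment: {state_sentiment:+.2f}")
```

A state-level mean computed this way could then be correlated with that state's enrollment figures across reporting periods.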

“The correlation between Twitter sentiment and the number of eligible individuals who enrolled in a marketplace plan highlights the potential for Twitter to be a real-time monitoring strategy for future enrollment periods,” said first author Charlene A. Wong, MD, a Robert Wood Johnson Foundation Clinical Scholar and Fellow in Penn’s Leonard Davis Institute of Health Economics. “This would be especially valuable for quickly identifying emerging issues and making adjustments, instead of having to wait weeks or months for that information to be released in enrollment reports, for example.”…(More)”

Encyclopedia of Social Network Analysis and Mining


“The Encyclopedia of Social Network Analysis and Mining (ESNAM) is the first major reference work to integrate fundamental concepts and research directions in the areas of social networks and their applications to data mining. While ESNAM reflects the state of the art in social network research, the field had its start in the 1930s, when fundamental issues in social network research were broadly defined. The networks studied then were limited to relatively small numbers of nodes (actors) and links. More recently, the advent of electronic communication, and in particular of online communities, has created social networks of hitherto unimaginable sizes. People around the world are directly or indirectly connected by popular social networks established using web-based platforms rather than by physical proximity.

Reflecting the interdisciplinary nature of this unique field, the essential contributions of diverse disciplines, from computer science, mathematics, and statistics to sociology and behavioral science, are described among the 300 authoritative yet highly readable entries. Students will find a world of information and insight behind the familiar façade of the social networks in which they participate. Researchers and practitioners will benefit from a comprehensive perspective on the methodologies for analysis of constructed networks, and the data mining and machine learning techniques that have proved attractive for sophisticated knowledge discovery in complex applications. Also addressed is the application of social network methodologies to other domains, such as web networks and biological networks….(More)”

Philadelphia’s Newly Upgraded Open Data Portal


Michael Grass at Government Executive: “If you’re looking for streets where vending is prohibited in the city of Philadelphia, the city’s newly upgraded open data portal has that information. If you’re looking for information on reported bicycle thefts, the city’s open data portal has that information, too. Same goes for the city’s budget.

Philadelphia’s recently relaunched open data portal, OpenDataPhilly, has 264 data sets, applications and APIs available for the public to access and use. Much of that information comes from municipal sources.

“The redesign of OpenDataPhilly will increase access to available data, thereby enabling our citizens to become more engaged and knowledgeable and our government more accountable,” Mayor Michael Nutter said in a statement last month.

But Philadelphia’s open data portal isn’t just designed to unlock datasets at City Hall.

The city’s universities, cultural and non-profit organizations, and commercial entities are part of the portal as well. Portal users interested in historic maps of the city can access the Philadelphia GeoHistory Network, a project of the Athenaeum of Philadelphia, which maintains a tool where layers of historic maps can be overlaid on an interactive Google map.

You can even find a list of current happy hour specials, courtesy of DrinkPhilly….(More)”

Citizens Connect


Harvard Business School Case Study by Mitchell Weiss: “Funding to scale Citizens Connect, Boston’s 311 app, is both a blessing and a burden, and tests two public entrepreneurs. In 2012, the Commonwealth of Massachusetts provides Boston’s Mayor’s Office of New Urban Mechanics with a grant to scale Citizens Connect across the state. The money gives two co-creators of Citizens Connect, Chris Osgood and Nigel Jacob, a chance to grow their vision for citizen-engaged governance and civic innovation, but it also requires that the two City of Boston leaders sit on a formal selection committee that pits their original partner, Connected Bits, against another player that might meet the specific requirements for delivering a statewide version. The selection and scaling process raises questions beyond just which partner to choose. What would happen to the Citizens Connect brand as Osgood and Jacob’s product spreads across the state? Who could then best help scale their work nationally? Which business models were best positioned to drive that growth? What intellectual property arrangements would best enable it? And what role should the two city employees have, anyway, in scaling Citizens Connect outside of Boston in the first place? These questions hung in the air as they pondered the one big question about passing over Connected Bits for another partner: should they?…(More)”

Collective Intelligence or Group Think?


Paper analyzing “Engaging Participation Patterns in World without Oil” by Nassim JafariNaimi and Eric M. Meyers: “This article presents an analysis of participation patterns in an Alternate Reality Game, World Without Oil. This game aims to bring people together in an online environment to reflect on how an oil crisis might affect their lives and communities as a way to both counter such a crisis and to build collective intelligence about responding to it. We present a series of participation profiles based on a quantitative analysis of 1,554 contributions to the game narrative made by 322 players. We further qualitatively analyze a sample of these contributions. We outline the dominant themes, the majority of which engage the global oil crisis for its effects on commute options and present micro-sustainability solutions in response. We further draw on the quantitative and qualitative analysis of this space to discuss how the design of the game, specifically its framing of the problem, feedback mechanism, and absence of subject-matter expertise, counters its aim of generating collective intelligence, making it conducive to groupthink….(More)”

Wittgenstein, #TheDress and Google’s search for a bigger truth


Robert Shrimsley at the Financial Times: “As the world burnt with a BuzzFeed-prompted debate over whether a dress was black and blue or white and gold, the BBC published a short article posing the question everyone was surely asking: “What would Wittgenstein say about that dress?”

Wittgenstein died in 1951, so we cannot know if the philosopher of language, truth and context would have been a devotee of BuzzFeed. (I guess it depends on whether we are talking of the early or the late Ludwig. The early Wittgenstein, it is well known, was something of an enthusiast for LOLs, whereas the later was more into WTFs and OMGs.)

The dress will now join the pantheon of web phenomena such as “Diet Coke and Mentos” and “Charlie bit my finger”. But this trivial debate on perceived truth captured in miniature a wider issue for the web: how to distil fact from noise when opinion drowns out information and value is determined by popularity.

At about the same time as the dress was turning the air blue — or was it white? — the New Scientist published a report on how one web giant might tackle this problem, a development in which Wittgenstein might have been very interested. The magazine reported on a Google research paper about how the company might reorder its search rankings to promote sites that could be trusted to tell the truth. (Google produces many such papers a year so this is a long way short of official policy.) It posits a formula for finding and promoting sites with a record of reliability.

This raises an interesting question over how troubled we should be by the notion that a private company with its own commercial interests and a huge concentration of power could be the arbiter of truth. There is no current reason to see sinister motives in Google’s search for a better web: it is both honourable and good business. But one might ask how, for example, Google Truth might determine established truths on net neutrality….

The paper suggests using fidelity to proved facts as a proxy for trust. This is easiest with single facts, such as a date or place of birth. For example, it suggests claiming Barack Obama was born in Kenya would push a site down the rankings. This would be good for politics but facts are not always neutral. Google would risk being depicted as part of “the mainstream media”. Fox Search here we come….(More)”
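A toy sketch of that idea follows; it is a deliberate simplification for illustration, not the model in Google's paper, which estimates trustworthiness probabilistically and must also contend with errors in fact extraction.

```python
# Toy version of "fidelity to proved facts" as a trust proxy: score a site
# by the share of its checkable (subject, attribute, value) claims that
# agree with a reference knowledge base. Entirely hypothetical data.
KNOWLEDGE_BASE = {
    ("barack obama", "born_in"): "hawaii",  # an established single fact
    ("singapore", "founded_in"): "1819",
}

def trust_score(claims):
    """Fraction of a site's checkable claims that match the knowledge base."""
    checkable = [(s, a, v) for s, a, v in claims if (s, a) in KNOWLEDGE_BASE]
    if not checkable:
        return 0.5  # no evidence either way; treat as neutral
    correct = sum(KNOWLEDGE_BASE[(s, a)] == v for s, a, v in checkable)
    return correct / len(checkable)

# A site claiming Obama was born in Kenya scores 0.0 and would sink in
# the rankings; one agreeing with the knowledge base scores 1.0.
print(trust_score([("barack obama", "born_in", "kenya")]))   # 0.0
print(trust_score([("barack obama", "born_in", "hawaii")]))  # 1.0
```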

Models and Patterns of Trust


Paper presented by Bran Knowles et al at the 18th ACM Conference on Computer Supported Cooperative Work & Social Computing: “As in all collaborative work, trust is a vital ingredient of successful computer supported cooperative work, yet there is little in the way of design principles to help practitioners develop systems that foster trust. To address this gap, we present a set of design patterns, based on our experience designing systems with the explicit intention of increasing trust between stakeholders. We contextualize these patterns by describing our own learning process, from the development, testing and refinement of a trust model, to our realization that the insights we gained along the way were most usefully expressed through design patterns. In addition to a set of patterns for trust, this paper seeks to demonstrate the value of patterns as a means of communicating the nuances revealed through ethnographic investigation….(More)”