Amid Open Data Push, Agencies Feel Urge for Analytics


Jack Moore at NextGov: “Federal agencies, thanks to their unique missions, have long been collectors of valuable, vital and, no doubt, arcane data. Under a nearly two-year-old executive order from President Barack Obama, agencies are releasing more of this data in machine-readable formats to the public and entrepreneurs than ever before.
But agencies still need a little help parsing through this data for their own purposes. They are turning to industry, academia and outside researchers for cutting-edge analytics tools to parse through their data to derive insights and to use those insights to drive decision-making.
Take the U.S. Agency for International Development, for example. The agency administers U.S. foreign aid programs aimed at ending extreme poverty and helping support democratic societies around the globe.
Under the agency’s own recent open data policy, it’s started collecting reams of data from its overseas missions. Starting Oct. 1, organizations doing development work on the ground – including through grants and contracts – have been directed to also collect data generated by their work and submit it back to agency headquarters. Teams go through the data, scrub it to remove sensitive material and then publish it.
The data spans the gamut from information on land ownership in South Sudan to livestock demographics in Senegal and HIV prevention activities in Zambia….The agency took the first step in solving that problem with a Jan. 20 request for information from outside groups for cutting-edge data analytics tools.
“Operating units within USAID are sometimes constrained by existing capacity to transform data into insights that could inform development programming,” the RFI stated.
The RFI queries industry on its capabilities in data mining and social media analytics, and in forecasting and systems modeling.
USAID is far from alone in its quest for data-driven decision-making.
A Jan. 26 RFI from the Transportation Department’s Federal Highway Administration also seeks innovative ideas from industry for “advanced analytical capabilities.”…(More)”

One State Wants To Let You Carry Your Driver’s License On Your Phone


at Singularity Hub: “There’s now a technology to replace almost everything in your wallet. Your cash, credit cards, and loyalty programs are all on their way to becoming obsolete. Money can now be sent via app, text, e-mail — it can even be sent via Snapchat. But you can’t leave your wallet home just yet. That’s because there is one item that remains largely unchanged: your driver’s license.

If the Iowa Department of Motor Vehicles has its way, that may no longer be the case. According to an article in the Des Moines Register, the agency is in the early stages of developing mobile software for just this purpose. The app would store a resident’s personal information, whatever is already on the physical license, and also include a scannable bar code. The plans are for the app to include a two-step verification process including some type of biometric or PIN code. At this time, it appears that specific implementation details are still being worked out.

The governments of the United Kingdom and United Arab Emirates had both previously announced their own attempts to experiment with the concept. It’s becoming increasingly common to see mobile versions of other documents. Over 30 states now allow motorists to show electronic proof of insurance. It only follows that the driver’s license would be next. But the considerations around that document are different — it is perhaps the most regulated and important document that a person carries….(More)”

'From Atoms to Bits': A Visual History of American Ideas


in The Atlantic: “A new paper employs a simple technique, counting words in patent texts, to trace the history of American invention, from chemistry to computers….in a new paper, Mikko Packalen at the University of Waterloo and Jay Bhattacharya of Stanford University devised a brilliant way to address this question empirically. In short, they counted words in patent texts.

In a series of papers studying the history of American innovation, Packalen and Bhattacharya indexed every one-word, two-word, and three-word phrase that appeared in more than 4 million patent texts in the last 175 years. To focus their search on truly new concepts, they recorded the year those phrases first appeared in a patent. Finally, they ranked each concept’s popularity based on how many times it reappeared in later patents. Essentially, they trawled the billion-word literature of patents to document the birth-year and the lifespan of American concepts, from “plastic” to “world wide web” and “instant messaging.”

Here are the 20 most popular sequences of words in each decade from the 1840s to the 2000s. You can see polymerase chain reactions in the middle of the 1980s stack. Since the timeline, as it appears in the paper, is too wide to be visible on this article page, I’ve chopped it up and inserted the color code both above and below the timeline….

Another theme of Packalen and Bhattacharya’s research is that innovation has become more collaborative. Indeed, computers have not only taken over the world of inventions, but they have also changed the geography of innovation, Bhattacharya said. Larger cities have historically held an innovative advantage, because (the theory goes) their density of smarties speeds up debate on the merits of new ideas, which are often born raw and poorly understood. But the researchers found that in the last few decades, larger cities are no more likely to produce new ideas in patents than smaller cities that can just as easily connect online with their co-authors. “Perhaps due to the Internet, the advantage of larger cities appears to be eroding,” Packalen wrote in an email….(More)”
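
The indexing procedure described in the excerpt above (record the year each one-, two-, and three-word phrase first appears, then rank it by how often later patents reuse it) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the researchers' actual code: the function names `ngrams`, `index_concepts`, and `top_by_decade`, the whitespace tokenization, and the three sample patent texts are all hypothetical.

```python
from collections import defaultdict

def ngrams(text, max_n=3):
    """Yield every 1-, 2-, and 3-word phrase in a text (lowercased, split on whitespace)."""
    words = text.lower().split()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield " ".join(words[i:i + n])

def index_concepts(patents):
    """Map each phrase to (year of first appearance, number of later patents reusing it).

    `patents` is an iterable of (year, text) pairs; this input shape is an assumption.
    """
    first_year = {}
    reuse = defaultdict(int)
    for year, text in sorted(patents, key=lambda p: p[0]):
        for phrase in set(ngrams(text)):  # count each phrase once per patent
            if phrase not in first_year:
                first_year[phrase] = year
            else:
                reuse[phrase] += 1
    return {phrase: (year, reuse[phrase]) for phrase, year in first_year.items()}

def top_by_decade(index, k=20):
    """Group phrases by the decade of first appearance and keep the k most reused."""
    by_decade = defaultdict(list)
    for phrase, (year, count) in index.items():
        by_decade[year // 10 * 10].append((count, phrase))
    return {
        decade: [p for _, p in sorted(items, key=lambda t: (-t[0], t[1]))[:k]]
        for decade, items in by_decade.items()
    }

# Hypothetical sample data, for illustration only.
sample_patents = [
    (1985, "polymerase chain reaction"),
    (1987, "improved polymerase chain reaction method"),
    (1991, "chain reaction kit"),
]
concept_index = index_concepts(sample_patents)
```

With the sample data, `concept_index["polymerase chain reaction"]` records a first appearance in 1985 and one later reuse. A real corpus would need proper tokenization and stop-word handling, which this sketch leaves out.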

Ad hoc encounters with big data: Engaging citizens in conversations around tabletops


Morten Fjeld, Paweł Woźniak, Josh Cowls, Bonnie Nardi at FirstMonday: “The increasing abundance of data creates new opportunities for communities of interest and communities of practice. We believe that interactive tabletops will allow users to explore data in familiar places such as living rooms, cafés, and public spaces. We propose informal, mobile possibilities for future generations of flexible and portable tabletops. In this paper, we build upon current advances in sensing and in organic user interfaces to propose how tabletops in the future could encourage collaboration and engage users in socially relevant data-oriented activities. Our work focuses on the socio-technical challenges of future democratic deliberation. As part of our vision, we suggest switching from fixed to mobile tabletops and provide two examples of hypothetical interface types: TableTiles and Moldable Displays. We consider how tabletops could foster future civic communities, expanding modes of participation originating in the Greek Agora and in European notions of cafés as locales of political deliberation….(More)”

Citizen Science: Catch, Click and Submit Contest


Wilson Commons Lab: “The inaugural Catch, Click and Submit Contest begins on Feb 21st in honor of the National Invasive Species Awareness Week running Feb 22nd through the 28th. The contest, which calls on anglers to photograph and report non-native fish species caught during the derby, will award prizes in various categories such as “Most Unusual Catch” and “Most Species.” Submissions from the contest will aid researchers in developing a better understanding of the distribution of fish species throughout Florida waterways.
By engaging the existing angler community, the contest hopes to increase public awareness of the potential impacts that arise from non-native fish species. “The Catch, Click and Submit Contest offers anglers the opportunity to assist natural resource managers in finding nonnative species by doing what they enjoy – fishing!” said biologist Kelly Gestring. “The early detection of a new, nonnative species could provide a better opportunity to control or even eradicate a population.” The hope is that participants will choose to target non-native fish for consumption in the future, helping to control invasive populations…(More).”

Fifty Shades of Manipulation


New paper by Cass Sunstein: “A statement or action can be said to be manipulative if it does not sufficiently engage or appeal to people’s capacity for reflective and deliberative choice. One problem with manipulation, thus understood, is that it fails to respect people’s autonomy and is an affront to their dignity. Another problem is that if they are products of manipulation, people’s choices might fail to promote their own welfare, and might instead promote the welfare of the manipulator. To that extent, the central objection to manipulation is rooted in a version of Mill’s Harm Principle: People know what is in their best interests and should have a (manipulation-free) opportunity to make that decision. On welfarist grounds, the norm against manipulation can be seen as a kind of heuristic, one that generally works well, but that can also lead to serious errors, at least when the manipulator is both informed and genuinely interested in the welfare of the chooser.
For the legal system, a pervasive puzzle is why manipulation is rarely policed. The simplest answer is that manipulation has so many shades, and in a social order that values free markets and is committed to freedom of expression, it is exceptionally difficult to regulate manipulation as such. But as the manipulator’s motives become more self-interested or venal, and as efforts to bypass people’s deliberative capacities become more successful, the ethical objections to manipulation become very forceful, and the argument for a legal response is fortified. The analysis of manipulation bears on emerging First Amendment issues raised by compelled speech, especially in the context of graphic health warnings. Importantly, it can also help orient the regulation of financial products, where manipulation of consumer choices is an evident but rarely explicit concern….(More)”.

Medical Wikis Dedicated to Clinical Practice: A Systematic Review


New paper by Alexandre Brulet et al: “Wikis may give clinician communities the opportunity to build knowledge relevant to their practice. The only previous study reviewing a set of health-related wikis, without specification of purpose or audience, showed poor reliability overall…. Our aim was to review medical wiki websites dedicated to clinical practices…. Among 25 wikis included, 11 aimed at building an encyclopedia, five a textbook, three lessons, two oncology protocols, one a single article, and three at reporting clinical cases. Sixteen wikis were specialized with specific themes or disciplines. Fifteen wikis were using MediaWiki software as-is, three were hosted by online wiki farms, and seven were purpose-built. Except for one MediaWiki-based site, only purpose-built platforms managed detailed user disclosures…. The 25 medical wikis we studied present various limitations in their format, management, and collaborative features. Professional medical wikis may be improved by using clinical cases, developing more detailed transparency and editorial policies, and involving postgraduate and continuing medical education learners….(More)”

Crowdsourcing Dilemma


New paper by Victor Naroditskiy, Nicholas R. Jennings, Pascal Van Hentenryck, Manuel Cebrian: “Crowdsourcing offers unprecedented potential for solving tasks efficiently by tapping into the skills of large groups of people. A salient feature of crowdsourcing—its openness of entry—makes it vulnerable to malicious behavior. Such behavior took place in a number of recent popular crowdsourcing competitions. We provide game-theoretic analysis of a fundamental tradeoff between the potential for increased productivity and the possibility of being set back by malicious behavior. Our results show that in crowdsourcing competitions malicious behavior is the norm, not the anomaly—a result contrary to the conventional wisdom in the area. Counterintuitively, making the attacks more costly does not deter them but leads to a less desirable outcome. These findings have cautionary implications for the design of crowdsourcing competitions…(More)”

Dataset Inventorying Tool


at US Open Data: “Today we’re releasing Let Me Get That Data For You (LMGTDFY), a free, open source tool that quickly and automatically creates a machine-readable inventory of all the data files found on a given website.
When government agencies create an open data repository, they need to start by inventorying the data that the agency is already publishing on their website. This is a laborious process. It means searching their own site with a query like this:

site:example.gov filetype:csv OR filetype:xls OR filetype:json

Then they have to read through all of the results, download all of the files, and create a spreadsheet that they can load into their repository. It’s a lot of work, and as a result it too often goes undone, resulting in a data repository that doesn’t actually contain all of that government’s data.
Realizing that this was a common problem, we hired Silicon Valley Software Group to create a tool to automate the inventorying process. We worked with Dan Schultz and Ted Han, who created a system built on Django and Celery, using Microsoft’s great Bing Search API as its data source. The result is a free, installable tool, which produces a CSV file that lists all CSV, XML, JSON, XLS, XLSX, and Shapefiles found on a given domain name.
We use this tool to power our new Let Me Get That Data For You website. We’re trying to keep our site within Bing’s free usage tier, so we’re limiting results to 300 datasets per site….(More)”
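
The manual inventorying steps described in the excerpt above (compose a `site:` and `filetype:` query, then turn the results into a spreadsheet) might look roughly like the following Python sketch. It is not LMGTDFY's actual code and makes no search API call: `build_query`, `inventory_csv`, and the `url, filename, format` column layout are hypothetical, and the list of result URLs is assumed to come from whatever search backend is used. The default `limit=300` simply mirrors the per-site cap mentioned in the post.

```python
import csv
import io
from urllib.parse import urlparse

# Extensions from the example query in the post; extend as needed.
DATA_EXTENSIONS = ["csv", "xls", "json"]

def build_query(domain, extensions=DATA_EXTENSIONS):
    """Compose a search query restricted to one domain and a set of file types."""
    filters = " OR ".join(f"filetype:{ext}" for ext in extensions)
    return f"site:{domain} {filters}"

def inventory_csv(urls, limit=300):
    """Return CSV text listing each data file URL with its filename and format.

    The column layout is an assumption made for this sketch.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["url", "filename", "format"])
    for url in urls[:limit]:
        filename = urlparse(url).path.rsplit("/", 1)[-1]
        fmt = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
        writer.writerow([url, filename, fmt])
    return out.getvalue()
```

For instance, `build_query("example.gov")` reproduces the query shown in the post, and `inventory_csv` can be fed any list of URLs returned for that query.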

Opinion Mining in Social Big Data


New Paper by Wlodarczak, Peter and Ally, Mustafa and Soar, Jeffrey: “Opinion mining has rapidly gained importance due to the unprecedented amount of opinionated data on the Internet. People share their opinions on products and services, and they rate movies, restaurants or vacation destinations. Social media such as Facebook or Twitter have made it easier than ever for users to share their views and make them accessible to anybody on the Web. The economic potential has been recognized by companies who want to improve their products and services, detect new trends and business opportunities or find out how effective their online marketing efforts are. However, opinion mining using social media faces many challenges due to the amount and the heterogeneity of the available data. Also, spam or fake opinions have become a serious issue. There are also language-related challenges like the usage of slang and jargon on social media or special characters like smileys that are widely adopted on social media sites.
These challenges create many interesting research problems such as determining the influence of social media on people’s actions, understanding opinion dissemination or determining the online reputation of a company. Not surprisingly, opinion mining using social media has become a very active area of research, and a lot of progress has been made in recent years. This article describes the current state of research and the technologies that have been used in recent studies….(More)”