White House Expands Guidance on Promoting Open Data


NextGov: “White House officials have announced expanded technical guidance to help agencies make more data accessible to the public in machine-readable formats.
Following up on President Obama’s May executive order linking the pursuit of open data to economic growth, innovation and government efficiency, two budget and science office spokesmen on Friday published a blog post highlighting new instructions and answers to frequently asked questions.
Nick Sinai, deputy chief technology officer at the Office of Science and Technology Policy, and Dominic Sale, supervisory policy analyst at the Office of Management and Budget, noted that the policy now in place means that all “newly generated government data will be required to be made available in open, machine-readable formats, greatly enhancing their accessibility and usefulness, while ensuring privacy and security.”

Do you want to live in a smart city?


Jane Wakefield from BBC News: “In the future everything in a city, from the electricity grid, to the sewer pipes to roads, buildings and cars will be connected to the network. Buildings will turn off the lights for you, self-driving cars will find you that sought-after parking space, even the rubbish bins will be smart. But how do we get to this smarter future? Who will be monitoring and controlling the sensors that will increasingly be on every building, lamp-post and pipe in the city?…
There is another chapter in the smart city story – and this one is being written by citizens, who are using apps, DIY sensors, smartphones and the web to solve the city problems that matter to them.
Don’t Flush Me is a neat little DIY sensor and app which is single-handedly helping to solve one of New York’s biggest water issues.
Every time there is heavy rain in the city, raw sewage is pumped into the harbour, at a rate of 27 billion gallons each year.
Using an Arduino processor, a sensor which measures water levels in the sewer overflows and a smart phone app, Don’t Flush Me lets people know when it is ‘safe to flush’.
Meanwhile Egg, a community-led sensor network, is alerting people to an often hidden problem in our cities.
Researchers estimate that two million people die each year as a result of air pollution and as cities get more over-crowded, the problem is likely to get worse.
Egg is compiling data about air quality by selling cheap sensors, which people put outside their homes, where they collect readings of greenhouse gases, nitrogen dioxide (NO2) and carbon monoxide (CO)….
The reality is that most smart city projects are currently pretty small scale – creating tech hubs or green areas of the city, experimenting with smart electricity grids or introducing electric buses or bike-sharing schemes.”

Collaboration In Biology's Century


Todd Sherer, Chief Executive Officer of The Michael J. Fox Foundation for Parkinson’s Research, in Forbes: “The problem is, we all still work in a system that feeds on secrecy and competition. It’s hard enough work just to dream up win/win collaborative structures; getting them off the ground can feel like pushing a boulder up a hill. Yet there is no doubt that the realities of today’s research environment, everything from the accumulation of big data to the ever-shrinking availability of funds, demand new models for collaboration. Call it “collaboration 2.0.”…I share a few recent examples in the hope of increasing the reach of these initiatives, inspiring others like them, and encouraging frank commentary on how they’re working.
Open-Access Data
The successes of collaborations in the traditional sense, coupled with advanced techniques such as genomic sequencing, have yielded masses of data. Consortia of clinical sites around the world are working together to collect and characterize data and biospecimens through standardized methods, leading to ever-larger pools — more like Great Lakes — of data. Study investigators draw their own conclusions, but there is so much more to discover than any individual lab has the bandwidth for….
Crowdsourcing
A great way to grow engagement with resources you’re willing to share? Ask for it. Collaboration 2.0 casts a wide net. We dipped our toe in the crowdsourcing waters earlier this year with our Parkinson’s Data Challenge, which asked anyone interested to download a set of data that had been collected from PD patients and controls using smart phones. …
Cross-Disciplinary Collaboration 2.0
The more we uncover about the interconnectedness and complexity of the human system, the more proof we are gathering that findings and treatments for one disease may provide invaluable insights for others. We’ve seen some really intriguing crosstalk between the Parkinson’s and Alzheimer’s disease research communities recently…
The results should be: More ideas. More discovery. Better health.”

A collaborative way to get to the heart of 3D printing problems


PSFK: “Because most of us only see the finished product when it comes to 3D printing projects, it’s easy to forget that things can, and do, go wrong when it comes to this miracle technology.
3D printing is constantly evolving, reaching exciting new heights, and touching every industry you can think of – but all this progress has left a trail of mangled plastic and devastated machines in its wake.
The Art of 3D Print Failure is a Flickr group that aims to document this failure, because after all, mistakes are how we learn, and how we make sure the same thing doesn’t happen the next time around. It can also prevent mistakes from happening to those who are new to 3D printing, before they even make them!”

On our best behaviour


Paper by Hector J. Levesque: “The science of AI is concerned with the study of intelligent forms of behaviour in computational terms. But what does it tell us when a good semblance of a behaviour can be achieved using cheap tricks that seem to have little to do with what we intuitively imagine intelligence to be? Are these intuitions wrong, and is intelligence really just a bag of tricks? Or are the philosophers right, and is a behavioural understanding of intelligence simply too weak? I think both of these are wrong. I suggest in the context of question-answering that what matters when it comes to the science of AI is not a good semblance of intelligent behaviour at all, but the behaviour itself, what it depends on, and how it can be achieved. I go on to discuss two major hurdles that I believe will need to be cleared.”

Crowd-Sourcing the Nation: Now a National Effort


Release from the U.S. Department of the Interior, U.S. Geological Survey: “The mapping crowd-sourcing program, known as The National Map Corps (TNMCorps), encourages citizens to collect structures data by adding new features, removing obsolete points, and correcting existing data for The National Map database. Structures being mapped in the project include schools, hospitals, post offices, police stations and other important public buildings.
Since the start of the project in 2012, more than 780 volunteers have made in excess of 13,000 contributions.  In addition to basic editing, a second volunteer peer review process greatly enhances the quality of data provided back to The National Map.  A few months ago, volunteers in 35 states were actively involved.  This final release of states opens up the entire country for volunteer structures enhancement.
To show appreciation of our volunteers’ efforts, The National Map Corps has instituted a recognition program that awards “virtual” badges to volunteers. The badges consist of a series of antique surveying instruments ranging from the Order of the Surveyor’s Chain (25 – 50 points) to the Theodolite Assemblage (2000+ points). Additionally, volunteers are publicly acclaimed (with permission) via Twitter, Facebook and Google+….
Tools on the TNMCorps website explain how a volunteer can edit any area, regardless of their familiarity with the selected structures, and becoming a volunteer for TNMCorps is easy; go to The National Map Corps website to learn more and to sign up as a volunteer. If you have access to the Internet and are willing to dedicate some time to editing map data, we hope you will consider participating!”

Five myths about big data


Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
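
The warning about fishing for correlations can be illustrated with a small simulation. The sketch below is not from the article; it assumes purely random, unrelated series (the function names, sample sizes and seed are all illustrative choices) and shows that scanning enough of them against a target will usually turn up at least one strong-looking correlation by chance alone.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def best_spurious_match(n_points=10, n_candidates=1000, seed=0):
    """Return the strongest correlation between a random target series
    and many independent random candidates (all values are pure noise)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


best = best_spurious_match()
print(f"strongest correlation among 1000 unrelated series: {best:+.2f}")
```

With only ten observations per series, a high coefficient here says nothing about any real relationship, which is the point of the myth: without a hypothesis to test, a large search will always "find" something.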

Announcing Project Open Data from Cloudant Labs


Yuriy Dybskiy from Cloudant: “There has been an emerging pattern over the last few years of more and more government datasets becoming available for public access. Earlier this year, the White House announced official policy on such data – Project Open Data.

Available resources

Here are four resources on the topic:

  1. Tim Berners-Lee: Open, Linked Data for a Global Community – [10 min video]
  2. Rufus Pollock: Open Data – How We Got Here and Where We’re Going – [24 min video]
  3. Open Knowledge Foundation Datasets – http://data.okfn.org/data
  4. Max Ogden: Project dat – collaborative data – [github repo]

One of the main challenges is access to the datasets. If only there were a database that had easy access to its data baked right in it.
Luckily, there is CouchDB and Cloudant, which share the same APIs to access data via HTTP. This makes for a really great option to store interesting datasets.

Cloudant Open Data

Today we are happy to announce a Cloudant Labs project – Cloudant Open Data!
Several datasets are available at the moment, for example, businesses_sf – data regarding businesses registered in San Francisco, and sf_pd_incidents – a collection of incident reports (criminal and non-criminal) made by the San Francisco Police Department.
We’ll add more, but if you have one you’d like us to add faster – drop us a line at [email protected]
Create an account and play with these datasets yourself”
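
For readers who want a feel for the HTTP access described above, here is a minimal sketch in Python. It relies on CouchDB's documented `_all_docs` endpoint with its `limit` and `include_docs` parameters. The host name `example.cloudant.com` is a placeholder assumption, since the excerpt does not say which account hosts the datasets, and the sample response is made-up data in the shape CouchDB returns, not real records from businesses_sf.

```python
import json
from urllib.parse import quote, urlencode


def all_docs_url(base_url, database, limit=5, include_docs=True):
    """Build a URL for CouchDB's _all_docs endpoint on one database."""
    query = urlencode({
        "limit": limit,
        "include_docs": str(include_docs).lower(),  # CouchDB expects true/false
    })
    return f"{base_url.rstrip('/')}/{quote(database, safe='')}/_all_docs?{query}"


def extract_docs(response_body):
    """Pull the document bodies out of an _all_docs JSON response."""
    payload = json.loads(response_body)
    return [row["doc"] for row in payload.get("rows", []) if "doc" in row]


# Hypothetical account host; substitute the real one before fetching.
BASE_URL = "https://example.cloudant.com"
url = all_docs_url(BASE_URL, "businesses_sf")

# Made-up response, shaped like what _all_docs returns with include_docs=true.
sample_response = json.dumps({
    "total_rows": 1,
    "offset": 0,
    "rows": [
        {
            "id": "a1",
            "key": "a1",
            "value": {"rev": "1-abc"},
            "doc": {"_id": "a1", "name": "Example Cafe"},
        }
    ],
})

docs = extract_docs(sample_response)
print(url)
print(docs[0]["name"])
```

Against a live database, the body returned by `urllib.request.urlopen(url).read()` could be passed straight to `extract_docs`; the sketch stops short of the network call because the real host is not given here.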

From Machinery to Mobility: Government and Democracy in a Participative Age


New book by Jeffrey Roy: “The Westminster-stylized model of Parliamentary democratic politics and public service accountability is increasingly out of step with the realities of today’s digitally and socially networked era. This book explores the reconfiguration of democratic and managerial governance within democratic societies due to the advent of technological mobility. More specifically, the traditional public sector prism of organization and accountability, denoted as ‘machinery of government’, is increasingly strained in an era characterized by smart devices, social media, and cloud computing. This book examines the roots and implications of the tensions between machinery and mobility and the sorts of investments and initiatives that have been undertaken by governments around the world as well as their appropriateness and relative impacts. This book also examines the prospects for holistic adaptation of democratic and managerial systems going forward, identifying the most crucial directions and determinants for improving public sector performance in terms of outcomes, accountability, and agility. Accordingly, the ultimate aim of this initiative is to contribute to the formation of intellectual foundations for more systemic reforms of public sector governance in Canada and elsewhere, and to offer forward-looking trajectories for government adaptation in shifting from a traditional prism of ‘machinery’ to new organizational and institutional arrangements better suited for an era of ‘mobility’.”

Defense Against National Vulnerabilities in Public Data


DOD/DARPA Notice (See also Foreign Policy article): “OBJECTIVE: Investigate the national security threat posed by public data available either for purchase or through open sources. Based on principles of data science, develop tools to characterize and assess the nature, persistence, and quality of the data. Develop tools for the rapid anonymization and de-anonymization of data sources. Develop framework and tools to measure the national security impact of public data and to defend against the malicious use of public data against national interests.
DESCRIPTION: The vulnerabilities to individuals from a data compromise are well known and documented now as “identity theft.” These include regular stories published in the news and research journals documenting the loss of personally identifiable information by corporations and governments around the world. Current trends in social media and commerce, with voluntary disclosure of personal information, create other potential vulnerabilities for individuals participating heavily in the digital world. The Netflix Challenge in 2009 was launched with the goal of creating better customer pick prediction algorithms for the movie service [1]. An unintended consequence of the Netflix Challenge was the discovery that it was possible to de-anonymize the entire contest data set with very little additional data. This de-anonymization led to a federal lawsuit and the cancellation of the sequel challenge [2]. The purpose of this topic is to understand the national level vulnerabilities that may be exploited through the use of public data available in the open or for purchase.
Could a modestly funded group deliver nation-state type effects using only public data?…”
The official link for this solicitation is: www.acq.osd.mil/osbp/sbir/solicitations/sbir20133.