Eric Mill at the Sunlight Foundation blog: “Last year, a group of us who work daily with open government data — Josh Tauberer of GovTrack.us, Derek Willis at The New York Times, and myself — decided to stop each building the same basic tools over and over, and start building a foundation we could share.
We set up a small home at github.com/unitedstates, and kicked it off with a couple of projects to gather data on the people and work of Congress. Using a mix of automation and curation, they gather basic information from all over the government — THOMAS.gov, the House and Senate, the Congressional Bioguide, GPO’s FDSys, and others — that everyone needs to report, analyze, or build nearly anything to do with Congress.
Once we centralized this work and started maintaining it publicly, we began getting contributions nearly immediately. People educated us on identifiers, fixed typos, and gathered new data. Chris Wilson built an impressive interactive visualization of the Senate’s budget amendments by extending our collector to find and link the text of amendments.
This is an unusual, and occasionally chaotic, model for an open data project. github.com/unitedstates is a neutral space; GitHub’s permissions system allows many of us to share the keys, so no one person or institution controls it. What this means is that while we all benefit from each other’s work, no one is dependent or “downstream” from anyone else. It’s a shared commons in the public domain.
There are a few principles that have helped make the unitedstates project something that’s worth our time:…”
Is Connectivity A Human Right?
“Mark Zuckerberg (Facebook): For almost ten years, Facebook has been on a mission to make the world more open and connected. Today we connect more than 1.15 billion people each month, but as we started thinking about connecting the next 5 billion, we realized something important: the vast majority of people in the world don’t have access to the internet.
Today, only 2.7 billion people are online — a little more than one third of the world. That is growing by less than 9% each year, which is slow considering how early we are in the internet’s development. Even though projections show most people will get smartphones in the next decade, most people still won’t have data access because the cost of data remains much more expensive than the price of a smartphone.
Below, I’ll share a rough proposal for how we can connect the next 5 billion people, and a rough plan to work together as an industry to get there. We’ll discuss how we can make internet access more affordable by making it more efficient to deliver data, how we can use less data by improving the efficiency of the apps we build and how we can help businesses drive internet access by developing a new model to get people online.
I call this a “rough plan” because, like many long term technology projects, we expect the details to evolve. It may be possible to achieve more than we lay out here, but it may also be more challenging than we predict. The specific technical work will evolve as people contribute better ideas, and we welcome all feedback on how to improve this.
Connecting the world is one of the greatest challenges of our generation. This is just one small step toward achieving that goal. I’m excited to work together to make this a reality.
For the full version, click here.”
White House Expands Guidance on Promoting Open Data
NextGov: “White House officials have announced expanded technical guidance to help agencies make more data accessible to the public in machine-readable formats.
Following up on President Obama’s May executive order linking the pursuit of open data to economic growth, innovation and government efficiency, two budget and science office spokesmen on Friday published a blog post highlighting new instructions and answers to frequently asked questions.
Nick Sinai, deputy chief technology officer at the Office of Science and Technology Policy, and Dominic Sale, supervisory policy analyst at the Office of Management and Budget, noted that the policy now in place means that all ‘newly generated government data will be required to be made available in open, machine-readable formats, greatly enhancing their accessibility and usefulness, while ensuring privacy and security.’”
Do you want to live in a smart city?
Jane Wakefield from BBC News: “In the future everything in a city, from the electricity grid, to the sewer pipes to roads, buildings and cars will be connected to the network. Buildings will turn off the lights for you, self-driving cars will find you that sought-after parking space, even the rubbish bins will be smart. But how do we get to this smarter future? Who will be monitoring and controlling the sensors that will increasingly be on every building, lamp-post and pipe in the city?…
There is another chapter in the smart city story – and this one is being written by citizens, who are using apps, DIY sensors, smartphones and the web to solve the city problems that matter to them.
Don’t Flush Me is a neat little DIY sensor and app which is single-handedly helping to solve one of New York’s biggest water issues.
Every time there is heavy rain in the city, raw sewage is pumped into the harbour, at a rate of 27 billion gallons each year.
Using an Arduino processor, a sensor which measures water levels in the sewer overflows and a smart phone app, Don’t Flush Me lets people know when it is ‘safe to flush’.
Meanwhile Egg, a community-led sensor network, is alerting people to an often hidden problem in our cities.
Researchers estimate that two million people die each year as a result of air pollution and as cities get more over-crowded, the problem is likely to get worse.
Egg is compiling data about air quality by selling cheap sensors which people put outside their homes where they collect readings of greenhouse gases, nitrogen dioxide (NO2) and carbon monoxide (CO)….
The reality is that most smart city projects are currently pretty small scale – creating tech hubs or green areas of the city, experimenting with smart electricity grids or introducing electric buses or bike-sharing schemes.”
Collaboration In Biology's Century
Todd Sherer, Chief Executive Officer of The Michael J. Fox Foundation for Parkinson’s Research, in Forbes: “The problem is, we all still work in a system that feeds on secrecy and competition. It’s hard enough work just to dream up win/win collaborative structures; getting them off the ground can feel like pushing a boulder up a hill. Yet there is no doubt that the realities of today’s research environment — everything from the accumulation of big data to the ever-shrinking availability of funds — demand new models for collaboration. Call it “collaboration 2.0.”…I share a few recent examples in the hope of increasing the reach of these initiatives, inspiring others like them, and encouraging frank commentary on how they’re working.
Open-Access Data
The successes of collaborations in the traditional sense, coupled with advanced techniques such as genomic sequencing, have yielded masses of data. Consortia of clinical sites around the world are working together to collect and characterize data and biospecimens through standardized methods, leading to ever-larger pools — more like Great Lakes — of data. Study investigators draw their own conclusions, but there is so much more to discover than any individual lab has the bandwidth for….
Crowdsourcing
A great way to grow engagement with resources you’re willing to share? Ask for it. Collaboration 2.0 casts a wide net. We dipped our toe in the crowdsourcing waters earlier this year with our Parkinson’s Data Challenge, which asked anyone interested to download a set of data that had been collected from PD patients and controls using smart phones. …
Cross-Disciplinary Collaboration 2.0
The more we uncover about the interconnectedness and complexity of the human system, the more proof we are gathering that findings and treatments for one disease may provide invaluable insights for others. We’ve seen some really intriguing crosstalk between the Parkinson’s and Alzheimer’s disease research communities recently…
The results should be: More ideas. More discovery. Better health.”
A collaborative way to get to the heart of 3D printing problems
PSFK: “Because most of us only see the finished product when it comes to 3D printing projects, it’s easy to forget that things can, and do, go wrong when it comes to this miracle technology.
3D printing is constantly evolving, reaching exciting new heights, and touching every industry you can think of – but all this progress has left a trail of mangled plastic and devastated machines in its wake.
The Art of 3D Print Failure is a Flickr group that aims to document this failure, because after all, mistakes are how we learn, and how we make sure the same thing doesn’t happen the next time around. It can also prevent mistakes from happening to those who are new to 3D printing, before they even make them!”
On our best behaviour
Paper by Hector J. Levesque: “The science of AI is concerned with the study of intelligent forms of behaviour in computational terms. But what does it tell us when a good semblance of a behaviour can be achieved using cheap tricks that seem to have little to do with what we intuitively imagine intelligence to be? Are these intuitions wrong, and is intelligence really just a bag of tricks? Or are the philosophers right, and is a behavioural understanding of intelligence simply too weak? I think both of these are wrong. I suggest in the context of question-answering that what matters when it comes to the science of AI is not a good semblance of intelligent behaviour at all, but the behaviour itself, what it depends on, and how it can be achieved. I go on to discuss two major hurdles that I believe will need to be cleared.”
Crowd-Sourcing the Nation: Now a National Effort
Release from the U.S. Department of the Interior, U.S. Geological Survey: “The mapping crowd-sourcing program, known as The National Map Corps (TNMCorps), encourages citizens to collect structures data by adding new features, removing obsolete points, and correcting existing data for The National Map database. Structures being mapped in the project include schools, hospitals, post offices, police stations and other important public buildings.
Since the start of the project in 2012, more than 780 volunteers have made in excess of 13,000 contributions. In addition to basic editing, a second volunteer peer review process greatly enhances the quality of data provided back to The National Map. A few months ago, volunteers in 35 states were actively involved. This final release of states opens up the entire country for volunteer structures enhancement.
To show appreciation of our volunteers’ efforts, The National Map Corps has instituted a recognition program that awards “virtual” badges to volunteers. The badges consist of a series of antique surveying instruments ranging from the Order of the Surveyor’s Chain (25 – 50 points) to the Theodolite Assemblage (2000+ points). Additionally, volunteers are publicly acclaimed (with permission) via Twitter, Facebook and Google+….
Tools on the TNMCorps website explain how a volunteer can edit any area, regardless of their familiarity with the selected structures, and becoming a volunteer for TNMCorps is easy; go to The National Map Corps website to learn more and to sign up as a volunteer. If you have access to the Internet and are willing to dedicate some time to editing map data, we hope you will consider participating!”
Five myths about big data
Samuel Arbesman, senior scholar at the Ewing Marion Kauffman Foundation and the author of “The Half-Life of Facts” in the Washington Post: “Big data holds the promise of harnessing huge amounts of information to help us better understand the world. But when talking about big data, there’s a tendency to fall into hyperbole. It is what compels contrarians to write such tweets as “Big Data, n.: the belief that any sufficiently large pile of s— contains a pony.” Let’s deflate the hype.
1. “Big data” has a clear definition.
The term “big data” has been in circulation since at least the 1990s, when it is believed to have originated in Silicon Valley. IBM offers a seemingly simple definition: Big data is characterized by the four V’s of volume, variety, velocity and veracity. But the term is thrown around so often, in so many contexts — science, marketing, politics, sports — that its meaning has become vague and ambiguous….
2. Big data is new.
By many accounts, big data exploded onto the scene quite recently. “If wonks were fashionistas, big data would be this season’s hot new color,” a Reuters report quipped last year. In a May 2011 report, the McKinsey Global Institute declared big data “the next frontier for innovation, competition, and productivity.”
It’s true that today we can mine massive amounts of data — textual, social, scientific and otherwise — using complex algorithms and computer power. But big data has been around for a long time. It’s just that exhaustive datasets were more exhausting to compile and study in the days when “computer” meant a person who performed calculations….
3. Big data is revolutionary.
In their new book, “Big Data: A Revolution That Will Transform How We Live, Work, and Think,” Viktor Mayer-Schonberger and Kenneth Cukier compare “the current data deluge” to the transformation brought about by the Gutenberg printing press.
If you want more precise advertising directed toward you, then yes, big data is revolutionary. Generally, though, it’s likely to have a modest and gradual impact on our lives….
4. Bigger data is better.
In science, some admittedly mind-blowing big-data analyses are being done. In business, companies are being told to “embrace big data before your competitors do.” But big data is not automatically better.
Really big datasets can be a mess. Unless researchers and analysts can reduce the number of variables and make the data more manageable, they get quantity without a whole lot of quality. Give me some quality medium data over bad big data any day…
5. Big data means the end of scientific theories.
Chris Anderson argued in a 2008 Wired essay that big data renders the scientific method obsolete: Throw enough data at an advanced machine-learning technique, and all the correlations and relationships will simply jump out. We’ll understand everything.
But you can’t just go fishing for correlations and hope they will explain the world. If you’re not careful, you’ll end up with spurious correlations. Even more important, to contend with the “why” of things, we still need ideas, hypotheses and theories. If you don’t have good questions, your results can be silly and meaningless.
Having more data won’t substitute for thinking hard, recognizing anomalies and exploring deep truths.”
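The point about fishing for correlations can be made concrete with a small simulation. The sketch below is an illustration added here, not something from Arbesman's article: it compares one random "target" series against many unrelated random series and reports the strongest correlation found by chance alone. The sample size, the number of variables, and the function names are all assumptions chosen for the example.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def strongest_chance_correlation(n_samples=10, n_variables=1000, seed=0):
    """Return the largest-magnitude correlation between a random target
    and n_variables independent random series (pure noise throughout)."""
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_samples)]
    best = 0.0
    for _ in range(n_variables):
        candidate = [rng.gauss(0, 1) for _ in range(n_samples)]
        r = pearson(target, candidate)
        if abs(r) > abs(best):
            best = r
    return best


if __name__ == "__main__":
    best = strongest_chance_correlation()
    print(f"Strongest correlation found in pure noise: {best:.2f}")
```

With only ten observations and a thousand candidate variables, some series will look strongly related to the target even though none of them is. That is why a hypothesis is still needed before searching the data.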
Announcing Project Open Data from Cloudant Labs
Yuriy Dybskiy from Cloudant: “There has been an emerging pattern over the last few years of more and more government datasets becoming available for public access. Earlier this year, the White House announced official policy on such data – Project Open Data.
Available resources
Here are four resources on the topic:
- Tim Berners-Lee: Open, Linked Data for a Global Community – [10 min video]
- Rufus Pollock: Open Data – How We Got Here and Where We’re Going – [24 min video]
- Open Knowledge Foundation Datasets – http://data.okfn.org/data
- Max Ogden: Project dat – collaborative data – [github repo]
One of the main challenges is access to the datasets. If only there were a database that had easy access to its data baked right in it.
Luckily, there is CouchDB and Cloudant, which share the same APIs to access data via HTTP. This makes for a really great option to store interesting datasets.
Cloudant Open Data
Today we are happy to announce a Cloudant Labs project – Cloudant Open Data!
Several datasets are available at the moment, for example, businesses_sf – data regarding businesses registered in San Francisco and sf_pd_incidents – a collection of incident reports (criminal and non-criminal) made by the San Francisco Police Department.
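As a rough sketch of what "access to data via HTTP" can look like, the Python below builds a request URL for CouchDB's `_all_docs` endpoint and unpacks a response of the shape that endpoint returns. The account hostname is a placeholder, since the post does not name the real one, and the sample response and its `name` field are invented for illustration. Only the `businesses_sf` database name comes from the post.

```python
import json
from urllib.parse import quote, urlencode

# Hypothetical account host: the post does not give the real hostname.
BASE_URL = "https://example-account.cloudant.com"


def all_docs_url(database, limit=5, include_docs=True, base_url=BASE_URL):
    """Build a URL for CouchDB's _all_docs endpoint on one database."""
    query = urlencode({
        "limit": limit,
        # CouchDB expects JSON booleans (true/false) in the query string.
        "include_docs": json.dumps(include_docs),
    })
    return f"{base_url}/{quote(database, safe='')}/_all_docs?{query}"


def extract_docs(response_body):
    """Pull the document bodies out of an _all_docs JSON response."""
    payload = json.loads(response_body)
    return [row["doc"] for row in payload.get("rows", []) if "doc" in row]


# Made-up response in the shape _all_docs returns; not real dataset content.
SAMPLE_RESPONSE = json.dumps({
    "total_rows": 2,
    "offset": 0,
    "rows": [
        {"id": "a1", "key": "a1", "value": {"rev": "1-abc"},
         "doc": {"_id": "a1", "_rev": "1-abc", "name": "Example Bakery"}},
        {"id": "a2", "key": "a2", "value": {"rev": "1-def"},
         "doc": {"_id": "a2", "_rev": "1-def", "name": "Example Cafe"}},
    ],
})

if __name__ == "__main__":
    print(all_docs_url("businesses_sf"))
    for doc in extract_docs(SAMPLE_RESPONSE):
        print(doc["_id"], doc["name"])
```

To fetch live data, the URL could be passed to `urllib.request.urlopen` and the response body handed to `extract_docs`, once the real account hostname is known.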
We’ll add more, but if you have one you’d like us to add faster – drop us a line at [email protected]
Create an account and play with these datasets yourself”