Big Data


Special Report on Big Data by Volta – A newsletter on Science, Technology and Society in Europe: “Locating crime spots, or the next outbreak of a contagious disease, Big Data promises benefits for society as well as business. But more means messier. Do policy-makers know how to use this scale of data-driven decision-making in an effective way for their citizens and ensure their privacy? 90% of the world’s data have been created in the last two years. Every minute, more than 100 million new emails are created, 72 hours of new video are uploaded to YouTube and Google processes more than 2 million searches. Nowadays, almost everyone walks around with a small computer in their pocket, uses the internet on a daily basis and shares photos and information with their friends, family and networks. The digital exhaust we leave behind every day contributes to an enormous amount of data produced, and at the same time leaves electronic traces that contain a great deal of personal information….
Until recently, traditional technology and analysis techniques have not been able to handle this quantity and type of data. But recent technological developments have enabled us to collect, store and process data in new ways. There seem to be no limitations, either to the volume of data or technology for storing and analyzing them. Big Data can map a driver’s sitting position to identify a car thief, it can use Google searches to predict outbreaks of the H1N1 flu virus, it can data-mine Twitter to predict the price of rice or use mobile phone top-ups to describe unemployment in Asia.
The word ‘data’ means ‘given’ in Latin. It commonly refers to a description of something that can be recorded and analyzed. While there is no clear definition of the concept of ‘Big Data’, it usually refers to the processing of huge amounts and new types of data that have not been possible with traditional tools.

The notion of Big Data is kind of misleading, argues Robindra Prabhu, a project manager at the Norwegian Board of Technology. “The new development is not necessarily that there are so much more data. It’s rather that data is available to us in a new way. The digitalization of society gives us access to both ‘traditional’, structured data – like the content of a database or register – and unstructured data, for example the content in a text, pictures and videos. Information designed to be read by humans is now also readable by machines. And this development makes a whole new world of data gathering and analysis available. Big Data is exciting not just because of the amount and variety of data out there, but that we can process data about so much more than before.”

Open Data Index provides first major assessment of state of open government data


Press Release from the Open Knowledge Foundation: “In the week of a major international summit on government transparency in London, the Open Knowledge Foundation has published its 2013 Open Data Index, showing that governments are still not providing enough information in an accessible form to their citizens and businesses.
The UK and US top the 2013 Index, which is a result of community-based surveys in 70 countries. They are followed by Denmark, Norway and the Netherlands. Of the countries assessed, Cyprus, St Kitts & Nevis, the British Virgin Islands, Kenya and Burkina Faso ranked lowest. There are many countries where the governments are less open but that were not assessed because of lack of openness or a sufficiently engaged civil society. This includes 30 countries who are members of the Open Government Partnership.
The Index ranks countries based on the availability and accessibility of information in ten key areas, including government spending, election results, transport timetables, and pollution levels, and reveals that whilst some good progress is being made, much remains to be done.
Rufus Pollock, Founder and CEO of the Open Knowledge Foundation said:

Opening up government data drives democracy, accountability and innovation. It enables citizens to know and exercise their rights, and it brings benefits across society: from transport, to education and health. There has been a welcome increase in support for open data from governments in the last few years, but this Index reveals that too much valuable information is still unavailable.

The UK and US are leaders on open government data but even they have room for improvement: the US for example does not provide a single consolidated and open register of corporations, while the UK Electoral Commission lets down the UK’s good overall performance by not allowing open reuse of UK election data.
There is a very disappointing degree of openness of company registers across the board: only 5 out of the 20 leading countries have even basic information available via a truly open licence, and only 10 allow any form of bulk download. This information is critical for a range of reasons – including tackling tax evasion and other forms of financial crime and corruption.
Less than half of the key datasets in the top 20 countries are available to re-use as open data, showing that even the leading countries do not fully understand the importance of citizens and businesses being able to legally and technically use, reuse and redistribute data. This enables them to build and share commercial and non-commercial services.
To see the full results: https://index.okfn.org. For graphs of the data: https://index.okfn.org/visualisations.”

A Data Revolution for Poverty Eradication


Report from devint.org: “The High Level Panel on the Post–2015 Development Agenda called for a data revolution for sustainable development, with a new international initiative to improve the quality of statistics and information available to citizens. It recommended actively taking advantage of new technology, crowd sourcing, and improved connectivity to empower people with information on the progress towards the targets. Development Initiatives believes there are a number of steps that should be put in place in order to deliver the ambition set out by the Panel.
The data revolution should be seen as a basis on which greater openness and a wider transparency revolution can be built. The openness movement – one of the most exciting and promising developments of the last decade – is starting to transform the citizen-state compact. Rich and developing country governments are adapting the way they do business, recognising that greater transparency and participation leads to more effective, efficient, and equitable management of scarce public resources. Increased openness of data has potential to democratise access to information, empowering individuals with the knowledge they need to tackle the problems that they face. To realise this bold ambition, the revolution will need to reach beyond the niche data and statistical communities, sell the importance of the revolution to a wide range of actors (governments, donors, CSOs and the media) and leverage the potential of open data to deliver more usable information”

You Can Predict What Government Agencies Will Buy; For Real!


Jen Clement at GovLoop: “Two great free government-run websites that show how federal government agencies are spending their money are USASpending.gov and FedBizOpps.gov. Each site allows you to research how the government has spent its procurement dollars in the last several years, and can give business owners a snapshot of what industry segments and what type of commercial products and services offer the best contracting opportunities so vendors can conduct their target business analysis and approach a select group of potential buyers.

SmartProcure offers a unique service that allows you to search thousands and thousands of government purchase orders, giving you the ability to predict future purchasing opportunities. SmartProcure lets you search specifically for a product or service you sell and shows you exactly which government agencies have bought that product or service, how much they paid, and which vendors (your competitors) they’ve purchased from. In addition to purchasing histories you’ll have access to powerful market analysis tools to help you conduct thorough competitive and market intelligence reviews to find the right niches for your business to take advantage of. Whether it is federal, state, or local governments, a snapshot into the past can help determine the future…
For more helpful tips visit:  https://ow133.infusionsoft.com/go/blog/jc/

Why Crowdsourcing is the Next Cloud Computing


Alpheus Bingham, co-founder and a member of the board of directors at InnoCentive, in Wired: “But over the course of a decade, what we now call cloud-based or software-as-a-service (SaaS) applications has taken the world by storm and become mainstream. Today, cloud computing is an umbrella term that applies to a wide variety of successful technologies (and business models), from business apps like Salesforce.com, to infrastructure like Amazon Elastic Compute Cloud (Amazon EC2), to consumer apps like Netflix. It took years for all these things to become mainstream, and if the last decade saw the emergence (and eventual dominance) of the cloud over previous technologies and models, this decade will see the same thing with crowdsourcing.
Both an art and a science, crowdsourcing taps into the global experience and wisdom of individuals, teams, communities, and networks to accomplish tasks and work. It doesn’t matter who you are, where you live, or what you do or believe — in fact, the more diversity of thought and perspective, the better. Diversity is king and it’s common for people on the periphery of — or even completely outside of — a discipline or science to end up solving important problems.
The specific nature of the work offers few constraints – from a small business needing a new logo, to the large consumer goods company looking to ideate marketing programs, or to the nonprofit research organization looking to find a biomarker for ALS, the value is clear as well.
To get to the heart of the matter on why crowdsourcing is this decade’s cloud computing, several immediate reasons come to mind:
Crowdsourcing Is Disruptive
Much as cloud computing has created a new guard that in many ways threatens the old guard, so too has crowdsourcing. …
Crowdsourcing Provides On-Demand Talent Capacity
Labor is expensive and good talent is scarce. Think about the cost of adding ten additional researchers to a 100-person R&D team. You’ve increased your research capacity by 10% (more or less), but at a significant cost – and, a significant FIXED cost at that. …
Crowdsourcing Enables Pay-for-Performance.
You pay as you go with cloud computing — gone are the days of massive upfront capital expenditures followed by years of ongoing maintenance and upgrade costs. Crowdsourcing does even better: you pay for solutions, not effort, which predictably sometimes results in failure. In fact, with crowdsourcing, the marketplace bears the cost of failure, not you….
Crowdsourcing “Consumerizes” Innovation
Crowdsourcing can provide a platform for bi-directional communication and collaboration with diverse individuals and groups, whether internal or external to your organization — employees, customers, partners and suppliers. Much as cloud computing has consumerized technology, crowdsourcing has the same potential to consumerize innovation, and more broadly, how we collaborate to bring new ideas, products and services to market.
Crowdsourcing Provides Expert Services and Skills That You Don’t Possess.
One of the early value propositions of cloud-based business apps was that you didn’t need to engage IT to deploy them or Finance to help procure them, thereby allowing general managers and line-of-business heads to do their jobs more fluently and more profitably…”

The Decline of Wikipedia


Tom Simonite in MIT Technology Review: “The sixth most widely used website in the world is not run anything like the others in the top 10. It is not operated by a sophisticated corporation but by a leaderless collection of volunteers who generally work under pseudonyms and habitually bicker with each other. It rarely tries new things in the hope of luring visitors; in fact, it has changed little in a decade. And yet every month 10 billion pages are viewed on the English version of Wikipedia alone. When a major news event takes place, such as the Boston Marathon bombings, complex, widely sourced entries spring up within hours and evolve by the minute. Because there is no other free information source like it, many online services rely on Wikipedia. Look something up on Google or ask Siri a question on your iPhone, and you’ll often get back tidbits of information pulled from the encyclopedia and delivered as straight-up facts.
Yet Wikipedia and its stated ambition to “compile the sum of all human knowledge” are in trouble. The volunteer workforce that built the project’s flagship, the English-language Wikipedia—and must defend it against vandalism, hoaxes, and manipulation—has shrunk by more than a third since 2007 and is still shrinking. Those participants left seem incapable of fixing the flaws that keep Wikipedia from becoming a high-quality encyclopedia by any standard, including the project’s own. Among the significant problems that aren’t getting resolved is the site’s skewed coverage: its entries on Pokemon and female porn stars are comprehensive, but its pages on female novelists or places in sub-Saharan Africa are sketchy. Authoritative entries remain elusive. Of the 1,000 articles that the project’s own volunteers have tagged as forming the core of a good encyclopedia, most don’t earn even Wikipedia’s own middle-ranking quality scores.
The main source of those problems is not mysterious….”

Smart Machines: IBM's Watson and the Era of Cognitive Computing


New book from Columbia Business School Publishing: “We are crossing a new frontier in the evolution of computing and entering the era of cognitive systems. The victory of IBM’s Watson on the television quiz show Jeopardy! revealed how scientists and engineers at IBM and elsewhere are pushing the boundaries of science and technology to create machines that sense, learn, reason, and interact with people in new ways to provide insight and advice.
In Smart Machines, John E. Kelly III, director of IBM Research, and Steve Hamm, a writer at IBM and a former business and technology journalist, introduce the fascinating world of “cognitive systems” to general audiences and provide a window into the future of computing. Cognitive systems promise to penetrate complexity and assist people and organizations in better decision making. They can help doctors evaluate and treat patients, augment the ways we see, anticipate major weather events, and contribute to smarter urban planning. Kelly and Hamm’s comprehensive perspective describes this technology inside and out and explains how it will help us conquer the harnessing and understanding of “big data,” one of the major computing challenges facing businesses and governments in the coming decades. Absorbing and impassioned, their book will inspire governments, academics, and the global tech industry to work together to power this exciting wave in innovation.”
See also Why cognitive systems?

Beyond Transparency


New book on Open Data and the Future of Civic Innovation: The rise of open data in the public sector has sparked innovation, driven efficiency, and fueled economic development. And in the vein of high-profile federal initiatives like Data.gov and the White House’s Open Government Initiative, more and more local governments are making their foray into the field with Chief Data Officers, open data policies, and open data catalogs.
While still emerging, we are seeing evidence of the transformative potential of open data in shaping the future of our civic life. It’s at the local level that government most directly impacts the lives of residents—providing clean parks, fighting crime, or issuing permits to open a new business. This is where there is the biggest opportunity to use open data to reimagine the relationship between citizens and government.
Beyond Transparency is a cross-disciplinary survey of the open data landscape, in which practitioners share their own stories of what they’ve accomplished with open civic data. It seeks to move beyond the rhetoric of transparency for transparency’s sake and towards action and problem solving. Through these stories, we examine what is needed to build an ecosystem in which open data can become the raw materials to drive more effective decision-making and efficient service delivery, spur economic activity, and empower citizens to take an active role in improving their own communities….
This book is a resource for (and by) practitioners inside and outside government—from the municipal chief information officer to the community organizer to the civic-minded entrepreneur. Beyond Transparency is intended to capture and distill the community’s learnings around open data for the past four years. And we know that the community is going to continue learning. That’s why, in addition to the print version of the book which you can order on Amazon, we’ve also published the digital version of this book on this site under a Creative Commons license. The full text of this site is on GitHub — which means that anyone can submit a pull request with a suggested edit. Help us improve this resource for the community and write the next edition of Beyond Transparency by submitting your pull requests.
Code for America is a national nonprofit committed to building a government for the people, by the people, that works in the 21st century. Over the past four years, CfA has worked with dozens of cities to support civic innovation through open data. You can support this work by contributing to the book on GitHub, joining the CfA volunteer community (the Brigade), or connecting your city with CfA.

GitHub and Government


New site: “Make government better, together. Stories of open source, open data, and open government.
This site is an open source effort to showcase best practices of open sourcing government. See something that you think could be better? Want to submit your own story? Simply fork the project and submit a pull request.

Ready to get started on GitHub? Here are some ideas that are easy to get your feet wet with.

Feedback Repository

GitHub’s about connecting with developers. Whether you’re an API publishing pro, or just getting started, creating a “feedback” repository can go a long way to connect your organization with the community. Get feedback from current and potential data consumers by creating a specific repository for them to contribute ideas and suggestions for types of data or other information they’d like to see opened. Here’s how:

  1. Create a new repository
    • Choose your organization as the Owner
    • Name the repository “feedback” or similar
    • Click the checkbox to automatically create a README.md file
  2. Set up your Readme
    • Click README.md within your newly created repository
    • Click Edit
    • Introduce yourself, describe why you’ve joined GitHub, what you’re hoping to do and what you’d like to learn from the development community. Encourage them to leave feedback through issues on the repository.

Sample text for your README.md:

# City of Gotham Feedback
We've just joined GitHub and want to know what data would be interesting to our development community?
Leave us comments via issues!
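
The repository-creation steps above can also be scripted. The following is a minimal sketch, not an official recipe: it assumes the GitHub REST API endpoint for creating an organization repository (POST /orgs/{org}/repos) and a personal access token with permission to create repositories. The organization name "city-of-gotham", the token placeholder, and the function name are hypothetical. The sketch only builds the request; nothing is sent until you pass it to urllib.request.urlopen.

```python
import json
import urllib.request

GITHUB_API = "https://api.github.com"


def build_create_repo_request(org, token, name="feedback"):
    """Build (but do not send) a request that creates an organization repository."""
    payload = {
        "name": name,
        "description": "Tell us what data you would like to see opened",
        "auto_init": True,   # create an initial commit with a README, like the checkbox
        "has_issues": True,  # feedback is meant to arrive as issues
    }
    return urllib.request.Request(
        url=f"{GITHUB_API}/orgs/{org}/repos",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
        },
    )


if __name__ == "__main__":
    # Hypothetical organization and token, for illustration only.
    request = build_create_repo_request("city-of-gotham", "YOUR_TOKEN")
    print(request.get_method(), request.full_url)
    # To create the repository for real, call urllib.request.urlopen(request)
    # with a valid token.
```

Keeping request construction separate from sending makes it easy to review exactly what will be created before anything touches your organization's account.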

Open source a Dataset

Open sourcing a dataset can be as simple as uploading a .csv to GitHub and letting people know about it. Rather than publishing data as a zip file on your website or an FTP server, you can add the files through the GitHub.com web interface, or via the GitHub for Windows or GitHub for Mac native clients. Create a new repository to store your datasets – in many cases, it’s as easy as drag, drop, sync.
GitHub can host any file type (although open, non-binary files like .csvs tend to work best). Plus, GitHub supports rendering certain open data formats interactively such as the popular geospatial .geojson format. Once uploaded, citizens can view the files, and can even open issues or submit pull requests with proposed fixes.
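
If a dataset with coordinates only exists as a .csv, it can be converted to .geojson before uploading so that it gets the interactive rendering mentioned above. The sketch below is one possible approach using only the Python standard library. The column names "latitude" and "longitude", the function name, and the sample rows are assumptions made for illustration; adjust them to match your own file.

```python
import csv
import io
import json


def csv_to_geojson(csv_text, lat_field="latitude", lon_field="longitude"):
    """Turn CSV rows that carry coordinates into a GeoJSON FeatureCollection."""
    features = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        try:
            lat = float(row[lat_field])
            lon = float(row[lon_field])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows without usable coordinates
        # Every other column becomes a property of the feature.
        properties = {
            key: value
            for key, value in row.items()
            if key not in (lat_field, lon_field)
        }
        features.append({
            "type": "Feature",
            # GeoJSON orders coordinates as [longitude, latitude].
            "geometry": {"type": "Point", "coordinates": [lon, lat]},
            "properties": properties,
        })
    return {"type": "FeatureCollection", "features": features}


# Hypothetical sample data; the second row has no coordinates and is skipped.
SAMPLE = """name,latitude,longitude
Hydrant 1,42.3601,-71.0589
Hydrant 2,,
"""

if __name__ == "__main__":
    print(json.dumps(csv_to_geojson(SAMPLE), indent=2))
```

Saving the printed output with a .geojson extension and adding it to the repository next to the original .csv keeps both the raw table and the mappable version available.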

Explore Open Source Civic Apps

There are many open source applications freely available on GitHub that were built just for government. Check them out, and see if one fits a need. Here are some examples:

  • Adopt-a – This open source web app was created for the City of Boston in 2011 by Code for America fellows. It allows residents to “adopt” a hydrant and make sure it’s clear of snow in the winter so that emergency crews can locate it when needed. It has since been adopted in Chicago (for sidewalks), Seattle (for storm drains), and Honolulu (for tsunami sirens).
  • StreetMix – Another creation of Code for America fellows (2013), this website, www.streetmix.net, allows anyone to create street sections in a way that is not only beautiful but educational, too. No downloading, no installing, no paying – make and save your creations right at the website. Great for internal or public community planning meetings.
  • We The People – We The People, the White House’s petitions application hosted at petitions.whitehouse.gov, is a Drupal module that allows citizens to submit and digitally sign petitions.

Open source something small

Chances are you’ve got something small you can open source. Check in with your web or new media team, and see if they’ve got something they’ve been dying to share or blog about, no matter how small. It can be a snippet of analytics code, or maybe a small script used internally. It doesn’t even have to be code.
Post your website’s privacy policy, comment moderation policy, or terms of service and let the community weigh in before your next edit. No matter how small it is, getting your first open source project going is a great first step.

Improve an existing project

Does your agency use an existing open source project to conduct its own business? Open an issue on the project’s repository with a feature request or a bug you spot. Better yet, fork the project, and submit your improvements. Even if it’s one or two lines of code, such examples are great to blog about to showcase your efforts.
Don’t forget, this site is an open source project, too. Making a needed edit is another great way to get started.”

Collaborative Internet Governance: Terms and Conditions of Analysis


New paper by Mathieu O’Neil in the special issue on Contested Internet Governance of the Revue française d’études américaines: “Online projects are communities of practice which attempt to bypass the hierarchies of everyday life and to create autonomous institutions and forms of organisation. A wealth of theoretical frameworks have been put forward to account for these networked actors’ capacity to communicate and self-organise. This article reviews terminology used in Internet research and assesses what it implies for the understanding of regulatory-oriented collective action. In terms of the environment in which interpersonal communication occurs, what differences does it make to speak of “public spheres” or of “public spaces”? In terms of social formations, of “organisations” or “networks”? And in terms of the diffusion of information over the global network, of “contagion” or “trajectories”? Selecting theoretical frames is a momentous decision for researchers, as it authorises or forbids the analysis of different types of behaviour and practices.”
Other papers on Internet Governance in the Revue:
Divina Frau-Meigs (Ed.): Conducting Research on the Internet and its Governance
The Internet and its Governance: A General Bibliography
Glossary of Key Terms and Notions about Internet Governance
Julia Pohle and Luciano Morganti: The Internet Corporation for Assigned Names and Numbers (ICANN): Origins, Stakes and Tensions
Francesca Musiani et al.: Net Neutrality as an Internet Governance Issue: The Globalization of an American-Born Debate
Jeanette Hofmann: Narratives of Copyright Enforcement: The Upward Ratchet and the Sleeping Giant
Elizabeth Dubois and William H. Dutton: The Fifth Estate in Internet Governance: Collective Accountability of a Canadian Policy Initiative
Mathieu O’Neil: Collaborative Internet Governance: Terms and Conditions of Analysis
Peng Hwa Ang and Natalie Pang: Globalization of the Internet, Sovereignty or Democracy: The Trilemma of the Internet Governance Forum