Power of open data reveals global corporate networks


Open Data Institute: “The ODI today welcomed the move by OpenCorporates to release open data visualisations which show the global corporate networks of millions of businesses and the power of open data.
See the Maps
OpenCorporates, a company based at the ODI, has produced visuals using several sources, which it has published as open data for the first time:

  • Filings made by large domestic and foreign companies to the U.S. Securities and Exchange Commission
  • Banking data held by the National Information Center of the Federal Reserve System in the U.S.
  • Information about individual shareholders published by the official New Zealand corporate registry

Launched today, the visualisations are available through the main OpenCorporates website.”
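The ownership chains behind these filings form a directed graph of parent-subsidiary links. As a rough sketch of the kind of structure such data encodes (the company names and edges below are invented for illustration, not drawn from the OpenCorporates release), a corporate network can be stored and walked like this:

```python
from collections import defaultdict

# Invented parent -> subsidiary records standing in for the kind of
# ownership links extracted from SEC, Federal Reserve, or registry filings.
edges = [
    ("Example Holdings Ltd", "Example Finance LLC"),
    ("Example Holdings Ltd", "Example Trading GmbH"),
    ("Example Finance LLC", "Example Leasing SA"),
]

network = defaultdict(list)
for parent, subsidiary in edges:
    network[parent].append(subsidiary)

def descendants(company, net):
    """Walk the ownership tree to list all direct and indirect subsidiaries."""
    found = []
    for child in net.get(company, []):
        found.append(child)
        found.extend(descendants(child, net))
    return found

print(descendants("Example Holdings Ltd", network))
# ['Example Finance LLC', 'Example Leasing SA', 'Example Trading GmbH']
```

Visualising millions of such edges is what makes the OpenCorporates maps interesting: indirect ownership that is invisible in any single filing becomes one traversal of the graph.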

Infoglut: How Too Much Information Is Changing the Way We Think and Know


New book by Mark Andrejevic: “Today, more mediated information is available to more people than at any other time in human history. New and revitalized sense-making strategies multiply in response to the challenges of “cutting through the clutter” of competing narratives and taming the avalanche of information. Data miners, “sentiment analysts,” and decision markets offer to help bodies of data “speak for themselves”—making sense of their own patterns so we don’t have to. Neuromarketers and body language experts promise to peer behind people’s words to see what their brains are really thinking and feeling. New forms of information processing promise to displace the need for expertise and even comprehension—at least for those with access to the data.
Infoglut explores the connections between these wide-ranging sense-making strategies for an era of information overload and “big data,” and the new forms of control they enable. Andrejevic critiques the popular embrace of deconstructive debunkery, calling into question the post-truth, post-narrative, and post-comprehension politics it underwrites, and tracing a way beyond them.”

Infographics: Winds of change


Book Review in the Economist:

  • Data Points: Visualisation That Means Something. By Nathan Yau. Wiley; 300 pages; $32 and £26.99.
  • Facts are Sacred. By Simon Rogers. Faber and Faber; 311 pages; £20.
  • The Infographic History of the World. By James Ball and Valentina D’Efilippo. Collins; 224 pages; £20.

“IN THE late 1700s William Playfair, a Scottish engineer, created the bar chart, pie chart and line graph. These amounted to visual breakthroughs, innovations that allowed people to see patterns in data that they would otherwise have missed if they just stared at long tables of numbers.
Big data, the idea that the world is replete with more information than ever, is now all the rage. And the search for fresh and enlightened ways to help people absorb it is causing a revolution. A new generation of statisticians and designers—often the same person—is working on computer technologies and visual techniques that will depict data at scales and in forms previously unimaginable. The simple line graph and pie chart are being supplemented by things like colourful, animated bubble charts, which can present more variables. Three-dimensional network diagrams show ratios and relationships that were impossible to depict before.”

IRS database of nonprofits is filled with unredacted SSNs


In BoingBoing: “Remember when rogue archivist Carl Malamud asked the IRS for $1.5 trillion worth of data from nonprofit organizations? Well, it turns out that the IRS has totally failed to redact it properly, and left in the Social Security Numbers of thousands of people. So he’s asked the IRS to take the database down and get it right. He explains:

Public.Resource.Org has issued a statement explaining why we asked the I.R.S. to temporarily take their political money database off the Internet and why they complied with our request. This database is a vital tool for researchers and we apologize to those of you that use this database on a daily basis.
This is only one of several exempt organization databases that the IRS has totally bungled. They’ve become addicted to bad Internet hygiene and it is time now for the Service to admit it needs help.
We deserve better for the public filings of exempt organizations, a category that makes up 10% of US wages and over $1.5 trillion in economic activity. Let’s hope the administration takes this seriously and sends in the A team.”
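The failure described here is, at bottom, a pattern-scrubbing problem. A minimal sketch of the kind of check a publisher could run before release (the regex covers only the common NNN-NN-NNNN layout, and `redact_ssns` is an illustrative helper, not part of any IRS or Public.Resource.Org tooling):

```python
import re

# Matches SSN-like strings in the NNN-NN-NNNN layout only; real filings
# may format numbers differently, so this is a sketch, not a guarantee.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def redact_ssns(text, mask="XXX-XX-XXXX"):
    """Replace SSN-like substrings with a mask; return (clean_text, count)."""
    clean, count = SSN_PATTERN.subn(mask, text)
    return clean, count

filing = "Officer: Jane Doe, SSN 123-45-6789, compensation $50,000"
clean, found = redact_ssns(filing)
print(found)  # 1
print(clean)
```

Running a scan like this over every document before publication, and refusing to publish any file with a nonzero match count, is the kind of basic hygiene the post argues the Service skipped.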

Why We Asked the I.R.S. to Temporarily Turn the Lights Off on Section 527 Data

Urban Observatory


Understanding Precedes Action: “Richard Saul Wurman, Radical Media, and Esri bring you the Urban Observatory—a live museum with a data pulse. You’ll have access to rich datasets for cities around the world that let you simultaneously view answers to the most important questions impacting today’s global cities—and you. Compare and contrast visualized information for a greater understanding of life in the 21st century.”

The Real-Time City? Big Data and Smart Urbanism


New paper by Rob Kitchin from the National University of Ireland, Maynooth (NUI Maynooth) – NIRSA: “‘Smart cities’ is a term that has gained traction in academia, business and government to describe cities that, on the one hand, are increasingly composed of and monitored by pervasive and ubiquitous computing and, on the other, whose economy and governance is being driven by innovation, creativity and entrepreneurship, enacted by smart people. This paper focuses on the former and how cities are being instrumented with digital devices and infrastructure that produce ‘big data’ which enable real-time analysis of city life, new modes of technocratic urban governance, and a re-imagining of cities. The paper details a number of projects that seek to produce a real-time analysis of the city and provides a critical reflection on the implications of big data and smart urbanism.”

Open Government is an Open Conversation


Lisa Ellman and Hollie Russon Gilman at the White House Blog: “President Obama launched the first U.S. Open Government National Action Plan in September 2011, as part of the Nation’s commitment to the principles of the global Open Government Partnership. The Plan laid out twenty-six concrete steps the United States would take to promote public participation in government, increase transparency in government, and manage public resources more effectively.
A year and a half later, we have fulfilled twenty-four of the Plan’s prescribed commitments—including launching the online We the People petition platform, which has been used by more than 9.6 million people, and unleashing thousands of government data resources as part of the Administration’s Open Data Initiatives.
We are proud of this progress, but recognize that there is always more work to be done to build a more efficient, effective, and transparent government. In that spirit, as part of our ongoing commitment to the international Open Government Partnership, the Obama Administration has committed to develop a second National Action Plan on Open Government.
To accomplish this task effectively, we’ll need all-hands-on-deck. That’s why we plan to solicit and incorporate your input as we develop the National Action Plan “2.0.”…
Over the next few months, we will continue to gather your thoughts. We will leverage online platforms such as Quora, Google+, and Twitter to communicate with the public and collect feedback. We will meet with members of open government civil society organizations and other experts, to ensure all voices are brought to the table. We will solicit input from Federal agencies on lessons learned from their unique experiences, and gather information about successful initiatives that could potentially be scaled across government. And finally, we will canvass the international community for their diverse insights and innovative ideas.”

Frontiers in Massive Data Analysis


New Report from the National Research Council: “From Facebook to Google searches to bookmarking a webpage in our browsers, today’s society has become one with an enormous amount of data. Some internet-based companies such as Yahoo! are even storing exabytes (10^18 bytes) of data. Like these companies and the rest of the world, scientific communities are also generating large amounts of data—mostly terabytes and in some cases near petabytes—from experiments, observations, and numerical simulation. However, the scientific community, along with defense enterprise, has been a leader in generating and using large data sets for many years. The issue that arises with this new type of large data is how to handle it—this includes sharing the data, enabling data security, working with different data formats and structures, dealing with the highly distributed data sources, and more.
Frontiers in Massive Data Analysis presents the Committee on the Analysis of Massive Data’s work to make sense of the current state of data analysis for mining of massive sets of data, to identify gaps in the current practice and to develop methods to fill these gaps. The committee thus examines the frontiers of research that is enabling the analysis of massive data which includes data representation and methods for including humans in the data-analysis loop. The report includes the committee’s recommendations, details concerning types of data that build into massive data, and information on the seven computational giants of massive data analysis.”

City Data: Big, Open and Linked


Working Paper by Mark S. Fox (University of Toronto): “Cities are moving towards policymaking based on data. They are publishing data using Open Data standards, linking data from disparate sources, allowing the crowd to update their data with Smart Phone Apps that use Open APIs, and applying “Big Data” Techniques to discover relationships that lead to greater efficiencies.
One Big City Data example is from New York City (Schönberger & Cukier, 2013). Building owners were illegally converting their buildings into rooming houses that contained 10 times the number of people they were designed for. These buildings posed a number of problems, including fire hazards, drugs, crime, disease and pest infestations. There are over 900,000 properties in New York City and only 200 inspectors, who received over 25,000 illegal conversion complaints per year. The challenge was to distinguish nuisance complaints from those worth investigating; current methods resulted in vacate orders in only 13% of inspections.
New York’s Analytics team created a dataset that combined data from 19 agencies including buildings, preservation, police, fire, tax, and building permits. By combining data analysis with expertise gleaned from inspectors (e.g., buildings that recently received a building permit were less likely to be a problem as they were being well maintained), the team was able to develop a rating system for complaints. Based on their analysis of this data, they were able to rate complaints such that in 70% of their visits, inspectors issued vacate orders, a fivefold increase in efficiency…
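A toy version of that rating idea can be sketched in a few lines. The feature names and weights below are invented for illustration; the city’s actual model combined many more signals from the 19 agency datasets:

```python
# Invented complaint records carrying signals merged from several
# (hypothetical) agency datasets.
complaints = [
    {"id": 1, "tax_delinquent": True,  "recent_permit": False, "fire_calls": 3},
    {"id": 2, "tax_delinquent": False, "recent_permit": True,  "fire_calls": 0},
    {"id": 3, "tax_delinquent": True,  "recent_permit": False, "fire_calls": 1},
]

def score(c):
    """Illustrative risk score: higher means more likely an illegal conversion."""
    s = 0.0
    if c["tax_delinquent"]:
        s += 2.0                 # distressed properties correlate with problems
    if c["recent_permit"]:
        s -= 1.5                 # recently permitted buildings tend to be maintained
    s += 0.5 * c["fire_calls"]   # prior fire responses raise the risk
    return s

# Inspectors visit the highest-scoring complaints first.
ranked = sorted(complaints, key=score, reverse=True)
print([c["id"] for c in ranked])  # [1, 3, 2]
```

The point of the sketch is the mechanism, not the weights: once disparate agency data is joined on a property, even a simple additive score can reorder the inspection queue so high-risk complaints come first.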
This paper provides an introduction to the concepts that underlie Big City Data. It explains the concepts of Open, Unified, Linked and Grounded data that lie at the heart of the Semantic Web. It then builds on this by discussing Data Analytics, which includes Statistics, Pattern Recognition and Machine Learning. Finally we discuss Big Data as the extension of Data Analytics to the Cloud where massive amounts of computing power and storage are available for processing large data sets. We use city data to illustrate each.”

Immersion: Using E-Mail Data to Connect the Dots of Your Life


Brian Chen in The New York Times: “The Obama administration for over two years allowed the National Security Agency to collect enormous amounts of metadata on e-mail usage by Americans, according to one of the latest leaks of government documents by the now-famous whistle-blower Edward J. Snowden.
But what is e-mail metadata anyway? It’s information about the people you’re sending e-mails to and receiving e-mails from, and the times that the messages were sent — as opposed to the contents of the messages. It’s the digital equivalent of a postal service worker looking at your mail envelope instead of opening it up and reading what’s inside.
That sounds harmless, but it turns out your e-mail metadata can be used to connect the dots of your life story. I learned this from participating in Immersion, a project by M.I.T.’s Media Laboratory, earlier reported by my colleague Juliet Lapidos. Immersion is a tool that mines your e-mail metadata and automatically stitches it all together into an interactive graphic. The result is a creepy spider web showing all the people you’ve corresponded with, how they know each other, and who your closest friends and professional partners are.
After entering my Google mail credentials, Immersion took five minutes to stitch together metadata from e-mails going back eight years. A quick glimpse at my results gives an accurate description of my life.”
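What a tool like Immersion does can be approximated from headers alone. A minimal sketch, using invented messages, that counts contacts and co-occurrences without ever touching a message body:

```python
from collections import Counter
from itertools import combinations

# Invented metadata records: only From/To fields, never message content.
messages = [
    {"from": "me", "to": ["alice", "bob"]},
    {"from": "alice", "to": ["me", "bob"]},
    {"from": "me", "to": ["carol"]},
]

contact_freq = Counter()   # how often each person appears in my mail
co_occurrence = Counter()  # pairs of people seen on the same message

for msg in messages:
    people = {msg["from"], *msg["to"]} - {"me"}
    contact_freq.update(people)
    for pair in combinations(sorted(people), 2):
        co_occurrence[pair] += 1

print(dict(contact_freq))
print(dict(co_occurrence))
```

Contact frequency approximates closeness, and the co-occurrence pairs are the edges of the “creepy spider web”: people who keep appearing on the same messages very likely know each other, which is exactly what the metadata alone reveals.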
Sign up here: https://immersion.media.mit.edu/