Principles and Practices for a Federal Statistical Agency


New National Academies Publication: “Publicly available statistics from government agencies that are credible, relevant, accurate, and timely are essential for policy makers, individuals, households, businesses, academic institutions, and other organizations to make informed decisions. Even more, the effective operation of a democratic system of government depends on the unhindered flow of statistical information to its citizens.
In the United States, federal statistical agencies in cabinet departments and independent agencies are the governmental units whose principal function is to compile, analyze, and disseminate information for such statistical purposes as describing population characteristics and trends, planning and monitoring programs, and conducting research and evaluation. The work of these agencies is coordinated by the U.S. Office of Management and Budget. Statistical agencies may acquire information not only from surveys or censuses of people and organizations, but also from such sources as government administrative records, private-sector datasets, and Internet sources that are judged of suitable quality and relevance for statistical use. They may conduct analyses, but they do not advocate policies or take partisan positions. Statistical purposes for which they provide information relate to descriptions of groups and exclude any interest in or identification of an individual person, institution, or economic unit.
Four principles are fundamental for a federal statistical agency: relevance to policy issues, credibility among data users, trust among data providers, and independence from political and other undue external influence. Principles and Practices for a Federal Statistical Agency: Fifth Edition explains these four principles in detail.”

Life and Death of Tweets Not so Random After All


MIT Technology Review: “MIT assistant professor Tauhid Zaman and two other researchers (Emily Fox at the University of Washington and Eric Bradlow at the University of Pennsylvania’s Wharton School) have come up with a model that can predict how many times a tweet will ultimately be retweeted, minutes after it is posted. The model was created by collecting retweets on a slew of topics and looking at the time when the original tweet was posted and how fast it spread. That information is then used to predict how popular a new tweet will be, based on how many times it is retweeted shortly after it is first posted.
The researchers’ findings were explained in a paper submitted to the Annals of Applied Statistics. In the paper, the authors note that “understanding retweet behavior could lead to a better understanding of how broader ideas spread in Twitter and in other social networks,” and such data may be helpful in a number of areas, like marketing and political campaigning.
You can check out the model here.”
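The article describes the approach only at a high level. As a rough, purely illustrative sketch of the underlying idea (fit the relationship between early retweet counts and eventual totals on historical tweets, then apply it to a new tweet's early count), one could do something like the following. This is not the authors' statistical model; the numbers and the simple log-log regression are assumptions invented for the example.

```python
# Illustrative sketch only (not the model from the paper): predict a tweet's
# eventual retweet total from the retweets it collects in its first minutes,
# using a relationship fitted to historical tweets. Data below is invented.
import numpy as np

# Historical tweets: retweets observed after 10 minutes vs. final totals.
early_counts = np.array([2, 5, 1, 20, 8, 3, 50, 12])
final_counts = np.array([15, 60, 4, 400, 90, 20, 1200, 150])

# Fit a simple log-log linear model: log(final) = a * log(early + 1) + b.
a, b = np.polyfit(np.log(early_counts + 1), np.log(final_counts), 1)

def predict_final_retweets(early_retweets):
    """Estimate the eventual retweet count from the early retweet count."""
    return float(np.exp(a * np.log(early_retweets + 1) + b))

# A new tweet retweeted 7 times in its first 10 minutes:
print(round(predict_final_retweets(7)))
```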

If My Data Is an Open Book, Why Can’t I Read It?


Natasha Singer in the New York Times: “Never mind all the hoopla about the presumed benefits of an “open data” society. In our day-to-day lives, many of us are being kept in the data dark.

“The fact that I am producing data and companies are collecting it to monetize it, if I can’t get a copy myself, I do consider it unfair,” says Latanya Sweeney, the director of the Data Privacy Lab at Harvard, where she is a professor of government and technology….

In fact, a few companies are challenging the norm of corporate data hoarding by actually sharing some information with the customers who generate it — and offering tools to put it to use. It’s a small but provocative trend in the United States, where only a handful of industries, like health care and credit, are required by federal law to provide people with access to their records.

Last year, San Diego Gas and Electric, a utility, introduced an online energy management program in which customers can view their electricity use in monthly, daily or hourly increments. There is even a practical benefit: customers can earn credits by reducing energy consumption during peak hours….

100 Urban Trends


A Glossary of Ideas from the BMW Guggenheim Lab—New York, Berlin, and Mumbai: “Over the past two years, the BMW Guggenheim Lab, a mobile urban laboratory centered around the topic of life in cities today, has offered free programs and workshops and implemented urban projects in New York City (August 3–October 16, 2011), Berlin (June 15–July 29, 2012), and Mumbai (December 9, 2012–January 20, 2013). Created as a resource, 100 Urban Trends aims to identify the most talked-about trends in urban thinking, as they were discussed in these three venues. Each individual glossary offers 100 contextualized definitions that apply to the way we understand, design, and live in cities.
Integral to 100 Urban Trends is the concept of cities as “idea makers.” In cities, people come together, share their thoughts and common interests, and generate the ideas that shape our world. Dense, growing cities have been and continue to be the catalyst for human progress, powered by daily proximity among their citizens as much as anything else. Despite some of the drawbacks of such massive urban centers, they may well embody the future for human life. Today’s cities are competing to attract more people; greater urban density can mean more conflict, but it can also produce a greater diversity of viewpoints and more opportunity for positive change.
In recent years, there has been an unequivocal shift in the study of cities. Urban thinking, whether related to architecture or urbanism, has become dramatically less focused on infrastructure, and more on the ultimate goal and reason for the existence of cities — that is, the well-being of the people that inhabit them and constitute their very soul and essence. “Cluster,” “concentrate,” and “collaborate” seem to have become the three big Cs of urban thinking of late — but that story is not new. Clustering, searching for a concentration of people, and finding ways to collaborate have been part of the human experience since prehistoric times. Then, as now, people gathered in search of protection, conviviality, and exchange.”

The Declassification Engine


Wired: “The CIA offers an electronic search engine that lets you mine about 11 million agency documents that have been declassified over the years. It’s called CREST, short for CIA Records Search Tool. But this represents only a portion of the CIA’s declassified materials, and if you want unfettered access to the search engine, you’ll have to physically visit the National Archives at College Park, Maryland….
A new project launched by a team of historians, mathematicians, and computer scientists at Columbia University in New York City aims to change that. Led by Matthew Connelly, a Columbia professor trained in diplomatic history, the project is known as The Declassification Engine, and it seeks to provide a single online database for declassified documents from across the federal government, including the CIA, the State Department, and potentially any other agency.
The project is still in the early stages, but the team has already assembled a database of documents that stretches back to the 1940s, and it has begun building new tools for analyzing these materials. In aggregating all documents into a single database, the researchers hope to not only provide quicker access to declassified materials, but to glean far more information from these documents than we otherwise could.
In the parlance of the day, the project is tackling these documents with the help of Big Data. If you put enough of this declassified information in a single place, Connelly believes, you can begin to predict what government information is still being withheld.”

Deepbills project


Cato Institute: “The Deepbills project takes the raw XML of Congressional bills (available at FDsys and Thomas) and adds additional semantic information to them inside the text.

You can download the continuously updated data at http://deepbills.cato.org/download

Congress already produces machine-readable XML of almost every bill it proposes, but that XML is designed primarily for formatting a paper copy, not for extracting information. For example, it’s not currently possible to find every mention of an Agency, every legal reference, or even every spending authorization in a bill without having a human being read it….
Currently the following information is tagged:

  • Legal citations…
  • Budget Authorities (both Authorizations of Appropriations and Appropriations)…
  • Agencies, bureaus, and subunits of the federal government.
  • Congressional committees
  • Federal elective officeholders (Congressmen)”
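As a sketch of how this semantic markup might be consumed, the snippet below walks a downloaded bill's XML and tallies the tagged entities. The namespace URI, the element name ("entity"), the attribute name ("entity-type"), and the example filename are assumptions made for illustration; the actual schema is documented on the Deepbills site.

```python
# Hypothetical sketch: count the tagged entities in one Deepbills XML file.
# The namespace, element, and attribute names below are assumed for the
# example; check the schema published at deepbills.cato.org before relying
# on them.
import xml.etree.ElementTree as ET
from collections import Counter

NS = "http://namespaces.cato.org/catoxml"  # assumed namespace URI

def count_tagged_entities(path):
    """Return a Counter mapping entity type to number of tagged occurrences."""
    tree = ET.parse(path)
    counts = Counter()
    for elem in tree.getroot().iter("{%s}entity" % NS):
        counts[elem.get("entity-type", "unknown")] += 1
    return counts

# Example usage on a downloaded bill (hypothetical filename):
# print(count_tagged_entities("hr1234-113.xml"))
```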

Crowdfunding gives rise to projects truly in public domain


USA Today: “Crowdfunding, the cyberpractice of pooling individuals’ money for a cause, so far has centered on private enterprise. It’s now spreading to public spaces and other community projects that are typically the domain of municipalities.

The global reach and speed of the Internet are raising not just money but awareness and galvanizing communities.

SmartPlanet.com recently reported that crowdfunding capital projects is gaining momentum, giving communities part ownership of everything from a 66-story downtown skyscraper in Bogota to a bridge in Rotterdam, the Netherlands. Several websites such as neighborland.com and neighbor.ly are platforms to raise money for projects ranging from planting fruit trees in San Francisco to building a playground that accommodates disabled children in Parsippany, N.J.

“Community groups are increasingly ready to challenge cities’ plans,” says Bryan Boyer, an independent consultant and adviser to The Finnish Innovation Fund SITRA, a think tank. “We’re all learning to live in the context of a networked society.”

Crowdfunder, which connects entrepreneurs and investors globally, just launched a local version — CROWDFUNDx.”

What the Obama Campaign's Chief Data Scientist Is Up to Now


Alexis Madrigal in The Atlantic: “By all accounts, Rayid Ghani’s data work for President Obama’s reelection campaign was brilliant and unprecedented. Ghani probably could have written a ticket to work at any company in the world, or simply collected speaking fees for a few years telling companies how to harness the power of data like the campaign did.
But instead, Ghani headed to the University of Chicago to bring sophisticated data analysis to difficult social problems. Working with the Computation Institute and the Harris School of Public Policy, Ghani will serve as the chief data scientist for the Urban Center for Computation and Data.”

Feel the force


The Economist: “Three new books look at power in the digital age…
To Save Everything, Click Here: The Folly of Technological Solutionism. By Evgeny Morozov. PublicAffairs; 415 pages; $28.99. Allen Lane; £20.
Who Owns the Future? By Jaron Lanier. Simon and Schuster; 397 pages; $28. Allen Lane; £20.
The New Digital Age: Reshaping the Future of People, Nations and Business. By Eric Schmidt and Jared Cohen. Knopf; 319 pages; $26.95. John Murray; £25.”