Wired: “The CIA offers an electronic search engine that lets you mine about 11 million agency documents that have been declassified over the years. It’s called CREST, short for CIA Records Search Tool. But this represents only a portion of the CIA’s declassified materials, and if you want unfettered access to the search engine, you’ll have to physically visit the National Archives at College Park, Maryland….
a new project launched by a team of historians, mathematicians, and computer scientists at Columbia University in New York City. Led by Matthew Connelly — a Columbia professor trained in diplomatic history — the project is known as The Declassification Engine, and it seeks to provide a single online database for declassified documents from across the federal government, including the CIA, the State Department, and potentially any other agency.
The project is still in the early stages, but the team has already assembled a database of documents that stretches back to the 1940s, and it has begun building new tools for analyzing these materials. In aggregating all documents into a single database, the researchers hope to not only provide quicker access to declassified materials, but to glean far more information from these documents than we otherwise could.
In the parlance of the day, the project is tackling these documents with the help of Big Data. If you put enough of this declassified information in a single place, Connelly believes, you can begin to predict what government information is still being withheld.”