About Capitol Words: “For every day Congress is in session, Capitol Words visualizes the most frequently used words in the Congressional Record, giving you an at-a-glance view of which issues lawmakers address on a daily, weekly, monthly and yearly basis. Capitol Words lets you see what are the most popular words spoken by lawmakers on the House and Senate floor.
The contents of the Congressional Record are downloaded daily from the website of the Government Printing Office. The GPO distributes the Congressional Record in ZIP files containing the contents of the record in plain-text format.
Each text file is parsed and turned into an XML document, with things like the title and speaker marked up. The contents of each file are then split up into words and phrases — from one word to five.
The resulting data is saved to a search engine. Capitol Words has data from 1996 to the present.”