Benchmarking open government: An open data perspective


Paper by N Veljković, S Bogdanović-Dinić, and L Stoimenov in Government Information Quarterly: “This paper presents a benchmark proposal for the Open Government and its application from the open data perspective using data available on the U.S. government’s open data portal (data.gov). The benchmark is developed over the adopted Open Government conceptual model, which describes Open Government through data openness, transparency, participation and collaboration. Resulting in two measures, that is, one known as the e-government openness index (eGovOI) and the other Maturity, the benchmark indicates the progress of government over time, the efficiency of recognizing and implementing new concepts and the willingness of the government to recognize and embrace innovative ideas.”

How government can promote open data


Michael Chui, Diana Farrell, and Kate Jackson from McKinsey: “Institutions and companies across the public and private sectors have begun to release and share vast amounts of information in recent years, and the trend is only accelerating. Yet while some information is easily accessible, some is still trapped in paper records. Data may be free or come at a cost. And there are tremendous differences in reuse and redistribution rights. In short, there are degrees when it comes to just how “open” data is and, as a result, how much value it can create.
While businesses and other private organizations can make more information public, we believe that government has a critical role in unleashing the economic potential of open data. A recent McKinsey report, Open data: Unlocking innovation and performance with liquid information, identified more than $3 trillion in economic value globally that could be generated each year in seven domains through increasingly “liquid” information that is machine readable, accessible to a broad audience at little or no cost, and capable of being shared and distributed. These sources of value include new or increased revenue, savings, and economic surplus that flow from the insights provided by data as diverse as census demographics, crop reports, and information on product recalls.
Sitting at the nexus of key stakeholders—citizens, businesses, and nongovernmental organizations (NGOs)—government is ideally positioned to extract value from open data and to help others do the same. We believe government can spur value creation at all levels of society by concurrently fulfilling four important open-data roles (exhibit):

Government can serve as an open-data provider, catalyst, user, and policy maker to create value and mitigate risks.

Let’s get geeks into government


Gillian Tett in the Financial Times: “Fifteen years ago, Brett Goldstein seemed to be just another tech entrepreneur. He was working as IT director of OpenTable, then a start-up website for restaurant bookings. The company was thriving – and subsequently did a very successful initial public offering. Life looked very sweet for Goldstein. But when the World Trade Center was attacked in 2001, Goldstein had a moment of epiphany. “I spent seven years working in a startup but, directly after 9/11, I knew I didn’t want my whole story to be about how I helped people make restaurant reservations. I wanted to work in public service, to give something back,” he recalls – not just by throwing cash into a charity tin, but by doing public service. So he swerved: in 2006, he attended the Chicago police academy and then worked for a year as a cop in one of the city’s toughest neighbourhoods. Later he pulled the disparate parts of his life together and used his number-crunching skills to build the first predictive data system for the Chicago police (and one of the first in any western police force), to indicate where crime was likely to break out.

This was such a success that Goldstein was asked by Rahm Emanuel, the city’s mayor, to create predictive data systems for the wider Chicago government. The fruits of this effort – which include a website known as “WindyGrid” – went live a couple of years ago, to considerable acclaim inside the techie scene.

This tale might seem unremarkable. We are all used to hearing politicians, business leaders and management consultants declare that the computing revolution is transforming our lives. And as my colleague Tim Harford pointed out in these pages last week, the idea of using big data is now wildly fashionable in the business and academic worlds….

In America when top bankers become rich, they often want to “give back” by having a second career in public service: just think of all those Wall Street financiers who have popped up at the US Treasury in recent years. But hoodie-wearing geeks do not usually do the same. Sure, there are some former techie business leaders who are indirectly helping government. Steve Case, a co-founder of AOL, has supported White House projects to boost entrepreneurship and combat joblessness. Tech entrepreneurs also make huge donations to philanthropy. Facebook’s Mark Zuckerberg, for example, has given funds to Newark education. And the whizz-kids have also occasionally been summoned by the White House in times of crisis. When there was a disastrous launch of the government’s healthcare website late last year, the Obama administration enlisted the help of some of the techies who had been involved with the president’s election campaign.

But what you do not see is many tech entrepreneurs doing what Goldstein did: deciding to spend a few years in public service, as a government employee. There aren’t many Zuckerberg types striding along the corridors of federal or local government.
. . .
It is not difficult to work out why. To most young entrepreneurs, the idea of working in a state bureaucracy sounds like utter hell. But if there was ever a time when it might make sense for more techies to give back by doing stints of public service, that moment is now. The civilian public sector badly needs savvier tech skills (just look at the disaster of that healthcare website for evidence of this). And as the sector’s founders become wealthier and more powerful, they need to show that they remain connected to society as a whole. It would be smart political sense.
So I applaud what Goldstein has done. I also welcome that he is now trying to persuade his peers to do the same, and that places such as the University of Chicago (where he teaches) and New York University are trying to get more young techies to think about working for government in between doing those dazzling IPOs. “It is important to see more tech entrepreneurs in public service. I am always encouraging people I know to do a ‘stint in government’. I tell them that giving back cannot just be about giving money; we need people from the tech world to actually work in government,” Goldstein says.

But what is really needed is for more technology CEOs and leaders to get involved by actively talking about the value of public service – or even encouraging their employees to interrupt their private-sector careers with the occasional spell as a government employee (even if it is not in a sector quite as challenging as the police). Who knows? Maybe it could be Sheryl Sandberg’s next big campaigning mission. After all, if she does ever jump back to Washington, that could have a powerful demonstration effect for techie women and men. And shake DC a little too.”

Politics and the Internet


Edited book by William H. Dutton (Routledge, 2014, 1,888 pages): “It is commonplace to observe that the Internet—and the dizzying technologies and applications which it continues to spawn—has revolutionized human communications. But, while the medium’s impact has apparently been immense, the nature of its political implications remains highly contested. To give but a few examples, the impact of networked individuals and institutions has prompted serious scholarly debates in political science and related disciplines on: the evolution of ‘e-government’ and ‘e-politics’ (especially after recent US presidential campaigns); electronic voting and other citizen participation; activism; privacy and surveillance; and the regulation and governance of cyberspace.
As research in and around politics and the Internet flourishes as never before, this new four-volume collection from Routledge’s acclaimed Critical Concepts in Political Science series meets the need for an authoritative reference work to make sense of a rapidly growing—and ever more complex—corpus of literature. Edited by William H. Dutton, Director of the Oxford Internet Institute (OII), the collection gathers foundational and canonical work, together with innovative and cutting-edge applications and interventions.
With a full index and comprehensive bibliographies, together with a new introduction by the editor, which places the collected material in its historical and intellectual context, Politics and the Internet is an essential work of reference. The collection will be particularly useful as a database allowing scattered and often fugitive material to be easily located. It will also be welcomed as a crucial tool permitting rapid access to less familiar—and sometimes overlooked—texts. For researchers, students, practitioners, and policy-makers, it is a vital one-stop research and pedagogic resource.”

Eight (No, Nine!) Problems With Big Data


Gary Marcus and Ernest Davis in the New York Times: “BIG data is suddenly everywhere. Everyone seems to be collecting it, analyzing it, making money from it and celebrating (or fearing) its powers. Whether we’re talking about analyzing zillions of Google search queries to predict flu outbreaks, or zillions of phone records to detect signs of terrorist activity, or zillions of airline stats to find the best time to buy plane tickets, big data is on the case. By combining the power of modern computing with the plentiful data of the digital era, it promises to solve virtually any problem — crime, public health, the evolution of grammar, the perils of dating — just by crunching the numbers.

Or so its champions allege. “In the next two decades,” the journalist Patrick Tucker writes in the latest big data manifesto, “The Naked Future,” “we will be able to predict huge areas of the future with far greater accuracy than ever before in human history, including events long thought to be beyond the realm of human inference.” Statistical correlations have never sounded so good.

Is big data really all it’s cracked up to be? There is no doubt that big data is a valuable tool that has already had a critical impact in certain areas. For instance, almost every successful artificial intelligence computer program in the last 20 years, from Google’s search engine to the I.B.M. “Jeopardy!” champion Watson, has involved the substantial crunching of large bodies of data. But precisely because of its newfound popularity and growing use, we need to be levelheaded about what big data can — and can’t — do.

The first thing to note is that although big data is very good at detecting correlations, especially subtle correlations that an analysis of smaller data sets might miss, it never tells us which correlations are meaningful. A big data analysis might reveal, for instance, that from 2006 to 2011 the United States murder rate was well correlated with the market share of Internet Explorer: Both went down sharply. But it’s hard to imagine there is any causal relationship between the two. Likewise, from 1998 to 2007 the number of new cases of autism diagnosed was extremely well correlated with sales of organic food (both went up sharply), but identifying the correlation won’t by itself tell us whether diet has anything to do with autism.

Second, big data can work well as an adjunct to scientific inquiry but rarely succeeds as a wholesale replacement. Molecular biologists, for example, would very much like to be able to infer the three-dimensional structure of proteins from their underlying DNA sequence, and scientists working on the problem use big data as one tool among many. But no scientist thinks you can solve this problem by crunching data alone, no matter how powerful the statistical analysis; you will always need to start with an analysis that relies on an understanding of physics and biochemistry.

Third, many tools that are based on big data can be easily gamed. For example, big data programs for grading student essays often rely on measures like sentence length and word sophistication, which are found to correlate well with the scores given by human graders. But once students figure out how such a program works, they start writing long sentences and using obscure words, rather than learning how to actually formulate and write clear, coherent text. Even Google’s celebrated search engine, rightly seen as a big data success story, is not immune to “Google bombing” and “spamdexing,” wily techniques for artificially elevating website search placement.
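The gaming risk the article describes is easy to see in a toy version of such a grader. The sketch below is illustrative, not any real product’s scoring model: the “score” rewards average sentence length and the share of long words, so padding both inflates the score without improving the writing.

```python
def essay_score(text):
    """Toy grader built on the kinds of surface features the article
    describes: average sentence length and share of long words."""
    sentences = [s for s in text.replace("!", ".").replace("?", ".").split(".")
                 if s.strip()]
    words = text.split()
    avg_sentence_len = len(words) / max(len(sentences), 1)
    long_word_share = sum(len(w) >= 8 for w in words) / max(len(words), 1)
    return avg_sentence_len + 10 * long_word_share

clear = "The data show a trend. It is weak."
gamed = ("Notwithstanding multitudinous considerations heretofore enumerated "
         "concerning aforementioned phenomena the perspicacious investigator "
         "nevertheless perseveres indefatigably")

# The padded, less readable essay outscores the clear one.
print(essay_score(clear) < essay_score(gamed))  # → True
```

A student who discovers these two features can raise the score mechanically, which is exactly the failure mode the article warns about.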

Fourth, even when the results of a big data analysis aren’t intentionally gamed, they often turn out to be less robust than they initially seem. Consider Google Flu Trends, once the poster child for big data. In 2009, Google reported — to considerable fanfare — that by analyzing flu-related search queries, it had been able to detect the spread of the flu as accurately as, and more quickly than, the Centers for Disease Control and Prevention. A few years later, though, Google Flu Trends began to falter; for the last two years it has made more bad predictions than good ones.

As a recent article in the journal Science explained, one major contributing cause of the failures of Google Flu Trends may have been that the Google search engine itself constantly changes, such that patterns in data collected at one time do not necessarily apply to data collected at another time. As the statistician Kaiser Fung has noted, collections of big data that rely on web hits often merge data that was collected in different ways and with different purposes — sometimes to ill effect. It can be risky to draw conclusions from data sets of this kind.

A fifth concern might be called the echo-chamber effect, which also stems from the fact that much of big data comes from the web. Whenever the source of information for a big data analysis is itself a product of big data, opportunities for vicious cycles abound. Consider translation programs like Google Translate, which draw on many pairs of parallel texts from different languages — for example, the same Wikipedia entry in two different languages — to discern the patterns of translation between those languages. This is a perfectly reasonable strategy, except for the fact that with some of the less common languages, many of the Wikipedia articles themselves may have been written using Google Translate. In those cases, any initial errors in Google Translate infect Wikipedia, which is fed back into Google Translate, reinforcing the error.

A sixth worry is the risk of too many correlations. If you look 100 times for correlations between two variables, you risk finding, purely by chance, about five bogus correlations that appear statistically significant — even though there is no actual meaningful connection between the variables. Absent careful supervision, the magnitudes of big data can greatly amplify such errors.
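The arithmetic behind this worry can be checked with a quick simulation (a sketch, not from the article): correlate 100 pairs of purely random variables at the 5% significance level and count how many clear the bar by chance alone.

```python
import math
import random

random.seed(0)

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

n = 30          # observations per variable
trials = 100    # number of independent variable pairs tested
r_crit = 0.361  # two-sided 5% critical value of r for n = 30

# Every pair is pure noise, yet some correlations look "significant".
false_positives = sum(
    1 for _ in range(trials)
    if abs(pearson_r([random.gauss(0, 1) for _ in range(n)],
                     [random.gauss(0, 1) for _ in range(n)])) > r_crit
)
print(false_positives)  # roughly 5 of 100, as the article's arithmetic predicts
```

None of these variables is related to any other; the “discoveries” are entirely artifacts of testing many hypotheses at once.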

Seventh, big data is prone to giving scientific-sounding solutions to hopelessly imprecise questions. In the past few months, for instance, there have been two separate attempts to rank people in terms of their “historical importance” or “cultural contributions,” based on data drawn from Wikipedia. One is the book “Who’s Bigger? Where Historical Figures Really Rank,” by the computer scientist Steven Skiena and the engineer Charles Ward. The other is an M.I.T. Media Lab project called Pantheon.

Both efforts get many things right — Jesus, Lincoln and Shakespeare were surely important people — but both also make some egregious errors. “Who’s Bigger?” claims that Francis Scott Key was the 19th most important poet in history; Pantheon has claimed that Nostradamus was the 20th most important writer in history, well ahead of Jane Austen (78th) and George Eliot (380th). Worse, both projects suggest a misleading degree of scientific precision with evaluations that are inherently vague, or even meaningless. Big data can reduce anything to a single number, but you shouldn’t be fooled by the appearance of exactitude.

FINALLY, big data is at its best when analyzing things that are extremely common, but often falls short when analyzing things that are less common. For instance, programs that use big data to deal with text, such as search engines and translation programs, often rely heavily on something called trigrams: sequences of three words in a row (like “in a row”). Reliable statistical information can be compiled about common trigrams, precisely because they appear frequently. But no existing body of data will ever be large enough to include all the trigrams that people might use, because of the continuing inventiveness of language.

To select an example more or less at random, a book review that the actor Rob Lowe recently wrote for this newspaper contained nine trigrams such as “dumbed-down escapist fare” that had never before appeared anywhere in all the petabytes of text indexed by Google. To witness the limitations that big data can have with novelty, Google-translate “dumbed-down escapist fare” into German and then back into English: out comes the incoherent “scaled-flight fare.” That is a long way from what Mr. Lowe intended — and from big data’s aspirations for translation.
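The trigram idea is simple to illustrate. In this toy sketch (the corpus is invented), a common trigram accumulates reliable counts while a novel phrase like Mr. Lowe’s has none at all, which is precisely why such systems stumble on linguistic novelty.

```python
from collections import Counter

def trigrams(text):
    """Return all three-word windows in a text, in order."""
    words = text.lower().split()
    return [tuple(words[i:i + 3]) for i in range(len(words) - 2)]

corpus = "the cat sat on the mat and the cat sat on the chair"
counts = Counter(trigrams(corpus))

print(counts[("the", "cat", "sat")])                 # seen twice in this corpus
print(counts[("dumbed-down", "escapist", "fare")])   # never seen: count is 0
```

A real system draws these counts from petabytes of text rather than one sentence, but the limitation is the same: a trigram that has never occurred contributes no statistical information, no matter how large the corpus.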

Wait, we almost forgot one last problem: the hype….

Effective metrics for measurement and target setting in online citizen engagement


Mathew Crozier at Bang the Table: “Target setting and measurement are arguably the most important aspects of any engagement process. If we are unable to properly understand the results, then have we really respected the community’s time and effort contributing to our project?
In building the latest version of the EngagementHQ software we not only thought about new tools and ways to engage the community, we also watched the ways our clients had been using the reports and set ourselves to thinking about how we could build a set of metrics for target setting and the measurement of results that will remain relevant as we add more and more functionality to EngagementHQ.
Things have changed a lot since we designed our old reports. You can now get information from your community using forums, guestbooks, a story tool, interactive mapping, surveys, quick polls, submission forms, a news feed with discussions or the QandA tool. You can provide information to the community not just through the library, dates, photos and FAQs but also using videos, link boxes and embedded content from all over the web.
Our old reports could tell you that 600 people had viewed the documents and it could tell you that 70 people had read the FAQs but you could not tell if they were the same people so you didn’t really know how many people had accessed information through your site. Generally we used those who had viewed documents in the library as a proxy but as time goes on our more engaging clients are communicating less and less through documents and more through other channels.
Similarly, whilst registrations were a good proxy for engagement (why else would you sign up?), it was failing to keep pace with the technology. You can now configure any of our tools to require sign-up or to be exempt from it, so the proxy no longer holds. Moreover, many of our clients bulk load groups into the database and therefore inflate the registrations number.
What we came up with was a simple solution. We would calculate Aware, Informed and Engaged cohorts in the reports.
Aware – a measure of the number of people who have visited your project;
Informed – a measure of the visitors who have clicked to access further information resources, to learn more;
Engaged – a measure of the number of people who have given you feedback using any of the means available on the site.”
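The three cohorts above can be sketched as distinct-visitor sets over an event log. This is an illustrative sketch only — the event names and schema below are invented, not EngagementHQ’s actual data model:

```python
# Hypothetical event log: (visitor_id, action) pairs.
events = [
    ("v1", "visit"), ("v1", "view_document"), ("v1", "survey_response"),
    ("v2", "visit"), ("v2", "view_faq"),
    ("v3", "visit"),
]

# Which actions count as "learning more" vs. "giving feedback" (assumed).
INFORM_ACTIONS = {"view_document", "view_faq", "view_video"}
ENGAGE_ACTIONS = {"survey_response", "forum_post", "quick_poll"}

# Sets of distinct visitors, so the same person is never double-counted --
# the flaw in the old document-views proxy described above.
aware    = {v for v, a in events if a == "visit"}
informed = {v for v, a in events if a in INFORM_ACTIONS}
engaged  = {v for v, a in events if a in ENGAGE_ACTIONS}

print(len(aware), len(informed), len(engaged))  # 3 2 1
```

Computing cohorts as sets of visitor IDs rather than raw hit counts is what answers the post’s original complaint: 600 document views and 70 FAQ reads could be any number of distinct people.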

Using Social Media to Measure Labor Market Flows


Paper by Dolan Antenucci, Michael Cafarella, Margaret C. Levenstein, Christopher Ré, and Matthew D. Shapiro: “Social media enable promising new approaches to measuring economic activity and analyzing economic behavior at high frequency and in real time using information independent from standard survey and administrative sources. This paper uses data from Twitter to create indexes of job loss, job search, and job posting. Signals are derived by counting job-related phrases in Tweets such as “lost my job.” The social media indexes are constructed from the principal components of these signals. The University of Michigan Social Media Job Loss Index tracks initial claims for unemployment insurance at medium and high frequencies and predicts 15 to 20 percent of the variance of the prediction error of the consensus forecast for initial claims. The social media indexes provide real-time indicators of events such as Hurricane Sandy and the 2013 government shutdown. Comparing the job loss index with the search and posting indexes indicates that the Beveridge Curve has been shifting inward since 2011.
The University of Michigan Social Media Job Loss Index is updated weekly and is available at http://econprediction.eecs.umich.edu/.”
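The signal-construction step the paper describes — counting job-related phrases per period, then combining the signals — can be sketched as follows. The phrase list and tweets are invented, and the paper’s principal-components step is replaced here by a simple average of z-scored signals for brevity:

```python
from statistics import mean, pstdev

JOB_LOSS_PHRASES = ["lost my job", "got fired", "laid off"]  # illustrative list

def weekly_signals(tweets_by_week):
    """One count series per phrase: occurrences of the phrase each week."""
    return [
        [sum(phrase in t.lower() for t in tweets) for tweets in tweets_by_week]
        for phrase in JOB_LOSS_PHRASES
    ]

def index(signals):
    """Stand-in for the paper's principal component: mean of z-scored signals."""
    zscored = []
    for s in signals:
        m, sd = mean(s), pstdev(s)
        zscored.append([(x - m) / sd if sd else 0.0 for x in s])
    return [mean(col) for col in zip(*zscored)]

weeks = [
    ["just lost my job today", "great weather"],
    ["nothing new"],
    ["got fired this morning", "she was laid off", "lost my job again"],
]
print(index(weekly_signals(weeks)))  # the index spikes in the third week
```

The real index is built from principal components of many such phrase-count signals over millions of tweets; the sketch only shows the shape of the pipeline, from raw text to a weekly composite.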

Smart cities are here today — and getting smarter


Computer World: “Smart cities aren’t a science fiction, far-off-in-the-future concept. They’re here today, with municipal governments already using technologies that include wireless networks, big data/analytics, mobile applications, Web portals, social media, sensors/tracking products and other tools.
These smart city efforts have lofty goals: Enhancing the quality of life for citizens, improving government processes and reducing energy consumption, among others. Indeed, cities are already seeing some tangible benefits.
But creating a smart city comes with daunting challenges, including the need to provide effective data security and privacy, and to ensure that myriad departments work in harmony.

The global urban population is expected to grow approximately 1.5% per year between 2025 and 2030, mostly in developing countries, according to the World Health Organization.

What makes a city smart? As with any buzz term, the definition varies. But in general, it refers to using information and communications technologies to deliver sustainable economic development and a higher quality of life, while engaging citizens and effectively managing natural resources.
Making cities smarter will become increasingly important. For the first time ever, the majority of the world’s population resides in a city, and this proportion continues to grow, according to the World Health Organization, the coordinating authority for health within the United Nations.
A hundred years ago, two out of every 10 people lived in an urban area, the organization says. As recently as 1990, less than 40% of the global population lived in a city — but by 2010 more than half of all people lived in an urban area. By 2050, the proportion of city dwellers is expected to rise to 70%.
As many city populations continue to grow, here’s what five U.S. cities are doing to help manage it all:

Scottsdale, Ariz.

The city of Scottsdale, Ariz., has several initiatives underway.
One is MyScottsdale, a mobile application the city deployed in the summer of 2013 that allows citizens to report cracked sidewalks, broken street lights and traffic lights, road and sewer issues, graffiti and other problems in the community….”

Crowdsourcing “Monopoly”


The Economist: “In 1904 a young American named Elizabeth Magie received a patent for a board game in which players used tokens to move around a four-sided board buying properties, avoiding taxes and jail, and collecting $100 every time they passed the board’s starting-point. Three decades later Charles Darrow, a struggling salesman in Pennsylvania, patented a tweaked version of the game as “Monopoly”. Now owned by Hasbro, a big toymaker, it has become one of the world’s most popular board games, available in dozens of languages and innumerable variations.
Magie was a devotee of Henry George, an economist who believed in common ownership of land; her game was designed to be a “practical demonstration of the present system of land-grabbing with all its usual outcomes and consequences.” And so it has become, though players snatch properties more in zeal than sadness. In “Monopoly” as in life, it is better to be rich than poor, children gleefully bankrupt their parents and nobody uses a flat iron any more.
Board-game makers have had to find their footing in a digital age. Hasbro’s game-and-puzzle sales fell by 4% in 2010—the year the iPad came to market—and 10% in 2011. Since then, however, its game-and-puzzle sales have rebounded, rising by 2% in 2012 and 10% in 2013. Stephanie Wissink, a youth-market analyst with Piper Jaffray, an investment bank, says that Hasbro has learned to become “co-creative…They’re infusing more social-generated content into their marketing and product development.”
Some of that content comes from Facebook. Last year, “Monopoly” fans voted on Hasbro’s Facebook page to jettison the poor old flat iron in favour of a new cat token. “Scrabble” players are voting on which word to add to the new dictionary (at press time, 16 remain, including “booyah”, “adorbs” and “cosplay”). “Monopoly” fans, meanwhile, are voting on which of ten house rules—among them collecting $400 rather than $200 for landing on “Go”, requiring players to make a full circuit of the board before buying property and “Mom always gets out of jail free. Always. No questions asked”—to make official…”

Facebook’s Connectivity Lab will develop advanced technology to provide internet across the world


At GigaOm: “The Internet.org initiative will rely on a new team at Facebook called the Connectivity Lab, based at the company’s Menlo Park campus, to develop technology on the ground, in the air and in space, CEO Mark Zuckerberg announced Thursday. The team will develop technology like drones and satellites to expand access to the internet across the world.
“The team’s approach is based on the principle that different sized communities need different solutions and they are already working on new delivery platforms—including planes and satellites—to provide connectivity for communities with different population densities,” a post on Internet.org says.
Internet.org, which is backed by companies like Facebook, Samsung and Qualcomm, wants to provide internet to the two thirds of the world that remains disconnected due to cost, lack of infrastructure or remoteness. While many companies are  developing business models and partnerships in areas that lack internet, the Connectivity Lab will focus on sustainable technology that will transmit the signals. Facebook envisions using drones that could fly for months to connect suburban areas, while more rural areas would rely on satellites. Both would use infrared lasers to blanket whole areas with connectivity.
Members of the Connectivity Lab have backgrounds at NASA’s Jet Propulsion Laboratory, NASA’s Ames Research Center and the National Optical Astronomy Observatory. Facebook also confirmed today that it acquired five employees from Ascenta, a U.K.-based company that worked on the Zephyr, a solar-powered drone capable of flying for two weeks straight.
The lab’s work will build on work the company has already done in the Philippines and Paraguay, Zuckerberg said in a Facebook post. And, like the company’s Open Compute project, there is a possibility that the lab will seek partnerships with outside countries once the bulk of the technology has been developed.”