New paper by Beth Simone Noveck, The GovLab, for Democracy: “In the early 2000s, the Air Force struggled with a problem: Pilots and civilians were dying because of unusual soil and dirt conditions in Afghanistan. The soil was getting into the rotors of the Sikorsky UH-60 helicopters and obscuring their pilots’ view—what the military calls a “brownout.” According to the Air Force’s senior design scientist, the manager tasked with solving the problem didn’t know where to turn quickly to get help. As it turns out, the man sitting practically across from him had nine years of experience flying these Black Hawk helicopters in the field, but the manager had no way of knowing that. Civil service titles such as director and assistant director reveal little about skills or experience.
In the fall of 2008, the Air Force sought to fill in these kinds of knowledge gaps. The Air Force Research Laboratory unveiled Aristotle, a searchable internal directory that integrated people’s credentials and experience from existing personnel systems, public databases, and users themselves, thus making it easy to discover quickly who knew and had done what. Near-term budgetary constraints killed Aristotle in 2013, but the project underscored a glaring need in the bureaucracy.
Aristotle was an attempt to solve a challenge faced by every agency and organization: quickly locating expertise to solve a problem. Prior to Aristotle, the DOD had no coordinated mechanism for identifying expertise across its 200,000 employees. Dr. Alok Das, the senior scientist for design innovation tasked with implementing the system, explained, “We don’t know what we know.”
This is a common situation. The government currently has no systematic way of getting help from all those with relevant expertise, experience, and passion. For every success on Challenge.gov—the federal government’s platform where agencies post open calls to solve problems for a prize—there are a dozen open-call projects that never get seen by those who might have the insight or experience to help. This kind of crowdsourcing is still too ad hoc, infrequent, and unpredictable—in short, too unreliable—for the purposes of policy-making.
Which is why technologies like Aristotle are so exciting. Smart, searchable expert networks offer the potential to lower the costs and speed up the process of finding relevant expertise. Aristotle never reached this stage, but an ideal expert network is a directory capable of including not just experts within the government, but also outside citizens with specialized knowledge. This leads to a dual benefit: accelerating the path to innovative and effective solutions to hard problems while at the same time fostering greater citizen engagement.
Could such an expert-network platform revitalize the regulatory-review process? We might find out soon enough, thanks to the Food and Drug Administration…”
How Local Governments Can Use Instameets to Promote Citizen Engagement
Chris Shattuck at Arc3Communications: “With more than 200 million active monthly users, Instagram reports that its users share more than 20 million photos every day, drawing a combined average of 1.6 billion likes.
Instagram engagement is also more than 15 times that of Facebook, with a user base that is predominantly young, female and affluent, according to a recent report by L2, a think tank for digital innovation.
Therefore, it’s no wonder that 92 percent of prestige brands prominently incorporate Instagram into their social media strategies, according to the same report.
However, many local governments have been slow to adopt this rapidly maturing platform, even though many of their constituents are already actively using it.
So how can local governments utilize the power of Instagram to promote citizen engagement that is still organic and social?
Creating Instameets to promote local government events, parks, civic landmarks and institutional buildings may be part of that answer.
Once an Instagram meetup community is created for a city, any user can suggest a meetup where members get together at a set place, date and time to snap away at a landmark, festival, or other event of note – preferably with a unique hashtag so that photos can be easily shared.
For example, where other marketing efforts to brand the City of Atlanta failed, #weloveatl has become a popular, organic hashtag that crosses cultural and economic boundaries for photographers looking to share their favorite things about Atlanta and benefit the Atlanta Community Food Bank.
And in May, users were able to combine that energy with a worldwide Instameet campaign to photograph Streets Alive Atlanta, a major initiative by the Atlanta Bicycle Coalition.
This organic collaboration provides a unique example for local governments seeking to promote their cities and use Instameets….”
What Is Big Data?
datascience@berkeley Blog: ““Big Data.” It seems like the phrase is everywhere. The term was added to the Oxford English Dictionary in 2013, appeared in Merriam-Webster’s Collegiate Dictionary by 2014, and Gartner’s just-released 2014 Hype Cycle shows “Big Data” passing the “Peak of Inflated Expectations” and on its way down into the “Trough of Disillusionment.” Big Data is all the rage. But what does it actually mean?
A commonly repeated definition cites the three Vs: volume, velocity, and variety. But others argue that it’s not the size of data that counts, but the tools being used, or the insights that can be drawn from a dataset.
To settle the question once and for all, we asked 40+ thought leaders in publishing, fashion, food, automobiles, medicine, marketing and every industry in between how exactly they would define the phrase “Big Data.” Their answers might surprise you! Take a look below to find out what big data is:
- John Akred, Founder and CTO, Silicon Valley Data Science
- Philip Ashlock, Chief Architect of Data.gov
- Jon Bruner, Editor-at-Large, O’Reilly Media
- Reid Bryant, Data Scientist, Brooks Bell
- Mike Cavaretta, Data Scientist and Manager, Ford Motor Company
- Drew Conway, Head of Data, Project Florida
- Rohan Deuskar, CEO and Co-Founder, Stylitics
- Amy Escobar, Data Scientist, 2U
- Josh Ferguson, Chief Technology Officer, Mode Analytics
- John Foreman, Chief Data Scientist, MailChimp
- …
FULL LIST at datascience@berkeley Blog”
In democracy and disaster, emerging world embraces 'open data'
Jeremy Wagstaff at Reuters: “‘Open data’ – the trove of data-sets made publicly available by governments, organizations and businesses – isn’t normally linked to high-wire politics, but just may have saved last month’s Indonesian presidential elections from chaos.
Data is considered open when it’s released for anyone to use and in a format that’s easy for computers to read. The uses are largely commercial, such as the GPS data from U.S.-owned satellites, but data can range from budget numbers and climate and health statistics to bus and rail timetables.
It’s a revolution that’s swept the developed world in recent years as governments and agencies like the World Bank have freed up hundreds of thousands of data-sets for use by anyone who sees a use for them. Data.gov, a U.S. site, lists more than 100,000 data-sets, from food calories to magnetic fields in space.
Consultants McKinsey reckon open data could add up to $3 trillion worth of economic activity a year – from performance ratings that help parents find the best schools to governments saving money by releasing budget data and asking citizens to come up with cost-cutting ideas. All the apps, services and equipment that tap the GPS satellites, for example, generate $96 billion of economic activity each year in the United States alone, according to a 2011 study.
But so far open data has had a limited impact in the developing world, where officials are wary of giving away too much information, and where there’s the issue of just how useful it might be: for most people in emerging countries, property prices and bus schedules aren’t top priorities.
But last month’s election in Indonesia – a contentious face-off between a disgraced general and a furniture-exporter turned reformist – highlighted how powerful open data can be in tandem with a handful of tech-smart programmers, social media savvy and crowdsourcing.
“Open data may well have saved this election,” said Paul Rowland, a Jakarta-based consultant on democracy and governance…”
Out in the Open: This Man Wants to Turn Data Into Free Food (And So Much More)
Klint Finley in Wired: “Let’s say your city releases a list of all trees planted on its public property. It would be a godsend—at least in theory. You could filter the data into a list of all the fruit and nut trees in the city, transfer it into an online database, and create a smartphone app that helps anyone find free food.
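Before getting to why this so rarely happens, here is a minimal sketch of that first step: filtering a published tree inventory down to edible species. The file name and the species, latitude, and longitude column names are assumptions for illustration; real municipal datasets vary.

```python
import csv

# Hypothetical species list and column names; adjust to the actual dataset.
EDIBLE_SPECIES = {"apple", "pear", "fig", "plum", "walnut", "chestnut"}

def edible_trees(path):
    """Yield edible trees from a city tree-inventory CSV."""
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            if row["species"].strip().lower() in EDIBLE_SPECIES:
                yield row["species"], row["latitude"], row["longitude"]

for species, lat, lon in edible_trees("city_trees.csv"):
    print(f"{species}: {lat}, {lon}")
```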
In far too many cases, the data just sits there on a computer server, unseen and unused. Sometimes, no one knows about the data, or no one knows what to do with it. Other times, the data is just too hard to work with. If you’re building that free food app, how do you update your database when the government releases a new version of the spreadsheet? And if you let people report corrections to the data, how do you contribute that data back to the city?
These are the sorts of problems that obsess 25-year-old software developer Max Ogden, and they’re the reason he built Dat, a new piece of open source software that seeks to restart the open data revolution. Basically, Dat is a way of synchronizing data between two or more sources, tracking any changes to that data, and handling transformations from one data format to another. The aim is a simple one: Ogden wants to make it easier for governments to share their data with a world of software developers.
That’s just the sort of thing that government agencies are looking for, says Waldo Jaquith, the director of the US Open Data Institute, the non-profit that is now hosting Dat…
Git is a piece of software originally written by Linux creator Linus Torvalds. It keeps track of code changes and makes it easier to integrate code submissions from outside developers. Ogden realized what developers needed wasn’t a GitHub for data, but a Git for data. And that’s what Dat is.
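Dat’s internals aren’t described beyond this, but the “Git for data” idea can be sketched: address each record by a hash of its content, so that two versions of a dataset can be diffed without caring about row order. What follows is a conceptual sketch in Python, not Dat’s real API:

```python
import hashlib
import json

def row_hash(row):
    """Content-address a record, Git-style, by hashing its canonical form."""
    return hashlib.sha1(json.dumps(row, sort_keys=True).encode()).hexdigest()

def diff(old_rows, new_rows):
    """Return the records added and removed between two dataset versions."""
    old = {row_hash(r): r for r in old_rows}
    new = {row_hash(r): r for r in new_rows}
    added = [new[h] for h in new.keys() - old.keys()]
    removed = [old[h] for h in old.keys() - new.keys()]
    return added, removed

v1 = [{"tree": "fig", "lat": 45.52}, {"tree": "oak", "lat": 45.53}]
v2 = [{"tree": "fig", "lat": 45.52}, {"tree": "walnut", "lat": 45.51}]
added, removed = diff(v1, v2)
print("added:", added, "removed:", removed)
```

When the city publishes a new spreadsheet, only the changed records need to flow downstream, the same property that lets Git integrate code submissions from outside developers.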
Instead of CouchDB, Dat relies on a lightweight, open-source data storage system from Google called LevelDB. The rest of the software was written in JavaScript by Ogden and his growing number of collaborators, which enables them to keep things minimal and easily run the software on multiple operating systems, including Windows, Linux and Macintosh OS X….”
Technology’s Crucial Role in the Fight Against Hunger
Crowdsourcing, predictive analytics and other new tools could go far toward finding innovative solutions for America’s food insecurity.
National Geographic recently sent three photographers to explore hunger in the United States. It was an effort to give a face to a very troubling statistic: Even today, one-sixth of Americans do not have enough food to eat. Fifty million people in this country are “food insecure” — having to make daily trade-offs among paying for food, housing or medical care — and 17 million of them skip at least one meal a day to get by. When choosing what to eat, many of these individuals must make choices between lesser quantities of higher-quality food and larger quantities of less-nutritious processed foods, the consumption of which often leads to expensive health problems down the road.
This is an extremely serious, but not easily visible, social problem. Nor does the challenge it poses become any easier when poorly designed public-assistance programs continue to count the sauce on a pizza as a vegetable. The deficiencies caused by hunger increase the likelihood that a child will drop out of school, lowering her lifetime earning potential. In 2010 alone, food insecurity cost America $167.5 billion, a figure that includes lost economic productivity, avoidable health-care expenses and social-services programs.
As much as we need specific policy innovations if we are to eliminate hunger in America, food insecurity is just one of many extraordinarily complex and interdependent “systemic” problems facing us that would benefit from the application of technology, not just to identify innovative solutions but to implement them as well. In addition to laudable policy initiatives by such states as Illinois and Nevada, which have made hunger a priority, or Arkansas, which suffers the greatest level of food insecurity but which is making great strides at providing breakfast to schoolchildren, we can — we must — bring technology to bear to create a sustained conversation between government and citizens to engage more Americans in the fight against hunger.
Identifying who is genuinely in need cannot be done as well by a centralized government bureaucracy — even one with regional offices — as it can through a distributed network of individuals and organizations able to pinpoint with on-the-ground accuracy where the demand is greatest. Just as Ushahidi uses crowdsourcing to help locate and identify disaster victims, it should be possible to leverage the crowd to spot victims of hunger. As it stands, attempts to eradicate so-called food deserts are often built around developing solutions for residents rather than with residents. Strategies to date tend to focus on the introduction of new grocery stores or farmers’ markets but with little input from or involvement of the citizens actually affected.
Applying predictive analytics to newly available sources of public as well as private data, such as that regularly gathered by supermarkets and other vendors, could also make it easier to offer coupons and discounts to those most in need. In addition, analyzing nonprofits’ tax returns, which are legally open and available to all, could help map where the organizations serving those in need leave gaps that need to be closed by other efforts. The Governance Lab recently brought together U.S. Department of Agriculture officials with companies that use USDA data in an effort to focus on strategies supporting a White House initiative to use climate-change and other open data to improve food production.
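As a rough illustration of the kind of predictive scoring imagined here, the sketch below trains a classifier on synthetic data and ranks households by estimated need. The features (income, household size, distance to the nearest grocery) and every number are invented; a real system would draw on the public and vendor data described above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data: columns are hypothetical, standardized features
# (income, household size, distance to nearest grocery store).
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Invented ground truth: lower income and greater distance raise risk.
y = (X[:, 0] - 0.5 * X[:, 2] + rng.normal(size=500) < -0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Score new households and surface the highest-risk ones for outreach.
households = rng.normal(size=(5, 3))
risk = model.predict_proba(households)[:, 1]
print("estimated need scores:", risk.round(2))
```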
Such innovative uses of technology, which put citizens at the center of the service-delivery process and streamline the delivery of government support, could also speed the delivery of benefits, thus reducing both costs and, every bit as important, the indignity of applying for assistance.
Being open to new and creative ideas from outside government through brainstorming and crowdsourcing exercises using social media can go beyond simply improving the quality of the services delivered. Some of these ideas, such as those arising from exciting new social-science experiments involving the use of incentives for “nudging” people to change their behaviors, might even lead them to purchase more healthful food.
Further, new kinds of public-private collaborative partnerships could create the means for people to produce their own food. Both new kinds of financing arrangements and new apps for managing the shared use of common real estate could make more community gardens possible. Similarly, with the kind of attention, convening and funding that government can bring to an issue, new neighbor-helping-neighbor programs — where, for example, people take turns shopping and cooking for one another to reduce the time each spends away from work — could be scaled up.
Then, too, advances in citizen engagement and oversight could make it more difficult for lawmakers to cave to the pressures of lobbying groups that push for subsidies for those crops, such as white potatoes and corn, that result in our current large-scale reliance on less-nutritious foods. At the same time, citizen scientists reporting data through an app would be able to do a much better job than government inspectors in reporting what is and is not working in local communities.
As a society, we may not yet be able to banish hunger entirely. But if we commit to using new technologies and mechanisms of citizen engagement widely and wisely, we could vastly reduce its power to do harm.
What Cars Did for Today’s World, Data May Do for Tomorrow’s
Quentin Hardy in the New York Times: “New technology products head at us constantly. There’s the latest smartphone, the shiny new app, the hot social network, even the smarter thermostat.
As great (or not) as all these may be, each thing is a small part of a much bigger process that’s rarely admired. They all belong inside a world-changing ecosystem of digital hardware and software, spreading into every area of our lives.
Thinking about what is going on behind the scenes is easier if we consider the automobile, also known as “the machine that changed the world.” Cars succeeded through the widespread construction of highways and gas stations. Those things created a global supply chain of steel plants and refineries. Seemingly unrelated things, including suburbs, fast food and drive-time talk radio, arose from that success.
Today’s dominant industrial ecosystem is relentlessly acquiring and processing digital information. It demands newer and better ways of collecting, shipping, and processing data, much the way cars needed better road building. And it’s spinning out its own unseen businesses.
A few recent developments illustrate the new ecosystem. General Electric plans to announce Monday that it has created a “data lake” method of analyzing sensor information from industrial machinery in places like railroads, airlines, hospitals and utilities. G.E. has been putting sensors on everything it can for a couple of years, and now it is out to read all that information quickly.
The company, working with an outfit called Pivotal, said that in the last three months it has looked at information from 3.4 million miles of flights by 24 airlines using G.E. jet engines. G.E. said it figured out things like possible defects 2,000 times as fast as it could before.
The company has to, since it’s getting so much more data. “In 10 years, 17 billion pieces of equipment will have sensors,” said William Ruh, vice president of G.E. software. “We’re only one-tenth of the way there.”
It hardly matters if Mr. Ruh is off by five billion or so. Billions of humans are already augmenting that number with their own packages of sensors, called smartphones, fitness bands and wearable computers. Almost all of that will get uploaded someplace too.
Shipping that data creates challenges. In June, researchers at the University of California, San Diego announced a method of engineering fiber optic cable that could make digital networks run 10 times faster. The idea is to get more parts of the system working closer to the speed of light, without involving the “slow” processing of electronic semiconductors.
“We’re going from millions of personal computers and billions of smartphones to tens of billions of devices, with and without people, and that is the early phase of all this,” said Larry Smarr, director of the California Institute for Telecommunications and Information Technology, located inside U.C.S.D. “A gigabit a second was fast in commercial networks, now we’re at 100 gigabits a second. A terabit a second will come and go. A petabit a second will come and go.”
In other words, Mr. Smarr thinks commercial networks will eventually be 10,000 times as fast as today’s best systems. “It will have to grow, if we’re going to continue what has become our primary basis of wealth creation,” he said.
Add computation to collection and transport. Last month, U.C. Berkeley’s AMP Lab, created two years ago for research into new kinds of large-scale computing, spun out a company called Databricks, which uses new kinds of software for fast data analysis on a rental basis. Databricks plugs into the one million-plus computer servers inside the global system of Amazon Web Services, and will soon work inside similar-size megacomputing systems from Google and Microsoft.
It was the second company out of the AMP Lab this year. The first, called Mesosphere, enables a kind of pooling of computing services, building the efficiency of even million-computer systems….”
Monitoring Arms Control Compliance With Web Intelligence
Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
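The team’s linguistic algorithms aren’t published in this excerpt, but the core move, pulling dates and event mentions out of free text and bucketing them over time, can be approximated crudely. The sketch below is a toy stand-in using regular expressions on an invented mini-corpus, not the Sandia/Recorded Future system:

```python
import re
from collections import Counter

# Toy corpus standing in for open-source news items and social media posts.
reports = [
    "2013-04-02: Two new H7N9 hospitalizations reported in Shanghai.",
    "2013-04-02: Officials confirm one H7N9 fatality.",
    "2013-04-05: Three more hospitalizations in Jiangsu province.",
]

DATE = re.compile(r"\d{4}-\d{2}-\d{2}")
EVENT = re.compile(r"hospitalization|fatality", re.IGNORECASE)

# Bucket event mentions by date to build a rough temporal signal.
signal = Counter()
for text in reports:
    date = DATE.search(text).group()
    for event in EVENT.findall(text):
        signal[(date, event.lower())] += 1

for (date, event), count in sorted(signal.items()):
    print(date, event, count)
```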
For H7N9, we found that open-source social media were the first to report the outbreak and give ongoing updates. The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting. For the Shanghai pig die-off, the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the die-off, as had been originally speculated. Open-source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….
To read the full paper, please click here.”
The infrastructure Africa really needs is better data reporting
Quartz: “This week African leaders met with officials in Washington and agreed to billions of dollars of US investments and infrastructure deals. But the terrible state of statistical reporting in most of Africa means that it will be nearly impossible to gauge how effective these deals are at making Africans, or the American investors, better off.
Data reporting on the continent is sketchy. Just look at the recent GDP revisions of large countries. How is it that Nigeria’s April GDP recalculation catapulted it ahead of South Africa, making it the largest economy in Africa overnight? Or that Kenya’s economy is actually 20% larger (paywall) than previously thought?
Indeed, countries in Africa get noticeably bad scores on the World Bank’s Bulletin Board on Statistical Capacity, an index of data reporting integrity.
A recent working paper from the Center for Global Development (CGD) shows how politics influence the statistics released by many African countries…
But in the long run, dodgy statistics aren’t good for anyone. They “distort the way we understand the opportunities that are available,” says Amanda Glassman, one of the CGD report’s authors. US firms have pledged $14 billion in trade deals at the summit in Washington. No doubt they would like to know whether high school enrollment promises to create a more educated workforce in a given country, or whether its people have been immunized for viruses.
Overly optimistic indicators also distort how a government decides where to focus its efforts. If school enrollment appears to be high, why implement programs intended to increase it?
The CGD report suggests increased funding to national statistical agencies, and making sure that they are wholly independent from their governments. President Obama is talking up a $7 billion investment in African agriculture. But unless cash and attention are given to improving statistical integrity, he may never know whether that investment has borne fruit.”
The Data Act's unexpected benefit
Adam Mazmanian at FCW: “The Digital Accountability and Transparency Act sets an aggressive schedule for creating governmentwide financial standards. The first challenge belongs to the Treasury Department and the Office of Management and Budget. They must come up with a set of common data elements for financial information that will cover just about everything the government spends money on and every entity it pays, in order to give oversight bodies and government watchdogs a top-down view of federal spending from appropriation to expenditure. Those data elements are scheduled for completion by May 2015, one year after the act’s passage.
Two years after those standards are in place, agencies will be required to report their financial information following Data Act guidelines. The government currently supports more than 150 financial management systems but lacks a common data dictionary, so there are not necessarily agreed-upon definitions of how to classify and track government programs and types of expenditures.
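To see why a common data dictionary matters, consider what a single agreed-upon element might look like and how a report could be checked against it. The field names, types, and rules below are hypothetical; the actual Treasury and OMB standards were still being drafted at the time:

```python
# Hypothetical data-dictionary entry; not drawn from the real standards.
AWARD_AMOUNT = {
    "element": "award_amount",
    "definition": "Obligated dollar amount of a single federal award.",
    "type": "decimal",
    "unit": "USD",
    "required": True,
}

def validate(record, element):
    """Check one reported value against its data-element definition."""
    value = record.get(element["element"])
    if value is None:
        return not element["required"]
    return isinstance(value, (int, float)) and value >= 0

print(validate({"award_amount": 125000.0}, AWARD_AMOUNT))  # True
print(validate({}, AWARD_AMOUNT))                          # False: required field missing
```

With 150-plus financial systems reporting against the same definitions, a check like this could run identically everywhere, which is what would give oversight bodies a comparable, top-down view of spending.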
“As far as systems today and how we can get there, they don’t necessarily map in the way that the act described,” U.S. CIO Steven VanRoekel said in June. “It’s going to be a journey to get to where the act aspires for us to be.”
However, an Obama administration initiative to encourage agencies to share financial services could be part of the solution. In May, OMB and Treasury designated four financial shared-services providers for government agencies: the Agriculture Department’s National Finance Center, the Interior Department’s Interior Business Center, the Transportation Department’s Enterprise Services Center and Treasury’s Administrative Resource Center.
There are some synergies between shared services and data standardization, but shared financial services alone will not guarantee Data Act compliance, especially considering that the government expects the migration to take 10 to 15 years. Nevertheless, the discipline required under the Data Act could boost agency efforts to prepare financial data when it comes time to move to a shared service….”