Government opens up: 10k active government users on GitHub


GitHub: “In the summer of 2009, The New York Senate was the first government organization to post code to GitHub, and that fall, Washington DC quickly followed suit. By 2011, cities like Miami, Chicago, and New York; Australian, Canadian, and British government initiatives like Gov.uk; and US Federal agencies like the Federal Communications Commission, General Services Administration, NASA, and Consumer Financial Protection Bureau were all coding in the open as they began to reimagine government for the 21st century.
Fast forward to just last year: The White House Open Data Policy is published as a collaborative, living document, San Francisco laws are now forkable, and government agencies are accepting pull requests from everyday developers.
This is all part of a larger trend towards government adopting open source practices and workflows — a trend that spans not only software but data and policy as well — and the movement shows no signs of slowing, with government usage on GitHub nearly tripling in the past year to exceed 10,000 active government users today.

How government uses GitHub

When government works in the open, it acknowledges the idea that government is the world’s largest and longest-running open source project. Open data efforts, like the City of Philadelphia’s open flu shot spec, release machine-readable data in open, immediately consumable formats, inviting feedback (and corrections) from the general public and fundamentally exposing who made what change when, a necessary check on democracy.
Unlike the private sector, however, where open sourcing the “secret sauce” may hurt the bottom line, with government, we’re all on the same team. With the exception of, say, football, Illinois and Wisconsin don’t compete with one another, nor are the types of challenges they face unique. Shared code prevents reinventing the wheel and helps taxpayer dollars go further, with efforts like the White House’s recently released Digital Services Playbook, which invites everyday citizens to play a role in making government better, one commit at a time.
However, not all government code is open source. We see that adopting these open source workflows for open collaboration within an agency (or with outside contractors) similarly breaks down bureaucratic walls, and gives like-minded teams the opportunity to work together on common challenges.

Government Today

It’s hard to believe that what started with a single repository just five years ago has blossomed into a movement where, today, more than 10,000 government employees use GitHub to collaborate on code, data, and policy each day….
You can learn more about GitHub in government at government.github.com, and if you’re a government employee, be sure to join our semi-private peer group to learn best practices for collaborating on software, data, and policy in the open.”
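For readers who want to poke at this activity themselves, a minimal sketch (not from the GitHub post) of listing one government organization’s public repositories through GitHub’s public REST API might look like this; the organization name below is only an example.

```python
# Minimal sketch: list a government organization's public repositories via
# GitHub's public REST API. The organization name is only an example; any
# government account on GitHub can be substituted.
import json
import urllib.request

ORG = "gsa"  # example: the U.S. General Services Administration's account
url = f"https://api.github.com/orgs/{ORG}/repos?per_page=100"

with urllib.request.urlopen(url) as resp:
    repos = json.load(resp)

for repo in repos:
    print(f"{repo['full_name']:<45} last updated {repo['updated_at']}")
```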

Opening Health Data: What Do Researchers Want? Early Experiences With New York's Open Health Data Platform.


Paper by Martin, Erika G. PhD, MPH; Helbig, Natalie PhD, MPA; and Birkhead, Guthrie S. MD, MPH in the Journal of Public Health Management & Practice: “Governments are rapidly developing open data platforms to improve transparency and make information more accessible. New York is a leader, with currently the only state platform devoted to health. Although these platforms could build public health departments’ capabilities to serve more researchers, agencies have little guidance on releasing meaningful and usable data.

Objective: Structured focus groups with researchers and practitioners collected stakeholder feedback on potential uses of open health data and New York’s open data strategy….

Results: There was low awareness of open data, with 67% of researchers reporting never using open data portals prior to the workshop. Participants were interested in data sets that were geocoded, longitudinal, or aggregated to small area granularity and capabilities to link multiple data sets. Multiple environmental conditions and barriers hinder their capacity to use health data for research. Although open data platforms cannot address all barriers, they provide multiple opportunities for public health research and practice, and participants were overall positive about the state’s efforts to release open data.

Conclusions: Open data are not ideal for some researchers because they do not contain individually identifiable data, indicating a need for tiered data release strategies. However, they do provide important new opportunities to facilitate research and foster collaborations among agencies, researchers, and practitioners.”
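As a concrete illustration of what consuming such a release can look like on the researcher’s side, here is a minimal sketch (not from the paper) of pulling rows from a Socrata-style portal such as New York’s health.data.ny.gov; the dataset identifier below is a placeholder, not a real resource ID.

```python
# Minimal sketch: fetch rows from a Socrata-style open data portal such as
# health.data.ny.gov. The dataset identifier is a placeholder; substitute
# the ID of an actual published dataset.
import json
import urllib.request
from urllib.parse import urlencode

PORTAL = "https://health.data.ny.gov/resource"
DATASET_ID = "xxxx-xxxx"  # placeholder dataset identifier

query = urlencode({"$limit": 1000, "$order": ":id"})
url = f"{PORTAL}/{DATASET_ID}.json?{query}"

with urllib.request.urlopen(url) as resp:
    rows = json.load(resp)

print(f"fetched {len(rows)} rows")
if rows:
    print("columns:", sorted(rows[0]))
```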

Can big data help build more wind and solar farms?


Rachael Post in The Guardian: “Convincing customers to switch to renewable energy is an uphill battle. But for a former political operative, finding business is as easy as mining a consumer behavior database…After his father died from cancer related to pollution from a coal-burning plant, Tom Matzzie, the former director of democratic activist group MoveOn.org, decided that he’d had enough with traditional dirty energy. But when he installed solar panels on his home, he discovered that the complicated permitting and construction process made switching to renewable energy difficult and unwieldy. The solution, he concluded, was to use his online campaigning and big data skills – honed from his years of working in politics – to find the most likely customers for renewables and convince them to switch. Ethical Electric was born.
Matzzie’s company isn’t the first to sell renewable energy, but it might be the smartest. For the most part, convincing people to switch away from dirty energy is an unprofitable and work-intensive process, requiring electrical company representatives to approach thousands of randomly chosen customers. Ethical Electric, on the other hand, uses a highly-targeted, strategic method to identify its potential customers.
From finding votes to finding customers
Matzzie, who is now CEO of Ethical Electric, explained that the secret lies in his company’s use of big data, a resource that he and his partners mastered on the political front lines. In the last few presidential elections, big data fundamentally changed the way candidates – and their teams – approached voters. “We couldn’t rely on voter registration lists to make assumptions about who would be willing to vote in the next election,” Matzzie said. “What happened in politics is a real revolution in data.”…”
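The article does not describe Ethical Electric’s actual models; purely as a rough illustration of propensity scoring over consumer-behavior records, the general idea can be sketched as follows, with invented fields and weights.

```python
# Illustrative sketch only: a toy propensity score over consumer-behavior
# records, in the spirit of data-driven targeting. Fields and weights are
# invented for the example and are not Ethical Electric's model.
from dataclasses import dataclass

@dataclass
class Household:
    donated_to_environmental_cause: bool
    owns_home: bool
    monthly_kwh: float

def switch_propensity(h: Household) -> float:
    """Return a crude 0-1 score for how likely a household is to switch."""
    score = 0.0
    if h.donated_to_environmental_cause:
        score += 0.5
    if h.owns_home:
        score += 0.3
    score += min(h.monthly_kwh / 2000.0, 1.0) * 0.2  # heavier users weigh more
    return score

households = [Household(True, True, 900.0), Household(False, False, 400.0)]
for h in sorted(households, key=switch_propensity, reverse=True):
    print(h, round(switch_propensity(h), 2))
```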

City Service Development Kit (CitySDK)


What is CitySDK?: “Helping cities to open their data and giving developers the tools they need, the CitySDK aims for a step change in how to deliver services in urban environments. With governments around the world looking at open data as a kick start for their economies, CitySDK provides better and easier ways for cities throughout Europe to release their data in a format that is easy for developers to re-use.
Drawing on best practices from around the world, the project will develop a toolkit – CitySDK v1.0 – that can be used by any city looking to create a sustainable infrastructure of “city apps”.
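As a rough illustration of the consumption side such a toolkit targets, a “city app” might read a city’s published JSON feed along these lines; the endpoint and field names below are hypothetical, not an actual CitySDK interface.

```python
# Sketch of a tiny "city app" client reading a city's open data feed.
# The endpoint URL and field names are hypothetical placeholders, not an
# actual CitySDK interface.
import json
import urllib.request

ENDPOINT = "https://opendata.example-city.eu/pois.json"  # hypothetical URL

def fetch_pois(url: str = ENDPOINT) -> list:
    """Fetch the city's published list of points of interest as JSON."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

if __name__ == "__main__":
    for poi in fetch_pois()[:10]:
        print(poi.get("name"), "-", poi.get("category"))
```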
Piloting the CitySDK
The Project focuses on three Pilot domains: Smart Participation, Smart Mobility and Smart Tourism. Within each of the three domains, a large-scale Lead Pilot is carried out in one city. The experiences of the Lead Pilot will be applied in the Replication Pilots in other Partner cities.
Funding
CitySDK is a 6.8 million euro project, part-funded by the European Commission. It is a Pilot Type B within the ICT Policy Support Programme of the Competitiveness and Innovation Framework Programme. It runs from January 2012 to October 2014.”

As Data Overflows Online, Researchers Grapple With Ethics


From The New York Times: “Scholars are exhilarated by the prospect of tapping into the vast troves of personal data collected by Facebook, Google, Amazon and a host of start-ups, which they say could transform social science research.

Once forced to conduct painstaking personal interviews with subjects, scientists can now sit at a screen and instantly play with the digital experiences of millions of Internet users. It is the frontier of social science — experiments on people who may never even know they are subjects of study, let alone explicitly consent.

“This is a new era,” said Jeffrey T. Hancock, a Cornell University professor of communication and information science. “I liken it a little bit to when chemistry got the microscope.”

But the new era has brought some controversy with it. Professor Hancock was a co-author of the Facebook study in which the social network quietly manipulated the news feeds of nearly 700,000 people to learn how the changes affected their emotions. When the research was published in June, the outrage was immediate…

Such testing raises fundamental questions. What types of experiments are so intrusive that they need prior consent or prompt disclosure after the fact? How do companies make sure that customers have a clear understanding of how their personal information might be used? Who even decides what the rules should be?

Existing federal rules governing research on human subjects, intended for medical research, generally require consent from those studied unless the potential for harm is minimal. But many social science scholars say the federal rules never contemplated large-scale research on Internet users and provide inadequate guidance for it.

For Internet projects conducted by university researchers, institutional review boards can be helpful in vetting projects. However, corporate researchers like those at Facebook don’t face such formal reviews.

Sinan Aral, a professor at the Massachusetts Institute of Technology’s Sloan School of Management who has conducted large-scale social experiments with several tech companies, said any new rules must be carefully formulated.

“We need to understand how to think about these rules without chilling the research that has the promise of moving us miles and miles ahead of where we are today in understanding human populations,” he said. Professor Aral is planning a panel discussion on ethics at an M.I.T. conference on digital experimentation in October. (The professor also does some data analysis for The New York Times Company.)

Mary L. Gray, a senior researcher at Microsoft Research and associate professor at Indiana University’s Media School, who has worked extensively on ethics in social science, said that too often, researchers conducting digital experiments work in isolation with little outside guidance.

She and others at Microsoft Research spent the last two years setting up an ethics advisory committee and training program for researchers in the company’s labs who are working with human subjects. She is now working with Professor Hancock to bring such thinking to the broader research world.

“If everyone knew the right thing to do, we would never have anyone hurt,” she said. “We really don’t have a place where we can have these conversations.”…

Knowledge is Beautiful


New book by David McCandless: “In this mind-blowing follow-up to the bestselling Information is Beautiful, the undisputed king of infographics David McCandless uses stunning and unique visuals to reveal unexpected insights into how the world really works. Every minute of every hour of every day we are bombarded with information – be it on television, in print or online. How can we relate to this mind-numbing overload? Enter David McCandless and his amazing infographics: simple, elegant ways to understand information too complex or abstract to grasp any way but visually. McCandless creates dazzling displays that blend the facts with their connections, contexts and relationships, making information meaningful, entertaining – and beautiful. Knowledge is Beautiful is an endlessly fascinating spin through the world of visualized data, all of it bearing the hallmark of David McCandless’s ground-breaking signature style. Taking infographics to the next level, Knowledge is Beautiful offers a deeper, more wide-ranging look at the world and its history. Covering everything from dog breeds and movie plots to the most commonly used passwords and crazy global warming solutions, Knowledge is Beautiful is guaranteed to enrich your understanding of the world.”

The city as living laboratory: A playground for the innovative development of smart city applications


Paper by Veeckman, Carina and van der Graaf, Shenja: “Nowadays, the smart-city concept is shifting from a top-down, mere technological approach towards bottom-up processes that are based on the participation of creative citizens, research organisations and companies. Here, the city acts as an urban innovation ecosystem in which smart applications, open government data and new modes of participation are fostering innovation in the city. However, detailed analyses on how to manage smart city initiatives as well as descriptions of underlying challenges and barriers still seem scarce. Therefore, this paper investigates four collaborative smart city initiatives in Europe to learn how cities can optimize citizens’ involvement in the context of open innovation. The analytical framework focuses on the innovation ecosystem and the civic capacities to engage in the public domain. Findings show that public service delivery can be co-designed between the city and citizens, if different toolkits aligned with the specific capacities and skills of the users are provided. By providing the right tools, even ordinary citizens can take a much more active role in the evolution of their cities and generate solutions from which both the city and everyday urban life can possibly benefit.”

Reality Mining: Using Big Data to Engineer a Better World


New book by Nathan Eagle and Kate Greene: “Big Data is made up of lots of little data: numbers entered into cell phones, addresses entered into GPS devices, visits to websites, online purchases, ATM transactions, and any other activity that leaves a digital trail. Although the abuse of Big Data—surveillance, spying, hacking—has made headlines, it shouldn’t overshadow the abundant positive applications of Big Data. In Reality Mining, Nathan Eagle and Kate Greene cut through the hype and the headlines to explore the positive potential of Big Data, showing the ways in which the analysis of Big Data (“Reality Mining”) can be used to improve human systems as varied as political polling and disease tracking, while considering user privacy.

Eagle, a recognized expert in the field, and Greene, an experienced technology journalist, describe Reality Mining at five different levels: the individual, the neighborhood and organization, the city, the nation, and the world. For each level, they first offer a nontechnical explanation of data collection methods and then describe applications and systems that have been or could be built. These include a mobile app that helps smokers quit smoking; a workplace “knowledge system”; the use of GPS, Wi-Fi, and mobile phone data to manage and predict traffic flows; and the analysis of social media to track the spread of disease. Eagle and Greene argue that Big Data, used respectfully and responsibly, can help people live better, healthier, and happier lives.”
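The book stays at the level of description rather than code; as a minimal sketch of the traffic-flow idea, assuming anonymized phone pings already matched to road segments, ping counts per segment and time window can serve as a crude load proxy (the records below are invented).

```python
# Minimal sketch of the traffic-flow idea: count anonymized phone pings per
# road segment per time window as a crude load proxy. Record layout and
# segment names are invented placeholders.
from collections import Counter
from datetime import datetime

pings = [
    ("2014-09-01T08:03:00", "A10-north"),
    ("2014-09-01T08:04:30", "A10-north"),
    ("2014-09-01T08:05:10", "ring-east"),
]

def load_per_segment(records, window_minutes=15):
    """Return ping counts keyed by (segment, time-window start)."""
    load = Counter()
    for ts, segment in records:
        t = datetime.fromisoformat(ts)
        bucket = t.replace(minute=(t.minute // window_minutes) * window_minutes,
                           second=0, microsecond=0)
        load[(segment, bucket.isoformat())] += 1
    return load

for key, count in sorted(load_per_segment(pings).items()):
    print(key, count)
```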

Monitoring Arms Control Compliance With Web Intelligence


Chris Holden and Maynard Holliday at Commons Lab: “Traditional monitoring of arms control treaties, agreements, and commitments has required the use of National Technical Means (NTM)—large satellites, phased array radars, and other technological solutions. NTM was a good solution when the treaties focused on large items for observation, such as missile silos or nuclear test facilities. As the targets of interest have shrunk by orders of magnitude, the need for other, more ubiquitous, sensor capabilities has increased. The rise in web-based, or cloud-based, analytic capabilities will have a significant influence on the future of arms control monitoring and the role of citizen involvement.
Since 1999, the U.S. Department of State has had at its disposal the Key Verification Assets Fund (V Fund), which was established by Congress. The Fund helps preserve critical verification assets and promotes the development of new technologies that support the verification of and compliance with arms control, nonproliferation, and disarmament requirements.
Sponsored by the V Fund to advance web-based analytic capabilities, Sandia National Laboratories, in collaboration with Recorded Future (RF), synthesized open-source data streams from a wide variety of traditional and nontraditional web sources in multiple languages along with topical texts and articles on national security policy to determine the efficacy of monitoring chemical and biological arms control agreements and compliance. The team used novel technology involving linguistic algorithms to extract temporal signals from unstructured text and organize that unstructured text into a multidimensional structure for analysis. In doing so, the algorithm identifies the underlying associations between entities and events across documents and sources over time. Using this capability, the team analyzed several events that could serve as analogs to treaty noncompliance, technical breakout, or an intentional attack. These events included the H7N9 bird flu outbreak in China, the Shanghai pig die-off and the fungal meningitis outbreak in the United States last year.
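The Sandia/Recorded Future algorithms themselves are not published in the post; as a toy illustration of the general idea (extracting date mentions and co-occurring entities from unstructured text and bucketing them on a timeline), a sketch might look like this.

```python
# Toy illustration of the general idea only (not the Sandia / Recorded
# Future system): pull ISO date mentions and naive proper-noun "entities"
# out of raw text and count their co-occurrence on a timeline.
import re
from collections import defaultdict

DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")
ENTITY_RE = re.compile(r"\b([A-Z][a-z]+(?: [A-Z][a-z]+)*)\b")

def index_documents(docs):
    """Map (date, entity) pairs to mention counts across documents."""
    timeline = defaultdict(int)
    for doc in docs:
        dates = ["-".join(m) for m in DATE_RE.findall(doc)]
        entities = set(ENTITY_RE.findall(doc))
        for d in dates:
            for e in entities:
                timeline[(d, e)] += 1
    return timeline

docs = [
    "On 2013-04-02 Shanghai authorities reported dead pigs in the Huangpu river.",
    "H7N9 hospitalizations in Shanghai rose again on 2013-04-05.",
]
for (date, entity), n in sorted(index_documents(docs).items()):
    print(date, entity, n)
```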
For H7N9 we found that open source social media were the first to report the outbreak and give ongoing updates.  The Sandia RF system was able to roughly estimate lethality based on temporal hospitalization and fatality reporting.  For the Shanghai pig die-off the analysis tracked the rapid assessment by Chinese authorities that H7N9 was not the cause of the pig die-off as had been originally speculated. Open source reporting highlighted a reduced market for pork in China due to the very public dead pig display in Shanghai. Possible downstream health effects were predicted (e.g., contaminated water supply and other overall food ecosystem concerns). In addition, legitimate U.S. food security concerns were raised based on the Chinese purchase of the largest U.S. pork producer (Smithfield) because of a fear of potential import of tainted pork into the United States….
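The post does not include the estimation code; a naive version of the rough lethality estimate described above, computed as a case fatality ratio over cumulative hospitalization and fatality reports, might look like the sketch below (the numbers are illustrative placeholders, not the study’s data).

```python
# Sketch of a rough lethality estimate from temporal reporting: a naive
# case fatality ratio over cumulative counts. The figures below are
# illustrative placeholders, not data from the Sandia/RF analysis.
reports = [
    # (report date, cumulative hospitalized, cumulative deaths)
    ("2013-04-01", 10, 2),
    ("2013-04-08", 40, 9),
    ("2013-04-15", 90, 20),
]

for date, hospitalized, deaths in reports:
    cfr = deaths / hospitalized if hospitalized else float("nan")
    print(f"{date}: naive CFR ~ {cfr:.0%} ({deaths}/{hospitalized})")
```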
To read the full paper, please click here.”

EU-funded tool to help our brain deal with big data


EU Press Release: “Every single minute, the world generates 1.7 million billion bytes of data, equal to 360,000 DVDs. How can our brain deal with increasingly big and complex datasets? EU researchers are developing an interactive system which not only presents data the way you like it, but also changes the presentation constantly in order to prevent brain overload. The project could enable students to study more efficiently or journalists to cross-check sources more quickly. Several museums in Germany, the Netherlands, the UK and the United States have already shown interest in the new technology.

Data is everywhere: it can either be created by people or generated by machines, such as sensors gathering climate information, satellite imagery, digital pictures and videos, purchase transaction records, GPS signals, etc. This information is a real gold mine. But it is also challenging: today’s datasets are so huge and complex to process that they require new ideas, tools and infrastructures.

Researchers within CEEDs (@ceedsproject) are transposing big data into an interactive environment to allow the human mind to generate new ideas more efficiently. They have built what they are calling an eXperience Induction Machine (XIM) that uses virtual reality to enable a user to ‘step inside’ large datasets. This immersive multi-modal environment – located at Pompeu Fabra University in Barcelona – also contains a panoply of sensors which allows the system to present the information in the right way to the user, constantly tailored according to their reactions as they examine the data. These reactions – such as gestures, eye movements or heart rate – are monitored by the system and used to adapt the way in which the data is presented.

Jonathan Freeman, Professor of Psychology at Goldsmiths, University of London, and coordinator of CEEDs, explains: “The system acknowledges when participants are getting fatigued or overloaded with information. And it adapts accordingly. It either simplifies the visualisations so as to reduce the cognitive load, thus keeping the user less stressed and more able to focus, or it will guide the person to areas of the data representation that are not as heavy in information.”
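The CEEDs implementation is not published in the release, but the adaptation loop Freeman describes can be sketched roughly as follows, assuming the sensor pipeline yields a normalized cognitive-load estimate; the thresholds and the detail scale are invented for illustration.

```python
# Sketch of the adaptation loop described above, assuming the sensor
# pipeline yields a normalized cognitive-load estimate in [0, 1]. The
# thresholds and the detail scale are invented for illustration.
def adapt_detail(current_detail: int, cognitive_load: float,
                 min_detail: int = 1, max_detail: int = 10) -> int:
    """Reduce visual detail when the user seems overloaded, restore it when not."""
    if cognitive_load > 0.7:   # fatigued or overloaded: simplify the view
        return max(min_detail, current_detail - 1)
    if cognitive_load < 0.3:   # relaxed: allow a richer visualisation
        return min(max_detail, current_detail + 1)
    return current_detail      # comfortable: leave the view unchanged

detail = 8
for load in [0.2, 0.5, 0.8, 0.9, 0.4, 0.1]:  # example load readings over time
    detail = adapt_detail(detail, load)
    print(f"load={load:.1f} -> detail level {detail}")
```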

Neuroscientists were the first group the CEEDs researchers tried their machine on (BrainX3). It took the typically huge datasets generated in this scientific discipline and animated them with visual and sound displays. By providing subliminal clues, such as flashing arrows, the machine guided the neuroscientists to areas of the data that were potentially more interesting to each person. First pilots have already demonstrated the power of this approach in gaining new insights into the organisation of the brain….”