The Web Observatory: A Middle Layer for Broad Data


New paper by Thanassis Tiropanis, Wendy Hall, James Hendler, and Christian de Larrinaga in Big Data: “The Web Observatory project is a global effort that is being led by the Web Science Trust, its network of WSTnet laboratories, and the wider Web Science community. The goal of this project is to create a global distributed infrastructure that will foster communities exchanging and using each other’s web-related datasets as well as sharing analytic applications for research and business web applications. It will provide the means to observe the digital planet, explore its processes, and understand their impact on different sectors of human activity.
The project is creating a network of separate web observatories, collections of datasets and tools for analyzing data about the Web and its use, each with their own use community. This allows researchers across the world to develop and share data, analytic approaches, publications related to their datasets, and tools (Fig. 1). The network of web observatories aims to bridge the gap that currently exists between big data analytics and the rapidly growing web of “broad data,” making it difficult for a large number of people to engage with them….”

New Data for a New Energy Future


(This post originally appeared on the blog of the U.S. Chamber of Commerce Foundation.)

Two growing concerns—climate change and U.S. energy self-sufficiency—have accelerated the search for affordable, sustainable approaches to energy production and use. In this area, as in many others, data-driven innovation is a key to progress. Data scientists are working to help improve energy efficiency and make new forms of energy more economically viable, and are building new, profitable businesses in the process.
In the same way that government data has been used by other kinds of new businesses, the Department of Energy is releasing data that can help energy innovators. At a recent “Energy Datapalooza” held by the department, John Podesta, counselor to the President, summed up the rationale: “Just as climate data will be central to helping communities prepare for climate change, energy data can help us reduce the harmful emissions that are driving climate change.” With electric power accounting for one-third of greenhouse gas emissions in the United States, the opportunities for improvement are great.
The GovLab has been studying the business applications of public government data, or “open data,” for the past year. The resulting study, the Open Data 500, now provides structured, searchable information on more than 500 companies that use open government data as a key business driver. A review of those results shows four major areas where open data is creating new business opportunities in energy and is likely to build many more in the near future.

Commercial building efficiency
Commercial buildings are major energy consumers, and energy costs are a significant business expense. Despite programs like LEED Certification, many commercial buildings waste large amounts of energy. Now a company called FirstFuel, based in Boston, is using open data to drive energy efficiency in these buildings. At the Energy Datapalooza, Swap Shah, the company’s CEO, described how analyzing energy data together with geospatial, weather, and other open data can give a very accurate view of a building’s energy consumption and ways to reduce it. (Sometimes the solution is startlingly simple: According to Shah, the largest source of waste is running heating and cooling systems at the same time.) Other companies are taking on the same kind of task – like Lucid, which provides an operating system that can track a building’s energy use in an integrated way.
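Shah’s point about simultaneous heating and cooling suggests a simple check that open building data makes possible. The sketch below is purely illustrative (the readings, field names, and threshold are invented, not FirstFuel’s actual method): it scans interval meter data and flags hours where both HVAC systems draw meaningful power at once.

```python
# Hedged sketch: flag intervals where a building runs heating and cooling
# at the same time -- the waste pattern Shah cites as the largest source
# of loss. All readings and names are invented for illustration.

readings = [
    # (hour, heating_kw, cooling_kw)
    (8, 12.0, 0.0),
    (9, 10.5, 0.0),
    (10, 6.0, 4.5),   # both systems drawing power: likely waste
    (11, 0.0, 8.0),
    (12, 3.0, 7.5),   # both systems drawing power: likely waste
]

def simultaneous_hvac(readings, threshold_kw=1.0):
    """Return hours where heating and cooling both exceed a small threshold."""
    return [hour for hour, heat, cool in readings
            if heat > threshold_kw and cool > threshold_kw]

print(simultaneous_hvac(readings))  # → [10, 12]
```

In a real system this check would run against metered interval data joined with weather feeds, so that, for example, heating activity on a warm day is itself a red flag.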

Home energy use
A number of companies are finding data-driven solutions for homeowners who want to save money by reducing their energy usage. A key to success is putting together measurements of energy use in the home with public data on energy efficiency solutions. PlotWatt, for example, promises to help consumers “save money with real-time energy tracking” through the data it provides. One of the best-known companies in this area, Opower, uses a psychological strategy: it simultaneously gives people access to their own energy data and lets them compare their energy use to their neighbors’ as an incentive to save. Opower partners with utilities to provide this information, and the Virginia-based company has been successful enough to open offices in San Francisco, London, and Singapore. Soon more and more people will have access to data on their home energy use: Green Button, a government-promoted program implemented by utilities, now gives about 100 million Americans data about their energy consumption.
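The neighbor-comparison incentive described above boils down to a small calculation. The following is a minimal sketch of that idea, not Opower’s actual product: the data, field names, and thresholds are all invented for illustration.

```python
# Hypothetical sketch of a neighbor-comparison report in the style Opower
# popularized: compare one home's monthly usage (kWh) to the average of
# similar nearby homes. Data and cutoffs are invented.

def neighbor_comparison(home_kwh, neighbor_kwh):
    """Return a short report comparing one home to its neighbors."""
    avg = sum(neighbor_kwh) / len(neighbor_kwh)
    pct = (home_kwh - avg) / avg * 100
    if pct <= -5:
        verdict = "Great: you used less energy than similar homes."
    elif pct < 5:
        verdict = "Good: about the same as similar homes."
    else:
        verdict = "You used more energy than similar homes."
    return {"home_kwh": home_kwh,
            "neighbor_avg_kwh": round(avg, 1),
            "percent_vs_neighbors": round(pct, 1),
            "verdict": verdict}

report = neighbor_comparison(620, [540, 580, 610, 700, 520])
print(report["percent_vs_neighbors"], report["verdict"])
```

Programs like Green Button matter here because they standardize the `home_kwh` side of this comparison, so third parties can compute reports like this without bespoke utility integrations.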

Solar power and renewable energy
As solar power becomes more efficient and affordable, a number of companies are emerging to support this energy technology. Clean Power Finance, for example, uses its database to connect solar entrepreneurs with sources of capital. In a different way, a company called Solar Census is analyzing publicly available data to find exactly where solar power can be produced most efficiently. The kind of analysis that used to require an on-site survey over several days can now be done in less than a minute with its algorithms.
Other kinds of geospatial and weather data can support other forms of renewable energy. The data will make it easier to find good sites for wind power stations, water sources for small-scale hydroelectric projects, and the best opportunities to tap geothermal energy.

Supporting new energy-efficient vehicles
The Tesla and other electric vehicles are becoming commercially viable, and we will soon see even more efficient vehicles on the road. Toyota has announced that its first fuel-cell cars, which run on hydrogen, will be commercially available by mid-2015, and other auto manufacturers have announced plans to develop fuel-cell vehicles as well. But these vehicles can’t operate without a network to supply power, be it electricity for a Tesla battery or hydrogen for a fuel cell.
It’s a chicken-and-egg problem: People won’t buy large numbers of electric or fuel-cell cars unless they know they can power them, and power stations will be scarce until there are enough vehicles to support their business. Now some new companies are facilitating this transition by giving drivers data-driven tools to find and use the power sources they need. Recargo, for example, provides tools to help electric car owners find charging stations and operate their vehicles.
The development of new energy sources will involve solving social, political, economic, and technological issues. Data science can help develop solutions and bring us more quickly to a new kind of energy future.
Joel Gurin is senior advisor at the GovLab and project director of the Open Data 500. He also currently serves as a fellow of the U.S. Chamber of Commerce Foundation.

Codifying Collegiality: Recent Developments in Data Sharing Policy in the Life Sciences


New paper by Genevieve Pham-Kanter et al. in PLoS ONE: “Over the last decade, there have been significant changes in data sharing policies and in the data sharing environment faced by life science researchers. Using data from a 2013 survey of over 1600 life science researchers, we analyze the effects of sharing policies of funding agencies and journals. We also examine the effects of new sharing infrastructure and tools (i.e., third party repositories and online supplements). We find that recently enacted data sharing policies and new sharing infrastructure and tools have had a sizable effect on encouraging data sharing. In particular, third party repositories and online supplements as well as data sharing requirements of funding agencies, particularly the NIH and the National Human Genome Research Institute, were perceived by scientists to have had a large effect on facilitating data sharing. In addition, we found a high degree of compliance with these new policies, although noncompliance resulted in few formal or informal sanctions. Despite the overall effectiveness of data sharing policies, some significant gaps remain: about one third of grant reviewers placed no weight on data sharing plans in their reviews, and a similar percentage ignored the requirements of material transfer agreements. These patterns suggest that although most of these new policies have been effective, there is still room for policy improvement.”

The Glass Cage: Automation and Us


New Book by Nicholas Carr: “What kind of world are we building for ourselves? That’s the question bestselling author Nicholas Carr tackles in this urgent, absorbing book on the human consequences of automation. At once a celebration of technology and a warning about its misuse, The Glass Cage will change the way you think about the tools you use every day.
Digging behind the headlines about factory robots and self-driving cars, wearable computers and digitized medicine, Carr explores the hidden costs of granting software dominion over our work and our leisure. Even as they bring ease to our lives, computer programs are stealing something essential from us.
Drawing on psychological and neurological studies that underscore how tightly people’s happiness and satisfaction are tied to performing meaningful work in the real world, Carr reveals something we already suspect: shifting our attention to computer screens can leave us disengaged and discontented.
From nineteenth-century textile mills to the cockpits of modern jets, from the frozen hunting grounds of Inuit tribes to the sterile landscapes of GPS maps, The Glass Cage explores the impact of automation from a deeply human perspective, examining the personal as well as the economic consequences of our growing dependence on computers.
With a characteristic blend of history and philosophy, poetry and science, Carr takes us on a journey from the work and early theory of Adam Smith and Alfred North Whitehead to the latest research into human attention, memory, and happiness, culminating in a moving meditation on how we can use technology to expand the human experience.
Nicholas Carr’s The Glass Cage: Automation and Us. Coming on September 29.”

Smarter video games, thanks to crowdsourcing


AAAS – Science Magazine: “Despite the stereotypes, any serious gamer knows it’s way more fun to play with real people than against the computer. Video game artificial intelligence, or AI, just isn’t very good; it’s slow, predictable, and generally stupid. All that stands to change, however, if GiantOtter, a Massachusetts-based startup, has its way, New Scientist reports. By crowdsourcing the AI’s learning, GiantOtter hopes to build systems where the computer can learn based on players’ previous behaviors, decision-making, and even voice communication—yes, the computer is listening in as you strategize. The hope is that by abandoning the traditional scripted programming models, AIs can be taught to mimic human behaviors, leading to more dynamic and challenging scenarios even in incredibly complex games like Blizzard Entertainment Inc.’s professionally played StarCraft II.”
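The crowdsourced-learning idea can be pictured with a toy example. The sketch below is not GiantOtter’s system (which is far more elaborate and includes voice data); the game states and actions are invented. It simply records (situation, action) pairs from many human players and has the AI imitate the most common human choice in each situation.

```python
from collections import Counter, defaultdict

# Toy illustration of learning an AI policy from crowdsourced player logs:
# tally what humans did in each situation, then imitate the majority choice.
# States, actions, and data are invented for illustration.

logs = [
    ("enemy_rush", "build_defense"),
    ("enemy_rush", "build_defense"),
    ("enemy_rush", "counterattack"),
    ("low_resources", "expand_base"),
    ("low_resources", "expand_base"),
]

policy = defaultdict(Counter)
for state, action in logs:
    policy[state][action] += 1

def act(state):
    """Choose the action humans picked most often in this situation."""
    return policy[state].most_common(1)[0][0]

print(act("enemy_rush"))     # → build_defense
print(act("low_resources"))  # → expand_base
```

Even this crude frequency-based imitation is unscripted: the AI’s behavior shifts as more player data accumulates, which is the core contrast with hand-written game scripts.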

Data-based Civic Participation


New workshop paper by C. A. Le Dantec in HCOMP 2014/Citizen + X: Workshop on Volunteer-based Crowdsourcing in Science, Public Health and Government, Pittsburgh, PA. November 2, 2014: “Within the past five years, a new form of technology-mediated public participation that experiments with crowdsourced data production in place of community discourse has emerged. Examples of this class of system include SeeClickFix, PublicStuff, and Street Bump, each of which mediate feedback about local neighborhood issues and help communities mobilize resources to address those issues. The experiments being played out by this new class of services are derived from a form of public participation built on the ideas of smart cities where residents and physical environments are instrumented to provide data to improve operational efficiency and sustainability (Caragliu, Del Bo, and Nijkamp 2011). Ultimately, smart cities is the application to local government of all the efficiencies that computing has always promised—efficiencies of scale, of productivity, of data—minus the messiness and contention of citizenship that play out through more traditional modes of public engagement and political discourse.
The question, then, is what might it look like to incorporate more active forms of civic participation and issue advocacy in an app- and data-driven world? To begin to explore this question, my students and I have developed a smartphone app as part of a larger regional planning partnership with the City of Atlanta and the Atlanta Regional Commission. The app, called Cycle Atlanta, enables cyclists to record their ride data—where they have gone, why they went there, what kind of cyclist they are—in an effort to both generate data for planners developing new bicycling infrastructure and to broaden public participation and input in the creation of those plans…”
 

Francis Fukuyama’s ‘Political Order and Political Decay’


Book Review by David Runciman of “Political Order and Political Decay: From the Industrial Revolution to the Globalisation of Democracy”, by Francis Fukuyama in the Financial Times: “It is not often that a 600-page work of political science ends with a cliffhanger. But the first volume of Francis Fukuyama’s epic two-part account of what makes political societies work, published three years ago, left the big question unanswered. That book took the story of political order from prehistoric times to the dawn of modern democracy in the aftermath of the French Revolution. Fukuyama is still best known as the man who announced in 1989 that the birth of liberal democracy represented the end of history: there were simply no better ideas available. But here he hinted that liberal democracies were not immune to the pattern of stagnation and decay that afflicted all other political societies. They too might need to be replaced by something better. So which was it: are our current political arrangements part of the solution, or part of the problem?
Political Order and Political Decay is his answer. He squares the circle by insisting that democratic institutions are only ever one component of political stability. In the wrong circumstances they can be a destabilising force as well. His core argument is that three building blocks are required for a well-ordered society: you need a strong state, the rule of law and democratic accountability. And you need them all together. The arrival of democracy at the end of the 18th century opened up that possibility but by no means guaranteed it. The mere fact of modernity does not solve anything in the domain of politics (which is why Fukuyama is disdainful of the easy mantra that failing states just need to “modernise”).
The explosive growth in industrial capacity and wealth that the world has experienced in the past 200 years has vastly expanded the range of political possibilities available, for better and for worse (just look at the terrifying gap between the world’s best functioning societies – such as Denmark – and the worst – such as the Democratic Republic of Congo). There are now multiple different ways state capacity, legal systems and forms of government can interact with each other, and in an age of globalisation multiple different ways states can interact with each other as well. Modernity has speeded up the process of political development and it has complicated it. It has just not made it any easier. What matters most of all is getting the sequence right. Democracy doesn’t come first. A strong state does. …”

Forget GMOs. The Future of Food Is Data—Mountains of It


Cade Metz at Wired: “… Led by Dan Zigmond—who previously served as chief data scientist for YouTube, then Google Maps—this ambitious project aims to accelerate the work of all the biochemists, food scientists, and chefs on the first floor, providing a computer-generated shortcut to what Hampton Creek sees as the future of food. “We’re looking at the whole process,” Zigmond says of his data team, “trying to figure out what it all means and make better predictions about what is going to happen next.”

The project highlights a movement, spreading through many industries, that seeks to supercharge research and development using the kind of data analysis and manipulation pioneered in the world of computer science, particularly at places like Google and Facebook. Several projects already are using such techniques to feed the development of new industrial materials and medicines. Others hope the latest data analytics and machine learning techniques can help diagnose disease. “This kind of approach is going to allow a whole new type of scientific experimentation,” says Jeremy Howard, who as the president of Kaggle once oversaw the leading online community of data scientists and is now applying tricks of the data trade to healthcare as the founder of Enlitic.
Zigmond’s project is the first major effort to apply “big data” to the development of food, and though it’s only just getting started—with some experts questioning how effective it will be—it could spur additional research in the field. The company may license its database to others, and Hampton Creek founder and CEO Josh Tetrick says it may even open source the data, so to speak, freely sharing it with everyone. “We’ll see,” says Tetrick, a former college football linebacker who founded Hampton Creek after working on economic and social campaigns in Liberia and Kenya. “That would be in line with who we are as a company.”…
Initially, Zigmond and his team will model protein interactions on individual machines, using tools like the R programming language (a common means of crunching data) and machine learning algorithms much like those that recommend products on Amazon.com. As the database expands, they plan to arrange for much larger and more complex models that run across enormous clusters of computer servers, using the sort of sweeping data-analysis software systems employed by the likes of Google. “Even as we start to get into the tens and hundreds of thousands and millions of proteins,” Zigmond says, “it starts to be more than you can handle with traditional database techniques.”
In particular, Zigmond is exploring the use of deep learning, a form of artificial intelligence that goes beyond ordinary machine learning. Google is using deep learning to drive the speech recognition system in Android phones. Microsoft is using it to translate Skype calls from one language to another. Zigmond believes it can help model the creation of new foods….”
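Zigmond’s comparison to recommendation algorithms can be made concrete with a toy example. The sketch below is emphatically not Hampton Creek’s system; the proteins, feature values, and “gelling score” are invented. It shows the similarity-based idea: represent each catalogued protein as a feature vector and predict a functional property for a new protein from its nearest neighbors, the way an item-similarity recommender scores an unseen item.

```python
import math

# Toy similarity-based prediction over a protein database (all numbers
# invented): each protein is a feature vector (molecular weight in kDa,
# hydrophobicity, net charge) paired with a known functional score.

PROTEINS = {
    # name: ((weight_kda, hydrophobicity, net_charge), gelling_score)
    "pea_vicilin":   ((50.0, 0.42, -8.0), 0.71),
    "soy_glycinin":  ((56.0, 0.39, -10.0), 0.78),
    "canola_napin":  ((14.0, 0.55, 4.0), 0.33),
    "rice_glutelin": ((33.0, 0.47, -3.0), 0.52),
}

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def predict_score(features, k=2):
    """Average the known score of the k most similar catalogued proteins."""
    ranked = sorted(PROTEINS.values(), key=lambda p: euclidean(features, p[0]))
    return sum(score for _, score in ranked[:k]) / k

# A hypothetical new protein close to the pea/soy cluster:
print(round(predict_score((52.0, 0.40, -9.0)), 3))  # → 0.745
```

At the scale Zigmond describes (millions of proteins), this brute-force nearest-neighbor search is exactly what stops working on a single machine, which is why the team plans to move to distributed systems and learned models such as deep networks.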

Announcing New U.S. Open Government Commitments on the Third Anniversary of the Open Government Partnership


US White House Fact Sheet: “Three years ago, President Obama joined with the leaders of seven other nations to launch the Open Government Partnership (OGP), an international partnership between governments and civil society to promote transparency, fight corruption, energize civic engagement, and leverage new technologies to open up governments worldwide.  The United States and other founding countries pledged to transform the way that governments serve their citizens in the 21st century.  Today, as heads of state of OGP participating countries gather at the UN General Assembly, this partnership has grown from 8 to 65 nations and hundreds of civil society organizations around the world. These countries are embracing the challenge by taking steps in partnership with civil society to increase the ability of citizens to engage their governments, access government data to fuel entrepreneurship and innovation, and promote accountability….
The United States is committed to continuing to lead by example in OGP.  Since assuming office, President Obama has prioritized making government more open and accountable and has taken substantial steps to increase citizen participation, collaboration with civil society, and transparency in government.  The United States will remain a global leader of international efforts to promote transparency, stem corruption and hold to account those who exploit the public’s trust for private gain.  Yesterday, President Obama announced several steps the United States is taking to deepen our support for civil society globally.
Today, to mark the third anniversary of OGP, President Obama is announcing four new and expanded open government initiatives that will advance our efforts through the end of 2015.
1.      Promote Open Education to Increase Awareness and Engagement
Open education is the open sharing of digital learning materials, tools, and practices that ensures free access to and legal adoption of learning resources.  The United States is committed to open education and will:

  • Raise open education awareness and identify new partnerships. The U.S. Department of State, the U.S. Department of Education, and the Office of Science and Technology Policy will jointly host a workshop on challenges and opportunities in open education internationally with stakeholders from academia, industry, and government.
  • Pilot new models for using open educational resources to support learning.  The State Department will conduct three pilots overseas by December 2015 that use open educational resources to support learning in formal and informal learning contexts. The pilots’ results, including best practices, will be made publicly available for interested educators.
  • Launch an online skills academy. The Department of Labor (DOL), with cooperation from the Department of Education, will award $25 million through competitive grants to launch an online skills academy in 2015 that will offer open online courses of study, using technology to create high-quality, free, or low-cost pathways to degrees, certificates, and other employer-recognized credentials.

2.      Deliver Government Services More Effectively Through Information Technology
The Administration is committed to serving the American people more effectively and efficiently through smarter IT delivery. The newly launched U.S. Digital Service will work to remove barriers to digital service delivery and remake the experience that people and businesses have with their government. To improve delivery of Federal services, information, and benefits, the Administration will:

  • Expand digital service delivery expertise in government. Throughout 2015, the Administration will continue recruiting top digital talent from the private and public sectors to expand services across the government. These individuals—who have expertise in technology, procurement, human resources, and financing—will serve as digital professionals in a number of capacities in the Federal government, including the new U.S. Digital Service and 18F digital delivery team within the U.S. General Services Administration, as well as within Federal agencies. These teams will take best practices from the public and private sectors and scale them across agencies with a focus on the customer experience.
  • Build digital services in the open. The Administration will expand its efforts to build digital services in the open. This includes using open and transparent processes intended to better understand user needs, testing pilot digital projects, and designing and developing digital services at scale. In addition, building on the recently published Digital Services Playbook, the Administration will continue to openly publish best practices on collaborative websites that enable the public to suggest improvements.
  • Adopt an open source software policy. Using and contributing back to open source software can fuel innovation, lower costs, and benefit the public. No later than December 31, 2015, the Administration will work through the Federal agencies to develop an open source software policy that, together with the Digital Services Playbook, will support improved access to custom software code developed for the Federal government.

3.      Increase Transparency in Spending
The Administration has made an increasing amount of Federal spending data publicly available and searchable, allowing nationwide stakeholders to perform analysis of Federal spending. The Administration will build on these efforts by committing to:

  • Improve USAspending.gov. In 2015, the Administration will launch a refreshed USAspending.gov website that will improve the site’s design and user experience, including better enabling users to explore the data using interactive maps and improving the search functionality and application programming interface.
  • Improve accessibility and reusability of Federal financial data.  In 2015, as part of implementation of the DATA Act, the Administration will work to improve the accessibility and reusability of Federal financial data by issuing data element definition standards and standards for exchanging financial data. The Administration, through the Office of Management and Budget, will leverage industry data exchange standards to the extent practicable to maximize the sharing and utilization of Federal financial data.
  • Explore options for visualization and publication of additional Federal financial data.  The Administration, through the Treasury Department, will use small-scale pilots to help explore options for visualizing and publishing Federal financial data from across the government as required by the DATA Act.
  • Continue to engage stakeholders. The Administration will continue to engage with a broad group of stakeholders to seek input on Federal financial transparency initiatives including DATA Act implementation, by hosting town hall meetings, conducting interactive workshops, and seeking input via open innovation collaboration tools.

4.      Use Big Data to Support Greater Openness and Accountability
President Obama has recognized the growing importance of “big data” technologies for our economy and the advancement of public good in areas such as education, energy conservation, and healthcare. The Administration is taking action to ensure responsible uses of big data to promote greater openness and accountability across a range of areas and sectors. As part of the work it is doing in this area, the Administration has committed to:

  • Enhance sharing of best practices on data privacy for state and local law enforcement.  Federal agencies with expertise in law enforcement, privacy, and data practices will seek to enhance collaboration and information sharing about privacy best practices among state and local law enforcement agencies receiving Federal grants.
  • Ensure privacy protection for big data analyses in health. Big data introduces new opportunities to advance medicine and science, improve health care, and support better public health. To ensure that individual privacy is protected while capitalizing on new technologies and data, the Administration, led by the Department of Health and Human Services, will: (1) consult with stakeholders to assess how Federal laws and regulations can best accommodate big data analyses that promise to advance medical science and reduce health care costs; and (2) develop recommendations for ways to promote and facilitate research through access to data while safeguarding patient privacy and autonomy.
  • Expand technical expertise in government to stop discrimination. U.S. Government departments and agencies will work to expand their technical expertise to identify outcomes facilitated by big data analytics that may have a discriminatory impact on protected classes. …”