New book by Nathan Eagle and Kate Greene: “Big Data is made up of lots of little data: numbers entered into cell phones, addresses entered into GPS devices, visits to websites, online purchases, ATM transactions, and any other activity that leaves a digital trail. Although the abuse of Big Data—surveillance, spying, hacking—has made headlines, it shouldn’t overshadow the abundant positive applications of Big Data. In Reality Mining, Nathan Eagle and Kate Greene cut through the hype and the headlines to explore the positive potential of Big Data, showing the ways in which the analysis of Big Data (“Reality Mining”) can be used to improve human systems as varied as political polling and disease tracking, while considering user privacy.”
EU-funded tool to help our brain deal with big data
EU Press Release: “Every single minute, the world generates 1.7 million billion bytes of data, equal to 360,000 DVDs. How can our brain deal with increasingly big and complex datasets? EU researchers are developing an interactive system which not only presents data the way you like it, but also changes the presentation constantly in order to prevent brain overload. The project could enable students to study more efficiently or journalists to cross check sources more quickly. Several museums in Germany, the Netherlands, the UK and the United States have already shown interest in the new technology.
Data is everywhere: it can either be created by people or generated by machines, such as sensors gathering climate information, satellite imagery, digital pictures and videos, purchase transaction records, GPS signals, etc. This information is a real gold mine. But it is also challenging: today’s datasets are so huge and complex to process that they require new ideas, tools and infrastructures.
Researchers within CEEDs (@ceedsproject) are transposing big data into an interactive environment to allow the human mind to generate new ideas more efficiently. They have built what they are calling an eXperience Induction Machine (XIM) that uses virtual reality to enable a user to ‘step inside’ large datasets. This immersive multi-modal environment – located at Pompeu Fabra University in Barcelona – also contains a panoply of sensors which allows the system to present the information in the right way to the user, constantly tailored according to their reactions as they examine the data. These reactions – such as gestures, eye movements or heart rate – are monitored by the system and used to adapt the way in which the data is presented.
Jonathan Freeman, Professor of Psychology at Goldsmiths, University of London, and coordinator of CEEDs, explains: “The system acknowledges when participants are getting fatigued or overloaded with information. And it adapts accordingly. It either simplifies the visualisations so as to reduce the cognitive load, thus keeping the user less stressed and more able to focus. Or it will guide the person to areas of the data representation that are not as heavy in information.”
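The press release does not describe how this adaptation is implemented, but the loop Freeman describes (estimate overload from physiological signals, then dial the visualisation detail up or down) can be pictured with a short, purely illustrative sketch. The signal names, weights and thresholds below are invented for the example and are not the CEEDs design.

```python
# Purely illustrative sketch of an adaptation loop in the spirit of the CEEDs system.
# The signals, weights and thresholds are invented; they are not the project's actual algorithm.

def fatigue_score(heart_rate_bpm: float, blink_rate_hz: float, gesture_rate_hz: float) -> float:
    """Combine physiological signals into a rough 0-1 overload estimate (weights are made up)."""
    hr = min(max((heart_rate_bpm - 60) / 60, 0.0), 1.0)   # elevated heart rate
    blink = min(blink_rate_hz / 0.8, 1.0)                 # frequent blinking as a fatigue proxy
    stillness = 1.0 - min(gesture_rate_hz / 0.5, 1.0)     # little interaction as disengagement
    return 0.4 * hr + 0.3 * blink + 0.3 * stillness

def adapt_visualisation(score: float, detail_level: int) -> int:
    """Simplify the view when the user seems overloaded, add detail back when they recover."""
    if score > 0.7 and detail_level > 1:
        return detail_level - 1   # reduce cognitive load
    if score < 0.3 and detail_level < 5:
        return detail_level + 1   # spare capacity: show more of the data
    return detail_level

# Example: an overloaded user viewing a fairly dense visualisation gets a simpler one.
print(adapt_visualisation(fatigue_score(110, 0.6, 0.1), detail_level=4))  # -> 3
```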
Neuroscientists were the first group the CEEDs researchers tried their machine on (BrainX3). It took the typically huge datasets generated in this scientific discipline and animated them with visual and sound displays. By providing subliminal clues, such as flashing arrows, the machine guided the neuroscientists to areas of the data that were potentially more interesting to each person. First pilots have already demonstrated the power of this approach in gaining new insights into the organisation of the brain….”
The Emergence of Government Innovation Teams
Hollie Russon Gilman at TechTank: “A new global currency is emerging. Governments understand that people at home and abroad evaluate them based on how they use technology and innovative approaches in their service delivery and citizen engagement. This raises opportunities and critical questions about the role of innovation in 21st century governance.
Bloomberg Philanthropies and Nesta, the UK’s innovation foundation, recently released a global report highlighting 20 government innovation teams. Importantly, the study included teams established and funded by all levels of government (city, regional and national) that aim to find creative solutions to seemingly intractable problems. The report features 20 teams across six continents and identifies some basic principles and commonalities that are instructive for all types of innovators, inside and outside of government.
Using Government to Locally Engage
One of the challenges of representational democracy is that elected officials and government officials spend time in bureaucracies, isolated from the very people they aim to serve. Perhaps there can be different models. For example, Seoul’s Innovation Bureau is engaging citizens to re-design and re-imagine public services. Seoul is dedicated to becoming a Sharing City, including Tool Kit Centers where citizens can borrow machinery they would rarely use themselves but that benefits the whole community. This approach puts citizens at the center of their communities and leverages government to work for the people…
As I’ve outlined in an earlier TechTank post, there are institutional constraints on governments trying the unknown. There are potential electoral costs, greater disillusionment, and gaps in vital service delivery. Yet, despite all of these barriers, there are a variety of promising tools. For example, Finland has Sitra, an innovation fund whose mission is to foster experimentation to transform a diverse set of policy issues, including sustainable energy and healthcare. Sitra invests in practical research and experiments that advance public sector issues, as well as in early-stage companies.
We need a deeper understanding of the opportunities, and challenges, of innovation in government. Luckily, there are many researchers, think tanks, and organizations beginning this analysis. For example, Professor and Associate Dean Anita McGahan of the Rotman School of Management at the University of Toronto calls for a more strategic approach toward understanding the use of innovation, including big data, in the public sector…”
Time for 21st century democracy
Martin Smith and Dave Richards at Policy Network (UK): “…The way that the world has changed is leading to a clash between two contrasting cultures. Traditional, top-down, elite models of democracy and accountability are no longer sustainable in an age of a digitally more open society. As the recent Hansard Society report into PMQs clearly reveals, the people see politicians as out of touch and remote. What we need are two major changes. The first is the recognition by institutions that they are now making decisions in an open world: even if they make decisions in private (which in certain cases they clearly have to), they should recognise that at some point those decisions may need to be justified. Therefore every decision should be made on the basis that, if it were open, it would be deemed legitimate.
The second is the development of bottom-up accountability: we have to develop mechanisms where accountability is not mediated through institutions (as is the case with parliamentary accountability). In its conclusion, the Hansard Society report proposes that new technology could be used to allow citizens rather than MPs to ask questions at Prime Minister’s question time. This is one of many forms of citizen-led accountability that could reinforce the openness of decision making.
New technology creates the opportunity to move away from 19th century democracy. Technology can be used to change the way decisions are made, how citizens are involved and how institutions are held to account. This is already happening, with social groups using social media, online petitions and mobile technologies as part of their campaigns. However, this process needs to be formalised (such as in the Hansard Society’s suggestion for citizens’ questions). There is also a need for more user-friendly ways of analysing big data on government performance. Big data creates many new ways in which decisions can be opened up and critically reviewed. We also need much more explicit policies on leaking and whistleblowing so that those who do reveal the inner workings of governments are not criminalised….”
Fundamentally, the real change is about treating citizens as grown-ups, recognising that they can be privy to the details of the policy-making process. There is a great irony in the playground behaviour of Prime Minister’s question time and the patronising attitudes of political elites towards voters (which tend to infantilise citizens as lacking the expertise to fully participate). The most important change is that institutions start to act as if they are operating in an open society where they are directly accountable, and hence are in a position to start regaining the trust of the people. The closed world of institutions is no longer viable in a digital age.
Quantifying the Interoperability of Open Government Datasets
Paper by Pieter Colpaert, Mathias Van Compernolle, Laurens De Vocht, Anastasia Dimou, Miel Vander Sande, Peter Mechant, Ruben Verborgh, and Erik Mannens, to be published in Computer: “Open Governments use the Web as a global dataspace for datasets. It is in the interest of these governments to be interoperable with other governments worldwide, yet there is currently no way to identify relevant datasets to be interoperable with, and no way to measure the interoperability itself. In this article we discuss the possibility of comparing identifiers used within various datasets as a way to measure semantic interoperability. We introduce three metrics to express the interoperability between two datasets: the identifier interoperability, the relevance and the number of conflicts. The metrics are calculated from a list of statements which indicate for each pair of identifiers in the system whether they identify the same concept or not. While a lot of effort is needed to collect these statements, the return is high: not only are relevant datasets identified, but machine-readable feedback is also provided to the data maintainer.”
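The abstract does not give the formulas, so the following is only a rough sketch of how metrics of this kind might be computed from a list of pairwise “same concept” statements. The sample data and the definitions used here are assumptions made for illustration; the paper’s actual definitions may differ.

```python
# Toy versions of the three metrics named in the abstract (identifier interoperability,
# relevance, number of conflicts), computed from pairwise "same concept" statements.
# Both the data and the formulas are illustrative assumptions, not the paper's definitions.

from collections import defaultdict

# Each statement: (identifier in dataset A, identifier in dataset B, identifies the same concept?)
statements = [
    ("a:station/1", "b:stop/42", True),
    ("a:station/2", "b:stop/99", True),
    ("a:station/3", "b:stop/7",  True),
    ("a:station/3", "b:stop/7",  False),  # contradictory judgements about one pair -> a conflict
    ("a:station/4", "b:stop/42", False),
]

dataset_a = {"a:station/1", "a:station/2", "a:station/3", "a:station/4"}
dataset_b = {"b:stop/42", "b:stop/99", "b:stop/7", "b:stop/100"}

# Identifiers with at least one confirmed counterpart in the other dataset.
matched_a = {a for a, b, same in statements if same}
matched_b = {b for a, b, same in statements if same}

# Toy identifier interoperability: share of all identifiers that have a confirmed counterpart.
identifier_interoperability = (len(matched_a) + len(matched_b)) / (len(dataset_a) + len(dataset_b))

# Toy relevance: how much of dataset B is reachable from dataset A through confirmed links.
relevance = len(matched_b) / len(dataset_b)

# Toy conflict count: pairs judged both "same concept" and "not the same concept".
verdicts = defaultdict(set)
for a, b, same in statements:
    verdicts[(a, b)].add(same)
conflicts = sum(1 for judgements in verdicts.values() if len(judgements) > 1)

print(identifier_interoperability, relevance, conflicts)  # 0.75 0.75 1
```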
The Responsive City: Engaging Communities Through Data-Smart Governance
New book by Stephen Goldsmith and Susan Crawford: “The Responsive City is a guide to civic engagement and governance in the digital age that will help leaders link important breakthroughs in technology and big data analytics with age-old lessons of small-group community input to create more agile, competitive, and economically resilient cities. Featuring vivid case studies highlighting the work of individuals in New York, Boston, Rio de Janeiro, Stockholm, Indiana, and Chicago, the book provides a compelling model for the future of cities and states. The authors demonstrate how digital innovations will drive a virtuous cycle of responsiveness centered on “empowerment”: 1) empowering public employees with tools to both power their performance and to help them connect more personally to those they serve, 2) empowering constituents to see and understand problems and opportunities faced by cities so that they can better engage in the life of their communities, and 3) empowering leaders to drive towards their missions and address the grand challenges confronting cities by harnessing the predictive power of cross-government Big Data. The book will help mayors, chief technology officers, city administrators, agency directors, civic groups and nonprofit leaders break out of current paradigms in order to collectively address civic problems. It is co-authored by Stephen Goldsmith, former Mayor of Indianapolis and current Director of the Innovations in Government Program at the Harvard Kennedy School, and Susan Crawford, co-director of Harvard’s Berkman Center for Internet and Society. Among the topics covered:
- Visualizing service delivery and predicting improvement
- Making the work of government employees more meaningful
- Amplification and coordination of focused citizen engagement
- Big Data in big cities – stories of surprising successes and enormous potential”
This Exercise App Tracks Trends on How We Move In Different Cities
Mark Byrnes at CityLab: “An app designed to encourage exercise can also tell us a lot about the way different cities get from point A to B.
The app, called Human, runs in the background of your iPhone, automatically detecting activities like walking, cycling, running, and motorized transport. The point is to encourage you to exercise for at least 30 minutes a day.
Almost a year after Human launched (last August), its developers have released stunning visualizations of all that movement: 7.5 million miles traveled by their app users so far.
On their site, you can look into the mobility data inside 30 different cities. Once you click on one, you’ll be greeted with a pie chart that shows the distribution of activity within that city lined up against a pie chart that shows the international average.
In the case of Amsterdam, its transportation clichés are verified. App users in the bike-loving city use two wheels way more than they use four. And they walk about as much as anywhere else.
Human then shows the paths traveled by its users. When it comes to Amsterdam, the results look almost exactly like the city’s entire street grid, no matter what physical activity is being shown…”
Request for Proposals: Exploring the Implications of Government Release of Large Datasets
“The Berkeley Center for Law & Technology and Microsoft are issuing this request for proposals (RFP) to fund scholarly inquiry to examine the civil rights, human rights, security and privacy issues that arise from recent initiatives to release large datasets of government information to the public for analysis and reuse. This research may help ground public policy discussions and drive the development of a framework to avoid potential abuses of this data while encouraging greater engagement and innovation.
This RFP seeks to:
- Gain knowledge of the impact of the online release of large amounts of data generated by citizens’ interactions with government
- Imagine new possibilities for technical, legal, and regulatory interventions that avoid abuse
- Begin building a body of research that addresses these issues
– BACKGROUND –
Governments at all levels are releasing large datasets for analysis by anyone for any purpose—“Open Data.” Using Open Data, entrepreneurs may create new products and services, and citizens may use it to gain insight into the government. A plethora of time-saving and other useful applications have emerged from Open Data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. Sometimes governments release large datasets in order to encourage the development of unimagined new applications. For instance, New York City has made over 1,100 databases available, some of which contain information that can be linked to individuals, such as a parking violation database containing license plate numbers and car descriptions.
Data held by the government is often implicitly or explicitly about individuals—acting in roles that have recognized constitutional protection, such as lobbyist, signatory to a petition, or donor to a political cause; in roles that require special protection, such as victim of, witness to, or suspect in a crime; in the role of businessperson submitting proprietary information to a regulator or obtaining a business license; and in the role of ordinary citizen. While open government is often presented as an unqualified good, sometimes Open Data can identify individuals or groups, leading to a more transparent citizenry. The citizen who foresees this growing transparency may be less willing to engage in government, as these transactions may be documented and released in a dataset to anyone to use for any imaginable purpose—including to deanonymize the database—forever. Moreover, some groups of citizens may have few options or no choice as to whether to engage in governmental activities. Hence, open data sets may have a disparate impact on certain groups. The potential impact of large-scale data and analysis on civil rights is an area of growing concern. A number of civil rights and media justice groups banded together in February 2014 to endorse the “Civil Rights Principles for the Era of Big Data,” and the potential of new data systems to undermine longstanding civil rights protections was flagged as a “central finding” of a recent policy review by White House adviser John Podesta.
The Berkeley Center for Law & Technology (BCLT) and Microsoft are issuing this request for proposals in an effort to better understand the implications and potential impact of the release of data related to U.S. citizens’ interactions with their local, state and federal governments. BCLT and Microsoft will fund up to six grants, with a combined total of $300,000. Grantees will be required to participate in a workshop to present and discuss their research at the Berkeley Technology Law Journal (BTLJ) Spring Symposium. All grantees’ papers will be published in a dedicated monograph. Grantees’ papers that approach the issues from a legal perspective may also be published in the BTLJ. We may also hold a follow-up workshop in New York City or Washington, DC.
While we are primarily interested in funding proposals that address issues related to the policy impacts of Open Data, many of these issues are intertwined with general societal implications of “big data.” As a result, proposals that explore Open Data from a big data perspective are welcome; however, proposals solely focused on big data are not. We are open to proposals that address the following difficult questions. We are also open to a range of methods and disciplines, and are particularly interested in proposals from cross-disciplinary teams.
- To what extent does existing Open Data made available by city and state governments affect individual profiling? Do the effects change depending on the level of aggregation (neighborhood vs. cities)? What releases of information could foreseeably cause discrimination in the future? Will different groups in society be disproportionately impacted by Open Data?
- Should the use of Open Data be governed by a code of conduct or subject to a review process before being released? In order to enhance citizen privacy, should governments develop guidelines to release sampled or perturbed data, instead of entire datasets? When datasets contain potentially identifiable information, should there be a notice-and-comment proceeding that includes proposed technological solutions to anonymize, de-identify or otherwise perturb the data?
- Is there something fundamentally different about government services and the government’s collection of citizens’ data for basic needs in modern society, such as power and water, that requires governments to exercise greater due care than commercial entities?
- Companies have legal and practical mechanisms to shield data submitted to government from public release. What mechanisms do individuals have or should have to address misuse of Open Data? Could developments in the constitutional right to informational privacy, as articulated in Whalen and Westinghouse Electric Co, address Open Data privacy issues?
- Collecting data costs money, and its release could affect civil liberties. Yet it is being given away freely, sometimes to immensely profitable firms. Should governments license data for a fee and/or impose limits on its use, given its value?
- The privacy principle of “collection limitation” is under siege, with many arguing that use restrictions will be more efficacious for protecting privacy and more workable for big data analysis. Does the potential of Open Data justify eroding state and federal privacy act collection limitation principles? What are the ethical dimensions of a government system that deprives the data subject of the ability to obscure or prevent the collection of data about a sensitive issue? A move from collection restrictions to use regulation raises a number of related issues, detailed below.
- Are use restrictions efficacious in creating accountability? Consumer reporting agencies are regulated by use restrictions, yet they are not known for their accountability. How could use regulations be implemented in the context of Open Data efficaciously? Can a self-learning algorithm honor data use restrictions?
- If an Open Dataset were regulated by a use restriction, how could individuals police wrongful uses? How would plaintiffs overcome the likely defenses or proof of facts in a use regulation system, such as a burden to prove that data were analyzed and the product of that analysis was used in a certain way to harm the plaintiff? Will plaintiffs ever be able to beat First Amendment defenses?
- The President’s Council of Advisors on Science and Technology big data report emphasizes that analysis is not a “use” of data. Such an interpretation suggests that NSA metadata analysis and large-scale scanning of communications do not raise privacy issues. What are the ethical and legal implications of the “analysis is not use” argument in the context of Open Data?
- Open Data celebrates the idea that information collected by the government can be used by another person for various kinds of analysis. When analysts are not involved in the collection of data, they are less likely to understand its context and limitations. How do we ensure that this knowledge is maintained in a use regulation system?
- Former President William Clinton was admitted under a pseudonym for a procedure at a New York hospital in 2004. The hospital detected 1,500 attempts by its own employees to access the President’s records. With snooping such a tempting activity, how could incentives be crafted to encourage self-policing of government data and the self-disclosure of inappropriate uses of Open Data?
- It is clear that data privacy regulation could hamper some big data efforts. However, many examples of big data successes hail from highly regulated environments, such as health care and financial services—areas with statutory, common law, and IRB protections. What are the contours of privacy law that are compatible with big data and Open Data success and which are inherently inimical to it?
- In recent years, the problem of “too much money in politics” has been addressed with increasing disclosure requirements. Yet, distrust in government remains high, and individuals identified in donor databases have been subjected to harassment. Is the answer to problems of distrust in government even more Open Data?
- What are the ethical and epistemological implications of encouraging government decision-making based upon correlation analysis, without a rigorous understanding of cause and effect? Are there decisions that should not be left to just correlational proof? While enthusiasm for data science has increased, scientific journals are elevating their standards, with special scrutiny focused on hypothesis-free, multiple comparison analysis. What could legal and policy experts learn from experts in statistics about the nature and limits of open data?…
To submit a proposal, visit the Conference Management Toolkit (CMT) here.
Once you have created a profile, the site will allow you to submit your proposal.
If you have questions, please contact Chris Hoofnagle, principal investigator on this project.”
'Big Data' Will Change How You Play, See the Doctor, Even Eat
“We’re entering an age of personal big data, and its impact on our lives will surpass that of the Internet. Data will answer questions we could never before answer with certainty—everyday questions like whether that dress actually makes you look fat, or profound questions about precisely how long you will live.
Every 20 years or so, a powerful technology moves from the realm of backroom expertise and into the hands of the masses. In the late 1970s, computing made that transition—from mainframes in glass-enclosed rooms to personal computers on desks. In the late 1990s, the first web browsers made networks, which had been for science labs and the military, accessible to any of us, giving birth to the modern Internet.
Each transition touched off an explosion of innovation and reshaped work and leisure. In 1975, 50,000 PCs were in use worldwide. Twenty years later: 225 million. The number of Internet users in 1995 hit 16 million. Today it’s more than 3 billion. In much of the world, it’s hard to imagine life without constant access to both computing and networks.
The 2010s will be the coming-out party for data. Gathering, accessing and gleaning insights from vast and deep data has been a capability locked inside enterprises long enough. Cloud computing and mobile devices now make it possible to stand in a bathroom line at a baseball game while tapping into massive computing power and databases. On the other end, connected devices such as the Nest thermostat or Fitbit health monitor and apps on smartphones increasingly collect new kinds of information about everyday personal actions and habits, turning it into data about ourselves.
More than 80 percent of data today is unstructured: tangles of YouTube videos, news stories, academic papers, social network comments. Unstructured data has been almost impossible to search for, analyze and mix with other data. A new generation of computers—cognitive computing systems that learn from data—will read tweets or e-books or watch video, and comprehend the content. Somewhat like brains, these systems can link diverse bits of data to come up with real answers, not just search results.
Such systems can work in natural language. The progenitor is the IBM Watson computer that won on Jeopardy in 2011. Next-generation Watsons will work like a super-powered Google. (Google today is a data-searching wimp compared with what’s coming.)
Sports offers a glimpse into the data age. Last season the NBA installed in every arena technology that can “watch” a game and record, in 48 minutes of action, more than 4 million data points about every movement and shot. That alone could yield new insights for NBA coaches, such as which group of five players most efficiently passes the ball around….
Think again about life before personal computing and the Internet. Even if someone told you that you’d eventually carry a computer in your pocket that was always connected to global networks, you would’ve had a hard time imagining what that meant—imagining WhatsApp, Siri, Pandora, Uber, Evernote, Tinder.
As data about everything becomes ubiquitous and democratized, layered on top of computing and networks, it will touch off the most spectacular technology explosion yet. We can see the early stages now. “Big data” doesn’t even begin to describe the enormity of what’s coming next.”
Big Money, Uncertain Return
Mary K. Pratt in an MIT Technology Review Special Report on Data-Driven Health Care: “Hospitals are spending billions collecting and analyzing medical data. The one data point no one is tracking: the payoff…. Ten years ago, Kaiser Permanente began building a $4 billion electronic-health-record system that includes a comprehensive collection of health-care data ranging from patients’ treatment records to research-based clinical advice. Now Kaiser has added advanced analytics tools and data from more sources, including a pilot program that integrates information from patients’ medical devices.
Faced with new government regulations and insurer pressure to control costs, other health-care organizations are following Kaiser’s example and increasing their use of analytics. The belief: that mining their vast quantities of patient data will yield insights into the best treatments at the lowest cost.
But just how big will the financial payoff be? Terhilda Garrido, vice president of health IT transformation and analytics at Kaiser, admits she doesn’t know. Nor do other health-care leaders. The return on investment for health-care analytics programs remains elusive and nearly impossible for most to calculate…
Opportunities to identify the most effective treatments could slip away if CIOs and their teams aren’t able to quantify the return on their analytics investments. Health-care providers are under increasing pressure to cut costs in an era of capped billing, and executives at medical organizations won’t okay spending their increasingly limited dollars on data warehouses, analytics software, and data scientists if they can’t be sure they’ll see real benefit.
A new initiative at Cleveland Clinic shows the opportunities and challenges. By analyzing patients’ records on their overall health and medical conditions, the medical center determines which patients coming in for hip and knee replacements can get postoperative services in their own homes (the most cost-effective option), which ones will need a short stay in a skilled nursing facility, and which ones will have longer stints in a skilled nursing facility (the most costly option). The classifications control costs while still ensuring the best possible medical outcomes, says CIO C. Martin Harris.
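Cleveland Clinic’s actual model is not described in the article, but as a purely hypothetical illustration, a rule that sorts joint-replacement patients into the three postoperative settings might look like the sketch below; every feature name and threshold here is invented.

```python
# Hypothetical sketch only: a toy rule that assigns hip/knee-replacement patients to one of the
# three postoperative settings described above. The features and cutoffs are invented for
# illustration and are not Cleveland Clinic's actual criteria.

def postoperative_setting(age: int, chronic_conditions: int, lives_alone: bool, mobility_score: int) -> str:
    """Return 'home care', 'short skilled-nursing stay', or 'extended skilled-nursing stay'."""
    risk = 0
    risk += 2 if age >= 80 else (1 if age >= 70 else 0)
    risk += chronic_conditions
    risk += 1 if lives_alone else 0
    risk += 2 if mobility_score < 3 else 0   # mobility_score: 0 (poor) to 5 (good), invented scale

    if risk <= 2:
        return "home care"                     # the most cost-effective option
    if risk <= 4:
        return "short skilled-nursing stay"
    return "extended skilled-nursing stay"     # the most costly option

# Example: a 72-year-old with two chronic conditions who lives alone and has fair mobility.
print(postoperative_setting(age=72, chronic_conditions=2, lives_alone=True, mobility_score=3))
# -> short skilled-nursing stay
```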
This classification does translate into real and significant financial benefits, but Harris wonders how to calculate the payoff from his data investment. Should the costs of every system from which patient data is pulled be part of the equation, in addition to the costs of the data warehouse and analytics tools? Calculating how much money is saved by implementing better protocols is not straightforward either. Harris hesitates to attribute better, more cost-effective patient outcomes solely to analytics when many other factors are also likely contributors…”