
Stefaan Verhulst

Paper by Stephen Wilson in the Australian Journal of Telecommunications and the Digital Economy: “We live in an age where billionaires are self-made on the back of the most intangible of assets – the information they have about us. The digital economy is awash with data. It’s a new and endlessly re-useable raw material, increasingly left behind by ordinary people going about their lives online. Many information businesses proceed on the basis that raw data is up for grabs; if an entrepreneur is clever enough to find a new vein of it, they can feel entitled to tap it in any way they like. However, some tacit assumptions underpinning today’s digital business models are naive. Conventional data protection laws, older than the Internet, limit how Personal Information is allowed to flow. These laws turn out to be surprisingly powerful in the face of ‘Big Data’ and the ‘Internet of Things’. On the other hand, orthodox privacy management was not framed for new Personal Information being synthesised tomorrow from raw data collected today. This paper seeks to bridge a conceptual gap between data analytics and privacy, and sets out extended Privacy Principles to better deal with Big Data.”
The collision between big data and privacy law
FastCo-Exist: “Astronomers have long built observatories to capture the night sky and beyond. Now researchers at NYU are borrowing astronomy’s methods and turning their cameras towards Manhattan’s famous skyline.
NYU’s Center for Urban Science and Progress has been running what’s likely the world’s first “urban observatory” of its kind for about a year. From atop a tall building in downtown Brooklyn (NYU won’t say its address, due to security concerns), two cameras—one regular one and one that captures infrared wavelengths—take panoramic images of lower and midtown Manhattan. One photo is snapped every 10 seconds. That’s 8,640 images a day, or more than 3 million since the project began (or about 50 terabytes of data).
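A quick back-of-the-envelope check of those figures; the per-image size below is inferred from the article’s own totals, not something NYU has stated:

```python
# Rough check of the article's arithmetic (the per-image size is inferred, not stated).
SECONDS_PER_DAY = 24 * 60 * 60
images_per_day = SECONDS_PER_DAY // 10          # one photo every 10 seconds -> 8,640 a day
images_per_year = images_per_day * 365          # ~3.15 million images over the first year
bytes_per_image = 50e12 / images_per_year       # "about 50 terabytes" spread over those images

print(images_per_day, images_per_year, round(bytes_per_image / 1e6), "MB/image (implied)")
```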

“The real power of the urban observatory is that you have this synoptic imaging. By synoptic imaging, I mean these large swaths of the city,” says the project’s chief scientist Gregory Dobler, a former astrophysicist at Harvard University and the University of California, Santa Barbara who now heads the 15-person observatory team at NYU.
Dobler’s team is collaborating with New York City officials on the project, which is now expanding to set up stations that study other parts of Manhattan and Brooklyn. Its major goal is to discover information about the urban landscape that can’t be seen at other scales. Such data could lead to applications like tracking which buildings are leaking energy (with the infrared camera), or measuring occupancy patterns of buildings at night, or perhaps detecting releases of toxic chemicals in an emergency.
The video above is an example. The top panel cycles through a one-minute slice of observatory images. The bottom panel is an analysis of the same images in which everything that remains static in each image is removed, such as buildings, trees, and roads. What’s left is an imprint of everything in flux within the scene—the clouds, the cars on the FDR Drive, the boat moving down the East River, and, importantly, a plume of smoke that puffs out of a building.
“Periodically, a building will burp,” says Dobler. “It’s hard to see the puffs of smoke . . . but we can isolate that plume and essentially identify it.” (As Dobler has done by highlighting it in red in the top panel).
In response to the natural privacy concerns about this kind of program, Dobler emphasizes that the pictures come from an 8-megapixel camera (the same resolution found in the iPhone 6) and aren’t clear enough to see inside a window or make out individuals. As a further privacy safeguard, the images are analyzed only for “aggregate” measures—such as patterns of nighttime energy usage—rather than specific buildings. “We’re not really interested in looking at a given building, and saying, hey, these guys are particular offenders,” he says. (He also says the team is not looking at uses for the data in security applications.) However, Dobler was not able to answer whether the project’s partners at city agencies can access data analysis for individual buildings….”
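The static-scene removal described in the video analysis is essentially background subtraction. A minimal sketch of that general idea with NumPy follows; the array shapes and threshold are hypothetical, and the observatory’s actual pipeline is not described in the article:

```python
import numpy as np

def isolate_flux(frames, threshold=10.0):
    """Remove everything static from a stack of grayscale frames (n, height, width),
    leaving only what changes between shots: clouds, traffic, smoke plumes."""
    frames = np.asarray(frames, dtype=float)
    background = np.median(frames, axis=0)   # the static scene: buildings, trees, roads
    residual = np.abs(frames - background)   # per-frame departure from that background
    return residual > threshold              # boolean mask of pixels "in flux"

# Hypothetical usage: six downsampled frames taken 10 seconds apart
frames = np.random.rand(6, 480, 640) * 255
moving = isolate_flux(frames)
print(moving.shape, moving.mean())           # fraction of pixels flagged as changing
```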

Urban Observatory Is Snapping 9,000 Images A Day Of New York City

New paper by Charles D Borromeo, Titus K Schleyer, Michael J Becich, and Harry Hochheiser: “Background: Research networking systems hold great promise for helping biomedical scientists identify collaborators with the expertise needed to build interdisciplinary teams. Although efforts to date have focused primarily on collecting and aggregating information, less attention has been paid to the design of end-user tools for using these collections to identify collaborators. To be effective, collaborator search tools must provide researchers with easy access to information relevant to their collaboration needs.
Objective: The aim was to study user requirements and preferences for research networking system collaborator search tools and to design and evaluate a functional prototype.
Methods: Paper prototypes exploring possible interface designs were presented to 18 participants in semistructured interviews aimed at eliciting collaborator search needs. Interview data were coded and analyzed to identify recurrent themes and related software requirements. Analysis results and elements from paper prototypes were used to design a Web-based prototype using the D3 JavaScript library and VIVO data. Preliminary usability studies asked 20 participants to use the tool and to provide feedback through semistructured interviews and completion of the System Usability Scale (SUS).
Results: Initial interviews identified consensus regarding several novel requirements for collaborator search tools, including chronological display of publication and research funding information, the need for conjunctive keyword searches, and tools for tracking candidate collaborators. Participant responses were positive (SUS score: mean 76.4%, SD 13.9). Opportunities for improving the interface design were identified.
Conclusions: Interactive, timeline-based displays that support comparison of researcher productivity in funding and publication have the potential to effectively support searching for collaborators. Further refinement and longitudinal studies may be needed to better understand the implications of collaborator search tools for researcher workflows.”
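For readers unfamiliar with the System Usability Scale cited in the results, its scoring is a fixed formula; the sketch below uses invented responses, not data from the study:

```python
def sus_score(responses):
    """Standard SUS scoring: 10 items rated 1-5; odd items contribute (rating - 1),
    even items contribute (5 - rating); the sum is multiplied by 2.5 to give 0-100."""
    assert len(responses) == 10
    total = sum((r - 1) if i % 2 == 1 else (5 - r)
                for i, r in enumerate(responses, start=1))
    return total * 2.5

# Hypothetical participant with generally positive responses
print(sus_score([4, 2, 4, 1, 5, 2, 4, 2, 4, 2]))  # 80.0
```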

Finding Collaborators: Toward Interactive Discovery Tools for Research Network Systems

They say their model has the potential to transform flu forecasting from a black art to a modern science as well-founded as weather forecasting.
Flu takes between 3,000 and 49,000 lives each year in the U.S., so an accurate forecast can have a significant impact on the way society prepares for the epidemic. The current method of monitoring flu outbreaks is somewhat antiquated. It relies on a voluntary system in which public health officials report the percentage of patients they see each week with influenza-like illness, defined as a temperature higher than 100 degrees and a cough with no explanation other than flu.
These numbers give a sense of the incidence of flu at any instant but the accuracy is clearly limited. They do not, for example, account for people with flu who do not seek treatment or people with flu-like symptoms who seek treatment but do not have flu.
There is another significant problem. The network that reports this data is relatively slow. It takes about two weeks for the numbers to filter through the system so the data is always weeks old.
That’s why the CDC is interested in finding new ways to monitor the spread of flu in real time. Google, in particular, has used the number of searches for flu and flu-like symptoms to forecast flu in various parts of the world. That approach has had considerable success but also some puzzling failures. One problem, however, is that Google does not make its data freely available and this lack of transparency is a potential source of trouble for this kind of research.
So Hickmann and co have turned to Wikipedia. Their idea is that the variation in numbers of people accessing articles about flu is an indicator of the spread of the disease. And since Wikipedia makes this data freely available to any interested party, it is an entirely transparent source that is likely to be available for the foreseeable future….
Ref: arxiv.org/abs/1410.7716: Forecasting the 2013–2014 Influenza Season using Wikipedia”
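The core idea can be sketched in a few lines: treat weekly view counts of flu-related Wikipedia articles as a real-time proxy and regress the CDC’s influenza-like-illness percentage on them. The numbers below are made up, and the ordinary least-squares fit is only a toy stand-in for the paper’s actual model:

```python
import numpy as np

# Hypothetical weekly data: Wikipedia page views (thousands) for flu-related articles
# and the CDC's reported influenza-like-illness percentage for the same weeks.
views = np.array([120, 150, 210, 340, 500, 620, 580, 430], dtype=float)
ili_pct = np.array([1.1, 1.3, 1.8, 2.9, 4.2, 5.1, 4.8, 3.6])

slope, intercept = np.polyfit(views, ili_pct, deg=1)   # fit ILI% as a line in page views

# "Nowcast" the current week from page views alone, which are available immediately,
# rather than waiting ~2 weeks for the official surveillance numbers to arrive.
this_week_views = 700.0
print(round(intercept + slope * this_week_views, 2))
```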

How Wikipedia Data Is Revolutionizing Flu Forecasting

In the New York Times: “Google is giving its Flu Trends service an overhaul — “a brand new engine,” as it announced in a blog post on Friday.

The new thing is actually traditional data from the Centers for Disease Control and Prevention that is being integrated into the Google flu-tracking model. The goal is greater accuracy after the Google service had been criticized for consistently over-estimating flu outbreaks in recent years.

The main critique came in an analysis done by four quantitative social scientists, published earlier this year in an article in Science magazine, “The Parable of Google Flu: Traps in Big Data Analysis.” The researchers found that the most accurate flu predictor was a data mash-up that combined Google Flu Trends, which monitored flu-related search terms, with the official C.D.C. reports from doctors on influenza-like illness.

The Google Flu Trends team is heeding that advice. In the blog post, Christian Stefansen, a Google senior software engineer, wrote, “We’re launching a new Flu Trends model in the United States that — like many of the best performing methods in the literature — takes official CDC flu data into account as the flu season progresses.”

Google’s flu-tracking service has had its ups and downs. Its triumph came in 2009, when it gave an advance signal of the severity of the H1N1 outbreak, two weeks or so ahead of official statistics. In a 2009 article in Nature explaining how Google Flu Trends worked, the company’s researchers did, as the Friday post notes, say that the Google service was not intended to replace official flu surveillance methods and that it was susceptible to “false alerts” — anything that might prompt a surge in flu-related search queries.

Yet those caveats came a couple of pages into the Nature article. And Google Flu Trends became a symbol of the superiority of the new, big data approach — computer algorithms mining data trails for collective intelligence in real time. To enthusiasts, it seemed so superior to the antiquated method of collecting health data that involved doctors talking to patients, inspecting them and filing reports.

But Google’s flu service greatly overestimated the number of cases in the United States in the 2012-13 flu season — a well-known miss — and, according to the research published this year, has persistently overstated flu cases over the years. In the Science article, the social scientists called it “big data hubris.”
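The Science authors’ finding (that the best predictor combined the search-based signal with lagged CDC reports) reduces to fitting a model with two inputs. Below is an illustration with invented numbers and plain least squares; neither Google’s revised model nor the paper’s mash-up is published in this form:

```python
import numpy as np

# Hypothetical weekly series: a search-based flu estimate (available in real time)
# and official CDC ILI reports (available only after a roughly two-week reporting lag).
search_signal = np.array([2.0, 2.4, 3.1, 4.5, 5.9, 6.4, 5.8, 4.9])
cdc_ili       = np.array([1.1, 1.3, 1.8, 2.9, 4.2, 5.1, 4.8, 3.6])

lag = 2  # weeks of CDC reporting delay
X = np.column_stack([
    search_signal[lag:],         # this week's search-based estimate
    cdc_ili[:-lag],              # the most recent CDC figure actually in hand
    np.ones(len(cdc_ili) - lag), # intercept term
])
y = cdc_ili[lag:]                # target: the true ILI% for the current week

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef)  # weights on the search signal, the lagged CDC data, and the intercept
```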

The New Thing in Google Flu Trends Is Traditional Data

Craig Thomler at eGovAU: “…Now we can do much better. Rather than focusing on electing and appointing individual experts – the ‘nodes’ in our governance system – governments need to focus on the network that interconnects citizens, government, business, not-for-profits and other entities.

Rather than limiting decision making to a small core of elected officials (supported by appointed and self-nominated ‘experts’), we need to design decision-making systems which empower broad groups of citizens to self-inform and involve themselves at appropriate steps of decision-making processes.
This isn’t quite direct democracy – where the population weighs in on every issue – but it is certainly a few steps removed from the alienating ‘representative democracy’ that many countries use today.
What this model of governance allows for is far more agile and iterative policy debates, rapid testing and improvement of programs, and managed distributed community support – where anyone in a community can offer to help others within a framework that values, supports and rewards their involvement, rather than one that views it with suspicion and places many barriers in the way.
Of course we need the mechanisms designed to support this model of government, and the notion that they will simply evolve out of our existing system is quite naive.
Our current governance structures are evolutionary – based on the principle that better approaches will beat out ineffective and inefficient ones. Both history and animal evolution have shown that inefficient organisms can survive for extremely long times, and can require radical environmental change (such as mass extinction events) for new forms to be successful.
On top of this, the evolution of government is particularly slow, as there are far fewer connections between the world’s 200-odd national governments than between the 200+ Watson artificial intelligences in operation.
While every Watson rapidly learns what other Watsons learn, governments have stilted and formal mechanisms for connection, which means it can take decades – or even longer – for them to recognise successes and failures in others.
In other words, while we have a diverse group of governments all attempting to solve many of the same basic problems, the network effect isn’t working as they are all too inward focused and have focused on developing expertise ‘nodes’ (individuals) rather than expert networks (connections).
This isn’t something that can be fixed by one government, or even a group of ten or more – which leaves humanity in the position of having to repeat the same errors time and time again: approving the same drugs, testing the same welfare systems, trialing the same legal regimes, even when we have examples of their failures and successes we could be learning from.
Therefore the best solution – perhaps the only workable solution for the likely duration of human civilisation on this planet – is to do what some of our forefathers did and design new forms of government in a planned way.
Rather than letting governments slowly and haphazardly evolve through trial and error, we should take a leaf out of the book of engineers, and place a concerted effort into designing governance systems that meet human needs.
These systems should involve and nurture strong networks, focusing on the connections rather than the nodes – allowing us to both leverage the full capabilities of society in its own betterment and to rapidly adjust settings when environments and needs change….”
The future of intelligence is distributed – and so is the future of government
Blog by Susan Crawford at HBR: “As politics at the federal level becomes increasingly corrosive and polarized, with trust in Congress and the President at historic lows, Americans still celebrate their cities. And cities are where the action is when it comes to using technology to thicken the mesh of civic goods — more and more cities are using data to animate and inform interactions between government and citizens to improve wellbeing.
Every day, I learn about some new civic improvement that will become possible when we can assume the presence of ubiquitous, cheap, and unlimited data connectivity in cities. Some of these are made possible by the proliferation of smartphones; others rely on the increasing number of internet-connected sensors embedded in the built environment. In both cases, the constant is data. (My new book, The Responsive City, written with co-author Stephen Goldsmith, tells stories from Chicago, Boston, New York City and elsewhere about recent developments along these lines.)
For example, with open fiber networks in place, sending video messages will become as accessible and routine as sending email is now. Take a look at rhinobird.tv, a free lightweight, open-source video service that works in browsers (no special download needed) and allows anyone to create a hashtag-driven “channel” for particular events and places. A debate or protest could be viewed from a thousand perspectives. Elected officials and public employees could easily hold streaming, virtual town hall meetings.
Given all that video and all those livestreams, we’ll need curation and aggregation to make sense of the flow. That’s why visualization norms, still in their infancy, will become a greater part of literacy. When the Internet Archive attempted late last year to “map” 400,000 hours of television news against worldwide locations, it came up with pulsing blobs of attention. Although visionary Kevin Kelly has been talking about data visualization as a new form of literacy for years, city governments still struggle with presenting complex and changing information in standard, easy-to-consume ways.
Plenar.io is one attempt to resolve this. It’s a platform developed by former Chicago Chief Data Officer Brett Goldstein that allows public datasets to be combined and mapped with easy-to-see relationships among weather and crime, for example, on a single city block. (A sample question anyone can ask of Plenar.io: “Tell me the story of 700 Howard Street in San Francisco.”) Right now, Plenar.io’s visual norm is a map, but it’s easy to imagine other forms of presentation that could become standard. All the city has to do is open up its widely varying datasets…”
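Mechanically, the kind of question Plenar.io answers (everything recorded for one place, lined up over time) is a join of open datasets on location and date. A hypothetical pandas sketch, not Plenar.io’s actual API:

```python
import pandas as pd

# Two hypothetical open datasets keyed by city block and by date.
crime = pd.DataFrame({
    "block": ["700 HOWARD ST"] * 3,
    "date": pd.to_datetime(["2014-10-01", "2014-10-02", "2014-10-03"]),
    "incidents": [2, 0, 1],
})
weather = pd.DataFrame({
    "date": pd.to_datetime(["2014-10-01", "2014-10-02", "2014-10-03"]),
    "temp_f": [68, 71, 59],
    "precip_in": [0.0, 0.0, 0.4],
})

# "Tell me the story of 700 Howard Street": one row per day, all signals side by side.
story = crime.merge(weather, on="date").sort_values("date")
print(story)
```
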
Governing the Smart, Connected City

New Paper by William Li, Pablo Azar, David Larochelle, Phil Hill & Andrew Lo: “The agglomeration of rules and regulations over time has produced a body of legal code that no single individual can fully comprehend. This complexity produces inefficiencies, makes the processes of understanding and changing the law difficult, and frustrates the fundamental principle that the law should provide fair notice to the governed. In this article, we take a quantitative, unbiased, and software-engineering approach to analyze the evolution of the United States Code from 1926 to today. Software engineers frequently face the challenge of understanding and managing large, structured collections of instructions, directives, and conditional statements, and we adapt and apply their techniques to the U.S. Code over time. Our work produces insights into the structure of the U.S. Code as a whole, its strengths and vulnerabilities, and new ways of thinking about individual laws. For example, we identify the first appearance and spread of important terms in the U.S. Code like “whistleblower” and “privacy.” We also analyze and visualize the network structure of certain substantial reforms, including the Patient Protection and Affordable Care Act (PPACA) and the Dodd-Frank Wall Street Reform and Consumer Protection Act, and show how the interconnections of references can increase complexity and create the potential for unintended consequences. Our work is a timely illustration of computational approaches to law as the legal profession embraces technology for scholarship, to increase efficiency, and to improve access to justice.”
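One way to picture the paper’s network analysis: treat each section of the Code as a node and each cross-reference as a directed edge, then ask which sections are most heavily referenced. The toy below (invented section texts and a deliberately naive reference pattern) illustrates the approach, not the authors’ pipeline:

```python
import re
import networkx as nx

# Hypothetical section texts; a real run would ingest the full U.S. Code, year by year.
sections = {
    "26 §36B": "... as defined in section 5000A of this title ...",
    "26 §5000A": "... the penalty imposed under section 5000A ... see also section 36B ...",
    "42 §18031": "... credits allowed under section 36B ...",
}

graph = nx.DiGraph()
pattern = re.compile(r"section (\w+)")
for source, text in sections.items():
    title = source.split()[0]
    for target in pattern.findall(text):
        if f"{title} §{target}" != source:                # ignore self-references
            graph.add_edge(source, f"{title} §{target}")  # naive: assumes same-title targets

# Heavily referenced sections are where added complexity can ripple furthest.
print(sorted(graph.in_degree(), key=lambda kv: -kv[1]))
```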

Law is Code: A Software Engineering Approach to Analyzing the United States Code

Talk by Boyan Yurukov at TEDxBG: “Working on various projects, Boyan started a sort of quest for better transparency. It came with the promise of access that would yield answers to what is wrong and what is right with governments today. Over time, he realized that better transparency and more open data bring us almost no relevant answers. Instead, we get more questions, and that’s great news. Questions help us see what is relevant, what is hidden, and what our assumptions are. That’s the true value of data.
Boyan Yurukov is a software engineer and open data advocate based in Frankfurt. He graduated in Computational Engineering with Data Mining from TU Darmstadt and is involved in data liberation, crowdsourcing and visualization projects focused on various issues in Bulgaria, as well as open data legislation….”

Open Data – Searching for the right questions

The Economist on how “Data are slowly changing the way cities operate… Waiting for a bus on a drizzly winter morning is miserable. But for London commuters, Citymapper, an app, makes it a little more bearable. Users enter their destination into a search box and a range of different ways to get there pop up, along with real-time information about when a bus will arrive or when the next Tube will depart. The app is an example of how data are changing the way people view and use cities. Local governments are gradually starting to catch up.
Nearly all big British cities have started to open up access to their data. On October 23rd the second version of the London Datastore, a huge trove of information on everything from crime statistics to delays on the Tube, was launched. In April Leeds City council opened an online “Data Mill” which contains raw data on such things as footfall in the city centre, the number of allotment sites or visits to libraries. Manchester also releases chunks of data on how the city region operates.
Mostly these websites act as tools for developers and academics to play around with. Since the first Datastore was launched in 2010, around 200 apps, such as Citymapper, have sprung up. Other initiatives have followed. “Whereabouts”, which also launched on October 23rd, is an interactive map by the Future Cities Catapult, a non-profit group, and the Greater London Authority (GLA). It uses 235 data sets, some 150 of them from the Datastore, from the age and occupation of London residents to the number of pubs or types of restaurants in an area. In doing so it suggests a different picture of London neighbourhoods based on eight different categories (see map, and its website: whereaboutslondon.org)….”
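Reducing hundreds of datasets to eight neighbourhood categories is, at bottom, a clustering exercise. The sketch below uses invented per-area features and k-means from scikit-learn; the article does not say which method Whereabouts actually uses:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Hypothetical per-area features drawn from open datasets: median age,
# pubs per 1,000 residents, restaurants per 1,000 residents, share of residents in work.
n_areas = 600
features = np.column_stack([
    rng.normal(36, 8, n_areas),
    rng.gamma(2.0, 1.5, n_areas),
    rng.gamma(3.0, 2.0, n_areas),
    rng.uniform(0.4, 0.9, n_areas),
])

# Standardise the features, then group areas into eight categories.
features = (features - features.mean(axis=0)) / features.std(axis=0)
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)
print(np.bincount(labels))  # how many areas fall into each of the eight categories
```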

City slicker
