A Data-driven Approach to Assess the Potential of Smart Cities: The Case of Open Data for Brussels Capital Region


Paper by Miguel Angel Gomez Zotano and Hugues Bersini in Energy Procedia: “The success of smart city projects is intrinsically related to the existence of large volumes of data that could be processed to achieve their objectives. For this purpose, the plethora of data stored by public administrations becomes an incredibly rich source of insight and information due to its volume and diversity. However, it was only with the Open Government Movement that governments became concerned with the need to open their data to citizens and businesses. Thus, with the emergence of open data portals, this myriad of data enables the development of new business models. The achievement of the benefits sought by making this data available triggers new challenges to cope with the diversity of sources involved. The business potential could be jeopardized by the scarcity of relevant data in the different blocks and domains that make up a city and by the lack of a common approach to data publication, in terms of format, content, etc.

This paper introduces a holistic approach that relies on the Smart City Ontology as the cornerstone to standardise and structure data. This approach, proposed as an analytical tool to assess the potential of data in a given smart city, analyses three main aspects: the availability of data, the criteria that data should fulfil to be considered eligible, and the model used to structure and organise the data. The approach has been applied to the case of Brussels Capital Region, whose first results are presented and discussed in this paper. The main conclusion is that, despite its commitment to open data and smart cities, Brussels is not yet mature enough to fully exploit the real intelligence that smart cities could provide. This maturity should be achieved in the coming years with the implementation of the new Brussels’ Smart City Strategy…(More)”.

Access to New Data Sources for Statistics: Business Models and Incentives for the Corporate Sector


Report by Thilo Klein and Stefaan Verhulst: “New data sources, commonly referred to as “Big Data”, have attracted growing interest from National Statistical Institutes. They have the potential to complement official and more conventional statistics used, for instance, to determine progress towards the Sustainable Development Goals (SDGs) and other targets. However, it is often assumed that this type of data is readily available, which is not necessarily the case. This paper examines legal requirements and business incentives to obtain agreement on private data access, and more generally ways to facilitate the use of Big Data for statistical purposes. Using practical cases, the paper analyses the suitability of five generic data access models for different data sources and data uses in an emerging new data ecosystem. Concrete recommendations for policy action are presented in the conclusions….(More)”.

Prediction and Inference from Social Networks and Social Media


Book edited by Jalal Kawash, Nitin Agarwal, and Tansel Özyer: “This book addresses the challenges of social network and social media analysis in terms of prediction and inference. The chapters collected here tackle these issues by proposing new analysis methods and by examining mining methods for the vast amount of social content produced. Social Networks (SNs) have become an integral part of our lives; they are used for leisure, business, government, medical, and educational purposes and have attracted billions of users. The challenges that stem from this wide adoption of SNs are vast. These include generating realistic social network topologies, awareness of user activities, topic and trend generation, estimation of user attributes from their social content, and behavior detection. This text has applications to widely used platforms such as Twitter and Facebook and appeals to students, researchers, and professionals in the field….(More)”

Standard Business Reporting: Open Data to Cut Compliance Costs


Report by the Data Foundation: “Imagine if U.S. companies’ compliance costs could be reduced by billions of dollars. Imagine if this could happen without sacrificing any transparency to investors and governments. Open data can make that possible.

This first-ever research report, co-published by the Data Foundation and PwC, explains how Standard Business Reporting (SBR), in which multiple regulatory agencies adopt a common open data structure for the information they collect, reduces costs for both companies and agencies.

SBR programs are in place in the Netherlands, Australia, and elsewhere – but the concept is unknown in the United States. Our report is intended to introduce SBR to U.S. policymakers and lay the groundwork for future change….(More)”.

Congress Takes Blockchain 101


Mike Orcutt at MIT Technology Review: “Congressman David Schweikert is determined to enlighten his colleagues in Washington about the blockchain. The opportunities the technology creates for society are vast, he says, and right now education is key to keeping the government from “screwing it up.”

Schweikert, a Republican from Arizona, co-chairs the recently launched Congressional Blockchain Caucus. He and fellow co-chair, Democratic Representative Jared Polis of Colorado, say they created it in response to increasing interest and curiosity on Capitol Hill about blockchain technology. “Members of Congress are starting to get visits from people that are doing things with the blockchain and talking about it,” says Polis. “They are interested in learning more, and we hope to provide the forum to do that.”

Blockchain technology is difficult to explain, and misconceptions among policymakers are almost inevitable. One important concept Schweikert says more people need to understand is that a blockchain is not necessarily Bitcoin, and there are plenty of applications of blockchains beyond transferring digital currency. Digital currencies, and especially Bitcoin, the most popular blockchain by far, make some policymakers and government officials wary. But focusing on currency keeps people from seeing the potential the blockchain has to reinvent how we control and manage valuable information, Schweikert argues.

A blockchain is a decentralized, online record-keeping system, or ledger, maintained by a network of computers that verify and record transactions using established cryptographic techniques. Bitcoin’s system, which is open-source, depends on people all around the world called miners. They use specialized computers to verify and record transactions, and receive Bitcoin as a reward. Several other digital currencies work in a similar fashion.
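
To make the ledger idea more concrete, here is a minimal, illustrative Python sketch of a hash-linked chain of records. It is a toy under simplifying assumptions (no mining, no consensus, no networking) and is not Bitcoin's actual protocol; it only shows how having each block commit to the hash of the previous one makes past records tamper-evident.

```python
import hashlib
import json
import time

def hash_block(block):
    """Deterministically hash a block's contents with SHA-256."""
    return hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()

def make_block(transactions, prev_hash):
    """Create a block that commits to its transactions and to the previous block's hash."""
    return {"timestamp": time.time(),
            "transactions": transactions,
            "prev_hash": prev_hash}

# Build a tiny chain: each block stores the hash of its predecessor,
# so altering any earlier block breaks every later link.
chain = [make_block([], prev_hash="0" * 64)]  # genesis block
chain.append(make_block([{"from": "alice", "to": "bob", "amount": 5}],
                        prev_hash=hash_block(chain[-1])))
chain.append(make_block([{"from": "bob", "to": "carol", "amount": 2}],
                        prev_hash=hash_block(chain[-1])))

def chain_is_valid(chain):
    """Check that every block's prev_hash matches the actual hash of the block before it."""
    return all(chain[i]["prev_hash"] == hash_block(chain[i - 1])
               for i in range(1, len(chain)))

print(chain_is_valid(chain))                  # True
chain[1]["transactions"][0]["amount"] = 500   # tamper with recorded history
print(chain_is_valid(chain))                  # False: the tampering is detectable
```

In a real blockchain, many independent computers hold copies of this chain and agree on which new blocks to append, which is what makes the record decentralized rather than controlled by a single database owner.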

Digital currency is not the main reason so many institutions have begun experimenting with blockchains in recent years, though. Blockchains can also be used to securely and permanently store other information besides currency transaction records. For instance, banks and other financial companies see this as a way to manage information vital to the transfer of ownership of financial assets more efficiently than they do now. Some experiments have involved the Bitcoin blockchain, some use the newer blockchain software platform called Ethereum, and others have used private or semi-private blockchains.

The government should adopt blockchain technology too, say the Congressmen. A decentralized ledger is better than a conventional database “whenever we need better consumer control of information and security,” as in health records, tax returns, voting records, and identity management, says Polis. Several federal agencies and state governments are already experimenting with blockchain applications. The Department of Homeland Security, for example, is running a test to track data from its border surveillance devices in a distributed ledger….

Services for transferring money fall under the jurisdiction of several federal regulators, and face a patchwork of state licensing laws. New blockchain-based business models are challenging traditional notions of money transmission, she says, and many companies are unsure where they fit in the complicated legal landscape.

Boring has argued that financial technology companies would benefit from a regulatory safe zone, or “sandbox”—like those that are already in place in the U.K. and Singapore—where they could test products without the risk of “inadvertent regulatory violations.” We don’t need any new legislation from Congress yet, though—that could stifle innovation even more, she says. “What Congress should be doing is educating themselves on the issues.”…(More)”

Dark Web


Kristin Finklea for the Congressional Research Service: “The layers of the Internet go far beyond the surface content that many can easily access in their daily searches. The other content is that of the Deep Web, content that has not been indexed by traditional search engines such as Google. The furthest corners of the Deep Web, segments known as the Dark Web, contain content that has been intentionally concealed. The Dark Web may be used for legitimate purposes as well as to conceal criminal or otherwise malicious activities. It is the exploitation of the Dark Web for illegal practices that has garnered the interest of officials and policymakers.

Individuals can access the Dark Web by using special software such as Tor (short for The Onion Router). Tor relies upon a network of volunteer computers to route users’ web traffic through a series of other users’ computers such that the traffic cannot be traced to the original user. Some developers have created tools—such as Tor2web—that may allow individuals to access Tor-hosted content without downloading and installing the Tor software, though accessing the Dark Web through these means does not anonymize activity. Once on the Dark Web, users often navigate it through directories such as the “Hidden Wiki,” which organizes sites by category, similar to Wikipedia. Individuals can also search the Dark Web with search engines, which may be broad, searching across the Deep Web, or more specific, searching for contraband like illicit drugs, guns, or counterfeit money.
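
As a rough illustration of the layered encryption behind onion routing, the Python sketch below wraps a message in one encryption layer per relay, so that each relay can peel off only its own layer and learns only the next hop. It is a simplified model built on shared symmetric keys from the third-party cryptography package; real Tor uses circuit building, per-hop key negotiation, and directory authorities, none of which is shown here.

```python
# Illustrative layered ("onion") encryption with three relays.
# Requires the third-party 'cryptography' package (pip install cryptography).
import base64
import json
from cryptography.fernet import Fernet

# Each relay holds its own symmetric key; real Tor negotiates per-circuit
# keys with public-key cryptography instead of sharing them like this.
route = ["entry", "middle", "exit"]
relay_keys = {name: Fernet.generate_key() for name in route}

def build_onion(message, route):
    """Wrap the message in one encryption layer per relay, outermost layer for the first hop."""
    onion = message
    for i in reversed(range(len(route))):
        next_hop = route[i + 1] if i + 1 < len(route) else "destination"
        layer = json.dumps({"next_hop": next_hop,
                            "payload": base64.b64encode(onion).decode()}).encode()
        onion = Fernet(relay_keys[route[i]]).encrypt(layer)
    return onion

def relay_peel(name, onion):
    """A relay strips exactly one layer: it learns the next hop but not the final contents."""
    layer = json.loads(Fernet(relay_keys[name]).decrypt(onion))
    return layer["next_hop"], base64.b64decode(layer["payload"])

onion = build_onion(b"request for a hidden service", route)
hop, onion = relay_peel("entry", onion)    # entry relay only learns "middle"
hop, onion = relay_peel("middle", onion)   # middle relay only learns "exit"
hop, payload = relay_peel("exit", onion)   # exit relay forwards to the destination
print(hop, payload)                        # destination b'request for a hidden service'
```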

While on the Dark Web, individuals may communicate through means such as secure email, web chats, or personal messaging hosted on Tor. Though tools such as Tor aim to anonymize content and activity, researchers and security experts are constantly developing means by which certain hidden services or individuals could be identified or “deanonymized.” Anonymizing services such as Tor have been used for legal and illegal activities ranging from maintaining privacy to selling illegal goods—mainly purchased with Bitcoin or other digital currencies. They may be used to circumvent censorship, access blocked content, or maintain the privacy of sensitive communications or business plans. However, a range of malicious actors, from criminals to terrorists to state-sponsored spies, can also leverage cyberspace, and the Dark Web can serve as a forum for conversation, coordination, and action. It is unclear how much of the Dark Web is dedicated to serving a particular illicit market at any one time, and, because of the anonymity of services such as Tor, it is even less clear how much traffic is actually flowing to any given site.

Just as criminals can rely upon the anonymity of the Dark Web, so too can the law enforcement, military, and intelligence communities. They may, for example, use it to conduct online surveillance and sting operations and to maintain anonymous tip lines. Anonymity in the Dark Web can be used to shield officials from identification and hacking by adversaries. It can also be used to conduct a clandestine or covert computer network operation, such as taking down a website or launching a denial-of-service attack, or to intercept communications. Reportedly, officials are continuously working on expanding techniques to deanonymize activity on the Dark Web and identify malicious actors online….(More)”

Bit By Bit: Social Research in the Digital Age


Open Review of Book by Matthew J. Salganik: “In the summer of 2009, mobile phones were ringing all across Rwanda. In addition to the millions of calls between family, friends, and business associates, about 1,000 Rwandans received a call from Joshua Blumenstock and his colleagues. The researchers were studying wealth and poverty by conducting a survey of people who had been randomly sampled from a database of 1.5 million customers from Rwanda’s largest mobile phone provider. Blumenstock and colleagues asked the participants if they wanted to participate in a survey, explained the nature of the research to them, and then asked a series of questions about their demographic, social, and economic characteristics.

Everything I have said up until now makes this sound like a traditional social science survey. But, what comes next is not traditional, at least not yet. They used the survey data to train a machine learning model to predict someone’s wealth from their call data, and then they used this model to estimate the wealth of all 1.5 million customers. Next, they estimated the place of residence of all 1.5 million customers by using the geographic information embedded in the call logs. Putting these two estimates together—the estimated wealth and the estimated place of residence—Blumenstock and colleagues were able to produce high-resolution estimates of the geographic distribution of wealth across Rwanda. In particular, they could produce an estimated wealth for each of Rwanda’s 2,148 cells, the smallest administrative unit in the country.
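
The two-step design described above (train a model on the surveyed subsample, then score the entire subscriber base) can be sketched roughly as follows. The file names, feature names, and choice of model are hypothetical stand-ins, not the study's actual variables or code; the point is only the shape of the workflow.

```python
# Rough sketch of the survey-to-population workflow (hypothetical files,
# features, and model; not the study's actual data or code).
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# One row per subscriber: features derived from call records, e.g. call volume,
# number of contacts, nighttime activity, top-up amounts (names are invented).
calls = pd.read_csv("call_features.csv", index_col="subscriber_id")
# Wealth index for the ~1,000 surveyed subscribers, keyed by the same ID.
survey = pd.read_csv("survey_wealth.csv", index_col="subscriber_id")

# Step 1: train and validate a model on the surveyed subsample.
train = calls.join(survey, how="inner")
X, y = train.drop(columns="wealth_index"), train["wealth_index"]
model = RandomForestRegressor(n_estimators=500, random_state=0)
print("cross-validated R^2:", cross_val_score(model, X, y, cv=5).mean())
model.fit(X, y)

# Step 2: score the full subscriber base (all 1.5 million customers).
calls["predicted_wealth"] = model.predict(calls[X.columns])

# Step 3: combine with each subscriber's estimated home location (inferred
# separately from tower locations in the call logs) to map wealth by area.
homes = pd.read_csv("estimated_home_cell.csv", index_col="subscriber_id")
wealth_by_cell = (calls.join(homes)
                       .groupby("home_cell")["predicted_wealth"].mean())
print(wealth_by_cell.head())
```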

It was impossible to validate these estimates because no one had ever produced estimates for such small geographic areas in Rwanda. But, when Blumenstock and colleagues aggregated their estimates to Rwanda’s 30 districts, they found that their estimates were similar to estimates from the Demographic and Health Survey, the gold standard of surveys in developing countries. Although these two approaches produced similar estimates in this case, the approach of Blumenstock and colleagues was about 10 times faster and 50 times cheaper than the traditional Demographic and Health Surveys. These dramatically faster and lower cost estimates create new possibilities for researchers, governments, and companies (Blumenstock, Cadamuro, and On 2015).

In addition to developing a new methodology, this study is kind of like a Rorschach inkblot test; what people see depends on their background. Many social scientists see a new measurement tool that can be used to test theories about economic development. Many data scientists see a cool new machine learning problem. Many business people see a powerful approach for unlocking value in the digital trace data that they have already collected. Many privacy advocates see a scary reminder that we live in a time of mass surveillance. Many policy makers see a way that new technology can help create a better world. In fact, this study is all of those things, and that is why it is a window into the future of social research….(More)”

Big and open data are prompting a reform of scientific governance


Sabina Leonelli in Times Higher Education: “Big data are widely seen as a game-changer in scientific research, promising new and efficient ways to produce knowledge. And yet, large and diverse data collections are nothing new – they have long existed in fields such as meteorology, astronomy and natural history.

What, then, is all the fuss about? In my recent book, I argue that the true revolution is in the status accorded to data as research outputs in their own right. Along with this has come an emphasis on open data as crucial to excellent and reliable science.

Previously – ever since scientific journals emerged in the 17th century – data were private tools, owned by the scientists who produced them and scrutinised by a small circle of experts. Their usefulness lay in their function as evidence for a given hypothesis. This perception has shifted dramatically in the past decade. Increasingly, data are research components that can and should be made publicly available and usable.

Rather than the birth of a data-driven epistemology, we are thus witnessing the rise of a data-centric approach in which efforts to mobilise, integrate and visualise data become contributions to discovery, not just a by-product of hypothesis testing.

The rise of data-centrism highlights the challenges involved in gathering, classifying and interpreting data, and the concepts, technologies and social structures that surround these processes. This has implications for how research is conducted, organised, governed and assessed.

Data-centric science requires shifts in the rewards and incentives provided to those who produce, curate and analyse data. This challenges established hierarchies: laboratory technicians, librarians and database managers turn out to have crucial skills, subverting the common view of their jobs as marginal to knowledge production. Ideas of research excellence are also being challenged. Data management is increasingly recognised as crucial to the sustainability and impact of research, and national funders are moving away from citation counts and impact factors in evaluations.

New uses of data are forcing publishers to re-assess their business models and dissemination procedures, and research institutions are struggling to adapt their management and administration.

Data-centric science is emerging in concert with calls for increased openness in research….(More)”

A solution to the single-question crowd wisdom problem


Dražen Prelec,H. Sebastian Seung & John McCoy in Nature: “Once considered provocative, the notion that the wisdom of the crowd is superior to any individual has become itself a piece of crowd wisdom, leading to speculation that online voting may soon put credentialed experts out of business. Recent applications include political and economic forecasting, evaluating nuclear safety, public policy, the quality of chemical probes, and possible responses to a restless volcano. Algorithms for extracting wisdom from the crowd are typically based on a democratic voting procedure. They are simple to apply and preserve the independence of personal judgment. However, democratic methods have serious limitations. They are biased for shallow, lowest common denominator information, at the expense of novel or specialized knowledge that is not widely shared. Adjustments based on measuring confidence do not solve this problem reliably. Here we propose the following alternative to a democratic vote: select the answer that is more popular than people predict. We show that this principle yields the best answer under reasonable assumptions about voter behaviour, while the standard ‘most popular’ or ‘most confident’ principles fail under exactly those same assumptions. Like traditional voting, the principle accepts unique problems, such as panel decisions about scientific or artistic merit, and legal or historical disputes. The potential application domain is thus broader than that covered by machine learning and psychometric methods, which require data across multiple questions…(More).
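
A minimal sketch of the selection rule described in the abstract: for each possible answer, compare the share of respondents who actually chose it with the average share that respondents predicted would choose it, and select the answer whose actual popularity most exceeds its predicted popularity. The example data below are invented for illustration; the paper itself supplies the formal argument for why this rule recovers the correct answer.

```python
from collections import Counter

def surprisingly_popular(votes, predictions):
    """Return the answer whose actual vote share most exceeds its mean predicted share.

    votes:       each respondent's own answer, e.g. ["yes", "no", ...]
    predictions: each respondent's predicted distribution of the crowd's answers,
                 e.g. [{"yes": 0.7, "no": 0.3}, ...]
    """
    actual = {a: c / len(votes) for a, c in Counter(votes).items()}
    options = set(actual) | {a for p in predictions for a in p}
    predicted = {a: sum(p.get(a, 0.0) for p in predictions) / len(predictions)
                 for a in options}
    return max(options, key=lambda a: actual.get(a, 0.0) - predicted[a])

# Toy numbers for the question "Is Philadelphia the capital of Pennsylvania?"
# (it is not). Most respondents vote "yes", but the informed minority votes "no"
# while correctly predicting that most others will say "yes" -- so "no" turns out
# to be more popular than predicted, and the rule recovers the right answer.
votes = ["yes"] * 65 + ["no"] * 35
predictions = ([{"yes": 0.80, "no": 0.20}] * 65 +   # "yes" voters expect agreement
               [{"yes": 0.75, "no": 0.25}] * 35)    # "no" voters expect to be outvoted
print(surprisingly_popular(votes, predictions))     # -> "no"
```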

Using Algorithms To Predict Gentrification


Tanvi Misra in CityLab: “I know it when I see it,” is as true for gentrification as it is for pornography. Usually, it’s when a neighborhood’s property values and demographics are already changing that the worries about displacement set in—rousing housing advocates and community organizers to action. But by that time, it’s often hard to pause and put in safeguards for the neighborhood’s most vulnerable residents.

But what if there were an early warning system that could detect where price appreciation or decline is about to occur? Predictive tools like this have been developed around the country, most notably by researchers in San Francisco. And their value is clear: they help city leaders and non-profits pinpoint ahead of time where to preserve existing affordable housing, where to build more, and where to attract business investment. But they’re often too academic or too obscure, which is why it’s not yet clear how they’re being used by policymakers and planners.

That’s the problem Ken Steif, at the University of Pennsylvania, is working to solve, in partnership with Alan Mallach, from the Center for Community Progress.

Mallach’s non-profit focuses on revitalizing distressed neighborhoods, particularly in “legacy cities.” These are towns like St. Louis, Flint, Dayton, and Baltimore that have experienced population loss and economic contraction in recent years and suffer from property vacancies, blight, and unemployment. Mallach is interested in understanding which neighborhoods are likely to continue down that path, and which ones will do a 180-degree turn. Right now, he can make those predictions intuitively, based on his observations of neighborhood characteristics like housing stock, median income, and race. But an objective assessment can help confirm or refute his hypotheses.

That’s where Steif comes in. Having consulted with cities and non-profits on place-based data analytics, Steif has developed a number of algorithms that predict the movement of housing markets using expensive private data from entities like Zillow. Mallach suggested he try his algorithms on Census data, which is free and standardized.

The phenomenon he tested was ‘endogenous gentrification’—the idea that an increase in home prices spreads from wealthy neighborhoods to less expensive ones nearby, like a wave… Steif used Census data from 1990 and 2000 to predict housing price change in 2010 in 29 big and small legacy cities. His algorithms took into account the relationship between the median home price of a census tract and those of the tracts around it, the proximity of census tracts to high-cost areas, and the spatial patterns in home price distribution. They also folded in variables like race, income, and housing supply, among others.
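
A rough sketch of what such a tract-level model might look like is below. It assumes census extracts with lagged home values and demographic controls plus a tract adjacency list; the column names, neighbor weighting, and choice of regressor are illustrative assumptions, not Steif's actual algorithm or data.

```python
# Illustrative tract-level price-change model with a simple spatial lag
# (hypothetical column names and weighting; not the original algorithm).
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

tracts = pd.read_csv("tracts_1990_2000.csv")    # one row per census tract
neighbors = pd.read_csv("tract_adjacency.csv")  # pairs: tract_id, neighbor_id

# Spatial lag: the mean prior-period home value of each tract's adjacent tracts.
lag = (neighbors
       .merge(tracts[["tract_id", "median_value_2000"]],
              left_on="neighbor_id", right_on="tract_id", suffixes=("", "_nbr"))
       .groupby("tract_id")["median_value_2000"].mean()
       .rename("neighbor_value_2000")
       .reset_index())
tracts = tracts.merge(lag, on="tract_id")

# Own lagged values, the neighborhood "wave", distance to high-cost areas,
# and demographic controls predict the 2000-2010 change in home values.
features = ["median_value_1990", "median_value_2000", "neighbor_value_2000",
            "dist_to_high_cost_tract", "pct_nonwhite_2000",
            "median_income_2000", "housing_units_2000"]
target = "pct_value_change_2000_2010"

model = GradientBoostingRegressor(random_state=0)
model.fit(tracts[features], tracts[target])
print("in-sample R^2:", model.score(tracts[features], tracts[target]))

# To project to 2020, the same features would be rebuilt from the 2000 and
# 2010 census releases and fed to the fitted model, as the article describes.
```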

After cross-checking the 2010 prediction with actual home prices, he projected the neighborhood change all the way to 2020. His algorithms were able to compute the speed and breadth of the wave of gentrification over time reasonably well, overall…(More)”.