Making Credit Ratings Data Publicly Available


Paper by Marc D. Joffe and Frank Partnoy: “In the aftermath of the 2007-08 global financial crisis, regulators and policy makers recognized the importance of making bond ratings publicly available. Although rating agencies have made some data available, obtaining this information in bulk can be difficult or impossible. At times, the data is costly; at other times, it is simply unavailable. Some rating agencies have provided data only on a subscription basis for tens or even hundreds of thousands of dollars annually.

The cost and lack of availability of ratings data are particularly striking given the regulatory requirement that rating agencies publish such data. We describe the relevant Securities and Exchange Commission publication rules and requirements. Unfortunately, the ways in which the major credit rating agencies have responded to these rules have not made data available in an easily accessible or comprehensive way and have instead hindered academic and think-tank research into credit ratings. Financial researchers who lack the funds required to purchase bulk ratings must use a variety of ad hoc methods to obtain ratings data or limit their studies of credit ratings.

This brief paper describes our recent initiative to make credit ratings data publicly available. We provide links to a software tool written in Python that crawls credit rating agency websites, downloads the XBRL files, and converts them to Comma Separated Value (CSV) format. We also provide a link to the most recently processed ratings data, separated by agency and asset category, as well as the entire universe of ratings actions, including more than eight million assignments, upgrades, downgrades, and withdrawals…(More)”.
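The conversion step is easy to picture. Below is a minimal Python sketch of an XBRL-to-CSV pass; it is an illustration only, not the authors' tool, and the element names (`RatingAction`, `instrumentId`, `dateOfAction`, `ratingSymbol`) are placeholders rather than the actual tags of the SEC's 17g-7 XBRL schema.

```python
# Minimal sketch: flatten an XBRL ratings-history file into CSV rows.
# Element names below are placeholders, not the real 17g-7 schema tags.
import csv
import xml.etree.ElementTree as ET

def xbrl_to_csv(xbrl_path, csv_path):
    root = ET.parse(xbrl_path).getroot()
    with open(csv_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["instrument_id", "action_date", "rating"])
        for action in root.iter("RatingAction"):  # placeholder tag
            writer.writerow([
                action.findtext("instrumentId", default=""),
                action.findtext("dateOfAction", default=""),
                action.findtext("ratingSymbol", default=""),
            ])

xbrl_to_csv("agency_ratings.xbrl", "agency_ratings.csv")
```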

Upholding Democracy Amid the Challenges of New Technology


Paper by Eyal Benvenisti in The European Journal of International Law: “The law on global governance that emerged after the Second World War was grounded in irrefutable trust in international organizations and an assumption that their subjection to legal discipline and judicial review would be unnecessary and, in fact, detrimental to their success. The law that evolved systematically insulated international organizations from internal and external scrutiny and absolved them of any inherent legal obligations – and, to a degree, continues to do so.

Indeed, it was only well after the end of the Cold War that mistrust in global governance began to trickle through into the legal discourse and the realization gradually took hold that the operation of international organizations needed to be subject to the disciplining power of the law. Since the mid-1990s, scholars have sought to identify the conditions under which trust in global bodies can be regained, mainly by borrowing and adapting domestic public law precepts that emphasize accountability through communications with those affected.

Today, although a ‘culture of accountability’ may have taken root, its legal tools are still shaping up and are often contested. More importantly, these communicative tools are ill-equipped to address the new modalities of governance that are based on decision-making by machines using raw data (rather than two-way exchange with stakeholders) as their input.

The new information and communication technologies challenge the foundational premise of the accountability school – that ‘the more communication, the better’ – as voters-turned-users obtain their information from increasingly fragmented and privatized marketplaces of ideas that are manipulated for economic and political gain.

In this article, I describe and analyse how the law has evolved to acknowledge the need for accountability, how it has designed norms for this purpose and continues in this endeavour – yet how the challenges it faces today are leaving its most fundamental assumptions open to question. I argue that, given the growing influence of public and private global governance bodies on our daily lives and the shape of our political communities, the task of the law of global governance is no longer limited to ensuring the accountability of global bodies, but is also to protect human dignity and the very viability of the democratic state….(More)”.

Improving refugee integration through data-driven algorithmic assignment


Kirk Bansak et al. in Science: “Developed democracies are settling an increased number of refugees, many of whom face challenges integrating into host societies. We developed a flexible data-driven algorithm that assigns refugees across resettlement locations to improve integration outcomes. The algorithm uses a combination of supervised machine learning and optimal matching to discover and leverage synergies between refugee characteristics and resettlement sites.

The algorithm was tested on historical registry data from two countries with different assignment regimes and refugee populations, the United States and Switzerland. Our approach led to gains of roughly 40 to 70%, on average, in refugees’ employment outcomes relative to current assignment practices. This approach can provide governments with a practical and cost-efficient policy tool that can be immediately implemented within existing institutional structures….(More)”.
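The predict-then-match idea is compact enough to sketch. In the illustration below (a stylized sketch, not the authors' implementation), a supervised model would score each refugee-location pair with a predicted employment probability (random placeholders here), and an assignment solver then allocates refugees to capacity-limited location slots to maximize the total predicted outcome.

```python
# Stylized predict-then-match sketch (not the Bansak et al. implementation).
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)
locations = {"A": 2, "B": 2, "C": 2}   # illustrative capacities
n_refugees = 6

# In practice these scores come from a model trained on historical registry
# data; random placeholders stand in for predicted employment probabilities.
pred_employment = rng.uniform(size=(n_refugees, len(locations)))

# Expand each location into one column per capacity slot, then solve the
# assignment problem; negating turns maximization into minimization.
slot_scores, slot_loc = [], []
for j, (loc, cap) in enumerate(locations.items()):
    for _ in range(cap):
        slot_scores.append(pred_employment[:, j])
        slot_loc.append(loc)
rows, cols = linear_sum_assignment(-np.column_stack(slot_scores))
for r, c in zip(rows, cols):
    print(f"refugee {r} -> location {slot_loc[c]}")
```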

StatCan now crowdsourcing cannabis data


Kyle Duggan at iPolitics: “The national statistics agency is launching a crowdsourcing project to find out how much weed Canadians are consuming and how much it costs them.

Statistics Canada is searching for the best picture of consumption it can find ahead of legalization, and is turning to average Canadians to improve its rough estimates about a product that’s largely been accessed illegally by the population.

On Thursday, it released a suite of “experimental” data that make up its current best guesses on Canadian consumption habits, along with a crowdsourcing website and app to improve its own estimates – a project officials said is an experiment itself.

StatCan is also rolling out a quarterly cannabis survey this year.

The agency has been combing through historical research on legal and illegal cannabis prices, scraping price data from illegal vendors online and, for some data, is relying largely on the self-reporting website priceofweed.com to assemble as much pot information as possible, even if it’s not perfect data.

The agency has been quietly preparing for the July legalization deadline by compiling health, justice and economic datasets and scouring other sources to fill in the blanks where it can. Come July, legal cannabis will suddenly also need to be rolled into other important data products, like the GDP accounts….(More)”.

Urban Big Data: City Management and Real Estate Markets


Report by Richard Barkham, Sheharyar Bokhari and Albert Saiz: “In this report, we discuss recent trends in the application of urban big data and their impact on real estate markets. We expect such technologies to improve quality of life and the productivity of cities over the long run.

We forecast that smart city technologies will reinforce the primacy of the most successful global metropolises at least for a decade or more. A few select metropolises in emerging countries may also leverage these technologies to leapfrog on the provision of local public services.

In the long run, all cities throughout the urban system will end up adopting successful and cost-effective smart city initiatives. Nevertheless, smaller-scale interventions are likely to crop up everywhere, even in the short run. Such targeted programs are more likely to improve conditions in blighted or relatively deprived neighborhoods, which could generate gentrification and higher valuations there. It is unclear whether urban information systems will have a centralizing or suburbanizing impact. They are likely to make denser urban centers more attractive, but they are also bound to make suburban or exurban locations more accessible…(More)”.

They Are Watching You—and Everything Else on the Planet


Cover article by Robert Draper for Special Issue of the National Geographic: “Technology and our increasing demand for security have put us all under surveillance. Is privacy becoming just a memory?…

In 1949, amid the specter of European authoritarianism, the British novelist George Orwell published his dystopian masterpiece 1984, with its grim admonition: “Big Brother is watching you.” As unsettling as this notion may have been, “watching” was a quaintly circumscribed undertaking back then. That very year, 1949, an American company released the first commercially available CCTV system. Two years later, in 1951, Kodak introduced its Brownie portable movie camera to an awestruck public.

Today more than 2.5 trillion images are shared or stored on the Internet annually—to say nothing of the billions more photographs and videos people keep to themselves. By 2020, one telecommunications company estimates, 6.1 billion people will have phones with picture-taking capabilities. Meanwhile, in a single year an estimated 106 million new surveillance cameras are sold. More than three million ATMs around the planet stare back at their customers. Tens of thousands of cameras known as automatic number plate recognition devices, or ANPRs, hover over roadways—to catch speeding motorists or parking violators but also, in the case of the United Kingdom, to track the comings and goings of suspected criminals. The untallied but growing number of people wearing body cameras now includes not just police but also hospital workers and others who aren’t law enforcement officers. Proliferating as well are personal monitoring devices—dash cams, cyclist helmet cameras to record collisions, doorbells equipped with lenses to catch package thieves—that are fast becoming a part of many a city dweller’s everyday arsenal. Even less quantifiable, but far more vexing, are the billions of images of unsuspecting citizens captured by facial-recognition technology and stored in law enforcement and private-sector databases over which our control is practically nonexistent.

Those are merely the “watching” devices that we’re capable of seeing. Presently the skies are cluttered with drones—2.5 million of which were purchased in 2016 by American hobbyists and businesses. That figure doesn’t include the fleet of unmanned aerial vehicles used by the U.S. government not only to bomb terrorists in Yemen but also to help stop illegal immigrants entering from Mexico, monitor hurricane flooding in Texas, and catch cattle thieves in North Dakota. Nor does it include the many thousands of airborne spying devices employed by other countries—among them Russia, China, Iran, and North Korea.

We’re being watched from the heavens as well. More than 1,700 satellites monitor our planet. From a distance of about 300 miles, some of them can discern a herd of buffalo or the stages of a forest fire. From outer space, a camera clicks and a detailed image of the block where we work can be acquired by a total stranger….

This is—to lift the title from another British futurist, Aldous Huxley—our brave new world. That we can see it coming is cold comfort since, as Carnegie Mellon University professor of information technology Alessandro Acquisti says, “in the cat-and-mouse game of privacy protection, the data subject is always the weaker side of the game.” Simply submitting to the game is a dispiriting proposition. But to actively seek to protect one’s privacy can be even more demoralizing. University of Texas American studies professor Randolph Lewis writes in his new book, Under Surveillance: Being Watched in Modern America, “Surveillance is often exhausting to those who really feel its undertow: it overwhelms with its constant badgering, its omnipresent mysteries, its endless tabulations of movements, purchases, potentialities.”

The desire for privacy, Acquisti says, “is a universal trait among humans, across cultures and across time. You find evidence of it in ancient Rome, ancient Greece, in the Bible, in the Quran. What’s worrisome is that if all of us at an individual level suffer from the loss of privacy, society as a whole may realize its value only after we’ve lost it for good.”…(More)”.

Crowd monitoring through WiFi Data


Article on the European JRC Open Day Experiment by Ciro Gioia, Dario Tarchi, Michele Vespe and Francesco Sermi: “The research demonstrated the feasibility of crowd monitoring through WiFi data. A methodology was developed and tested using real data, collected during the JRC Open Day 2016 by 20 WiFi access points deployed on the Ispra site. The methodology includes a cleaning procedure to identify actual users and a user localization technique based on a modified WeC approach. The estimated number of attendees was compared with the security service's statistics, showing clear consistency; the possibility of validating the results against independent information represents a significant added value of this research. Finally, the proposed approach made it possible to reconstruct the distribution of people across the site at different points in time….(More)”.
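To make the counting step concrete, here is a highly simplified Python sketch over toy probe logs. It is an assumption-laden illustration, not the JRC pipeline: the "cleaning" rule (drop devices seen for under five minutes) is a crude stand-in for the paper's procedure, and the localization step via the modified WeC approach is omitted entirely.

```python
# Simplified crowd-counting sketch over toy WiFi probe logs.
from collections import defaultdict
from datetime import datetime, timedelta

# Toy records: (timestamp, device MAC, access point).
logs = [
    (datetime(2016, 5, 21, 10, 0), "aa:bb:01", "AP3"),
    (datetime(2016, 5, 21, 10, 12), "aa:bb:01", "AP3"),
    (datetime(2016, 5, 21, 10, 3), "aa:bb:02", "AP7"),
]

# Cleaning (crude stand-in for the paper's procedure): keep only devices
# observed for at least 5 minutes, dropping passers-by and spurious probes.
first_seen, last_seen = {}, {}
for ts, mac, ap in logs:
    first_seen.setdefault(mac, ts)
    last_seen[mac] = ts
actual_users = {m for m in first_seen
                if last_seen[m] - first_seen[m] >= timedelta(minutes=5)}

# Count distinct retained devices per access point in 15-minute windows.
counts = defaultdict(set)
for ts, mac, ap in logs:
    if mac in actual_users:
        slot = ts.replace(minute=ts.minute // 15 * 15, second=0, microsecond=0)
        counts[(slot, ap)].add(mac)
for (slot, ap), macs in sorted(counts.items()):
    print(slot, ap, len(macs))
```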

Extracting crowd intelligence from pervasive and social big data


Introduction by Leye Wang, Vincent Gauthier, Guanling Chen and Luis Moreira-Matias of Special Issue of the Journal of Ambient Intelligence and Humanized Computing: “With the prevalence of ubiquitous computing devices (smartphones, wearable devices, etc.) and social network services (Facebook, Twitter, etc.), humans are generating massive digital traces continuously in their daily life. Considering the invaluable crowd intelligence residing in these pervasive and social big data, a spectrum of opportunities is emerging to enable promising smart applications for easing individual life, increasing company profits, and facilitating city development. However, the nature of big data also poses fundamental challenges to the techniques and applications relying on pervasive and social big data, from multiple perspectives such as algorithm effectiveness, computation speed, energy efficiency, user privacy, server security, data heterogeneity and system scalability. This special issue presents the state-of-the-art research achievements in addressing these challenges. After the rigorous review process of reviewers and guest editors, eight papers were accepted as follows.

The first paper “Automated recognition of hypertension through overnight continuous HRV monitoring” by Ni et al. proposes a non-invasive way to differentiate hypertension patients from healthy people using pervasive sensors such as a waist belt. To this end, the authors train a machine learning model on heart rate data sensed from waist belts worn by a crowd of people, and the experiments show that the detection accuracy is around 93%.
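The general shape of such a pipeline is easy to sketch. The snippet below trains a generic classifier on synthetic stand-ins for per-night HRV features (e.g., SDNN, RMSSD, LF/HF ratio) and reports cross-validated accuracy; it illustrates the approach only, not the model from Ni et al.

```python
# Generic HRV-style classification sketch with synthetic data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))   # stand-ins for SDNN, RMSSD, LF/HF per night
y = (X[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)  # 1 = hypertensive

clf = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```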

The second paper “The workforce analyzer: group discovery among LinkedIn public profiles” by Dai et al. describes two methods for discovering user groups among LinkedIn public profiles, one based on K-means and the other on SVM. The authors contrast the results of both methods and provide insights into the trending professional orientations of the workforce from an online perspective.

The third paper “Tweet and followee personalized recommendations based on knowledge graphs” by Pla Karidi et al. presents an efficient semantic recommendation method that helps users filter the Twitter stream for interesting content. The foundation of this method is a knowledge graph that can represent all user topics of interest as a variety of concepts, objects, events, persons, entities, locations and the relations between them. An important advantage of the authors’ method is that it reduces the effects of problems such as over-recommendation and over-specialization.

The fourth paper “CrowdTravel: scenic spot profiling by using heterogeneous crowdsourced data” by Guo et al. proposes CrowdTravel, a multi-source social media data fusion approach for multi-aspect tourism information perception, which can provide travelling assistance for tourists through crowd intelligence mining. Experiments over a dataset of several popular scenic spots in Beijing and Xi’an, China, indicate that the authors’ approach attains fine-grained characterization of the scenic spots and delivers excellent performance.

The fifth paper “Internet of Things based activity surveillance of defence personnel” by Bhatia et al. presents a comprehensive IoT-based framework for analyzing the integrity of defence personnel in light of their daily activities. Specifically, an Integrity Index Value is defined for each member of the defence personnel based on their social engagements and on activities that could create vulnerabilities to national security. In addition, a probabilistic decision-tree-based automated decision-making procedure is presented to aid defence officials in analyzing a given individual's activities for integrity assessment.

The sixth paper “Recommending property with short days-on-market for estate agency” by Mou et al. proposes an appraisal framework that automatically recommends estates with short days-on-market, using transaction data and profile information crawled from websites. Both the spatial and temporal characteristics of an estate are integrated into the framework. The results show that the proposed framework can accurately identify about 78% of such estates.

The seventh paper “An anonymous data reporting strategy with ensuring incentives for mobile crowd-sensing” by Li et al. proposes a system and strategy that ensure anonymous data reporting while preserving incentives. The proposed protocol is arranged in five stages that mainly leverage three concepts: (1) slot reservation based on shuffle, (2) data submission based on bulk transfer and multi-player DC-nets, and (3) an incentive mechanism based on blind signatures.

The last paper “Semantic place prediction from crowd-sensed mobile phone data” by Celik et al. semantically classifies places visited by smartphone users, applying machine learning algorithms to data collected from the sensors and wireless interfaces available on the phones, together with phone usage patterns (such as battery level) and time-related information. For this study, the authors collected data from 15 participants at Galatasaray University for 1 month and evaluated different classification algorithms: decision tree, random forest, k-nearest neighbour, naive Bayes, and multi-layer perceptron….(More)”.
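A comparison of this kind is straightforward to reproduce in outline. The sketch below cross-validates the five algorithm families named above on synthetic stand-ins for the phone-sensor and usage features; it illustrates the evaluation pattern only, not the Celik et al. dataset or results.

```python
# Sketch: compare the five classifier families on synthetic place data.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 6))      # stand-ins: battery level, hour, WiFi count...
y = rng.integers(0, 4, size=300)   # place labels, e.g. home/work/school/other

models = {
    "decision tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(random_state=0),
    "k-nearest neighbour": KNeighborsClassifier(),
    "naive Bayes": GaussianNB(),
    "multi-layer perceptron": MLPClassifier(max_iter=500, random_state=0),
}
for name, model in models.items():
    print(f"{name}: {cross_val_score(model, X, y, cv=5).mean():.2f}")
```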

Improving journeys by opening data: The case of Transport for London (TfL)


Merlin Stone and Eleni Aravopoulou in The Bottom Line: “This case study describes how one of the world’s largest public transport operations, Transport for London (TfL), transformed the real-time availability of information for its customers and staff through the open data approach, and what the results of this transformation were. Its purpose is therefore to show what is required for an open data approach to work.

This case study is based mainly on interviews at TfL and data supplied by TfL directly to the researchers. It analyses as far as possible the reported facts of the case, in order to identify the processes required to open data and the benefits thereof.

The main finding is that achieving an open data approach in public transport is helped by having a clear commitment to the idea that the data belongs to the public and that third parties should be allowed to use and repurpose the information, by having a strong digital strategy, and by creating strong partnerships with data management organisations that can support the delivery of high volumes of information.

The case study shows how open data can be used to create commercial and non-commercial customer-facing products and services, which passengers and other road users use to gain a better travel experience, and that this approach can be valued in terms of financial/economic contribution to customers and organisations….(More)”.
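Part of what makes such products possible is that TfL exposes its data through a public Unified API that third-party developers query directly. As a minimal sketch (field names as documented in TfL's Unified API at the time of writing; an application key is recommended for sustained use), a developer could fetch live arrival predictions like this:

```python
# Fetch live arrival predictions for the Victoria line from TfL's open API.
import requests

resp = requests.get("https://api.tfl.gov.uk/Line/victoria/Arrivals", timeout=10)
resp.raise_for_status()
for arrival in sorted(resp.json(), key=lambda a: a["timeToStation"])[:5]:
    # timeToStation is reported in seconds.
    print(arrival["stationName"], arrival["timeToStation"] // 60, "min")
```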

The World’s Biggest Biometric Database Keeps Leaking People’s Data


Rohith Jyothish at FastCompany: “India’s national identity scheme holds the personal data of more than 1.13 billion citizens and residents within a unique ID system branded as Aadhaar, which means “foundation” in Hindi. But as more and more evidence reveals that the government is not keeping this information private, the actual foundation of the system appears shaky at best.

On January 4, 2018, The Tribune of India, a news outlet based out of Chandigarh, created a firestorm when it reported that people were selling access to Aadhaar data on WhatsApp, for alarmingly low prices….

The Aadhaar unique identification number ties together several pieces of a person’s demographic and biometric information, including their photograph, fingerprints, home address, and other personal information. This information is all stored in a centralized database, which is then made accessible to a long list of government agencies that can use that information in administering public services.

Although centralizing this information could increase efficiency, it also creates a highly vulnerable situation in which one simple breach could result in millions of India’s residents’ data becoming exposed.

The Annual Report 2015-16 of the Ministry of Electronics and Information Technology speaks of a facility called DBT Seeding Data Viewer (DSDV) that “permits the departments/agencies to view the demographic details of Aadhaar holder.”

According to @databaazi, DSDV logins allowed third parties to access Aadhaar data (without UID holder’s consent) from a white-listed IP address. This meant that anyone with the right IP address could access the system.

This design flaw puts personal details of millions of Aadhaar holders at risk of broad exposure, in clear violation of the Aadhaar Act…(More)”.