Dubai Data Releases Findings of ‘The Dubai Data Economic Impact Report’


Press Release: “the ‘Dubai Data Economic Impact Report’…provides the Dubai Government with insights into the potential economic impacts of opening and sharing data and includes a methodology for more rigorous measurement of the economic impacts of open and shared data, to allow regular assessment of the actual impacts in the future.

The study estimates that the opening and sharing of government and private-sector data could add a total of AED 10.4 billion in Gross Value Added (GVA) to Dubai’s economy annually by 2021. Opening government data alone would produce a GVA impact of AED 6.6 billion annually as of 2021, equivalent to approximately 0.8% to 1.2% of Dubai’s forecast GDP for that year. Transport, storage, and communications is set to be the largest contributor to this potential GVA from opening government data, accounting for 27.8% (AED 1.85 billion) of the total, followed by public administration (23.6%, or AED 1.57 billion); wholesale, retail, restaurants, and hotels (13.7%, or AED 908 million); real estate (9.6%, or AED 639 million); and professional services (8.9%, or AED 588 million). Finance and insurance, meanwhile, is calculated to make up 6.5% (AED 433 million) of the GVA, while mining, manufacturing, and utilities (6%, or AED 395 million); construction (3.5%, or AED 230 million); and entertainment and arts (0.4%, or AED 27 million) account for the remainder.

This economic impact will be realized through the publication, exchange, use and reuse of Dubai data. The Dubai Data Law of 2015 mandates that data providers publish open data and exchange shared data. It defines open data as any Dubai data that is published and can be downloaded, used and re-used without restriction by all types of users, while shared data is data classified as confidential, sensitive, or secret, which can only be accessed by other government entities or by other authorised persons. The law applies to local government entities, to federal government entities holding any data relating to the emirate, and to individuals and companies who produce, own, disseminate, or exchange any data relating to the emirate. It aims to realise Dubai’s vision of transforming itself into a smart city, to manage Dubai Data in accordance with a clear and specific methodology consistent with international best practices, to integrate the services provided by federal and local government entities, and to optimise the use of the data available to data providers, among other objectives….

The study identifies several stakeholders involved in the use and reuse of open and shared data. These stakeholders – some of whom qualify as “data creators” – play an important role in generating the economic impacts. They include: data enrichers, who combine open data with their own sources and/or knowledge; data enablers, who do not profit directly from the data itself but from the platforms and technologies through which it is provided; data developers, who design and build Application Programming Interfaces (APIs); and data aggregators, who collect and pool data, providing it to other stakeholders….(More)”

A Guide to Data Innovation for Development – From idea to proof-of-concept


Press Release: “UNDP and UN Global Pulse today released a comprehensive guide on how to integrate new sources of data into development and humanitarian work.

New and emerging data sources such as mobile phone data, social media, remote sensors and satellites have the potential to improve the work of governments and development organizations across the globe.

Entitled ‘A Guide to Data Innovation for Development – From idea to proof-of-concept,’ this publication was developed by practitioners for practitioners. It provides staff of UN agencies and international Non-Governmental Organizations with step-by-step guidance for working with new sources of data.

The guide is the result of a collaboration between UNDP and UN Global Pulse, with support from UN Volunteers. Led by UNDP innovation teams in Europe and Central Asia and the Arab States, six UNDP offices (in Armenia, Egypt, Kosovo[1], fYR Macedonia, Sudan and Tunisia) each completed data innovation projects applicable to development challenges on the ground.

The publication builds on these successful pilot projects and on the expertise of the data innovators from UNDP and UN Global Pulse who managed their design and development.

It provides practical guidance for jump-starting a data innovation project, from the design phase through the creation of a proof-of-concept.

The guide is structured into three sections – (I) Explore the Problem & System, (II) Assemble the Team and (III) Create the Workplan. Each section comprises a series of tools for completing the steps needed to initiate and design a data innovation project, to engage the right partners and to make sure that adequate privacy and protection mechanisms are applied.

…Download ‘A Guide to Data Innovation for Development – From idea to proof-of-concept’ here.”

How the Circle Line rogue train was caught with data


Daniel Sim at the Data.gov.sg Blog: “Singapore’s MRT Circle Line was hit by a spate of mysterious disruptions in recent months, causing much confusion and distress to thousands of commuters.

Like most of my colleagues, I take a train on the Circle Line to my office at one-north every morning. So on November 5, when my team was given the chance to investigate the cause, I volunteered without hesitation.

From prior investigations by train operator SMRT and the Land Transport Authority (LTA), we already knew that the incidents were caused by some form of signal interference, which led to loss of signals in some trains. The signal loss would trigger the emergency brake safety feature in those trains and cause them to stop randomly along the tracks.

But the incidents — which first happened in August — seemed to occur at random, making it difficult for the investigation team to pinpoint the exact cause.

We were given a dataset compiled by SMRT that contained the following information:

  • Date and time of each incident
  • Location of incident
  • ID of train involved
  • Direction of train…
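As a hedged illustration of working with such an incident log (the file and column names below are assumptions, not SMRT's actual schema), the four fields map naturally onto a dataframe:

    import pandas as pd

    # Hypothetical file and column names; the excerpt lists the fields but
    # not the actual schema of the SMRT dataset.
    incidents = pd.read_csv(
        "circle_line_incidents.csv",
        parse_dates=["datetime"],  # date and time of each incident
    )

    # Expected columns: datetime, station, train_id, direction
    print(incidents.head())
    print(incidents["train_id"].value_counts())  # which trains stopped most often?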

LTA and SMRT eventually published a joint press release on November 11 to share the findings with the public….

When we first started, my colleagues and I were hoping to find patterns that may be of interest to the cross-agency investigation team, which included many officers at LTA, SMRT and DSTA. The tidy incident logs provided by SMRT and LTA were instrumental in getting us off to a good start, as minimal cleaning up was required before we could import and analyse the data. We were also gratified by the effective follow-up investigations by LTA and DSTA that confirmed the hardware problems on PV46.

From the data science perspective, we were lucky that incidents happened so close to one another. That allowed us to identify both the problem and the culprit in such a short time. If the incidents were more isolated, the zigzag pattern would have been less apparent, and it would have taken us more time — and data — to solve the mystery….(More).”
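The zigzag pattern comes from plotting each incident's position along the line against time: if one rogue train is emitting the interference, the affected trains trace out that train's back-and-forth trajectory. A minimal sketch of such a Marey-style chart, reusing the assumed schema above:

    import pandas as pd
    import matplotlib.pyplot as plt

    incidents = pd.read_csv("circle_line_incidents.csv", parse_dates=["datetime"])

    # Convert station names to positions along the line. A real analysis would
    # use the true Circle Line station order; alphabetical order is a placeholder.
    station_order = {s: i for i, s in enumerate(sorted(incidents["station"].unique()))}
    incidents["position"] = incidents["station"].map(station_order)

    fig, ax = plt.subplots(figsize=(12, 4))
    ax.scatter(incidents["datetime"], incidents["position"], s=15)
    ax.set_xlabel("Time of incident")
    ax.set_ylabel("Position along line")
    ax.set_title("Circle Line emergency-brake incidents over time")
    plt.show()
    # Incidents lining up along a zigzag suggest a single moving source of
    # interference shuttling up and down the line.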

New Data Portal to analyze governance in Africa


New Institute Pushes the Boundaries of Big Data


Press Release: “Each year thousands of genomes are sequenced, millions of neuronal activity traces are recorded, and light from hundreds of millions of galaxies is captured by our newest telescopes, all creating datasets of staggering size. These complex datasets are then stored for analysis.

Ongoing analysis of these information streams has illuminated a problem, however: Scientists’ standard methodologies are inadequate to the task of analyzing massive quantities of data. The development of new methods and software to learn from data and to model — at sufficient resolution — the complex processes they reflect is now a pressing concern in the scientific community.

To address these challenges, the Simons Foundation has launched a substantial new internal research group called the Flatiron Institute (FI). The FI is the first multidisciplinary institute focused entirely on computation. It is also the first center of its kind to be wholly supported by private philanthropy, providing a permanent home for up to 250 scientists and collaborating expert programmers, all working together to create, deploy and support new state-of-the-art computational methods. Few existing institutions support this combination of scientists and programmers, instead leaving programming to relatively impermanent graduate students and postdoctoral fellows, and none has done so at the scale of the Flatiron Institute or with such a broad scope at a single location.

The institute will hold conferences and meetings and serve as a focal point for computational science around the world….(More)”.

Open Data Workspace for Analyzing Hate Crime Trends


Press Release: “The Anti-Defamation League (ADL) and data.world today announced the launch of a public, open data workspace to help understand and combat the rise of hate crimes. The new workspace offers instant access to ADL data alongside relevant data from the FBI and other authoritative sources, and provides citizens, journalists and lawmakers with tools to more effectively analyze, visualize and discuss hate crimes across the United States.

The new workspace was unveiled at ADL’s inaugural “Never Is Now” Summit on Anti-Semitism, a daylong event bringing together nearly 1,000 people in New York City to hear from an array of experts on developing innovative new ways to combat anti-Semitism and bigotry….

Hate Crime Reporting Gaps


The color scale depicts total reported hate crime incidents per 100,000 people in each state. States with darker shading have more reported incidents of hate crimes while states with lighter shading have fewer reported incidents. The green circles proportionally represent cities that either Did Not Report hate crime data or affirmatively reported 0 hate crimes for the year 2015. Note the lightly shaded states in which many cities either Do Not Report or affirmatively report 0 hate crimes….(More)”
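For readers who want to work with the workspace directly, here is a sketch using data.world's Python client; the dataset slug, table and column names are placeholders for illustration, not the actual workspace identifiers:

    import datadotworld as dw

    # Requires a data.world API token (configured via `dw configure`).
    # Placeholder dataset slug and schema for illustration only.
    QUERY = """
        SELECT state, SUM(incidents) AS total_incidents, MAX(population) AS population
        FROM hate_crimes_2015
        GROUP BY state
    """
    df = dw.query("adl/hate-crimes", QUERY).dataframe

    # Reported incidents per 100,000 residents, as in the map's color scale.
    df["per_100k"] = df["total_incidents"] / df["population"] * 100_000
    print(df.sort_values("per_100k", ascending=False).head(10))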

World leaders must invest in better data on children


Press Release: “UNICEF is calling on world leaders to invest in better data on children, warning in a new analysis that sufficient data is available only for half of the child-related Sustainable Development Goals indicators. 

The UNICEF analysis shows that child-related data, including measures on poverty and violence that can be compared, are either too limited or of poor quality, leaving governments without the information they need to accurately address challenges facing millions of children, or to track progress towards achieving the Goals….

Examples of missing data:

• Around one in three countries does not have comparable measures on child poverty.

• Around 120 million girls under the age of 20 have been subjected to forced sexual intercourse or other forced sexual acts. Boys are also at risk, but almost no data is available. 

• There is a shortage of accurate and comparable data on the number of children with disabilities in almost all countries. 

• Universal access to safe drinking water is a fundamental need and human right. We have data about where drinking water comes from, but we often don’t know how safe it is.

• Nine out of 10 children are in primary school, yet crucial data about how many are learning is missing. 

• Every day 830 mothers die as a result of complications related to childbirth. Most of these deaths are preventable, yet there are critical data gaps about the quality of maternal care.

• Stunting denies children a fair chance of survival, growth and development. Yet 105 out of 197 countries do not have recent data on stunting.

• One in two countries around the world lack recent data on overweight children.

UNICEF is calling for governments to invest in disaggregated, comparable and quality data for children, to adequately address issues including intergenerational cycles of poverty, preventable deaths, and violence against children….(More)”

“Big Data Europe” addresses societal challenges with data technologies


Press Release: “Across society, from health to agriculture and transport, from energy to climate change and security, practitioners in every discipline recognise the potential of the enormous amounts of data being created every day. The challenge is to capture, manage and process that information to derive meaningful results and make a difference to people’s lives. The Big Data Europe project has just released the first public version of its open source platform designed to do just that. In seven pilot studies, it is helping to solve societal challenges by putting cutting-edge technology in the hands of experts in fields other than IT.

Although many crucial big data technologies are freely available as open source software, they are often difficult for non-experts to integrate and deploy. Big Data Europe solves that problem by providing a package that can readily be installed locally or at any scale in a cloud infrastructure by a systems administrator, and configured via a simple user interface. Tools like Apache Hadoop, Apache Spark, Apache Flink and many others can be instantiated easily….

The tools included in the platform were selected after a process of requirements-gathering across the seven societal challenges identified by the European Commission (Health, Food, Energy, Transport, Climate, Social Sciences and Security). Tasks like message passing are handled by Kafka and Flume, storage by Hive and Cassandra, and publishing by GeoTriples. The platform uses the Docker system to make it easy to add new tools and, again, to let them operate at a scale limited only by the computing infrastructure….
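As a taste of the message-passing layer, here is a minimal Kafka round trip using the kafka-python client; the broker address and topic name are placeholders, and in the platform itself these components are wired together through its user interface rather than by hand:

    from kafka import KafkaConsumer, KafkaProducer

    BROKER = "localhost:9092"   # placeholder; the platform runs Kafka as a Docker service
    TOPIC = "sensor-readings"   # hypothetical topic name

    # Produce one message.
    producer = KafkaProducer(bootstrap_servers=BROKER)
    producer.send(TOPIC, b'{"station": "A1", "pm25": 13.2}')
    producer.flush()

    # Consume it back from the beginning of the topic.
    consumer = KafkaConsumer(TOPIC, bootstrap_servers=BROKER,
                             auto_offset_reset="earliest",
                             consumer_timeout_ms=5000)
    for message in consumer:
        print(message.value)  # the JSON payload produced above
        break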

The platform can be downloaded from GitHub.
See also the installation instructions, Getting Started and video.”

The risks of relying on robots for fairer staff recruitment


Sarah O’Connor at the Financial Times: “Robots are not just taking people’s jobs away, they are beginning to hand them out, too. Go to any recruitment industry event and you will find the air is thick with terms like “machine learning”, “big data” and “predictive analytics”.

The argument for using these tools in recruitment is simple. Robo-recruiters can sift through thousands of job candidates far more efficiently than humans. They can also do it more fairly. Since they do not harbour conscious or unconscious human biases, they will recruit a more diverse and meritocratic workforce.

This is a seductive idea but it is also dangerous. Algorithms are not inherently neutral just because they see the world in zeros and ones.

For a start, any machine learning algorithm is only as good as the training data from which it learns. Take the PhD thesis of academic researcher Colin Lee, released to the press this year. He analysed data on the success or failure of 441,769 job applications and built a model that could predict with 70 to 80 per cent accuracy which candidates would be invited to interview. The press release plugged this algorithm as a potential tool to screen a large number of CVs while avoiding “human error and unconscious bias”.

But a model like this would absorb any human biases at work in the original recruitment decisions. For example, the research found that age was the biggest predictor of being invited to interview, with the youngest and the oldest applicants least likely to be successful. You might think it fair enough that inexperienced youngsters do badly, but the routine rejection of older candidates seems like something to investigate rather than codify and perpetuate. Mr Lee acknowledges these problems and suggests it would be better to strip the CVs of attributes such as gender, age and ethnicity before using them….(More)”
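A hedged sketch of the mitigation Mr Lee suggests: dropping protected attributes before training a screening model. The feature names and classifier are illustrative, not the model from the thesis; note too that stripping columns does not remove bias that leaks through correlated features such as graduation year or postcode, nor bias already baked into the historical labels:

    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    # Hypothetical applications data: one row per CV, 'invited' is the
    # historical interview decision used as the training label.
    cvs = pd.read_csv("applications.csv")

    PROTECTED = ["gender", "age", "ethnicity"]  # attributes to strip
    X = pd.get_dummies(cvs.drop(columns=PROTECTED + ["invited"]))
    y = cvs["invited"]

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print(f"Held-out accuracy: {model.score(X_test, y_test):.2f}")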

White House, Transportation Dept. want help using open data to prevent traffic crashes


Samantha Ehlinger in FedScoop: “The Transportation Department is looking for public input on how to better interpret and use data on fatal crashes, after 2015 data revealed a startling 7.2 percent spike in traffic deaths that year.

Looking for new solutions that could prevent more deaths on the roads, the department released the 2015 open dataset on each fatal crash three months earlier than usual. With it, the department and the White House announced a call to action for people to use the dataset as a jumping-off point for a dialogue on how to prevent crashes, as well as to understand what might be causing the spike.

“What we’re ultimately looking for is getting more people engaged in the data … matching this with other publicly available data, or data that the private sector might be willing to make available, to dive in and to tell these stories,” said Bryan Thomas, communications director for the National Highway Traffic Safety Administration, to FedScoop.

One striking statistic was that “pedestrian and pedalcyclist fatalities increased to a level not seen in 20 years,” according to a DOT press release. …

“We want folks to be engaged directly with our own data scientists, so we can help people through the dataset and help answer their questions as they work their way through, bounce ideas off of us, etc.,” Thomas said. “We really want to be accessible in that way.”

He added that as ideas “come to fruition,” there will be opportunities to present what people have learned.

“It’s a very, very rich data set, there’s a lot of information there,” Thomas said. “Our own ability is, frankly, limited to investigate all of the questions that you might have of it. And so we want to get the public really diving in as well.”…

Here are the questions “worth exploring,” according to the call to action:

  • How might improving economic conditions around the country change how Americans are getting around? What models can we develop to identify communities that might be at a higher risk for fatal crashes?
  • How might climate change increase the risk of fatal crashes in a community?
  • How might we use studies of attitudes toward speeding, distracted driving, and seat belt use to better target marketing and behavioral change campaigns?
  • How might we monitor public health indicators and behavior risk indicators to target communities that might have a high prevalence of behaviors linked with fatal crashes (drinking, drug use/addiction, etc.)? What countermeasures should we create to address these issues?”…(More)”
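For anyone taking up the call to action, a minimal sketch of a first pass over the 2015 fatality data with pandas; the file name follows the FARS convention, but exact column names are assumptions to verify against the FARS coding manual:

    import pandas as pd

    # FARS 2015 accident-level file: one row per fatal crash, downloadable
    # from NHTSA. Column names below should be checked against the manual.
    crashes = pd.read_csv("accident.csv")

    print("Fatal crashes in 2015:", len(crashes))
    print("Total fatalities:", crashes["FATALS"].sum())

    # Fatalities by month, a first look at where the 7.2 percent spike falls.
    print(crashes.groupby("MONTH")["FATALS"].sum())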