Sqoop


DataDrivenJournalism: “Just because there’s a duty to disclose doesn’t mean there’s a duty to make it easy. This seems to be universally true when it comes to public records, regardless of the country or government making them available.

The consequences for journalists can be profound: hours of time spent digging through messy data, missing stories that go untold, and the opportunity costs that come with these, just to name a few.

This is a problem we set out to address a couple of years ago in the US with the introduction of Sqoop, a free data journalism site intended to make it easier for reporters to find and track public records, starting with the Securities and Exchange Commission (SEC), the Patent Office, and the federal court system, otherwise known as PACER (Public Access to Court Electronic Records).

Think of it as a search box across all of these public records sites (and we’re working to add others) as well as a rapid alerting service. If a journalist has saved searches for “Facebook”, “Jeffrey P. Bezos”, or “Internet of Things”, she will receive email alerts every time these search terms show up in new public filings.

Journalists can refine search results based on data source, form type, and geographic factors, and then save those searches as alerts….(More)”.
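The saved-search alerting described above can be sketched as a simple matching pass over newly ingested filings. This is a hypothetical illustration, not Sqoop's actual implementation; the `Filing` class, the `match_alerts` function, and the sample records are all invented for the sketch:

```python
# Illustrative sketch of saved-search alerting over new public filings.
# Not Sqoop's real code: class names, fields, and data are invented here.
from dataclasses import dataclass


@dataclass
class Filing:
    source: str      # e.g. "SEC", "USPTO", "PACER"
    form_type: str   # e.g. "SC 13D"
    text: str        # full text of the filing


def match_alerts(saved_searches, new_filings):
    """Return (search_term, filing) pairs to email out as alerts."""
    alerts = []
    for term in saved_searches:
        needle = term.lower()
        for filing in new_filings:
            if needle in filing.text.lower():
                alerts.append((term, filing))
    return alerts


filings = [
    Filing("SEC", "SC 13D", "Jeffrey P. Bezos reports beneficial ownership..."),
    Filing("USPTO", "Utility", "A system for Internet of Things device pairing..."),
]
hits = match_alerts(["Facebook", "Jeffrey P. Bezos", "Internet of Things"], filings)
for term, filing in hits:
    print(f"Alert: '{term}' appeared in a new {filing.source} filing")
```

A production system would index filings for fast lookup and support the source, form-type, and geographic filters the article mentions, but the core loop is this kind of term-against-new-documents match.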

Four lessons NHS Trusts can learn from the Royal Free case


Blog by Elizabeth Denham, Information Commissioner in the UK: “Today my office has announced that the Royal Free London NHS Foundation Trust did not comply with the Data Protection Act when it turned over the sensitive medical data of around 1.6 million patients to Google DeepMind, a private sector firm, as part of a clinical safety initiative. As a result of our investigation, the Trust has been asked to sign an undertaking committing it to changes to ensure it is acting in accordance with the law, and we’ll be working with them to make sure that happens.

But what about the rest of the sector? As organisations increasingly look to unlock the huge potential that creative uses of data can have for patient care, what are the lessons to be learned from this case?

It’s not a choice between privacy or innovation

It’s welcome that the trial looks to have been positive. The Trust has reported successful outcomes. Some may reflect that data protection rights are a small price to pay for this.

But what stood out to me on looking through the results of the investigation is that the shortcomings we found were avoidable. The price of innovation didn’t need to be the erosion of legally ensured fundamental privacy rights….

Don’t dive in too quickly

Privacy impact assessments are a key data protection tool of our era, as evolving law and best practice around the world demonstrate. They play an increasingly prominent role in data protection and are a crucial part of digital innovation. ….

New cloud processing technologies mean you can, not that you always should

Changes in technology mean that vast data sets can be made more readily available and can be processed faster using more powerful data processing technologies. That’s a positive thing, but just because evolving technologies allow you to do more doesn’t mean these tools should always be fully utilised, particularly during a trial initiative….

Know the law, and follow it

No-one suggests that red tape should get in the way of progress. But when you’re setting out to test the clinical safety of a new service, remember that the rules are there for a reason….(More)”

Data for Development: The Case for Information, Not Just Data


Daniela Ligiero at the Council on Foreign Relations: “When it comes to development, more data is often better—but in the quest for more data, we can often forget about ensuring we have information, which is even more valuable. Information is data that have been recorded, classified, organized, analyzed, interpreted, and translated within a framework so that meaning emerges. At the end of the day, information is what guides action and change.

The need for more data

In 2015, world leaders came together to adopt a new global agenda to guide efforts over the next fifteen years, the Sustainable Development Goals. The High-level Political Forum (HLPF), to be held this year at the United Nations on July 10-19, is an opportunity for review of the 2030 Agenda, and will include an in-depth analysis of seven of the seventeen goals—including those focused on poverty, health, and gender equality. As part of the HLPF, member states are encouraged to undergo voluntary national reviews of progress across goals to facilitate the sharing of experiences, including successes, challenges, and lessons learned; to strengthen policies and institutions; and to mobilize multi-stakeholder support and partnerships for the implementation of the agenda.

A significant challenge that countries continue to face in this process, and one that becomes painfully evident during the HLPF, is the lack of data to establish baselines and track progress. Fortunately, new initiatives aligned with the 2030 Agenda, such as the Global Partnership for Sustainable Development Data, are working to focus on data. There are also initiatives focused on collecting more and better data in particular areas, like gender data (e.g., Data2X; UN Women’s Making Every Girl and Woman Count). This work is important and urgently needed.

Data to monitor global progress on the goals is critical to holding countries accountable to their commitments and allows them to examine how they are doing across multiple, ambitious goals. However, equally important is the rich, granular national and sub-national data that can guide the development and implementation of evidence-based, effective programs and policies. These kinds of data are also often lacking or of poor quality, in which case more and better data are essential. But a frequently ignored piece of the puzzle at the national level is improved use of the data we already have.

Making the most of the data we have

To illustrate this point, consider the Together for Girls partnership, which was built on obtaining new data where it was lacking and effectively translating it into information to change policies and programs. We are a partnership between national governments, UN agencies and private sector organizations working to break cycles of violence, with special attention to sexual violence against girls. …The first pillar of our work is focused on understanding violence against children within a country, always at the request of the national government. We do this through a national household survey – the Violence Against Children Survey (VACS), led by national governments, CDC, and UNICEF as part of the Together for Girls Partnership….

The truth is there is a plethora of data at the country level, generated by surveys, special studies, administrative systems, the private sector, and citizens, that can provide meaningful insights across all the development goals.

Connecting the dots

But data—like our programs—often remain in silos. For example, data focused on violence against children is typically not top of mind for those working on women’s empowerment or adolescent health. Yet, as an example, the VACS can offer valuable information about how sexual violence against girls, as young as 13, is connected to adolescent pregnancy—or how one of the most common perpetrators of sexual violence against girls is a partner, a pattern that starts early and is a predictor for victimization and perpetration later in life. However, these data are not consistently used across actors working on programs related to adolescent pregnancy and violence against women….(More)”.

Legal and Ethical Issues of Crowdsourcing


Alqahtani, Bashayr et al in the International Journal of Computer Applications: “Crowdsourcing is a recently coined term describing the outsourcing of activities by a firm to an online community or crowd in the form of an ‘open call’. Any member of the crowd can complete an assigned task and be paid for their efforts, which allows organisations to attract the best possible ideas and approaches to boost innovation or to complete data-processing tasks. Though this form of labour organisation was pioneered in the computation sector, companies have begun using crowdsourcing for a wide range of tasks that they find can be completed better by members of the crowd than by their own employees. This research defines the principle of crowdsourcing and its types; examines its challenges, advantages, and disadvantages; describes how firms are using crowdsourcing for the completion of marketing tasks; and discusses some of the legal and ethical issues together with regulations…(More)”.

The role of Open Data in driving sustainable mobility in nine smart cities


Paper by Piyush Yadav et al: “In today’s era of globalization, sustainable mobility is considered a key factor in the economic growth of any country. With the emergence of open data initiatives, there is tremendous potential to improve mobility. This paper presents findings of a detailed analysis of mobility open data initiatives in nine smart cities – Amsterdam, Barcelona, Chicago, Dublin, Helsinki, London, Manchester, New York and San Francisco. The paper discusses the study of various sustainability indicators in the mobility domain and their convergence with present open datasets. Specifically, it sheds light on open data ecosystems in terms of their production and consumption. It gives a comprehensive view of the nature of mobility open data with respect to their formats, interactivity, and availability. The paper details the open datasets in terms of their alignment with different mobility indicators, publishing platforms, applications, and available APIs. The paper discusses how these open datasets have shown signs of fostering organic innovation and sustainable growth in smart cities, with an impact on mobility trends. The results of the work can be used to inform the design of data-driven sustainable mobility in smart cities to maximize the utilization of available open data resources….(More)”.

Gender Biases in Cyberspace: A Two-Stage Model, the New Arena of Wikipedia and Other Websites


Paper by Shlomit Yanisky-Ravid and Amy Mittelman: “Increasingly, there has been a focus on creating democratic standards and norms in order to best facilitate open exchange of information and communication online―a goal that fits neatly within the feminist aim to democratize content creation and community. Collaborative websites, such as blogs, social networks, and, as focused on in this Article, Wikipedia, represent both a cyberspace community entirely outside the strictures of the traditional (intellectual) proprietary paradigm and one that professes to truly embody the philosophy of a completely open, free, and democratic resource for all. In theory, collaborative websites are the solution for which social activists, intellectual property opponents, and feminist theorists have been waiting. Unfortunately, we are now realizing that this utopian dream does not exist as anticipated: the Internet is neither neutral nor open to everyone. More importantly, these websites are not egalitarian; rather, they facilitate new ways to exclude and subordinate women. This Article innovatively argues that the virtual world excludes women in two stages: first, by controlling websites and filtering out women; and second, by exposing women who survived the first stage to a hostile environment. Wikipedia, as well as other cyber-space environments, demonstrates the execution of the model, which results in the exclusion of women from the virtual sphere with all the implications thereof….(More)”.

Open Government: Concepts and Challenges for Public Administration’s Management in the Digital Era


Tippawan Lorsuwannarat in the Journal of Public and Private Management: “This paper has four main objectives. First, to disseminate a study on the meaning and development of open government. Second, to describe the components of an open government. Third, to examine the international movement situation involved with open government. And last, to analyze the challenges related to the application of open government in Thailand’s current digital era. The paper suggests four periods of open government by linking the concepts of public administration to the use of information technology in the public sector. The components of open government are consistent with the meaning of open government, including open data, open access, and open engagement. The current international situation of open government considers the ranking of open government and the Open Government Partnership. The challenges of adopting open government in Thailand include the lack of a clear policy regarding open government, the digital gap, public organizational culture, and laws supporting privacy and data infrastructure….(More)”.

Research data infrastructures in the UK


The Open Research Data Task Force: “This report is intended to inform the work of the Open Research Data Task Force, which has been established with the aim of building on the principles set out in the Open Research Data Concordat (published in July 2016) to co-ordinate creation of a roadmap to develop the infrastructure for open research data across the UK. As an initial contribution to that work, the report provides an outline of the policy and service infrastructure in the UK as it stands in the first half of 2017, including some comparisons with other countries; and it points to some key areas and issues which require attention. It does not seek to identify possible courses of action, nor even to suggest priorities the Task Force might consider in creating its final report, to be published in 2018. That will be the focus of work for the Task Force over the next few months.

Why is this important?

The digital revolution continues to bring fundamental changes to all aspects of research: how it is conducted, the findings that are produced, and how they are interrogated and transmitted not only within the research community but more widely. We are as yet still in the early stages of a transformation in which progress is patchy across the research community, but which has already posed significant challenges for research funders and institutions, as well as for researchers themselves. Research data is at the heart of those challenges: not simply the datasets that provide the core of the evidence analysed in scholarly publications, but all the data created and collected throughout the research process. Such data represents a potentially valuable resource for people and organisations in the commercial, public and voluntary sectors, as well as for researchers. Access to such data, and more general moves towards open science, are also critically important in ensuring that research is reproducible, and thus in sustaining public confidence in the work of the research community. But effective use of research data depends on an infrastructure – of hardware, software and services, but also of policies, organisations and individuals operating at various levels – that is as yet far from fully-formed. The exponential increases in volumes of data being generated by researchers create in themselves new demands for storage and computing power. But since the data is characterised more by heterogeneity than by uniformity, development of the infrastructure to manage it involves a complex set of requirements in preparing, collecting, selecting, analysing, processing, storing and preserving that data throughout its life cycle.

Over the past decade and more, there have been many initiatives on the part of research institutions, funders, and members of the research community at local, national and international levels to address some of these issues. Diversity is a key feature of the landscape, in terms of institutional types and locations, funding regimes, and nature and scope of partnerships, as well as differences between disciplines and subject areas. Hence decision-makers at various levels have fostered via their policies and strategies many community-organised developments, as well as their own initiatives and services. Significant progress has been achieved as a result, through the enthusiasm and commitment of key organisations and individuals. The less positive features have been a relative lack of harmonisation or consolidation, and there is an increasing awareness of patchiness in provision, with gaps, overlaps and inconsistencies. This is not surprising, since policies, strategies and services relating to research data necessarily affect all aspects of support for the diverse processes of research itself. Developing new policies and infrastructure for research data implies significant re-thinking of structures and regimes for supporting, fostering and promoting research itself. That in turn implies taking full account of widely-varying characteristics and needs of research of different kinds, while also keeping in clear view the benefits to be gained from better management of research data, and from greater openness in making data accessible for others to re-use for a wide range of different purposes….(More)”.

Volunteers teach AI to spot slavery sites from satellite images


This data will then be used to train machine learning algorithms to automatically recognise brick kilns in satellite imagery. If computers can pinpoint the location of such possible slavery sites, then the coordinates could be passed to local charities to investigate, says Kevin Bales, the project leader, at the University of Nottingham, UK.

South Asian brick kilns are notorious as modern-day slavery sites. There are an estimated 5 million people working in brick kilns in South Asia, and of those nearly 70 per cent are thought to be working there under duress – often to pay off financial debts.

 However, no one is quite sure how many of these kilns there are in the so-called “Brick Belt”, a region that stretches across parts of Pakistan, India and Nepal. Some estimates put the figure at 20,000, but it may be as high as 50,000.

Bales is hoping that his machine learning approach will produce a more accurate figure and help organisations on the ground know where to direct their anti-slavery efforts.

It’s great to have a tool for identifying possible forced labour sites, says Sasha Jesperson at St Mary’s University in London. But it is just a start – to really find out how many people are being enslaved in the brick kiln industry, investigators still need to visit every site and work out exactly what’s going on there, she says….

So far, volunteers have identified over 4000 potential slavery sites across 400 satellite images taken via Google Earth. Once these have been checked several times by volunteers, Bales plans to use these images to teach the machine learning algorithm what kilns look like, so that it can learn to recognise them in images automatically….(More)”.
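The pipeline described here, crowdsourced labels used to train a recogniser that then scans new imagery, can be sketched in miniature. This is a toy illustration under invented assumptions: synthetic 8x8 "tiles" stand in for the Google Earth imagery, and a simple nearest-centroid rule stands in for whatever model the project actually trains:

```python
# Toy sketch of the crowdsourced-labels -> classifier pipeline.
# Synthetic tiles and a nearest-centroid rule are stand-ins invented
# for this illustration; the real project uses satellite imagery.
import numpy as np

rng = np.random.default_rng(0)


def make_tile(is_kiln):
    """Generate a synthetic 8x8 tile; kiln tiles get a bright footprint."""
    tile = rng.normal(0.3, 0.05, size=(8, 8))
    if is_kiln:
        tile[2:6, 2:6] += 0.5  # bright kiln-like signature
    return tile.ravel()


# Volunteer-labelled training set (the checked sites in the article)
X_train = np.array([make_tile(i % 2 == 0) for i in range(200)])
y_train = np.array([i % 2 == 0 for i in range(200)])

# "Training": one mean feature vector (centroid) per class
centroid_kiln = X_train[y_train].mean(axis=0)
centroid_bg = X_train[~y_train].mean(axis=0)


def predict(tile):
    """Flag a tile as a possible kiln if it is nearer the kiln centroid."""
    d_kiln = np.linalg.norm(tile - centroid_kiln)
    d_bg = np.linalg.norm(tile - centroid_bg)
    return bool(d_kiln < d_bg)


# Scan unlabelled tiles; flagged coordinates would go to investigators
flags = [predict(make_tile(is_kiln)) for is_kiln in (True, False, True)]
print(flags)
```

The real system would swap in a convolutional network and georeferenced imagery, but the workflow is the same: human-verified labels train the model, and the model's flagged coordinates are handed to charities on the ground for verification.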

The State of Open Data Portals in Latin America


Michael Steinberg at Center for Data Innovation: “Many Latin American countries publish open data—government data made freely available online in machine-readable formats and without license restrictions. However, there is a tremendous amount of variation in the quantity and type of datasets governments publish on national open data portals—central online repositories for open data that make it easier for users to find data. Despite the wide variation among the countries, the most popular datasets tend to be those that either provide transparency into government operations or offer information that citizens can use directly. As governments continue to update and improve their open data portals, they should take steps to ensure that they are publishing the datasets most valuable to their citizens.

To better understand this variation, we collected information about open data portals in 20 Latin American countries including Argentina, Bolivia, Brazil, Chile, Colombia, Costa Rica, Ecuador, Mexico, Panama, Paraguay, Peru, and Uruguay. Not all Latin American countries have an open data portal, but even if they do not operate a unified portal, some governments may still have open data. Four Latin American countries—Belize, Guatemala, Honduras, and Nicaragua—do not have open data portals. One country—El Salvador—does not have a government-run open data portal, but does have a national open data portal (datoselsalvador.org) run by volunteers….

There are many steps Latin American governments can take to improve open data in their countries. Those nations without open data portals should create them, and those that already have them should continue to update them and publish more datasets to better serve their constituents. One way to do this is to monitor the popular datasets on other countries’ open data portals, and where applicable, ensure the government produces similar datasets. Those running open data portals should also routinely monitor search queries to see what users are looking for, and if they are looking for datasets that have not yet been posted, work with the relevant government agencies to make these datasets available.

In summary, there are stark differences in the amount of data published, the format of the data, and the most popular datasets in open data portals in Latin America. However, in every country there is an appetite for data that either provides public accountability for government functions or supplies helpful information to citizens…(More)”.