Stefaan Verhulst
National Institute of Standards and Technology: “Databases across the country include information with potentially important research implications and uses, e.g. contingency planning in disaster scenarios, identifying safety risks in aviation, assisting in tracking contagious diseases, and identifying patterns of violence in local communities. However, these datasets include personally identifiable information (PII), and it is not enough to simply remove PII from them. It is well known that records in a dataset, combined with auxiliary and possibly completely unrelated datasets, can be matched to uniquely identifiable individuals (known as a linkage attack). Today’s efforts to remove PII do not provide adequate protection against linkage attacks. With the advent of “big data” and technological advances in linking data, there are far too many other possible data sources related to each of us that can lead to our identity being uncovered.
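To make the risk concrete, here is a minimal sketch of a linkage attack on toy data, assuming hypothetical column names: a “de-identified” table with names stripped can be re-identified by joining an auxiliary public dataset on shared quasi-identifiers.

```python
import pandas as pd

# Hypothetical "de-identified" health records: names removed, quasi-identifiers kept.
deidentified = pd.DataFrame({
    "zip": ["02139", "02139", "10001"],
    "birth_date": ["1965-07-22", "1980-01-05", "1972-03-14"],
    "sex": ["F", "M", "F"],
    "diagnosis": ["hypertension", "asthma", "diabetes"],
})

# Hypothetical auxiliary public dataset (e.g. a voter roll) with the same quasi-identifiers plus names.
voter_roll = pd.DataFrame({
    "name": ["A. Smith", "B. Jones", "C. Lee"],
    "zip": ["02139", "02139", "10001"],
    "birth_date": ["1965-07-22", "1980-01-05", "1972-03-14"],
    "sex": ["F", "M", "F"],
})

# Joining on the shared quasi-identifiers re-attaches names to the "anonymous" records.
reidentified = deidentified.merge(voter_roll, on=["zip", "birth_date", "sex"])
print(reidentified[["name", "diagnosis"]])
```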
Get Involved – How to Participate
The Unlinkable Data Challenge is a multi-stage Challenge. This first stage of the Challenge is intended to source detailed concepts for new approaches, inform the final design in the two subsequent stages, and provide recommendations for matching stage 1 competitors into teams for subsequent stages. Teams will predict and justify where their algorithm fails with respect to the utility-privacy frontier curve.
In this stage, competitors are asked to propose how to de-identify a dataset using less than the available privacy budget, while also maintaining the dataset’s utility for analysis. For example, the de-identified data, when put through the same analysis pipeline as the original dataset, should produce comparable results (e.g. similar coefficients in a linear regression model, or a classifier that produces similar predictions on sub-samples of the data).
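As a rough illustration of that utility check (not an official Challenge baseline), the sketch below perturbs a regression’s sufficient statistics with the Laplace mechanism under a stated privacy budget and compares the resulting coefficients with the non-private ones; the clipping bounds and sensitivity figure are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy data for the kind of utility check described above: does a privatised release
# support roughly the same linear-regression coefficients as the original data?
n = 10_000
x = rng.uniform(0, 1, n)                                    # feature clipped to [0, 1]
y = np.clip(2.0 * x + 0.5 + rng.normal(0, 0.1, n), 0, 3)    # target clipped to [0, 3]

X = np.column_stack([np.ones(n), x])                        # design matrix with intercept
XtX, Xty = X.T @ X, X.T @ y

# Laplace mechanism on the sufficient statistics.  With the clipping above, adding or
# removing one record changes each entry of XtX by at most 1 and each entry of Xty by
# at most 3, so a crude bound on the total L1 sensitivity of the release is 10.
epsilon = 1.0                                               # privacy budget (assumed)
scale = 10 / epsilon
noisy_XtX = XtX + rng.laplace(scale=scale, size=XtX.shape)
noisy_Xty = Xty + rng.laplace(scale=scale, size=Xty.shape)

beta_original = np.linalg.solve(XtX, Xty)
beta_private = np.linalg.solve(noisy_XtX, noisy_Xty)
print("original coefficients:", beta_original)
print("private  coefficients:", beta_private)
```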
This stage of the Challenge seeks Conceptual Solutions that describe how to use and/or combine methods in differential privacy to mitigate privacy loss when publicly releasing datasets in a variety of industries such as public safety, law enforcement, healthcare/biomedical research, education, and finance. We are limiting the scope to addressing research questions and methodologies that require regression, classification, and clustering analysis on datasets that contain numerical, geo-spatial, and categorical data.
To compete in this stage, we are asking that you propose a new algorithm utilizing existing or new randomized mechanisms, with a justification of how it will optimize privacy and utility across different analysis types. We are also asking you to propose a dataset that you believe would make a good use case for your proposed algorithm, and to provide a means of comparing your algorithm with others.
All submissions must be made using the submission form provided on the HeroX website….(More)”.
Paper by Payal Arora and Linnea Holter Thompson in the International Journal of Communication: “Complex global supply chains have made it difficult to know the realities in factories. This structure obfuscates the networks, channels, and flows of communication between employers, workers, nongovernmental organizations, and other vested intermediaries, creating a lack of transparency. Factories operate far from the brands themselves, often in developing countries where labor is cheap and regulations are weak. However, the emergence of social media and mobile technology has drawn the world closer together. Specifically, crowdsourcing is being used in an innovative way to gather feedback from outsourced laborers with access to digital platforms. This article examines how crowdsourcing platforms are used for both gathering and sharing information to foster accountability. We critically assess how these tools enable dialogue between brands and factory workers, making workers part of the greater conversation. We argue that although there are challenges in designing and implementing these new monitoring systems, these platforms can pave the way for new forms of unionization and corporate social responsibility beyond just rebranding…(More)”
Report by Margaret C. Levenstein, Allison R.B. Tyler, and Johanna Davidson Bleckman: “Research and evidence-building benefit from the increased availability of administrative datasets, linkage across datasets, detailed geospatial data, and other confidential data. Systems and policies for provisioning access to confidential data, however, have not kept pace and indeed restrict and unnecessarily encumber leading-edge science.
One series of roadblocks can be smoothed or removed by establishing a common understanding of what constitutes different levels of data sensitivity and risk as well as minimum researcher criteria for data access within these levels. This report presents the results of a recently completed study of 23 data repositories.
It describes the extant landscape of policies, procedures, practices, and norms for restricted data access and identifies the significant challenges faced by researchers interested in accessing and analyzing restricted use datasets.
It identifies commonalities among these repositories to articulate shared community standards that can be the basis of a community-normed researcher passport: a credential that identifies a trusted researcher to multiple repositories and other data custodians.
Three main developments are recommended.
First, language harmonization: establishing a common set of terms and definitions, one that will evolve over time through collaboration within the research community, will allow different repositories to understand and integrate shared standards and technologies into their own processes.
Second, a researcher passport: develop a durable and transferable digital identifier issued by a central, community-recognized data steward. This passport will capture researcher attributes that emerged as common elements of user access requirements across repositories, including training, as well as verification of those attributes (e.g., academic degrees, institutional affiliation, citizenship status, and country of residence).
Third, data visas: data custodians issue visas that grant a passport holder access to particular datasets for a particular project for a specific period of time. Like stamps on a passport, these visas provide a history of a researcher’s access to restricted data. This history is integrated into the researcher’s credential, establishing the researcher’s reputation as a trusted data steward….(More)”
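To make the proposal concrete, here is one hypothetical way the passport and visa records described above could be represented; the field names are illustrative assumptions, not a schema published in the report.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List

# Hypothetical record layouts for the "passport" and "visa" concepts described above;
# the field names are illustrative assumptions, not a schema used by any repository.

@dataclass
class DataVisa:
    dataset_id: str        # the restricted dataset covered by this visa
    project_id: str        # the approved project
    issued_by: str         # the data custodian granting access
    valid_from: date
    valid_until: date

@dataclass
class ResearcherPassport:
    researcher_id: str                 # durable identifier from a community-recognized steward
    name: str
    institutional_affiliation: str
    academic_degrees: List[str]
    citizenship_status: str
    country_of_residence: str
    completed_training: List[str]      # e.g. human-subjects and data-security training
    visas: List[DataVisa] = field(default_factory=list)   # history of restricted-data access
```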
European Commission: “Childhood and adolescent obesity is a major global and European public health problem. Currently, public actions are detached from local needs, mostly consisting of indiscriminate blanket policies and single-element strategies, limiting their efficacy and effectiveness. The need for community-targeted actions has long been obvious, but the lack of a monitoring and evaluation framework and the methodological inability to objectively quantify local community characteristics, in a reasonable timeframe, have hindered them.

Big Data based Platform
Technological achievements in mobile and wearable electronics and Big Data infrastructures make it possible to engage European citizens in the data collection process, allowing us to reshape policies at a regional, national and European level. In BigO, this will be facilitated through the development of a platform that quantifies behavioural community patterns using Big Data provided by wearables and eHealth devices.
Estimate child obesity through community data
BigO has set detailed scientific, technological, validation and business objectives in order to build a system that collects Big Data on children’s behaviour and helps plan health policies against obesity. In addition, during the project, BigO will reach out to more than 25,000 school and age-matched obese children and adolescents as sources for community data. Comprehensive models of the obesity prevalence dependence matrix will be created, allowing data-driven predictions of the effectiveness of specific policies on a community and real-time monitoring of the population’s response, supported by powerful real-time data visualisations….(More)”
Meg Wilcox at Civil Eats: “New England’s groundfish season is in full swing, as hundreds of dayboat fishermen from Rhode Island to Maine take to the water in search of the region’s iconic cod and haddock. But this year, several dozen of them are hauling in their catch under the watchful eye of video cameras as part of a new effort to use technology to better sustain the area’s fisheries and the communities that depend on them.
Video observation on fishing boats—electronic monitoring—is picking up steam in the Northeast and nationally as a cost-effective means to ensure that fishing vessels aren’t catching more fish than allowed while informing local fisheries management. While several issues remain to be solved before the technology can be widely deployed—such as the costs of reviewing and storing data—electronic monitoring is beginning to deliver on its potential to lower fishermen’s costs, provide scientists with better data, restore trust where it’s broken, and ultimately help consumers gain a greater understanding of where their seafood is coming from….
Muto’s vessel was outfitted with cameras, at a cost of about $8,000, through a collaborative venture between NOAA’s regional office and science center, The Nature Conservancy (TNC), the Gulf of Maine Research Institute, and the Cape Cod Commercial Fishermen’s Alliance. Camera costs are currently subsidized by NOAA Fisheries and its partners.
The cameras run the entire time Muto and his crew are out on the water. They record how the fishermen handle their discards, the fish they’re not allowed to keep because of size or species type, but that count towards their quotas. The cost is lower than what he’d pay for an in-person monitor. The biggest cost of electronic monitoring, however, is the labor required to review the video. …
Another way to cut costs is to use computers to review the footage. McGuire says there’s been a lot of talk about automating the review, but the common refrain is that it’s still five years off.
To spur faster action, TNC last year spearheaded an online competition, offering a $50,000 prize to computer scientists who could crack the code—that is, teach a computer how to count fish, size them, and identify their species.
“We created an arms race,” says McGuire. “That’s why you do a competition. You’ll never get the top minds to do this because they don’t care about your fish. They all want to work for Google, and one way to get recognized by Google is to win a few of these competitions.” The contest exceeded McGuire’s expectations. “Winners got close to 100 percent in count and 75 percent accurate on identifying species,” he says. “We proved that automated review is now. Not in five years. And now all of the video-review companies are investing in machine learning.” It’s only a matter of time before a commercial product is available, McGuire believes….(More)”.
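For readers curious what “teaching a computer to identify species” involves, the following is a minimal, generic sketch of framing species identification as image classification with a standard convolutional network; it is not the contest-winning approach, and the species list, model choice, and parameters are invented for illustration.

```python
import torch
import torch.nn as nn
from torchvision import models

# A minimal sketch of species identification as image classification: a standard CNN
# backbone with a small classification head.  All specifics here are illustrative.
NUM_SPECIES = 5                                  # e.g. cod, haddock, flounder, pollock, "other"

model = models.resnet18(weights=None)            # in practice one would start from pretrained weights
model.fc = nn.Linear(model.fc.in_features, NUM_SPECIES)

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(frames: torch.Tensor, labels: torch.Tensor) -> float:
    """One optimisation step on a batch of video frames (N, 3, 224, 224) and species labels (N,)."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(frames), labels)
    loss.backward()
    optimizer.step()
    return loss.item()

# Counting fish in footage would add a detection stage (bounding boxes per frame) before
# classification; sizing fish requires a calibrated reference object in the camera's view.
```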
Mike Stucka at Data Driven Journalism: “An editor at The Palm Beach Post printed out hundreds of pages of reports and asked a simple question that turned out to be weirdly complex: How many people were being killed by a prescription drug?
That question relied on a version of a report that was soon discontinued by the U.S. Food and Drug Administration. Instead, the agency built a new website that doesn’t allow exports or the ability to see substantial chunks of the data. So, I went to raw data files that were horribly formatted — and, before the project was over, the FDA had reissued some of those data files and taken most of them offline.
But I didn’t give up hope. Behind the data — known as FAERS, or the FDA Adverse Event Reporting System — are more than a decade of data on suspected drug complications of nearly every kind. With multiple drugs in many reports, and multiple versions of many reports, the list of drugs alone comes to some 35 million records. And it’s a potential gold mine.
How much of a gold mine? For one relatively rare drug, meant only for the worst kind of cancer pain, we found records tying the drug to more than 900 deaths. A salesman had hired a former exotic dancer and a former Playboy model to help sell the drug known as Subsys. He then pushed salesmen to up the dosage, John Pacenti and Holly Baltz found in their package, “Pay To Prescribe? The Fentanyl Scandal.”
FAERS has some serious limitations, but some serious benefits. The data can tell you why a drug was prescribed; it can tell you if a person was hospitalized because of a drug reaction, or killed, or permanently disabled. It can tell you what country the report came from. It’s got the patient age. It’s got the date of reporting. It’s got other drugs involved. Dosage. There’s a ton of useful information.
Now the bad stuff: There may be multiple reports for each actual case, as well as multiple versions of a single “case” ID….(More)”
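As one hedged illustration of working around that caveat, a common approach is to keep only the latest version of each case before counting anything; the column names and values below are illustrative assumptions, not the raw FAERS file layout.

```python
import pandas as pd

# Toy stand-in for FAERS-style report data, where one case can appear in several versions.
reports = pd.DataFrame({
    "case_id":      ["1001", "1001", "1002", "1003", "1003"],
    "case_version": [1, 2, 1, 1, 3],
    "drug":         ["Subsys", "Subsys", "DrugB", "DrugC", "DrugC"],
    "outcome":      ["hospitalization", "death", "disability", "death", "death"],
})

# Keep only the most recent version of each case before doing any counting.
latest = (
    reports
    .sort_values(["case_id", "case_version"])
    .drop_duplicates(subset="case_id", keep="last")   # last row per case = highest version
)
print(f"{len(reports)} raw report rows -> {len(latest)} unique cases")
print(latest)
```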
EarthSky: “Landslides cause thousands of deaths and billions of dollars in property damage each year. Surprisingly, very few centralized global landslide databases exist, especially those that are publicly available.
Now NASA scientists are working to fill the gap—and they want your help collecting information. In March 2018, NASA scientist Dalia Kirschbaum and several colleagues launched a citizen science project that will make it possible to report landslides you have witnessed, heard about in the news, or found on an online database. All you need to do is log into the Landslide Reporter portal and report the time, location, and date of the landslide – as well as your source of information. You are also encouraged to submit additional details, such as the size of the landslide and what triggered it. And if you have photos, you can upload them.
Kirschbaum’s team will review each entry and submit credible reports to the Cooperative Open Online Landslide Repository (COOLR) — which they hope will eventually be the largest global online landslide catalog available.
Landslide Reporter is designed to improve the quantity and quality of data in COOLR. Currently, COOLR contains NASA’s Global Landslide Catalog, which includes more than 11,000 reports on landslides, debris flows, and rock avalanches. Since the current catalog is based mainly on information from English-language news reports and journalists tend to cover only large and deadly landslides in densely populated areas, many landslides never make it into the database….(More)”.
Apolitical: “The team working to drive New Zealand’s government into the digital age believes that part of the problem is the way that laws themselves are written. Earlier this year, in a three-week experiment, they tested the theory by rewriting legislation itself as software code….
The team in New Zealand, led by the government’s service innovations team LabPlus, has attempted to improve the interpretation of legislation and vastly ease the creation of digital services by rewriting legislation as code.
Legislation-as-code means taking the “rules” or components of legislation — its logic, requirements and exemptions — and laying them out programmatically so that they can be parsed by a machine. If law can be broken down by a machine, then anyone, even those who aren’t legally trained, can work with it. It helps to standardise the rules in a consistent language across an entire system, giving a view of services, compliance and all the different rules of government.
Over the course of three weeks the team in New Zealand rewrote two sets of legislation as software code: the Rates Rebate Act, a tax rebate designed to lower the costs of owning a home for people on low incomes, and the Holidays Act, which was enacted to grant each employee in New Zealand a guaranteed four weeks a year of holiday.
The way both policies are written makes them difficult to interpret and, consequently, to deliver. They were written for a paper-based world, and they require different service responses from distinct bodies within government depending on the legal status of the citizen using them. For instance, residents of retirement villages are eligible for rebates under the Rates Rebate Act, but they access them via different people and provide different information than ordinary ratepayers do.
The teams worked to rewrite the legislation, first as “pseudocode” — the rules behind the legislation in a logical chain — then as human-readable legislation and finally as software code, designed to make it far easier for public servants and the public to work out who was eligible for what outcome. In the end, the team had working code for how to digitally deliver two policies.
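The New Zealand team’s actual code is not reproduced here, but a toy sketch suggests what eligibility rules expressed as software can look like; the thresholds and formula below are invented for illustration and are not the Rates Rebate Act’s real figures.

```python
from dataclasses import dataclass

# An illustrative sketch of "legislation as code" for a rates-rebate-style rule:
# eligibility and entitlement expressed as explicit, testable logic.
# All thresholds and the formula are invented, not the statutory values.

@dataclass
class Ratepayer:
    annual_income: float
    annual_rates_bill: float
    dependants: int
    is_retirement_village_resident: bool = False

INCOME_THRESHOLD = 26_000        # hypothetical figure
MAX_REBATE = 650                 # hypothetical figure
DEPENDANT_ALLOWANCE = 500        # hypothetical figure

def rates_rebate(p: Ratepayer) -> float:
    """Return the rebate a ratepayer is entitled to under these illustrative rules."""
    adjusted_income = p.annual_income - DEPENDANT_ALLOWANCE * p.dependants
    if adjusted_income > INCOME_THRESHOLD:
        return 0.0
    # Rebate covers part of the rates bill, capped at MAX_REBATE.
    return min(MAX_REBATE, 0.5 * p.annual_rates_bill)

print(rates_rebate(Ratepayer(annual_income=24_000, annual_rates_bill=2_000, dependants=1)))
```

Because the rule is an explicit, testable function, a caseworker’s tool and a public-facing calculator can call the same logic, and an amendment to the Act becomes a change to one function rather than a fresh round of interpretation.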
A step towards digital government
The implications of such techniques are significant. Firstly, machine-readable legislation could speed up interactions between government and business, sparing private organisations the costs in time and money they currently spend interpreting the laws they need to comply with.
If legislation changes, the machine can process it automatically and consistently, saving the cost of employing an expert, or a lawyer, to do this job.
More transformatively for policymaking itself, machine-readable legislation allows public servants to test the impact of policy before they implement it.
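What that pre-implementation testing could look like is sketched below, under invented assumptions: run a coded eligibility rule over a synthetic population and compare candidate settings before anything is enacted.

```python
import random

random.seed(1)

# A hedged sketch of pre-implementation policy testing: apply a coded eligibility rule
# to a synthetic population and compare two candidate thresholds.  All figures are invented.
population = [{"income": random.uniform(10_000, 60_000)} for _ in range(100_000)]

def eligible(person: dict, threshold: float) -> bool:
    return person["income"] <= threshold

for threshold in (25_000, 30_000):
    count = sum(eligible(p, threshold) for p in population)
    print(f"threshold {threshold}: {count} eligible ({count / len(population):.1%})")
```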
“What happens currently is that people design the policy up front and wait to see how it works when you eventually deploy it,” said Richard Pope, one of the original pioneers in the UK’s Government Digital Service (GDS) and the co-author of the UK’s digital service standard. “A better approach is to design the legislation in such a way that gives the teams that are making and delivering a service enough wiggle room to be able to test things.”…(More)”.
Michael C. Horowitz at the Bulletin of the Atomic Scientists: “Artificial intelligence (AI) is having a moment in the national security space. While the public may still equate the notion of artificial intelligence in the military context with the humanoid robots of the Terminator franchise, there has been a significant growth in discussions about the national security consequences of artificial intelligence. These discussions span academia, business, and governments, from Oxford philosopher Nick Bostrom’s concern about the existential risk to humanity posed by artificial intelligence to Tesla founder Elon Musk’s concern that artificial intelligence could trigger World War III to Vladimir Putin’s statement that leadership in AI will be essential to global power in the 21st century.
What does this really mean, especially when you move beyond the rhetoric of revolutionary change and think about the real world consequences of potential applications of artificial intelligence to militaries? Artificial intelligence is not a weapon. Instead, artificial intelligence, from a military perspective, is an enabler, much like electricity and the combustion engine. Thus, the effect of artificial intelligence on military power and international conflict will depend on particular applications of AI for militaries and policymakers. What follows are key issues for thinking about the military consequences of artificial intelligence, including principles for evaluating what artificial intelligence “is” and how it compares to technological changes in the past, what militaries might use artificial intelligence for, potential limitations to the use of artificial intelligence, and then the impact of AI military applications for international politics.
The potential promise of AI—including its ability to improve the speed and accuracy of everything from logistics to battlefield planning and to help improve human decision-making—is driving militaries around the world to accelerate their research into and development of AI applications. For the US military, AI offers a new avenue to sustain its military superiority while potentially reducing costs and risk to US soldiers. For others, especially Russia and China, AI offers something potentially even more valuable—the ability to disrupt US military superiority. National competition in AI leadership is as much or more an issue of economic competition and leadership than anything else, but the potential military impact is also clear. There is significant uncertainty about the pace and trajectory of artificial intelligence research, which means it is always possible that the promise of AI will turn into more hype than reality. Moreover, safety and reliability concerns could limit the ways that militaries choose to employ AI…(More)”.
In Navigation by Judgment, Dan Honig argues that high-quality implementation of foreign aid programs often requires contextual information that cannot be seen by those in distant headquarters. Tight controls and a focus on reaching pre-set measurable targets often prevent front-line workers from using skill, local knowledge, and creativity to solve problems in ways that maximize the impact of foreign aid. Drawing on a novel database of over 14,000 discrete development projects across nine aid agencies and eight paired case studies of development projects, Honig concludes that aid agencies will often benefit from giving field agents the authority to use their own judgments to guide aid delivery. This “navigation by judgment” is particularly valuable when environments are unpredictable and when accomplishing an aid program’s goals is hard to accurately measure.
Highlighting a crucial obstacle for effective global aid, Navigation by Judgment shows that the management of aid projects matters for aid effectiveness….(More)”.