Evaluating Identity Disclosure Risk in Fully Synthetic Health Data: Model Development and Validation


Paper by Khaled El Emam et al: “There has been growing interest in data synthesis for enabling the sharing of data for secondary analysis; however, there is a need for a comprehensive privacy risk model for fully synthetic data: If the generative models have been overfit, then it is possible to identify individuals from synthetic data and learn something new about them.

Objective: The purpose of this study is to develop and apply a methodology for evaluating the identity disclosure risks of fully synthetic data.

Methods: A full risk model is presented, which evaluates both identity disclosure and the ability of an adversary to learn something new if there is a match between a synthetic record and a real person. We term this “meaningful identity disclosure risk.” The model is applied on samples from the Washington State Hospital discharge database (2007) and the Canadian COVID-19 cases database. Both of these datasets were synthesized using a sequential decision tree process commonly used to synthesize health and social science data.

Results: The meaningful identity disclosure risk for both of these synthesized samples was below the commonly used 0.09 risk threshold (0.0198 and 0.0086, respectively), and 4 times and 5 times lower than the risk values for the original datasets, respectively.

Conclusions: We have presented a comprehensive identity disclosure risk model for fully synthetic data. The results for this synthesis method on 2 datasets demonstrate that synthesis can reduce meaningful identity disclosure risks considerably. The risk model can be applied in the future to evaluate the privacy of fully synthetic data….(More)”.
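
To make the idea of "meaningful identity disclosure" concrete, the following is a toy sketch only, not the authors' risk model. It assumes a synthetic record counts as a risk when it matches at least one real record on a set of quasi-identifiers and also carries a sensitive value that is true of one of those matched people. The function name, field names, and sample records are all hypothetical; only the 0.09 threshold comes from the abstract.

```python
from collections import defaultdict

RISK_THRESHOLD = 0.09  # threshold quoted in the abstract above


def toy_meaningful_risk(real, synthetic, quasi_ids, sensitive):
    """Toy estimate (an assumption, not the paper's formula).

    Each synthetic record contributes 1 / (number of real records sharing
    its quasi-identifiers), but only if one of those real records also
    shares its sensitive value. The result is the mean over synthetic records.
    """
    # Group real records into equivalence classes by quasi-identifier values.
    classes = defaultdict(list)
    for rec in real:
        classes[tuple(rec[q] for q in quasi_ids)].append(rec)

    total = 0.0
    for syn in synthetic:
        matches = classes.get(tuple(syn[q] for q in quasi_ids), [])
        if not matches:
            continue  # no real person matches: no identity disclosure
        # "Learn something": the synthetic sensitive value is true of a match.
        if any(m[sensitive] == syn[sensitive] for m in matches):
            total += 1.0 / len(matches)
    return total / len(synthetic) if synthetic else 0.0


# Hypothetical records for illustration only.
real = [
    {"age": 34, "zip": "98101", "dx": "asthma"},
    {"age": 34, "zip": "98101", "dx": "flu"},
    {"age": 60, "zip": "98052", "dx": "copd"},
]
synthetic = [
    {"age": 34, "zip": "98101", "dx": "asthma"},  # matches a class of 2
    {"age": 60, "zip": "98052", "dx": "flu"},     # matches, wrong diagnosis
    {"age": 25, "zip": "98000", "dx": "flu"},     # matches nobody
]

risk = toy_meaningful_risk(real, synthetic, ["age", "zip"], "dx")
print(f"toy risk = {risk:.4f}, below threshold: {risk < RISK_THRESHOLD}")
```

In this tiny example only the first synthetic record contributes, so the toy risk is far above the threshold. The paper's full model accounts for more factors than this sketch does.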

Federal Regulators Increase Focus on Patient Risks From Electronic Health Records


Ben Moscovitch at Pew: “…The Office of the National Coordinator for Health Information Technology (ONC) will collect clinicians’ feedback through a survey developed by the Urban Institute under a contract with the agency. ONC will release aggregated results as part of its EHR reporting program. Congress required the program’s creation in the 21st Century Cures Act, the wide-ranging federal health legislation enacted in 2016. The act directs ONC to determine which data to gather from health information technology vendors. That information can then be used to illuminate the strengths and weaknesses of EHR products, as well as industry trends.

The Pew Charitable Trusts, major medical organizations and hospital groups, and health information technology experts have urged that the reporting program examine usability-related patient risks. Confusing, cumbersome, and poorly customized EHR systems can cause health care providers to order the wrong drug or miss test results and other information critical to safe, effective treatment. Usability challenges also can increase providers’ frustration and, in turn, their likelihood of making mistakes.

The data collected from clinicians will shed light on these problems, encourage developers to improve the safety of their products, and help hospitals and doctor’s offices make better-informed decisions about the purchase, implementation, and use of these tools. Research shows that aggregated data about EHRs can generate product-specific insights about safety deficiencies, even when health care facilities implement the same system in distinct ways….(More)”.

How the U.S. Military Buys Location Data from Ordinary Apps


Joseph Cox at Vice: “The U.S. military is buying the granular movement data of people around the world, harvested from innocuous-seeming apps, Motherboard has learned. The most popular app among a group Motherboard analyzed connected to this sort of data sale is a Muslim prayer and Quran app that has more than 98 million downloads worldwide. Others include a Muslim dating app, a popular Craigslist app, an app for following storms, and a “level” app that can be used to help, for example, install shelves in a bedroom.

Through public records, interviews with developers, and technical analysis, Motherboard uncovered two separate, parallel data streams that the U.S. military uses, or has used, to obtain location data. One relies on a company called Babel Street, which creates a product called Locate X. U.S. Special Operations Command (USSOCOM), a branch of the military tasked with counterterrorism, counterinsurgency, and special reconnaissance, bought access to Locate X to assist on overseas special forces operations. The other stream is through a company called X-Mode, which obtains location data directly from apps, then sells that data to contractors, and by extension, the military.

The news highlights the opaque location data industry and the fact that the U.S. military, which has infamously used other location data to target drone strikes, is purchasing access to sensitive data. Many of the users of apps involved in the data supply chain are Muslim, which is notable considering that the United States has waged a decades-long war on predominantly Muslim terror groups in the Middle East, and has killed hundreds of thousands of civilians during its military operations in Pakistan, Afghanistan, and Iraq. Motherboard does not know of any specific operations in which this type of app-based location data has been used by the U.S. military.

The apps sending data to X-Mode include Muslim Pro, an app that reminds users when to pray and what direction Mecca is in relation to the user’s current location. The app has been downloaded over 50 million times on Android, according to the Google Play Store, and over 98 million in total across other platforms including iOS, according to Muslim Pro’s website….(More)”.

Building Trust for Inter-Organizational Data Sharing: The Case of the MLDE


Paper by Heather McKay, Sara Haviland, and Suzanne Michael: “There is increasing interest in sharing data across agencies and even between states that was once siloed in separate agencies. Driving this is a need to better understand how people experience education and work, and their pathways through each. A data-sharing approach offers many possible advantages, allowing states to leverage pre-existing data systems to conduct increasingly sophisticated and complete analyses. However, information sharing across state organizations presents a series of complex challenges, one of which is the central role trust plays in building successful data-sharing systems. Trust building between organizations is therefore crucial to ensuring project success.

This brief examines the process of building trust within the context of the development and implementation of the Multistate Longitudinal Data Exchange (MLDE). The brief is based on research and evaluation activities conducted by Rutgers’ Education & Employment Research Center (EERC) over the past five years, which included 40 interviews with state leaders and the Western Interstate Commission for Higher Education (WICHE) staff, observations of user group meetings, surveys, and MLDE document analysis. It is one in a series of MLDE briefs developed by EERC….(More)”.

unBail


About: “The criminal legal system is a maze of laws, language, and unwritten rules that lawyers are trained to maneuver to represent defendants.

However, according to the Bureau of Justice Statistics, only 27% of county public defender’s offices meet national caseload recommendations for cases per attorney, meaning that most public defenders are overworked, leaving their clients underrepresented.

Defendants must complete an estimated 200 discrete tasks during their legal proceeding. This leaves them overwhelmed, lost, and profoundly disadvantaged while attempting to navigate the system….

We have… created a product that acts as the trusted advisor for defendants and their families as they navigate the criminal legal system. We aim to deliver valuable and relevant legal information (but not legal advice) to the user in plain language, empowering them to advocate for themselves and proactively plan for the future and access social services if necessary. The user is also encouraged to give feedback on their experience at each step of the process in the hope that this can be used to improve the system….(More)”

The Work of the Future: Shaping Technology and Institutions


Report by David Autor, David Mindell and Elisabeth Reynolds for the MIT Future of Work Task Force: “The world now stands on the cusp of a technological revolution in artificial intelligence and robotics that may prove as transformative for economic growth and human potential as were electrification, mass production, and electronic telecommunications in their eras. New and emerging technologies will raise aggregate economic output and boost the wealth of nations. Will these developments enable people to attain higher living standards, better working conditions, greater economic security, and improved health and longevity? The answers to these questions are not predetermined. They depend upon the institutions, investments, and policies that we deploy to harness the opportunities and confront the challenges posed by this new era.

How can we move beyond unhelpful prognostications about the supposed end of work and toward insights that will enable policymakers, businesses, and people to better navigate the disruptions that are coming and underway? What lessons should we take from previous epochs of rapid technological change? How is it different this time? And how can we strengthen institutions, make investments, and forge policies to ensure that the labor market of the 21st century enables workers to contribute and succeed?

To help answer these questions, and to provide a framework for the Task Force’s efforts over the next year, this report examines several aspects of the interaction between work and technology….(More)”.

Four Principles to Make Data Tools Work Better for Kids and Families


Blog by the Annie E. Casey Foundation: “Advanced data analytics are deeply embedded in the operations of public and private institutions and shape the opportunities available to youth and families. Whether these tools benefit or harm communities depends on their design, use and oversight, according to a report from the Annie E. Casey Foundation.

Four Principles to Make Advanced Data Analytics Work for Children and Families examines the growing field of advanced data analytics and offers guidance to steer the use of big data in social programs and policy….

The Foundation report identifies four principles — complete with examples and recommendations — to help steer the growing field of data science in the right direction.

Four Principles for Data Tools

  1. Expand opportunity for children and families. Most established uses of advanced analytics in education, social services and criminal justice focus on problems facing youth and families. Promising uses of advanced analytics go beyond mitigating harm and help to identify so-called odds beaters and new opportunities for youth.
    • Example: The Children’s Data Network at the University of Southern California is helping the state’s departments of education and social services explore why some students succeed despite negative experiences and what protective factors merit more investment.
    • Recommendation: Government and its philanthropic partners need to test if novel data science applications can create new insights and when it’s best to apply them.
       
  2. Provide transparency and evidence. Advanced analytical tools must earn and maintain a social license to operate. The public has a right to know what decisions these tools are informing or automating, how they have been independently validated, and who is accountable for answering and addressing concerns about how they work.
    • Recommendations: Local and state task forces can be excellent laboratories for testing how to engage youth and communities in discussions about advanced analytics applications and the policy frameworks needed to regulate their use. In addition, public and private funders should avoid supporting private algorithms whose design and performance are shielded by trade secrecy claims. Instead, they should fund and promote efforts to develop, evaluate and adapt transparent and effective models.
       
  3. Empower communities. The field of advanced data analytics often treats children and families as clients, patients and consumers. Put to better use, these same tools can help elucidate and reform the systems acting upon children and families. For this shift to occur, institutions must focus analyses and risk assessments on structural barriers to opportunity rather than individual profiles.
    • Recommendation: In debates about the use of data science, greater investment is needed to amplify the voices of youth and their communities.
       
  4. Promote equitable outcomes. Useful advanced analytics tools should promote more equitable outcomes for historically disadvantaged groups. New investments in advanced analytics are only worthwhile if they aim to correct the well-documented bias embedded in existing models.
    • Recommendations: Advanced analytical tools should only be introduced when they reduce the opportunity deficit for disadvantaged groups — a move that will take organizing and advocacy to establish and new policy development to institutionalize. Philanthropy and government also have roles to play in helping communities test and improve tools and examples that already exist….(More)”.

Putting Games to Work in the Battle Against COVID-19


Sara Frueh at the National Academies: “While video games often give us a way to explore other worlds, they can also help us learn more about our own — including how to navigate a pandemic. That was the premise underlying “Jamming the Curve,” a competition that enlisted over 400 independent video game developers around the world to develop concepts for games that reflect the real-world dynamics of COVID-19.

“Games can help connect our individual actions to larger-scale impact … and help translate data into engaging stories,” said Rick Thomas, associate program officer of LabX, a program of the National Academy of Sciences that supports creative approaches to public engagement.

Working with partners IndieCade and Georgia Tech, LabX brought Jamming the Curve to life over two weeks in September.

The “game jam” generated over 50 game concepts that drop players into a wide array of roles — from a subway rider trying to minimize the spread of infection among passengers, to a grocery store cashier trying to help customers while avoiding COVID-19, to a fox ninja tasked with dispensing masks to other forest creatures.

The five winning game concepts (see below) were announced at an award ceremony in late October, where each winning team was given a $1,000 prize and the chance to compete for a $20,000 grant to develop their game further.

The power of games

“Sometimes public health concepts can be a little dry,” said Carla Alvarado, a public health expert and program officer at the National Academies who served as a judge for the competition, during the awards ceremony. “Games package that information — it’s bite-sized, it’s digestible, and it’s palatable.”

And because games engage the senses and involve movement, they help people remember what they learn, she said. “That type of learning — experiential learning — helps retain a lot of the concepts.”

The idea of doing a game jam around COVID-19 began when Janet Murray of Georgia Tech reached out to Stephanie Barish and her colleagues at IndieCade about games’ potential to help express the complicated data around the disease. “Not everybody really knows how to look at all of that information, and games are so wonderful at reaching people in ways that people understand,” Barish said.

Rick Thomas and the LabX team heard about the idea for Jamming the Curve and saw how they could contribute. The program had experience organizing other game projects around role-playing and storytelling — along with access to a range of scientists and public health experts through the National Academies’ networks.

“Given the high stakes of the topic around COVID-19 and the amount of misinformation around the pandemic, we really needed to make sure that we were doing this right when it came to creating these games,” said Thomas. LabX helped to recruit public health professionals involved in the COVID-19 response, as well as experts in science communication and risk perception, to serve as mentors to the game developers.

Play the Winning Games!

Trailers and some playable prototypes for the five winning game concepts can be found online:

  • Everyday Hero, in which players work to stop the spread of COVID-19 through measures such as social distancing and mask use
  • PandeManager, which gives players the job of a town’s mayor who must slow the spread of disease among citizens
  • Lab Hero, in which users play a first responder who is working hard to find a vaccine while following proper health protocols
  • Cat Colony Crisis, in which a ship of space-faring cats must deal with a mysterious disease outbreak
  • Outbreak in Space, which challenges players to save friends and family from a spreading epidemic in an alien world

All of the games submitted to Jamming the Curve can be found at itch.io.

The games needed to be fun as well as scientifically accurate, and so IndieCade, Georgia Tech, and Seattle Indies recruited gaming experts who could advise participants on how to make their creations engaging and easy to understand….(More)”.

The Human Experience Will Not Be Quantified


Phil Klay at the New York Times: “…Stories are a quintessentially human method of responding to the chaos and uncertainty of the world. Science is a quintessentially human method of trying to control that chaos, and data is its raw material. Adrift in the world, uncertain of the future, hostage to fate, but possessed of increasingly powerful tools for carving up pieces of the world and putting them under the microscope, is it any wonder that we increasingly turn to science when looking for deliverance from our human predicaments?

Science, after all, will eventually bring us to the end of the pandemic, just as it has helped limit the damage through better treatments and proof of the benefits of wearing masks. “Science over fiction,” was one slogan of the Joe Biden campaign, a welcome message to those who’d like public policy tethered more to reality than political fantasy.

But because science supposedly gives clear answers about everything from how to open schools in a pandemic to who will be elected president, we tend to rush to embrace it as a panacea. Some, like the popular podcaster and author Sam Harris, even think science can answer moral questions. Rarely does it occur to us how often the invocation of “science” is used to mask value judgments, or political deliberation.

When the Centers for Disease Control and Prevention and the American Academy of Pediatrics released separate guidelines for reopening schools, the difference lay not in the underlying science but in their institutional priorities, one focused on disease spread and the other on the welfare of children. Likewise, the difference in how New York City handled the reopenings of day cares and schools reflected not simply science, but also what could be more easily demanded of workers who lacked the protection of a powerful union.

As much as we’d like to believe in “science over fiction,” decisions in the real world require negotiating between what we think the data means, what human value we’d like to assign to it and what stories about it we can get others to accept. Data alone is not knowledge, and it is certainly not wisdom. It rarely says as much as we think it does.

Yet its allure is undeniable, persistent. As I watched the election returns on Tuesday and Wednesday, I did so with the sinking feeling that I’d been fooled again by the lure of data. Even though it looked like Biden could still win, it was clear that those hard numbers I’d been absorbing for weeks, based on fine-tuned methodologies, correcting for past mistakes, aggregated to minimize chances of error, hadn’t come close to reflecting reality. “You are literally working on an essay about the problems with relying too much on data,” my wife told me the morning after the election, “and yet you were so confident in the polls.”…(More)”

Your phone already tracks your location. Now that data could fight voter suppression


Article by Seth Rosenblatt: “Smartphone location data is a dream for marketers who want to know where you go and how long you spend there—and a privacy nightmare. But this kind of geolocation data could also be used to protect people’s voting rights on Election Day.

The newly founded nonprofit Center for New Data is now tracking voters at the polls using smartphone location data to help researchers understand how easy—or difficult—it is for people to vote in different places. Called the Observing Democracy project, the nonpartisan effort is making data on how far people have to travel to vote and how long they have to wait in line available in a privacy-friendly way so it can be used to craft election policies that ensure voting is accessible for everyone.

Election data has already fueled changes in various municipalities and states. A 66-page lawsuit filed by Fair Fight Action against the state of Georgia in the wake of Stacey Abrams’s narrow loss to Brian Kemp in the 2018 gubernatorial race relies heavily on data to back its assertions of unconstitutionally delayed and deferred voter registration, unfair challenges to absentee and provisional ballots, and unjustified purges of voter rolls—all hallmarks of voter suppression.

The promise of Observing Democracy is to make this type of impactful data available much more rapidly than ever before. Barely a month old, Observing Democracy isn’t wasting any time: Its all-volunteer staffers will be receiving data potentially as soon as Nov. 4 on voter wait times at polling locations, travel times to polling stations, and how frequently ballot drop-off boxes are visited, courtesy of location-data mining companies X-Mode Social and Veraset, which was spun off from SafeGraph….(More)”.
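
As a rough illustration of how travel distance and wait time might be derived from location pings, here is a minimal sketch. It is not Observing Democracy's actual method, which the article does not describe. It assumes each ping is a (Unix seconds, latitude, longitude) tuple, measures distance with the haversine formula, and treats the span between the first and last ping within a chosen radius of a polling site as the time spent there. The function names, the 100-meter radius, and the coordinates are all hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def dwell_minutes(pings, site, radius_km=0.1):
    """Minutes between the first and last ping within radius_km of site.

    pings: iterable of (unix_seconds, lat, lon); site: (lat, lon).
    Returns 0.0 when fewer than two pings fall inside the radius.
    """
    inside = [t for t, lat, lon in pings
              if haversine_km(lat, lon, site[0], site[1]) <= radius_km]
    if len(inside) < 2:
        return 0.0
    return (max(inside) - min(inside)) / 60.0


# Hypothetical polling site, home location, and pings for one device.
site = (33.749, -84.388)
home = (33.800, -84.388)
pings = [
    (0, 33.749, -84.388),
    (600, 33.749, -84.388),
    (1800, 33.749, -84.388),
    (2400, 33.800, -84.388),  # back home, outside the radius
]

travel_km = haversine_km(home[0], home[1], site[0], site[1])
wait_min = dwell_minutes(pings, site)
print(f"travel = {travel_km:.2f} km, time at site = {wait_min:.0f} min")
```

A privacy-friendly release of the kind the article mentions would presumably report only aggregates per polling place rather than per-device values. This sketch leaves that step out.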