How Data Mining could have prevented Tunisia’s Terror attack in Bardo Museum


Wassim Zoghlami at Medium: “…Data mining is the process of posing queries and extracting useful patterns or trends, often previously unknown, from large amounts of data using various techniques such as those from pattern recognition and machine learning. Lately there has been a big interest in leveraging data mining for counter-terrorism applications.

Using the data on more than 50,000 ISIS-connected Twitter accounts, I was able to establish an understanding of some factors that determine how often ISIS attacks occur, what types of terror strikes are used in which geopolitical situations, and many other criteria, through graphs of the frequency of hashtag usage and the frequency of particular groups of words used in the tweets.

A simple data mining project on some of the recurring hashtags and word sequences typically used by ISIS militants in their tweets yielded surprising results. The results show a rise in some keywords in the tweets starting March 15, three days before the Bardo museum attack.

Some of the frequent keywords and hashtags that showed an unusual peak from March 15, three days before the attack:

#طواغيت تونس : Tyrants of Tunisia = a reference to the military

بشرى تونس : Good news for Tunisia.

قريبا تونس : Soon in Tunisia.

#إفريقية_للإعلام : The head of social media of Afriqiyah

#غزوة_تونس : The foray of Tunis…
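
The kind of keyword-frequency peak described above can be illustrated with a minimal Python sketch. It is not the author's method, which the excerpt does not describe: the function names, the baseline window, the threshold and the tweet data are all hypothetical, and the sample counts are synthetic, chosen only to show how a spike relative to an earlier baseline might be flagged.

```python
from collections import Counter
from datetime import date, timedelta
from statistics import mean, pstdev


def daily_counts(tweets, keyword):
    """Count, per day, the tweets whose text contains the keyword."""
    counts = Counter()
    for day, text in tweets:
        if keyword in text:
            counts[day] += 1
    return counts


def flag_peaks(counts, days, baseline_len=7, threshold=3.0):
    """Flag days whose count exceeds baseline mean + threshold * std.

    The first `baseline_len` days form the baseline (an assumption made
    for this sketch); only later days can be flagged.
    """
    series = [counts.get(d, 0) for d in days]
    baseline = series[:baseline_len]
    mu = mean(baseline)
    sigma = pstdev(baseline) or 1.0  # avoid a zero-width band on flat baselines
    cutoff = mu + threshold * sigma
    return [d for d, c in zip(days[baseline_len:], series[baseline_len:]) if c > cutoff]


# Synthetic data: one matching tweet per day, then six per day from March 15.
start = date(2015, 3, 5)
days = [start + timedelta(days=i) for i in range(13)]  # March 5 to March 17
tweets = []
for d in days:
    n = 6 if d >= date(2015, 3, 15) else 1
    tweets += [(d, "... قريبا تونس ...")] * n
    tweets.append((d, "unrelated text"))  # must not be counted

counts = daily_counts(tweets, "قريبا تونس")
peaks = flag_peaks(counts, days)
print(peaks)
```

A fixed baseline window keeps the sketch short; a rolling window or a per-keyword baseline would be equally plausible choices.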

Big Data and Data Mining should be used for national security intelligence

Tunisia’s national security services have to leverage big data to predict such attacks and to achieve their objectives as the volume of digital data grows. One of the challenges facing data mining techniques is that, to carry out effective data mining and extract useful information for counterterrorism and national security, we need to gather all kinds of information about individuals. However, this information could be a threat to the individuals’ privacy and civil liberties…(More)”

Domestic Drones and Privacy: A Primer


Richard M. Thompson for the Congressional Research Service: “There are two overarching privacy issues implicated by domestic drone use. The first is defining what “privacy” means in the context of aerial surveillance. Privacy is an ambiguous term that can mean different things in different contexts. This becomes readily apparent when attempting to apply traditional privacy concepts such as personal control and secrecy to drone surveillance. Other, more nuanced privacy theories such as personal autonomy and anonymity must be explored to get a fuller understanding of the privacy risks posed by drone surveillance. Moreover, with ever-increasing advances in data storage and manipulation, the subsequent aggregation, use, and retention of drone-obtained data may warrant an additional privacy impact analysis.

The second predominant issue is which entity should be responsible for regulating drones and privacy. As the final arbiter of the Constitution, the courts are naturally looked upon to provide at least the floor of privacy protection from UAS surveillance, but as will be discussed in this report, under current law, this protection may be minimal….(More)”

Health Big Data in the Commercial Context


CDT Press Release: “This paper is the third in a series of three, each of which explores health big data in a different context. The first — on health big data in the government context — is available here, and the second — on health big data in the clinical context — is available here.

Consumers are increasingly using mobile phone apps and wearable devices to generate and share data on health and wellness. They are using personal health record tools to access and copy health records and move them to third party platforms. They are sharing health information on social networking sites. They leave digital health footprints when they conduct online searches for health information. The health data created, accessed, and shared by consumers using these and many other tools can range from detailed clinical information, such as downloads from an implantable device and details about medication regimens, to data about weight, caloric intake, and exercise logged with a smart phone app.

These developments offer a wealth of opportunities for health care and personal wellness. However, privacy questions arise due to the volume and sensitivity of health data generated by consumer-focused apps, devices, and platforms, including the potential analytics uses that can be made of such data.

Many of the privacy issues that face traditional health care entities in the big data era also apply to app developers, wearable device manufacturers, and other entities not part of the traditional health care ecosystem. These include questions of data minimization, retention, and secondary use. Notice and consent pose challenges, especially given the limits of presenting notices on mobile device screens, and the fact that consumer devices may be bought and used without consultation with a health care professional. Security is a critical issue as well.

However, the privacy and security provisions of the Health Insurance Portability and Accountability Act (HIPAA) do not apply to most app developers, device manufacturers or others in the consumer health space. This has benefits to innovation, as innovators would otherwise have to struggle with the complicated HIPAA rules. However, the current vacuum also leaves innovators without clear guidance on how to appropriately and effectively protect consumers’ health data. Given the promise of health apps, consumer devices, and consumer-facing services, and given the sensitivity of the data that they collect and share, it is important to provide such guidance….

As the source of privacy guidelines, we look to the framework provided by the Fair Information Practice Principles (FIPPs) and explore how it could be applied in an age of big data to patient-generated data. The FIPPs have influenced to varying degrees most modern data privacy regimes. While some have questioned the continued validity of the FIPPs in the current era of mass data collection and analysis, we consider here how the flexibility and rigor of the FIPPs provide an organizing framework for responsible data governance, promoting innovation, efficiency, and knowledge production while also protecting privacy. Rather than proposing an entirely new framework for big data, which could be years in the making at best, using the FIPPs would seem the best approach in promoting responsible big data practices. Applying the FIPPs could also help synchronize practices between the traditional health sector and emerging consumer products….(More)”

Big Other: Surveillance Capitalism and the Prospects of an Information Civilization


New paper by Shoshana Zuboff in the Journal of Information Technology: “This article describes an emergent logic of accumulation in the networked sphere, ‘surveillance capitalism,’ and considers its implications for ‘information civilization.’ Google is to surveillance capitalism what General Motors was to managerial capitalism. Therefore the institutionalizing practices and operational assumptions of Google Inc. are the primary lens for this analysis as they are rendered in two recent articles authored by Google Chief Economist Hal Varian. Varian asserts four uses that follow from computer-mediated transactions: ‘data extraction and analysis,’ ‘new contractual forms due to better monitoring,’ ‘personalization and customization,’ and ‘continuous experiments.’ An examination of the nature and consequences of these uses sheds light on the implicit logic of surveillance capitalism and the global architecture of computer mediation upon which it depends. This architecture produces a distributed and largely uncontested new expression of power that I christen: ‘Big Other.’ It is constituted by unexpected and often illegible mechanisms of extraction, commodification, and control that effectively exile persons from their own behavior while producing new markets of behavioral prediction and modification. Surveillance capitalism challenges democratic norms and departs in key ways from the centuries long evolution of market capitalism….(More)”

The big medical data miss: challenges in establishing an open medical resource


Eric J. Topol in Nature: “I call for an international open medical resource to provide a database for every individual’s genomic, metabolomic, microbiomic, epigenomic and clinical information. This resource is needed in order to facilitate genetic diagnoses and transform medical care.

“We are each, in effect, one-person clinical trials”

Laurie Becklund was a noted journalist who died in February 2015 at age 66 from breast cancer. Soon thereafter, the Los Angeles Times published her op-ed entitled “As I lay dying” (Ref. 1). She lamented, “We are each, in effect, one-person clinical trials. Yet the knowledge generated from those trials will die with us because there is no comprehensive database of metastatic breast cancer patients, their characteristics and what treatments did and didn’t help them”. She went on to assert that, in the era of big data, the lack of such a resource is “criminal”, and she is absolutely right….

Around the same time as this important op-ed, the MIT Technology Review published their issue entitled “10 Breakthrough Technologies 2015” and on the list was the “Internet of DNA” (Ref. 2). While we are often reminded that the world we live in is becoming the “Internet of Things”, I have not seen this terminology applied to DNA before. The article on the “Internet of DNA” decried, “the unfolding calamity in genomics is that a great deal of life-saving information, though already collected, is inaccessible”. It called for a global network of millions of genomes and cited the Matchmaker Exchange as a frontrunner. For this international initiative, a growing number of research and clinical teams have come together to pool and exchange phenotypic and genotypic data for individual patients with rare disorders, in order to share this information and assist in the molecular diagnosis of individuals with rare diseases….

an Internet of DNA — or what I have referred to as a massive, open, online medicine resource (MOOM) — would help to quickly identify the genetic cause of the disorder (Ref. 4) and, in the process of doing so, precious guidance for prevention, if necessary, would become available for such families who are currently left in the lurch as to their risk of suddenly dying.

So why aren’t such MOOMs being assembled? ….

There has also been much discussion related to privacy concerns that patients might be unwilling to participate in a massive medical information resource. However, multiple global consumer surveys have shown that more than 80% of individuals are ready to share their medical data provided that they are anonymized and their privacy maximally assured (Ref. 4). Indeed, just 24 hours into Apple’s ResearchKit initiative, a smartphone-based medical research programme, there were tens of thousands of patients with Parkinson disease, asthma or heart disease who had signed on. Some individuals are even willing to be “open source” — that is, to make their genetic and clinical data fully available with free access online, without any assurance of privacy. This willingness is seen by the participants in the recently launched Open Humans initiative. Along with the Personal Genome Project, Go Viral and American Gut have joined in this initiative. Still, studies suggest that most individuals would only agree to be medical research participants if their identities would not be attainable. Unfortunately, to date, little has been done to protect individual medical privacy, for which there are both promising new data protection technological approaches (Ref. 4) and the need for additional governmental legislation.

This leaves us with perhaps the major obstacle that is holding back the development of MOOMs — researchers. Even with big, team science research projects culling together hundreds of investigators and institutions throughout the world, such as the Global Alliance for Genomics and Health (GA4GH), the data obtained clinically are just as Laurie Becklund asserted in her op-ed — “one-person clinical trials” (Ref. 1). While undertaking the construction of a MOOM is a huge endeavour, there is little motivation for researchers to take on this task, as this currently offers no academic credit and has no funding source. But the transformative potential of MOOMs to improve medical care is extraordinary. Rather than having the knowledge die with each of us, the time has come to take down the walls of academic medical centres and health-care systems around the world, and create a global knowledge medical resource that leverages each individual’s information to help one another…(More)”

Open Data Literature Review


Review by Emmie Tran and Ginny Scholtes: “Open data describes large datasets that governments at all levels release online and free of charge for analysis by anyone for any purpose. Entrepreneurs may use open data to create new products and services, and citizens may use it to gain insight into the government. A plethora of time saving and other useful applications have emerged from open data feeds, including more accurate traffic information, real-time arrival of public transportation, and information about crimes in neighborhoods. But data held by the government is implicitly or explicitly about individuals. While open government is often presented as an unqualified good, sometimes open data can identify individuals or groups, leading to invasions of privacy and disparate impact on vulnerable populations.

This review provides background to parties interested in open data, specifically for those attending the 19th Annual BCLT/BTLJ Symposium on open data. Part I defines open data, focusing on the origins of the open data movement and the types of data subject to government retention and public access. Part II discusses how open data can benefit society, and Part III delves into the many challenges and dangers of open data. Part IV addresses these challenges, looking at how the United States and other countries have implemented open data regimes, and considering some of the proposed measures to mitigate the dangers of open data….(More)”

The End of Asymmetric Information


Essay by Alex Tabarrok and Tyler Cowen: “Might the age of asymmetric information – for better or worse – be over? Market institutions are rapidly evolving to a situation where very often the buyer and the seller have roughly equal knowledge. Technological developments are giving everyone who wants it access to the very best information when it comes to product quality, worker performance, matches to friends and partners, and the nature of financial transactions, among many other areas.

These developments will have implications for how markets work, how much consumers benefit, and also economic policy and the law. As we will see, there may be some problematic sides to these new arrangements, specifically when it comes to privacy. Still, a large amount of economic regulation seems directed at a set of problems which, in large part, no longer exist…

Many “public choice” problems are really problems of asymmetric information. In William Niskanen’s (1974) model of bureaucracy, government workers usually benefit from larger bureaus, and they are able to expand their bureaus to inefficient size because they are the primary providers of information to politicians. Some bureaus, such as the NSA and the CIA, may still be able to use secrecy to benefit from information asymmetry. For instance they can claim to politicians that they need more resources to deter or prevent threats, and it is hard for the politicians to have well-informed responses on the other side of the argument. Timely, rich information about most other bureaucracies, however, is easily available to politicians and increasingly to the public as well. As information becomes more symmetric, Niskanen’s (1974) model becomes less applicable, and this may help check the growth of unneeded bureaucracy.

Cheap sensors are greatly extending how much information can be economically gathered and analyzed. It’s not uncommon for office workers to have every key stroke logged. When calling customer service, who has not been told “this call may be monitored for quality control purposes?” Service-call workers have their location tracked through cell phones. Even information that once was thought to be purely subjective can now be collected and analyzed, often with the aid of smart software or artificial intelligence. One firm, for example, uses badges equipped with microphones, accelerometers, and location sensors to measure tone of voice, posture, and body language, as well as who spoke to whom and for how long (Lohr 2014). The purpose is not only to monitor workers but to deduce when, where and why workers are the most productive. We are again seeing trade-offs which bring greater productivity, and limit asymmetric information, albeit at the expense of some privacy.

As information becomes more prevalent and symmetric, earlier solutions to asymmetric problems will become less necessary. When employers do not easily observe workers, for example, employers may pay workers unusually high wages, generating a rent. Workers will then work at high levels despite infrequent employer observation, to maintain their future rents (Shapiro and Stiglitz 1984). But those higher wages involved a cost, namely that fewer workers were hired, and the hires that were made often were directed to people who were already known to the firm. Better monitoring of workers will mean that employers will hire more people and furthermore they may be more willing to take chances on risky outsiders, rather than those applicants who come with impeccable pedigree. If the outsider does not work out and produce at an acceptable level, it is easy enough to figure this out and fire them later on….(More)”

The Healing Power of Your Own Medical Data


in the New York Times: “Steven Keating’s doctors and medical experts view him as a citizen of the future.

A scan of his brain eight years ago revealed a slight abnormality — nothing to worry about, he was told, but worth monitoring. And monitor he did, reading and studying about brain structure, function and wayward cells, and obtaining a follow-up scan in 2010, which showed no trouble.

But he knew from his research that his abnormality was near the brain’s olfactory center. So when he started smelling whiffs of vinegar last summer, he suspected they might be “smell seizures.”

He pushed doctors to conduct an M.R.I., and three weeks later, surgeons in Boston removed a cancerous tumor the size of a tennis ball from his brain.

At every stage, Mr. Keating, a 26-year-old doctoral student at the Massachusetts Institute of Technology’s Media Lab, has pushed and prodded to get his medical information, collecting an estimated 70 gigabytes of his own patient data by now. His case points to what medical experts say could be gained if patients had full and easier access to their medical information. Better-informed patients, they say, are more likely to take better care of themselves, comply with prescription drug regimens and even detect early-warning signals of illness, as Mr. Keating did.

“Today he is a big exception, but he is also a glimpse of what people will want: more and more information,” said Dr. David W. Bates, chief innovation officer at Brigham and Women’s Hospital.

Some of the most advanced medical centers are starting to make medical information more available to patients. Brigham and Women’s, where Mr. Keating had his surgery, is part of the Partners HealthCare Group, which now has 500,000 patients with web access to some of the information in their health records including conditions, medications and test results.

Other medical groups are beginning to allow patients online access to the notes taken by physicians about them, in an initiative called OpenNotes. In a yearlong evaluation project at medical groups in three states, more than two-thirds of the patients reported having a better understanding of their health and medical conditions, adopting healthier habits and taking their medications as prescribed more regularly.

The medical groups with OpenNotes programs include Beth Israel Deaconess Medical Center in Boston, Geisinger Health System in Pennsylvania, Harborview Medical Center in Seattle, the Mayo Clinic, the Cleveland Clinic and the Veterans Affairs department. By now, nearly five million patients in America have been given online access to their notes.

As an articulate young scientist who had studied his condition, Mr. Keating had a big advantage over most patients in obtaining his data. He knew what information to request, spoke the language of medicine and did not need help. The information he collected includes the video of his 10-hour surgery, dozens of medical images, genetic sequencing data and 300 pages of clinical documents. Much of it is on his website, and he has made his medical data available for research….

Opening data to patients raises questions. Will worried patients inundate physicians with time-consuming questions? Will sharing patient data add to legal risks? One detail in the yearlong study of OpenNotes underlines doctors’ concerns; 105 primary physicians completed the study, but 143 declined to participate.

Still, the experience of the doctors in the evaluation seemed reassuring. Only 3 percent said they spent more time answering patient questions outside of visits. Yet knowing that patients could read the notes, one-fifth of the physicians said they changed the way they wrote about certain conditions, like substance abuse and obesity.

Evidence of the benefit to individuals from sharing information rests mainly on a few studies so far. For example, 55 percent of the members of the epilepsy community on PatientsLikeMe, a patient network, reported that sharing information and experiences with others helped them learn about seizures, and 27 percent said it helped them be more adherent to their medications.

Mr. Keating has no doubts. “Data can heal,” he said. “There is a huge healing power to patients understanding and seeing the effects of treatments and medications.”

Health information, by its very nature, is personal. So even when names and other identifiers are stripped off, sharing personal health data more freely with patients, health care providers and researchers raises thorny privacy issues.

Mr. Keating says he is a strong believer in privacy, but he personally believes that the benefits outweigh the risks — and whether to share data or not should be an individual’s choice and an individual responsibility.

Not everyone, surely, would be as comfortable as Mr. Keating is sharing all his medical information. But he says he believes that people will increasingly want access to their medical data and will share it, especially younger people reared on social networks and smartphones.

“This is what the next generation, which lives on data, is going to want,” Mr. Keating said….(More)”

Sensor Law


Paper by Sandra Braman: “For over two decades, information policy-making for human society has been increasingly supplemented, supplanted, and/or superseded by machinic decision-making; over three decades since legal decision-making has been explicitly put in place to serve machinic rather than social systems; and over four decades since designers of the Internet took the position that they were serving non-human (machinic, or daemon) users in addition to humans. As the “Internet of Things” becomes more and more of a reality, these developments increasingly shape the nature of governance itself. This paper’s discussion of contemporary trends in these diverse modes of human-computer interaction at the system level — interactions between social systems and technological systems — introduces the changing nature of the law as a sociotechnical problem in itself. In such an environment, technological innovations are often also legal innovations, and legal developments require socio-technical analysis as well as social, legal, political, and cultural approaches.

Examples of areas in which sensors are already receiving legal attention are rife. A non-comprehensive listing includes privacy concerns beginning but not ending with those raised by sensors embedded in phones and geolocation devices, which are the most widely discussed and those of which the public is most aware. Sensor issues arise in environmental law, health law, marine law, intellectual property law, and as they are raised by new technologies in use for national security purposes that include those confidence- and security-building measures intended for peacekeeping. They are raised by liability issues for objects that range from cars to ovens. And sensor issues are at the core of concerns about “telemetric policing,” as that is coming into use not only in North America and Europe, but in societies such as that of Brazil as well.

Sensors are involved in every stage of legal processes, from identification of persons of interest to determination of judgments and consequences of judgments. Their use significantly alters the historically-developed distinction among types of decision-making meant to come into use at different stages of the process, raising new questions about when, and how, human decision-making needs to dominate and when, and how, technological innovation might need to be shaped by the needs of social rather than human systems.

This paper will focus on the legal dimensions of sensors used in ubiquitous embedded computing….(More)”

Why Google’s Waze Is Trading User Data With Local Governments


Parmy Olson at Forbes: “In Rio de Janeiro most eyes are on the final, nail-biting matches of the World Cup. Over in the command center of the city’s department of transport though, they’re on a different set of screens altogether.

Planners there are watching the aggregated data feeds of thousands of smartphones being walked or driven around a city, thanks to two popular travel apps, Waze and Moovit.

The goal is traffic management, and it involves swapping data for data. More cities are lining up to get access, and while the data the apps are sharing is all anonymous for now, identifying details could get more specific if cities like what they see, and people become more comfortable with being monitored through their smartphones in return for incentives.

Rio is the first city in the world to collect real-time data both from drivers who use the Waze navigation app and pedestrians who use the public-transportation app Moovit, giving it an unprecedented view on thousands of moving points across the sprawling city. Rio is also talking to the popular cycling app Strava to start monitoring how cyclists are moving around the city too.

All three apps are popular, consumer services which, in the last few months, have found a new way to make their crowdsourced data useful to someone other than advertisers. While consumers use Waze and Moovit to get around, both companies are flipping the use case and turning those millions of users into a network of sensors that municipalities can tap into for a better view on traffic and hazards. Local governments can also use these apps as a channel to send alerts.

On an average day in June, Rio’s transport planners could get an aggregated view of 110,000 drivers (half a million over the course of the month), and see nearly 60,000 incidents being reported each day – everything from built-up traffic, to hazards on the road, Waze says. Till now they’ve been relying on road cameras and other basic transport-department information.

What may be especially tantalizing for planners is the super-accurate read Waze gets on exactly where drivers are going, by pinging their phones’ GPS once every second. The app can tell how fast a driver is moving and even get a complete record of their driving history, according to Waze spokesperson Julie Mossler. (UPDATE: Since this story was first published Waze has asked to clarify that it separates users’ names and their 30-day driving info. The driving history is categorized under an alias.)
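
To make the point about per-second GPS pings concrete, here is a minimal Python sketch of how speed could in principle be estimated from consecutive position fixes. It is an illustration only, not Waze's implementation, which the article does not describe: the function names, the `(timestamp, latitude, longitude)` tuple layout and the sample coordinates are assumptions made for this example.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two latitude/longitude points."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def speeds_kmh(pings):
    """Estimate speed between consecutive pings.

    `pings` is a time-sorted list of (timestamp_seconds, lat, lon) tuples,
    a hypothetical layout chosen for this sketch.
    """
    speeds = []
    for (t0, lat0, lon0), (t1, lat1, lon1) in zip(pings, pings[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicate or out-of-order fixes
        metres = haversine_m(lat0, lon0, lat1, lon1)
        speeds.append(metres / dt * 3.6)  # m/s -> km/h
    return speeds


# Hypothetical pings one second apart: stationary, then moving east on the equator.
pings = [
    (0, 0.0, 0.0),
    (1, 0.0, 0.0),
    (2, 0.0, 0.0001),
]
speeds = speeds_kmh(pings)
print(speeds)
```

Real GPS fixes are noisy, so a production system would presumably smooth or filter these estimates; the sketch leaves that out.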

This passively-tracked GPS data “is not something we share,” she adds. Waze, which Google bought last year for $1.3 billion, can turn the data spigots on and off through its application programming interface (API).

Waze has been sharing user data with Rio since summer 2013 and it just signed up the State of Florida. It says more departments of transport are in the pipeline.

But none of these partnerships are making Waze any money. The app’s currency of choice is data. “It’s a two-way street,” says Mossler. “Literally.”

In return for its user updates, Waze gets real-time information from Rio on highways, from road sensors and even from cameras, while Florida will give the app data on construction projects or city events.

Florida’s department of transport could not be reached for comment, but one of its spokesmen recently told a local news station: “We’re going to share our information, our camera images, all of our information that comes from the sensors on the roadway, and Waze is going to share its data with us.”…

To get Moovit’s data, municipalities download a web interface that gives them an aggregated view of where pedestrians using Moovit are going. In return, the city feeds Moovit’s database with a stream of real-time GPS data for buses and trains, and can issue transport alerts to Moovit’s users. Erez notes the cities aren’t allowed to make “any sort of commercial approach to the users.”

Erez may be saving that for advertisers, an avenue he says he’s still exploring. For now getting data from cities is the bigger priority. It gives Moovit “a competitive advantage,” he says.

Cycling app Strava also recently started sharing its real-time user data as part of a paid-for service called Strava Metro.

Municipalities pay 80 cents a year for every Strava member being tracked. Metro only launched in May, but it already counts the state of Oregon; London, UK; Glasgow, Scotland; Queensland, Australia and Evanston, Illinois as customers.
….
Privacy advocates will naturally want to keep a wary eye on what data is being fed to cities, and that it doesn’t leak or get somehow misused by City Hall. The data-sharing might not be ubiquitous enough for that to be a problem yet, and it should be noted that any kind of deal making with the public sector can get wrapped up in bureaucracy and take years to get off the ground.

For now Waze says it’s acting for the public good….(More)”