Memex Human Trafficking


MEMEX is a DARPA program that explores how next-generation search and extraction systems can help with real-world use cases. The initial application is the fight against human trafficking. In this application, the input is a portion of the public and dark web in which human traffickers are likely to (surreptitiously) post supply and demand information about illegal labor, sex workers, and more. DeepDive processes such documents to extract evidential data, such as names, addresses, phone numbers, job types, job requirements, and information about rates of service. Some of these data items are difficult for trained human annotators to extract accurately and have never previously been available, but DeepDive-based systems have high accuracy (precision and recall in the 90s, which may exceed non-experts). Together with provenance information, such structured, evidential data are then passed on both to other collaborators on the MEMEX program and to law enforcement for analysis and consumption in operational applications. MEMEX has been featured extensively in the media and is supporting actual investigations. For example, every human trafficking investigation pursued by the Human Trafficking Response Unit in New York City involves MEMEX. DeepDive is the main extracted data provider for MEMEX. See also 60 Minutes, Scientific American, Wall St. Journal, BBC, and Wired. It is perhaps also supporting new use cases in the war on terror.

Here is a detailed description of DeepDive’s role in MEMEX.

 

Proofreading of legal documents


At TechCrunch: “…jEugene…helps the drafters of legal documents catch mistakes that could be fatal to such documents’ validity or enforceability.

The original idea of Harry Zhou, who, as a first-year lawyer, was tasked with proofing a 250-page contract and wanted more than his supervising lawyer’s assurance that “you did great,” jEugene scans through a legal document and highlights potential drafting mistakes in the text.

The product is being used by White & Case LLP and is undergoing trial at Fenwick & West LLP. Tens of smaller law firms are accessing jEugene through Clio, a provider of cloud-based legal management software. Other clients are under NDA.

Errors that jEugene currently detects may seem innocuous at times, but could lead to hefty costs. For example, the loss of millions of dollars that certain creditors recently failed to recover in a famous bankruptcy case could have been avoided had jEugene been used; and jEugene’s analysis of legal documents disclosed on SEC EDGAR routinely reveals similar errors missed by some of the most sophisticated law firms (they say).

Here’s how it works: A user uploads a document, waits a few seconds, and downloads the resulting file. The file emulates the handwritten markups that lawyers are used to seeing, and highlights potential drafting mistakes in the document with different colors. The user then reviews the results to determine whether any revision is necessary….(More)”

Anonymization and Risk


Paper by Ira Rubinstein and Woodrow Hartzog: “Perfect anonymization of data sets has failed. But the process of protecting data subjects in shared information remains integral to privacy practice and policy. While the deidentification debate has been vigorous and productive, there is no clear direction for policy. As a result, the law has been slow to adopt a holistic approach to protecting data subjects when data sets are released to others. Currently, the law is focused on whether an individual can be identified within a given set. We argue that the better locus of data release policy is on the process of minimizing the risk of reidentification and sensitive attribute disclosure. Process-based data release policy, which resembles the law of data security, will help us move past the limitations of focusing on whether data sets have been “anonymized.” It draws upon different tactics to protect the privacy of data subjects, including accurate deidentification rhetoric, contracts prohibiting reidentification and sensitive attribute disclosure, data enclaves, and query-based strategies to match required protections with the level of risk. By focusing on process, data release policy can better balance privacy and utility where nearly all data exchanges carry some risk….(More)”

Journal of Technology Science


“Technology Science is an open access forum for any original material dealing primarily with a social, political, personal, or organizational benefit or adverse consequence of technology. Studies that characterize a technology-society clash or present an approach to better harmonize technology and society are especially welcomed. Papers can come from anywhere in the world.

Technology Science is interested in reviews of research, experiments, surveys, tutorials, and analyses. Writings may propose solutions or describe unsolved problems. Technology Science may also publish letters, short communications, and relevant news items. All submissions are peer-reviewed.

The scientific study of technology-society clashes is a cross-disciplinary pursuit, so papers in Technology Science may come from any of many possible disciplinary traditions, including but not limited to social science, computer science, political science, law, economics, policy, or statistics.

The Data Privacy Lab at Harvard University publishes Technology Science and its affiliated subset of papers called the Journal of Technology Science and maintains them online at techscience.org and at jots.pub. Technology Science is available free of charge over the Internet. While it is possible that bound paper copies of Technology Science content may be produced for a fee, all content will continue to be offered online at no charge….(More)”

 

Index: Crime and Criminal Justice Data


The Living Library Index – inspired by the Harper’s Index – provides important statistics and highlights global trends in governance innovation. This installment focuses on crime and criminal justice data and was originally published in 2015.

This index provides information about the types of crime and criminal justice data collected, shared and used in the United States. Because it is well known that data related to the criminal justice system is often unreliable, or just plain missing, this index also highlights some of the issues that stand in the way of accessing useful and in-demand statistics.

Data Collections: National Crime Statistics

  • Number of incident-based crime datasets created by the Federal Bureau of Investigation (FBI): 2
  • Number of U.S. Statistical Agencies: 13
    • How many of those are focused on criminal justice: 1, the Bureau of Justice Statistics (BJS)
    • Number of data collections focused on criminal justice the BJS produces: 61
  • Number of federal-level APIs available for crime or criminal justice data: 1, the National Crime Victimization Survey (NCVS)
    • Frequency of the NCVS: annually
  • Number of Statistical Analysis Centers (SACs), organizations that are essentially clearinghouses for crime and criminal justice data for each state, the District of Columbia, Puerto Rico and the Northern Mariana Islands: 53

Open data, data use and the impact of those efforts

  • Number of datasets that are returned when “criminal justice” is searched for on Data.gov: 417, including federal-, state- and city-level datasets
  • Number of datasets that are returned when “crime” is searched for on Data.gov: 281
  • Percentage by which public complaints dropped after officers started wearing body cameras, according to a study done in Rialto, Calif.: 88
  • Percentage by which reported incidents of officer use of force fell after officers started wearing body cameras, according to a study done in Rialto, Calif.: 5
  • Percentage by which crime decreased during an experiment in predictive policing in Shreveport, LA: 35
  • Number of crime data sets made available by the Seattle Police Department – generally seen as a leader in police data innovation – on the Seattle.gov website: 4
    • Major crime stats by category in aggregate
    • Crime trend reports
    • Precinct data by beat
    • State sex offender database
  • Number of datasets mapped by the Seattle Police Department: 2
    • 911 incidents
    • Police reports
  • Number of states where risk assessment tools must be used in pretrial proceedings to help determine whether an offender is released from jail before a trial: at least 11.

Police Data

  • Number of federally mandated databases that collect information about officer use of force or officer-involved shootings, nationwide: 0
  • The year a crime bill was passed that called for data on excessive force to be collected for research and statistical purposes, but has never been funded: 1994
  • Number of police departments that committed to being a part of the White House’s Police Data Initiative: 21
  • Percentage of police departments surveyed in 2013 by the Office of Community Oriented Policing within the Department of Justice that are not using body cameras, therefore not collecting body camera data: 75

The criminal justice system

  • Parts of the criminal justice system where data about an individual can be created or collected: at least 6
    • Entry into the system (arrest)
    • Prosecution and pretrial
    • Sentencing
    • Corrections
    • Probation/parole
    • Recidivism


e-Consultation Platforms: Generating or Just Recycling Ideas?


Chapter by Efthimios Tambouris, Anastasia Migotzidou, and Konstantinos Tarabanis in Electronic Participation: “A number of governments worldwide employ web-based e-consultation platforms to enable stakeholders commenting on draft legislation. Stakeholders’ input includes arguing in favour or against the proposed legislation as well as proposing alternative ideas. In this paper, we empirically investigate the relationship between the volume of contributions in these platforms and the amount of new ideas that are generated. This enables us to determine whether participants in such platforms keep generating new ideas or just recycle a finite number of ideas. We capitalised on argumentation models to code and analyse a large number of draft law consultations published in opengov.gr, the official e-consultation platform for draft legislation in Greece. Our results suggest that as the number of posts grows, the number of new ideas continues to increase. The results of this study improve our understanding of the dynamics of these consultations and enable us to design better platforms….(More)”

 

Policy makers’ perceptions on the transformational effect of Web 2.0 technologies on public services delivery


Paper by Manuel Pedro Rodríguez Bolívar at Electronic Commerce Research: “The growing participation in social networking sites is altering the nature of social relations and changing the nature of political and public dialogue. This paper contributes to the current debate on Web 2.0 technologies and their implications for local governance, identifying the perceptions of policy makers on the use of Web 2.0 in providing public services and on the changing roles that could arise from the resulting interaction between local governments and their stakeholders. The results obtained suggest that policy makers are willing to implement Web 2.0 technologies in providing public services, but preferably under the Bureaucratic model framework, thus retaining a leading role in this implementation. The learning curve of local governments in the use of Web 2.0 technologies is a factor that could influence policy makers’ perceptions. In this respect, many research gaps are identified and further study of the question is recommended….(More)”

Big data algorithms can discriminate, and it’s not clear what to do about it


At The Conversation: “This program had absolutely nothing to do with race…but multi-variable equations.”

That’s what Brett Goldstein, a former policeman for the Chicago Police Department (CPD) and current Urban Science Fellow at the University of Chicago’s School for Public Policy, said about a predictive policing algorithm he deployed at the CPD in 2010. His algorithm tells police where to look for criminals based on where people have been arrested previously. It’s a “heat map” of Chicago, and the CPD claims it helps them allocate resources more effectively.

Chicago police also recently collaborated with Miles Wernick, a professor of electrical engineering at Illinois Institute of Technology, to algorithmically generate a “heat list” of 400 individuals it claims have the highest chance of committing a violent crime. In response to criticism, Wernick said the algorithm does not use “any racial, neighborhood, or other such information” and that the approach is “unbiased” and “quantitative.” By deferring decisions to poorly understood algorithms, industry professionals effectively shed accountability for any negative effects of their code.

But do these algorithms discriminate, treating low-income and black neighborhoods and their inhabitants unfairly? It’s the kind of question many researchers are starting to ask as more and more industries use algorithms to make decisions. It’s true that an algorithm itself is quantitative – it boils down to a sequence of arithmetic steps for solving a problem. The danger is that these algorithms, which are trained on data produced by people, may reflect the biases in that data, perpetuating structural racism and negative biases about minority groups.

There are a lot of challenges to figuring out whether an algorithm embodies bias. First and foremost, many practitioners and “computer experts” still don’t publicly admit that algorithms can easily discriminate. More and more evidence supports that not only is this possible, but it’s happening already. The law is unclear on the legality of biased algorithms, and even algorithms researchers don’t precisely understand what it means for an algorithm to discriminate….

While researchers clearly understand the theoretical dangers of algorithmic discrimination, it’s difficult to cleanly measure the scope of the issue in practice. No company or public institution is willing to publicize its data and algorithms for fear of being labeled racist or sexist, or maybe worse, having a great algorithm stolen by a competitor.

Even when the Chicago Police Department was hit with a Freedom of Information Act request, they did not release their algorithms or heat list, claiming a credible threat to police officers and the people on the list. This makes it difficult for researchers to identify problems and potentially provide solutions.

Legal hurdles

Existing discrimination law in the United States isn’t helping. At best, it’s unclear on how it applies to algorithms; at worst, it’s a mess. Solon Barocas, a postdoc at Princeton, and Andrew Selbst, a law clerk for the Third Circuit US Court of Appeals, argued together that US hiring law fails to address claims about discriminatory algorithms in hiring.

The crux of the argument is called the “business necessity” defense, in which the employer argues that a practice that has a discriminatory effect is justified by being directly related to job performance….(More)”

Push, Pull, and Spill: A Transdisciplinary Case Study in Municipal Open Government


New paper by Jan Whittington et al: “Cities hold considerable information, including details about the daily lives of residents and employees, maps of critical infrastructure, and records of the officials’ internal deliberations. Cities are beginning to realize that this data has economic and other value: If done wisely, the responsible release of city information can also release greater efficiency and innovation in the public and private sector. New services are cropping up that leverage open city data to great effect.

Meanwhile, activist groups and individual residents are placing increasing pressure on state and local government to be more transparent and accountable, even as others sound an alarm over the privacy issues that inevitably attend greater data promiscuity. This takes the form of political pressure to release more information, as well as increased requests for information under the many public records acts across the country.

The result of these forces is that cities are beginning to open their data as never before. It turns out there is surprisingly little research to date into the important and growing area of municipal open data. This article is among the first sustained, cross-disciplinary assessments of an open municipal government system. We are a team of researchers in law, computer science, information science, and urban studies. We have worked hand-in-hand with the City of Seattle, Washington for the better part of a year to understand its current procedures from each disciplinary perspective. Based on this empirical work, we generate a set of recommendations to help the city manage risk latent in opening its data….(More)”

The Trouble With Disclosure: It Doesn’t Work


Jesse Eisinger at ProPublica: “Louis Brandeis was wrong. The lawyer and Supreme Court justice famously declared that sunlight is the best disinfectant, and we have unquestioningly embraced that advice ever since.

Over the last century, disclosure and transparency have become our regulatory crutch, the answer to every vexing problem. We require corporations and government to release reams of information on food, medicine, household products, consumer financial tools, campaign finance and crime statistics. We have a booming “report card” industry for a range of services, including hospitals, public schools and restaurants.

All this sunlight is blinding. As new scholarship is demonstrating, the value of all this information is unproved. Paradoxically, disclosure can be useless — and sometimes actually harmful or counterproductive.

“We are doing disclosure as a regulatory move all over the board,” says Adam J. Levitin, a law professor at Georgetown. “The funny thing is, we are doing this despite very little evidence of its efficacy.”

Let’s start with something everyone knows about — the “terms of service” agreements for the likes of iTunes. Like everybody else, I click the “I agree” box, feeling a flash of resentment. I’m certain that in Paragraph 184 is a clause signing away my firstborn to a life of indentured servitude to Timothy D. Cook as his chief caviar spoon keeper.

Our legal theoreticians have determined these opaque monstrosities work because someone, somewhere reads the fine print in these contracts and keeps corporations honest. It turns out what we laymen intuit is true: No one reads them, according to research by a New York University law professor, Florencia Marotta-Wurgler.

In real life, there is no critical mass of readers policing the agreements. And if there were an eagle-eyed crew of legal experts combing through these agreements, what recourse would they have? Most people don’t even know that the Supreme Court has gutted their rights to sue in court, and they instead have to go into arbitration, which usually favors corporations.

The disclosure bonanza is easy to explain. Nobody is against it. It’s politically expedient. Companies prefer such rules, especially in lieu of actual regulations that would curtail bad products or behavior. The opacity lobby — the remora fish class of lawyers, lobbyists and consultants in New York and Washington — knows that disclosure requirements are no bar to dodgy practices. You just have to explain what you’re doing in sufficiently incomprehensible language, a task that earns those lawyers a hefty fee.

Of course, some disclosure works. Professor Levitin cites two examples. The first is an olfactory disclosure. Methane doesn’t have any scent, but a foul smell is added to alert people to a gas leak. The second is ATM fees. A study in Australia showed that once fees were disclosed, people avoided the high-fee machines and took out more when they had to go to them.

But to Omri Ben-Shahar, co-author of a recent book, “More Than You Wanted to Know: The Failure of Mandated Disclosure,” these are cherry-picked examples in a world awash in useless disclosures. Of course, information is valuable. But disclosure as a regulatory mechanism doesn’t work nearly well enough, he argues….(More)”