The false promise of the digital humanities


Adam Kirsch in the New Republic: “The humanities are in crisis again, or still. But there is one big exception: digital humanities, which is a growth industry. In 2009, the nascent field was the talk of the Modern Language Association (MLA) convention: “among all the contending subfields,” a reporter wrote about that year’s gathering, “the digital humanities seem like the first ‘next big thing’ in a long time.” Even earlier, the National Endowment for the Humanities created its Office of Digital Humanities to help fund projects. And digital humanities continues to go from strength to strength, thanks in part to the Mellon Foundation, which has seeded programs at a number of universities with large grants, most recently $1 million to the University of Rochester to create a graduate fellowship.

Despite all this enthusiasm, the question of what the digital humanities is has yet to be given a satisfactory answer. Indeed, no one asks it more often than the digital humanists themselves. The recent proliferation of books on the subject, from sourcebooks and anthologies to critical manifestos, is a sign of a field suffering an identity crisis, trying to determine what, if anything, unites the disparate activities carried on under its banner. “Nowadays,” writes Stephen Ramsay in Defining Digital Humanities, “the term can mean anything from media studies to electronic art, from data mining to edutech, from scholarly editing to anarchic blogging, while inviting code junkies, digital artists, standards wonks, transhumanists, game theorists, free culture advocates, archivists, librarians, and edupunks under its capacious canvas.”

Within this range of approaches, we can distinguish a minimalist and a maximalist understanding of digital humanities. On the one hand, it can be simply the application of computer technology to traditional scholarly functions, such as the editing of texts. An exemplary project of this kind is the Rossetti Archive created by Jerome McGann, an online repository of texts and images related to the career of Dante Gabriel Rossetti: this is essentially an open-ended, universally accessible scholarly edition. To others, however, digital humanities represents a paradigm shift in the way we think about culture itself, spurring a change not just in the medium of humanistic work but also in its very substance. At their most starry-eyed, some digital humanists, such as the authors of the jargon-laden manifesto and handbook Digital_Humanities, want to suggest that the addition of the high-powered adjective to the long-suffering noun signals nothing less than an epoch in human history: “We live in one of those rare moments of opportunity for the humanities, not unlike other great eras of cultural-historical transformation such as the shift from the scroll to the codex, the invention of movable type, the encounter with the New World, and the Industrial Revolution.”

The language here is the language of scholarship, but the spirit is the spirit of salesmanship: the very same kind of hyperbolic, hard-sell approach we are so accustomed to hearing about the Internet, or about Apple’s latest utterly revolutionary product. Fundamental to this kind of persuasion is the undertone of menace, the threat of historical illegitimacy and obsolescence. Here is the future, we are made to understand: we can either get on board or stand athwart it and get run over. The same kind of revolutionary rhetoric appears again and again in the new books on the digital humanities, from writers with very different degrees of scholarly commitment and intellectual sophistication.

In Uncharted, Erez Aiden and Jean-Baptiste Michel, the creators of the Google Ngram Viewer, an online tool that allows you to map the frequency of words in all the printed matter digitized by Google, talk up the “big data revolution”: “Its consequences will transform how we look at ourselves…. Big data is going to change the humanities, transform the social sciences, and renegotiate the relationship between the world of commerce and the ivory tower.” These breathless prophecies are just hype. But at the other end of the spectrum, even McGann, one of the pioneers of what used to be called “humanities computing,” uses the high language of inevitability: “Here is surely a truth now universally acknowledged: that the whole of our cultural inheritance has to be recurated and reedited in digital forms and institutional structures.”

If ever there were a chance to see the ideological construction of reality at work, digital humanities is it. Right before our eyes, options are foreclosed and demands enforced; a future is constructed as though it were being discovered. By now we are used to this process, since over the last twenty years the proliferation of new technologies has totally discredited the idea of opting out of “the future.”…

Open Government Data Gains Global Momentum


Wyatt Kash in Information Week: “Governments across the globe are deepening their strategic commitments and working more closely to make government data openly available for public use, according to public and private sector leaders who met this week at the inaugural Open Government Data Forum in Abu Dhabi, hosted by the United Nations and the United Arab Emirates, April 28-29.

Data experts from Europe, the Middle East, the US, Canada, Korea, and the World Bank highlighted how one country after another has set into motion initiatives to expand the release of government data and broaden its use. Those efforts are gaining traction due to multinational organizations, such as the Open Government Partnership, the Open Data Institute, The World Bank, and the UN’s e-government division, that are trying to share practices and standardize open data tools.
In the latest example, the French government announced April 24 that it is joining the Open Government Partnership, a group of 64 countries working jointly to make their governments more open, accountable, and responsive to citizens. The announcement caps a string of policy shifts, which began with the formal release of France’s Open Data Strategy in May 2011 and which parallel similar moves by the US.
The strategy committed France to providing “free access and reuse of public data… using machine-readable formats and open standards,” said Romain Lacombe, head of innovation for the French prime minister’s open government task force, Etalab. The French government is taking steps to end the practice of selling datasets, such as civil and case-law data, and is making them freely reusable. France launched a public data portal, Data.gouv.fr, in December 2011 and joined a G8 initiative to engage with open data innovators worldwide.
For South Korea, open data is not just about achieving greater transparency and efficiency, but is seen as digital fuel for a nation that by 2020 expects to achieve “ambient intelligence… when all humans and things are connected together,” said Dr. YoungSun Lee, who heads South Korea’s National Information Society Agency.
He foresees open data leading to a shift in the ways government will function: from an era of e-government, where information is delivered to citizens, to one where predictive analysis will foster a “creative government,” in which “government provides customized services for each individual.”
The open data movement is also propelling innovative programs in the United Arab Emirates. “The role of open data in directing economic and social decisions pertaining to investments… is of paramount importance” to the UAE, said Dr. Ali M. Al Khouri, director general of the Emirates Identity Authority. It also plays a key role in building public trust and fighting corruption, he said….”

Findings of the Big Data and Privacy Working Group Review


John Podesta at the White House Blog: “Over the past several days, severe storms have battered Arkansas, Oklahoma, Mississippi and other states. Dozens of people have been killed and entire neighborhoods turned to rubble and debris as tornadoes have touched down across the region. Natural disasters like these present a host of challenges for first responders. How many people are affected, injured, or dead? Where can they find food, shelter, and medical attention? What critical infrastructure might have been damaged?
Drawing on open government data sources, including Census demographics and NOAA weather data, along with their own demographic databases, Esri, a geospatial technology company, has created a real-time map showing where the twisters have been spotted and how the storm systems are moving. They have also used these data to show how many people live in the affected area, and summarize potential impacts from the storms. It’s a powerful tool for emergency services and communities. And it’s driven by big data technology.
In January, President Obama asked me to lead a wide-ranging review of “big data” and privacy—to explore how these technologies are changing our economy, our government, and our society, and to consider their implications for our personal privacy. Together with Secretary of Commerce Penny Pritzker, Secretary of Energy Ernest Moniz, the President’s Science Advisor John Holdren, the President’s Economic Advisor Jeff Zients, and other senior officials, our review sought to understand what is genuinely new and different about big data and to consider how best to encourage the potential of these technologies while minimizing risks to privacy and core American values.
Over the course of 90 days, we met with academic researchers and privacy advocates, with regulators and the technology industry, with advertisers and civil rights groups. The President’s Council of Advisors for Science and Technology conducted a parallel study of the technological trends underpinning big data. The White House Office of Science and Technology Policy jointly organized three university conferences at MIT, NYU, and U.C. Berkeley. We issued a formal Request for Information seeking public comment, and hosted a survey to generate even more public input.
Today, we presented our findings to the President. We knew better than to try to answer every question about big data in three months. But we are able to draw important conclusions and make concrete recommendations for Administration attention and policy development in a few key areas.
There are a few technological trends that bear drawing out. The declining cost of collection, storage, and processing of data, combined with new sources of data like sensors, cameras, and geospatial technologies, mean that we live in a world of near-ubiquitous data collection. All this data is being crunched at a speed that is increasingly approaching real-time, meaning that big data algorithms could soon have immediate effects on decisions being made about our lives.
The big data revolution presents incredible opportunities in virtually every sector of the economy and every corner of society.
Big data is saving lives. Infections are dangerous—even deadly—for many babies born prematurely. By collecting and analyzing millions of data points from a NICU, one study was able to identify factors, like slight increases in body temperature and heart rate, that serve as early warning signs an infection may be taking root—subtle changes that even the most experienced doctors wouldn’t have noticed on their own.
Big data is making the economy work better. Jet engines and delivery trucks now come outfitted with sensors that continuously monitor hundreds of data points and send automatic alerts when maintenance is needed. Utility companies are starting to use big data to predict periods of peak electric demand, adjusting the grid to be more efficient and potentially averting brown-outs.
Big data is making government work better and saving taxpayer dollars. The Centers for Medicare and Medicaid Services have begun using predictive analytics—a big data technique—to flag likely instances of reimbursement fraud before claims are paid. The Fraud Prevention System helps identify the highest-risk health care providers for waste, fraud, and abuse in real time and has already stopped, prevented, or identified $115 million in fraudulent payments.
But big data raises serious questions, too, about how we protect our privacy and other values in a world where data collection is increasingly ubiquitous and where analysis is conducted at speeds approaching real time. In particular, our review raised the question of whether the “notice and consent” framework, in which a user grants permission for a service to collect and use information about them, still allows us to meaningfully control our privacy as data about us is increasingly used and reused in ways that could not have been anticipated when it was collected.
Big data raises other concerns, as well. One significant finding of our review was the potential for big data analytics to lead to discriminatory outcomes and to circumvent longstanding civil rights protections in housing, employment, credit, and the consumer marketplace.
No matter how quickly technology advances, it remains within our power to ensure that we both encourage innovation and protect our values through law, policy, and the practices we encourage in the public and private sector. To that end, we make six actionable policy recommendations in our report to the President:
Advance the Consumer Privacy Bill of Rights. Consumers deserve clear, understandable, reasonable standards for how their personal information is used in the big data era. We recommend the Department of Commerce take appropriate consultative steps to seek stakeholder and public comment on what changes, if any, are needed to the Consumer Privacy Bill of Rights, first proposed by the President in 2012, and to prepare draft legislative text for consideration by stakeholders and submission by the President to Congress.
Pass National Data Breach Legislation. Big data technologies make it possible to store significantly more data, and further derive intimate insights into a person’s character, habits, preferences, and activities. That makes the potential impacts of data breaches at businesses or other organizations even more serious. A patchwork of state laws currently governs requirements for reporting data breaches. Congress should pass legislation that provides for a single national data breach standard, along the lines of the Administration’s 2011 Cybersecurity legislative proposal.
Extend Privacy Protections to non-U.S. Persons. Privacy is a worldwide value that should be reflected in how the federal government handles personally identifiable information about non-U.S. citizens. The Office of Management and Budget should work with departments and agencies to apply the Privacy Act of 1974 to non-U.S. persons where practicable, or to establish alternative privacy policies that apply appropriate and meaningful protections to personal information regardless of a person’s nationality.
Ensure Data Collected on Students in School is used for Educational Purposes. Big data and other technological innovations, including new online course platforms that provide students real time feedback, promise to transform education by personalizing learning. At the same time, the federal government must ensure educational data linked to individual students gathered in school is used for educational purposes, and protect students against their data being shared or used inappropriately.
Expand Technical Expertise to Stop Discrimination. The detailed personal profiles held about many consumers, combined with automated, algorithm-driven decision-making, could lead—intentionally or inadvertently—to discriminatory outcomes, or what some are already calling “digital redlining.” The federal government’s lead civil rights and consumer protection agencies should expand their technical expertise to be able to identify practices and outcomes facilitated by big data analytics that have a discriminatory impact on protected classes, and develop a plan for investigating and resolving violations of law.
Amend the Electronic Communications Privacy Act. The laws that govern protections afforded to our communications were written before email, the internet, and cloud computing came into wide use. Congress should amend ECPA to ensure the standard of protection for online, digital content is consistent with that afforded in the physical world—including by removing archaic distinctions between email left unread or over a certain age.
We also identify several broader areas ripe for further study, debate, and public engagement that, collectively, we hope will spark a national conversation about how to harness big data for the public good. We conclude that we must find a way to preserve our privacy values in both the domestic and international marketplace. We urgently need to build capacity in the federal government to identify and prevent new modes of discrimination that could be enabled by big data. We must ensure that law enforcement agencies using big data technologies do so responsibly, and that our fundamental privacy rights remain protected. Finally, we recognize that data is a valuable public resource, and call for continuing the Administration’s efforts to open more government data sources and make investments in research and technology.
While big data presents new challenges, it also presents immense opportunities to improve lives, and the United States is perhaps better suited to lead this conversation than any other nation on earth. Our innovative spirit, technological know-how, and deep commitment to values of privacy, fairness, non-discrimination, and self-determination will help us harness the benefits of the big data revolution and encourage the free flow of information while working with our international partners to protect personal privacy. This review is but one piece of that effort, and we hope it spurs a conversation about big data across the country and around the world.
Read the Big Data Report.
See the fact sheet from today’s announcement.

Saving Big Data from Big Mouths


Cesar A. Hidalgo in Scientific American: “It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.
Big data has also contributed to machine learning and computer vision. Thanks to big data, Facebook algorithms can now match faces almost as accurately as humans do.
And detractors have overlooked big data’s role in the proliferation of computational design, data journalism and new forms of artistic expression. Computational artists, journalists and designers—the kinds of people who congregate at meetings like Eyeo—are using huge sets of data to give us online experiences that are unlike anything we experienced on paper. If we step away from hypothesis testing, we find that big data has made big contributions.
The second mistake critics often make is to confuse the limitations of prototypes with fatal flaws. This is something I have experienced often. For example, in Place Pulse—a project I created with my team at the M.I.T. Media Lab—we used Google Street View images and crowdsourced visual surveys to map people’s perception of a city’s safety and wealth. The original method was rife with limitations that we dutifully acknowledged in our paper. Google Street View images are taken at arbitrary times of the day and show cities from the perspective of a car. City boundaries were also arbitrary. To overcome these limitations, however, we needed a first data set. Producing that first limited version of Place Pulse was a necessary part of the process of making a working prototype.
A year has passed since we published Place Pulse’s first data set. Now, thanks to our focus on “making,” we have computer vision and machine-learning algorithms that we can use to correct for some of these easy-to-spot distortions. Making is allowing us to correct for time of the day and dynamically define urban boundaries. Also, we are collecting new data to extend the method to new geographical boundaries.
Those who fail to understand that the process of making is iterative are in danger of being too quick to condemn promising technologies. In 1920 the New York Times published a prediction that a rocket would never be able to leave the atmosphere. Similarly erroneous predictions were made about the car or, more recently, about the iPhone’s market share. In 1969 the Times had to publish a retraction of its 1920 claim. What similar retractions will need to be published in the year 2069?
Finally, the doubters have relied too heavily on secondary sources. For instance, they made a piñata out of the 2008 Wired piece by Chris Anderson framing big data as “the end of theory.” Others have criticized projects for claims that their creators never made. A couple of weeks ago, for example, Gary Marcus and Ernest Davis published a piece on big data in the Times. There they wrote about another of my group’s projects, Pantheon, which is an effort to collect, visualize and analyze data on historical cultural production. Marcus and Davis wrote that Pantheon “suggests a misleading degree of scientific precision.” As an author of the project, I have been unable to find where I made such a claim. Pantheon’s method section clearly states that: “Pantheon will always be—by construction—an incomplete resource.” That same section contains a long list of limitations and caveats as well as the statement that “we interpret this data set narrowly, as the view of global cultural production that emerges from the multilingual expression of historical figures in Wikipedia as of May 2013.”
Bickering is easy, but it is not of much help. So I invite the critics of big data to lead by example. Stop writing op-eds and start developing tools that improve on the state of the art. They are much appreciated. What we need are projects that are worth imitating and that we can build on, not obvious advice such as “correlation does not imply causation.” After all, true progress is not something that is written, but made.”

Mapping the Intersection Between Social Media and Open Spaces in California


Stamen Design: “Last month, Stamen launched parks.stamen.com, a project we created in partnership with the Electric Roadrunner Lab, with the goal of revealing the diversity of social media activity that happens inside parks and other open spaces in California. If you haven’t already looked at the site, please go visit it now! Find your favorite park, or the parks that are nearest to you, or just stroll between random parks using the wander button. For more background about the goals of the project, read Eric’s blog post: A Conversation About California Parks.
In this post I’d like to describe some of the algorithms we use to collect the social media data that feeds the park pages. Currently we collect data from four social media platforms: Twitter, Foursquare, Flickr, and Instagram. We chose these because they all have public APIs (Application Programming Interfaces) that are easy to work with, and we expect they will provide a view into the different facets of each park, and the diverse communities who enjoy these parks. Each social media service creates its own unique geographies, and its own way of representing these parks. For example, the kinds of photos you upload to Instagram might be different from the photos you upload to Flickr. The way you describe experiences using Twitter might be different from the moments you document by checking into Foursquare. In the future we may add more feeds, but for now there’s a lot we can learn from these four.
Through the course of collecting data from these social network services, I also found that each service’s public API imposes certain constraints on our queries, producing its own intricate patterns. Thus, the quirks of how each API was written result in distinct and fascinating geometries. Also, since we are only interested in parks for this project, the process of culling non-park-related content further produces unusual and interesting patterns. Rural areas have large parks that cover huge areas, while cities have lots of (relatively) tiny parks, which creates its own challenges for how we query the APIs.
Broadly, we followed a similar approach for all the social media services. First, we grab the geocoded data from the APIs. This ignores any media that don’t have a latitude and longitude associated with them. In Foursquare, almost all checkins have a latitude and longitude, and for Flickr and Instagram most photos have a location associated with them. However, for Twitter, only around 1% of all tweets have geographic coordinates. But as we will see, even 1% still results in a whole lot of tweets!
After grabbing the social media data, we intersect it with the outlines of parks and open spaces in California, using polygons from the California Protected Areas Database maintained by GreenInfo Network. Everything that doesn’t intersect one of these parks, we throw away. The following maps represent the data as it looks before the filtering process.
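The filtering step itself is simple to reproduce. Below is a minimal sketch, assuming the park outlines have been exported as GeoJSON and the posts reduced to id/longitude/latitude tuples; the file name, property key, and sample points are hypothetical placeholders rather than Stamen’s actual pipeline, and shapely supplies the point-in-polygon test.

    # Minimal sketch of the park-intersection step: keep only geotagged posts
    # whose coordinates fall inside a protected-area polygon.
    # "cpad_parks.geojson" and the sample posts are illustrative placeholders.
    import json
    from shapely.geometry import shape, Point

    with open("cpad_parks.geojson") as f:
        parks = [(feat["properties"].get("name"),   # property key depends on the export
                  shape(feat["geometry"]))
                 for feat in json.load(f)["features"]]

    posts = [("post-1", -122.4862, 37.7694),        # e.g. a point in Golden Gate Park
             ("post-2", -118.2437, 34.0522)]        # downtown Los Angeles, not a park

    def posts_in_parks(posts, parks):
        """Yield (post_id, park_name) for each post that lands inside a park."""
        for post_id, lon, lat in posts:
            pt = Point(lon, lat)                    # shapely points are (x=lon, y=lat)
            for name, poly in parks:
                if poly.contains(pt):
                    yield post_id, name
                    break                           # assign each post to one park

    for post_id, park in posts_in_parks(posts, parks):
        print(post_id, "->", park)

In practice a spatial index (shapely’s STRtree, or a geopandas spatial join) would replace the inner loop, since the California Protected Areas Database contains many thousands of polygons and the posts number in the millions.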
But enough talking, let’s look at some maps!”

This is what happens when you give social networking to doctors


In PandoDaily: “Dr. Gregory Kurio will never forget the time he was called to the ER because an epileptic girl was brought in suffering a cardiac arrest of sorts (HIPAA mandates that he not give out the specific details of the situation). In the briefing, he learned the name of her cardiac physician, whom he happened to know through the industry. He subsequently called the other doctor and asked him to send over any available information on the patient — latest meds, EKGs, recent checkups, etc.

The scene in the ER was, as expected, one of chaos, with trainees and respiratory nurses running around grabbing machinery and meds. Crucial seconds were ticking past, and Dr. Kurio quickly realized the fax machine was not the best approach for receiving the records he needed. ER fax machines are often on the opposite side of the emergency room, take a while to print lengthy records, frequently run out of paper, and aren’t always reliable – not exactly the sort of technology you want when a patient’s life hangs in the balance.

Email wasn’t an option either, because HIPAA mandates that sensitive patient files are only sent through secure channels. With precious little time to waste, Dr. Kurio decided to take a chance on a new technology service he had just signed up for — Doximity.

Doximity is a LinkedIn for Doctors of sorts. It has, as one feature, a secure e-fax system that turns faxes into digital messages and sends them to a user’s mobile device. Dr. Kurio gave the other physician his e-fax number, and a little bit of techno-magic happened.

….

With a third of the nation’s doctors on the platform, today Doximity announced a $54 million Series C from DFJ,  T. Rowe Price Associates, Morgan Stanley, and existing investors. The funding news isn’t particularly important, in and of itself, aside from the fact that the company is attracting the attention of private market investors very early in its growth trajectory. But it’s a good opportunity to take a look at Doximity’s business model, how it mirrors the upwards growth of other vertical professional social networks (say that five times fast), and the way it’s transforming our healthcare providers’ jobs.

Doximity works, in many ways, just like LinkedIn. Doctors have profiles with pictures and their resume, and recruiters pay the company to message medical professionals. “If you think it’s hard to find a Ruby developer in San Francisco, try to find an emergency room physician in Indiana,” Doximity CEO Jeff Tangney says. One recruiter’s pain is a smart entrepreneur’s pleasure — a simple, straightforward monetization strategy.

But unlike LinkedIn, Doximity can dive much deeper on meeting doctors’ needs through specialized features like the e-fax system. It’s part of the reason Konstantin Guericke, one of LinkedIn’s “forgotten” co-founders, was attracted to the company and decided to join the board as an advisor. “In some ways, it’s a lot like LinkedIn,” Guericke says, when asked why he decided to help out. “But for me it’s the pleasure of focusing on a more narrow audience and making more of an impact on their life.”

In another such high-impact, specialized feature, doctors can access Doximity’s Google Alerts-like system for academic articles. They can sign up to receive notifications when stories are published about their obscure specialties. That means time-strapped physicians gain a more efficient way to stay up to date on all the latest research and information in their field. You can imagine that might impact the quality of the care they provide.

Lastly, Doximity offers a secure messaging system, allowing doctors to email one another regarding a shared patient. Such communication is a thorny issue for doctors given HIPAA-related privacy requirements. There are limited ways to legally update, say, a primary care physician when a specialist learns one of their patients has colon cancer. It turns into a big game of phone tag to relay what should be relatively straightforward information. Furthermore, leaving voicemails and sending faxes can result in details getting lost in a system that isn’t searchable.

The platform is free for doctors, and they have quickly joined in droves. Doximity co-founder and CEO Jeff Tangney estimates that last year the platform had added 15 to 16 percent of US doctors. But this year, the company claims it’s “on track to have half of US physicians as members by this summer.” A fairly impressive growth rate and market penetration.

With great market penetration comes great power. And dollars. Although the company is only monetizing through recruitment at the moment, the real money to be made with this service is through targeted advertising. Think about how much big pharma and medtech companies would be willing to cough up to communicate at scale with the doctors who make purchasing decisions. Plus, this is an easy way for them to target industry thought leaders or professionals with certain specialties.

Doximity’s founders’ and investors’ eyes might be seeing dollar signs, but they haven’t rolled anything out yet on the advertising front. They’re wary and want to do so in a way that adds value to all parties while avoiding pissing off medical professionals. When they finally pull the trigger, however, it has the potential to be a Gold Rush.

Doximity isn’t the only company to have discovered there’s big money to be made in vertical professional social networks. As Pando has written, there’s a big trend in this regard. Spiceworks, the social network for IT professionals which claims to have a third of the world’s IT professionals on the site, just raised $57 million in a round led by none other than Goldman Sachs. Why does the firm have such faith in a free social network for IT pros — seemingly the most mundane and unprofitable of endeavors? Well, just like with doctor and pharma corps, IT companies are willing to shell out big to market their wares directly to such IT pros.

Although the monetization strategies differ from business to business, ResearchGate is building a similar community with a social network of scientists around the world, Edmodo is doing it with educators, GitHub with developers, GrabCAD for mechanical engineers. I’ve argued that such vertical professional social networks are a threat to LinkedIn, stealing business out from under it in large industry swaths. LinkedIn cofounder Konstantin Guericke disagrees.

“I don’t think it’s stealing revenue from them. Would it make sense for LinkedIn to add a profile subset about what insurance someone takes? That would just be clutter,” Guericke says. “It’s more going after an opportunity LinkedIn isn’t well positioned to capitalize on. They could do everything Doximity does, but they’d have to give up something else.”

All businesses come with their own challenges, and Doximity will certainly face its share of them as it scales. It has overcome the initial hurdle of achieving the network effects that come with penetrating a large segment of the market. Next will come monetizing sensitively and continuing to protect the privacy of users and patients.

There are plenty of data minefields to be had in a sector as closely regulated as healthcare, as fellow medical startup Practice Fusion recently found out. Doximity has to make sure its system for onboarding and verifying new doctors is airtight. The company has already encountered some instances of individuals trying to pose as medical professionals to get access to another’s records — specifically a former lover trying to chase down their ex-spouse’s STI tests. One blowup in which the company approves someone it shouldn’t, or in which hackers break into the system, and doctors could lose trust in the safety of the technology….”

Looking for the Needle in a Stack of Needles: Tracking Shadow Economic Activities in the Age of Big Data


Manju Bansal in MIT Technology Review: “The undocumented guys hanging out in the home-improvement-store parking lot looking for day labor, the neighborhood kids running a lemonade stand, and Al Qaeda terrorists plotting to do harm all have one thing in common: They operate in the underground economy, a shadowy zone where businesses, both legitimate and less so, transact in the currency of opportunity, away from traditional institutions and their watchful eyes.
One might think that this alternative economy is limited to markets that rank low in the Transparency International rankings (sub-Saharan Africa and South Asia, for instance). However, a recent University of Wisconsin report estimates the value of the underground economy in the United States at about $2 trillion, about 15% of the total U.S. GDP. And a 2013 study coauthored by Friedrich Schneider, a noted authority on global shadow economies, estimated the European Union’s underground economy at more than 18% of GDP, or a whopping 2.1 trillion euros. More than two-thirds of the underground activity came from the most developed countries, including Germany, France, Italy, Spain, and the United Kingdom.
Underground economic activity is a multifaceted phenomenon, with implications across the board for national security, tax collections, public-sector services, and more. It includes the activity of any business that relies primarily on old-fashioned cash for most transactions — ranging from legitimate businesses (including lemonade stands) to drug cartels and organized crime.
Though it’s often soiled, heavy to lug around, and easy to lose to theft, cash is still king simply because it is so easy to hide from the authorities. With the help of the right bank or financial institution, “dirty” money can easily be laundered and come out looking fresh and clean, or at least legitimate. Case in point is the global bank HSBC, which agreed to pay U.S. regulators $1.9 billion in fines to settle charges of money laundering on behalf of Mexican drug cartels. According to a U.S. Senate subcommittee report, that process involved transferring $7 billion in cash from the bank’s branches in Mexico to those in the United States. Just for reference, each $100 bill weighs one gram, so to transfer $7 billion, HSBC had to physically transport 70 metric tons of cash across the U.S.-Mexican border.
The Financial Action Task Force, an intergovernmental body established in 1989, has estimated the total amount of money laundered worldwide to be around 2% to 5% of global GDP. Many of these transactions seem, at first glance, to be perfectly legitimate. Therein lies the conundrum for a banker or a government official: How do you identify, track, control, and, one hopes, prosecute money launderers, when they are hiding in plain sight and their business is couched in networked layers of perfectly defensible legitimacy?
Enter big-data tools, such as those provided by SynerScope, a Holland-based startup that is a member of the SAP Startup Focus program. This company’s solutions help unravel the complex networks hidden behind the layers of transactions and interactions.
Networks, good or bad, are near omnipresent in almost any form of organized human activity and particularly in banking and insurance. SynerScope takes data from both structured and unstructured data fields and transforms these into interactive computer visuals that display graphic patterns that humans can use to quickly make sense of information. Spotting of deviations in complex networked processes can easily be put to use in fraud detection for insurance, banking, e-commerce, and forensic accounting.
SynerScope’s approach to big-data business intelligence is centered on data-intense compute and visualization that extend the human “sense-making” capacity in much the same way that a telescope or microscope extends human vision.
To understand how SynerScope helps authorities track and halt money laundering, it’s important to understand how the networked laundering process works. It typically involves three stages.
1. In the initial, or placement, stage, launderers introduce their illegal profits into the financial system. This might be done by breaking up large amounts of cash into less-conspicuous smaller sums that are then deposited directly into a bank account, or by purchasing a series of monetary instruments (checks, money orders) that are then collected and deposited into accounts at other locations.
2. After the funds have entered the financial system, the launderer commences the second stage, called layering, which uses a series of conversions or transfers to distance the funds from their sources. The funds might be channeled through the purchase and sales of investment instruments, or the launderer might simply wire the funds through a series of accounts at various banks worldwide. 
Such use of widely scattered accounts for laundering is especially prevalent in those jurisdictions that do not cooperate in anti-money-laundering investigations. Sometimes the launderer disguises the transfers as payments for goods or services.
3. Having successfully processed the criminal profits through the first two phases, the launderer then proceeds to the third stage, integration, in which the funds re-enter the legitimate economy. The launderer might invest the funds in real estate, luxury assets, or business ventures.
Current detection tools compare individual transactions against preset profiles and rules. Sophisticated criminals quickly learn how to make their illicit transactions look normal for such systems. As a result, rules and profiles need constant and costly updating.
But SynerScope’s flexible visual analysis uses a network angle to detect money laundering. It shows the structure of the entire network with data coming in from millions of transactions, a structure that launderers cannot control. With just a few mouse clicks, SynerScope’s relation and sequence views reveal structural interrelationships and interdependencies. When those patterns are mapped on a time scale, it becomes virtually impossible to hide abnormal flows.
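SynerScope’s own algorithms are proprietary, so the following is only a toy illustration of the “network angle” the article describes, not the company’s method: build a directed graph of transfers (the data and thresholds below are invented) and flag pass-through accounts whose inflow and outflow are both large and nearly equal, a rough signature of the layering stage outlined above.

    # Toy network-angle sketch (not SynerScope's method): flag accounts that
    # mostly forward what they receive, a crude marker of "layering".
    import networkx as nx

    transfers = [                                  # (sender, receiver, amount), made-up data
        ("A", "X", 9500), ("B", "X", 9400), ("C", "X", 9300),
        ("X", "Y", 28000), ("Y", "ShellCo", 27500),
        ("D", "E", 120), ("E", "F", 45),           # ordinary small payments
    ]

    G = nx.DiGraph()
    for src, dst, amt in transfers:
        if G.has_edge(src, dst):
            G[src][dst]["amount"] += amt
        else:
            G.add_edge(src, dst, amount=amt)

    def pass_through_accounts(G, min_volume=10000, tolerance=0.1):
        """Return (node, inflow, outflow) for nodes whose in- and outflow are large and nearly equal."""
        flagged = []
        for node in G.nodes:
            inflow = sum(d["amount"] for _, _, d in G.in_edges(node, data=True))
            outflow = sum(d["amount"] for _, _, d in G.out_edges(node, data=True))
            if min(inflow, outflow) >= min_volume and \
               abs(inflow - outflow) <= tolerance * max(inflow, outflow):
                flagged.append((node, inflow, outflow))
        return flagged

    print(pass_through_accounts(G))                # [('X', 28200, 28000), ('Y', 28000, 27500)]

A production system would layer many such structural and temporal signals over far larger graphs and, as the article notes, put a human in the loop through visualization; the point of the sketch is simply that the flag comes from the shape of the network rather than from any single transaction matching a preset rule.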

SynerScope’s relation and sequence views reveal structural and temporal transaction patterns which make it virtually impossible to hide abnormal money flows.”

Using data to treat the sickest and most expensive patients


Dan Gorenstein for Marketplace (radio):  “Driving to a big data conference a few weeks back, Dr. Jeffrey Brenner brought his compact SUV to a full stop – in the middle of a short highway entrance ramp in downtown Philadelphia…

Here’s what you need to know about Dr. Jeffrey Brenner: He really likes to figure out how things work. And he’s willing to go to extremes to do it – so far that he’s risking his health policy celebrity status.
Perhaps it’s not the smartest move from a guy who just last fall was named a MacArthur Genius, but this month, Brenner began to test his theory for treating some of the sickest and most expensive patients.
“We can actually take the sickest and most complicated patients, go to their bedside, go to their home, go with them to their appointments and help them for about 90 days and dramatically improve outcomes and reduce cost,” he says.
That’s the theory anyway. Like many ideas when it comes to treating the sickest patients, there’s little data to back up that it works.
Brenner’s willing to risk his reputation precisely because he’s not positive his approach for treating folks who cycle in and out of the healthcare system — “super-utilizers” — actually works.
“It’s really easy for me at this point having gotten a MacArthur award to simply declare what we do works and to drive this work forward without rigorously testing it,” Brenner said. “We are not going to do that,” he said. “We don’t think that’s the right thing to do. So we are going to do a randomized controlled trial on our work and prove whether it works and how well it works.”
Helping lower costs and improve care for the super-utilizers is one of the most pressing policy questions in healthcare today. And given its importance, there is a striking lack of data in the field.
People like to call randomized controlled trials (RCTs) the gold standard of scientific testing because two groups are randomly assigned – one gets the treatment, while the other doesn’t – and researchers closely monitor differences.
But a 2012 British Medical Journal article found over the last 25 years, a total of six RCTs have focused on care delivery for super-utilizers.


…Every major health insurance company – Medicare and Medicaid, too – has spent billions on programs for super-utilizers. The absence of rigorous evidence raises the question: Is all this effort built on health policy quicksand?
Not being 100 percent sure can be dangerous, says Duke behavioral scientist Peter Ubel, particularly in healthcare.
Ubel said back in the 1980s and 90s doctors prescribed certain drugs for irregular heartbeats. The medication, he said, made those weird rhythms go away, leaving beautiful-looking EKGs.
“But no one had tested whether people receiving these drugs actually lived longer, and many people thought, ‘Why would you do that? We can look at their cardiogram and see that they’re getting better,’” Ubel said. “Finally when somebody put that evidence to the test of a randomized trial, it turned out that these drugs killed people.”
WellPoint’s Nussbaum said he hoped Brenner’s project would inspire others to follow his lead and insert data into the discussion.
“I believe more people should be bold in challenging the status quo of our delivery system,” Nussbaum said. “The Jeff Brenners of the world should be embraced. We should be advocating for them to take on these studies.”
So why aren’t more healthcare luminaries putting their brilliance to the test? There are a couple of reasons.
Harvard economist Kate Baicker said until now there have been few personal incentives pushing people.
“If you’re focused on branding and spreading your brand, you have no incentive to say, ‘How good is my brand after all?’” she said.
And Venrock healthcare venture capitalist Bob Kocher said no one would fault Brenner if he put his brand before science, an age-old practice in this business.
“Healthcare has benefitted from the fact that you don’t understand it. It’s a bit of an art, and it hasn’t been a science,” he said. “You made money in healthcare by putting a banner outside your building saying you are a top something without having to justify whether you really are top at whatever you do.”
Duke’s Ubel said it’s too easy – and frankly, wrong – to say the main reason doctors avoid these rigorous studies is because they’re afraid to lose money and status. He said doctors aren’t immune from the very human trap of being sure their own ideas are right.
He says psychologists call it confirmation bias.
“Everything you see is filtered through your hopes, your expectations and your pre-existing beliefs,” Ubel said. “And that’s why I might look at a grilled cheese sandwich and see a grilled cheese sandwich and you might see an image of Jesus,” he says.
Even with all these hurdles, MIT economist Amy Finkelstein – who is running the RCT with Brenner – sees change coming.
“Providers have a lot more incentive now than they use to,” she said. “They have much more skin in the game.”
Finkelstein said hospital readmission penalties and new ways to pay doctors are bringing market incentives that have long been missing.
Brenner said he accepts that the truth of what he’s doing in Camden may be messier than the myth….”

Collective intelligence in crises


Monika Buscher and Michael Liegl in Social Collective Intelligence (Computational Social Sciences series): “New practices of social media use in emergency response seem to enable broader ‘situation awareness’ and new forms of crisis management. The scale and speed of innovation in this field engenders disruptive innovation or a reordering of social, political, economic practices of emergency response. By examining these dynamics with the concept of social collective intelligence, important opportunities and challenges can be examined. In this chapter we focus on socio-technical aspects of social collective intelligence in crises to discuss positive and negative frictions and avenues for innovation. Of particular interest are ways of bridging between collective intelligence in crises and official emergency response efforts.”

Twitter Can Now Predict Crime, and This Raises Serious Questions


Motherboard: “Police departments in New York City may soon be using geo-tagged tweets to predict crime. It sounds like a far-fetched sci-fi scenario a la Minority Report, but when I contacted Dr. Matthew Gerber, the University of Virginia researcher behind the technology, he explained that the system is far more mathematical than metaphysical.
The system Gerber has devised is an amalgam of both old and new techniques. Currently, many police departments target hot spots for criminal activity based on actual occurrences of crime. This approach, called kernel density estimation (KDE), involves pairing a historical crime record with a geographic location and using a probability function to calculate the possibility of future crimes occurring in that area. While KDE is a serviceable approach to anticipating crime, it pales in comparison to the dynamism of Twitter’s real-time data stream, according to Dr. Gerber’s research paper “Predicting Crime Using Twitter and Kernel Density Estimation”.
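For readers unfamiliar with the baseline technique, here is a minimal sketch of plain KDE hot-spot scoring, not Gerber’s Twitter-augmented model; the incident coordinates below are made up for illustration.

    # Minimal KDE hot-spot sketch: fit a kernel density estimate to historical
    # incident coordinates and score candidate locations by estimated density
    # (higher density = likelier hot spot). Coordinates are illustrative only.
    import numpy as np
    from scipy.stats import gaussian_kde

    incidents = np.array([                          # historical incidents as (lon, lat)
        [-87.63, 41.88], [-87.64, 41.88], [-87.63, 41.89],
        [-87.62, 41.87], [-87.64, 41.90], [-87.62, 41.89],
    ])

    kde = gaussian_kde(incidents.T)                 # gaussian_kde expects shape (dims, n_points)

    candidates = np.array([[-87.63, 41.88],         # inside the historical cluster
                           [-87.75, 42.00]]).T      # far from it
    for (lon, lat), density in zip(candidates.T, kde(candidates)):
        print(f"({lon:.2f}, {lat:.2f}) -> relative density {density:.2f}")

Gerber’s model reportedly adds features derived from the text of geotagged tweets on top of a density baseline like this one; the article’s comparison of Twitter-based and traditional KDE predictions follows below.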
Dr. Gerber’s approach is similar to KDE, but deals in the ethereal realm of data and language, not paperwork. The system involves mapping the Twitter environment, much like how police currently map the physical environment with KDE. The big difference is that Gerber is looking at what people are talking about in real time, as well as what they do after the fact, and seeing how well they match up. The algorithms look for certain language that is likely to indicate the imminent occurrence of a crime in the area, Gerber says. “We might observe people talking about going out, getting drunk, going to bars, sporting events, and so on—we know that these sort of events correlate with crime, and that’s what the models are picking up on.”
Once this data is collected, the GPS tags in tweets allow Gerber and his team to pin them to a virtual map and outline hot spots for potential crime. However, everyone who tweets about hitting the club later isn’t necessarily going to commit a crime. Gerber tests the accuracy of his approach by comparing Twitter-based KDE predictions with traditional KDE predictions based on police data alone. The big question is, does it work? For Gerber, the answer is a firm “sometimes.” “It helps for some, and it hurts for others,” he says.
According to the study’s results, Twitter-based KDE analysis yielded improvements in predictive accuracy over traditional KDE for stalking, criminal damage, and gambling. Arson, kidnapping, and intimidation, on the other hand, showed a decrease in accuracy from traditional KDE analysis. It’s not clear why these crimes are harder to predict using Twitter, but the study notes that the issue may lie with the kind of language used on Twitter, which is characterized by shorthand and informal language that can be difficult for algorithms to parse.
This kind of approach to high-tech crime prevention brings up the familiar debate over privacy and the use of users’ data for purposes they didn’t explicitly agree to. The case becomes especially sensitive when data will be used by police to track down criminals. On this point, though he acknowledges post-Snowden societal skepticism regarding data harvesting for state purposes, Gerber is indifferent. “People sign up to have their tweets GPS tagged. It’s an opt-in thing, and if you don’t do it, your tweets won’t be collected in this way,” he says. “Twitter is a public service, and I think people are pretty aware of that.”…