Paper by David Pérez Castrillo and David Wettstein: “We study innovation contests with asymmetric information and identical contestants, where contestants’ efforts and innate abilities generate inventions of varying qualities. The designer offers a reward to the contestant achieving the highest quality and receives the revenue generated by the innovation. We characterize the equilibrium behavior, outcomes and payoffs for both nondiscriminatory and discriminatory (where the reward is contestant-dependent) contests. We derive conditions under which the designer obtains a larger payoff when using a discriminatory contest and describe settings where these conditions are satisfied.”
Saving Big Data from Big Mouths
Cesar A. Hidalgo in Scientific American: “It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.
Big data has also contributed to machine learning and computer vision. Thanks to big data, Facebook algorithms can now match faces almost as accurately as humans do.
And detractors have overlooked big data’s role in the proliferation of computational design, data journalism and new forms of artistic expression. Computational artists, journalists and designers—the kinds of people who congregate at meetings like Eyeo—are using huge sets of data to give us online experiences that are unlike anything we experienced on paper. If we step away from hypothesis testing, we find that big data has made big contributions.
The second mistake critics often make is to confuse the limitations of prototypes with fatal flaws. This is something I have experienced often. For example, in Place Pulse—a project I created with my team at the M.I.T. Media Lab—we used Google Street View images and crowdsourced visual surveys to map people’s perception of a city’s safety and wealth. The original method was rife with limitations that we dutifully acknowledged in our paper. Google Street View images are taken at arbitrary times of the day and show cities from the perspective of a car. City boundaries were also arbitrary. To overcome these limitations, however, we needed a first data set. Producing that first limited version of Place Pulse was a necessary part of the process of making a working prototype.
A year has passed since we published Place Pulse’s first data set. Now, thanks to our focus on “making,” we have computer vision and machine-learning algorithms that we can use to correct for some of these easy-to-spot distortions. Making is allowing us to correct for time of the day and dynamically define urban boundaries. Also, we are collecting new data to extend the method to new geographical boundaries.
Those who fail to understand that the process of making is iterative are in danger of being too quick to condemn promising technologies. In 1920 the New York Times published a prediction that a rocket would never be able to leave the atmosphere. Similarly erroneous predictions were made about the car or, more recently, about the iPhone’s market share. In 1969 the Times had to publish a retraction of their 1920 claim. What similar retractions will need to be published in the year 2069?
Finally, the doubters have relied too heavily on secondary sources. For instance, they made a piñata out of the 2008 Wired piece by Chris Anderson framing big data as “the end of theory.” Others have criticized projects for claims that their creators never made. A couple of weeks ago, for example, Gary Marcus and Ernest Davis published a piece on big data in the Times. There they wrote about another of my group’s projects, Pantheon, which is an effort to collect, visualize and analyze data on historical cultural production. Marcus and Davis wrote that Pantheon “suggests a misleading degree of scientific precision.” As an author of the project, I have been unable to find where I made such a claim. Pantheon’s method section clearly states that “Pantheon will always be—by construction—an incomplete resource.” That same section contains a long list of limitations and caveats as well as the statement that “we interpret this data set narrowly, as the view of global cultural production that emerges from the multilingual expression of historical figures in Wikipedia as of May 2013.”
Bickering is easy, but it is not of much help. So I invite the critics of big data to lead by example. Stop writing op-eds and start developing tools that improve on the state of the art. Those would be much appreciated. What we need are projects that are worth imitating and that we can build on, not obvious advice such as “correlation does not imply causation.” After all, true progress is not something that is written, but made.”
Crowdsourcing the future: predictions made with a social network
New Paper by Clifton Forlines et al: “Researchers have long known that aggregate estimations built from the collected opinions of a large group of people often outperform the estimations of individual experts. This phenomenon is generally described as the “Wisdom of Crowds”. This approach has shown promise with respect to the task of accurately forecasting future events. Previous research has demonstrated the value of utilizing meta-forecasts (forecasts about what others in the group will predict) when aggregating group predictions. In this paper, we describe an extension to meta-forecasting and demonstrate the value of modeling the familiarity among a population’s members (its social network) and applying this model to forecast aggregation. A pair of studies demonstrates the value of taking this model into account, and the described technique produces aggregate forecasts for future events that are significantly better than the standard Wisdom of Crowds approach as well as previous meta-forecasting techniques.”
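The aggregation idea the abstract describes can be sketched in a few lines. The function names and the specific meta-forecast correction below are illustrative assumptions for exposition, not the paper's actual model: the intuition is simply that when a group's own forecasts differ from what the group expected the crowd to say, that gap hints at private information and can be used to shift the plain average.

```python
from statistics import mean


def crowd_forecast(forecasts):
    """Standard Wisdom of Crowds: the unweighted average of
    individual forecasts (probabilities or point estimates)."""
    return mean(forecasts)


def meta_adjusted_forecast(forecasts, meta_forecasts):
    """Hypothetical meta-forecast correction: each member reports
    both a forecast and a prediction of what the crowd will say.
    If actual forecasts exceed expected ones, members collectively
    hold private information pushing the estimate up, so the
    average is shifted in that direction (and vice versa)."""
    own = mean(forecasts)            # what the crowd actually says
    expected = mean(meta_forecasts)  # what the crowd thought it would say
    return own + (own - expected)    # shift by the surprise
```

For example, if every member forecasts 0.7 but expected the crowd to average only 0.5, the adjusted aggregate rises to 0.9, reflecting that the crowd was "surprisingly" confident relative to its own expectations.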
This is what happens when you give social networking to doctors
Carmel DeAmicis in PandoDaily: “Dr. Gregory Kurio will never forget the time he was called to the ER because an epileptic girl was brought in suffering a cardiac arrest of sorts (HIPAA mandates that he not give out the specific details of the situation). In the briefing, he learned the name of her cardiac physician, whom he happened to know through the industry. He subsequently called the other doctor and asked him to send over any available information on the patient — latest meds, EKGs, recent checkups, etc.
The scene in the ER was, as to be expected, one of chaos, with trainees and respiratory nurses running around grabbing machinery and meds. Crucial seconds were ticking past, and Dr. Kurio quickly realized the fax machine was not the best approach for receiving the records he needed. ER fax machines are often on the opposite side of the emergency room, take a while to print lengthy records, frequently run out of paper, and aren’t always reliable – not exactly the sort of technology you want when a patient’s life or death hangs in the balance.
Email wasn’t an option either, because HIPAA mandates that sensitive patient files are only sent through secure channels. With precious little time to waste, Dr. Kurio decided to take a chance on a new technology service he had just signed up for — Doximity.
Doximity is a LinkedIn for Doctors of sorts. It has, as one feature, a secure e-fax system that turns faxes into digital messages and sends them to a user’s mobile device. Dr. Kurio gave the other physician his e-fax number, and a little bit of techno-magic happened.
….
With a third of the nation’s doctors on the platform, today Doximity announced a $54 million Series C from DFJ, T. Rowe Price Associates, Morgan Stanley, and existing investors. The funding news isn’t particularly important, in and of itself, aside from the fact that the company is attracting the attention of private market investors very early in its growth trajectory. But it’s a good opportunity to take a look at Doximity’s business model, how it mirrors the upwards growth of other vertical professional social networks (say that five times fast), and the way it’s transforming our healthcare providers’ jobs.
Doximity works, in many ways, just like LinkedIn. Doctors have profiles with pictures and their resume, and recruiters pay the company to message medical professionals. “If you think it’s hard to find a Ruby developer in San Francisco, try to find an emergency room physician in Indiana,” Doximity CEO Jeff Tangney says. One recruiter’s pain is a smart entrepreneur’s pleasure — a simple, straightforward monetization strategy.
But unlike LinkedIn, Doximity can dive much deeper on meeting doctors’ needs through specialized features like the e-fax system. It’s part of the reason Konstantin Guericke, one of LinkedIn’s “forgotten” co-founders, was attracted to the company and decided to join the board as an advisor. “In some ways, it’s a lot like LinkedIn,” Guericke says, when asked why he decided to help out. “But for me it’s the pleasure of focusing on a more narrow audience and making more of an impact on their life.”
In another such high-impact, specialized feature, doctors can access Doximity’s Google Alerts-like system for academic articles. They can sign up to receive notifications when stories are published about their obscure specialties. That means time-strapped physicians gain a more efficient way to stay up to date on all the latest research and information in their field. You can imagine that might impact the quality of the care they provide.
Lastly, Doximity offers a secure messaging system, allowing doctors to email one another regarding a fellow patient. Such communication is a thorny issue for doctors given HIPAA-related privacy requirements. There are limited ways to legally update, say, a primary care physician when a specialist learns one of their patients has colon cancer. It turns into a big game of phone tag to relay what should be relatively straightforward information. Furthermore, leaving voicemails and sending faxes can result in details getting lost in what isn’t a searchable system.
The platform is free for doctors, and they have quickly joined in droves. Doximity co-founder and CEO Jeff Tangney estimates that last year the platform had added 15 to 16 percent of US doctors. But this year, the company claims it’s “on track to have half of US physicians as members by this summer.” Fairly impressive growth rate and market penetration.
With great market penetration comes great power. And dollars. Although the company is only monetizing through recruitment at the moment, the real money to be made with this service is through targeted advertising. Think about how much big pharma and medtech companies would be willing to cough up to communicate at scale with the doctors who make purchasing decisions. Plus, this is an easy way for them to target industry thought leaders or professionals with certain specialties.
Doximity’s founders’ and investors’ eyes might be seeing dollar signs, but they haven’t rolled anything out yet on the advertising front. They’re wary and want to do so in a way that adds value to all parties while avoiding pissing off medical professionals. When they finally pull the trigger, however, it has the potential to be a Gold Rush.
Doximity isn’t the only company to have discovered there’s big money to be made in vertical professional social networks. As Pando has written, there’s a big trend in this regard. Spiceworks, the social network for IT professionals which claims to have a third of the world’s IT professionals on the site, just raised $57 million in a round led by none other than Goldman Sachs. Why does the firm have such faith in a free social network for IT pros — seemingly the most mundane and unprofitable of endeavors? Well, just like with doctor and pharma corps, IT companies are willing to shell out big to market their wares directly to such IT pros.
Although the monetization strategies differ from business to business, ResearchGate is building a similar community with a social network of scientists around the world, Edmodo is doing it with educators, GitHub with developers, and GrabCAD with mechanical engineers. I’ve argued that such vertical professional social networks are a threat to LinkedIn, stealing business out from under it in large industry swaths. LinkedIn cofounder Konstantin Guericke disagrees.
“I don’t think it’s stealing revenue from them. Would it make sense for LinkedIn to add a profile subset about what insurance someone takes? That would just be clutter,” Guericke says. “It’s more going after an opportunity LinkedIn isn’t well positioned to capitalize on. They could do everything Doximity does, but they’d have to give up something else.”
All businesses come with their own challenges, and Doximity will certainly face its share of them as it scales. It has overcome the initial hurdle of achieving the network effects that come with penetrating a large segment of the market. Next will come monetizing sensitively and continuing to protect users’ — and patients’ — privacy.
There are plenty of data minefields to navigate in a sector as closely regulated as healthcare, as fellow medical startup Practice Fusion recently found out. Doximity has to make sure its system for onboarding and verifying new doctors is airtight. The company has already encountered some instances of individuals trying to pose as medical professionals to get access to another’s records — specifically a former lover trying to chase down their ex-spouse’s STI tests. One blowup in which the company approves someone it shouldn’t, or hackers break into the system, and doctors could lose trust in the safety of the technology….”
Twitter Can Now Predict Crime, and This Raises Serious Questions
The system Gerber has devised is an amalgam of both old and new techniques. Currently, many police departments target hot spots for criminal activity based on actual occurrences of crime. This approach, called kernel density estimation (KDE), involves pairing a historical crime record with a geographic location and using a probability function to calculate the possibility of future crimes occurring in that area. While KDE is a serviceable approach to anticipating crime, it pales in comparison to the dynamism of Twitter’s real-time data stream, according to Dr. Gerber’s research paper “Predicting Crime Using Twitter and Kernel Density Estimation”.
Dr. Gerber’s approach is similar to KDE, but deals in the ethereal realm of data and language, not paperwork. The system involves mapping the Twitter environment, much like how police currently map the physical environment with KDE. The big difference is that Gerber is looking at what people are talking about in real time, as well as what they do after the fact, and seeing how well they match up. The algorithms look for certain language that is likely to indicate the imminent occurrence of a crime in the area, Gerber says. “We might observe people talking about going out, getting drunk, going to bars, sporting events, and so on—we know that these sort of events correlate with crime, and that’s what the models are picking up on.”
Once this data is collected, the GPS tags in tweets allow Gerber and his team to pin them to a virtual map and outline hot spots for potential crime. However, not everyone who tweets about hitting the club later is going to commit a crime. Gerber tests the accuracy of his approach by comparing Twitter-based KDE predictions with traditional KDE predictions based on police data alone. The big question is, does it work? For Gerber, the answer is a firm “sometimes.” “It helps for some, and it hurts for others,” he says.
According to the study’s results, Twitter-based KDE analysis yielded improvements in predictive accuracy over traditional KDE for stalking, criminal damage, and gambling. Arson, kidnapping, and intimidation, on the other hand, showed a decrease in accuracy from traditional KDE analysis. It’s not clear why these crimes are harder to predict using Twitter, but the study notes that the issue may lie with the kind of language used on Twitter, which is characterized by shorthand and informal language that can be difficult for algorithms to parse.
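The KDE step described above can be sketched in a few lines: each historical crime contributes a smooth bump of probability around its location, and summing the bumps gives an intensity surface whose peaks are the predicted hot spots. The Gaussian kernel and the `bandwidth` parameter here are common textbook choices, assumed for illustration rather than taken from Gerber's paper:

```python
import math


def kde_intensity(point, events, bandwidth=1.0):
    """Estimate crime intensity at `point` from historical event
    coordinates using a 2-D Gaussian kernel. `bandwidth` controls
    how far each past crime's influence spreads on the map."""
    x, y = point
    total = 0.0
    for ex, ey in events:
        d2 = (x - ex) ** 2 + (y - ey) ** 2          # squared distance to event
        total += math.exp(-d2 / (2 * bandwidth ** 2))  # Gaussian bump
    # normalize so the surface integrates to 1 over the plane
    return total / (len(events) * 2 * math.pi * bandwidth ** 2)
```

Evaluating `kde_intensity` over a grid of map cells and ranking the cells yields the hot-spot list; the Twitter-augmented variant would add tweet-derived features to this density before ranking.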
This kind of approach to high-tech crime prevention brings up the familiar debate over privacy and the use of users’ data for purposes they didn’t explicitly agree to. The case becomes especially sensitive when data will be used by police to track down criminals. On this point, though he acknowledges post-Snowden societal skepticism regarding data harvesting for state purposes, Gerber is indifferent. “People sign up to have their tweets GPS tagged. It’s an opt-in thing, and if you don’t do it, your tweets won’t be collected in this way,” he says. “Twitter is a public service, and I think people are pretty aware of that.”…
Testing Theories of American Politics: Elites, Interest Groups, and Average Citizens
Crowdsourcing medical expertise in near real time
Paper by Max H. Sims et al in Journal of Hospital Medicine: “Given the pace of discovery in medicine, accessing the literature to make informed decisions at the point of care has become increasingly difficult. Although the Internet creates unprecedented access to information, gaps in the medical literature and inefficient searches often leave healthcare providers’ questions unanswered. Advances in social computation and human computer interactions offer a potential solution to this problem. We developed and piloted the mobile application DocCHIRP, which uses a system of point-to-multipoint push notifications designed to help providers problem solve by crowdsourcing from their peers. Over the 244-day pilot period, 85 registered users logged 1544 page views and sent 45 consult questions. The median initial first response from the crowd occurred within 19 minutes. Review of the transcripts revealed several dominant themes, including complex medical decision making and inquiries related to prescription medication use. Feedback from the post-trial survey identified potential hurdles related to medical crowdsourcing, including a reluctance to expose personal knowledge gaps and the potential risk for “distracted doctoring.” Users also suggested program modifications that could support future adoption, including changes to the mobile interface and mechanisms that could expand the crowd of participating healthcare providers.”
Good Governance – Performance Values and Procedural Values in Conflict
Paper by Gjalt de Graaf and Hester Paanakker in The American Review of Public Administration: “Good governance codes usually end with a list of public values no one could oppose. A recurrent issue is that not all of these values—however desirable they are—can be achieved at the same time. With its focus on performance and procedural values of governance, this article zooms in on the conflict between two different types of values, signifying and exemplifying how output and outcome on one hand and the process of governance on the other may coincide or collide. The main research question is, “What is the nature of value conflict in public governance and what specific conflicts between performance and procedural values do public actors perceive?” A literature review and two case studies involving aldermen and the most senior public administrators in public governance set out to answer these questions. The most frequently perceived conflict is between lawfulness and transparency in procedure, on one hand, and the attainment of effectiveness and efficiency as performance values on the other.”
Passage Of The DATA Act Is A Major Advance In Government Transparency
OpEd by Hudson Hollister in Forbes: “Even as the debate over official secrecy grows on Capitol Hill, basic information about our government’s spending remains hidden in plain sight.
Information that is technically public — federal finance, awards, and expenditures — is effectively locked within a disconnected disclosure system that relies on outdated paper-based technology. Budgets, grants, contracts, and disbursements are reported manually and separately, using forms and spreadsheets. Researchers seeking insights into federal spending must invest time and resources crafting data sets out of these documents. Without common data standards across all government spending, analyses of cross-agency spending trends require endless conversions of apples to oranges.
For a nation whose tech industry leads the world, there is no reason to allow this antiquated system to persist.
That’s why we’re excited to welcome Thursday’s unanimous Senate approval of the Digital Accountability and Transparency Act — known as the DATA Act.
The DATA Act will mandate government-wide standards for federal spending data. It will also require agencies to publish this information online, fully searchable and open to everyone.
Watchdogs and transparency advocates from across the political spectrum have endorsed the DATA Act because all Americans will benefit from clear, accessible information about how their tax dollars are being spent.
It is darkly appropriate that the only organized opposition to this bill took place behind closed doors. In January, Senate sponsors Mark Warner (D-VA) and Rob Portman (R-OH) rejected amendments offered privately by the White House Office of Management and Budget. These nonpublic proposals would have gutted the DATA Act’s key data standards requirement. But Warner and Portman went public with their opposition, and Republicans and Democrats agreed to keep a strong standards mandate.
We now await swift action by the House of Representatives to pass this bill and put it on the President’s desk.
The tech industry is already delivering the technology and expertise that will use federal spending data, once it is open and standardized, to solve problems.
If the DATA Act is fully enforced, citizens will be able to track government spending on a particular contractor or from a particular program, payment by payment. Agencies will be able to deploy sophisticated Big Data analytics to illuminate, and eliminate, waste and fraud. And states and universities will be able to automate their complex federal grant reporting tasks, freeing up more tax dollars for their intended use. Our industry can perform these tasks — as soon as we get the data.
Chairman Earl Devaney’s Recovery Accountability and Transparency Board proved this is possible. Starting in 2009, the Recovery Board applied data standards to track stimulus spending. Our members’ software used that data to help inspectors general prevent and recover over $100 million in spending on suspicious grantees and contractors. The DATA Act applies that approach across the whole of government spending.
Congress is now poised to pass this landmark legislative mandate to transform spending from disconnected documents into open data. Next, the executive branch must implement that mandate.
So our Coalition’s work continues. We will press the Treasury Department and the White House to adopt robust, durable, and nonproprietary data standards for federal spending.
And we won’t stop with spending transparency. The American people deserve access to open data across all areas of government activity — financial regulatory reporting, legislative actions, judicial filings, and much more….”
In defense of “slacktivism”: The Human Rights Campaign Facebook logo as digital activism
Stephanie Vie in First Monday: “This paper examines the Human Rights Campaign (HRC) Marriage Equality logo as an example of a meme to further understandings of memetic transmission in social media technologies. The HRC meme is an important example of how even seemingly insignificant moves such as adopting a logo and displaying it online can serve to combat microaggressions, or the damaging results of everyday bias and discrimination against marginalized groups. This article suggests that even small moves of support, such as changing one’s Facebook status to a memetic image, assist by demonstrating a supportive environment for those who identify with marginalized groups and by drawing awareness to important causes. Often dismissed as “slacktivism,” I argue instead that the digital activism made possible through social media memes can build awareness of crucial issues, which can then lead to action.”