Microsoft Unveils Machine Learning for the Masses


The service, called Microsoft Azure Machine Learning, was announced Monday but won’t be available until July. It combines Microsoft’s own software with publicly available open source software, packaged in a way that is easier to use than most of the arcane strategies currently in use.
“This is drag-and-drop software,” said Joseph Sirosh, vice president for machine learning at Microsoft. “My high schooler is using this.”
That would be a big step forward in popularizing what is currently a difficult process in increasingly high demand. It would also further the ambitions of Satya Nadella, Microsoft’s chief executive, of making Azure the center of Microsoft’s future.
Users of Azure Machine Learning will have to keep their data in Azure, and Microsoft will provide ways to move data from competing services, like Amazon Web Services. Pricing has not yet been finalized, Mr. Sirosh said, but will be based on a premium to Azure’s standard computing and transmission charges.
Machine learning computers examine historical data through different algorithms and programming languages to make predictions. The process is commonly used in Internet search, fraud detection, product recommendations and digital personal assistants, among other things.
As more data is automatically stored online, there are opportunities to use machine learning for performing maintenance, scheduling hospital services, and anticipating disease outbreaks and crime, among other things. The methods have to become easier and cheaper to be popular, however.
That is the goal of Azure Machine Learning. “This is, as far as I know, the first comprehensive machine learning service in the cloud,” Mr. Sirosh said. “I’m leveraging every asset in Microsoft for this.” He is also using ways of accessing an open source version of R, a standard statistical language, while in Azure.
Microsoft is likely to face competition from rival cloud companies, including Google and Amazon. Both Google and Amazon have things like data frameworks used in building machine learning algorithms, as well as their own analysis services. IBM is eager to make use of its predictive software in its cloud business. Visualization companies like Tableau specialize in presenting the results so they can be acted on easily…”

Crowdsourcing moving beyond the fringe


Bob Brown in Network World: “Depending upon how you look at it, crowdsourcing is all the rage these days — think Wikipedia, X Prize and Kickstarter — or at the other extreme, greatly underused.
To the team behind the new “insight network” Yegii, crowdsourcing has not nearly reached its potential despite having its roots as far back as the early 1700s and a famous case of the British Government seeking a solution to “The Longitude Problem” in order to make sailing less life threatening. (I get the impression that mention of this example is obligatory at any crowdsourcing event.)
This angel-funded startup, headed by an MIT Sloan School of Management senior lecturer and operating from a Boston suburb, is looking to exploit crowdsourcing’s potential through a service that connects financial, healthcare, technology and other organizations seeking knowledge with experts who can provide it – and fairly fast. To CEO Trond Undheim, crowdsourcing is “no longer for fringe freelance work,” and the goal is to get more organizations and smart individuals involved.
“Yegii is essentially a network of networks, connecting people, organizations, and knowledge in new ways,” says Undheim, who explains that the name Yegii is Korean for “talk” or “discussion”. “Our focus is laser sharp: we only rank and rate knowledge that says something essential about what I see as the four forces of industry disruption: technology, policy, user dynamics and business models.  We tackle challenging business issues across domains, from life sciences to energy to finance.  The point is that today’s industry classification is falling apart. We need more specific insight than in-house strategizing or generalist consulting advice.”
Undheim attempted to drum up interest in the new business last week at an event at Babson College during which a handful of crowdsourcing experts spoke. Harvard Business School adjunct professor Alan MacCormack discussed the X Prize, Netflix Prize and other examples of spurring competition through crowdsourcing. MIT’s Peter Gloor extolled the virtue of collaborative and smart swarms of people vs. stupid crowds (such as football hooligans). A couple of advertising/marketing execs shared stories of how clients and other brands are increasingly tapping into their customer base and the general public for new ideas from slogans to products, figuring that potential new customers are more likely to trust their peers than corporate ads. Another speaker dove into more details about how to run a crowdsourcing challenge, which includes identifying motivation that goes beyond money.
All of this was to frame Yegii’s crowdsourcing plan, which is at the beta stage with about a dozen clients (including Akamai and Santander bank) and is slated for mass production later this year. Yegii’s team consists of five part-timers, plus a few interns, who are building a web-based platform that consists of “knowledge assets,” that is, market research, news reports and datasets from free and paid sources. That content – on topics that range from Bitcoin’s impact on banks to telecom bandwidth costs – is reviewed and ranked through a combination of machine learning and human peers. Information seekers would pay Yegii up to hundreds of dollars per month or up to tens of thousands of dollars per project, and then multidisciplinary teams would accept the challenge of answering their questions via customized reports within staged deadlines.
“We are focused on building partnerships with other expert networks and associations that have access to smart people with spare capacity, wherever they are,” Undheim says.
One reason organizations can benefit from crowdsourcing, Undheim says, is because of the “ephemeral nature of expertise in today’s society.” In other words, people within your organization might think of themselves as experts in this or that, but when they really think about it, they might realize their level of expertise has faded. Yegii will strive to narrow down the best sources of information for those looking to come up to speed on a subject over a weekend, whereas hunting for that information across a vast search engine would not be nearly as efficient….”

A New Way to Look at Law, With Data Viz and Machine Learning


  in Wired:

Ravel displays search results as an interactive visualization. Image: Ravel
“On TV, being a lawyer is all about dazzling jurors with verbal pyrotechnics. But for many lawyers–especially young ones–the job is about research. Long, dry, tedious research.
It’s that less glamorous side of the profession that Daniel Lewis and Nik Reed are trying to upend with Ravel. Using data visualization, language analysis, and machine learning, the Stanford Law grads are aiming to reinvent legal research–and perhaps give young lawyers a deeper understanding of their field in the process.
Lawyers have long relied on subscription services like LexisNexis and WestLaw to do their jobs. These services offer indispensable access to vast databases of case documents. Lewis remembers seeing the software on the computers at his Dad’s law firm when he used to hang out there as a kid. You’d put in a keyword, say, securities fraud, and get back a long, rank-ordered list of results relevant to that topic.
Years later, when Lewis was embarking on his own legal career as a first year at Stanford Law, he was struck by how little had changed. “The tools and technologies were the same,” he says. “It was surprising and disconcerting.” Reed, his classmate there, was also perplexed, especially having spent some time in the finance industry working with its high-powered tools. “There was all this cool stuff that everyone else was using in every other field, and it just wasn’t coming to lawyers,” he says.

Early users have reported that Ravel cut their overall research time by up to two thirds….

Ravel’s most ambitious features, however, are intended to help with the analysis of cases. These tools, saved for premium subscribers, are designed to automatically surface the key passages in whatever case you happen to be looking at, sussing out instances when they’ve been cited or reinterpreted in cases that followed.
To do this, Ravel effectively has to map the law, an undertaking that involves both human insight and technical firepower. The process, roughly: Lewis and Reed will look at a particular case, pinpoint the case it’s referencing, and then figure out what ties them together. It could be a direct reference, or a glancing one. It might show up as three paragraphs in that later ruling, or just a sentence.
Once those connections have been made, they’re handed off to Ravel’s engineers. The engineers, who make up more than half of the company’s ten-person team, are tasked with building models that can identify those same sorts of linkages in other cases, using natural language processing. In effect, Ravel’s trying to uncover the subtle linguistic patterns undergirding decades of legal rulings.
That all goes well beyond visual search, and the idea of future generations of lawyers learning from an algorithmic analysis of the law seems quietly dangerous in its own way (though a sterling conceit for a near-future short story!)
Still, compared to the relatively primitive tools that dominate the field today, Lewis and Reed see Ravel as a promising resource for young lawyers and law students. “It’s about helping them research more confidently,” Lewis says. “It’s about making sure they understand the story in the right way.” And, of course, about making all that research a little less tedious, too.”

Heteromation and its (dis)contents: The invisible division of labor between humans and machines


Paper by Hamid Ekbia and Bonnie Nardi in First Monday: “The division of labor between humans and computer systems has changed along both technical and human dimensions. Technically, there has been a shift from technologies of automation, the aim of which was to disallow human intervention at nearly all points in the system, to technologies of “heteromation” that push critical tasks to end users as indispensable mediators. As this has happened, the large population of human beings who have been driven out by the first type of technology are drawn back into the computational fold by the second type. Turning artificial intelligence on its head, one technology fills the gap created by the other, but with a vengeance that unsettles established mechanisms of reward, fulfillment, and compensation. In this fashion, replacement of human beings and their irrelevance to technological systems has given way to new “modes of engagement” with remarkable social, economic, and ethical implications. In this paper we provide a historical backdrop for heteromation and explore and explicate some of these displacements through analysis of a number of cases, including Mechanical Turk, the video games FoldIt and League of Legends, and social media.”


How Long Is Too Long? The 4th Amendment and the Mosaic Theory


Law and Liberty Blog: “Volume 8.2 of the NYU Journal of Law and Liberty has been sent to the printer and physical copies will be available soon, but the articles in the issue are already available online here. One article that has gotten a lot of attention so far is by Steven Bellovin, Renee Hutchins, Tony Jebara, and Sebastian Zimmeck titled “When Enough is Enough: Location Tracking, Mosaic Theory, and Machine Learning.” A direct link to the article is here.
The mosaic theory is a modern corollary accepted by some academics – and the D.C. Circuit Court of Appeals in Maynard v. U.S. – as a twenty-first century extension of the Fourth Amendment’s prohibition on unreasonable searches and seizures. Proponents of the mosaic theory argue that at some point enough individual data collections, compiled and analyzed together, become a Fourth Amendment search. Thirty years ago the Supreme Court upheld the use of a tracking device for three days without a warrant; however, the proliferation of GPS tracking in cars and smartphones has made it significantly easier for the police to access a treasure trove of information about our location at any given time.
It is easy to see why this theory has attracted some support. Humans are creatures of habit – if our public locations are tracked for a few days, weeks, or a month, it is pretty easy for machines to learn our ways and assemble a fairly detailed report for the government about our lives. Machines could basically predict when you will leave your house for work, what route you will take, when and where you go grocery shopping, all before you even do it, once they know your habits. A policeman could observe you moving about in public without a warrant of course, but limited manpower will always reduce the probability of continuous mass surveillance. With current technology, a handful of trained experts could easily monitor hundreds of people at a time from behind a computer screen, and gather even more information than most searches requiring a warrant. The Supreme Court indicated a willingness to consider the mosaic theory in U.S. v. Jones, but has yet to embrace it…”

The article in Law & Liberty details the need to determine at which point machine learning creates an intrusion into our reasonable expectations of privacy, and even discusses an experiment that could be run to determine how long data collection can proceed before it is an intrusion. If there is a line at which individual data collection becomes a search, we need to discover where that line is. One of the article’s authors, Steven Bellovin, has argued that the line is probably at one week – at that point your weekday and weekend habits would be known. The nation’s leading legal expert on criminal law, Professor Orin Kerr, fired back on the Volokh Conspiracy that Bellovin’s one week argument is not in line with previous iterations of the mosaic theory.

Data Mining Reddit Posts Reveals How to Ask For a Favor–And Get it


Emerging Technology From the arXiv: “There’s a secret to asking strangers for something and getting it. Now data scientists say they’ve discovered it by studying successful requests on the web.

One of the more extraordinary phenomena on the internet is the rise of altruism and of websites designed to enable it. The Random Acts of Pizza section of the Reddit website is a good example.

People leave messages asking for pizza which others fulfil if they find the story compelling. As the site says: “because… who doesn’t like helping out a stranger? The purpose is to have fun, eat pizza and help each other out. Together, we aim to restore faith in humanity, one slice at a time.”

A request might go something like this: “It’s been a long time since my mother and I have had proper food. I’ve been struggling to find any kind of work so I can supplement my mom’s social security… A real pizza would certainly lift our spirits”. Anybody can then fulfil the order which is then marked on the site with a badge saying “got pizza’d”, often with notes of thanks.

That raises an interesting question. What kinds of requests are most successful in getting a response? Today, we get an answer thanks to the work of Tim Althoff at Stanford University and a couple of pals who lift the veil on the previously murky question of how to ask for a favour—and receive it.

They analysed how various features might be responsible for the success of a post, such as the politeness of the post; its sentiment, whether positive or negative for example; its length. The team also looked at the similarity of the requester to the benefactor; and also the status of the requester.

Finally, they examined whether the post contained evidence of need in the form of a narrative that described why the requester needed free pizza.

Althoff and co used a standard machine learning algorithm to comb through all the possible correlations in 70 per cent of the data, which they used for training. Having found various correlations, they tested to see whether these had predictive power in the remaining 30 per cent of the data. In other words, can their algorithm predict whether a previously unseen request will be successful or not?

It turns out that their algorithm makes a successful prediction about 70 per cent of the time. That’s far from perfect but much better than random guessing which is right only half the time.
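
The hold-out procedure described above (train on 70 per cent, test on the unseen 30 per cent) can be sketched in a few lines of Python. This is only an illustration, not the researchers' code: the data is synthetic, and the "learner" is a deliberately simple, hypothetical rule that predicts success from word count alone. The names `split_train_test`, `fit_length_threshold` and `accuracy` are made up for this sketch.

```python
import random


def split_train_test(examples, train_fraction=0.7, seed=0):
    """Shuffle a copy of the examples and cut it into train and test parts."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def accuracy(threshold, examples):
    """Fraction of examples whose outcome the threshold rule predicts."""
    hits = sum((length >= threshold) == ok for length, ok in examples)
    return hits / len(examples)


def fit_length_threshold(train):
    """Toy stand-in for a real learner (an assumption of this sketch).

    Predict success when a request has at least `threshold` words, and
    pick the threshold that scores best on the training set only.
    """
    candidates = sorted({length for length, _ in train})
    return max(candidates, key=lambda t: accuracy(t, train))


# Synthetic (word_count, succeeded) pairs, invented for illustration only.
rng = random.Random(42)
data = []
for _ in range(1000):
    length = rng.randint(10, 200)
    succeeded = rng.random() < (0.7 if length >= 100 else 0.3)
    data.append((length, succeeded))

train, test = split_train_test(data)
threshold = fit_length_threshold(train)
held_out = accuracy(threshold, test)
print(f"train={len(train)} test={len(test)} threshold={threshold}")
print(f"held-out accuracy: {held_out:.2f}")
```

The point of the split is that the threshold is chosen without ever looking at the test examples, so the held-out accuracy is an honest estimate of how the rule would do on new requests.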

So what kinds of factors are important? Narrative is a key part of many of the posts, so Althoff and co spent some time categorising the types of stories people use.

They divided the narratives into five types, those that mention: money; a job; being a student; family; and a final group that includes mentions of friends, being drunk, celebrating and so on, which Althoff and co call ‘craving’.

Of these, narratives about jobs, family and money increase the probability of success. Student narratives have no effect while craving narratives significantly reduce the chances of success. In other words, narratives that communicate a need are more successful than those that do not.

“We find that clearly communicating need through the narrative is essential,” say Althoff and co. And evidence of reciprocation helps too.

(Given these narrative requirements, it is not surprising that longer requests tend to be more successful than short ones.)

So for example, the following request was successful because it clearly demonstrates both need and evidence of reciprocation.

“My gf and I have hit some hard times with her losing her job and then unemployment as well for being physically unable to perform her job due to various hand injuries as a server in a restaurant. She is currently petitioning to have unemployment reinstated due to medical reasons for being unable to perform her job, but until then things are really tight and ANYTHING would help us out right now.

I’ve been both a giver and receiver in RAOP before and would certainly return the favor again when I am able to reciprocate. It took everything we have to pay rent today and some food would go a long ways towards making our next couple of days go by much better with some food.”

By contrast, the ‘craving’ narrative below demonstrates neither and was not successful.

“My friend is coming in town for the weekend and my friends and i are so excited because we haven’t seen him since junior high. we are going to a high school football game then to the dollar theater after and it would be so nice if someone fed us before we embarked :)”

Althoff and co also say that the status of the requester is an important factor too. “We find that Reddit users with higher status overall (higher karma) or higher status within the subcommunity (previous posts) are significantly more likely to receive help,” they say.

But surprisingly, being polite does not help (except by offering thanks).

That’s interesting work. Until now, psychologists have never understood the factors that make requests successful, largely because it has always been difficult to separate the influence of the request from what is being requested.

The key here is that everybody making requests in this study wants the same thing—pizza. In one swoop, this makes the data significantly easier to tease apart.

An important line of future work will be in using this work to understand altruistic behaviour in other communities too…

Ref:  http://arxiv.org/abs/1405.3282 : How to Ask for a Favor: A Case Study on the Success of Altruistic Requests”

The Secret Science of Retweets


Emerging Technology From the arXiv: “If you send a tweet to a stranger asking them to retweet it, you probably wouldn’t be surprised if they ignored you entirely. But if you sent out lots of tweets like this, perhaps a few might end up being passed on.

How come? What makes somebody retweet information from a stranger? That’s the question addressed today by Kyumin Lee from Utah State University in Logan and a few pals from IBM’s Almaden research center in San Jose….by studying the characteristics of Twitter users, it is possible to identify strangers who are more likely to pass on your message than others. And in doing this, the researchers say they’ve been able to improve the retweet rate of messages sent to strangers by up to 680 percent.
So how did they do it? The new technique is based on the idea that some people are more likely to tweet than others, particularly on certain topics and at certain times of the day. So the trick is to find these individuals and target them when they are likely to be most effective.
So the approach was straightforward. The idea is to study the individuals on Twitter, looking at their profiles and their past tweeting behavior, looking for clues that they might be more likely to retweet certain types of information. Having found these individuals, send your tweets to them.
That’s the theory. In practice, it’s a little more involved. Lee and co wanted to test people’s response to two types of information: local news (in San Francisco) and tweets about bird flu, a significant issue at the time of their research. They then created several Twitter accounts with a few followers, specifically to broadcast information of this kind.
Next, they selected people to receive their tweets. For the local news broadcasts, they searched for Twitter users geolocated in the Bay area, finding over 34,000 of them and choosing 1,900 at random.
They then sent a single message to each user of the format:
“@ SFtargetuser “A man was killed and three others were wounded in a shooting … http://bit.ly/KOl2sC” Plz RT this safety news”
So the tweet included the user’s name, a short headline, a link to the story and a request to retweet.
Of these 1,900 people, 52 retweeted the message they received. That’s 2.8 percent.
For the bird flu information, Lee and co hunted for people who had already tweeted about bird flu, finding 13,000 of them and choosing 1,900 at random. Of these, 155 retweeted the message they received, a retweet rate of 8.4 percent.
But Lee and co found a way to significantly improve these retweet rates. They went back to the original lists of Twitter users and collected publicly available information about each of them, such as their personal profile, the number of followers, the people they followed, their 200 most recent tweets and whether they retweeted the message they had received.
Next, the team used a machine learning algorithm to search for correlations in this data that might predict whether somebody was more likely to retweet. For example, they looked at whether people with older accounts were more likely to retweet or how the ratio of friends to followers influenced the retweet likelihood, or even how the types of negative or positive words they used in previous tweets showed any link. They also looked at the time of day that people were most active in tweeting.
The result was a machine learning algorithm capable of picking users who were most likely to retweet on a particular topic.
And the results show that it is surprisingly effective. When the team sent local information tweets to individuals identified by the algorithm, 13.3 percent retweeted it, compared to just 2.6 percent of people chosen at random.
And they got even better results when they timed the request to match the periods when people had been most active in the past. In that case, the retweet rate rose to 19.3 percent. That’s an improvement of over 600 percent.
Similarly, the rate for bird flu information rose from 8.3 percent for users chosen at random to 19.7 percent for users chosen by the algorithm.
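
To see how the quoted rates translate into percentage improvements, here is a small arithmetic sketch in Python. The rates are the ones given in the text above; the helper name `relative_increase_percent` is just an illustrative choice.

```python
def relative_increase_percent(baseline, improved):
    """Percentage increase of `improved` over `baseline`."""
    return (improved - baseline) / baseline * 100


# Retweet rates (in percent) as quoted in the article.
comparisons = [
    ("local news, targeted users", 2.6, 13.3),
    ("local news, targeted and timed", 2.6, 19.3),
    ("bird flu, targeted users", 8.3, 19.7),
]

for label, base, new in comparisons:
    gain = relative_increase_percent(base, new)
    print(f"{label}: {base}% -> {new}% is a {gain:.0f}% increase")
```

Going from 2.6 percent to 19.3 percent works out to an increase of roughly 642 percent, which is consistent with the "over 600 percent" figure quoted above.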
That’s a significant result that marketers, politicians and news organizations will be eyeing with envy.
An interesting question is how they can make this technique more generally applicable. It raises the prospect of an app that allows anybody to enter a topic of interest and which then creates a list of people most likely to retweet on that topic in the next few hours.
Lee and co do not mention any plans of this kind. But if they don’t exploit it, then there will surely be others who will.
Ref: arxiv.org/abs/1405.3750 : Who Will Retweet This? Automatically Identifying and Engaging Strangers on Twitter to Spread Information”

The Collective Intelligence Handbook: an open experiment


Michael Bernstein: “Is there really a wisdom of the crowd? How do we get at it and understand it, utilize it, empower it?
You probably have some ideas about this. I certainly do. But I represent just one perspective. What would an economist say? A biologist? A cognitive or social psychologist? An artificial intelligence or human-computer interaction researcher? A communications scholar?
For the last two years, Tom Malone (MIT Sloan) and I (Stanford CS) have worked to bring together all these perspectives into one book. We are nearing completion, and the Collective Intelligence Handbook will be published by the MIT Press later this year. I’m still relatively dumbfounded by the rockstar lineup we have managed to convince to join up.

It’s live.

Today we went live with the authors’ current drafts of the chapters. All the current preprints are here: http://cci.mit.edu/CIchapterlinks.html

And now is when you come in.

But we’re not done. We’d love for you — the crowd — to help us make this book better. We envisioned this as an open process, and we’re excited that all the chapters are now at a point where we’re ready for critique, feedback, and your contributions.
There are two ways you can help:

  • Read the current drafts and leave comments inline in the Google Docs to help us make them better.
  • Drop suggestions in the separate recommended reading list for each chapter. We (the editors) will be using that material to help us write an introduction to each chapter.

We have one month. The authors’ final chapters are due to us in mid-June. So off we go!”

Here’s what’s in the book:

Chapter 1. Introduction
Thomas W. Malone (MIT) and Michael S. Bernstein (Stanford University)
What is collective intelligence, anyway?
Chapter 2. Human-Computer Interaction and Collective Intelligence
Jeffrey P. Bigham (Carnegie Mellon University), Michael S. Bernstein (Stanford University), and Eytan Adar (University of Michigan)
How computation can help gather groups of people to tackle tough problems together.
Chapter 3. Artificial Intelligence and Collective Intelligence
Daniel S. Weld (University of Washington), Mausam (IIT Delhi), Christopher H. Lin (University of Washington), and Jonathan Bragg (University of Washington)
Mixing machine intelligence with human intelligence could enable a synthesized intelligent actor that brings together the best of both worlds.
Chapter 4. Collective Behavior in Animals: An Ecological Perspective
Deborah M. Gordon (Stanford University)
How do groups of animals work together in distributed ways to solve difficult problems?
Chapter 5. The Wisdom of Crowds vs. the Madness of Mobs
Andrew W. Lo (MIT)
Economics has studied a collectively intelligent forum — the market — for a long time. But are we as smart as we think we are?
Chapter 6. Collective Intelligence in Teams and Organizations
Anita Williams Woolley (Carnegie Mellon University), Ishani Aggarwal (Georgia Tech), Thomas W. Malone (MIT)
How do the interactions between groups of people impact how intelligently that group acts?
Chapter 7. Cognition and Collective Intelligence
Mark Steyvers (University of California, Irvine), Brent Miller (University of California, Irvine)
Understanding the conditions under which people are smart individually can help us predict when they might be smart collectively.

Chapter 8. Peer Production: A Modality of Collective Intelligence
Yochai Benkler (Harvard University), Aaron Shaw (Northwestern University), Benjamin Mako Hill (University of Washington)
What have collective efforts such as Wikipedia taught us about how large groups come together to create knowledge and creative artifacts?

Saving Big Data from Big Mouths


Cesar A. Hidalgo in Scientific American: “It has become fashionable to bad-mouth big data. In recent weeks the New York Times, Financial Times, Wired and other outlets have all run pieces bashing this new technological movement. To be fair, many of the critiques have a point: There has been a lot of hype about big data and it is important not to inflate our expectations about what it can do.
But little of this hype has come from the actual people working with large data sets. Instead, it has come from people who see “big data” as a buzzword and a marketing opportunity—consultants, event organizers and opportunistic academics looking for their 15 minutes of fame.
Most of the recent criticism, however, has been weak and misguided. Naysayers have been attacking straw men, focusing on worst practices, post hoc failures and secondary sources. The common theme has been to a great extent obvious: “Correlation does not imply causation,” and “data has biases.”
Critics of big data have been making three important mistakes:
First, they have misunderstood big data, framing it narrowly as a failed revolution in social science hypothesis testing. In doing so they ignore areas where big data has made substantial progress, such as data-rich Web sites, information visualization and machine learning. If there is one group of big-data practitioners that the critics should worship, they are the big-data engineers building the social media sites where their platitudes spread. Engineering a site rich in data, like Facebook, YouTube, Vimeo or Twitter, is extremely challenging. These sites are possible because of advances made quietly over the past five years, including improvements in database technologies and Web development frameworks.
Big data has also contributed to machine learning and computer vision. Thanks to big data, Facebook algorithms can now match faces almost as accurately as humans do.
And detractors have overlooked big data’s role in the proliferation of computational design, data journalism and new forms of artistic expression. Computational artists, journalists and designers—the kinds of people who congregate at meetings like Eyeo—are using huge sets of data to give us online experiences that are unlike anything we experienced in paper. If we step away from hypothesis testing, we find that big data has made big contributions.
The second mistake critics often make is to confuse the limitations of prototypes with fatal flaws. This is something I have experienced often. For example, in Place Pulse—a project I created with my team at the M.I.T. Media Lab—we used Google Street View images and crowdsourced visual surveys to map people’s perception of a city’s safety and wealth. The original method was rife with limitations that we dutifully acknowledged in our paper. Google Street View images are taken at arbitrary times of the day and show cities from the perspective of a car. City boundaries were also arbitrary. To overcome these limitations, however, we needed a first data set. Producing that first limited version of Place Pulse was a necessary part of the process of making a working prototype.
A year has passed since we published Place Pulse’s first data set. Now, thanks to our focus on “making,” we have computer vision and machine-learning algorithms that we can use to correct for some of these easy-to-spot distortions. Making is allowing us to correct for time of the day and dynamically define urban boundaries. Also, we are collecting new data to extend the method to new geographical boundaries.
Those who fail to understand that the process of making is iterative are in danger of being too quick to condemn promising technologies. In 1920 the New York Times published a prediction that a rocket would never be able to leave the atmosphere. Similarly erroneous predictions were made about the car or, more recently, about the iPhone’s market share. In 1969 the Times had to publish a retraction of their 1920 claim. What similar retractions will need to be published in the year 2069?
Finally, the doubters have relied too heavily on secondary sources. For instance, they made a piñata out of the 2008 Wired piece by Chris Anderson framing big data as “the end of theory.” Others have criticized projects for claims that their creators never made. A couple of weeks ago, for example, Gary Marcus and Ernest Davis published a piece on big data in the Times. There they wrote about another of my group’s projects, Pantheon, which is an effort to collect, visualize and analyze data on historical cultural production. Marcus and Davis wrote that Pantheon “suggests a misleading degree of scientific precision.” As an author of the project, I have been unable to find where I made such a claim. Pantheon’s method section clearly states that: “Pantheon will always be—by construction—an incomplete resource.” That same section contains a long list of limitations and caveats as well as the statement that “we interpret this data set narrowly, as the view of global cultural production that emerges from the multilingual expression of historical figures in Wikipedia as of May 2013.”
Bickering is easy, but it is not of much help. So I invite the critics of big data to lead by example. Stop writing op-eds and start developing tools that improve on the state of the art. They are much appreciated. What we need are projects that are worth imitating and that we can build on, not obvious advice such as “correlation does not imply causation.” After all, true progress is not something that is written, but made.”

Digital Humanitarians


New book by Patrick Meier on how big data is changing humanitarian response: “The overflow of information generated during disasters can be as paralyzing to humanitarian response as the lack of information. This flash flood of information when amplified by social media and satellite imagery is increasingly referred to as Big Data—or Big Crisis Data. Making sense of Big Crisis Data during disasters is proving an impossible challenge for traditional humanitarian organizations, which explains why they’re increasingly turning to Digital Humanitarians.
Who exactly are these Digital Humanitarians? They’re you, me, all of us. Digital Humanitarians are volunteers and professionals from the world over and from all walks of life. What do they share in common? The desire to make a difference, and they do that by rapidly mobilizing online in collaboration with international humanitarian organizations. They make sense of vast volumes of social media and satellite imagery in virtually real-time to support relief efforts worldwide. How? They craft and leverage ingenious crowdsourcing solutions with trail-blazing insights from artificial intelligence.
In sum, this book charts the sudden and spectacular rise of Digital Humanitarians by sharing their remarkable, real-life stories, highlighting how their humanity coupled with innovative solutions to Big Data is changing humanitarian response forever. Digital Humanitarians will make you think differently about what it means to be humanitarian and will invite you to join the journey online.
Click here to be notified when the book becomes available. For speaking requests, please email Speaking@iRevolution.net.”