Data Mining Reveals the Secret to Getting Good Answers


arXiv: “If you want a good answer, ask a decent question. That’s the startling conclusion to a study of online Q&As.

If you spend any time programming, you’ll probably have come across the question and answer site Stack Overflow. The site allows anybody to post a question related to programing and receive answers from the community. And it has been hugely successful. According to Alexa, the site is the 3rd most popular Q&A site in the world and 79th most popular website overall.
But this success has naturally led to a problem–the sheer number of questions and answers the site has to deal with. To help filter this information, users can rank both the questions and the answers, gaining a reputation for themselves as they contribute.
Nevertheless, Stack Overflow still struggles to weed out off topic and irrelevant questions and answers. This requires considerable input from experienced moderators. So an interesting question is whether it is possible to automate the process of weeding out the less useful question and answers as they are posted.
Today we get an answer of sorts thanks to the work of Yuan Yao at the State Key Laboratory for Novel Software Technology in China and a team of buddies who say they’ve developed an algorithm that does the job.
And they say their work reveals an interesting insight: if you want good answers, ask a decent question. That may sound like a truism, but these guys point out that there has been no evidence to support this insight, until now.
…But Yuan and co digged a little deeper. They looked at the correlation between well received questions and answers. And they discovered that these are strongly correlated.
A number of factors turn out to be important. These include the reputation of the person asking the question or answering it, the number of previous questions or answers they have posted, the popularity of their input in the recent past along with measurements like the length of the question and its title.
Put all this into a number cruncher and the system is able to predict the quality score of the question and its expected answers. That allows it to find the best questions and answers and indirectly the worst ones.
There are limitations to this approach, of course…In the meantime, users of Q&A sites can learn a significant lesson from this work. If you want good answers, first formulate a good question. That’s something that can take time and experience.
Perhaps the most interesting immediate application of this new work might be as a teaching tool to help with this learning process and to boost the quality of questions and answers in general.
See also: http://arxiv.org/abs/1311.6876: Want a Good Answer? Ask a Good Question First!”