Addendum on search and support vector machines

In the last post I described how to use support vector machines (SVMs) to combine multiple notions of search relevance. After posting I realized I could greatly simplify my discussion of an important subject in the final section of that post. What I can simplify is this: once you’ve found the SVM parameters, how should you use them to rank a long list of documents for a particular query, in practice?

Here’s the setup. We suppose we’ve already taken a bunch of training data, as in the last post, and used that training data to find an SVM with a soft margin. That SVM can then be used to rank documents for any query. In particular, the SVM is specified by a vector \vec w, and we rank a document d above a document d' for query q if:

   [*] \,\,\,\, \vec w \cdot \vec \phi(q,d,d') \geq 0,

where \vec \phi(q,d,d') is the feature difference vector. (In the last post a parameter b also appeared in this condition, but as I noted in one of the exercises in that post, for the particular problem of search b=0, which is why it doesn’t appear here.)
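In code, condition [*] is just a sign test on a dot product. Here’s a minimal Python sketch of the pairwise rule (the names rank_above and phi are mine, purely for illustration), assuming \vec w and the feature difference vector are NumPy arrays:

    import numpy as np

    def rank_above(w, phi):
        # Condition [*]: rank d above d' exactly when w . phi(q, d, d') >= 0,
        # where phi is the feature difference vector for the pair (d, d').
        return np.dot(w, phi) >= 0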

In the final section of the last post I did lots of perambulations with the condition [*] to figure out how to construct a linear order ranking documents, given a particular query q. We can simplify life quite a bit by recalling that \vec \phi(q,d,d') = \vec \psi(q,d)-\vec \psi(q,d'), where \vec \psi(q,d) is the feature vector for query q and document d. The condition above can then be rewritten as:

   [**] \,\,\,\, \vec w \cdot \vec \psi(q,d) \geq \vec w \cdot \vec \psi(q,d')
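(To spell out the step: since \vec \phi(q,d,d') = \vec \psi(q,d) - \vec \psi(q,d') and the dot product is linear, we have \vec w \cdot \vec \phi(q,d,d') = \vec w \cdot \vec \psi(q,d) - \vec w \cdot \vec \psi(q,d'), and requiring this to be \geq 0 is exactly the condition [**].)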

This condition suggests a much easier way of ranking documents: simply treat \vec w \cdot \vec \psi(q,d) as a score for how good document d is when the query is q, and rank d above d' whenever the score for d is higher than the score for d'. We then just construct a linear ordering of the documents based on the score. Or, if we prefer, we can find just the top 10 (or 20, or whatever) documents by iterating linearly over the documents, keeping a running tally of the best 10 (or 20) found to date.
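To make that concrete, here’s a short Python sketch of the scoring approach (the names top_k_documents and psi are mine, not from the post; psi(query, d) is assumed to return the feature vector \vec \psi(q,d) as a NumPy array):

    import heapq
    import numpy as np

    def top_k_documents(w, psi, query, documents, k=10):
        # Score each document with w . psi(query, d), and keep only the
        # k highest-scoring documents, returned as (score, document)
        # pairs sorted from best to worst.
        scored = ((float(np.dot(w, psi(query, d))), d) for d in documents)
        return heapq.nlargest(k, scored, key=lambda pair: pair[0])

This makes a single pass over the documents, and heapq keeps only k candidates in memory at a time, which is exactly the “running tally” described above.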

This is quite a bit simpler than the discussion at the end of my last post. It only works, though, because our classifier (the SVM) is a linear function of the feature difference vectors. It’s not difficult to construct other types of classifier for which it really wouldn’t be obvious how to construct this kind of score. For those classifiers you could still fall back on the analysis I did in the last post, so it wasn’t all wasted effort.

Incidentally, one thing I like about the equation [**] is that it makes it a lot clearer why \vec w is called the weight vector. It means that the score for page d on query q is just a linear combination of the different feature values, with weights given by the components of \vec w. That makes both the notation \vec w and the nomenclature “weight vector” much more transparent.
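Written out in components, the score is

   \vec w \cdot \vec \psi(q,d) = w_1 \psi_1(q,d) + w_2 \psi_2(q,d) + \ldots + w_n \psi_n(q,d),

where the \psi_j(q,d) are the individual feature values and the w_j are the corresponding weights.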

2 comments

  1. Hi Mike

    Thanks for these two posts. I think you need another w dot on the rhs of **.
