I have been looking at a binary classification problem (basically a problem where you are supposed to predict “yes” or “no” based on a set of features) where the cost of misclassifying a “yes” as a “no” is much more expensive than misclassifying a “no” as a “yes”.

Searching the web for hints about how to approach this kind of scenario, I discovered that there are some methods explicitly designed for it, such as MetaCost [pdf link] by Domingos and cost-sensitive decision trees, but I also learned that a couple of very general relationships apply to a scenario like mine.

In this paper (pdf link) by Ling and Sheng, it is shown that if your (binary) classifier can produce a posterior probability estimate for predicting (e.g.) “yes” on a test example, then one can make that classifier cost-sensitive simply by choosing the classification threshold (which is often taken as 0.5 in non-cost-sensitive classifiers) according to **p_threshold = FP / (FP + FN)**, where FP is the cost of a false positive and FN is the cost of a false negative. Equivalently, one can “rebalance” the training set by sampling “yes” and “no” examples in proportions determined by the cost ratio, so that the usual threshold of 0.5 becomes the cost-sensitive choice. That is, the prior probabilities of the “yes” and “no” classes and the misclassification costs are interchangeable.
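To make the threshold trick concrete, here is a minimal sketch in Python. The function names and the cost values are my own illustrations, not from the paper; all it assumes is that your classifier hands you an estimate of P(“yes” | x) per example.

```python
def cost_sensitive_threshold(c_fp, c_fn):
    """Threshold on P(yes | x) when a false positive costs c_fp
    and a false negative costs c_fn (the Ling & Sheng relationship)."""
    return c_fp / (c_fp + c_fn)

def classify(p_yes, threshold):
    """Predict 'yes' whenever the estimated posterior exceeds the threshold."""
    return ["yes" if p > threshold else "no" for p in p_yes]

# Suppose a false negative is 9x as costly as a false positive:
# the threshold drops from 0.5 to 0.1, so the classifier says "yes"
# far more readily, exactly as we want when missing a "yes" is expensive.
t = cost_sensitive_threshold(c_fp=1.0, c_fn=9.0)  # 0.1
print(classify([0.05, 0.2, 0.6], t))
```

Note that when the two costs are equal, the formula collapses back to the familiar 0.5 threshold.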

So in principle, one could either manipulate the classification threshold or the training set proportions to get a cost-sensitive classifier. Of course, further adjustment may be needed if the classifier you are using does not produce explicit probability estimates.
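The training-set-proportions route can be sketched just as briefly. This is my own toy illustration, not code from the paper, and it assumes the cost ratio is (close to) an integer so that oversampling by duplication makes sense; in practice one would use weighted or fractional resampling instead.

```python
def rebalance(examples, c_fp, c_fn):
    """Oversample the 'yes' class by the cost ratio c_fn / c_fp, so that a
    classifier trained on the result with a plain 0.5 threshold mimics the
    cost-sensitive classifier (the equivalence discussed above)."""
    ratio = round(c_fn / c_fp)  # assumed to be a sensible integer here
    out = []
    for label, features in examples:
        copies = ratio if label == "yes" else 1
        out.extend([(label, features)] * copies)
    return out

data = [("yes", [0.1, 0.4]), ("no", [0.9, 0.2]), ("no", [0.8, 0.7])]
rebalanced = rebalance(data, c_fp=1.0, c_fn=3.0)
# the single "yes" example now appears 3 times; each "no" example once
```

The design choice between the two routes is mostly practical: thresholding needs a classifier with decent probability estimates, while rebalancing works even with classifiers that only output hard labels, at the price of distorting the training distribution.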

The paper is worth reading in full as it shows clearly why these relationships hold and how they are used in various cost-sensitive classification methods.

P.S. This is of course very much related to the imbalanced class problem that I wrote about in an earlier post, but at that time I was not thinking that much about the classification-cost aspect yet.