A quotable Domingos paper

I’ve been (re-)reading Pedro Domingos’ paper, A Few Useful Things to Know About Machine Learning, and wanted to share some quotes that I like.

  • (…) much of the “folk knowledge” that is needed to successfully develop machine learning applications is not readily available in [textbooks].
  • Most textbooks are organized by representation [rather than the type of evaluation or optimization] and it’s easy to overlook the fact that the other components are equally important.
  • (…) if you hire someone to build a classifier, be sure to keep some of the data to yourself and test the classifier they give you on it.
  • Farmers combine seeds with nutrients to grow crops. Learners combine knowledge with data to grow programs.
  • What if the knowledge and data we have are not sufficient to completely determine the correct classifier? Then we run the risk of just hallucinating a classifier (…)
  • (…) strong false assumptions can be better than weak true ones, because a learner with the latter needs more data to avoid overfitting.
  • Even with a moderate dimension of 100 and a huge training set of a trillion examples, the latter cover only a fraction of about 10^-18 of the input space. This is what makes machine learning both necessary and hard.
  • (…) the most useful learners are those that facilitate incorporating knowledge.
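The 10^-18 figure in the dimensionality quote is easy to check with a few lines of arithmetic. This is only a rough sketch: it assumes each of the 100 dimensions is a binary feature, so the input space has 2^100 points. That assumption is mine and is not stated in the quote itself, and the variable names are just for illustration.

```python
import math
from fractions import Fraction

# Assumption (not stated in the quote): every feature is binary,
# so 100 dimensions give 2**100 distinct possible inputs.
DIMENSIONS = 100
TRAINING_EXAMPLES = 10**12  # "a trillion examples"

input_space_size = 2**DIMENSIONS

# Exact ratio of seen examples to possible inputs, assuming all
# training examples are distinct (the best case for coverage).
coverage = Fraction(TRAINING_EXAMPLES, input_space_size)

print(f"input space size : {float(input_space_size):.3e}")
print(f"covered fraction : {float(coverage):.3e}")
print(f"log10(fraction)  : {math.log10(float(coverage)):.2f}")
```

Under that assumption the covered fraction comes out just under 10^-18, which matches the order of magnitude in the quote. With features that take more than two values, the fraction would be smaller still.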

Another interesting recent paper by Domingos is What’s Missing in AI: The Interface Layer.

Domingos has previously done a lot of interesting work on, for instance, why Naïve Bayes often works well even when its assumptions are not fulfilled, and why bagging works well. Those are just the ones I remember; I’m sure there is a lot more.

One thought on “A quotable Domingos paper”

  1. Thanks! Another quote I liked from the paper:

    “Generalization being the goal has another major consequence: data alone is not enough, no matter how much of it you have.”

