LDA Explained with a Dog Pedigree Model

Machine learning algorithms are now so accessible that even my non-technical wife constantly asks: “Isn’t that what ChatGPT is capable of?”

It is time for data scientists to stay sharp on the whys and hows behind machine learning algorithms.

This two-part blog post follows my attempt to explain Latent Dirichlet Allocation (LDA), a staple in every data scientist’s arsenal for topic modelling, recommendation and more, to my wife with the help of a dog pedigree model. By the end of the series, you should be able to answer the following questions:

Part 1:

  • How does LDA work?
  • How to explain LDA to a non-technical person?

Part 2:

  • How does LDA converge?
  • When to use LDA & when not to?
  • What are the alternatives to and variants of LDA (excluding LLMs)?

Let’s get started.

Imagine you have the best job in the world:

Estimate the breed mix of each dog in a bunch of adorable dog photos.

Easy enough!

Short legs = Corgi or Dachshund;

Long body = Dachshund;

Chocolate chip muffin face = Chihuahua.

Source: Wikipedia

But each dog has a unique blend of traits. A dog might have a Corgi’s short legs but a Chihuahua’s face. We are not just identifying breeds; we are modelling each dog’s mosaic of traits as a mix of breeds.
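To make the "mosaic" idea concrete, here is a minimal Python sketch of the analogy. It assumes that each breed can be described as a set of trait probabilities and each dog as a set of breed proportions. All the numbers and names (`breed_traits`, `dog_breed_mix`, `trait_probabilities`) are made up for illustration; they are not estimates from real dogs, and this is not how LDA is fitted, only what the fitted pieces would look like.

```python
# Hypothetical numbers, chosen only to illustrate the analogy.
# Each breed (a "topic") is a probability distribution over traits ("words").
breed_traits = {
    "Corgi":     {"short legs": 0.8, "long body": 0.1, "muffin face": 0.1},
    "Dachshund": {"short legs": 0.4, "long body": 0.5, "muffin face": 0.1},
    "Chihuahua": {"short legs": 0.1, "long body": 0.1, "muffin face": 0.8},
}

# One dog (a "document") is a mixture over breeds: half Corgi, half Chihuahua.
dog_breed_mix = {"Corgi": 0.5, "Dachshund": 0.0, "Chihuahua": 0.5}


def trait_probabilities(breed_mix, breed_traits):
    """P(trait | dog) = sum over breeds of P(breed | dog) * P(trait | breed)."""
    traits = {}
    for breed, weight in breed_mix.items():
        for trait, p in breed_traits[breed].items():
            traits[trait] = traits.get(trait, 0.0) + weight * p
    return traits


dog_traits = trait_probabilities(dog_breed_mix, breed_traits)
for trait, p in dog_traits.items():
    print(f"{trait}: {p:.2f}")
```

In this toy setup we go from a known breed mix to the traits we expect to see. The job described above runs the other way: we only observe the traits and have to work back to both the breed mix of each dog and the trait profile of each breed.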

Number of Topics & Corpus

Even though we are not classifying dog photos for their breed, it is helpful to consider the physical traits we can observe from all images and roughly how…