Machine learning algorithms are now so accessible that even my non-technical wife constantly asks: “Isn’t that what ChatGPT is capable of?”
It is more important than ever for data scientists to understand the whys and hows behind machine learning algorithms.
This 2-part blog post follows my actual attempt to explain to my wife how Latent Dirichlet Allocation (LDA, a staple in every data scientist’s arsenal for topic modelling, recommendation and more) works, with the help of a dog pedigree model. By the end of the series, you should be able to answer the following:
Part 1:
- How does LDA work?
- How to explain LDA to a non-technical person?
Part 2:
- How does LDA converge?
- When to use LDA & when not to?
- What are the alternatives to and variants of LDA (excluding LLMs)?
Let’s get started.
Imagine you have the best job in the world:
Estimate the breed mix of each dog in a bunch of adorable photos.
Easy enough!
Short legs = Corgi or Dachshund;
Long body = Dachshund;
Chocolate chip muffin face = Chihuahua.
But each dog has a unique blend of traits. A dog might have a Corgi’s short legs but the face of a Chihuahua. We are not just identifying breeds but modelling a mosaic of traits into groups of breeds.
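To make the "mosaic" idea concrete, here is a minimal Python sketch of the story LDA tells itself about how a dog gets its traits: every dog has a mix of breeds, and every breed has its own preference for certain traits. The breed names and traits come from the examples above, but the probabilities, the `BREED_TRAITS` table and the `sample_traits` helper are all made up for illustration. LDA runs this story in reverse: it only sees the traits and has to work out the mixes.

```python
import random

# Hypothetical table: how likely each breed is to show each trait.
# All numbers are invented for illustration, not measured from real dogs.
BREED_TRAITS = {
    "Corgi":     {"short legs": 0.80, "long body": 0.10, "muffin face": 0.10},
    "Dachshund": {"short legs": 0.45, "long body": 0.50, "muffin face": 0.05},
    "Chihuahua": {"short legs": 0.10, "long body": 0.10, "muffin face": 0.80},
}


def sample_traits(breed_mix, n_traits, rng):
    """Draw n_traits observed traits for one dog, given its mix of breeds."""
    breeds = list(breed_mix)
    weights = [breed_mix[b] for b in breeds]
    traits = []
    for _ in range(n_traits):
        # Step 1: pick which breed "explains" this trait, using the dog's mix.
        breed = rng.choices(breeds, weights=weights, k=1)[0]
        # Step 2: pick a trait according to that breed's preferences.
        options = BREED_TRAITS[breed]
        trait = rng.choices(list(options), weights=list(options.values()), k=1)[0]
        traits.append(trait)
    return traits


if __name__ == "__main__":
    rng = random.Random(0)
    # A dog that is half Corgi, half Chihuahua (again, an assumed example).
    mixed_dog = {"Corgi": 0.5, "Dachshund": 0.0, "Chihuahua": 0.5}
    print(sample_traits(mixed_dog, 5, rng))
```

In topic-modelling terms, a dog photo plays the role of a document, a trait is a word, and a breed is a topic.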
Number of Topics & Corpus
Even though we are not classifying dog photos for their breed, it is helpful to consider the physical traits we can observe from all images and roughly how…