Understanding Latent Dirichlet Allocation (LDA) — A Data Scientist’s Guide (Part 1) | by Louis Chan

LDA Explained with a Dog Pedigree Model

Machine learning algorithms are now so accessible that even my non-technical wife constantly asks: “Isn’t that what ChatGPT is capable of?”

It is time for data scientists to stay vigilant about the whys and hows behind machine learning algorithms.

This two-part blog post follows my actual attempt to explain to my wife, with the help of a dog pedigree model, how Latent Dirichlet Allocation works. LDA is a staple in every data scientist’s arsenal for topic modelling, recommendation and more. By the end of the series, you should be able to answer the following:

Part 1:

How does LDA work?
How do you explain LDA to a non-technical person?

Part 2:

How does LDA converge?
When should you use LDA, and when should you not?
What are the alternatives to and variants of LDA (excluding LLMs)?

Let’s get started.

Imagine you have the best job in the world:

Estimate the pedigree mix of the dogs in a bunch of adorable photos

Easy enough!

Short legs = Corgi or Dachshund;

Long body = Dachshund;

Chocolate chip muffin face = Chihuahua.

But each dog has a unique blend of traits. A dog might have a Corgi’s short legs but the face of a Chihuahua. We are not just identifying breeds but modelling a mosaic of traits into groups of breeds.
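To make the "mosaic of traits" idea concrete, here is a minimal sketch of the mixture at the heart of LDA, in dog terms: breeds play the role of topics, traits play the role of words, and each dog is a blend of breeds. The probability table and the function name `trait_probabilities` are made up for illustration; in real LDA these numbers are learned from the data rather than written by hand.

```python
# Hypothetical, hand-picked numbers: P(trait | breed) for each breed ("topic").
# Real LDA would estimate these from the photos instead of assuming them.
TRAITS_GIVEN_BREED = {
    "corgi":     {"short legs": 0.8, "long body": 0.1, "muffin face": 0.1},
    "dachshund": {"short legs": 0.4, "long body": 0.5, "muffin face": 0.1},
    "chihuahua": {"short legs": 0.1, "long body": 0.1, "muffin face": 0.8},
}


def trait_probabilities(breed_mix):
    """Return P(trait | dog) = sum over breeds of P(breed | dog) * P(trait | breed)."""
    if abs(sum(breed_mix.values()) - 1.0) > 1e-9:
        raise ValueError("breed proportions must sum to 1")
    probs = {}
    for breed, weight in breed_mix.items():
        for trait, p in TRAITS_GIVEN_BREED[breed].items():
            probs[trait] = probs.get(trait, 0.0) + weight * p
    return probs


# A dog that is half Corgi, half Chihuahua (an assumed breed mix).
mixed_dog = {"corgi": 0.5, "chihuahua": 0.5}
dog_traits = trait_probabilities(mixed_dog)
for trait, p in sorted(dog_traits.items()):
    print(f"{trait}: {p:.2f}")
```

With these assumed numbers, the half-Corgi, half-Chihuahua dog is about equally likely to show short legs and a muffin face, and rarely a long body. LDA runs this logic in reverse: it only sees the traits and has to infer both the breed mix of each dog and the trait profile of each breed.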

Number of Topics & Corpus

Even though we are not classifying dog photos by breed, it is helpful to consider the physical traits we can observe across all the images and roughly how…

Understanding Latent Dirichlet Allocation (LDA) — A Data Scientist’s Guide (Part 1) | by Louis Chan | Feb, 2024
