Do you find it difficult to keep up with the latest ML research? Are you overwhelmed by the sheer number of papers about LLMs, vector databases, or RAG?

In this post, I'll show you how to build an AI assistant that mines this large amount of information with ease. You'll ask it questions in natural language, and it'll answer based on the relevant papers it finds on Papers With Code.

On the backend side, this assistant will be powered by a Retrieval-Augmented Generation (RAG) framework that relies on a scalable serverless vector database, an embedding model from VertexAI, and an LLM from OpenAI.

On the frontend side, this assistant will be integrated into an interactive, easily deployable web application built with Streamlit.
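Before going step by step, here is a minimal sketch of how the backend pieces fit together: embed the question, retrieve the most relevant paper chunks from the vector database, and let the LLM answer from that context. The model names and the `vector_index` client below are illustrative assumptions, not the exact stack detailed later in the post.

```python
# Rough sketch of the RAG flow (assumed model names and a placeholder vector DB client).
from vertexai.language_models import TextEmbeddingModel
from openai import OpenAI


def answer(question: str, vector_index) -> str:
    # 1. Embed the user question with a VertexAI embedding model.
    embedder = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")
    query_vector = embedder.get_embeddings([question])[0].values

    # 2. Retrieve relevant paper chunks from the (serverless) vector database.
    #    `vector_index.query` stands in for whatever vector DB client is used.
    hits = vector_index.query(vector=query_vector, top_k=5, include_metadata=True)
    context = "\n\n".join(hit.metadata["text"] for hit in hits)

    # 3. Ask the OpenAI LLM to answer using only the retrieved context.
    client = OpenAI()
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```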

Every step of this process will be detailed below, with accompanying source code that you can reuse and adapt 👇.

Ready? Let’s dive in 🔍.

If you’re interested in ML content, detailed tutorials, and practical tips from the industry, follow my newsletter. It’s called The Tech Buffet.

Papers With Code (a.k.a. PWC) is a free website where researchers and practitioners can find and follow the latest state-of-the-art ML papers, source code, and datasets.


Luckily, it's also possible to interact with PWC through an API to retrieve research papers programmatically. If you look at this Swagger UI, you'll find all the available endpoints and can try them out.

Let's, for example, search for papers matching a specific keyword.
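Here is a small sketch of such a query using the public `/papers/` endpoint. The endpoint and parameter names come from the Swagger docs but may change, so treat this as an assumption rather than a definitive client.

```python
# Query the Papers With Code API for papers matching a keyword.
import requests


def search_papers(keyword: str, items_per_page: int = 20) -> list[dict]:
    response = requests.get(
        "https://paperswithcode.com/api/v1/papers/",
        params={"q": keyword, "items_per_page": items_per_page},
        timeout=30,
    )
    response.raise_for_status()
    # The API returns a paginated payload; the papers are under "results".
    return response.json()["results"]


# Example: print the titles of the first few papers about multimodal LLMs.
for paper in search_papers("multimodal llm")[:3]:
    print(paper["title"])
```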