Learnings from fine-tuning a large language model on a single consumer GPU


Image by Author (Midjourney).

When we think about large language models or other generative models, the first piece of hardware that comes to mind is the GPU. Without GPUs, many advances in generative AI, machine learning, deep learning, and data science would have been impossible. If 15 years ago gamers were the ones excited about the latest GPU technology, today data scientists and machine learning engineers follow the news in this field just as closely — although gamers and ML practitioners are usually looking at two different kinds of GPUs.

Gamers typically use consumer graphics cards (such as NVIDIA's GeForce RTX series), while ML and AI developers usually follow news about data center and cloud computing GPUs (such as the V100, A100, or H100). Consumer graphics cards have much less GPU memory (at most 24GB as of January 2024) than data center GPUs (typically in the range of 40GB to 80GB). Price is another significant difference: while most consumer graphics cards cost up to about $3,000, most data center cards start around that price and can easily run into the tens of thousands of dollars.

Since many people, including myself, have a consumer graphics card for gaming or daily use, they might be curious whether they can use the same card for training, fine-tuning, or running inference with LLMs. In 2020, I wrote a comprehensive article about whether consumer graphics cards can be used for data science projects (link to the article). At the time, the models were mostly small ML or deep learning models, and even a graphics card with 6GB of memory could handle many training projects. In this article, however, I am going to use such a graphics card for large language models with billions of parameters.

For this article, I used my GeForce RTX 3090, which has 24GB of GPU memory. For reference, data center graphics cards such as the A100 and H100 come with 40GB and 80GB of memory respectively. Also, a typical AWS EC2 p4d.24xlarge instance has 8 GPUs (A100) with a total of 320GB of GPU memory. As you can see, the difference between a simple consumer GPU…
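To see why 24GB is a real constraint for models with billions of parameters, a rough back-of-envelope estimate helps. Here is a minimal sketch of the arithmetic — the 7B model size and the per-parameter byte counts are illustrative assumptions (a common rule of thumb for mixed-precision Adam training), not figures from this article:

```python
def estimate_gib(n_params: float, bytes_per_param: float) -> float:
    """Rough GPU memory estimate in GiB for a model with n_params parameters."""
    return n_params * bytes_per_param / 2**30

# Illustrative example: a 7-billion-parameter model.
n_params = 7e9

# Weights alone stored in fp16: 2 bytes per parameter.
weights_only = estimate_gib(n_params, 2)

# Naive full fine-tuning with Adam in mixed precision, per a common rule of thumb:
# fp16 weights (2) + fp16 gradients (2) + fp32 master weights (4)
# + Adam first/second moments (4 + 4) ~= 16 bytes per parameter,
# before even counting activations.
full_finetune = estimate_gib(n_params, 16)

print(f"fp16 weights alone:     ~{weights_only:.0f} GiB")
print(f"full fine-tuning state: ~{full_finetune:.0f} GiB")
```

Under these assumptions, the weights alone come to roughly 13 GiB, while the full fine-tuning state approaches 104 GiB — far beyond a single 24GB consumer card, which is exactly why memory-saving techniques become necessary in this setting.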