Introducing MetaGPT’s Data Interpreter: SOTA Open Source LLM-based Data Solutions

MetaGPT's Data Interpreter: Open Source Statistical Modeling
Image created by Author with Midjourney

 

MetaGPT is a multi-agent framework that assigns roles to individual agents, forming collaborative entities that work in tandem to execute complex instructions. MetaGPT bills itself as a “software company as multi-agent system,” which gives you an idea of how these collaborative entities are meant to be used. It can be run as a standalone app from the command line or used as a library within your own Python scripts, providing the flexibility and control one would want from such a framework.

The project began in April 2023, leveraging ChatGPT, and at the time of writing has nearly 40K stars on GitHub. Its GitHub repo further describes the project as follows:

 

MetaGPT takes a one line requirement as input and outputs user stories / competitive analysis / requirements / data structures / APIs / documents, etc.

Internally, MetaGPT includes product managers / architects / project managers / engineers. It provides the entire process of a software company along with carefully orchestrated SOPs.

 

MetaGPT architecture
MetaGPT’s Software Company Multi-Agent Schematic (Gradually Implementing) (from MetaGPT’s GitHub)

 

MetaGPT can be used for code generation, prototyping, project planning, and more. It has been recognized as a standout open source achievement, and it regularly appears among the trending repos on GitHub.

That’s MetaGPT. Now let’s discuss Data Interpreter, Deep Wisdom’s latest MetaGPT improvement and an achievement in its own right.

 

Data Interpreter is another member agent of the MetaGPT framework, an agent dedicated to assessing and solving data-related tasks. From the paper:

 

In this study, we introduce the Data Interpreter, a solution designed to solve with code that emphasizes three pivotal techniques to augment problem-solving in data science: 1) dynamic planning with hierarchical graph structures for real-time data adaptability; 2) tool integration dynamically to enhance code proficiency during execution, enriching the requisite expertise; 3) logical inconsistency identification in feedback, and efficiency enhancement through experience recording. […] Compared to open-source baselines, it demonstrated superior performance, exhibiting significant improvements in machine learning tasks, increasing from 0.86 to 0.95. Additionally, it showed a 26% increase in the MATH dataset and a remarkable 112% improvement in open-ended tasks.
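To make the first of those three techniques a little more concrete, here is a minimal sketch of what planning over a task graph can look like. This is not MetaGPT's implementation: the task names, the `plan` helper, and the `replan` helper are all hypothetical, and the sketch uses a flat dependency graph from Python's standard library rather than the hierarchical structure the paper describes.

```python
from graphlib import TopologicalSorter

# Hypothetical task graph: each task maps to the set of tasks it depends on.
TASKS = {
    "load_data": set(),
    "summary_stats": {"load_data"},
    "trend_analysis": {"load_data"},
    "plot": {"summary_stats", "trend_analysis"},
    "train_model": {"load_data"},
    "evaluate": {"train_model"},
}


def plan(tasks):
    """Return one valid execution order in which dependencies always come first."""
    return list(TopologicalSorter(tasks).static_order())


def replan(tasks, failed, replacement, replacement_deps):
    """Return a new graph with a failed task swapped for a replacement.

    Any task that depended on the failed task is rewired to depend on
    the replacement instead. The original graph is left untouched.
    """
    updated = {}
    for name, deps in tasks.items():
        if name == failed:
            continue
        updated[name] = {replacement if d == failed else d for d in deps}
    updated[replacement] = set(replacement_deps)
    return updated


if __name__ == "__main__":
    print(plan(TASKS))
    # Suppose "train_model" fails at runtime; swap in a simpler baseline step.
    revised = replan(TASKS, "train_model", "train_baseline", {"load_data"})
    print(plan(revised))
```

The point of the sketch is only that a plan expressed as a graph can be revised mid-run without starting over, which is the general idea behind adapting a plan as the data and intermediate results change.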

 

These findings are certainly impressive. And there is no need to take them on faith, since the authors have published their results. Deep Wisdom has also made available a plethora of examples to show how their Data Interpreter agent can be used in conjunction with the existing MetaGPT framework.

One of these examples shows how it can be used for NVIDIA stock trend analysis. To see what a MetaGPT Data Interpreter prompt looks like, I will reproduce it below:

 

Obtain NVIDIA Corporation (NVDA) stock price data from Yahoo Finance, focusing on historical closing prices from the past 5 years. Summary statistics (mean, median, standard deviation, etc.) to understand the central tendency and dispersion of closing prices. Analyze the data for any noticeable trends, patterns, or anomalies over time, potentially using rolling averages or percentage changes. Create a plot to visualize all the data analysis. Reserve 20% of the dataset for validation. Train a predictive model on the training set. Report the model’s validation accuracy, and visualize the result of prediction result.
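For a sense of how a prompt like this gets handed to the agent from Python, here is a minimal sketch. It assumes the `DataInterpreter` import path and `run` method shown in the project's README at the time of writing, which may change between releases. The `REQUIREMENT` constant and `run_requirement` helper are my own names, and the requirement is shortened to the first few sentences of the prompt above.

```python
import asyncio

# Shortened form of the requirement quoted above; any plain-language
# task description can be passed in the same way.
REQUIREMENT = (
    "Obtain NVIDIA Corporation (NVDA) stock price data from Yahoo Finance, "
    "focusing on historical closing prices from the past 5 years. "
    "Reserve 20% of the dataset for validation. "
    "Train a predictive model on the training set."
)


async def run_requirement(requirement: str):
    """Hand a natural-language requirement to the Data Interpreter agent."""
    # Imported lazily so this file can be loaded without MetaGPT installed.
    # Assumption: this import path matches the installed MetaGPT version.
    from metagpt.roles.di.data_interpreter import DataInterpreter

    di = DataInterpreter()
    return await di.run(requirement)


# To execute (requires MetaGPT installed and an LLM API key configured):
# asyncio.run(run_requirement(REQUIREMENT))
```

Inside a Jupyter notebook you would `await run_requirement(REQUIREMENT)` directly instead of calling `asyncio.run`.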

 

You can check out the example notebook (linked above) to follow MetaGPT’s process and see the results. Spoiler alert: Deep Wisdom isn’t sharing them because they are not impressive 🙂
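If you want a feel for the kind of steps the prompt asks for before opening the notebook, here is a small hand-written sketch using only the standard library. It is not the code the agent generates. The prices are synthetic and made up for illustration, the helper names are hypothetical, the chronological split is my assumption (the prompt only says to reserve 20%), and the straight-line trend is a stand-in for whatever model the agent chooses.

```python
import statistics

# Synthetic closing prices, invented for illustration; not real NVDA data.
closes = [100.0, 102.0, 101.0, 105.0, 107.0, 106.0, 110.0, 112.0, 111.0, 115.0]


def summary(prices):
    """Central tendency and dispersion of a price series."""
    return {
        "mean": statistics.mean(prices),
        "median": statistics.median(prices),
        "stdev": statistics.stdev(prices),
    }


def rolling_mean(prices, window):
    """Mean of each consecutive window; yields len(prices) - window + 1 values."""
    return [
        statistics.mean(prices[i - window + 1 : i + 1])
        for i in range(window - 1, len(prices))
    ]


def pct_change(prices):
    """Percentage change from each price to the next."""
    return [(b - a) / a * 100 for a, b in zip(prices, prices[1:])]


def chrono_split(prices, validation_fraction=0.2):
    """Keep the most recent fraction of the series for validation."""
    cut = len(prices) - round(len(prices) * validation_fraction)
    return prices[:cut], prices[cut:]


def fit_line(ys):
    """Least-squares line through the points (0, ys[0]), (1, ys[1]), ..."""
    n = len(ys)
    x_mean = (n - 1) / 2
    y_mean = statistics.mean(ys)
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def validation_mae(train, valid):
    """Mean absolute error of the fitted trend line on the held-out tail."""
    slope, intercept = fit_line(train)
    preds = [slope * (len(train) + i) + intercept for i in range(len(valid))]
    return statistics.mean(abs(p - y) for p, y in zip(preds, valid))


if __name__ == "__main__":
    train, valid = chrono_split(closes)
    print(summary(closes))
    print(rolling_mean(closes, 3))
    print(pct_change(closes))
    print(validation_mae(train, valid))
```

Splitting by time rather than at random keeps future prices out of the training set, which matters for any series where order carries information.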

Read the full paper for all the info you could ask for. You can find out more about installation and usage on the project’s GitHub repo. I can attest from experience that MetaGPT is a worthwhile project to check out, and with the addition of the Data Interpreter agent, this is even more true than it was before.
 
 

Matthew Mayo (@mattmayo13) holds a Master’s degree in computer science and a graduate diploma in data mining. As Editor-in-Chief of KDnuggets, Matthew aims to make complex data science concepts accessible. His professional interests include natural language processing, machine learning algorithms, and exploring emerging AI. He is driven by a mission to democratize knowledge in the data science community. Matthew has been coding since he was 6 years old.