The 7 Best AI Tools for Data Science Workflow
Image from DALL-E 3
It is now evident that those who adopt AI quickly will lead the way, while those who resist change will be replaced by those who are already using AI. Artificial intelligence is no longer just a passing fad; it is becoming an essential tool in various industries, including data science. Developers and researchers are increasingly using AI-powered tools to simplify their workflows, and one such tool that has gained immense popularity recently is ChatGPT.
In this blog, I will discuss the 7 best AI tools that have made my life as a data scientist easier. These tools are indispensable in my daily tasks, such as writing tutorials, researching, coding, analyzing data, and performing machine learning tasks. By sharing these tools, I hope to help fellow data scientists and researchers streamline their workflows and stay ahead of the curve in the ever-evolving field of AI.
Every data professional is familiar with pandas, a Python package used for data manipulation and analysis. But what if I told you that instead of writing code, you can analyze and generate data visualizations by simply typing a prompt or a question? That’s what PandasAI does – it’s like an AI Agent for your Python workflow that automates data analysis using various AI models. You can even use locally run models.
In the code below, we create an agent from a pandas DataFrame and an OpenAI model. The agent can perform various tasks on the DataFrame using natural language. We ask it a simple question and then request an explanation of how it arrived at the result.
import os

import pandas as pd
from pandasai import Agent
from pandasai.llm import OpenAI

# Sample data: sales figures for ten countries
sales_by_country = pd.DataFrame(
    {
        "country": [
            "United States",
            "United Kingdom",
            "France",
            "Germany",
            "Italy",
            "Spain",
            "Canada",
            "Australia",
            "Japan",
            "China",
        ],
        "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    }
)

# The OpenAI API key is read from the OPENAI_API_KEY environment variable
llm = OpenAI(api_token=os.environ["OPENAI_API_KEY"])
pandas_ai_df = Agent(sales_by_country, config={"llm": llm})

# Ask a question in natural language, then ask how the answer was derived
response = pandas_ai_df.chat("Which are the top 5 countries by sales?")
explanation = pandas_ai_df.explain()

print("Answer:", response)
print("Explanation:", explanation)
The results are amazing. Getting the same kind of answer by experimenting manually with my real-life data would have taken at least half an hour.
Answer: The top 5 countries by sales are: China, United States, Japan, Germany, United Kingdom
Explanation: I looked at the data we have and found a way to sort it based on sales. Then, I picked the top 5 countries with the highest sales numbers. Finally, I put those countries into a list and created a sentence to show them as the top 5 countries by sales.
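For comparison, the steps in that explanation (sort by sales, keep the top 5, build a sentence) can be sketched in plain pandas. This is only a rough manual equivalent, assuming the same sample DataFrame as above; it is not necessarily the code PandasAI generates internally, and the variable names `top5` and `answer` are illustrative.

```python
import pandas as pd

# Same sample data as in the PandasAI example
sales_by_country = pd.DataFrame(
    {
        "country": [
            "United States", "United Kingdom", "France", "Germany", "Italy",
            "Spain", "Canada", "Australia", "Japan", "China",
        ],
        "sales": [5000, 3200, 2900, 4100, 2300, 2100, 2500, 2600, 4500, 7000],
    }
)

# Keep the five rows with the highest sales, ordered from largest to smallest
top5 = sales_by_country.nlargest(5, "sales")["country"].tolist()

# Turn the list into a sentence, mirroring the agent's answer
answer = "The top 5 countries by sales are: " + ", ".join(top5)
print(answer)
```

Writing this by hand is easy for a toy table; the appeal of the agent is that the same question-in, answer-out loop works when you do not yet know which columns or operations you need.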
GitHub Copilot is now a necessity if you are a full-time developer or work with code every day. Why? It helps you write clean and effective code faster. You can even chat with your files to debug faster or generate context-aware code.
GitHub Copilot includes an AI chatbot, an inline chat box, code generation, autocomplete, CLI autocomplete, and other GitHub-based features that help with code search and understanding.
GitHub Copilot is a paid tool, so if you don’t want to pay $10/month, check out Top 5 AI Coding Assistants You Must Try.
ChatGPT has been dominating the AI space for 2 years now. People use it for writing emails, generating content, generating code, and all kinds of routine work-related tasks.
If you pay for a subscription, you get access to the state-of-the-art model GPT-4, which is excellent at solving complex problems.
I use it daily for code generation, code explanation, general questions, and content generation. The work generated by AI is not always perfect, so you may need to edit it before presenting it to a wider audience.
ChatGPT is an essential tool for data scientists. Using it is not cheating; it saves you time that you would otherwise spend researching and finding solutions.
If you value privacy, consider running open-source AI models on your laptop. Check out 5 Ways To Use LLMs On Your Laptop.
If you have trained a deep neural network for a complex machine learning task, you have probably trained it on Google Colab first, thanks to its freely accessible GPUs and TPUs. With the surge in generative AI, Google Colab has recently introduced features that help you generate code, debug faster, and autocomplete.
Colab AI is like an integrated AI coding assistant in your workspace. You can generate code by simply prompting and asking follow-up questions. It also comes with inline code prompting, although it has limited use with the free version.
I would highly recommend getting the paid version as it provides better GPUs and an overall better coding experience.
Discover the Top 11 AI Coding Assistants for 2024 and try out all alternatives to Colab AI to find the best fit for you.
I have been using Perplexity AI as my new search engine and research assistant. It helps me learn about new technologies and concepts by providing concise and up-to-date summaries with links to relevant blogs and videos. I can even ask follow-up questions and get a modified answer.
Perplexity AI offers various features to assist its users. It can answer a wide range of questions, from basic facts to complex queries, using the latest sources. Its Copilot feature allows users to explore their topics in-depth, enabling them to expand their knowledge and discover new areas of interest. Furthermore, users can organize their search results into “Collections” based on projects or topics, making it easier to find what they need in the future.
Check out 8 AI-powered search engines that can enhance your internet searching and research capabilities as an alternative to Google.
Grammarly is an exceptional tool for people with dyslexia. It helps me write content quickly and accurately. I have been using Grammarly for almost 9 years now, and I love the features that correct my spelling, grammar, and the overall structure of my writing. Recently, they introduced Grammarly AI, which allows me to improve my writing with the help of generative AI models. This tool has made my life easier, as I can now write better emails, direct messages, content, tutorials, and reports. It is a vital tool for me, much like Canva.
Hugging Face is not just a tool, but an entire ecosystem that has become an essential part of my daily work life. I use it to access datasets, models, machine learning demos, and APIs for AI models. Additionally, I rely on various Hugging Face Python packages for training, fine-tuning, evaluating, and deploying machine learning models.
Hugging Face is an open-source platform that’s free for the community and allows people to host datasets, models, and AI demos. It even lets you deploy your models for inference and run them on GPUs. In the next few years, it’s likely to become the primary platform for data discussions, research and development, and operations.
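As a small illustration of what using a hosted model looks like, here is a minimal sketch with the `transformers` package. It assumes `transformers` and a backend such as PyTorch are installed; the helper names `format_prediction` and `main` and the sample sentence are made up for this example. When no model is specified, `pipeline` picks a default checkpoint for the task and downloads it from the Hugging Face Hub on first use.

```python
from typing import Any, Dict


def format_prediction(prediction: Dict[str, Any]) -> str:
    """Format one pipeline result such as {"label": "POSITIVE", "score": 0.93}."""
    return f'{prediction["label"]} ({prediction["score"]:.2f})'


def main() -> None:
    # Imported here so the helper above works even without transformers installed
    from transformers import pipeline

    # Loads a default sentiment model from the Hugging Face Hub (needs network access)
    classifier = pipeline("sentiment-analysis")

    # The pipeline returns one dict with "label" and "score" per input text
    predictions = classifier(["This tool made my data analysis much easier."])
    for prediction in predictions:
        print(format_prediction(prediction))
```

Calling `main()` runs the model locally. In practice you would usually pass an explicit `model=` name so that results stay reproducible across library versions.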
Discover the top 10 data science tools to use in 2024 and become a super data scientist, solving data problems better than anyone.
I have been using Travis, an AI-powered tutor, to conduct research on advanced topics such as MLOps, LLMOps, and data engineering. It provides simple explanations about these topics and you can ask follow-up questions just like with any chatbot. It’s perfect for those who only want search results from top publications on Medium.
In this blog, we explored 7 powerful AI tools that can significantly enhance the productivity and efficiency of data scientists and researchers. PandasAI enables conversational data analysis, while GitHub Copilot and Colab AI help with code generation and debugging, simplifying complex code-related tasks and saving valuable time. ChatGPT’s versatility allows for content generation, code explanation, and problem-solving, while Perplexity AI provides a smart search engine and research assistant. Grammarly AI offers invaluable writing assistance, and Hugging Face serves as a comprehensive ecosystem for accessing datasets, models, and APIs to develop and deploy machine learning solutions.
Abid Ali Awan (@1abidaliawan) is a certified data scientist professional who loves building machine learning models. Currently, he is focusing on content creation and writing technical blogs on machine learning and data science technologies. Abid holds a Master’s degree in technology management and a bachelor’s degree in telecommunication engineering. His vision is to build an AI product using a graph neural network for students struggling with mental illness.