What I Learned Building ML Platform at Mailchimp

Feedback integration is crucial for ML models to meet user needs.

A robust ML infrastructure gives teams a competitive advantage.

Technical projects must be aligned with business objectives.

Human involvement in MLOps and AI is as crucial as the technology itself. 

I started my ML journey as an analyst back in 2016. Since then, I’ve worked as a data scientist for a multinational company and an MLOps engineer for an early-stage startup before moving to Mailchimp in May 2021. I joined just before its $12 billion acquisition by Intuit.

It was an exciting time to be there as the handoff happened, and today, I still draw from this experience. In this article, I will outline the learnings and challenges I faced while on the ML platform team at Mailchimp, building infrastructure and setting up the environment for development and testing.

Mailchimp’s ML Platform: genesis, challenges, and objectives

Mailchimp is a 20-year-old bootstrapped email marketing company. When I joined, its infrastructure still lived in physical data centers and server racks; the company had only recently started transitioning to the cloud.

Mailchimp had decided, “We’ll move the burgeoning data science and machine learning initiatives in batches, including any data engineers needed to support those. We’ll keep everyone else in the legacy stack for now.” I still think this was a great decision, and I’d recommend a similar strategy to anyone in the same position.

Team setup and responsibilities

We had around 20 data engineers and ML(Ops) engineers working on the ML platform at Mailchimp.

The data engineers’ main job was bringing data from the legacy stack onto Google Cloud Platform (GCP), where our data science and machine learning pipelines and projects lived. This process created a latency of approximately one day for the data. It could be even longer if data needed to be backfilled. This delay was a challenge on its own.

Responsibility for MLOps and the ML platform was split across three teams:

  1. One team focused on making tools, setting up the environment for development and training for data scientists, and helping with the productionization work. (This was my team.)
  2. One team focused on serving the live models. This included maintaining the underlying infrastructure and working on model deployment automation.
  3. One team started with data integrations and, over time, shifted its focus to model monitoring.

Passive productionization and getting leadership buy-in

The problem we were trying to solve in my team was: How do we provide passive productionization for data scientists at Mailchimp, given all the different kinds of projects they were working on? By passive productionization, we meant transitioning from model development to deployment and operation as seamlessly and effortlessly as possible for the data scientists involved.

The key was not relying on a “build it and they will come” approach. Instead, we identified inefficiencies and shortcomings of the existing processes and created improved alternatives. Then, we made an effort to engage data scientists through workshops and tailored support to transition smoothly to these better solutions. We also had ML engineers embedded in the data science teams who helped bridge gaps left by the tooling and infrastructure. In that sense, it’s about “doing things that don’t scale” until you have traction.

Important note: There’s a lesson here that I’ve learned over and over again and that many technically oriented teams seem to miss: To get buy-in from leadership, you have to align what you’re doing with specific business objectives. Of course, part of it is offering genuinely superior alternatives. But when presenting to management, you have to emphasize tangible benefits, such as significantly reduced project delivery times, increased employee satisfaction, and higher productivity. It’s paramount that you can showcase measured improvements and propose a sustainable maintenance plan.

Getting data product feedback at Mailchimp 

At Mailchimp, we faced many challenges, ranging from the delay at which data arrived on our cloud-based ML platform to evaluating and scaling new libraries and patterns of ML development (like LLMs) for Mailchimp’s GenAI features.

One important challenge was getting feedback on our ML and data products from users and then making the necessary iterations based on the feedback with as light a lift as possible. 

Questions that needed to be answered included:

  1. How do you get feedback on the models in the first place?
  2. How do you then integrate that feedback back into the model for enrichment?

Let’s look at each independently. To get feedback on user-facing models, you can learn from user input directly, provided you have expertise in experiment design. For example, asking “Is this ad relevant to you?” in the UI is a way of getting feedback directly from users. Going beyond that, you can use tools like A/B tests and write the results back to a database for later analysis.
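As a purely illustrative sketch (not Mailchimp’s actual setup), writing that UI feedback back to a warehouse table on GCP could look something like this. The project, table, and field names below are hypothetical:

```python
from datetime import datetime, timezone

from google.cloud import bigquery  # pip install google-cloud-bigquery

# Hypothetical feedback table; the real schema and naming will differ per company.
FEEDBACK_TABLE = "my-project.ml_feedback.ad_relevance_feedback"

client = bigquery.Client()

def record_feedback(user_id: str, ad_id: str, is_relevant: bool, experiment_arm: str) -> None:
    """Append one 'Is this ad relevant to you?' response to the feedback table."""
    row = {
        "user_id": user_id,
        "ad_id": ad_id,
        "is_relevant": is_relevant,
        "experiment_arm": experiment_arm,  # e.g., "control" or "treatment" for later A/B analysis
        "created_at": datetime.now(timezone.utc).isoformat(),
    }
    errors = client.insert_rows_json(FEEDBACK_TABLE, [row])
    if errors:
        raise RuntimeError(f"Failed to write feedback row: {errors}")
```

With responses and experiment arms stored side by side, the later analysis and retraining steps can query one table instead of stitching together UI logs.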

Regarding the enrichment of models through integrating feedback, you begin by analyzing and preprocessing the user’s feedback. You can then use that data to retrain the model. Feedback also helps you identify and focus on areas that are most in need of improvement.

After retraining, it’s crucial that you test the updated model to ensure improved performance and to validate that you addressed the issues identified in the feedback. Finally, you deploy the revised model with continuous monitoring to track its effectiveness.

Only by going through all these steps can you be sure that feedback integration leads to tangible improvements and that your ML-powered features remain in line with user expectations.
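To make that retrain-test-promote loop concrete, here is a minimal sketch assuming tabular features, binary labels, and a scikit-learn classifier. It is not the pipeline we ran at Mailchimp, just one way to gate deployment on measured improvement:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

def retrain_with_feedback(X_base, y_base, X_feedback, y_feedback, current_model):
    """Retrain on the original data plus labeled feedback; promote only if it beats the current model."""
    # Fold the preprocessed feedback into the training set.
    X = np.vstack([X_base, X_feedback])
    y = np.concatenate([y_base, y_feedback])
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42, stratify=y
    )

    candidate = GradientBoostingClassifier().fit(X_train, y_train)

    # Evaluate both models on the same held-out slice.
    candidate_auc = roc_auc_score(y_test, candidate.predict_proba(X_test)[:, 1])
    current_auc = roc_auc_score(y_test, current_model.predict_proba(X_test)[:, 1])

    # Gate deployment: only promote the retrained model if it measurably improves.
    if candidate_auc > current_auc:
        return candidate, candidate_auc
    return current_model, current_auc
```

In practice, the promoted model would then go out behind the same monitoring you already have in place, so you can confirm the offline improvement holds up in production.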

In the age of generative AI, good ML infrastructure matters a lot

A lesson that I’ve learned time and time again over the past years is the enduring importance of ‘boring’ data and ML infrastructure. Despite the hype around GenAI and new tools and platforms, the backbone of MLOps isn’t disappearing anytime soon.

It’s crucial to develop systems that can scale effectively and accommodate diverse ML models, as needed by data scientists or ML engineers. This applies whether you’re working with live-service models that require online training or batch-processing models trained offline. Your infrastructure must be versatile enough to manage these needs based on your projections.
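One way to keep that versatility, sketched here with hypothetical interfaces rather than any specific platform’s API, is to have batch-trained and online-updated models share a single serving contract:

```python
from typing import Protocol

import numpy as np
from sklearn.linear_model import LogisticRegression, SGDClassifier

class ServableModel(Protocol):
    """The one contract the serving layer relies on, regardless of how training happens."""

    def predict_scores(self, X: np.ndarray) -> np.ndarray: ...

class BatchModel:
    """Trained offline on the full dataset, then frozen until the next scheduled retrain."""

    def __init__(self) -> None:
        self._model = LogisticRegression(max_iter=1000)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "BatchModel":
        self._model.fit(X, y)
        return self

    def predict_scores(self, X: np.ndarray) -> np.ndarray:
        return self._model.predict_proba(X)[:, 1]

class OnlineModel:
    """Updated incrementally as labeled feedback streams in, without a full retrain."""

    def __init__(self) -> None:
        self._model = SGDClassifier(loss="log_loss")

    def update(self, X: np.ndarray, y: np.ndarray) -> "OnlineModel":
        self._model.partial_fit(X, y, classes=np.array([0, 1]))
        return self

    def predict_scores(self, X: np.ndarray) -> np.ndarray:
        return self._model.predict_proba(X)[:, 1]
```

Because the serving layer only depends on `predict_scores`, teams can swap batch-trained models for online-updated ones (or vice versa) without touching the infrastructure around them.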

What ultimately matters is who owns the data

We see a lot of discussions around the limited availability of public datasets for training GenAI models and concerns about the implications of depleting web-based datasets. The solution always circles back to first-party data a business owns and controls.

That’s reminiscent of the industry’s reaction when Google announced its plan to phase out third-party cookies. There was widespread alarm, but the message was clear: businesses that integrate data collection with their machine-learning initiatives have less to worry about. On the other hand, companies that merely serve as a facade for services like OpenAI’s APIs are at risk, as they don’t offer unique value.

And mark my words: 2024 is when we’ll start seeing companies move beyond the POC stage of GenAI, only to realize their efforts and initiatives will be plagued by the ghosts of data quality past.

Learnings from Mikiko Bazeley

As I reflect on my journey at Mailchimp and my roles since then – leading MLOps and Developer Relations at feature store provider Featureform and for the data-centric AI platform Labelbox – a few key lessons stand out:

  1. Integrating feedback into ML models is crucial to align with user needs. Effective feedback collection and integration, such as direct UI prompts and A/B testing, is essential for continuous model improvement.
  2. It’s hard to overstate the importance of a robust ML infrastructure. In today’s GenAI world, owning and understanding your data becomes a significant competitive advantage. Transitioning from reliance on public datasets to leveraging first-party data is necessary and a smart strategic choice. This is what I’m now working on at Labelbox, where we create solutions for transforming and processing unstructured data (whether image, text, audio, video or geospatial) into machine learning inputs.
  3. It’s essential to align technical projects with business objectives. When communicating with leadership, focusing on tangible benefits such as improved efficiency and higher productivity is crucial. Demonstrating measurable improvements and offering a sustainable maintenance plan can significantly enhance buy-in from both leadership and cross-functional teams. (For more information on measuring and communicating ROI on MLOps initiatives, please check out my guide: “Measuring Your ML Platform’s North Star Metrics.”)
  4. Lastly, let’s not forget the human element in MLOps and AI. Engaging teams through workshops, providing tailored support, and fostering a culture of collaboration are just as important as the technical aspects. Remember, successful implementation is as much about people as it is about technology. The future of AI isn’t just about building bigger, human-free models and systems. The opportunity to democratize advances in machine learning lies in aligning the development of smaller, task-specific models with human needs and expertise.
