We thank Manasi Vartak from Verta for taking part in this interview and sharing her story and several insights, including her perspectives on the democratization of NLP, opportunities in the AI space, and how her company helps teams build and deploy AI-enabled products.
Not everyone’s career journey is a straight path. However, you have been interested in mathematics and computer science since college. It wasn’t until your work at MIT that you became more focused on Data and ML Operations. Can you share how your journey has led you to where you are now?
As mentioned, I majored in Computer Science and Mathematics in college. What really excited me about computer science was the ability to work across a lot of disciplines and have tangible impact using software. At MIT, I got the opportunity to further focus on a particular kind of computer science, specifically data analytics and machine learning. These two tools are incredibly powerful, and just like the broader computer science theme, they can be used across the board whether it’s for healthcare, ad targeting, ranking your news, detecting threats, helping people answer questions about the census, you name it. That’s what got me really excited about the area. With my PhD thesis and now with Verta, I get to build tools that help companies build AI into products and applications in domains from workplace collaboration to insurance.
You are the founder and CEO of Verta. What is the value-based outcome your company provides?
We help teams build and deploy AI-enabled products and applications faster and more safely than ever before.
Verta has been around for almost four (4) years now. What problem(s) were you wanting to solve early on and have those problems changed over time?
That’s a great question. When we started, Verta was focused on ModelDB, which was my PhD thesis, and we built products around experiment management, so we focused more on the research side of machine learning. However, as we started engaging with customers, we found out that the big challenge was operationalizing the research models: taking the models, packaging them, integrating them into user-facing features or business processes, and then running them at scale. That’s what Verta does today, and more and more you’ll see our products supporting AI governance and safety.
The follow-on question: what new problems or challenges have emerged that you can also have an impact on?
As I alluded to, AI governance is becoming a significant bottleneck for enterprises deploying AI applications. This can include things like ensuring that there are safety checks in place for any AI model that’s deployed, making sure that the AI is fair and not biased, that we can explain the decisions that have been made, and that we are able to integrate AI into regular products and have the same degree of monitoring and oversight over them. Those are some of the new areas that have emerged in the last few years, and Verta is continuing to expand our offering to cover them.
How does your platform support companies who continue having challenges in shifting out of the pilot phase to the production phase for their AI models?
We think about the model life cycle as having two parts. The first is the Build phase, where you are building the models (these are usually research models). The second is the Run phase, where you take a model you would use in a product, make sure it passes essential checks, package it, deploy it into a production-ready system (e.g., your production or staging cluster), and then monitor it to ensure that it is still producing high-quality results. That’s what the Verta platform provides: a model catalog that helps research teams share models with the production team, a packaging and deployment system that takes these models from the catalog and runs them on various platforms, and a model monitoring suite that helps you monitor changes in model performance, including drift, anomalies, etc.
The post-production governance framework hasn’t been fully explored yet at most companies. Can the Verta platform provide the same value and benefits in post-production, when new data is collected, as it does prior to production?
Post-production ML governance is very new and it is an ever-evolving field. Since Verta focuses on the Run phase of the model lifecycle, post-production governance is our emphasis. This includes monitoring the models that have already been deployed to track things such as whether there is data drift happening as new data is being collected, whether anomalies are being observed, or how your overall model performance is changing as a result. In addition, post-production governance includes steps undertaken by the risk and compliance teams including bias and fairness assessments to ensure models are safe for use. All post-production aspects, including governance, are an essential part of what we solve.
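To make the data drift idea above concrete, here is a minimal sketch of one common way to quantify drift for a single numeric feature, the Population Stability Index (PSI). This is an illustration only, not Verta's implementation: the function name, the bin count, and the 0.25 alert threshold (a widely used rule of thumb) are all assumptions.

```python
import math
from typing import List, Sequence


def population_stability_index(
    reference: Sequence[float],
    current: Sequence[float],
    bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between a reference sample and a newer sample of one numeric feature."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to width 1.0
    # if the reference feature is constant.
    width = (hi - lo) / bins or 1.0

    def bin_fractions(values: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for v in values:
            # Values outside the reference range are clamped into the edge bins.
            idx = int((v - lo) / width)
            counts[max(0, min(bins - 1, idx))] += 1
        # Floor each fraction at eps so the logarithm below is always defined.
        return [max(c / len(values), eps) for c in counts]

    ref = bin_fractions(reference)
    cur = bin_fractions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical data: values seen at training time vs. newly collected values.
training_values = [i / 100 for i in range(100)]
new_values = [v + 0.5 for v in training_values]

DRIFT_THRESHOLD = 0.25  # assumed rule-of-thumb alert level
psi = population_stability_index(training_values, new_values)
print(f"PSI = {psi:.3f}, drift alert = {psi > DRIFT_THRESHOLD}")
```

In practice a monitoring system would compute a statistic like this per feature on a schedule and alert when it crosses the chosen threshold. The right statistic and threshold depend on the data.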
I recently read a blog post on your company’s website about how ML code is not the same as software. I think this is the company’s first blog post. What recent observations or conversations prompted you to write about this topic?
So that’s not the first blog post – I think we have about two dozen or more blog posts before then! However, the thing that prompted us to write this blog post was that this question comes up very frequently. Teams often struggle with how to integrate ML models into their existing SDLC (software development life cycle) processes and so the question becomes, how do you construct build systems for ML, how do you test these models, how do you deploy etc. Since we’ve helped teams deploy hundreds of models at this point, we’ve become very familiar with the unique aspects of how model systems need to be different from SDLC systems and where we can reuse existing infrastructure. And that was really what led to us writing that blog post. I won’t give away the blog post, but here’s the link – check it out!
Often we see enterprise companies create two (2) separate teams: data scientists and AI model engineers or developers. How important is it to ensure real collaboration between these teams, and can you elaborate on what steps should be taken to reinforce this collaboration?
We see this a lot as well. As I mentioned, we divide the model life cycle into the Build phase and the Run phase. The Build phase is where lots of data scientists (and some data engineers) will be working to get the right data, to train the models, and to test them on a small scale; the Run phase is where the ML engineers or developers take the trained models and integrate and run them in a production system. As you might imagine, the collaboration between these two personas is essential because data scientists can build models all day, but if they aren’t able to integrate those models into products, or those models are too slow or too inaccurate, all of the money and time spent building the models is for nothing. So this collaboration is absolutely crucial. Often, the lack of this collaboration is also the cause of many delays in shipping models.
What steps can one take? They fall into two areas: Institute a Process – When should a model be handed off from Data science to engineering? What checks need to be passed? What kind of documentation does the data scientist need to provide to make sure that their model is ready to be used?
The second area is Tools. You and your team probably use Slack to communicate, and that’s what really helps with collaboration. This case is no different; you need a good tool that’s going to help the two teams collaborate. We have found that this tool is the model catalog. A model catalog is a place where the data scientists building models can share them with engineering, and where the engineers can pick up models, understand how to use them, and determine the best paths to bring them into production. Finally, there’s old-fashioned communication: helping the two sides understand each other’s needs and what each can do to make the other’s job easier. This is very similar to the DevOps challenge that the software community ran into over a decade ago.
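The process and tooling ideas above can be sketched in a few lines of code. The example below is a hypothetical, in-memory model catalog with a handoff gate. It is not Verta's API: the class names, the fields, and the list of required checks are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical handoff checks; a real team would define its own list.
REQUIRED_CHECKS = ("accuracy_validated", "latency_within_budget", "bias_reviewed")


@dataclass
class ModelEntry:
    """What a data scientist shares with engineering for one model version."""
    name: str
    version: str
    owner: str
    documentation: str = ""
    checks: Dict[str, bool] = field(default_factory=dict)


class ModelCatalog:
    """Toy in-memory catalog keyed by (name, version)."""

    def __init__(self) -> None:
        self._entries: Dict[Tuple[str, str], ModelEntry] = {}

    def register(self, entry: ModelEntry) -> None:
        self._entries[(entry.name, entry.version)] = entry

    def handoff_blockers(self, name: str, version: str) -> List[str]:
        """List everything still missing before engineering can pick the model up."""
        entry = self._entries[(name, version)]
        blockers = []
        if not entry.documentation.strip():
            blockers.append("missing documentation")
        for check in REQUIRED_CHECKS:
            if not entry.checks.get(check, False):
                blockers.append(f"check not passed: {check}")
        return blockers


catalog = ModelCatalog()
catalog.register(
    ModelEntry("churn", "1.0", "data-science", checks={"accuracy_validated": True})
)
print(catalog.handoff_blockers("churn", "1.0"))
```

An empty list of blockers would mean the model version is ready to hand off. Making the checklist explicit in a shared tool gives both teams the same definition of "ready".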
Soft skills are becoming more important for data scientists and model engineers (or developers), specifically the need to communicate effectively about the assumptions in the data, how the model works, etc. As you look to evolve your team in 2022, are there changes you plan to implement as part of your team development to support these skills?
Since Verta builds infrastructure for data science, we hire team members with strong data science or infrastructure experience. One skill set we are specifically looking for is the ability to design products that make it easier to consume models and use them in different business processes. These models are often used by non-experts, so we think about how to build a system that’s going to help non-experts understand what a model does, how it should be used, what the restrictions on its use are, etc. That’s a really important area that we’re hiring for.
The follow-on question: What skills are receiving a higher priority for 2022?
As mentioned, we build infrastructure, so our biggest needs are for software engineers, data scientists, and product designers.
From your perspective what are some breakthroughs you have witnessed in the past two (2) years?
In the last 2 years the biggest change I have seen has been the massive democratization of natural language processing. A major driver here has been pre-trained models and HuggingFace. Before pre-trained NLP models were made available by the big companies (Google, OpenAI, Microsoft), it was actually pretty hard to build solid applications with NLP. You could build something, but it wasn’t going to be terribly accurate. With the pre-trained models and libraries like HuggingFace, it’s become extremely easy to build NLP based applications.
As technology is changing more rapidly, how do you envision the future of this space in the near term (e.g., the next 2 years)?
I think we’re still in the early innings of what AI can enable for real-life products.
What I mean by that is there have been a lot of prototypes and a lot of research, but we still haven’t seen the enormous power (and the challenges) of building products that are fully AI-enabled. Think about Siri, think about Alexa; the amount of complexity that sits behind those systems and devices is enormous, but what they can do is amazing. Now imagine that every device in your home and every piece of software that you interact with is that smart (your couch, your toaster, your clothing). I think that there is going to be a massive explosion of AI-enabled products in the next few years, and we’re going to see the power of these products as well as the challenges that come with them, the failure modes of these products, and how to react to those failures. This will bring the responsible use of AI to the forefront, and we will see a next generation of tools to help us build AI responsibly.
What do you hope to see in the long term future for the industry and/or for Verta?
AI and ML are going to have such a long and bright future! As I mentioned, our entire surroundings are going to be full of AI-enabled devices and software, from our homes to offices to parks and movie theaters. The future is limitless, but with that, we need to make sure that we reach that future in a safe and sustainable manner that uses AI in ethical and responsible ways. As the applications of AI grow, opportunities for Verta to do business and opportunities for us to have an impact also grow, so we’re excited to become one of the major players that will shape this industry.
You were quoted in an article as saying, “there are a lot of reasons why I started Verta, but one of them was to show other women this is a possible career path for you, and you can do it well”. What further advice can you share with girls and women who may not have thought about data science as a career opportunity?
I don’t think there is any reason why women or girls should not consider data, data analytics, machine learning, infrastructure, or entrepreneurship as a career path. These paths are just as hard or as easy as any other path that you are considering. It’s just that historically, there have been fewer women pursuing them. That has changed a lot. Not only are there many more women in these careers, there are a lot more resources today and support systems that will help you succeed in this type of career path. So set your sights on something that is personally meaningful and then go after it. Utilize the resources available at your college, your workplace, or online. If you don’t have resources, reach out to people on Twitter, on LinkedIn, etc. Most people are happy to take time out and pay it forward. And remember that, as with everything, there are going to be failures no matter what you do; what matters is how you get up and how you keep going.
Melissa has 28 years of business and digital transformation experience and was recently recognized as one of the Top 25 Global Consultants and as a Top Data Consultant by Chief Data Officer Magazine. A founding member of Women Leaders in Data and AI (WLDA), Melissa is a visionary who recognizes that data strategy, technology, automation, and the stakeholder experience are critical to meeting the needs of tomorrow and beyond.