
In the rapidly growing field of data engineering, restructuring data pipelines has become fundamental to driving business growth and operational efficiency. Manohar Sai Jasti, Software Development Engineer at Workday, shares his journey of implementing innovative solutions and ensuring scalability in data pipelines. In this interview, we explore his experiences and insights into reshaping data pipelines to empower businesses with data-driven decision-making.
What are some key projects involving data pipeline restructuring, and what outcomes did you achieve?
During my time at Stord, a leading cloud supply chain and fulfillment platform, I was the sole data engineer, responsible for leading several critical projects that reshaped our data infrastructure. One of the most important initiatives was the Log-Based Replication (LBR) Migration project, which I spearheaded in collaboration with our Site Reliability Engineering (SRE) team.
Before this project, we faced substantial data discrepancies between our source system and BigQuery, which led to inefficiencies and slower data updates. The migration yielded remarkable results.
To be precise, we achieved annual cost savings of $72,000, or $6,000 per month. Data discrepancies were almost entirely eliminated, and data refresh rates improved by at least 30%.
This project was a huge undertaking that impacted all of the major datasets for both Stord One Commerce and Stord One Warehouse, our cloud-based order management and warehouse management products. Thanks to these results, I was recognized with the "Efficiency Driver" award.
Another key project was the Critical Orders Dataflow Enhancement. I owned this crucial data flow where the goal was to consolidate information across Stord’s legacy and new systems. This project significantly improved our data aggregation and reporting capabilities. Its main advantage was providing logistics customers with detailed and accurate insights into their supply chain operations.
Additionally, I completed all data-side migrations from Veracore to Stord One Commerce, which was a huge customer obsession win. This migration improved operational efficiency, grew revenue, and enhanced our products and services.
Currently, as an Analytics Engineer at Workday since May 2024, I’m involved in developing and maintaining robust data transformation pipelines. I’m part of the Performance, Resilience, and Scalability (PRS) Engineering Tools Group. My role involves creating a complete data pipeline, from data warehouse to data science applications, empowering Workmates with data-driven decisions at their fingertips.
Here, I’ve been extensively leveraging DBT, the data build tool, to enhance our FinOps practices and create models that ingest and transform billing data from various cloud providers. This work has improved our ability to analyze costs across our multi-cloud infrastructure, providing valuable insights for resource allocation and spend optimization.
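To illustrate the kind of transformation involved, here is a minimal Python sketch of unifying billing records from different cloud providers into one schema before cost analysis. The field names are illustrative of AWS Cost and Usage Report and GCP billing-export conventions, and the function names are my own for this example; the interview does not describe the actual models.

```python
from dataclasses import dataclass


@dataclass
class BillingRecord:
    """Common schema that downstream cost models can query uniformly."""
    provider: str
    service: str
    usage_date: str  # ISO date, e.g. "2024-05-01"
    cost_usd: float


def normalize_aws(row: dict) -> BillingRecord:
    # AWS Cost and Usage Report style columns (illustrative subset)
    return BillingRecord(
        provider="aws",
        service=row["product/ProductName"],
        usage_date=row["lineItem/UsageStartDate"][:10],
        cost_usd=float(row["lineItem/UnblendedCost"]),
    )


def normalize_gcp(row: dict) -> BillingRecord:
    # GCP BigQuery billing export style columns (illustrative subset)
    return BillingRecord(
        provider="gcp",
        service=row["service.description"],
        usage_date=row["usage_start_time"][:10],
        cost_usd=float(row["cost"]),
    )


def unify(aws_rows, gcp_rows):
    """Merge provider-specific rows into one list of BillingRecords."""
    return [normalize_aws(r) for r in aws_rows] + [normalize_gcp(r) for r in gcp_rows]
```

In a DBT project the same idea would typically live in staging models, one per provider, feeding a unioned mart model.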
Data product governance is crucial for preventing siloed development and ensuring consistent, high-quality data assets across an organization. In my current role at Workday, I've been addressing this challenge by implementing comprehensive governance practices for the data products used by analysts and data scientists, through cross-functional collaboration, standardization, access management, and data pipeline lifecycle management.
Scalability and flexibility are cornerstones of any robust data infrastructure. How do you ensure your systems can scale seamlessly while supporting business growth?
Scalability and flexibility were indeed very important, especially at Stord, which was rapidly expanding its cloud supply chain services. To support that growth and ensure new features remained flexible, I focused on several key areas.
The first was query performance. I restructured our data infrastructure by strategically separating fact tables, which dramatically enhanced query performance and optimized data retrieval for Stord's complex logistics operations.
Another key area was the transition to DBT (Data Build Tool). I moved the critical data processing logic that powers most of our dashboards from traditional stored procedures to DBT. This improved overall operational efficiency and our alerting systems, and made it far easier to adapt to new requirements without reworking the entire system.
Comprehensive alerting and monitoring were also a priority. I implemented 100% alerting and monitoring coverage across all pipelines and critical processes, which minimized data downtime and improved our ability to respond quickly to issues.
In my current role at Workday, I continue to focus on scalability and flexibility. I utilize a range of tools, including DBT, Trino/Presto, Jupyter Notebooks, Python, Apache Airflow, AWS RDS, MySQL/PostgreSQL, and Git for data processing and analysis.
What steps have you taken to modernize data processing workflows, and how have these improvements impacted efficiency and accuracy?
At Stord, one of the most impactful changes I made in terms of modernizing data workflows was the Log-Based Replication Migration. It solved data accuracy issues, improved refresh rates, and cut costs, which helped us provide real-time insights into logistics operations.
I also introduced DBT to manage critical data processes. This allowed us to handle data more efficiently and made it easier for team members to work together on updates.
Another project involved improving how we handle master order data. These updates gave us a clearer picture of warehouse activities and made our reports more valuable for customers.
At Workday, I’ve focused on multi-cloud infrastructure, creating pipelines that ensure accurate and up-to-date data for cost analysis. These improvements have helped teams make decisions faster and with more confidence.
Let’s talk innovation—how have automated monitoring and machine learning shaped your approach to managing data?
At Stord, innovation was all about staying ahead in how we managed data. One major improvement was introducing automated monitoring and alerting for all pipelines. With 100% coverage, we could catch and fix issues before customers were affected. This was especially useful in ensuring accurate logistics tracking and reporting.
I also worked on enhancing our alerting system to focus on things like stale or duplicate data. These improvements helped us maintain high data quality and improved customer trust in our analytics.
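A check for stale or duplicate data can be sketched in a few lines of Python. This is my own minimal illustration of the idea, not the actual alerting code; the field names and threshold are assumptions for the example.

```python
from datetime import datetime, timedelta, timezone


def find_issues(rows, key_field, updated_field, max_age_hours=24, now=None):
    """Return (duplicate_keys, is_stale) for a batch of records.

    A key appearing more than once is flagged as a duplicate; the batch is
    considered stale if its newest record is older than max_age_hours.
    """
    now = now or datetime.now(timezone.utc)
    seen, dupes = set(), set()
    newest = None
    for row in rows:
        key = row[key_field]
        if key in seen:
            dupes.add(key)
        seen.add(key)
        ts = row[updated_field]
        if newest is None or ts > newest:
            newest = ts
    is_stale = newest is None or (now - newest) > timedelta(hours=max_age_hours)
    return dupes, is_stale
```

In practice a scheduler such as Airflow would run a check like this after each pipeline run and page the team when either condition fires.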
At Workday, I’ve continued to prioritize innovation by developing tools and processes that make our data products better. For example, I’m working on improving alerting systems to identify issues faster and create smoother workflows for our teams.
Speaking about current trends, machine learning is now transforming practically every data-driven business. Can you share how you’ve integrated machine learning into data processing and its impact on analytics quality and timeliness?
During my time at Stord, I was involved in exploring the integration of machine learning technologies into our data processing. One of my key projects was building an AI-powered chatbot in collaboration with cross-functional teams. The chatbot used generative AI to handle analytical queries, allowing users to ask questions in plain language and get SQL-based answers quickly.
We also added error-handling mechanisms that helped the chatbot learn and improve over time. This not only reduced response times for ad-hoc queries but also gave our teams faster access to the data they needed.
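The error-handling loop behind such a chatbot can be sketched roughly as follows. This is a hypothetical skeleton, assuming injected `generate_sql` and `run_query` callables standing in for the model and the warehouse; the actual implementation is not described in the interview.

```python
def answer_question(question, generate_sql, run_query, max_attempts=3):
    """Ask a model (generate_sql) for SQL, run it, and feed any database
    error back into the next prompt so the model can self-correct.

    Both callables are injected, so the loop is testable without a real
    LLM or warehouse connection.
    """
    error = None
    for _ in range(max_attempts):
        sql = generate_sql(question, error)
        try:
            return run_query(sql)
        except Exception as exc:
            error = str(exc)  # fed back into the next generation attempt
    raise RuntimeError(f"could not answer after {max_attempts} attempts: {error}")
```

Feeding the database error message back to the model is what lets the bot "learn" within a conversation: the second attempt is prompted with both the question and the failure.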
At Workday, I’m applying this experience to build a knowledge bot that uses generative AI. The bot is designed to help users ask questions about how to use analytics tools, cutting down the need for documentation and providing real-time support. It’s an exciting project that’s making analytics easier and faster for everyone involved.
As we wrap up, what hurdles did you face during projects like log-based replication, and how did you overcome them?
The Log-Based Replication Migration at Stord had its share of challenges. The main technical hurdle was the complexity of supply chain data. It was also important to integrate the new system without disrupting ongoing logistics operations.
We sometimes ran into unexpected problems—what we called “black swan” issues—after making updates to master orders logic. These required deep troubleshooting and teamwork to resolve.
To handle these challenges, I made sure to test thoroughly at every step. I worked closely with the SRE team to solve technical problems and collaborated with stakeholders to keep everyone aligned on goals.
In my current role at Workday, I’ve faced different challenges related to multi-cloud infrastructure. For example, ensuring data accuracy across different cloud platforms is critical. To solve this, I built tests to validate data and created a system to flag stale data before it affected customers. This proactive approach has helped ensure our analytics are always reliable and up-to-date.
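One simple form such a validation test can take is comparing per-table row counts between a source and its replica. The function below is a minimal sketch of that idea under my own assumed names and inputs, not the actual Workday tooling.

```python
def validate_replication(source_totals, replica_totals, tolerance=0.0):
    """Compare per-table row counts between a source system and a replica.

    Returns a dict of tables whose counts diverge beyond the relative
    tolerance, mapping table name -> (source_count, replica_count).
    """
    mismatches = {}
    for table, src in source_totals.items():
        rep = replica_totals.get(table, 0)
        if src == 0:
            diverged = rep != 0
        else:
            diverged = abs(src - rep) / src > tolerance
        if diverged:
            mismatches[table] = (src, rep)
    return mismatches
```

A scheduled job can run this after each sync and raise an alert when the returned dict is non-empty, flagging drift before customers notice it.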