Interview with Mr. Sumeet Patil, Lead Data Scientist at Chistats Labs

Sumeet Patil is a Lead Data Scientist at Chistats Labs Pvt. Ltd.
He worked for one year in 2017 as a software engineer at CDAC Research & Development, where he focused on analyzing high-performance computing systems and improving software performance on the PARAM supercomputer. He is a highly skilled, results-oriented professional with 3 years of experience. He is also an expert in emerging technologies such as Artificial Intelligence (machine learning, deep learning, natural language processing, computer vision), data science, data mining, and data analytics.

In this article, Mr. Sumeet Patil shares his experience on:
1. The roadmap to becoming a data scientist.
2. What it is like to be a data scientist.
3. The skills needed to become a data scientist.

Q: What is the role of a data scientist?

Data science is possibly the hottest career of the 21st century, with high potential. In today’s high-tech world, everyone has pressing questions that must be answered using large volumes of collected data. From businesses to non-profit organizations to government institutions, there is a seemingly infinite amount of information that can be categorized, interpreted, and applied for a wide range of purposes. Finding the right answers, however, can be a serious challenge.
● How can a business sort through purchasing data to create a marketing plan?
● How can the government use behavior patterns to create community activities?
● How can a non-profit use its marketing budget to enhance potential operations?
It all comes down to data scientists. Because there is simply too much information for one person to process and use, data scientists are trained to collect, structure, and analyze data, helping people in every corner of industry and every segment of the population.

Q: Which data scientist do you admire most and why?

There are too many to name. The LinkedIn data science community is rich in experience and I admire many of its members, but the people who had a real impact on me are:

  • Yogesh Karpate (Ph.D.), CEO and Founder of Chistats Labs. It’s often said that your first boss is a major factor in your career, and I’m lucky to have him as my colleague, mentor, and boss. His wide experience in data science and machine learning has always helped me learn something new every day. We are looking ahead to achieving a lot more together.
  • LinkedIn data science community: Andriy Burkov, Randy Lao, Kyle McKiou, Dat Tran, and many more. Their constant contributions to the field of data science and artificial intelligence helped me a lot. They kept me motivated and helped me learn the skills needed in this field.

Q: What is it like to be a data scientist?

  • Data science is a complex and often confusing field, and it involves dozens of different skills that make defining the profession a constant struggle. A data scientist is someone who gathers and analyzes data with the goal of reaching a conclusion. They do this through many different techniques.
  • They may present the data in a visual context, which is often called “visualizing the data,” allowing a user to look for clear patterns that wouldn’t be noticeable if the information were presented as hard numbers on a spreadsheet.
  • To be a data scientist is to be equipped with a diverse and wide-ranging skill set, balancing knowledge of different programming languages with advanced experience in data mining and visualization. Technical skills are not all that count, however. Data scientists often work in business settings and are charged with making complex data-driven organizational decisions.
  • As a result, it is highly important for them to be effective communicators, leaders, and team members as well as high-level analytical thinkers.
  • At its core, data science is the practice of looking for meaning in mass amounts of data.

Q: What are the responsibilities of a data scientist?

“A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.” – Josh Wills

On any given day, a data scientist’s responsibilities may include:

  • Conduct undirected research and frame open-ended industry questions.
  • Extract huge volumes of data from multiple internal and external sources.
  • Employ sophisticated analytics programs, machine learning, and statistical methods to prepare data for use in predictive and prescriptive modeling.
  • Thoroughly clean and prune data to discard irrelevant information.
  • Explore and examine data from a variety of angles to determine hidden weaknesses, trends and/or opportunities.
  • Devise data-driven solutions to the most pressing challenges.
  • Invent new algorithms to solve problems and build new tools to automate work.
  • Communicate predictions and findings to management and IT departments through effective data visualizations and reports.
  • Recommend cost-effective changes to existing procedures and strategies.

Every company will have a different take on job tasks. Some treat their data scientists as data analysts or combine their duties with those of data engineers; others need top-level analytics experts skilled in intense machine learning and data visualization.

Q: What is the machine learning workflow?

Essentially, a data scientist extracts meaning from the varying types of data (e.g., structured, unstructured, semi-structured) that flow into the enterprise. On any given day, a data scientist may be extracting data from a database, preparing the data for various analyses, building and testing a statistical model, or creating reports that include easily understandable data visualizations. There is a data science cycle, which is less a set of rules than a heuristic:

  • Data collection
  • Data preparation
  • Exploratory data analysis (EDA)
  • Evaluating and interpreting EDA results
  • Model building
  • Model testing
  • Model deployment
  • Model optimization
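
The cycle above can be sketched end to end on a toy problem. This is only a minimal illustration using Python's standard library, not a description of how any particular team works: the (hours studied, exam score) records are made up, and the names `fit_line` and `predict` are hypothetical helpers introduced for this example. Real projects would typically use libraries such as pandas and scikit-learn, and "deployment" would mean far more than wrapping the model in a function.

```python
from statistics import mean

# 1. Data collection: hypothetical (hours_studied, exam_score) records.
#    None marks a missing measurement.
raw = [(1, 52), (2, 55), (3, None), (4, 57), (5, 61),
       (6, 62), (7, 63), (8, 67), (9, 68)]

# 2. Data preparation: drop incomplete records.
clean = [(x, y) for x, y in raw if y is not None]

# 3. Exploratory data analysis: quick summaries of each column.
xs = [x for x, _ in clean]
ys = [y for _, y in clean]
print("records:", len(clean), "mean hours:", mean(xs), "mean score:", mean(ys))

# 4. Hold out the last two records for testing
#    (no shuffling, so the run is repeatable).
train, test = clean[:-2], clean[-2:]


def fit_line(points):
    """Ordinary least squares fit of y = slope * x + intercept."""
    mx = mean(x for x, _ in points)
    my = mean(y for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, my - slope * mx


# 5. Model building: fit on the training records only.
slope, intercept = fit_line(train)

# 6. Model testing: mean squared error on the held-out records.
mse = mean((slope * x + intercept - y) ** 2 for x, y in test)
print("slope:", slope, "intercept:", intercept, "test MSE:", mse)


# 7. "Deployment": expose the fitted model as a function other code can call.
def predict(hours):
    return slope * hours + intercept
```

Model optimization would then loop back to the earlier steps, for example by collecting more data or trying a different model and comparing the test error.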

Q: What are the real-world applications of data science?

  • Internet Search: When we speak of search, we think ‘Google’. Right? But there are many other search engines, such as Yahoo, Bing, Ask, AOL, and DuckDuckGo. All these search engines (including Google) use data science algorithms to deliver the best results for our search queries in a fraction of a second.
  • Recommender Systems: Who can forget the suggestions about similar products on Amazon? They not only help you find relevant products among the billions available but also add a lot to the user experience.
  • Image Recognition: You upload a photo with friends on Facebook and you start getting suggestions to tag your friends. This automatic tag suggestion feature uses a face recognition algorithm.
  • Speech Recognition: Some of the best examples of speech recognition products are Google Voice, Siri, and Cortana. With speech recognition, you can still send a message even when you aren’t in a position to type it. There are many more applications.

Q: What is the roadmap to become a data scientist?

“Practice makes perfect.” Data science is no magic; it takes a certain amount of practice to learn, acquire, and apply the skills. Try to solve real-world social and business problems, which you can find on sites such as Kaggle and Analytics Vidhya, where all the required details are provided. These are the steps you can follow to become a data scientist.

1. Pursue an undergraduate degree, graduate degree, or certificate in data science

Academic qualifications may be more important than you imagine. Broadly speaking, you have 3 education options if you’re considering a career as a data scientist:

  • Degrees and graduate certificates provide you with knowledge, internships, networking and recognized academic qualifications for your resume.
  • Self-guided learning courses are free or cheap, short, and targeted. They allow you to complete projects on your own time, but they require you to structure your own learning.
  • Boot camps are intense and faster to complete than traditional degrees. They may be taught by practicing data scientists, but they won’t give you a degree.

2. Learn the required skills to become a data scientist

Data science requires several skills, along with familiarity with the supporting tools and libraries, such as:

  • Programming languages: Python, R, SAS, MATLAB. You need expertise in at least one programming language; I suggest R or Python.
  • Machine learning tools (scikit-learn, NumPy, pandas). You should understand what machine learning is and how it works, including the different types of machine learning techniques:
      • Supervised learning
      • Unsupervised learning
  • Good knowledge of various supervised and unsupervised learning algorithms is required, such as linear regression, logistic regression, decision trees, random forests, K-nearest neighbors, and clustering (for example, K-means).
  • Data visualization. It is a very important part of the data life cycle. Below are a few visualization tools:
      • Tableau
      • Kibana
      • Matplotlib
      • Shiny
  • Math (exploratory analysis and modelling) and fundamentals. The fundamentals may include:
      • Matrices and linear algebra functions
      • Hash functions and binary trees
      • Relational algebra and database basics
      • ETL (Extract, Transform, Load)
  • Statistics, which includes:
      • Descriptive statistics (mean, median, standard deviation, variance)
      • Exploratory data analysis
      • Percentiles and outliers
      • Probability theory
      • Bayes’ theorem
      • Random variables
      • Cumulative distribution function (CDF)
      • Skewness
      • Other statistics fundamentals
  • Data mining, cleaning, and munging
  • Reporting vs. BI (Business Intelligence) vs. analytics
  • Big data and cloud platforms
  • Data-driven problem solving
  • Effective communication
  • Software engineering skills
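
To make a few of the statistics items above concrete (descriptive statistics, percentiles, and outliers), here is a small sketch using only Python's built-in `statistics` module. The `orders` sample is made up for illustration, and the 1.5 × IQR rule is just one common convention for flagging outliers, not the only one.

```python
import statistics

# Hypothetical sample: daily order counts, with one unusually large day.
orders = [12, 15, 14, 10, 18, 13, 16, 14, 95, 11]

# Descriptive statistics.
mean_orders = statistics.mean(orders)
median_orders = statistics.median(orders)
variance = statistics.variance(orders)  # sample variance
std_dev = statistics.stdev(orders)      # sample standard deviation

# Quartiles: the 25th, 50th, and 75th percentiles.
q1, q2, q3 = statistics.quantiles(orders, n=4)

# Flag outliers with the 1.5 * IQR rule (IQR = interquartile range).
iqr = q3 - q1
low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [v for v in orders if v < low or v > high]

print("mean:", mean_orders, "median:", median_orders)
print("std dev:", std_dev, "quartiles:", (q1, q2, q3))
print("outliers:", outliers)
```

Here the single large value pulls the mean well above the median, which is why it helps to look at both, and the IQR rule flags only that value as an outlier.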

3. Build your data science profile

Build your profile as a data scientist by practicing your skills on any of the following and adding the results to your resume/CV:

  • Kaggle: A platform for predictive modeling and analytics competitions.
  • 100 Days Of Code: A challenge in which beginner coders attempt to code for at least an hour every day for 100 days.
  • Codewars: Improve your skills by training with others on real code challenges.
  • DrivenData: Brings crowdsourcing to some of the world’s biggest social challenges and the organizations taking them on.
  • HackerRank: Practice coding, compete, and then find jobs.

Q: How important will data science be in the future?

The exponential growth in data we have witnessed since the beginning of our digital era is not expected to slow down anytime soon. In fact, we have probably just seen the tip of the iceberg. The coming years will bring about an ever-increasing torrent of data. The new data will function as rocket fuel for our data science models, giving rise to better models as well as new and innovative use cases.




Opinions expressed by AI Time Journal contributors are their own.

About Shanmugapriya Balamurugan

Editorial Staff Intern, Pandian Saraswathi Yadav Engineering College. Interested in machine learning and data science.
