Jalem Raj Rohit is a data scientist at GEP Worldwide who works on data science and NLP. He is also a technical author: he has written two books and a video course on ML and serverless engineering. He is a diamond moderator on the DevOps and Data Science Stack Exchange sites.
He contributes to open source in his free time and tinkers with Julia, Go, Scala, R, and Python.
He is also a technical speaker; his talks include:
- The 1960s elegance behind Go’s regexp (FOSS Asia, Singapore 2017)
- Understanding Serverless Architecture (PyCon Taiwan 2017)
- Linear Regression – The good, the bad and the untold (PyData Delhi 2017)
- Getting into Data Science (Guest talk at PyData Raipur, IIIT Naya Raipur)
- Machine Learning Workshop (2 hrs, IIIT Naya Raipur)
- Lessons learned from building serverless distributed systems (DevOps Days India 2017)
- Machine Learning Workshop (5.5 hrs, Miles India Camp 2017)
- Lessons learned from building serverless distributed systems (Velocity Conference London, October 2017)
In this article, Mr. Jalem Raj Rohit shares his experience on:
- How to become a data scientist
- The tools and algorithms needed to become a data scientist
- Leveraging language models on top of search for context-aware search
- His data-driven predictor project for the 2015 Cricket World Cup
Q: What is data science?
Data science is the science of finding answers in data. It sits where data and technology meet: the techniques we learned in physics, chemistry, and mathematics can be applied to data. It is a combination of programming, mathematics, and logical reasoning.
Q: What are the technologies used in data science?
Several technologies are used in data science. The most popular are the programming languages Python, R, and Julia.
Python: Preferably, start with Python. If you are working for a company or want to be productive quickly, go with Python.
R: If you are going to be a researcher, go with R; it is well suited to research and has many built-in packages. Most researchers use R, but it can be hard to understand at first. If you come from a non-programming background, go with Python.
Julia: It is a newer language that is currently emerging on the programming side.
Q: What are the algorithms used in data science?
Many algorithms are used in data science, such as Random Forest, XGBoost, LSTMs, and CNNs.
Q: What toolkits are used in data science?
Toolkits for doing data science mostly revolve around the three languages (Python, R, and Julia) and the libraries that exist for them. In addition, there are distributed computing toolkits such as Apache Spark and its competitors.
Q: Why is data science important in the coming year?
Data science has recently been quite disruptive in the health industry. Advances in cancer research, and in research into other critical diseases, using ML are something to look forward to.
Q: What do we have to learn to become a good data scientist?
These are the steps you can follow to become a data scientist:
- Math basics (linear algebra, differentials, and geometry)
- Googling skills and getting answers/help off the internet
- Programming abilities
- Hypothesis testing skills
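To make the hypothesis-testing item concrete, here is a minimal sketch of a two-sample permutation test in plain Python. The interview does not prescribe any particular test; the function name `permutation_test`, its parameters, and the sample numbers are hypothetical, chosen only for illustration.

```python
import random
from statistics import fmean

def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    Returns an estimated p-value: the fraction of random relabelings whose
    absolute mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(fmean(a) - fmean(b))
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(fmean(pooled[:n_a]) - fmean(pooled[n_a:]))
        # Small tolerance so float rounding does not hide ties with the observed value.
        if diff >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)

# Hypothetical measurements from two groups (illustrative numbers only).
group_a = [12.1, 11.8, 12.6, 12.9, 12.3, 12.7]
group_b = [11.2, 11.5, 11.0, 11.9, 11.4, 11.1]
p_value = permutation_test(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A permutation test was chosen here because it needs no distributional assumptions and makes the idea of a null hypothesis ("the labels do not matter") explicit.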
Q: Could you share any working experiences in your research area?
My research areas are NLP and quant:
1. I’m currently working on leveraging language models on top of search for context-aware search. Within NLP, unsupervised learning is what piques my interest most.
2. After work, I spend most of my time using NLP to augment quant algorithms.
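The interview does not describe how this system works. One classic way to put a language model "on top of" search is the query-likelihood model from information retrieval: each candidate returned by a first-stage search gets a smoothed unigram language model, and candidates are re-ranked by how likely that model is to generate the query. The sketch below assumes whitespace tokenization, Jelinek-Mercer smoothing with a hypothetical weight `lam=0.5`, and hypothetical helper names `tokenize` and `rerank`; a modern context-aware system would more likely use a neural language model.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberately simple assumption)."""
    return text.lower().split()

def rerank(query, candidates, lam=0.5):
    """Re-rank candidate documents by smoothed query log-likelihood.

    Each document's unigram model is interpolated with a collection-wide
    model (Jelinek-Mercer smoothing), so a query term missing from one
    document does not drive its score to minus infinity.
    """
    doc_counts = [Counter(tokenize(d)) for d in candidates]
    collection = Counter()
    for counts in doc_counts:
        collection.update(counts)
    collection_total = sum(collection.values())
    if collection_total == 0:
        return list(candidates)  # nothing to score; keep first-stage order

    scored = []
    for doc, counts in zip(candidates, doc_counts):
        doc_total = sum(counts.values())
        log_p = 0.0
        for term in tokenize(query):
            p_doc = counts[term] / doc_total if doc_total else 0.0
            p_coll = collection[term] / collection_total
            p = lam * p_doc + (1 - lam) * p_coll
            # Terms unseen in the whole collection affect every document equally; skip them.
            if p > 0:
                log_p += math.log(p)
        scored.append((log_p, doc))
    # Highest likelihood first; the stable sort keeps first-stage order on ties.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]

results = rerank("python data", [
    "python snake care guide",
    "snake species of the world",
    "python data science tutorial",
])
print(results[0])
```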
Q: Could you share your “Data-driven predictor” project, and how you applied it?
We scraped data on cricket matches from the past 10 years and predicted the probability of a team winning a match.
1. The prediction depends on several factors, including the batting and bowling lineups, the ground, the type of ground, and whether it is a home or away game.
2. We tested and validated it on the 2015 World Cup.
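The interview does not say which model the project used. As a hedged sketch, a common way to turn match features into a win probability is logistic regression, shown here trained with plain batch gradient descent. The feature names (`batting_diff`, `bowling_diff`, `is_home`), the helper names, and the toy rows are all hypothetical and are not the project's data; ground and type of ground could be added as extra one-hot features in the same way.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit logistic-regression weights with batch gradient descent.

    rows: feature vectors; labels: 1 for a win, 0 for a loss.
    Returns (weights, bias).
    """
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            # Predicted win probability minus the actual outcome.
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(n_features):
                grad_w[j] += err * x[j]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def win_probability(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Hypothetical toy rows: [batting_diff, bowling_diff, is_home]. NOT real match data.
rows = [
    [0.8, 0.5, 1], [0.6, 0.2, 0], [0.3, 0.7, 1], [0.1, 0.4, 1],
    [-0.2, 0.1, 1], [-0.5, -0.3, 0], [-0.7, -0.6, 0], [-0.4, 0.2, 0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
weights, bias = train_logistic(rows, labels)

p_favourite = win_probability(weights, bias, [0.9, 0.8, 1])
p_underdog = win_probability(weights, bias, [-0.8, -0.7, 0])
print(round(p_favourite, 3), round(p_underdog, 3))
```

Validating on a held-out tournament, as the project did with the 2015 World Cup, would mean training only on earlier matches and comparing these probabilities against the tournament's actual results.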
Editorial Staff Intern
Pandian Saraswathi Yadav Engineering College. I am interested in Python, machine learning, and AI.