- The key skills required for a Data Scientist and the most prominent challenges faced
- Resources that can help a data scientist hone their skills
- Personal experiences in the Data Science and AI field
- Challenges organizations face getting into Data Science and having a data-driven mindset
Over the years, the terms that describe what we do have changed, from Data Mining to Predictive Analytics to Big Data and Data Science, the methods have evolved, and the data has grown exponentially.
– Daniele Micci-Barreca
How did you first get into data science?
I was working on my Ph.D. in Cognitive and Neural Systems, and of course, I was interested in and fascinated by machine intelligence. However, at the time the application areas were very narrow: things like image recognition, robotics, and pattern recognition. But it was around that time that the term “Data Mining” emerged, and the idea of applying these machine learning and statistical methods to database mining began to take hold.
This convergence of ML and data propelled my interest in the field. I also had the opportunity to get an internship at Epsilon, one of the first “database marketing” organizations to adopt “Data Mining” for customer response modeling and other applications. Over the years, the terms that describe what we do have changed, from Data Mining to Predictive Analytics to Big Data and Data Science, the methods have evolved, and the data has grown exponentially. Still, the basic concept, I believe, is very much the same.
What are the key skills that you use every day as a data scientist, and how did you develop them?
I think the main trait of a good Data Scientist is curiosity and the ability to use data to discover relationships and patterns. Technical skills, in my opinion, are less important than the ability to reason around data, creatively transform and manipulate it, present it and visualize it, use the appropriate techniques, such as statistical and ML algorithms, and interpret the results.
I do not believe that being an ML algorithm guru is as helpful as being a good “storyteller” with data, and the latter may be the more difficult skill to master. Developing these skills comes with practice and more practice. It is hard to gather this knowledge from a book; it comes from having strong mentors, making mistakes, and trying different things.
What are the top challenges you currently face as a professional data scientist, and how do you go about tackling them?
I think the tools of the trade are evolving quickly and multiplying. It can be challenging to decide which tools are worth learning and which ones may fade. Also, I am seeing a strong trend toward throwing raw data at more and more robust algorithms rather than focusing on the core practice of understanding, transforming, and interpreting data. I think this has some drawbacks but also many benefits, and those benefits are the reason for the acceleration in this direction.
Data Science is heading toward becoming a practice with more “black box” models, which may work well but provide little insight into the data.
The main trait of a good Data Scientist is curiosity and the ability to use data to discover relationships and patterns.
– Daniele Micci-Barreca
How important is the domain knowledge of the business/industry you’re in as a data scientist, and how did you acquire it?
People may have different opinions on this, but I believe domain knowledge is fundamental to being a successful Data Scientist. For many years I worked as a consultant, and my job entailed jumping into a new business setting and producing meaningful insights and models. Every time I got into a new field, the biggest challenge was to rapidly acquire expert-level domain knowledge in that field, that is, to understand the meaning of the data and its processes.
When I started a project in an area where I had worked already, it made a world of difference. For this reason, successful Data Scientists tend to stick with a few related areas and become experts in those fields. For example, after literally ‘stumbling’ into the area of tax analytics, and that means feeling like a fish out of water on the first few projects, I ended up starting a consulting business focused just on that. We specialized in helping tax agencies nationwide and internationally leverage Data Science to improve their tax compliance programs. Taxation is not a simple domain to master, but once we did, we gained tremendous credibility and success over the following decade.
When you know the field, you can speak with your stakeholders in their language, understand the meaning of the data, and know how to interpret your models.
3 words that best summarize how you learned ML and data science:
I learned Data Science through Humility, Curiosity, and Creativity.
From the beginning, I realized that there is a lot to learn in this field; I did not have all the skills I needed from my education, despite having studied for a long time. Furthermore, at the time, the educational opportunities tailored to Data Science needs were more limited. You need to master data extraction and manipulation; you need to understand your Statistics and Machine Learning; and, most importantly, you need to understand the domain.
You should focus on what you don’t know yet rather than on what you know well. What you know well is often the tools and the algorithms, but you will always face a new domain in which to apply your know-how. Curiosity in Data Science is a crucial ingredient, as I said. It means being quick at picking up on clues in the data or inconsistencies in your findings, always pursuing the truth, and questioning your results.
Creativity is expressed in how well you can transform the available data to make it more informative for your models; domain knowledge can help in achieving this. At the same time, being new to an area can sometimes bring new perspectives on the use of data, or you may be able to leverage ideas and methods used in other domains.
Books: which books have helped you the most in your journey and why?
I think that everyone interested in Data Science should at some point read a book like “The Elements of Statistical Learning” by Hastie, Tibshirani, and Friedman, or something similar, because it provides the mathematical fundamentals for Data Science. For people who want a more general overview of how businesses can take advantage of Data Science, I like the book by Provost and Fawcett, “Data Science for Business”, which is very well done.
Courses: what courses/programs have you taken that have significantly contributed to advancing your career in data science?
By the time specialized online courses and programs in Data Science began to emerge, I had already developed quite a bit of on-the-job experience. Therefore, I have not leveraged these resources as much. Of course, I am grateful for the Machine Learning training I received as part of my doctoral studies and for my Computer Science foundation from previous studies.
I do recall, however, having the opportunity to attend an excellent onsite seminar on Statistical Learning presented by Stanford professors Hastie and Tibshirani, as well as their book, which I mentioned earlier. I also enjoyed viewing Andrew Ng’s class on Machine Learning.
What advice would you give to someone who wants to get into data science today?
If you are already working in a related field, find an opportunity to work on an actual project that involves using Data Science methods, especially if you can work in a support role with an experienced Data Scientist. If you are still in your academic journey, do not skimp on the educational content that will provide you with a solid theoretical foundation. Give value to the domain knowledge and business interpretation of your findings. Do not fall too much in love with your most complex algorithms; often, the simplest solution works the best. Practice Occam’s Razor.
What inspires you about working in Data Science?
The opportunity to drive decisions based on data insights and its unlimited applicability to improve business processes. Also, I think that the convergence of massive data assets (especially in images and text) and compelling learning methods such as Deep Learning has recently upped the game in terms of what is possible. The progress made in some areas, such as automated machine translation and image recognition, is incredible.
In many other areas, the Art and Science of crafting models following well-established methods is still the way to go, in my opinion. Nevertheless, regardless of the “how,” I see genuinely widespread adoption across many industries, and I think this time it is here to stay.
Tag one or two people in your industry who you would like to see answer these questions.
- Long Sun – Machine Learning Leader @ Uber
- Greg Makowski – Head of Data Science Systems @ FogHorn Systems, Inc
Share your experience of applying data science to solve problems for your customers, business or end users?
I have worked in this field for well over 20 years. Thus it would take a while to summarize my journey and the opportunities I had to improve business operations and innovate for my clients. I started in retail, and I had the fortune to have Walmart as a client and use Data Science to develop an innovative way to assort stores across the country using sales data. Then I shifted to fraud detection and prevention applications, initially helping online merchants fight credit card fraud – it was very challenging work.
From there, I started my own consulting business with a partner, and we developed deep expertise in tax compliance, building models for tax agencies for applications such as audit selection, debt collection, and fraud detection.
After running that business for over 12 years, I moved back into the corporate world by joining Uber’s Payment and Risk organization. More recently, I came to Google, where I am part of the Google Commerce organization.
A data-driven culture means looking at the many decision points that drive the organization’s operation and striving to ensure that every decision, big and small, is driven by data insights.
– Daniele Micci-Barreca
What challenges do enterprises face in getting models to impact the business?
For many organizations, the challenge has been to institute a data-driven culture across the company. That is what makes the difference. Implementing a model or two to improve specific areas of operation is not what drives a radical change. This is, by the way, typically done by specialized consultants who design, build, and then leave.
A data-driven culture means looking at the many decision points that drive the organization’s operation and striving to ensure that every decision, big and small, is driven by data insights. Some organizations work that way; Amazon has been one of those setting the standard for decades, as have many tech-centered companies. But many others, especially the more traditional ones, are still struggling to find the right people and tools and to make this a core function within the company.