- His breakdown of the top skills required by data scientists and how to tackle the daily challenges they face.
- His actionable insights for anyone in data science, from beginners to seniors.
- His attitude towards failure and lifelong learning.
How did you first get into data science?
I have always worked with data, since high school, a long time ago in a galaxy far, far away. Specifically, my background is astrophysics, with a Ph.D. in the subject.
I performed astronomical data analysis, modeling, and simulation for 25+ years, while also working on data repositories for space science satellite missions at NASA. I became very interested in the scientific discovery opportunities of very large datasets in the late 1990s, at which time I began my quest into machine learning, data mining, and data science.
The motivation for me has always been discovery, from my early days until now. Observational data, modeled scientifically, can deliver insights, discoveries, and understandings of all things, not just the Universe, but the universe of everything.
How is data science used to create value in your current project(s)?
Data is a source of discovery: insights, understanding how things work, important patterns, meaningful relationships, innovation, new value, predictive models of things to come, and new opportunities.
Data science can reveal new properties of known things, identify previously unknown things with known properties, and discover new things with previously unknown properties. The things that we explore are all-inclusive: people, processes, products, events, and behaviors — in all domains and industries.
What are the key skills that you use every day as a data scientist, and how did you develop them?
The “everyday” skills for me are more about the talents of a data scientist: domain knowledge (data understanding), data collection (sensors, measurement, access, databases, data systems), communication (data storytelling, data visualization), curiosity (question generation), exploratory data analysis (data literacy, coding), inference (machine learning, statistics), inquiry (model-building, simulation), and scientific methodology (experimentation, assessment, validation, refinement).
What are the top challenges you currently face as a professional data scientist, and how do you go about tackling them?
The top challenges include the first-mile challenge (finding, accessing, cleaning, and integrating diverse distributed complex data sources) and the last-mile challenge (deriving actionable insights from all of those data).
The best approach to tackling these is to collaborate with others and work as a team, with diverse skills and talents being applied to the challenge.
Another big challenge is the enormous speed at which new technologies, tools, and techniques are being developed. It is very hard to keep up with all of the new developments and startups.
My approach to that is to stay very active on social media (Twitter, LinkedIn), to subscribe to numerous data science newsletters, and to keep learning every day.
How important is the domain knowledge of the business/industry you’re in as a data scientist, and how did you acquire it?
Domain knowledge is absolutely essential to me, as a more senior data scientist, since otherwise my contributions would be too “plain” and “generic” for my clients.
Junior data scientists, however, can contribute substantially to the technical, mathematical, and coding aspects of projects without as much domain knowledge. They will pick up that knowledge during the course of the project, and they will eventually need it more deeply as they progress further in their careers.
So, in general, domain knowledge is essential for data scientists to bring long-term success and value to their organizations and clients.
I acquired domain knowledge in many industries and applications through nearly 20 years of reading, consulting, public speaking, and engagement with many different people and organizations in a large variety of applications.
It’s all about curiosity and lifelong learning — never give up!
Do you create data science content?
I create a lot of data science content. I write articles and publish them in many different places. I am a co-author on a new book about AI that will be published in the coming year.
I was co-creator of the world’s first undergraduate data science degree program at a university. In that capacity, I generated curriculum and content for several different courses (introductory data science, scientific programming, databases, data mining, data ethics, modeling and simulation). I also taught a graduate course for 12 years that was an extensive and comprehensive survey of all of those topics in one densely packed course!
I am creating a new course now on business analytics for MBA students that I will teach in the coming year. I produce and share content every day on social media. I give webinars, podcasts, conference talks, university lectures, and keynote presentations at events worldwide.
I have posted some videos on YouTube, but not many, though I have some big plans for that in the future. If I were not creating data science content, then I would stop learning, and that is not an option for me.
3 words that best summarize how you learned ML and data science:
Reading, Lecturing, Social Media Engagement
People: who are some inspiring data scientists and people in AI that you follow?
Andrew Ng, Bill Schmarzo, Lillian Pierson, Cassie Kozyrkov, Jared Lander, Jason Brownlee, Adrian Rosebrock, Mico Yuk, Hadley Wickham
Books: which books have helped you the most in your journey and why?
That list of books would be extremely long for me, because I have been helped by many books, but one in particular stands out from my early days: “Data Mining Techniques for Marketing, Sales and Customer Relationship Management”, by Linoff and Berry. I found that this book gives an educational and informative explanation of many of the most common machine learning algorithms, within the context of practical and common business applications.
Courses: what courses/programs have you taken that have significantly contributed to advancing your career in data science?
My education was long ago, but nearly all of my physics, mathematics, and scientific programming courses have been essential to my career success, including my current data science activities, since those skills, techniques, and scientific aptitudes are relevant every day.
My astronomy courses, especially in graduate school, were essential to my falling in love with scientific discovery, understanding data, developing modeling skills, and pursuing science as a career for the past 4 decades. And I did take an online Python course a few years ago that was very helpful.
Conferences: which data-science-related conferences that you attended have you particularly enjoyed and why?
I attend quite a few conferences each year. I don’t want to say that one is better than another, because they all have different ways of focusing on things and bringing value.
However, the ones that I attend almost every year (usually twice per year) are the ODSC (Open Data Science Conferences) events. Others that I have attended more than once are the Data Natives conferences.
I do love specialty conferences that focus on specific industries, because I can then learn a lot about each industry. These include marketing, smart energy grids, health analytics, geospatial intelligence, finance, cybersecurity, high-performance computing, and more.
I was a speaker at the conferences that I named (ODSC and Data Natives).
What are the top 3 resources that you use to keep up with the advancements in the field?
Twitter and LinkedIn are my top resources, but I also subscribe to many newsletters, too many to name. I find that a daily search on Twitter, LinkedIn, or a standard search engine yields an enormous amount of data science, AI, machine learning, and data analytics content, plus fun stuff to learn every day.
What is the biggest improvement that you introduced in the last 12 months that has considerably improved your workflow?
I reminded myself that I don’t need to learn everything that is new (especially now, when there is a deluge of new stuff all the time). I can rely on others to fill the gaps in my knowledge when I need it.
What advice would you give to someone who wants to get into data science today?
(1) remember that failure is one of life’s most important learning experiences;
(2) during your most boring and mundane work experiences, keep moving forward and doing your best, because every one of those experiences is building your foundation to a successful career;
(3) never stop learning; and
(4) the best way to learn anything is to teach it to others.
Your favorite thing about working in data science:
Learning new mathematical algorithms is one of my two favorite things. The second is applying those algorithms to discover new things from data.
If you weren’t working in data science, you would be:
continuing my lifelong career in astronomy and astrophysics, studying the dynamics and evolution of galaxies in the Universe.
What inspires you about working in Data Science?
I am inspired by the immense opportunity to discover new things, new insights, and new understandings in multiple application domains.
Tag one or two people in your industry who you would like to see answering these questions.