We thank Miguel González-Fierro, Senior Data Scientist Lead from Microsoft, for taking part in this discussion as part of the Data Science Interview Series and sharing several valuable insights, including:
- Key skills used as a Data Scientist
- Exciting breakthroughs in Data Science
- Challenges and inspirations as a Data Scientist
- The path ahead with low code and AutoML
I’m a firm believer that every one of us is born with some particular tendencies and preferences. This is the inner voice that leads us to what we are called to become, our vocation.
The beginning: Getting into Data Science
How did you first get into data science?
It was at the end of engineering school. I studied Industrial Engineering in Madrid, with a specialization in Electrical Engineering. Very soon I realized that I didn’t want to spend my career in electrical engineering. At the school where I studied, they taught us a wide range of subjects like physics, mathematics, mechanics, electricity, electronics, finance, programming and others; however, the subject that I liked the most was automation. At the end of the 5-year degree program, we all had to do a project, which on average took people 6 months. The area I chose was robotics and machine learning, even though I had very little knowledge of the subject.
My project was about a pair of small humanoid robots transporting a ladder. It was particularly difficult because the robots had to synchronize their movements and maintain equilibrium. I liked the project so much that I spent maybe 10 to 12 hours a day working on it. In a month, I had finished the project and graduated. Soon after, I was hired by that same lab to work as a researcher in robotics and machine learning, where I did my PhD.
Reflecting on my early years, it wasn’t until I turned 23 that I found my path in life. I knew that I liked math and physics more than literature or history, but there was nothing that I was particularly excited about. The moment I found the area that really excited me, it made all the difference. I’m a firm believer that every one of us is born with some particular tendencies and preferences. This is the inner voice that leads us to what we are called to become, our vocation.
If we follow that voice, we become our authentic selves and can develop our capabilities to their maximum. If we refuse to hear that inner voice, distract our attention, or get bribed by jobs we don’t like in exchange for easy money, we defraud ourselves and become a shadow of what we are called to be. That’s why it is so important to find one’s calling. For me, it was robotics, AI and machine learning.
Key Skills and Challenges as a Data Scientist
One of the top challenges is effective communication. When you are in a company that works with multiple teams and customers, it is key to be able to provide clarity and simplicity.
What are the key skills that you use every day as a data scientist, and how did you develop them?
I would divide the skills into two groups: technical skills and interpersonal skills.
Technical skills include programming, machine learning, operationalization, database management, distributed computing, web development and applied research, among others. The way I develop them is by using a T-shaped approach: I try to have a wide understanding of all these skills and go very deep in the areas that interest me the most.
There are multiple ways I gain knowledge in the areas that interest me. For example, I read at least one research paper every two weeks. While working on my day-to-day tasks, I make sure I dedicate some extra time to understand very deeply the system I’m using; I don’t stop when something just works. Another way for me to learn is by blogging about an area I want to understand more deeply.
Teaching others is one of the best ways to teach yourself, so I use my blog to develop articles and projects. This is a trick I play on my own mind: creating these posts forces me to study the subject much more deeply. In addition, a very quick way to learn is by surrounding myself with people who know more than me. Of all the ways I use to develop my expertise, learning from experts is by far the best and fastest, and it has paid off throughout my life.
Interpersonal skills include influence, selling skills, marketing, public speaking, mentorship, effective communication and simplification. In recent times, I’ve been focusing more on these areas. The way I develop these skills is by acquiring knowledge from books, podcasts, YouTube videos and courses, and by a process I call iterative validation.
Iterative validation consists of performing an action (for example, proposing an idea to my team or explaining something to a customer from a particular angle), gathering both quantitative and qualitative feedback on it, and then doing a retrospective on the action, so I can validate which messages, actions or ideas are most effective.
A meta-action I follow is what I call life optimization: I optimize absolutely every action I take. I think time is the most valuable asset we have, so I really hate wasting it. I’m constantly thinking about how I can speed up tasks. I’m a fan of multitasking, for example, and I choose very carefully which tasks I can do in parallel and at what time of the day I can perform them. Another trick I use for my work is my codebase, which contains small pieces of code I can reuse to speed up my daily work.
Another strategy that helps me create presentations faster is keeping PowerPoint decks of what I call two things, three things, four things, and so on. For example, the three things deck has maybe 30 slides, each with a layout for triplets, so if in a presentation I want to show an idea that has three components, I can quickly find a design I can use.
What are the top challenges you currently face as a professional data scientist, and how do you go about tackling them?
One of the top challenges is effective communication. When you are in a company that works with multiple teams and customers, it is key to be able to provide clarity and simplicity. The way I face this challenge is by iterating: when an idea is being discussed, I try to think about ways of simplifying it or crafting a message that multiple parties can understand, no matter their level of expertise.
Another important challenge, in my opinion, is technology adoption. Even though data science and AI have been around for a while, there are still many companies that don’t understand how transformative this technology is. I believe that jobs are not going to be replaced by artificial intelligence; rather, people using AI will replace those not using it.
What is the biggest improvement that you introduced in the last 12 months that has considerably improved your workflow?
It was maybe a little earlier than 12 months ago, but the most impactful methodology was something we called evidence-based software design. It is a way of making decisions based on evidence, as opposed to personal opinions or preferences, that reduces team discussions, increases execution performance and improves product quality.
It has five parts: ask an answerable question, find the best evidence, critically appraise the evidence, apply the evidence, and evaluate performance.
How do you keep current with the new developments? Top 3 resources?
I use Mendeley to collect and read papers and documents, LinkedIn and a few other sources for news about AI, and Trello to organize my day-to-day.
The future of Data Science
What have been the most relevant breakthroughs in data science in the last 1-2 years, and what trends do you see emerging going forward?
To me, it is the enablement of self-supervised learning methodologies and, as the most prominent architecture, the denoising autoencoder based on Transformers, BERT (and the whole family of algorithms that came from it). I see self-supervised learning as the main path to improving the AI field in the coming years. This methodology can open the path for real-time, closed-loop systems to start developing.
Most current AI systems are open-loop and non-real-time, in the sense that a machine learning model is trained in one phase and then productized to score in a different phase. Ideally, we want systems that can operate in real time for both learning and scoring. Self-supervised learning simply takes better advantage of the available data, so it can enable these closed-loop systems.
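The key idea behind BERT-style self-supervised learning is that the training labels come from the data itself: tokens are hidden and the model must recover them. A minimal sketch of how such input/target pairs can be built (illustrative only; the function name and masking rate are my own choices, and real systems also handle subword tokenization, special tokens, and BERT's 80/10/10 replacement rule):

```python
import random

def make_masked_example(tokens, mask_token="[MASK]", mask_prob=0.15, seed=0):
    """Build a (masked_input, targets) pair for masked-token prediction.

    Targets hold the original token at masked positions and None elsewhere,
    meaning "no prediction needed at this position".
    """
    rng = random.Random(seed)
    masked, targets = [], []
    for tok in tokens:
        if rng.random() < mask_prob:
            masked.append(mask_token)  # hide the token from the model...
            targets.append(tok)        # ...and ask it to recover the original
        else:
            masked.append(tok)
            targets.append(None)
    return masked, targets

sentence = "self supervised learning creates labels from raw text".split()
x, y = make_masked_example(sentence, mask_prob=0.3, seed=42)
```

No human annotation is needed: every sentence of raw text yields supervised training pairs for free, which is why the method scales to enormous corpora.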
Do you think that going forward, Data Science implementation would become more low code or no code?
I think so, because the trend is towards simplicity. When I started working in the field, a lot of data science was done in C++. One of the most popular machine learning libraries was OpenCV for computer vision, whose initial API was written in C and later in C++. I lived through the transition from C++ to Python, which dramatically facilitated adoption and the development of other libraries.
I see a step forward towards simplicity thanks to low-code or no-code tools. They will provide another big jump in adoption.
With Automated ML and PaaS services penetrating the industry, do you think this will replace bespoke solutions?
I actually think that the future will bring solutions customized to the most atomic level possible. However, the way to reach that goal will be through intelligent automation rather than consultancy services.
What inspires you about working in Data Science?
Data science, machine learning, AI and robotics are some of the top technologies of our lifetime. To me, it is exciting to explore and apply them.
3 words that best summarize how you learned ML and data science:
I don’t understand