Interview with Kaushik Raha, Vice President, Data Science Health Content Operations, Elsevier

Kaushik Raha has been a leader of Data Science teams at various health care organizations. He has a clear vision when it comes to building high-performing data science teams and leveraging AI to accelerate the development of healthcare around the world.

We thank Kaushik Raha from Elsevier for taking part in this interview and answering the questions in great detail. He shared several insights from his years of experience in Data Science & Healthcare including:

  • Skills he seeks while hiring data scientists
  • Essential aspects of data science solutions for healthcare
  • His roles & responsibilities at Elsevier
  • Learnings from his data science leadership career

I often say if you want to build a boat, stay close to the water. The ability to continuously develop code that can be delivered and tested in a certain environment, iteratively and reliably is, therefore, key to success.

Kaushik Raha

AI also has a role to play in accelerating digitization as increasingly machine learning, deep learning, and computer vision algorithms are being deployed for digital transformation in healthcare.

Kaushik Raha

Building a high-performing team is similar to an optimal human resource utilization exercise and needs a leader with a high level of emotional intelligence.

Kaushik Raha

Let’s dive right in. Grab your coffee and enjoy the conversation!

What convinced him that Data Science is ideal for his career

CK: At what point did you realise that you wanted to pursue a career in data science (data & AI), and how did you get into it?

KR: I would say it was a specific moment in the year 2013 when I realized that I wanted to pursue a career in data science. At that time, I was employed at the pharmaceutical company GlaxoSmithKline (GSK) as a Computational Scientist and I was working with R&D teams engaged in drug discovery. I was already working with big datasets and using machine learning and advanced algorithms to solve problems. Around this time a data science opportunity at Johnson & Johnson (J&J) presented itself, which piqued my interest in the field. But I turned down the job offer as I was not sure it was the right career move for me at the time. However, what followed was intense interest from my side in all things data science and I realized that my training and experience up to that point had uniquely prepared me to pursue a career in data science.

You see, I had studied Biophysics in my undergrad and was working with data and algorithms ever since. I followed that up with a PhD in Computational Chemistry and Chemical Biology from Penn State University. After my PhD, I did a postdoc at UCSF in San Francisco, and then joined GSK where I was working with R&D teams and applying computational algorithms for drug discovery. By this time, I had years of experience in highly quantitative as well as inter-disciplinary fields. Ultimately, data science is also a highly quantitative and inter-disciplinary field. So, finally, I was convinced that it was the right career choice for me. Fortunately, another data science opportunity came up at J&J and this time I accepted the offer. In 2014 I started in a new role as Principal Data Scientist at J&J.


Key skills and aspects he looks for when hiring

CK: What skills and attitude do you look for when hiring data scientists?

KR: I will answer the attitude part first as I think it is very important. When hiring data scientists, I look for people who have the ability and passion for collaboration. They need to have the right attitude to be able to work in a highly matrixed environment, work with people from different backgrounds such as product design, content, engineering, analytics, UI/UX, etc., and be comfortable working in such large and diverse teams. I firmly believe that big problems are solved by a team effort and being a team player is a key attribute for a data scientist. Of course, what goes along with it is effective communication. So, superior communication skills and an amiable personality are also important.

Lastly, I think empathy goes a long way towards a successful team dynamic. The ability to listen and the ability to empathize are key attitudes that I look for. In my opinion, these attributes are not a substitute for technical skills and expertise; they complement them and are necessary for success.

With respect to skills, of course, I look for key data science skills like the ability to code in different languages and frameworks, e.g., Python, C/C++, Scala, Java, Spark, R, etc. However, Python is by far the language of choice for us, so we administer a Python coding test for hiring in my team. We assess the candidate’s ability to write code, not for some canned computer science problem, but to solve real-world problems that we face ourselves. Level of comfort with handling big data is also something that I look for, and I prefer real-world experience working with large datasets when hiring data scientists. That is one of the reasons I look for people with a quantitative background and educational training, with an emphasis on advanced degrees such as a PhD. So far the majority of data scientists in my team hold a PhD. They hold PhDs in Bioinformatics, Neuroscience, Applied Math, Computer Science, Linguistics, & Physics. I find PhD training to be valuable as you spend time in a PhD solving a really hard problem working alongside a team (most of the time), which prepares you well for a career in data science.


His key responsibilities and leadership roles

CK: You are currently leading Smart Content at Elsevier. What are your key responsibilities in this role?

KR: Well, currently I am the Vice President of Data Science, Health Content Operations at Elsevier. However, I started at Elsevier as Director of Smart Content. In my current role, I own the development and deployment of data science and AI-first solutions to lead Elsevier’s content transformation efforts and enhance products and platform capabilities that cater to healthcare, health education, life sciences, and pharma. I head a global team of around 60 data scientists and domain experts in Health & Life Sciences, who are creating knowledge from content, at scale.

My current focus is on Elsevier’s products in Health markets that span clinical reference, search and discovery, education, and advanced clinical decision support. My team routinely applies AI (deep learning/machine learning, natural language processing (NLP), computer vision) and other data science capabilities to build solutions and support Elsevier’s products in precision medicine, computer-aided diagnosis, and point-of-care. My team also supports Elsevier’s clinical and nursing education business and we apply cutting-edge data science and AI to advance our nursing education product portfolio.

Additionally, I lead cognitive automation activities from an operational efficiency standpoint, and I am a member of the Data Science Leadership Group, which sets the data science strategy to support our core businesses and develops career frameworks for data science and analytics across Elsevier.


Areas of impact of Data Science in Healthcare

CK: How do you see the role of data science in Healthcare? Which areas or use cases are the most promising?

KR: I see data science playing a very prominent role in Healthcare. It has already established itself and made a significant impact on Healthcare, but I believe the best is yet to come. Healthcare is inherently a data-rich domain. (Big) data is everywhere in healthcare, whether it is patient electronic health records, medical imaging, insurance claims, data from wearables, or biomarkers and genomic and other ‘omics’ data that will be routinely collected at the level of individual patients. As such, healthcare data is projected to grow faster than data in any other sector and is likely to have a compound annual growth rate (CAGR) of greater than 30 per cent through 2025.
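To put the quoted growth rate in perspective, here is a minimal sketch of the compound-growth arithmetic behind a CAGR figure. The 30 per cent rate is the lower bound quoted above; the normalised starting volume of 1.0, the five-year horizon, and the function names are illustrative assumptions, not figures or methods from the interview.

```python
def compound_growth(start: float, rate: float, years: int) -> float:
    """Value after `years` of growth at a constant annual `rate` (0.30 means 30%)."""
    return start * (1.0 + rate) ** years


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate implied by growing from `start` to `end`."""
    return (end / start) ** (1.0 / years) - 1.0


if __name__ == "__main__":
    # Hypothetical inputs: starting volume normalised to 1.0, five-year horizon.
    for year in range(6):
        print(year, round(compound_growth(1.0, 0.30, year), 2))
```

At a constant 30 per cent a year, the volume grows to roughly 3.7 times its starting value after five years.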

Alongside this, we are also starting to see a convergence of data science and Artificial Intelligence (AI) technologies which have shown a lot of success in solving hard, previously intractable problems in the healthcare and life sciences, over the past couple of years. It is no surprise that big technology companies that have been pioneers in data science and AI are focused on and have made big bets in healthcare. This trend will accelerate in the coming years and we will witness data science and AI becoming a key pillar of healthcare, spanning the entire value chain.

As for the use cases, broadly speaking, it will be in areas of improving patient outcomes and reducing the cost of care. AI will also accelerate drug discovery and clinical development cycles, which will have a big impact on healthcare. There will be a multitude of use cases within these areas which will be driven by data science and AI. Most promising from my standpoint are use cases where AI is the driver of evidence-based and personalized medicine, helps reduce diagnostic error and ensures uniformity of care, and addresses problems like physician burnout and optimal use of healthcare resources.


Importance of Operationalization and Scalability in Data Science

CK: What lessons have you learned about developing scalable data science and analytics solutions?

KR: One of the key lessons that I have learned about developing scalable data science and analytics solutions is that operationalization of data science needs to be addressed from the very start of the project for the solution to be scalable. Sometimes data scientists fall into the trap of developing solutions without giving any consideration to operationalization. This can take the form of solving the problem at a much smaller scale, using unrepresentative training data sets, or choosing the wrong stack and building machine learning models that have high latency and don’t scale well.

Data scientists need to understand the continuous integration/continuous deployment (CI/CD) process very well and do the algorithmic development work within that framework. I often say if you want to build a boat, stay close to the water. The ability to continuously develop code that can be delivered and tested in a certain environment, iteratively and reliably is, therefore, key to success. This is easier said than done and requires a strong collaboration between the data science and engineering teams from the beginning. They need to be on the same page on how a specific solution will be operationalized and agree on the CI/CD pipeline upfront. I have learned that a lot of planning needs to go into this and communication between teams is essential or the project can quickly go off the rails.

Often, clarifying roles and responsibilities in the project team is also essential for success. What I mean by this is that data scientists should be concerned with building the best model from the available data that fits the product requirements, and engineers should be concerned with deploying these models and ensuring scalability. Due to the commoditization of machine learning, sometimes this is not appreciated and the experience and expertise needed to build a good model are underestimated.


AI for healthcare development and digitization

CK: How can AI be used to improve healthcare in growing and backward countries?

KR: AI is increasingly playing a pivotal role in improving healthcare in growing and backward countries. Just like wireless technology brought some level of parity to the communications sector between developed and developing/underdeveloped countries, AI seems poised to play a similar role in healthcare. The impact of AI will be felt both in improving patient outcomes and decreasing the cost of care in these countries. One of the big challenges that developing and underdeveloped countries face today is a shortage of trained medical professionals such as doctors and nurses, and their concentration in and around big cities and economic centres.

AI algorithms are getting more and more sophisticated in assisting diagnoses and reducing the diagnostic error rate, thereby impacting the cost of care directly. In the context of backward countries, this technology could also address the physician shortage and burnout problem. Coupled with digital reference content and telemedicine, AI can help transform healthcare in such countries. This is certainly not without challenges.

One of the precursors of AI is digitization and backward countries are still lagging in digitization. Indeed in such countries, medical records are still on paper, physicians’ notes are often handwritten, and imaging is still on film. However, I believe, AI also has a role to play in accelerating digitization as increasingly machine learning, deep learning, and computer vision algorithms are being deployed for digital transformation in healthcare. Developing countries will specifically benefit from this phenomenon. In my team, we have done a lot of work in this domain. At this point, I want to give a shoutout to my employer, Elsevier, for having a large footprint in this area. Specifically, Elsevier has deep connections with the Indian healthcare sector and we’re, in partnership with Indian organizations and government, working on myriad problems that the country faces in healthcare, and we’re leveraging AI to solve these problems.


His learnings from leading data science teams

CK: Over the past years, you have led teams of data scientists and domain experts. What are the most important lessons that you have learned about building and leading high-performing data science teams?

KR: One of the most important lessons that I have learned is that high-performing teams are mission-driven and have an inherent need to connect the problem that they are working on to a larger mission, or a higher purpose. Hence, to be an effective leader of high-performing teams, the mission and vision need to be articulated very clearly and reinforced often. This also goes a long way in building high-performing teams, as these high-performing individuals are looking for more than just a job or a paycheck. I have been fortunate to work in the healthcare and pharmaceutical sector, for companies like Elsevier, which helps researchers and healthcare professionals advance science and improve health outcomes for the benefit of society. So it is relatively easy for me to articulate a mission and vision that resonates with talented data scientists and domain experts who join my team.

The other important lesson is that high performing individuals and teams are looking to solve hard problems and need to be challenged accordingly. I have seen some amount of impatience in high performing individuals and they need to be constantly challenged with hard problems, or they move on. It is the job of the leader to bring such problems to the data science teams while ensuring alignment with the business and long-term enterprise priorities. Sometimes it is not easy to achieve.

Finally, to build a high-performing team, a data science leader needs to have a good understanding of what the individual team members are most passionate about: what makes them tick, and which tasks and problems they would go above and beyond for. Building a high-performing team is similar to an optimal human resource utilization exercise and needs a leader with a high level of emotional intelligence.

Why Data Scientists should think like an artist

CK: What unusual or absurd thing do you practice or advocate for in your role as a data science leader?

KR: The unusual or absurd thing that I advocate is for data scientists to think like an artist. While scientists by nature are highly analytical and very good with numbers and mathematical or statistical concepts, oftentimes they are unable to communicate their insights to lay people who don’t understand technical jargon. That’s why I advise data scientists to channel their inner artist to communicate better. I ask them to tell a story about the analysis they did, or make a beautiful interactive visualization, or build a compelling app to show the practical value of their models. And to do this they need to get out of their scientific comfort zone and use their artistic sensibilities. I find that this also leads to better product design and consequently buy-in from stakeholders.

Future trends for AI

CK: What have been the most relevant breakthroughs in data science impacting our world in the last 1-2 years, and what trends do you see emerging going forward?

KR: From my vantage point the most relevant breakthroughs have come in deep learning, computer vision, and language modelling (NLP). These breakthroughs are having a profound impact on health sciences. Biology and medicine will emerge as fields where the application of data science and AI will have far-reaching consequences. The most obvious trend that will continue in language modelling is the “transformer arms race”, which has led to the development of state-of-the-art language models such as BERT and most recently GPT-3. These breakthroughs are pushing the boundaries of language understanding and comprehension by machines, which I consider to be the final frontier that will lead us to artificial general intelligence (in the distant future).

Similarly, deep learning’s impact on solving unsolved biological and chemical problems has been stunning. A case in point is DeepMind’s AlphaFold, which has shown tremendous success at solving the protein folding problem. Having worked on this problem during my PhD, I believe this is a true breakthrough which will eventually accelerate the discovery of pathbreaking first-in-class therapeutics. Other deep learning techniques, for example generative adversarial networks (GANs), are also making inroads in solving myriad problems in biomedicine. Data science and AI are also impacting clinical medicine and will underpin future developments in decision support and delivery of care. AI will play a major role upstream in the digital transformation of healthcare, as well as in the downstream synthesis of evidence from clinical trials, real-world evidence, biomedical literature, and molecular and ‘omics’ data, to improve outcomes.

About Chayan Kathuria

Associate Editor

Chayan is a creative Data Scientist with an eye for details. An everyday learner and blogger, he has extreme eagerness to share knowledge and support the Data Science community. Connect with him on LinkedIn to get in touch and don’t forget to check out his Medium blogs.

Data Science | Machine Learning | Tech Blogger – upGrad