Restoring Faith in Data Science with Unsupervised Learning

Despite several advances in data science designed to expedite this discipline and make it more self-service oriented, data scientists are not the most popular employees at many organizations.

“We were talking to a senior director level person at a Dow 30 sized company and I think he summarized it best when he said data scientists say the most and contribute the least,” Kyndi CEO Ryan Welsh recollected. “I think that’s a good way to summarize the sentiment in enterprise AI today.”

Specifically, that sentiment expresses the reality that many large-scale Artificial Intelligence projects (those for core business use cases) traditionally take so long to implement that there’s a mounting sense of disillusionment about the data scientists responsible for them, which naturally extends to AI itself.

“People have been promised AI solutions to increase productivity and these things get stuck in the lab by a bunch of people pontificating about how AI’s going to transform the world,” Welsh explained. “It’s kind of like great, but can you help me do this simple business thing first?”

By accessing AI approaches underpinned by unsupervised learning, business users can not only swiftly employ AI to do their jobs better via natural language cognitive search, but also restore faith in the data scientists who either built or are calibrating these solutions for their particular use case.

Moreover, this development is part of a larger AI movement in which “building prepackaged business solutions atop a platform specifically for [the business] is a major trend that will start, as well as accelerate, in 2022,” Welsh predicted.

Unsupervised Learning

The variety of unsupervised learning approaches (including different facets of clustering, topological data analysis, and Principal Component Analysis and other forms of dimensionality reduction) is primed to hasten the time to value of AI solutions, especially those involving natural language technologies and conversational search. Unsupervised learning doesn’t require annotated training data as supervised learning approaches do, which decreases the time and cost of implementing tools that rely on it. Some even posit that unsupervised learning is more authentic than supervised learning because the former learns exclusively from patterns in the data, whereas the latter is predicated on manmade labels. For the foregoing search applications, “because we have a unique way of doing unsupervised learning, we can put these systems into production very quickly,” Welsh disclosed.
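To make the "no annotated training data" point concrete, here is a minimal sketch of one of the techniques named above, k-means clustering, written in plain Python. It is an illustration only and not Kyndi's method: the `kmeans` function, the farthest-point initialization, and the six sample points (imagined as two numeric features per document) are all assumptions made for this example.

```python
from math import dist


def kmeans(points, k, iterations=20):
    """Group points into k clusters using only their coordinates (no labels)."""
    # Deterministic start (an assumption of this sketch): the first centroid is
    # the first point; each further centroid is the point farthest from those
    # already chosen.
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    assignment = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda i: dist(p, centroids[i])) for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for i in range(k):
            members = [p for p, a in zip(points, assignment) if a == i]
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster in place
        if new_centroids == centroids:
            break  # converged: nothing moved
        centroids = new_centroids
    return assignment, centroids


# Hypothetical 2-D features for six documents; no labels are supplied.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = kmeans(points, k=2)
print(labels)
```

The algorithm recovers the two groups purely from the geometry of the data, which is the property the paragraph above credits with cutting implementation time and cost: nobody had to annotate a single example first.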

Furthermore, supervised learning can be a bit of a guessing game when refining models to ensure they’re as accurate as possible, which is responsible for many of the delays with enterprise AI deployments. Conversely, supplementing this approach with unsupervised learning and symbolic reasoning produces the opposite result. “When you have this generic UI that people can customize for their business problem and the AI can be tuned and optimized very quickly, it can start working right away for a simple use case,” Welsh mentioned. “If it needs to be tuned a little bit for the domain specific language, that could take a single day to build that model, versus nine months.”

Supervised Learning

As previously noted, unsupervised learning techniques support cognitive search use cases very well when paired with supervised learning methods. In competitive natural language technology search applications, however, organizations don’t actually have to train the supervised learning models to accelerate time to value because the vendors already have. “Those supervised models are generic,” Welsh remarked. “You can train a supervised model like BERT to extract people, places, and other types of entities from data in a highly active way. Given the syntax of the English language, we’re able to extract names regardless of what domain it is because names are in the same spot in sentences, whether it’s in finance or whether it’s in pharmaceuticals.”

Unsupervised learning then maximizes the value of the entity extraction within a specific corpus, for example, by arranging entities on a knowledge graph to build a knowledge representation for symbolic reasoning, which is perhaps the original and most accurate means of performing Natural Language Processing. With this approach, organizations don’t have to repeat the annotated training data delays of supervised learning when constructing the concrete knowledge base for dynamic question answering in search. “A lot of people will say I built a human authored representation for which I then extract things and place them in that human authored representation,” Welsh mentioned. “The trick is to not have to require that human authored representation.”
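One simple way to picture a knowledge representation that is not human authored is a co-occurrence graph: entities become nodes, and two entities are linked whenever they appear in the same sentence. The sketch below assumes the entities have already been extracted by an upstream model; the function names, the co-occurrence rule, and the sample entities are hypothetical and are not drawn from Kyndi's product.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(sentences_entities):
    """Link entities that share a sentence; edge weight = number of shared sentences."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in sentences_entities:
        # De-duplicate within a sentence so each pair counts once per sentence.
        for a, b in combinations(sorted(set(entities)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def neighbors(graph, entity):
    """Return (entity, weight) pairs linked to `entity`, strongest links first."""
    return sorted(graph[entity].items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical output of an entity extractor, one list per sentence.
extracted = [
    ["Acme Corp", "Jane Doe", "New York"],
    ["Jane Doe", "Acme Corp"],
    ["New York", "Globex"],
]
graph = build_cooccurrence_graph(extracted)
related = neighbors(graph, "Acme Corp")
print(related)  # [('Jane Doe', 2), ('New York', 1)]
```

No schema or taxonomy was written in advance; the structure emerges from the corpus itself. A production system would need far richer relations than raw co-occurrence, but the sketch shows why skipping the hand-built representation removes a slow manual step.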

Keeping The Faith

By using unsupervised learning to construct the knowledge base for their cognitive search deployments, organizations don’t need humans to create that knowledge base. That knowledge, of course, is requisite for a hierarchical, or taxonomical, understanding of language for nuanced question answering in real time.

Therefore, firms adopting natural language search solutions informed by unsupervised learning can complete their deployments of this aspect of Artificial Intelligence in hours or days, instead of waiting months or years for data scientists to craft these capabilities by hand. The result is a greater appreciation for data scientists, data science, and the AI both engender.

Featured Image: NeedPix

Contributor

Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance, and analytics.

Opinions expressed by contributors are their own.
