The inner operations of advanced machine learning models are nebulous to the average business user, regulator, or customer impacted by the outputs of this form of statistical Artificial Intelligence.
At best, such laymen are vaguely aware that neural networks, for example, function in a manner that’s somewhat similar to how the human brain does. The most sophisticated may have heard something about the notion of parameters; most are blissfully unaware of the presence of hyper-parameters or their import to applications of deep learning.
“Basically, in [these] machine learning models, there are two sets of parameters,” explained Suman Bera, Senior Software Engineer at Katana Graph. “One set of parameters you are trying to learn through your machine learning algorithm. And, there is another set of parameters which are predefined. You are not trying to learn them. These are called hyper-parameters.”
Hyper-parameters are invaluable to devising accurate predictions from advanced machine learning models, which are oftentimes implemented with the compute and scale of deep learning.
Those predictions, of course, can solve almost a limitless variety of business problems, from fortifying cyber security defenses to determining which customers are most advantageous to issue loans.
The crux of the utility of the role hyper-parameters play in advanced machine learning is successfully tuning them. Since they’re predefined, it’s up to data scientists to select the proper numerical value for hyper-parameters to help furnish accurate model predictions. “Typically we have to do some sort of hyper-parameter tuning to figure out the values of these hyper-parameters that have not been learned,” Bera noted.
The specific hyper-parameters users need to tune are determined in part by the business problem they’re building models to solve, which might involve spurring the creation of pharmaceuticals to reduce time to market for delivering consumers new drugs. “There is more than one hyper-parameter,” Bera revealed. “Threshold is just one of them, but in general machine learning models there are many other hyper-parameters that are activated.” Ensuring hyper-parameters are optimally tuned to accomplish a business objective, such as detecting the presence of intruders via computer vision, is critical to building accurate advanced machine learning models.
A Threshold-Based Approach
According to Bera, one of the ways to properly calibrate hyper-parameters is via a similarity graph framework designed to detect the similitude between nodes at immense scale. Focusing on the threshold hyper-parameter, for instance, would deliver pivotal information for pairing nodes when seeking to construct an advanced machine learning model to predict molecular properties, which expedites the development of pharmaceuticals. In this use case, the threshold of the respective molecules is the basis for pairing the nodes. Therefore, when using a training dataset involving 20,000 molecules, “I will keep maybe 2,000 [molecules] separate from that training process,” Bera said. Next, users would try a variety of different thresholds (each with separate values) at which to tune this hyper-parameter.
“Based on these… different thresholds I will see which model has the higher accuracy on these 2,000 [molecules] that I did not use during my training,” Bera commented. “And, based on whichever graph and whichever model gives me the best accuracy, I will select that particular threshold. So, this is the process of hyper-parameter tuning.” For the pharmaceutical use case, establishing the proper value of the threshold is necessary to link nodes that either meet or surpass that value. In theory, however, users can tune any hyper-parameter via this same approach to craft advanced machine learning models with the accuracy required to address their respective business issues.
Deep Learning Implications
Hyper-parameter tuning is fundamental to manually developing neural networks that are deployed with the computational power of deep learning. However, deep learning involves more than just neural networks. There are a variety of different models and algorithms—including decision trees and combinations of models via ensemble modeling techniques—to which deep learning is applicable. These non-neural network techniques also involve hyper-parameters. Tuning them can play a vital difference in the accuracy levels of their ensuing models.
Bera referenced a data science competition (the molecular property prediction competition of Therapeutic Data Commons) in which an assortment of techniques was used to create models with the best predictive prowess. “There is a model called XGBoost; it’s also a popular model,” Bera admitted. “So, there is a recent entry where the approach, instead of using graph, you just look at the properties of the drug and you try to predict [new properties]. It seems like this approach, with the right set of hyper-parameters, would work as well.”
Featured Image: NeedPix