A conditional random field (CRF) is intended for task-specific prediction: given an input feature vector X, it predicts a label y drawn from a predefined set.

CRF is a probabilistic discriminative model that has a wide range of applications in Natural Language Processing, Computer Vision and Bioinformatics.

The conditional random field is used for sequence prediction: it incorporates contextual information from the surrounding sequence, which the model uses to make correct predictions.

**Table of contents:**

- Introduction
- Generative versus Discriminative Models
- CRF for sequence models
- The mathematical background of CRFs
- Applications of CRFs

**1. Introduction:**

Let’s say we want to build an application where we predict an output vector y = {y0, y1, …, yn} of random variables given a feature vector X. A famous example in NLP is part-of-speech (POS) tagging, in which each variable yi is the POS tag of word i and the input X is divided into features {X1, X2, …, Xn}.

In this type of problem, the goal is not only to predict each output correctly: the sequence of predictions matters just as much, if not more. Conditional random fields come to the rescue here because they model word sequences.


**2. Generative versus Discriminative Models:**

Generative models describe how a label vector y can probabilistically generate the feature vector X. As a simple example, Naive Bayes, a very popular probabilistic classifier, is a generative algorithm.

On the other hand, discriminative models describe how to take a feature vector X and assign it an output vector y. In simple terms, a discriminative model models the decision boundary between the different classes directly. A common example of a discriminative model is logistic regression, which is fitted by maximum likelihood estimation.
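The generative side of this contrast can be made concrete with a minimal Naive Bayes classifier: it models p(X | y) and p(y) and classifies via Bayes' rule. The toy spam/ham dataset below is an illustrative assumption, not from the article.

```python
from collections import Counter, defaultdict

# Toy dataset (hypothetical): word-feature documents with labels.
docs = [("free prize money", "spam"),
        ("meeting schedule today", "ham"),
        ("free money offer", "spam"),
        ("project meeting notes", "ham")]

# Generative modelling: estimate p(y) and per-class word counts for p(X | y).
label_counts = Counter(y for _, y in docs)
word_counts = defaultdict(Counter)
vocab = set()
for text, y in docs:
    for w in text.split():
        word_counts[y][w] += 1
        vocab.add(w)

def naive_bayes_score(text, y):
    # p(y) * prod_w p(w | y), with add-one (Laplace) smoothing.
    total = sum(word_counts[y].values())
    p = label_counts[y] / len(docs)
    for w in text.split():
        p *= (word_counts[y][w] + 1) / (total + len(vocab))
    return p

def predict(text):
    # Bayes' rule: pick the label maximizing p(y) * p(X | y).
    return max(label_counts, key=lambda y: naive_bayes_score(text, y))

print(predict("free money"))       # -> spam
print(predict("project meeting"))  # -> ham
```

A discriminative model such as logistic regression would instead fit weights for p(y | X) directly, without ever modelling how the words themselves are generated.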

**3. CRF for sequence models:**

The power of CRF models shows when we predict many variables that are interdependent. To understand this, consider the named entity recognition (NER) problem from NLP. NER is the problem of identifying entities in text and classifying each entity as a person, location, organization, and so on.

The main challenge in NER is that many entities are too rare to appear in the training set, so the model must identify them based only on context. The naive approach is to classify each word independently. The problem with this approach is that it assumes named entity labels are independent, which is not the case.

For example, Maharashtra is a location, while the Maharashtra Times is an organization.

To tackle this problem we use CRFs, where both the input and the output are sequences, and the previous context is taken into account when predicting at each position. For this purpose, we use feature functions that take multiple input values.

The feature function is defined as follows (the standard linear-chain form): each feature takes the previous label, the current label, the full input sequence and the current position,

f(y_{i-1}, y_i, X, i)

and returns a value (typically 0 or 1) indicating whether its pattern fires at that position.
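As a minimal sketch of what such feature functions look like in code, here are two indicator features for the NER example above. The feature names and the tiny LOC/ORG/O tag set are illustrative assumptions.

```python
# Linear-chain CRF feature functions: f(y_prev, y_curr, X, i) sees the
# previous label, the current label, the whole input sequence X, and
# the position i.

def f_capitalized_is_entity(y_prev, y_curr, X, i):
    # Fires when a capitalized word is tagged as some entity (not "O").
    return 1 if X[i][0].isupper() and y_curr != "O" else 0

def f_org_after_loc_word(y_prev, y_curr, X, i):
    # Fires when "Times" follows a LOC-tagged word and is tagged ORG,
    # capturing context like "Maharashtra Times".
    return 1 if i > 0 and X[i] == "Times" and y_prev == "LOC" and y_curr == "ORG" else 0

X = ["Maharashtra", "Times", "reported", "today"]
y = ["LOC", "ORG", "O", "O"]

# Evaluate each feature at every position (y_prev = "START" at i = 0).
for i in range(len(X)):
    y_prev = y[i - 1] if i > 0 else "START"
    print(X[i], f_capitalized_is_entity(y_prev, y[i], X, i),
          f_org_after_loc_word(y_prev, y[i], X, i))
```

Because each feature can read the previous label, the model can learn that ORG is likely right after LOC in contexts like this, which an independent per-word classifier cannot express.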


**4. The mathematical background of CRFs:**

In Conditional Random Fields, we calculate the conditional probability i.e.

p (y | X)

the probability of the output vector y given an input sequence X.

To predict the proper sequence, we take the label sequence with the maximum probability.

As discussed in the previous section, we will use the feature functions f_j. The output sequence is modeled as the normalized exponential of the weighted sum of the feature functions:

p(y | X) = (1 / Z(X)) exp( Σ_i Σ_j λ_j f_j(y_{i-1}, y_i, X, i) )

where Z(X) is the normalization constant, obtained by summing the same exponential over all possible label sequences.

λ (lambda) are the feature function weights, which are learned by the algorithm.

For the estimation of the parameters λ, we use maximum likelihood estimation. The model is log-linear in the feature functions.
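To make this concrete, p(y | X) can be computed by brute force for a toy linear-chain CRF: score every candidate label sequence, exponentiate, and normalize by Z(X). The two features, the label set and the hand-set weights below are illustrative assumptions; real implementations use dynamic programming instead of enumerating all sequences.

```python
import math
from itertools import product

LABELS = ["LOC", "ORG", "O"]

def features(y_prev, y_curr, X, i):
    # Two illustrative indicator features.
    return [
        1 if X[i][0].isupper() and y_curr != "O" else 0,  # capitalized word tagged as entity
        1 if y_prev == "LOC" and y_curr == "ORG" else 0,  # ORG right after LOC
    ]

weights = [1.5, 2.0]  # the lambdas, assumed already learned

def score(y, X):
    # Sum of weighted feature values over all positions.
    s = 0.0
    for i in range(len(X)):
        y_prev = y[i - 1] if i > 0 else "START"
        s += sum(w * f for w, f in zip(weights, features(y_prev, y[i], X, i)))
    return s

def prob(y, X):
    # p(y | X) = exp(score) / Z(X), with Z summed over every label sequence.
    Z = sum(math.exp(score(yp, X)) for yp in product(LABELS, repeat=len(X)))
    return math.exp(score(y, X)) / Z

X = ["Maharashtra", "Times"]
best = max(product(LABELS, repeat=len(X)), key=lambda y: prob(y, X))
print(best)  # -> ('LOC', 'ORG')
```

Note that the probabilities over all candidate sequences sum to 1 by construction of Z(X), and the maximizer here is the sequence the article's NER example expects.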

Next, we take the partial derivative of the negative log-likelihood with respect to λ in order to find its minimizer (equivalently, the maximizer of the likelihood).

For the parameter optimization, we use an iterative, gradient-based method. The gradient update step for the CRF model is:

λ_j ← λ_j + η [ Σ_i f_j(y_{i-1}, y_i, X, i) − Σ_{y'} p(y' | X) Σ_i f_j(y'_{i-1}, y'_i, X, i) ]

i.e. each weight moves in the direction of the observed feature count on the training data minus the feature count expected under the current model, with learning rate η.
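As a sketch, one such update (observed minus expected feature counts) can be computed by brute-force enumeration for a toy example. The features, the two-label set, the training pair and the learning rate below are all illustrative assumptions; real CRF trainers compute the expectations with the forward-backward algorithm.

```python
import math
from itertools import product

LABELS = ["LOC", "O"]

def features(y_prev, y_curr, X, i):
    # Two illustrative indicator features.
    return [1 if X[i][0].isupper() and y_curr == "LOC" else 0,
            1 if y_prev == y_curr else 0]

def feat_counts(y, X):
    # Total count of each feature over the whole sequence.
    counts = [0.0, 0.0]
    for i in range(len(X)):
        y_prev = y[i - 1] if i > 0 else "START"
        for j, f in enumerate(features(y_prev, y[i], X, i)):
            counts[j] += f
    return counts

def grad_step(weights, X, y_true, lr=0.1):
    # One gradient-ascent step on the log-likelihood of (X, y_true).
    seqs = list(product(LABELS, repeat=len(X)))
    scores = [sum(w * c for w, c in zip(weights, feat_counts(y, X))) for y in seqs]
    Z = sum(math.exp(s) for s in scores)
    # Expected feature counts under the current model distribution p(y' | X).
    expected = [0.0, 0.0]
    for y, s in zip(seqs, scores):
        p = math.exp(s) / Z
        for j, c in enumerate(feat_counts(y, X)):
            expected[j] += p * c
    observed = feat_counts(y_true, X)
    # lambda_j <- lambda_j + lr * (observed_j - expected_j)
    return [w + lr * (o - e) for w, o, e in zip(weights, observed, expected)]

X = ["Mumbai", "city"]
y_true = ["LOC", "O"]
w = grad_step([0.0, 0.0], X, y_true)
print(w)
```

Starting from zero weights, all label sequences are equally likely, so the expected counts are uniform averages and the step pushes each weight toward the features that fire on the true labeling.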

**5. Applications of CRFs:**

CRFs can model sequential data, which makes them useful in Natural Language Processing, Computer Vision and many other areas. One famous application of CRFs in NLP is Named Entity Recognition, where we predict a sequence of labels that depend on each other. There are various other types of CRFs, such as the hidden CRF used for gesture recognition, the dynamic CRF for labeling sequence data, and the skip-chain CRF for activity recognition. Another application is gene prediction.

Contributor

*Pillai College of Engineering | Machine Learning enthusiast*