Multi-label Classification

Wimukthi Madhusanka
4 min readAug 4, 2021

In most classification problems, we try to assign each unseen instance a single label; that is, each instance belongs to exactly one class. But in real-world problems, a single item or instance may belong to more than one class at once, meaning it carries multiple labels. This article will explain how multilabel classification works and what it looks like in machine learning. There are many real-world applications for multilabel classification, including:

  • Text categorization
  • Audio categorization
  • Image categorization
  • Bioinformatics

Let’s take a Medium article as an example. An article published on this platform is usually related to more than a single subject area. So, if we want to identify all the subject areas that this article is related to, we have to predict all the classes that it belongs to. This is the basic idea of multilabel classification. In the literature, you can find two main methods for addressing multilabel classification problems.

  1. Problem transformation methods
  2. Algorithm adaptation methods

Types of classification algorithms in Machine Learning

Let’s see what these two methods mean and which types of algorithms belong to each of them. In the problem transformation method, you map the multi-label learning task into one or more single-label classification tasks, and then address each transformed problem as a standard single-label classification task. There are many problem transformation algorithms you can use. Let’s talk about a few that are widely used.

  1. Binary Relevance
  2. Label Powerset
  3. Classifier Chains

Binary Relevance

In binary relevance (BR), you decompose the multilabel classification problem into a set of independent binary classification problems, one for each label in the original problem. Once you have obtained the predictions for a particular unseen instance from each binary classifier, you can get the final multi-label prediction for that instance by aggregating those binary predictions.

Binary Relevance example
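A minimal sketch of binary relevance using scikit-learn. The synthetic dataset, the decision-tree base learner, and the shapes are illustrative assumptions, not details from the article:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

# Toy multilabel dataset: 100 instances, 6 features, 3 possible labels.
# Y is a 100 x 3 binary indicator matrix (made up for illustration).
X, Y = make_multilabel_classification(n_samples=100, n_features=6,
                                      n_classes=3, random_state=0)

# Binary relevance: train one independent binary classifier per label.
classifiers = [DecisionTreeClassifier(random_state=0).fit(X, Y[:, j])
               for j in range(Y.shape[1])]

# Aggregate the per-label binary predictions into one multilabel prediction.
def predict(X_new):
    return np.column_stack([clf.predict(X_new) for clf in classifiers])

Y_pred = predict(X)
```

Any binary learner could stand in for the decision tree here; the key point is that the label-wise classifiers are trained and queried independently.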

Label Powerset

The Label Powerset algorithm transforms the multilabel classification problem into a single multi-class, single-label classification task. It treats every unique subset of labels that appears in the training data (each distinct label combination) as one class of the new classification problem. Once the transformation is done, you can use any single-label learning algorithm to generate the predictions.

Label Powerset example
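The transformation above can be sketched directly: map each distinct label combination seen in training to one class id, fit an ordinary multi-class learner, and map its predictions back to label sets. The dataset and base learner here are illustrative assumptions:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.tree import DecisionTreeClassifier

X, Y = make_multilabel_classification(n_samples=100, n_features=6,
                                      n_classes=3, random_state=0)

# Each distinct label combination in the training data becomes one class.
combos = {tuple(row): i for i, row in enumerate(np.unique(Y, axis=0))}
inverse = {i: np.array(combo) for combo, i in combos.items()}
y_single = np.array([combos[tuple(row)] for row in Y])

# Any multi-class learner can be trained on the transformed target.
clf = DecisionTreeClassifier(random_state=0).fit(X, y_single)

# Map the single-label class predictions back to label sets.
Y_pred = np.array([inverse[c] for c in clf.predict(X)])
```

Note the limitation this sketch makes visible: the model can only ever predict label combinations it has seen during training.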

Classifier Chains

In the Classifier Chains algorithm, you first train a classifier using only the x variables as input; for each subsequent classifier in the chain, you use the x variables together with the predictions of the previous classifiers in the chain as input. This lets each classifier exploit correlations with the labels predicted before it.

Classifier Chains example
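scikit-learn ships this technique as `ClassifierChain`, so a sketch is short. The dataset and logistic-regression base estimator are illustrative assumptions:

```python
from sklearn.datasets import make_multilabel_classification
from sklearn.linear_model import LogisticRegression
from sklearn.multioutput import ClassifierChain

X, Y = make_multilabel_classification(n_samples=200, n_features=10,
                                      n_classes=4, random_state=0)

# Each classifier in the chain sees the original features plus the
# predictions of all earlier classifiers in the chain.
chain = ClassifierChain(LogisticRegression(max_iter=1000), random_state=0)
chain.fit(X, Y)
Y_pred = chain.predict(X)
```

Because later classifiers depend on earlier ones, the label order in the chain matters; `ClassifierChain` also accepts an `order` argument if you want to set it explicitly.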

As the name suggests, in algorithm adaptation techniques we adapt the algorithms themselves so that they can directly perform multilabel classification, instead of transforming the original problem into a set of single-label tasks.

The Multi-label k-Nearest Neighbour (MLkNN) algorithm is one of the most widely used adapted algorithms. The basic idea of MLkNN is very similar to the traditional kNN algorithm. First, you find the k closest training instances for each new sample, along with their corresponding label sets. Then you determine the label set for the new instance using the maximum a posteriori probability criterion.
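A simplified sketch of the neighbour-based idea: full MLkNN estimates prior and posterior probabilities from the training data, but a plain majority vote over the neighbours' label sets (an assumption of this sketch, not the exact MLkNN rule) shows the mechanics:

```python
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.neighbors import NearestNeighbors

X, Y = make_multilabel_classification(n_samples=100, n_features=6,
                                      n_classes=3, random_state=0)

k = 5
nn = NearestNeighbors(n_neighbors=k).fit(X)

def predict(X_new):
    # For each query, look at the label sets of its k nearest neighbours
    # and assign each label that a majority of those neighbours carry.
    _, idx = nn.kneighbors(X_new)
    votes = Y[idx].sum(axis=1)          # neighbour votes per label
    return (votes > k / 2).astype(int)

Y_pred = predict(X)
```

The real MLkNN replaces the fixed majority threshold with a per-label Bayesian decision, which is what the maximum posterior probability criterion refers to.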

In Python, you can find many algorithms for multilabel classification tasks apart from the ones we discussed here. Have a look at scikit-multilearn if you want to learn more about these types of algorithms.


Wimukthi Madhusanka

Fourth-year undergraduate at the University of Moratuwa, Faculty of Business.