A short introduction to distance measures in Machine Learning

Finding similarity among data points is an important task when we try to find clusters in a given data set. Unsupervised learning algorithms like K-means rely on the assumption that 'the closer the points, the more similar they are', since there is no explicit measure of similarity.

So clustering, the classification of observations into groups, needs some method for computing the distance or the (dis)similarity between each pair of observations. The result of this computation, called the distance matrix, gives an idea of the dissimilarity among the observations (why dissimilarity instead of similarity? think about it…).
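As a concrete sketch of what a distance matrix looks like in code (a minimal example using SciPy; the four toy observations below are made up for illustration):

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

# Four toy observations with two features each (made-up values)
X = np.array([[1.0, 2.0],
              [5.0, 5.0],
              [0.0, 1.0],
              [4.0, 6.0]])

# pdist returns the condensed pairwise distances;
# squareform expands them into the full symmetric distance matrix
D = squareform(pdist(X, metric='euclidean'))
print(D.round(2))
```

Each entry D[i][j] is the distance between observation i and observation j, with zeros on the diagonal.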

In our Euclidean world of mathematics we generally measure the distance between two points (remember that a point can be represented as a vector) with the straight line joining them. The length of that straight line is the shortest distance between the two points, also called the Euclidean distance. Apart from this minimum distance, there are multiple ways of measuring the distance between two data points, and in most high-dimensional applications the choice of distance metric is not obvious. In this article our goal is to understand different ways of measuring distance (similarity, really) and when to use them.

A norm is a generic representation of distance. Mathematically speaking, a function ∣∣·∣∣ is a norm if it satisfies:

1. ∣∣x∣∣ ≥ 0, and ∣∣x∣∣ = 0 only when x = 0
2. ∣∣αx∣∣ = ∣α∣ ∣∣x∣∣ for any scalar α
3. ∣∣x + y∣∣ ≤ ∣∣x∣∣ + ∣∣y∣∣

(Does the 3ʳᵈ point remind you of the triangle law of vector addition?)

Actually, the norm of a mathematical object is a quantity that in some (possibly abstract) sense describes the length, size, or extent of the object. The norms for vectors (or data points) are called vector norms. Vector norms generally belong to the family of Lₚ-norms, defined by

∣∣x∣∣ₚ = (∑ᵢ ∣xᵢ∣ᵖ)^(1/p) (formula 1)

It can be shown that for any p ≥ 1, ∣∣x∣∣ₚ defines a vector norm (for 0 < p < 1 the triangle inequality fails).
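A minimal NumPy sketch of the Lₚ-norm just defined (the vector values are arbitrary); NumPy's built-in np.linalg.norm gives the same results:

```python
import numpy as np

def lp_norm(x, p):
    """Lp norm of vector x: (sum(|x_i|^p))^(1/p), as in formula 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

x = np.array([1.0, 2.0])
print(lp_norm(x, 1))             # 3.0, the L1 norm
print(lp_norm(x, 2))             # ~2.236, the L2 norm
print(np.linalg.norm(x, ord=1))  # same values via NumPy's built-in
print(np.linalg.norm(x, ord=2))
```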

L₁ norm is commonly known as Manhattan Distance or Taxicab norm.

fig 1. Manhattan distance between two points (Image source: Trebuňa, Peter & Halčinová, Jana. (2012). Experimental Modelling of the Cluster Analysis Processes. Procedia Engineering. 48. 673–678. 10.1016/j.proeng.2012.09.569.)

Let us assume there are two data points, each having only two features (say, length and weight). We can then plot them in a 2-dimensional plane, with length and weight as the two axes. Figure 1 represents these two points on the length-weight plane; the two red dots are the points (1,2) and (5,5). Now, if you have to walk from point 1 to point 2 on the condition that roads run only parallel to the axes, you first walk from 1 to 5 along the length axis and then from 2 to 5 along the weight axis. So a total of 4 + 3 = 7 units of movement takes you from the first point to the second.

This is the Manhattan distance. To see it mathematically, put p = 1 in formula 1 and apply it to the difference of the two points: the L₁ distance is ∣∣x − y∣∣₁ = ∑∣xᵢ − yᵢ∣. For our points, that is ∣1 − 5∣ + ∣2 − 5∣ = 4 + 3 = 7. (Note that the norm is taken of the difference vector; computing each point's norm separately and subtracting, 10 − 3 = 7, only happens to give the same answer here.)
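The same walk, computed directly (a minimal NumPy sketch of the L₁ distance between our two points):

```python
import numpy as np

p1 = np.array([1.0, 2.0])
p2 = np.array([5.0, 5.0])

# Manhattan distance: L1 norm of the difference vector
manhattan = np.sum(np.abs(p1 - p2))
print(manhattan)  # 7.0  (|1-5| + |2-5| = 4 + 3)
```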

The L₂ norm is commonly known as the Euclidean distance, the shortest distance between two points. If you put p = 2 in formula 1, you obtain the L₂ norm of a point, i.e. its Euclidean distance from the origin (0,0):

∣∣x∣∣₂ = √(∑ᵢ xᵢ²) (formula 2)

So, if you need to measure the distance between two data points p and q, formula 2 takes the following form:

d(p, q) = ∣∣p − q∣∣₂ = √(∑ᵢ (pᵢ − qᵢ)²) (formula 3)
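And the Euclidean counterpart for the same two points (a minimal NumPy sketch of formula 3):

```python
import numpy as np

p = np.array([1.0, 2.0])
q = np.array([5.0, 5.0])

# Euclidean distance: L2 norm of the difference vector
euclidean = np.sqrt(np.sum((p - q) ** 2))
print(euclidean)              # 5.0  (sqrt(16 + 9))
print(np.linalg.norm(p - q))  # same result via NumPy's built-in
```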

There are a few other p-norms, but for our discussion the L₁ and L₂ norms are sufficient.

Mahalanobis distance is the distance between a point and a distribution, not between two distinct points. It is a generic distance measure that reduces to the Euclidean distance when the variables are uncorrelated and have unit variance.

D(x) = √((x − m)ᵀ C⁻¹ (x − m)) (formula 4)

The formula given above represents the Mahalanobis distance. Here, x is the observation vector, m is the mean vector of the distribution, and C is the covariance matrix of the variables.

Let us try to understand the formula. The term (x − m) represents the distance between the observation and the mean. We multiply it by the inverse of the covariance matrix, which makes this a multivariate equivalent of the regular standardisation, or z-score (z = (x − μ)/σ). Effectively, it addresses the problem of scaling as well as correlation. If the variables are highly correlated, multiplying by C⁻¹ scales down the distance along the correlated directions. On the other hand, if there is no correlation among the variables, C is a diagonal matrix of variances and the formula reduces to a variance-scaled Euclidean distance; with unit variances, C becomes the identity matrix, which leaves (x − m) unchanged.
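A small sketch of formula 4 in practice (the distribution parameters below are made up; note that SciPy's mahalanobis expects the inverse covariance matrix):

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

rng = np.random.default_rng(0)
# Toy 2-D distribution with correlated features (made-up parameters)
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[2.0, 1.5], [1.5, 2.0]],
                            size=500)

m = X.mean(axis=0)                  # distribution mean
C_inv = np.linalg.inv(np.cov(X.T))  # inverse covariance matrix

x = np.array([2.0, 2.0])            # the observation to measure

# Manual formula 4: sqrt((x - m)^T C^{-1} (x - m))
d_manual = np.sqrt((x - m) @ C_inv @ (x - m))
d_scipy = mahalanobis(x, m, C_inv)
print(d_manual, d_scipy)            # the two values agree
```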
