Basic terminology for understanding dimension reduction techniques: PCA and t-SNE
In this post we will learn what dimension reduction is, why we should learn it, and where to use it.
What is dimension reduction?
From Wikipedia: dimension reduction (or dimensionality reduction) is the transformation of data from a high-dimensional space into a low-dimensional space. We can visualize data in 2-D and 3-D using scatter plots, and even in 4-D, 5-D, or 6-D using pair plots. But as the number of dimensions d increases, pair plots stop working well. Dimension reduction is common in fields that deal with a large number of observations and/or a large number of variables, such as signal processing, speech recognition, etc.
There are various methods for dimensionality reduction; here we will discuss the two most widely used techniques:
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)
Before learning dimension reduction techniques, let's understand some basic terminology:
1. Row vectors and column vectors: In linear algebra, a column vector is an n x 1 matrix, that is, a matrix consisting of a single column of n elements.
Similarly, a row vector is a 1 x n matrix, that is, a matrix consisting of a single row of n elements.
If it is not explicitly stated which kind of vector it is, then by default it is a column vector.
The transpose of a column vector is a row vector.
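The column-vector/row-vector convention above can be sketched with NumPy (the values here are illustrative only):

```python
import numpy as np

# A column vector is an n x 1 matrix; a row vector is 1 x n.
col = np.array([[1], [2], [3]])  # shape (3, 1) -- a column vector
row = col.T                      # transpose gives shape (1, 3) -- a row vector

print(col.shape)  # (3, 1)
print(row.shape)  # (1, 3)
```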
2. Representing a dataset as a data matrix:
One common way of representing a dataset is as a matrix:
each data point is a row, and each feature is a column.
3. Data Preprocessing:
Pre-processing refers to the mathematical operations and transformations that we apply to the data itself before we build models or do anything else.
(A) Column-Normalization: Normalization is a technique for rescaling measurements to a standard scale.
Q. How to do column-normalization?
We show it for one feature; the rest follow the same procedure.
Step 1: take all the values corresponding to feature fⱼ:
a₁, a₂, a₃, …, aᵢ, …, aₙ → n values of feature fⱼ.
Step 2: compute
aₘₐₓ = max[aᵢ's] and aₘᵢₙ = min[aᵢ's].
Then a₁, a₂, …, aᵢ, …, aₙ after column-normalization are written as a₁′, a₂′, …, aᵢ′, …, aₙ′
such that aᵢ′ = (aᵢ − aₘᵢₙ)/(aₘₐₓ − aₘᵢₙ), which guarantees aᵢ′ ∈ [0, 1].
Q. Why do column-normalization?
Suppose we have two features: f1 = 'height', with data points collected in cm, and f2 = 'weight', with data points collected in kg. If instead the height were collected in inches and the weight in pounds, the relationship between height and weight would look different. After column-normalization we no longer care whether our data was collected in feet, inches, or pounds.
Column-normalization brings both features into a standard format where all values lie in [0, 1].
In other words, column-normalization gets rid of scale.
Geometrically: data points anywhere in n-d space are, after column-normalization, squashed into the unit hypercube in n-d space.
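The steps above can be sketched with NumPy; the heights/weights below are made-up values just to illustrate the min-max formula applied per column:

```python
import numpy as np

# Toy data matrix: rows are data points, columns are features
# (heights in cm, weights in kg -- illustrative values only).
X = np.array([[150.0, 50.0],
              [160.0, 60.0],
              [170.0, 70.0],
              [180.0, 80.0]])

# Column-normalization: a_i' = (a_i - a_min) / (a_max - a_min), per column
X_min = X.min(axis=0)
X_max = X.max(axis=0)
X_norm = (X - X_min) / (X_max - X_min)

print(X_norm)  # every entry now lies in [0, 1]
```

Note that `axis=0` computes the min/max down each column, so each feature is rescaled independently, regardless of its original units.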
(B) Column-Standardization: This is a technique similar to column-normalization, but with a change.
Column-standardization transforms a feature coming from any distribution so that it has zero mean and unit variance (standard deviation).
Q. How to do column-standardization?
We show it for one feature; the rest follow the same procedure.
Step 1: take all the values corresponding to feature fⱼ:
a₁, a₂, a₃, …, aᵢ, …, aₙ → n values of feature fⱼ.
Step 2: compute ā = mean[aᵢ] and s = std_dev[aᵢ] (the per-feature means, taken together, form the mean vector).
Then a₁, a₂, …, aᵢ, …, aₙ after column-standardization are written as a₁′, a₂′, …, aᵢ′, …, aₙ′
such that aᵢ′ = (aᵢ − ā)/s, which guarantees mean[aᵢ′] = 0 and std_dev[aᵢ′] = 1.
Geometric intuition of column-standardization:
- Move the mean vector to the origin.
- Squash/expand the data points so that the variance = 1.
So column-standardization is mean centering + scaling (std_dev = 1) for all features.
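Mean centering + scaling can likewise be sketched per column with NumPy (same illustrative toy data as before):

```python
import numpy as np

# Toy data matrix: rows are data points, columns are features.
X = np.array([[150.0, 50.0],
              [160.0, 60.0],
              [170.0, 70.0],
              [180.0, 80.0]])

# Column-standardization: a_i' = (a_i - mean) / std_dev, per column
mean = X.mean(axis=0)  # the mean vector (one mean per feature)
std = X.std(axis=0)
X_std = (X - mean) / std

print(X_std.mean(axis=0))  # each column now has mean ~0
print(X_std.std(axis=0))   # each column now has std_dev 1
```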
4. Covariance Matrix:
According to Wikipedia, a covariance matrix is a square matrix giving the covariance between each pair of elements of a given random vector; it is a symmetric matrix, and its main diagonal contains the variances.
Let F1 and F2 be two random vectors (features); then their covariance is denoted Cov(F1, F2).
If F1 and F2 are column-standardized, this implies mean(F1) = 0
and std_dev(F1) = 1, and similarly for F2.
Then Cov(F1, F2) = (F1ᵀ · F2)/n iff F1 and F2 are column-standardized.
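This identity for standardized columns can be checked numerically; the random data below is purely illustrative, and we compare the formula against NumPy's `np.cov` (with `bias=True` so it also divides by n rather than n − 1):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))  # 100 data points, 2 features F1 and F2

# Column-standardize first, so each column has mean 0 and std_dev 1.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)

n = Xs.shape[0]
f1, f2 = Xs[:, 0], Xs[:, 1]

# Cov(F1, F2) = (F1^T . F2) / n, valid because both columns are standardized
cov_manual = (f1 @ f2) / n

# NumPy's covariance matrix; [0, 1] is the F1-F2 entry
cov_np = np.cov(Xs, rowvar=False, bias=True)[0, 1]

print(np.isclose(cov_manual, cov_np))  # True
```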
Now that we have learned some basic terminology for dimension reduction, let's learn about and apply PCA and t-SNE, the two dimension reduction techniques, in the next part.