Variance Thresholding in Python for Feature Selection

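Variance thresholding

Variance thresholding is a simple baseline method for feature selection. It assumes that features with low variance carry little information, so it drops every feature whose variance does not exceed a chosen threshold. With the default threshold of 0, it removes only zero-variance features, that is, features that have the same value in every sample.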
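To make the idea concrete, here is a minimal sketch of the same rule written by hand with NumPy. The toy array `X_toy` and the names `variances`, `keep_mask`, and `X_reduced` are made up for illustration. The sketch assumes population variance (`ddof=0`) and a strict "greater than" comparison, which, to my understanding, matches how scikit-learn's `VarianceThreshold` behaves.

```python
import numpy as np

# Hypothetical toy data: column 0 is constant, column 1 barely varies,
# column 2 varies a lot
X_toy = np.array([
    [1.0, 0.0, 10.0],
    [1.0, 0.1, 20.0],
    [1.0, 0.0, 30.0],
    [1.0, 0.1, 40.0],
])

# Population variance of each column (ddof=0 is NumPy's default)
variances = X_toy.var(axis=0)

# Keep only the columns whose variance is strictly above the threshold
threshold = 0.0
keep_mask = variances > threshold
X_reduced = X_toy[:, keep_mask]

print(variances)
print(keep_mask)
print(X_reduced.shape)
```

With a threshold of 0, only the constant first column is dropped. Raising the threshold would also drop the second column, which varies only slightly.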

Here is a practical implementation in Python with the scikit-learn library.

# Import libraries
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

# Load the iris data (load_iris returns a Bunch object, not a DataFrame)
iris = datasets.load_iris()
# Create the feature matrix and the target vector
X = iris.data
y = iris.target

# Create a VarianceThreshold object with a threshold of 0.6
threshold = VarianceThreshold(threshold=0.6)
# Conduct variance thresholding: fit on X and keep only the high-variance features
x_high_var = threshold.fit_transform(X)

# View the first 10 rows of the features whose variance is above the threshold
x_high_var[:10]

# Output
array([[5.1, 1.4],
       [4.9, 1.4],
       [4.7, 1.3],
       [4.6, 1.5],
       [5. , 1.4],
       [5.4, 1.7],
       [4.6, 1.4],
       [5. , 1.5],
       [4.4, 1.4],
       [4.9, 1.5]])
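The output shows only two of the four iris columns, but not which ones or why. As a rough sketch, the fitted selector can be inspected through its `variances_` attribute and its `get_support()` method. The names `iris`, `selector`, `mask`, and `kept` below are chosen for this example and are not part of the code above.

```python
from sklearn import datasets
from sklearn.feature_selection import VarianceThreshold

iris = datasets.load_iris()

# Fit the selector with the same threshold as above
selector = VarianceThreshold(threshold=0.6)
selector.fit(iris.data)

# Variance of each feature, as computed during fit
for name, var in zip(iris.feature_names, selector.variances_):
    print(f"{name}: {var:.3f}")

# Boolean mask of the kept features, mapped back to their names
mask = selector.get_support()
kept = [name for name, keep in zip(iris.feature_names, mask) if keep]
print(kept)
```

On the iris data this should report sepal length and petal length as the kept features, which are the two columns in the output above. Note that variance depends on the units of each feature, so a threshold such as 0.6 is only meaningful when the features are on comparable scales.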

For more details, see the official scikit-learn documentation for VarianceThreshold.

About Mitra N Mishra 35 Articles
Mitra N Mishra is working as a full-stack data scientist.
