KNN algorithm: why and how, with sklearn

Deepak Poojari
3 min read · Jan 6, 2019

Hello everyone, today we are going to look at an interesting machine learning algorithm, KNN (k-nearest neighbors): its pros and cons and its implementation using sklearn.

KNN is one of the simplest supervised machine learning algorithms and can be used for both classification and regression. It works on the principle of finding the k nearest neighbors using the distance between two points/feature vectors.

How does it work?

KNN is known as a lazy learning and non-parametric algorithm:
lazy learning because it does not build a model during a training phase; it simply memorizes the training data and defers all the work to prediction time.
non-parametric because it does not make any assumptions about the underlying data distribution.

KNN is simple: it takes all the training data and stores it in memory. When a new, unseen data point comes in for prediction, under the hood it computes the distance between the unseen point and every point in the training dataset and returns the k training points whose distance to the unseen point is smallest. For classification the prediction is the majority class among those k neighbors; for regression it is the average of their target values. A toy sketch of this lookup is shown below.
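Here is a minimal, from-scratch sketch of that idea for the classification case, assuming NumPy; the function name knn_predict and the tiny 2-D dataset are purely illustrative:

```python
import numpy as np

def knn_predict(X_train, y_train, x_new, k=3):
    """Toy illustration of the nearest-neighbor lookup described above."""
    # Euclidean distance from the unseen point to every stored training point
    distances = np.linalg.norm(X_train - x_new, axis=1)
    # indices of the k closest training points
    nearest = np.argsort(distances)[:k]
    # majority vote among their labels
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X_train = np.array([[1, 1], [1, 2], [8, 8], [9, 8]])
y_train = np.array([0, 0, 1, 1])
print(knn_predict(X_train, y_train, np.array([2, 1]), k=3))  # -> 0
```

sklearn's KNeighborsClassifier does the same thing conceptually, but with faster neighbor-search structures (KD-trees/ball trees) and a choice of distance metrics.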

How to choose k?

The value of k depends completely on your dataset: initialize k, observe the model's performance for different values of k, and select the k that gives the least error. For example, if you have a dataset of 1000 samples, you might search k in the range 1–90; an odd k also helps avoid ties in binary classification. One way to run such a search is sketched below.
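As a hedged sketch, assuming the features X and labels y are already loaded (they are prepared in Step 4 below), a cross-validated search over k could look like this; Step 5 later does something similar with a plain train/test split:

```python
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# assumes X (features) and y (labels) are already loaded
cv_scores = {}
for k in range(1, 31):
    knn = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validated accuracy for this value of k
    cv_scores[k] = cross_val_score(knn, X, y, cv=5).mean()

best_k = max(cv_scores, key=cv_scores.get)
print("best k:", best_k, "cv accuracy:", cv_scores[best_k])
```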

Pros: simple to implement, versatile (can be used for both classification and regression).
Cons: computationally expensive at prediction time, high memory requirement since it stores all the training data in memory.

Practical applications

Due to its simplicity and ease of use, KNN is widely used in different machine learning applications.
Nearest-neighbor methods appear in both supervised and unsupervised learning.
It is used in recommendation systems, fraud detection, segregating online documents, etc.

Implementation with sklearn

We will use the Pima Indians Diabetes Database dataset from Kaggle. We will predict whether a patient has diabetes based on features such as blood pressure, age, etc. You can find details on the parameters on Kaggle.
The code is available on my Kaggle and GitHub.

Step 1: load the necessary packages
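A minimal set of imports for this walkthrough might look like the following (assuming pandas, NumPy and scikit-learn are installed):

```python
import numpy as np                                     # array helpers
import pandas as pd                                    # loading and inspecting the CSV
from sklearn.model_selection import train_test_split  # splitting data (Step 4)
from sklearn.neighbors import KNeighborsClassifier    # the KNN model (Step 3)
```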

Step 2: let's see what our dataset looks like
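A quick look at the data, assuming the Kaggle CSV has been downloaded locally as diabetes.csv (the filename is an assumption):

```python
import pandas as pd

df = pd.read_csv("diabetes.csv")  # assumed local path to the Kaggle file
print(df.shape)                   # number of rows and columns
print(df.head())                  # first five rows
print(df.describe())              # per-column summary statistics
```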

Step 3: Import KNeighborsClassifier, a prebuilt KNN model from sklearn
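KNeighborsClassifier lives in sklearn.neighbors; importing and creating it might look like this (n_neighbors is the k we discussed above):

```python
from sklearn.neighbors import KNeighborsClassifier

# n_neighbors is the "k" in KNN; 5 is just a starting value
knn = KNeighborsClassifier(n_neighbors=5)
```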

Step 4: create test and train data
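A sketch of the split, assuming the label column in the Kaggle CSV is named "Outcome" (all other columns are used as features):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")   # assumed local filename
X = df.drop(columns=["Outcome"])   # features; "Outcome" is assumed to be the label column
y = df["Outcome"]                  # 1 = diabetic, 0 = not diabetic

# hold out 25% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)
```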

Step 5: Check accuracy for different values of k.
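Continuing from the split in Step 4, a simple loop over candidate values of k might look like this; the exact search range is an arbitrary choice:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

k_values = list(range(1, 31))
train_scores, test_scores = [], []

for k in k_values:
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X_train, y_train)                         # "training" = storing the data
    train_scores.append(knn.score(X_train, y_train))  # accuracy on seen data
    test_scores.append(knn.score(X_test, y_test))     # accuracy on held-out data

best_k = k_values[int(np.argmax(test_scores))]
print("best k:", best_k, "with test accuracy:", max(test_scores))
```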

Step 6: Build a model for k which gives maximum accuracy.
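Finally, refit a model with the best k found in Step 5 and report its accuracy on both splits (best_k, X_train, etc. come from the previous steps):

```python
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=best_k)
knn.fit(X_train, y_train)

print("train accuracy:", knn.score(X_train, y_train))
print("test accuracy:", knn.score(X_test, y_test))
```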

As we can see, we get an accuracy of 79% on the training set and 72% on the test set, which is not bad.

I keep posting articles on NodeJS, Machine/Deep Learning and many more technologies to come in my stack.
Connect with me on LinkedIn, Gmail, or Twitter.

If you liked this article, please give it a clap.

Follow me on LinkedIn for more: https://www.linkedin.com/in/deepak6446r/

