Liver Disease Prediction with AdaBoost and KNN Classifiers
- saman aboutorab
- Jan 3, 2024
Updated: Jan 8, 2024
In this project we work with the Indian Liver Patient Dataset from the UCI Machine Learning Repository. The objective is to predict whether a patient has liver disease from the clinical features in the dataset. We first train three individual classifiers, logistic regression, k-Nearest Neighbors (KNN), and a decision tree, and then combine and improve on them with ensemble methods: a voting classifier, a bagging classifier with out-of-bag evaluation, and AdaBoost, finishing with GridSearchCV to tune the decision tree's hyperparameters. Beyond the machine learning mechanics, the project shows a practical application of classification in healthcare analytics.
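Before diving into the dataset, here is the intuition behind KNN: a new point is assigned the majority class among its k nearest training points. A minimal sketch on made-up toy data (not the liver dataset):

import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D data: class 0 clustered near the origin, class 1 near (5, 5)
X_toy = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

# With k=3, a query point takes the majority class of its 3 nearest neighbours
knn_demo = KNeighborsClassifier(n_neighbors=3)
knn_demo.fit(X_toy, y_toy)
print(knn_demo.predict([[0.5, 0.5], [5.5, 5.5]]))  # -> [0 1]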

Libraries

from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier as KNN
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score
from sklearn.metrics import roc_auc_score
from sklearn.ensemble import VotingClassifier
from sklearn.ensemble import BaggingClassifier
from sklearn.ensemble import AdaBoostClassifier
Dataset

df_patients = pd.read_csv('indian_liver_patient_preprocessed.csv')
print(df_patients.head())

# Drop the leftover index column from preprocessing
df_patients = df_patients.drop(['Unnamed: 0'], axis=1)

Train/Test split

X = df_patients.drop(['Liver_disease'], axis=1)
# Select the target as a Series (1-D) to avoid shape warnings in scikit-learn
y = df_patients['Liver_disease']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
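Because KNN relies on distances, features on larger scales dominate the neighbour search. The CSV used here is already preprocessed, but if you start from the raw UCI data, standardising the features is a sensible extra step; a minimal sketch, assuming the same split as above:

from sklearn.preprocessing import StandardScaler

# Fit the scaler on the training split only, then apply it to both splits
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)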
KNN Classifier models

SEED = 1

# Instantiate lr
lr = LogisticRegression(random_state=SEED)
# Instantiate KNN
knn = KNN(n_neighbors=27)
# Instantiate dt
dt = DecisionTreeClassifier(min_samples_leaf=0.13, random_state=SEED)
# List of classifiers
classifiers = [('Logistic Regression', lr), ('K Nearest Neighbors', knn), ('Classification Tree', dt)]

Evaluate Classifiers

# Iterate over the classifiers
for clf_name, clf in classifiers:
    # Fit to the training data
    clf.fit(X_train, y_train)
    # Predict on the test data
    y_pred = clf.predict(X_test)
    # Calculate and report accuracy on the test set
    accuracy = accuracy_score(y_test, y_pred)
    print('{:s} : {:.3f}'.format(clf_name, accuracy))

Logistic Regression : 0.690
K Nearest Neighbors : 0.698
Classification Tree : 0.672
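Accuracy alone can be misleading on medical data, where missing a sick patient costs more than a false alarm. As an optional check that is not part of the original walkthrough, per-class precision and recall are easy to obtain for any of the fitted models, for example the KNN:

from sklearn.metrics import classification_report

# Per-class precision, recall and F1 for the fitted KNN model
print(classification_report(y_test, knn.predict(X_test)))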
Voting classifiers

# Instantiate a VotingClassifier that takes a majority vote of the three models
vc = VotingClassifier(estimators=classifiers)
# Fit to the training data
vc.fit(X_train, y_train)
# Predict on the test data
y_pred = vc.predict(X_test)
# Accuracy on the test set
accuracy = accuracy_score(y_test, y_pred)
print('Voting Classifier: {:.3f}'.format(accuracy))

Voting Classifier: 0.681
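A variant worth trying, though not part of the original walkthrough, is soft voting, which averages the models' predicted class probabilities instead of counting hard votes. All three base models here implement predict_proba, so the change is one keyword; a minimal sketch:

# Hypothetical variant: average predicted probabilities instead of hard votes
vc_soft = VotingClassifier(estimators=classifiers, voting='soft')
vc_soft.fit(X_train, y_train)
print('Soft Voting Classifier: {:.3f}'.format(accuracy_score(y_test, vc_soft.predict(X_test))))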
Bagging classifier

# Instantiate dt
dt = DecisionTreeClassifier(random_state=1)
# Instantiate bc (the keyword is base_estimator on scikit-learn < 1.2)
bc = BaggingClassifier(estimator=dt, n_estimators=50, oob_score=True, random_state=1)

Fit the model
# Fit bc on the training data
bc.fit(X_train, y_train)
# Predict on the test data
y_pred = bc.predict(X_test)
# Evaluate test-set accuracy
acc_test = accuracy_score(y_test, y_pred)
# Evaluate out-of-bag (OOB) accuracy
acc_oob = bc.oob_score_
# Print acc_test and acc_oob
print('Test set accuracy: {:.3f}, OOB accuracy: {:.3f}'.format(acc_test, acc_oob))

Test set accuracy: 0.681, OOB accuracy: 0.737
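Each bootstrap sample leaves out roughly a third of the training rows, so the OOB score acts as a free validation estimate. As an optional experiment beyond the original post, one can sweep n_estimators and watch the OOB accuracy stabilise; a minimal sketch (again, use base_estimator on scikit-learn < 1.2):

# Sweep the ensemble size and track the out-of-bag accuracy
for n in [25, 50, 100, 200]:
    bc_n = BaggingClassifier(estimator=DecisionTreeClassifier(random_state=1),
                             n_estimators=n, oob_score=True, random_state=1)
    bc_n.fit(X_train, y_train)
    print('n_estimators={:3d} -> OOB accuracy: {:.3f}'.format(n, bc_n.oob_score_))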
Adaboost classifier

# Instantiate ada (the keyword is base_estimator on scikit-learn < 1.2)
ada = AdaBoostClassifier(estimator=dt, n_estimators=180, random_state=1)
# Fit to the training data
ada.fit(X_train, y_train)
# Predict the test-set probabilities of the positive class
y_pred_proba = ada.predict_proba(X_test)[:, 1]

Evaluate Ada
# Evaluate the ada classifier
ada_roc_auc_score = roc_auc_score(y_test, y_pred_proba)
print('ROC AUC score: {:.3f}'.format(ada_roc_auc_score))

ROC AUC score: 0.657
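To see where that AUC comes from, it helps to plot the full ROC curve, which traces the true positive rate against the false positive rate across all probability thresholds. A minimal sketch with matplotlib, not part of the original post:

from sklearn.metrics import roc_curve
import matplotlib.pyplot as plt

# Compute the curve from the AdaBoost probabilities predicted above
fpr, tpr, _ = roc_curve(y_test, y_pred_proba)
plt.plot(fpr, tpr, label='AdaBoost (AUC = {:.3f})'.format(ada_roc_auc_score))
plt.plot([0, 1], [0, 1], 'k--', label='Chance')
plt.xlabel('False positive rate')
plt.ylabel('True positive rate')
plt.legend()
plt.show()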
GridSearchCV: Decision Tree hyperparameters

# Define the hyperparameter grid
params_dt = {'max_depth': [2, 3, 4], 'min_samples_leaf': [0.12, 0.14, 0.16, 0.18]}
# Instantiate grid
grid_dt = GridSearchCV(estimator=dt, param_grid=params_dt, scoring='roc_auc', cv=5, n_jobs=-1)
grid_dt.fit(X_train, y_train)

GridSearch result

# Extract the best estimator
best_model = grid_dt.best_estimator_
# Predict the test set probabilities of the positive class
y_pred_proba = best_model.predict_proba(X_test)[:,1]
# Compute test_roc_auc
test_roc_auc = roc_auc_score(y_test, y_pred_proba)
# Print test_roc_auc
print('Test set ROC AUC score: {:.3f}'.format(test_roc_auc))

Test set ROC AUC score: 0.696
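It is also worth inspecting which hyperparameter combination won and how its cross-validated score compares with the test score, using attributes that GridSearchCV exposes after fitting:

# Best hyperparameter combination found by the grid search
print('Best hyperparameters:', grid_dt.best_params_)
# Mean cross-validated ROC AUC of that combination on the training folds
print('Best CV ROC AUC: {:.3f}'.format(grid_dt.best_score_))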