ANMOL PATEL
ML Pipeline
Holiday Package Prediction
Predicting who's ready for their next dream vacation
Random Forest Classification · End-to-end workflow
📦
01
Data collection
Kaggle dataset — 4,888 customer records, 20 features, binary target ProdTaken
Travel.csv
4,888 rows
20 features
🧹
02
Data cleaning
Median/mode imputation of missing values · Fix the mislabeled Gender value "Fe Male" → "Female" · Drop CustomerID
Median imputation
Mode imputation
Label fix
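The cleaning steps above can be sketched roughly as follows. The tiny DataFrame is invented stand-in data, and the choice of which column gets median versus mode imputation (Age and TypeofContact here) is an assumption for illustration, not the project's confirmed mapping.

```python
import numpy as np
import pandas as pd

# Invented stand-in for a few rows of Travel.csv (values are not real data).
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4, 5],
    "Age": [30.0, np.nan, 41.0, 35.0, 52.0],
    "Gender": ["Male", "Fe Male", "Female", "Male", "Fe Male"],
    "TypeofContact": ["Self Enquiry", None, "Company Invited",
                      "Self Enquiry", "Self Enquiry"],
})

# Label fix: fold the misspelled category into the correct one.
df["Gender"] = df["Gender"].replace("Fe Male", "Female")

# Median imputation (assumed here for a continuous column).
df["Age"] = df["Age"].fillna(df["Age"].median())

# Mode imputation (assumed here for a categorical column).
df["TypeofContact"] = df["TypeofContact"].fillna(df["TypeofContact"].mode()[0])

# CustomerID is an identifier and carries no predictive signal.
df = df.drop(columns=["CustomerID"])

print(df)
```

The median is less sensitive to outliers than the mean, which is one common reason to prefer it for skewed columns such as income.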
🔍
03
Exploratory data analysis
18% purchase rate · Passport has the strongest correlation with ProdTaken (0.26) · Age ↔ Income correlation (0.46)
df.hist()
Heatmap
Correlation
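A minimal sketch of how the purchase rate and the correlations above might be computed. The data below is invented, so the printed numbers will not match the 18%, 0.26, or 0.46 reported for the real dataset.

```python
import pandas as pd

# Invented stand-in data; column names follow the Kaggle dataset.
df = pd.DataFrame({
    "ProdTaken":     [1, 0, 0, 1, 0, 0, 1, 0],
    "Passport":      [1, 0, 0, 1, 0, 1, 1, 0],
    "Age":           [28, 45, 39, 31, 50, 36, 27, 42],
    "MonthlyIncome": [21000, 30000, 26000, 22000, 34000, 25000, 20000, 29000],
})

# Share of customers who bought the package.
purchase_rate = df["ProdTaken"].mean()

# Pearson correlation matrix over the numeric columns
# (this is the matrix a heatmap would visualise).
corr = df.select_dtypes("number").corr()

# Features ranked by absolute correlation with the target.
target_corr = (
    corr["ProdTaken"]
    .drop("ProdTaken")
    .sort_values(key=abs, ascending=False)
)

print(f"purchase rate: {purchase_rate:.1%}")
print(target_corr)
```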
⚙
04
Feature engineering
TotalVisiting = NumberOfPersonVisiting + NumberOfChildrenVisiting
Feature creation
Column drop
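The feature creation above, sketched on invented rows. That the "Column drop" step removes the two source columns after summing them is an assumption.

```python
import pandas as pd

# Invented example rows using the dataset's column names.
df = pd.DataFrame({
    "NumberOfPersonVisiting": [3, 2, 4],
    "NumberOfChildrenVisiting": [1.0, 0.0, 2.0],
})

# Combine the two counts into a single party-size feature.
df["TotalVisiting"] = (
    df["NumberOfPersonVisiting"] + df["NumberOfChildrenVisiting"]
)

# Assumed: the source columns are dropped once they are merged.
df = df.drop(columns=["NumberOfPersonVisiting", "NumberOfChildrenVisiting"])

print(df)
```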
🔄
05
Preprocessing — ColumnTransformer
80/20 train-test split · OneHotEncoder (drop='first') · StandardScaler
OneHotEncoder
StandardScaler
80/20 split
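One way the split and ColumnTransformer could be wired together. The data, the column lists, and random_state=42 are illustrative assumptions; only the 80/20 split, OneHotEncoder(drop='first'), and StandardScaler come from the step above.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented stand-in data with two numeric and two categorical features.
n = 20
df = pd.DataFrame({
    "Age": [25 + i for i in range(n)],
    "MonthlyIncome": [18000 + 500 * i for i in range(n)],
    "Gender": ["Male", "Female"] * (n // 2),
    "Occupation": ["Salaried", "Salaried",
                   "Small Business", "Small Business"] * (n // 4),
    "ProdTaken": [0, 0, 0, 1, 0] * (n // 5),
})

X = df.drop(columns=["ProdTaken"])
y = df["ProdTaken"]

# 80/20 train-test split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

cat_cols = ["Gender", "Occupation"]    # assumed categorical columns
num_cols = ["Age", "MonthlyIncome"]    # assumed numeric columns

preprocessor = ColumnTransformer([
    # drop="first" removes one dummy per feature to avoid redundant columns.
    ("onehot", OneHotEncoder(drop="first"), cat_cols),
    ("scale", StandardScaler(), num_cols),
])

# Fit on the training split only, then reuse the fitted transformer on test
# data so no test-set statistics leak into the scaling.
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)

print(X_train_t.shape, X_test_t.shape)
```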
06
MODEL COMPARISON
Baseline models
Logistic Regression · Decision Tree · Gradient Boosting
not selected
Random Forest
Best accuracy (93%) · Highest F1 score of the four models compared
selected ✓
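A comparison like this can be run as a simple loop over candidate models. The sketch below uses synthetic data from make_classification and default hyperparameters, both assumptions, so its scores say nothing about the project's reported results.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (roughly 80/20 class split).
X, y = make_classification(
    n_samples=400, n_features=10, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# Fit each model on the same split and record accuracy and F1.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "f1": f1_score(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: accuracy={scores['accuracy']:.3f} f1={scores['f1']:.3f}")
```

With an 18% positive class, accuracy alone can look strong for a model that rarely predicts a purchase, which is why F1 is worth tracking alongside it.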
🎯
07
Hyperparameter tuning
RandomizedSearchCV · n_iter=100 · cv=3 · Best: n_estimators=1000, max_features=7
RandomizedSearchCV
n_iter=100
cv=3
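A scaled-down sketch of the randomized search. The search space below is hypothetical, and n_iter and the tree counts are reduced so the example runs in seconds; the step above uses n_iter=100 and reports a best n_estimators of 1000.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Hypothetical search space; the project's actual grid is not shown.
param_distributions = {
    "n_estimators": [10, 20, 50],
    "max_features": [3, 5, 7],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 8],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=5,          # reduced from the n_iter=100 used in the project
    cv=3,              # 3-fold cross-validation, as in the step above
    random_state=0,
)
search.fit(X, y)

print(search.best_params_)
print(f"best CV score: {search.best_score_:.3f}")
```

Randomized search samples a fixed number of combinations instead of trying every one, which keeps the cost bounded as the grid grows.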
📊
08
Evaluation
Confusion matrix · Classification report · Feature importance · ROC-AUC
Confusion matrix
Feature importance
ROC-AUC
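The four evaluation outputs listed above map onto standard scikit-learn calls. This sketch uses synthetic data and an untuned forest, both assumptions, so the numbers are illustrative only.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data.
X, y = make_classification(
    n_samples=400, n_features=8, weights=[0.8, 0.2], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)
model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
# ROC-AUC needs scores, so use the predicted probability of the positive class.
y_proba = model.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, y_pred)        # rows: actual, columns: predicted
report = classification_report(y_test, y_pred)
auc = roc_auc_score(y_test, y_proba)
importances = model.feature_importances_     # impurity-based, sums to 1

print(cm)
print(report)
print(f"ROC-AUC: {auc:.3f}")
print(importances)
```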
💾
09
Model export
Saved with joblib · holiday_package_classification_model.pkl + preprocessor.pkl
joblib
.pkl
deploy ready
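Saving and reloading both artifacts with joblib might look like the sketch below. The file names come from the step above; the stand-in model, the StandardScaler used in place of the real preprocessor, and the temporary output directory are assumptions.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Stand-in preprocessor and model fitted on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
preprocessor = StandardScaler().fit(X)
model = RandomForestClassifier(n_estimators=20, random_state=0)
model.fit(preprocessor.transform(X), y)

# Write to a temporary directory so the example leaves no files behind
# in the working directory.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "holiday_package_classification_model.pkl")
prep_path = os.path.join(out_dir, "preprocessor.pkl")

joblib.dump(model, model_path)
joblib.dump(preprocessor, prep_path)

# At inference time, load both and apply them in the same order as in training.
loaded_model = joblib.load(model_path)
loaded_prep = joblib.load(prep_path)
preds = loaded_model.predict(loaded_prep.transform(X[:5]))

print(preds)
```

Persisting the preprocessor alongside the model matters because new data must be encoded and scaled exactly as the training data was.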
FINAL RESULTS
93%
ACCURACY
0.97
PRECISION
0.68
RECALL
0.80
F1 SCORE
.pkl
EXPORTED
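As a quick consistency check, the reported F1 score follows from the reported precision and recall, since F1 is their harmonic mean:

```python
precision = 0.97
recall = 0.68

# F1 is the harmonic mean of precision and recall.
f1 = 2 * precision * recall / (precision + recall)

print(round(f1, 2))  # 0.8
```

Read together, these figures suggest that positive predictions are almost always correct (precision 0.97) while roughly a third of actual buyers are missed (recall 0.68), assuming both metrics refer to the ProdTaken = 1 class.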