✈️🏖️🌴🧳 🌍🛫🌄

ML Pipeline

Holiday Package Prediction
Predicting who's ready for their next dream vacation

Random Forest Classification · End-to-end workflow


📦
01
Data collection
Kaggle dataset — 4,888 customer records, 20 columns (19 features + binary target ProdTaken)
Travel.csv 4,888 rows 20 columns
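Loading the dataset is a one-line `pd.read_csv` call. Since Travel.csv isn't bundled here, this sketch reads a tiny stand-in with a few of the real column names from an in-memory string; the file path in the comment is the assumed project layout.

```python
import io

import pandas as pd

# Travel.csv isn't included here, so simulate a few rows with the same
# schema (column names taken from the Kaggle dataset) to show the step.
csv_text = """CustomerID,Age,TypeofContact,ProdTaken
200000,41,Self Enquiry,1
200001,49,Company Invited,0
200002,37,Self Enquiry,1
"""
df = pd.read_csv(io.StringIO(csv_text))

# In the real pipeline: df = pd.read_csv("Travel.csv")  ->  (4888, 20)
print(df.shape)  # (3, 4) for this stand-in
```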
🧹
02
Data cleaning
Median/mode imputation · Fix Fe Male → Female · Drop CustomerID
Median imputation Mode imputation Label fix
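The three cleaning moves above can be sketched on a toy frame (values are illustrative, not from the real data): median imputation for numeric gaps, mode imputation for categoricals, the `Fe Male` label fix, and dropping `CustomerID`.

```python
import numpy as np
import pandas as pd

# Toy frame exhibiting the issues the cleaning step targets.
df = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [34.0, np.nan, 45.0, 29.0],
    "TypeofContact": ["Self Enquiry", None, "Company Invited", "Self Enquiry"],
    "Gender": ["Male", "Fe Male", "Female", "Male"],
})

# Median imputation for numeric columns, mode imputation for categoricals.
df["Age"] = df["Age"].fillna(df["Age"].median())
df["TypeofContact"] = df["TypeofContact"].fillna(df["TypeofContact"].mode()[0])

# Fix the mislabeled "Fe Male" category and drop the ID column.
df["Gender"] = df["Gender"].replace("Fe Male", "Female")
df = df.drop(columns=["CustomerID"])
```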
🔍
03
Exploratory data analysis
18% purchase rate · Passport strongest predictor (0.26) · Age ↔ Income (0.46)
df.hist() Heatmap Correlation
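The correlation check behind the numbers above looks like this on synthetic stand-in data (two columns generated with a mild positive relationship to mimic the Age ↔ Income link; the real project additionally plots `df.hist()` and a heatmap of this matrix).

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: MonthlyIncome loosely tied to Age, echoing the
# positive Age-Income correlation reported for the real data.
age = rng.normal(37, 9, 500)
income = 12000 + 300 * age + rng.normal(0, 3000, 500)
df = pd.DataFrame({"Age": age, "MonthlyIncome": income})

# Pairwise correlation matrix; the heatmap is just a plot of this.
corr = df.corr(numeric_only=True)
print(corr.round(2))
```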
04
Feature engineering
TotalVisiting = NumberOfPersonVisiting + NumberOfChildrenVisiting
Feature creation Column drop
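The engineered feature is a straight column sum, after which the two source columns are dropped (toy rows, real column names):

```python
import pandas as pd

# Combine the two head-count columns into a single TotalVisiting
# feature, then drop the originals.
df = pd.DataFrame({
    "NumberOfPersonVisiting": [3, 2, 4],
    "NumberOfChildrenVisiting": [1, 0, 2],
})
df["TotalVisiting"] = (
    df["NumberOfPersonVisiting"] + df["NumberOfChildrenVisiting"]
)
df = df.drop(columns=["NumberOfPersonVisiting", "NumberOfChildrenVisiting"])
print(df["TotalVisiting"].tolist())  # [4, 2, 6]
```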
🔄
05
Preprocessing — ColumnTransformer
80/20 train-test split · OneHotEncoder (drop='first') · StandardScaler
OneHotEncoder StandardScaler 80/20 split
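A minimal `ColumnTransformer` sketch of this step, on a toy frame standing in for the cleaned Travel data (column choices are illustrative): split 80/20 first, then fit the transformer on the training fold only.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for the cleaned dataset.
df = pd.DataFrame({
    "Age": [34, 45, 29, 52, 41, 38, 27, 60, 33, 48],
    "Gender": ["Male", "Female"] * 5,
    "ProdTaken": [1, 0, 0, 1, 0, 1, 0, 0, 1, 0],
})
X, y = df.drop(columns=["ProdTaken"]), df["ProdTaken"]

# 80/20 train-test split, as above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Scale numeric columns; one-hot encode categoricals, dropping the
# first level of each to avoid redundant columns.
preprocessor = ColumnTransformer([
    ("num", StandardScaler(), ["Age"]),
    ("cat", OneHotEncoder(drop="first"), ["Gender"]),
])
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)   # transform only, no refit
print(X_train_t.shape)  # (8, 2): scaled Age + 1 encoded Gender column
```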
06
MODEL COMPARISON
Baseline models
Logistic Regression · Decision Tree · Gradient Boosting
not selected
Random Forest
Best accuracy 93% · Highest F1 score across all models
selected ✓
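The comparison is a loop over the four candidates, scoring each on a held-out split. This sketch runs on synthetic data from `make_classification` (the real comparison uses the preprocessed Travel features), so the printed scores here are not the project's reported numbers.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data for the preprocessed features.
X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(random_state=42),
}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    print(f"{name:20s} acc={accuracy_score(y_te, pred):.3f} "
          f"f1={f1_score(y_te, pred):.3f}")
```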
🎯
07
Hyperparameter tuning
RandomizedSearchCV · n_iter=100 · cv=3 · Best: n_estimators=1000, max_features=7
RandomizedSearchCV n_iter=100 cv=3
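A sketch of the `RandomizedSearchCV` setup on synthetic data. The search space below is an assumption modeled on the reported best values, and `n_iter` is cut from 100 to 5 so the example finishes quickly.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Assumed search space; the real run used n_iter=100, cv=3 and landed
# on n_estimators=1000, max_features=7.
param_dist = {
    "n_estimators": [100, 500, 1000],
    "max_features": [3, 5, 7],
    "max_depth": [None, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_dist,
    n_iter=5,           # 100 in the real pipeline
    cv=3,
    random_state=42,
    n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_)
```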
📊
08
Evaluation
Confusion matrix · Classification report · Feature importance · ROC-AUC
Confusion matrix Feature importance ROC-AUC
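All four evaluation artifacts come from `sklearn.metrics` plus the fitted model's `feature_importances_`. A self-contained sketch on synthetic data (so the printed metrics are not the project's final numbers):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (classification_report, confusion_matrix,
                             roc_auc_score)
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(random_state=42).fit(X_tr, y_tr)
pred = model.predict(X_te)

# Confusion matrix and per-class precision/recall/F1.
print(confusion_matrix(y_te, pred))
print(classification_report(y_te, pred))

# ROC-AUC needs probabilities, not hard labels.
print("ROC-AUC:", roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]))

# Impurity-based feature importance, one value per input feature.
print("Importances:", model.feature_importances_.round(3))
```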
💾
09
Model export
Saved with joblib · holiday_package_classification_model.pkl + preprocessor.pkl
joblib .pkl deploy ready
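The export step is two `joblib.dump` calls, one per artifact, using the file names above. This sketch trains a throwaway model on synthetic data and writes to a temp directory (the real project presumably saves alongside the notebook), then reloads to verify the round trip.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import StandardScaler

# Throwaway model + preprocessor on synthetic data.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
scaler = StandardScaler().fit(X)
model = RandomForestClassifier(random_state=0).fit(scaler.transform(X), y)

# Persist model and preprocessor side by side, using the file names
# from the step above (temp dir here instead of the repo root).
out = tempfile.mkdtemp()
model_path = os.path.join(out, "holiday_package_classification_model.pkl")
joblib.dump(model, model_path)
joblib.dump(scaler, os.path.join(out, "preprocessor.pkl"))

# Reload to confirm the .pkl files round-trip cleanly.
reloaded = joblib.load(model_path)
print(reloaded.predict(scaler.transform(X[:3])))
```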

FINAL RESULTS
93%
ACCURACY
0.97
PRECISION
0.68
RECALL
0.80
F1 SCORE
.pkl
EXPORTED