🚗🏎️🚙 🔧💰📊🛣️

ML Pipeline

Resale Car Price Prediction
Data-driven price estimation for pre-owned vehicles

Random Forest Regression · End-to-end workflow


📦
01
Data collection
CarDekho dataset — 15,411 used car records scraped from cardekho.com with 13 features, target variable: selling_price (INR)
cardekho_imputated.csv · 15,411 rows · 13 features
🧹
02
Data cleaning
Pre-imputed dataset — zero null values found · Drop car_name and brand (redundant with model column)
No nulls · Drop car_name · Drop brand
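A minimal sketch of the loading and cleaning steps, assuming pandas. The inline sample stands in for cardekho_imputated.csv; its rows and the reduced column set are made up for illustration and are not taken from the real dataset.

```python
import io

import pandas as pd

# Tiny stand-in for cardekho_imputated.csv (illustrative rows, not real data).
SAMPLE_CSV = """car_name,brand,model,vehicle_age,km_driven,selling_price
Maruti Alto,Maruti,Alto,9,120000,120000
Ford Ecosport,Ford,Ecosport,6,30000,570000
Hyundai i20,Hyundai,i20,5,60000,550000
"""

# With the real file this would be pd.read_csv("cardekho_imputated.csv").
df = pd.read_csv(io.StringIO(SAMPLE_CSV))

# Confirm the data is already imputed: the total null count should be zero.
null_count = int(df.isnull().sum().sum())

# car_name and brand repeat information already carried by the model column.
df = df.drop(columns=["car_name", "brand"])

print(null_count)
print(df.columns.tolist())
```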
🔍
03
Exploratory data analysis
Price heavily right-skewed (₹2–15L range) · max_power most strongly correlated with price (0.75) · engine second (0.59) · vehicle_age & km_driven negatively correlated with price
Price distribution · Correlation heatmap · sns / plotly
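The two EDA checks above (skew of the target and correlation of each feature with price) could be computed roughly as follows. This is a sketch on synthetic data: the generating formula is an assumption chosen only to produce a right-skewed price, so the numbers it prints will not match the 0.75 and 0.59 reported for the real dataset.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (assumed generating process, not the CarDekho data).
rng = np.random.default_rng(0)
n = 500
max_power = rng.uniform(40, 200, n)
vehicle_age = rng.integers(1, 15, n)
noise = rng.lognormal(0.0, 0.3, n)
selling_price = 5000 * max_power * np.exp(-0.1 * vehicle_age) * noise

df = pd.DataFrame(
    {
        "max_power": max_power,
        "vehicle_age": vehicle_age,
        "selling_price": selling_price,
    }
)

# Positive skewness indicates a long right tail in the price distribution.
skewness = df["selling_price"].skew()

# Pearson correlation of every feature with the target, strongest first.
corr_with_price = (
    df.corr()["selling_price"].drop("selling_price").sort_values(ascending=False)
)

print(f"skewness: {skewness:.2f}")
print(corr_with_price)
```

The same df.corr() matrix is what a seaborn or plotly heatmap would visualise.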
04
Feature identification
7 numerical features · 4 categorical features · Label Encoding applied to model column (120 unique car models)
7 numerical · 4 categorical · LabelEncoder on model
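A small sketch of how the feature types might be identified and the model column label-encoded, assuming scikit-learn's LabelEncoder. The four rows below are invented for illustration; the real column has 120 distinct models.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Illustrative rows only (not real records).
df = pd.DataFrame(
    {
        "model": ["Swift", "Alto", "Ecosport", "Swift"],
        "fuel_type": ["Petrol", "Petrol", "Diesel", "Diesel"],
        "vehicle_age": [4, 9, 6, 2],
        "max_power": [82.0, 47.3, 98.6, 88.5],
    }
)

# Split columns by dtype: numeric columns versus everything else.
num_features = df.select_dtypes(include="number").columns.tolist()
cat_features = [c for c in df.columns if c not in num_features]

# LabelEncoder maps each distinct model name to an integer code
# (codes follow the sorted order of the class names).
encoder = LabelEncoder()
df["model"] = encoder.fit_transform(df["model"])

print(num_features, cat_features)
print(list(encoder.classes_), df["model"].tolist())
```

Label encoding keeps a high-cardinality column as a single integer feature instead of 120 one-hot columns, which tree-based models generally tolerate well.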
🔄
05
Preprocessing — ColumnTransformer
80/20 train-test split (random_state=42) · OneHotEncoder (drop=first) on seller_type, fuel_type, transmission_type · StandardScaler on numerical features
OneHotEncoder · StandardScaler · 80/20 split
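The preprocessing step could be wired up along these lines. This is a sketch under assumptions: the data is synthetic, the category values (for example "Trustmark Dealer", "CNG") are placeholders, and fitting the transformer on the training split only is a choice made here rather than something the source states.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names follow the pipeline description.
rng = np.random.default_rng(0)
n = 100
X = pd.DataFrame(
    {
        "model": rng.integers(0, 120, n),  # already label-encoded
        "vehicle_age": rng.integers(1, 15, n),
        "km_driven": rng.integers(5_000, 150_000, n),
        "engine": rng.integers(800, 3000, n),
        "max_power": rng.uniform(40, 200, n),
        "seller_type": np.resize(["Dealer", "Individual", "Trustmark Dealer"], n),
        "fuel_type": np.resize(["Petrol", "Diesel", "CNG"], n),
        "transmission_type": np.resize(["Manual", "Automatic"], n),
    }
)
y = pd.Series(rng.uniform(1e5, 15e5, n), name="selling_price")

cat_cols = ["seller_type", "fuel_type", "transmission_type"]
num_cols = [c for c in X.columns if c not in cat_cols]

# 80/20 split with a fixed seed for reproducibility.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# drop="first" removes one dummy column per categorical feature.
preprocessor = ColumnTransformer(
    [
        ("onehot", OneHotEncoder(drop="first"), cat_cols),
        ("scale", StandardScaler(), num_cols),
    ]
)

# Fit on the training split only, then reuse the fitted transformer on test.
X_train_t = preprocessor.fit_transform(X_train)
X_test_t = preprocessor.transform(X_test)

print(X_train_t.shape, X_test_t.shape)
```

With three, three and two categories, drop="first" yields 2 + 2 + 1 dummy columns, which together with the five scaled numeric columns gives ten output columns in this sketch.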
06
Model comparison
Lower performers
Linear Regression, Lasso, Ridge: Test R² 0.66 · KNN: Test R² 0.92 · Decision Tree: overfit (Train R² 0.99 → Test R² 0.88)
not selected
Random Forest Regressor
Best test R² 0.93 · Minimal overfitting · Strongest generalisation across all metrics
selected ✓
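A comparison of the six models could be run with a loop like the one below. It is a sketch, not the original notebook: the data comes from scikit-learn's make_regression, all models use default hyperparameters (apart from a smaller forest for speed), and because the synthetic target is linear the ranking it prints will differ from the results reported above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Lasso, LinearRegression, Ridge
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression problem standing in for the preprocessed car data.
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Lasso": Lasso(),
    "Ridge": Ridge(),
    "KNN": KNeighborsRegressor(),
    "Decision Tree": DecisionTreeRegressor(random_state=42),
    "Random Forest": RandomForestRegressor(n_estimators=50, random_state=42),
}

# Record train and test R² per model; a large gap signals overfitting.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    results[name] = {
        "train_r2": r2_score(y_train, model.predict(X_train)),
        "test_r2": r2_score(y_test, model.predict(X_test)),
    }

for name, scores in results.items():
    print(f"{name:18s} train R2 = {scores['train_r2']:.2f}  test R2 = {scores['test_r2']:.2f}")
```

Comparing train and test R² side by side is what exposes the Decision Tree pattern noted above: a near-perfect training score with a clearly lower test score.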
🎯
07
Hyperparameter tuning
RandomizedSearchCV · n_iter=100 · cv=3 · Best: n_estimators=1000, max_features=7, max_depth=None, min_samples_split=2
RandomizedSearchCV · n_iter=100 · cv=3
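The tuning step could look roughly like this. The search space below is hypothetical (the source only reports the best values), the data is synthetic, and n_iter is cut to 5 with small forests so the sketch runs quickly; the reported run used n_iter=100.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data with 10 features so that max_features=7 is a valid choice.
X, y = make_regression(n_samples=200, n_features=10, noise=10.0, random_state=42)

# Hypothetical search space; the original candidate values are not given.
param_distributions = {
    "n_estimators": [50, 100],
    "max_features": [3, 5, 7],
    "max_depth": [None, 5, 10],
    "min_samples_split": [2, 8, 15],
}

search = RandomizedSearchCV(
    estimator=RandomForestRegressor(random_state=42),
    param_distributions=param_distributions,
    n_iter=5,  # the reported run used n_iter=100
    cv=3,
    random_state=42,
)
search.fit(X, y)

best_params = search.best_params_
print(best_params)
print(f"mean CV R2 of best candidate: {search.best_score_:.3f}")
```

RandomizedSearchCV samples n_iter parameter combinations rather than trying all of them, so its cost is n_iter × cv fits regardless of how large the search space is.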
📊
08
Evaluation
RMSE ₹2,12,015 · MAE ₹98,050 · R² 0.9403 on test set · Top predictors: max_power, engine, vehicle_age, km_driven
R² Score · RMSE / MAE · Feature importance
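For reference, the three reported metrics can be computed by hand as below. The four prices are invented for illustration and are not model outputs; sklearn.metrics offers equivalent functions.

```python
import numpy as np

# Illustrative actual and predicted prices in INR (made-up values).
y_true = np.array([300_000, 500_000, 700_000, 900_000], dtype=float)
y_pred = np.array([200_000, 500_000, 800_000, 900_000], dtype=float)

errors = y_true - y_pred

# MAE: average absolute error, in the same unit as the target.
mae = np.mean(np.abs(errors))

# RMSE: square root of the mean squared error; penalises large misses more.
rmse = np.sqrt(np.mean(errors**2))

# R²: one minus residual sum of squares over total sum of squares.
ss_res = np.sum(errors**2)
ss_tot = np.sum((y_true - y_true.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

print(f"MAE  = {mae:,.0f}")
print(f"RMSE = {rmse:,.0f}")
print(f"R2   = {r2:.2f}")
```

Because RMSE squares the errors before averaging, a few large misses push it well above MAE, which is one plausible reading of the gap between the reported RMSE and MAE.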
💾
09
Model export
Saved with joblib · car_price_predictor.pkl + preprocessor.pkl · Sample prediction: Ford Ecosport (6yr) → ₹6,08,500
joblib · .pkl · deploy ready
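A minimal sketch of the export and reload round trip with joblib. The file names match those above; the toy data, the StandardScaler used as a stand-in preprocessor, and the temporary output directory are assumptions for illustration.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.preprocessing import StandardScaler

# Toy data and a stand-in preprocessor (the real one is a ColumnTransformer).
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))
y = X @ np.array([3.0, -2.0, 1.0])

preprocessor = StandardScaler().fit(X)
model = RandomForestRegressor(n_estimators=20, random_state=42)
model.fit(preprocessor.transform(X), y)

# Persist both artefacts; the preprocessor must ship with the model.
out_dir = tempfile.mkdtemp()
model_path = os.path.join(out_dir, "car_price_predictor.pkl")
prep_path = os.path.join(out_dir, "preprocessor.pkl")
joblib.dump(model, model_path)
joblib.dump(preprocessor, prep_path)

# Reload and check that predictions survive the round trip.
loaded_model = joblib.load(model_path)
loaded_prep = joblib.load(prep_path)
original = model.predict(preprocessor.transform(X[:5]))
restored = loaded_model.predict(loaded_prep.transform(X[:5]))

print(np.allclose(original, restored))
```

Saving the preprocessor alongside the model matters because new inputs must pass through the same fitted encoding and scaling before prediction. Files written by joblib should only be loaded from trusted sources, and ideally with the same scikit-learn version that produced them.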

FINAL RESULTS — TUNED RANDOM FOREST
Test R²: 0.94
RMSE: ₹2.12L
MAE: ₹0.98L
Models tested: 6
Exported: .pkl