Framingham Heart Study: EDA and Classification Model
- Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
- Project Focus: Exploratory data analysis and predicting heart disease risk using a decision tree classifier
- GitHub Repository: Project Link
This project performs an exploratory data analysis (EDA) of the Framingham Heart Study dataset and builds a decision tree classifier to predict the risk of heart disease. Key highlights of the project include:
- Data Cleaning and Preprocessing: Handled missing values and outliers, and scaled features to prepare the dataset for analysis and modeling.
- EDA Insights:
  - Examined the impact of key factors like age, cholesterol levels, blood pressure, and smoking habits on heart disease risk.
  - Visualized correlations using heatmaps, histograms, and pair plots to uncover hidden patterns in the data.
  - Identified trends in gender-specific risk factors for cardiovascular diseases.
- Model Development: Built a decision tree classifier that reached 90% accuracy on the test set after hyperparameter tuning and feature selection.
- Evaluation: Assessed model performance using metrics such as precision, recall, F1-score, and ROC-AUC to
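
The modeling and evaluation steps listed above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it runs on synthetic data from `make_classification` rather than the Framingham dataset, and the class balance, the median imputation, the parameter grid, and the choice of ROC-AUC as the tuning score are all assumptions made for the example.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data (assumption): 1,000 rows, 8 numeric
# features, and roughly 15% positive cases to mimic an imbalanced outcome.
X, y = make_classification(n_samples=1000, n_features=8, n_informative=4,
                           weights=[0.85, 0.15], random_state=0)

# Blank out about 5% of the values to imitate missing measurements.
rng = np.random.default_rng(0)
X[rng.random(X.shape) < 0.05] = np.nan

# Stratify so the train and test sets keep the same class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Median imputation followed by a decision tree. Scaling is left out here
# because tree splits do not depend on the scale of a feature.
pipeline = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("tree", DecisionTreeClassifier(random_state=0)),
])

# Hypothetical hyperparameter grid, searched with 5-fold cross-validation.
param_grid = {
    "tree__max_depth": [3, 5, 8],
    "tree__min_samples_leaf": [5, 20, 50],
}
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=5)
search.fit(X_train, y_train)

# Hard predictions for precision/recall/F1, probabilities for ROC-AUC.
y_pred = search.predict(X_test)
y_score = search.predict_proba(X_test)[:, 1]

metrics = {
    "accuracy": accuracy_score(y_test, y_pred),
    "precision": precision_score(y_test, y_pred, zero_division=0),
    "recall": recall_score(y_test, y_pred, zero_division=0),
    "f1": f1_score(y_test, y_pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, y_score),
}
print("best parameters:", search.best_params_)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting precision, recall, F1-score, and ROC-AUC alongside accuracy matters when the positive class is rare, since a model that predicts the majority class for every row can still score a high accuracy.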