Framingham Heart Study: EDA and Classification Model

  • Tech Stack: Python, Pandas, NumPy, Matplotlib, Seaborn, Scikit-learn
  • Project Focus: Exploratory data analysis and predicting heart disease risk using a decision tree classifier
  • GitHub Repository: Project Link
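The workflow described in this project (cleaning, scaling, a tuned decision tree, and evaluation with precision, recall, F1-score, and ROC-AUC) can be sketched roughly as follows with scikit-learn. This is not the project's code. The data frame is synthetic, and its column names (`age`, `male`, `cigsPerDay`, `totChol`, `sysBP`, `TenYearCHD`) are assumptions chosen to resemble the commonly distributed Framingham extract. The percentile clipping, the `SelectKBest` feature-selection step, and the hyperparameter grid are also illustrative choices.

```python
import numpy as np
import pandas as pd
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import f1_score, precision_score, recall_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the Framingham data (assumed column names, random values).
rng = np.random.default_rng(0)
n = 1000
df = pd.DataFrame({
    "age": rng.integers(32, 71, n),
    "male": rng.integers(0, 2, n),
    "cigsPerDay": rng.poisson(5, n).astype(float),
    "totChol": rng.normal(237, 45, n),
    "sysBP": rng.normal(132, 22, n),
})
# Simulated label: risk rises with age, blood pressure, and smoking.
logit = -9 + 0.08 * df["age"] + 0.02 * df["sysBP"] + 0.03 * df["cigsPerDay"]
df["TenYearCHD"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
# Blank out ~5% of cholesterol values to mimic missing data.
df.loc[rng.random(n) < 0.05, "totChol"] = np.nan

X = df.drop(columns="TenYearCHD")
y = df["TenYearCHD"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Outlier handling (assumed approach): clip each column to the 1st/99th
# percentiles computed on the training split only, so the test set does not leak in.
low, high = X_train.quantile(0.01), X_train.quantile(0.99)
X_train = X_train.clip(low, high, axis=1)
X_test = X_test.clip(low, high, axis=1)

pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # fill missing values with column medians
    ("scale", StandardScaler()),  # trees do not need scaling; kept to mirror the described preprocessing
    ("select", SelectKBest(score_func=f_classif)),  # univariate feature selection
    ("tree", DecisionTreeClassifier(random_state=42)),
])
param_grid = {
    "select__k": [3, 5],
    "tree__max_depth": [3, 5, 7],
    "tree__min_samples_leaf": [5, 20, 50],
}
# Tune feature count and tree hyperparameters jointly with 5-fold cross-validation.
grid = GridSearchCV(pipe, param_grid, scoring="roc_auc", cv=5)
grid.fit(X_train, y_train)

pred = grid.predict(X_test)
proba = grid.predict_proba(X_test)[:, 1]  # probability of the positive class
metrics = {
    "precision": precision_score(y_test, pred, zero_division=0),
    "recall": recall_score(y_test, pred, zero_division=0),
    "f1": f1_score(y_test, pred, zero_division=0),
    "roc_auc": roc_auc_score(y_test, proba),
}
print(grid.best_params_)
print(metrics)
```

Putting imputation, scaling, and selection inside the `Pipeline` means each cross-validation fold fits them on its own training portion. Reporting precision, recall, and ROC-AUC alongside accuracy matters here because positive cases are a minority in the simulated labels.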

This project combines an extensive exploratory data analysis (EDA) of the Framingham Heart Study dataset with a decision tree classifier that predicts heart disease risk. Key highlights include:

  • Data Cleaning and Preprocessing: Handled missing values and outliers and applied feature scaling to prepare the dataset for analysis and modeling.
  • EDA Insights:
    • Examined the impact of key factors like age, cholesterol levels, blood pressure, and smoking habits on heart disease risk.
    • Visualized correlations using heatmaps, histograms, and pair plots to uncover hidden patterns in the data.
    • Identified gender-specific trends in cardiovascular risk factors.
  • Model Development: Built a decision tree classifier that reached 90% accuracy on the test set after tuning its hyperparameters and selecting features.
  • Evaluation: Assessed model performance using metrics such as precision, recall, F1-score, and ROC-AUC to