Diabetes Risk Predictor
Supporting SDG 3: Good Health & Well-being
An offline machine learning application designed to increase diabetes risk awareness and promote early health screening in underserved communities.
Samuel Kamawira • Diabetes-Risk-Predictor Repository
The Problem – Rising Diabetes Risk
Diabetes has become one of the world's most prevalent non-communicable diseases, affecting millions globally. Yet many individuals remain unaware of their risk until serious complications emerge.
Critical Challenges
  • Limited awareness of personal diabetes risk factors
  • Serious complications: heart disease, kidney failure, vision loss
  • Delayed diagnosis leading to preventable health impacts
  • Insufficient access to digital health tools, especially offline solutions
SDG 3 – Good Health & Well-being
SDG 3 Mission
Ensure healthy lives and promote well-being for all at all ages across the globe.
Target 3.4
By 2030, reduce premature mortality from non-communicable diseases such as diabetes by one-third through prevention and treatment.
Prevention Focus
Early risk awareness and screening are essential tools for preventing diabetes and its complications.

This project directly supports SDG 3 by promoting awareness of diabetes risk and encouraging proactive health monitoring and early medical check-ups.
Our Solution – Diabetes Risk Predictor
1
Machine Learning Model
Estimates type 2 diabetes risk using a logistic regression classifier trained on the Pima Indians Diabetes dataset.
2
Simple Health Inputs
Uses 8 numeric inputs: pregnancies, glucose, blood pressure, skin thickness, insulin, BMI, diabetes pedigree function, and age.
3
Offline Capability
Built as a Streamlit web app that is served from the user's own machine and runs completely offline; no internet connection is required.
4
Educational Tool
Designed for awareness and health education, not as a clinical diagnostic system.
Users input basic health data and receive an estimated risk level along with general health tips and recommendations for medical consultation.
How It Works – Under the Hood
Data & Preprocessing
  • Dataset: Pima Indians Diabetes dataset with 768 records
  • Data cleaning: Physiologically implausible zero values (for example, a glucose level or BMI of 0) are treated as missing and replaced with the column median
  • Standardization: Features scaled using StandardScaler for optimal model performance
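The two preprocessing steps above can be sketched in plain Python. This is a minimal illustration, not the project's code: the list of columns in which 0 counts as missing is an assumption (the slides do not name them), the helper names are hypothetical, and the toy rows are invented. The project itself uses StandardScaler, which scales with the population standard deviation in the same way as `pstdev` below.

```python
from statistics import mean, median, pstdev

# Assumption: columns where a value of 0 is treated as "missing".
ZERO_AS_MISSING = ("Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI")


def impute_zeros_with_median(rows, columns=ZERO_AS_MISSING):
    """Return copies of rows with zeros replaced by the median of the non-zero values."""
    cleaned = [dict(row) for row in rows]
    for col in columns:
        observed = [row[col] for row in rows if row[col] != 0]
        if not observed:
            continue  # no valid values to compute a median from
        fill = median(observed)
        for row in cleaned:
            if row[col] == 0:
                row[col] = fill
    return cleaned


def standardize(values):
    """Scale values to zero mean and unit variance (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]  # constant column: nothing to scale
    return [(v - mu) / sigma for v in values]


# Toy rows, invented for illustration only (not taken from the dataset).
rows = [
    {"Glucose": 100, "BMI": 30.0},
    {"Glucose": 0, "BMI": 25.0},
    {"Glucose": 140, "BMI": 0},
    {"Glucose": 120, "BMI": 35.0},
]
cleaned = impute_zeros_with_median(rows, columns=("Glucose", "BMI"))
glucose_scaled = standardize([row["Glucose"] for row in cleaned])
```

In a typical pipeline the median, mean, and standard deviation would be computed on the training split only and then reused for new inputs, so that the app scales a user's values exactly as the model saw them during training.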
Model & Evaluation
  • Algorithm: Logistic Regression for binary classification
  • Training: 80/20 train-test split, with the held-out 20% reserved for evaluation
  • Performance: Achieves reasonable accuracy and recall suitable for educational purposes
The model prioritizes interpretability and ease of deployment while maintaining effectiveness as an awareness tool.
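To make the split and the two metrics concrete, here is a small stdlib-only sketch. The function names, the seed, and the toy labels are assumptions made for illustration; the project's training script may compute these differently.

```python
import random


def train_test_split_80_20(records, seed=42):
    """Shuffle a copy of the records and split it 80/20 (seed is an arbitrary choice)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def accuracy(y_true, y_pred):
    """Share of predictions that match the true label."""
    if not y_true:
        raise ValueError("y_true must not be empty")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def recall(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted positive."""
    true_pos = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    actual_pos = sum(t == 1 for t in y_true)
    return true_pos / actual_pos if actual_pos else 0.0


# With 768 records, an 80/20 split gives 614 training and 154 test records.
train, test = train_test_split_80_20(range(768))

# Toy labels, invented for illustration: one positive case is missed.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 0, 1, 0]
acc = accuracy(y_true, y_pred)  # 7 of 8 correct
rec = recall(y_true, y_pred)    # 3 of 4 positives found
```

Recall is worth reporting next to accuracy for a screening-style tool, because a missed positive (a person at risk who is told "No") is the more costly error. In the toy example accuracy is 0.875 while recall is only 0.75.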
App Interface – Prediction Tab
User Input Parameters
Pregnancies, Glucose, Blood Pressure
Skin Thickness, Insulin levels
BMI, Diabetes Pedigree Function
Age
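One way to carry these eight inputs from the form to the model is a small record type whose field order matches the dataset's feature columns. The class name, method names, and plausibility checks below are hypothetical, and the example values are invented; this is a sketch of the idea rather than the app's actual input handling.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True)
class HealthInputs:
    # Field order follows the input list above, which matches the dataset columns.
    pregnancies: int
    glucose: float
    blood_pressure: float
    skin_thickness: float
    insulin: float
    bmi: float
    diabetes_pedigree_function: float
    age: int

    def to_feature_vector(self):
        """Return the inputs as a list of floats in model column order."""
        return [float(value) for value in astuple(self)]

    def problems(self):
        """Return a list of issues; an empty list means the inputs look plausible."""
        issues = []
        if self.pregnancies < 0:
            issues.append("pregnancies must be 0 or more")
        if self.age <= 0:
            issues.append("age must be positive")
        for name in ("glucose", "blood_pressure", "bmi"):
            if getattr(self, name) <= 0:
                issues.append(f"{name} must be positive")
        return issues


# Example values are invented for illustration.
example = HealthInputs(2, 120, 70, 20, 80, 28.5, 0.35, 45)
vector = example.to_feature_vector()
```

Checking inputs before prediction keeps the same implausible zeros that were cleaned out of the training data from reaching the model at prediction time.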
Output & Guidance
  • Predicted diabetes risk (Yes/No)
  • Probability score (0–1 scale)
  • Risk level: Low / Medium / High
  • General health tips and recommendations
  • Clear medical disclaimer
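The outputs listed above can all be derived from a single probability. The sketch below shows how a logistic regression turns scaled inputs into a probability, and how that probability could be mapped to a Yes/No prediction and a risk band. The 0.5 decision threshold and the 0.33 / 0.66 band cut-offs are assumptions chosen for illustration; the slides do not state the values the app uses, and all function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    exp_z = math.exp(z)
    return exp_z / (1.0 + exp_z)


def predict_probability(scaled_features, weights, bias):
    """Logistic regression: sigmoid of the weighted sum of the scaled inputs."""
    z = bias + sum(w * x for w, x in zip(weights, scaled_features))
    return sigmoid(z)


def risk_level(probability, medium_from=0.33, high_from=0.66):
    """Map a probability to Low / Medium / High (cut-offs are hypothetical)."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability >= high_from:
        return "High"
    if probability >= medium_from:
        return "Medium"
    return "Low"


def summarize(probability, threshold=0.5):
    """Build the values shown on the prediction screen (threshold is an assumption)."""
    return {
        "prediction": "Yes" if probability >= threshold else "No",
        "probability": round(probability, 3),
        "risk_level": risk_level(probability),
    }
```

For example, `summarize(0.7)` yields a "Yes" prediction in the "High" band under these assumed cut-offs. Keeping the thresholds as named parameters makes it easy to adjust them without touching the model.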
The main prediction screen provides users with immediate risk assessment and actionable health guidance.
App Interface – About & SDG 3 Tab
Project Context
Explains the project goal, methodology, and educational purpose of the diabetes risk assessment tool.
SDG 3 Connection
Details how the application supports Good Health & Well-being through risk awareness and prevention.
Clear Disclaimer
Emphasizes that this is not a medical diagnosis tool and users should consult healthcare professionals.
This transparency tab ensures users understand the tool's educational purpose and its explicit connection to sustainable development goals.
Usage Stats & Data Logging
Local Data Collection
Every prediction is appended to a local file, usage_log.csv, for later analysis. The log stays on the user's machine and is never sent over the network.
Logged Information
  • Timestamp of prediction
  • Input parameters
  • Predicted class and probability
  • Calculated risk level
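A local append-only log with these fields can be written using Python's csv module. The column names, the function name, and the timestamp format below are assumptions; only the file name usage_log.csv comes from the project.

```python
import csv
from datetime import datetime
from pathlib import Path

# Assumed column layout: timestamp, the eight inputs, then the outputs.
LOG_FIELDS = [
    "timestamp",
    "Pregnancies", "Glucose", "BloodPressure", "SkinThickness",
    "Insulin", "BMI", "DiabetesPedigreeFunction", "Age",
    "predicted_class", "probability", "risk_level",
]


def log_prediction(inputs, predicted_class, probability, risk_level,
                   path="usage_log.csv"):
    """Append one prediction to the local CSV log, writing a header if the file is new."""
    log_path = Path(path)
    write_header = not log_path.exists() or log_path.stat().st_size == 0
    row = {
        "timestamp": datetime.now().isoformat(timespec="seconds"),
        **inputs,
        "predicted_class": predicted_class,
        "probability": probability,
        "risk_level": risk_level,
    }
    with log_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=LOG_FIELDS)
        if write_header:
            writer.writeheader()
        writer.writerow(row)
    return row
```

Calling `log_prediction(inputs, 1, 0.72, "High")` with a dict of the eight inputs would add one line to usage_log.csv in the working directory. Because the file is opened in append mode, earlier entries are never rewritten.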
Analytics Dashboard
The Usage Stats tab provides insights into application usage patterns:
Total Predictions
Count of all risk assessments
Avg Probability
Mean diabetes risk score
An interactive bar chart shows the distribution of predictions across the Low, Medium, and High risk levels.
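The three dashboard figures can be computed from the log with a few lines of standard-library Python. The column names `probability` and `risk_level` and both function names are assumptions about the log layout, not confirmed details of the app.

```python
import csv
from collections import Counter
from statistics import mean

RISK_LEVELS = ("Low", "Medium", "High")


def load_log(path="usage_log.csv"):
    """Read the local usage log into a list of dicts (one per prediction)."""
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def usage_stats(rows):
    """Compute total predictions, mean probability, and counts per risk level."""
    rows = list(rows)
    probabilities = [float(row["probability"]) for row in rows]
    counts = Counter(row["risk_level"] for row in rows)
    return {
        "total_predictions": len(rows),
        "avg_probability": mean(probabilities) if probabilities else None,
        # Always report all three levels so the bar chart keeps a fixed order.
        "risk_distribution": {level: counts.get(level, 0) for level in RISK_LEVELS},
    }
```

The `risk_distribution` dict maps directly onto a bar chart with one bar per level, and an empty log produces zero counts rather than an error.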
Limitations & Future Improvements
Current Limitations
  • Dataset drawn from a single population (Pima Indian women), which limits generalizability
  • Uses only a small set of numeric features
  • Model lacks clinical validation—strictly for educational use
  • Does not account for genetic diversity or regional health variations
Future Enhancements
  • Evaluate alternative models (Random Forest, XGBoost) for improved accuracy
  • Add visual feature importance explanations using SHAP or LIME
  • Incorporate more diverse datasets for better generalization
  • Develop multilingual support for broader accessibility
  • Create mobile-responsive version for smartphone deployment
These acknowledged limitations and planned improvements demonstrate a realistic, iterative approach to health technology development.
Deployment & Conclusion
01
Repository Access
Complete project available on GitHub: Diabetes-Risk-Predictor
02
Model Training
Run python train_model.py to train or retrain the model
03
Application Launch
Execute streamlit run app.py to start the web interface
04
Offline Operation
Fully functional without internet—no external APIs required
05
Package Submission
Bundled as Diabetes_Risk_Predictor_Samuel_Kamawira.zip

This project demonstrates how machine learning and accessible web applications can support SDG 3 by raising awareness about diabetes risk and encouraging early health screenings—particularly in communities with limited access to digital health infrastructure.