# Small Diabetes Study
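In this assignment, we will work with a small dataset of diabetes patients taken from [the original data source](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html). Each row describes one patient. The columns are:

* age (AGE)
* sex (SEX)
* body mass index (BMI)
* average blood pressure (BP)
* six blood serum measurements (S1 to S6)
* a quantitative measure of disease progression one year after baseline (Y)

The first rows look like this:
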
| | AGE | SEX | BMI | BP | S1 | S2 | S3 | S4 | S5 | S6 | Y |
|---|-----|-----|-----|----|----|----|----|----|----|----|----|
| 0 | 59 | 2 | 32.1 | 101. | 157 | 93.2 | 38.0 | 4. | 4.8598 | 87 | 151 |
| 1 | 48 | 1 | 21.6 | 87.0 | 183 | 103.2 | 70. | 3. | 3.8918 | 69 | 75 |
| 2 | 72 | 2 | 30.5 | 93.0 | 156 | 93.6 | 41.0 | 4.0 | 4. | 85 | 141 |
| ... | ... | ... | ... | ...| ...| ...| ...| ...| ...| ...| ... |
## Instructions
* Open the [assignment notebook](assignment.ipynb) in a Jupyter Notebook environment.
* Complete all tasks listed in the notebook, namely:
  * [ ] Compute the mean and variance of every variable
  * [ ] Plot boxplots of BMI, BP and Y for each sex
  * [ ] What is the distribution of the AGE, SEX, BMI and Y variables?
  * [ ] Test the correlation between the different variables and disease progression (Y)
  * [ ] Test the hypothesis that the degree of diabetes progression differs between men and women
## Rubric
| Exemplary | Adequate | Needs Improvement |
| --- | --- | --- |
| All required tasks are complete, graphically illustrated and explained | Most of the tasks are complete, but explanations or takeaways from the graphs and/or computed values are missing | Only basic tasks, such as computing the mean/variance and basic plots, are complete; no conclusions are drawn from the data |