{
"cells": [
{
"cell_type": "markdown",
"source": [
"## Introduction to Probability and Statistics\n",
"## Assignment\n",
"\n",
"In this assignment, we will use the diabetes patient dataset taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html).\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 13,
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"\n",
"# Load the tab-separated diabetes dataset and preview the first five rows\n",
"df = pd.read_csv(\"../../data/diabetes.tsv\", sep='\\t')\n",
"df.head()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n",
"0 59 2 32.1 101.0 157 93.2 38.0 4.0 4.8598 87 151\n",
"1 48 1 21.6 87.0 183 103.2 70.0 3.0 3.8918 69 75\n",
"2 72 2 30.5 93.0 156 93.6 41.0 4.0 4.6728 85 141\n",
"3 24 1 25.3 84.0 198 131.4 40.0 5.0 4.8903 89 206\n",
"4 50 1 23.0 101.0 192 125.4 52.0 4.0 4.2905 80 135"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>AGE</th>\n",
" <th>SEX</th>\n",
" <th>BMI</th>\n",
" <th>BP</th>\n",
" <th>S1</th>\n",
" <th>S2</th>\n",
" <th>S3</th>\n",
" <th>S4</th>\n",
" <th>S5</th>\n",
" <th>S6</th>\n",
" <th>Y</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>59</td>\n",
" <td>2</td>\n",
" <td>32.1</td>\n",
" <td>101.0</td>\n",
" <td>157</td>\n",
" <td>93.2</td>\n",
" <td>38.0</td>\n",
" <td>4.0</td>\n",
" <td>4.8598</td>\n",
" <td>87</td>\n",
" <td>151</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>48</td>\n",
" <td>1</td>\n",
" <td>21.6</td>\n",
" <td>87.0</td>\n",
" <td>183</td>\n",
" <td>103.2</td>\n",
" <td>70.0</td>\n",
" <td>3.0</td>\n",
" <td>3.8918</td>\n",
" <td>69</td>\n",
" <td>75</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>72</td>\n",
" <td>2</td>\n",
" <td>30.5</td>\n",
" <td>93.0</td>\n",
" <td>156</td>\n",
" <td>93.6</td>\n",
" <td>41.0</td>\n",
" <td>4.0</td>\n",
" <td>4.6728</td>\n",
" <td>85</td>\n",
" <td>141</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>24</td>\n",
" <td>1</td>\n",
" <td>25.3</td>\n",
" <td>84.0</td>\n",
" <td>198</td>\n",
" <td>131.4</td>\n",
" <td>40.0</td>\n",
" <td>5.0</td>\n",
" <td>4.8903</td>\n",
" <td>89</td>\n",
" <td>206</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>50</td>\n",
" <td>1</td>\n",
" <td>23.0</td>\n",
" <td>101.0</td>\n",
" <td>192</td>\n",
" <td>125.4</td>\n",
" <td>52.0</td>\n",
" <td>4.0</td>\n",
" <td>4.2905</td>\n",
" <td>80</td>\n",
" <td>135</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
]
},
"metadata": {},
"execution_count": 13
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"In this dataset, the columns are as follows:\n",
"* Age (AGE) and sex (SEX) are self-explanatory\n",
"* BMI is the body mass index\n",
"* BP is the average blood pressure\n",
"* S1 through S6 are different blood measurements\n",
"* Y is a quantitative measure of disease progression over one year\n",
"\n",
"Let's study this dataset using methods of probability and statistics.\n",
"\n",
"### Task 1: Compute the mean and variance of every column\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"### Task 2: Plot boxplots of BMI, BP and Y grouped by sex\n"
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"### Task 4: Test the correlation between the different variables and disease progression (Y)\n",
"\n",
"> **Hint** The correlation matrix will give you the most useful information about which values are dependent.\n"
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {}
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"
]
}
],
"metadata": {
"orig_nbformat": 4,
"language_info": {
"name": "python",
"version": "3.8.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3.8.8 64-bit (conda)"
},
"interpreter": {
"hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
},
"coopTranslator": {
"original_hash": "6d945fd15163f60cb473dbfe04b2d100",
"translation_date": "2025-09-06T17:26:33+00:00",
"source_file": "1-Introduction/04-stats-and-probability/assignment.ipynb",
"language_code": "br"
}
},
"nbformat": 4,
"nbformat_minor": 2
}