replaced Height and Weight

pull/524/head
MitrovicRelja 2 years ago
parent f904ef1480
commit 76d5ac666e

8
.idea/.gitignore vendored

@ -0,0 +1,8 @@
# Default ignored files
/shelf/
/workspace.xml
# Editor-based HTTP Client requests
/httpRequests/
# Datasource local storage ignored files
/dataSources/
/dataSources.local.xml

@ -0,0 +1,11 @@
<?xml version="1.0" encoding="UTF-8"?>
<module type="JAVA_MODULE" version="4">
<component name="NewModuleRootManager" inherit-compiler-output="true">
<exclude-output />
<content url="file://$MODULE_DIR$">
<excludeFolder url="file://$MODULE_DIR$/venv" />
</content>
<orderEntry type="jdk" jdkName="Python 3.9 (Data-Science-For-Beginners)" jdkType="Python SDK" />
<orderEntry type="sourceFolder" forTests="false" />
</component>
</module>

@ -0,0 +1,6 @@
<component name="InspectionProjectProfileManager">
<profile version="1.0">
<option name="myName" value="Project Default" />
<inspection_tool class="Eslint" enabled="true" level="WARNING" enabled_by_default="true" />
</profile>
</component>

@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="JpaBuddyIdeaProjectConfig">
<option name="renamerInitialized" value="true" />
</component>
</project>

@ -0,0 +1,7 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectRootManager" version="2" project-jdk-name="Python 3.9 (Data-Science-For-Beginners)" project-jdk-type="Python SDK" />
<component name="ProjectType">
<option name="id" value="jpab" />
</component>
</project>

@ -0,0 +1,8 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="ProjectModuleManager">
<modules>
<module fileurl="file://$PROJECT_DIR$/.idea/Data-Science-For-Beginners.iml" filepath="$PROJECT_DIR$/.idea/Data-Science-For-Beginners.iml" />
</modules>
</component>
</project>

@ -0,0 +1,6 @@
<?xml version="1.0" encoding="UTF-8"?>
<project version="4">
<component name="VcsDirectoryMappings">
<mapping directory="" vcs="Git" />
</component>
</project>

@ -3,9 +3,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"## Introduction to Probability and Statistics\r\n", "## Introduction to Probability and Statistics\n",
"## Assignment\r\n", "## Assignment\n",
"\r\n", "\n",
"In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)." "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)."
], ],
"metadata": {} "metadata": {}
@ -14,10 +14,10 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": 13,
"source": [ "source": [
"import pandas as pd\r\n", "import pandas as pd\n",
"import numpy as np\r\n", "import numpy as np\n",
"\r\n", "\n",
"df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\r\n", "df = pd.read_csv(\"../../data/diabetes.tsv\",sep='\\t')\n",
"df.head()" "df.head()"
], ],
"outputs": [ "outputs": [
@ -149,16 +149,16 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"\r\n", "\n",
"In this dataset, columns as the following:\r\n", "In this dataset, columns as the following:\n",
"* Age and sex are self-explanatory\r\n", "* Age and sex are self-explanatory\n",
"* BMI is body mass index\r\n", "* BMI is body mass index\n",
"* BP is average blood pressure\r\n", "* BP is average blood pressure\n",
"* S1 through S6 are different blood measurements\r\n", "* S1 through S6 are different blood measurements\n",
"* Y is the qualitative measure of disease progression over one year\r\n", "* Y is the qualitative measure of disease progression over one year\n",
"\r\n", "\n",
"Let's study this dataset using methods of probability and statistics.\r\n", "Let's study this dataset using methods of probability and statistics.\n",
"\r\n", "\n",
"### Task 1: Compute mean values and variance for all values" "### Task 1: Compute mean values and variance for all values"
], ],
"metadata": {} "metadata": {}
@ -201,8 +201,8 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"### Task 4: Test the correlation between different variables and disease progression (Y)\r\n", "### Task 4: Test the correlation between different variables and disease progression (Y)\n",
"\r\n", "\n",
"> **Hint** Correlation matrix would give you the most useful information on which values are dependent." "> **Hint** Correlation matrix would give you the most useful information on which values are dependent."
], ],
"metadata": {} "metadata": {}
@ -249,4 +249,4 @@
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2
} }

File diff suppressed because one or more lines are too long

@ -3,9 +3,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"## Introduction to Probability and Statistics\r\n", "## Introduction to Probability and Statistics\n",
"## Assignment\r\n", "## Assignment\n",
"\r\n", "\n",
"In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)." "In this assignment, we will use the dataset of diabetes patients taken [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html)."
], ],
"metadata": {} "metadata": {}
@ -14,11 +14,11 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 13, "execution_count": 13,
"source": [ "source": [
"import pandas as pd\r\n", "import pandas as pd\n",
"import numpy as np\r\n", "import numpy as np\n",
"import matplotlib.pyplot as plt\r\n", "import matplotlib.pyplot as plt\n",
"\r\n", "\n",
"df = pd.read_csv(\"../../../data/diabetes.tsv\",sep='\\t')\r\n", "df = pd.read_csv(\"../../../data/diabetes.tsv\",sep='\\t')\n",
"df.head()" "df.head()"
], ],
"outputs": [ "outputs": [
@ -150,16 +150,16 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"\r\n", "\n",
"In this dataset, columns as the following:\r\n", "In this dataset, columns as the following:\n",
"* Age and sex are self-explanatory\r\n", "* Age and sex are self-explanatory\n",
"* BMI is body mass index\r\n", "* BMI is body mass index\n",
"* BP is average blood pressure\r\n", "* BP is average blood pressure\n",
"* S1 through S6 are different blood measurements\r\n", "* S1 through S6 are different blood measurements\n",
"* Y is the qualitative measure of disease progression over one year\r\n", "* Y is the qualitative measure of disease progression over one year\n",
"\r\n", "\n",
"Let's study this dataset using methods of probability and statistics.\r\n", "Let's study this dataset using methods of probability and statistics.\n",
"\r\n", "\n",
"### Task 1: Compute mean values and variance for all values" "### Task 1: Compute mean values and variance for all values"
], ],
"metadata": {} "metadata": {}
@ -355,7 +355,7 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 8, "execution_count": 8,
"source": [ "source": [
"# Another way\r\n", "# Another way\n",
"pd.DataFrame([df.mean(),df.var()],index=['Mean','Variance']).head()" "pd.DataFrame([df.mean(),df.var()],index=['Mean','Variance']).head()"
], ],
"outputs": [ "outputs": [
@ -447,7 +447,7 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 9, "execution_count": 9,
"source": [ "source": [
"# Or, more simply, for the mean (variance can be done similarly)\r\n", "# Or, more simply, for the mean (variance can be done similarly)\n",
"df.mean()" "df.mean()"
], ],
"outputs": [ "outputs": [
@ -486,8 +486,8 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 17, "execution_count": 17,
"source": [ "source": [
"for col in ['BMI','BP','Y']:\r\n", "for col in ['BMI','BP','Y']:\n",
" df.boxplot(column=col,by='SEX')\r\n", " df.boxplot(column=col,by='SEX')\n",
"plt.show()" "plt.show()"
], ],
"outputs": [ "outputs": [
@ -538,8 +538,8 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 19, "execution_count": 19,
"source": [ "source": [
"for col in ['AGE','SEX','BMI','Y']:\r\n", "for col in ['AGE','SEX','BMI','Y']:\n",
" df[col].hist()\r\n", " df[col].hist()\n",
" plt.show()" " plt.show()"
], ],
"outputs": [ "outputs": [
@ -593,9 +593,9 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"Conclusions:\r\n", "Conclusions:\n",
"* Age - normal\r\n", "* Age - normal\n",
"* Sex - uniform\r\n", "* Sex - uniform\n",
"* BMI, Y - hard to tell" "* BMI, Y - hard to tell"
], ],
"metadata": {} "metadata": {}
@ -603,8 +603,8 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"### Task 4: Test the correlation between different variables and disease progression (Y)\r\n", "### Task 4: Test the correlation between different variables and disease progression (Y)\n",
"\r\n", "\n",
"> **Hint** Correlation matrix would give you the most useful information on which values are dependent." "> **Hint** Correlation matrix would give you the most useful information on which values are dependent."
], ],
"metadata": {} "metadata": {}
@ -847,7 +847,7 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"Conclusion:\r\n", "Conclusion:\n",
"* The strongest correlation of Y is BMI and S5 (blood sugar). This sounds reasonable." "* The strongest correlation of Y is BMI and S5 (blood sugar). This sounds reasonable."
], ],
"metadata": {} "metadata": {}
@ -856,10 +856,10 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 26, "execution_count": 26,
"source": [ "source": [
"fig, ax = plt.subplots(1,3,figsize=(10,5))\r\n", "fig, ax = plt.subplots(1,3,figsize=(10,5))\n",
"for i,n in enumerate(['BMI','S5','BP']):\r\n", "for i,n in enumerate(['BMI','S5','BP']):\n",
" ax[i].scatter(df['Y'],df[n])\r\n", " ax[i].scatter(df['Y'],df[n])\n",
" ax[i].set_title(n)\r\n", " ax[i].set_title(n)\n",
"plt.show()" "plt.show()"
], ],
"outputs": [ "outputs": [
@ -888,9 +888,9 @@
"cell_type": "code", "cell_type": "code",
"execution_count": 27, "execution_count": 27,
"source": [ "source": [
"from scipy.stats import ttest_ind\r\n", "from scipy.stats import ttest_ind\n",
"\r\n", "\n",
"tval, pval = ttest_ind(df.loc[df['SEX']==1,['Y']], df.loc[df['SEX']==2,['Y']],equal_var=False)\r\n", "tval, pval = ttest_ind(df.loc[df['SEX']==1,['Y']], df.loc[df['SEX']==2,['Y']],equal_var=False)\n",
"print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")" "print(f\"T-value = {tval[0]:.2f}\\nP-value: {pval[0]}\")"
], ],
"outputs": [ "outputs": [
@ -942,4 +942,4 @@
}, },
"nbformat": 4, "nbformat": 4,
"nbformat_minor": 2 "nbformat_minor": 2
} }
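
Most hunks in the two notebooks above differ only in the line endings inside the JSON `source` strings (`\r\n` on the left, `\n` on the right). A minimal sketch of how such a normalization could be scripted, assuming the notebooks are plain nbformat 4 JSON. The helper name `normalize_notebook_newlines` is hypothetical and is not part of this commit, which may well have been produced by an editor or Git setting instead.

```python
import json


def normalize_notebook_newlines(nb_json: str) -> str:
    """Return notebook JSON with CRLF in cell sources replaced by LF.

    Hypothetical helper for illustration only; not part of this commit.
    """
    nb = json.loads(nb_json)
    for cell in nb.get("cells", []):
        source = cell.get("source", [])
        if isinstance(source, str):
            # "source" may be stored as one string rather than a list of lines.
            cell["source"] = source.replace("\r\n", "\n")
        else:
            cell["source"] = [line.replace("\r\n", "\n") for line in source]
    return json.dumps(nb, indent=1)


# Tiny stand-in notebook modelled on the first markdown cell in the diff.
example = json.dumps({
    "cells": [
        {
            "cell_type": "markdown",
            "metadata": {},
            "source": ["## Assignment\r\n", "\r\n", "text"],
        }
    ],
    "nbformat": 4,
    "nbformat_minor": 2,
})

cleaned = json.loads(normalize_notebook_newlines(example))
print(cleaned["cells"][0]["source"])
```

Only cell sources are rewritten here; cell outputs and metadata are left untouched, so the resulting diff stays limited to line endings.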
