parent 2926383f96
commit 29fa7e1c12
File diff suppressed because one or more lines are too long
@@ -0,0 +1,264 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "## Introduction to Probability and Statistics\n",
    "## Assignment\n",
    "\n",
    "In this assignment, we will use the dataset of diabetes patients obtained [from here](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html).\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "source": [
    "import pandas as pd\r\n",
    "import numpy as np\r\n",
    "\r\n",
    "df = pd.read_csv(\"../../data/diabetes.tsv\", sep='\\t')\r\n",
    "df.head()"
   ],
   "outputs": [
    {
     "output_type": "execute_result",
     "data": {
      "text/plain": [
       " AGE SEX BMI BP S1 S2 S3 S4 S5 S6 Y\n",
       "0 59 2 32.1 101.0 157 93.2 38.0 4.0 4.8598 87 151\n",
       "1 48 1 21.6 87.0 183 103.2 70.0 3.0 3.8918 69 75\n",
       "2 72 2 30.5 93.0 156 93.6 41.0 4.0 4.6728 85 141\n",
       "3 24 1 25.3 84.0 198 131.4 40.0 5.0 4.8903 89 206\n",
       "4 50 1 23.0 101.0 192 125.4 52.0 4.0 4.2905 80 135"
      ],
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       " .dataframe tbody tr th:only-of-type {\n",
       " vertical-align: middle;\n",
       " }\n",
       "\n",
       " .dataframe tbody tr th {\n",
       " vertical-align: top;\n",
       " }\n",
       "\n",
       " .dataframe thead th {\n",
       " text-align: right;\n",
       " }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       " <thead>\n",
       " <tr style=\"text-align: right;\">\n",
       " <th></th>\n",
       " <th>AGE</th>\n",
       " <th>SEX</th>\n",
       " <th>BMI</th>\n",
       " <th>BP</th>\n",
       " <th>S1</th>\n",
       " <th>S2</th>\n",
       " <th>S3</th>\n",
       " <th>S4</th>\n",
       " <th>S5</th>\n",
       " <th>S6</th>\n",
       " <th>Y</th>\n",
       " </tr>\n",
       " </thead>\n",
       " <tbody>\n",
       " <tr>\n",
       " <th>0</th>\n",
       " <td>59</td>\n",
       " <td>2</td>\n",
       " <td>32.1</td>\n",
       " <td>101.0</td>\n",
       " <td>157</td>\n",
       " <td>93.2</td>\n",
       " <td>38.0</td>\n",
       " <td>4.0</td>\n",
       " <td>4.8598</td>\n",
       " <td>87</td>\n",
       " <td>151</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>1</th>\n",
       " <td>48</td>\n",
       " <td>1</td>\n",
       " <td>21.6</td>\n",
       " <td>87.0</td>\n",
       " <td>183</td>\n",
       " <td>103.2</td>\n",
       " <td>70.0</td>\n",
       " <td>3.0</td>\n",
       " <td>3.8918</td>\n",
       " <td>69</td>\n",
       " <td>75</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>2</th>\n",
       " <td>72</td>\n",
       " <td>2</td>\n",
       " <td>30.5</td>\n",
       " <td>93.0</td>\n",
       " <td>156</td>\n",
       " <td>93.6</td>\n",
       " <td>41.0</td>\n",
       " <td>4.0</td>\n",
       " <td>4.6728</td>\n",
       " <td>85</td>\n",
       " <td>141</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>3</th>\n",
       " <td>24</td>\n",
       " <td>1</td>\n",
       " <td>25.3</td>\n",
       " <td>84.0</td>\n",
       " <td>198</td>\n",
       " <td>131.4</td>\n",
       " <td>40.0</td>\n",
       " <td>5.0</td>\n",
       " <td>4.8903</td>\n",
       " <td>89</td>\n",
       " <td>206</td>\n",
       " </tr>\n",
       " <tr>\n",
       " <th>4</th>\n",
       " <td>50</td>\n",
       " <td>1</td>\n",
       " <td>23.0</td>\n",
       " <td>101.0</td>\n",
       " <td>192</td>\n",
       " <td>125.4</td>\n",
       " <td>52.0</td>\n",
       " <td>4.0</td>\n",
       " <td>4.2905</td>\n",
       " <td>80</td>\n",
       " <td>135</td>\n",
       " </tr>\n",
       " </tbody>\n",
       "</table>\n",
       "</div>"
      ]
     },
     "metadata": {},
     "execution_count": 13
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "The columns in this dataset are:\n",
    "* AGE and SEX are self-explanatory\n",
    "* BMI is body mass index\n",
    "* BP is average blood pressure\n",
    "* S1 through S6 are blood serum measurements\n",
    "* Y is a quantitative measure of disease progression over one year\n",
    "\n",
    "Let's analyze this dataset using the methods of probability and statistics.\n",
    "\n",
    "### Task 1: Compute the mean and variance of every column\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 2: Plot boxplots of BMI, BP, and Y grouped by sex\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 3: What are the distributions of the AGE, SEX, BMI, and Y variables?\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 4: Test the correlation between different variables and disease progression (Y)\n",
    "\n",
    "> **Hint**: The correlation matrix gives the most direct view of which variables are correlated with one another.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "### Task 5: Test the hypothesis that the degree of diabetes progression differs between men and women\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the definitive source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python",
   "version": "3.8.8",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.8.8 64-bit (conda)"
  },
  "interpreter": {
   "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
  },
  "coopTranslator": {
   "original_hash": "defe9f96b3d327a6f37d795c43ad0219",
   "translation_date": "2025-09-03T20:43:46+00:00",
   "source_file": "1-Introduction/04-stats-and-probability/assignment.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
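Tasks 1, 4, and 5 of the assignment notebook above (mean and variance, correlation with `Y`, and a comparison of `Y` between the two `SEX` groups) can be sketched without the data file. This is a minimal sketch, not a reference solution: it uses only the Python standard library and the five rows printed by `df.head()`, retyped by hand, and the helper names `column`, `pearson`, and `welch_t` are hypothetical. With the full dataset loaded, pandas' `df.mean()`, `df.var()`, and `df.corr()`, and SciPy's `scipy.stats.ttest_ind`, would be the more usual route.

```python
import math
import statistics

# The five rows shown by df.head() above, retyped by hand (a subset of columns).
# Assumption: this tiny sample only illustrates the calculations.
rows = [
    {"AGE": 59, "SEX": 2, "BMI": 32.1, "BP": 101.0, "Y": 151},
    {"AGE": 48, "SEX": 1, "BMI": 21.6, "BP": 87.0, "Y": 75},
    {"AGE": 72, "SEX": 2, "BMI": 30.5, "BP": 93.0, "Y": 141},
    {"AGE": 24, "SEX": 1, "BMI": 25.3, "BP": 84.0, "Y": 206},
    {"AGE": 50, "SEX": 1, "BMI": 23.0, "BP": 101.0, "Y": 135},
]


def column(name):
    """Return one column of the sample as a list."""
    return [row[name] for row in rows]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b)
    )


# Task 1: mean and sample variance per column.
summary = {
    name: (statistics.mean(column(name)), statistics.variance(column(name)))
    for name in ("AGE", "BMI", "BP", "Y")
}

# Task 4: correlation of each variable with disease progression Y.
corr_with_y = {name: pearson(column(name), column("Y")) for name in ("AGE", "BMI", "BP")}

# Task 5: compare Y between the two SEX codes (the data does not say which is which).
y_sex1 = [row["Y"] for row in rows if row["SEX"] == 1]
y_sex2 = [row["Y"] for row in rows if row["SEX"] == 2]
t_stat = welch_t(y_sex1, y_sex2)

for name, (mean, var) in summary.items():
    print(f"{name}: mean={mean:.2f}, variance={var:.2f}")
print("correlation with Y:", {k: round(v, 3) for k, v in corr_with_y.items()})
print(f"Welch t statistic for Y by SEX: {t_stat:.3f}")
```

With five rows the numbers carry no statistical weight; the point is the shape of each computation. Turning the t statistic into a p-value also needs the t distribution, which is one reason to reach for SciPy on the full dataset.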
File diff suppressed because one or more lines are too long
File diff suppressed because it is too large
@@ -0,0 +1,82 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Let's learn about birds\n",
    "\n",
    "Birds are fascinating creatures that can be found all over the world. They come in a wide variety of shapes, sizes, and colors, and they play an important role in ecosystems.\n",
    "\n",
    "## Characteristics of birds\n",
    "\n",
    "- **Feathers**: All birds have feathers, which help them fly, stay warm, and attract mates.\n",
    "- **Beaks**: Birds have beaks instead of teeth, and the shape of their beak often reflects their diet.\n",
    "- **Eggs**: Birds lay eggs, and their nests can be simple or elaborate depending on the species.\n",
    "- **Flight**: Most birds can fly, although some, like penguins and ostriches, have adapted to other ways of moving.\n",
    "\n",
    "## Why are birds important?\n",
    "\n",
    "Birds contribute to the environment in many ways:\n",
    "- **Pollination**: Some birds help pollinate plants by transferring pollen as they feed on nectar.\n",
    "- **Seed dispersal**: Birds spread seeds, helping plants grow in new areas.\n",
    "- **Pest control**: Many birds eat insects, keeping pest populations in check.\n",
    "- **Indicator species**: Birds can signal changes in the environment, such as pollution or habitat loss.\n",
    "\n",
    "## Fun facts about birds\n",
    "\n",
    "- The smallest bird in the world is the bee hummingbird, which is about the size of a thumb.\n",
    "- The ostrich is the largest bird and can run at speeds of up to 70 km/h (43 mph).\n",
    "- Some birds, like parrots, can mimic human speech and other sounds.\n",
    "- Birds have excellent vision, and some species can see ultraviolet light.\n",
    "\n",
    "## How can we help birds?\n",
    "\n",
    "Here are some ways to support bird populations:\n",
    "- **Protect habitats**: Preserve forests, wetlands, and other areas where birds live.\n",
    "- **Provide food and water**: Set up bird feeders and water sources in your yard.\n",
    "- **Avoid harmful chemicals**: Reduce the use of pesticides and other substances that can harm birds.\n",
    "- **Participate in citizen science**: Join bird-watching groups or contribute to bird population studies.\n",
    "\n",
    "Birds are incredible creatures that enrich our lives and the planet. By learning more about them and taking steps to protect them, we can ensure they continue to thrive for generations to come.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python",
   "version": "3.7.0",
   "mimetype": "text/x-python",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "pygments_lexer": "ipython3",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.7.0 64-bit"
  },
  "interpreter": {
   "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
  },
  "coopTranslator": {
   "original_hash": "33e5c5d3f0630388e20f2e161bd4cdf3",
   "translation_date": "2025-09-03T20:41:56+00:00",
   "source_file": "3-Data-Visualization/09-visualization-quantities/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,32 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Bird distributions\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  },
  "coopTranslator": {
   "original_hash": "e5272cbcbffd1ddcc09e44d3d8e7e8cd",
   "translation_date": "2025-09-03T20:42:29+00:00",
   "source_file": "3-Data-Visualization/10-visualization-distributions/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,32 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# 🍄 Mushroom Proportions\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  },
  "coopTranslator": {
   "original_hash": "397e9bbc0743761dbf72e5f16b7043e6",
   "translation_date": "2025-09-03T20:41:39+00:00",
   "source_file": "3-Data-Visualization/11-visualization-proportions/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,32 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Visualizing Honey Production 🍯 🐝\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "orig_nbformat": 4,
  "language_info": {
   "name": "python"
  },
  "coopTranslator": {
   "original_hash": "0f988634b7192626d91cc33b4b6388c5",
   "translation_date": "2025-09-03T20:42:14+00:00",
   "source_file": "3-Data-Visualization/12-visualization-relationships/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
File diff suppressed because one or more lines are too long
@@ -0,0 +1,140 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# NYC Taxi data in Winter and Summer\n",
    "\n",
    "Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to learn more about the columns that have been provided.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "#Install the pandas library\r\n",
    "!pip install pandas"
   ],
   "outputs": [],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "import pandas as pd\r\n",
    "\r\n",
    "path = '../../data/taxi.csv'\r\n",
    "\r\n",
    "#Load the csv file into a dataframe\r\n",
    "df = pd.read_csv(path)\r\n",
    "\r\n",
    "#Print the dataframe\r\n",
    "print(df)\r\n"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      " VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count \\\n",
      "0 2.0 2019-07-15 16:27:53 2019-07-15 16:44:21 3.0 \n",
      "1 2.0 2019-07-17 20:26:35 2019-07-17 20:40:09 6.0 \n",
      "2 2.0 2019-07-06 16:01:08 2019-07-06 16:10:25 1.0 \n",
      "3 1.0 2019-07-18 22:32:23 2019-07-18 22:35:08 1.0 \n",
      "4 2.0 2019-07-19 14:54:29 2019-07-19 15:19:08 1.0 \n",
      ".. ... ... ... ... \n",
      "195 2.0 2019-01-18 08:42:15 2019-01-18 08:56:57 1.0 \n",
      "196 1.0 2019-01-19 04:34:45 2019-01-19 04:43:44 1.0 \n",
      "197 2.0 2019-01-05 10:37:39 2019-01-05 10:42:03 1.0 \n",
      "198 2.0 2019-01-23 10:36:29 2019-01-23 10:44:34 2.0 \n",
      "199 2.0 2019-01-30 06:55:58 2019-01-30 07:07:02 5.0 \n",
      "\n",
      " trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID \\\n",
      "0 2.02 1.0 N 186 233 \n",
      "1 1.59 1.0 N 141 161 \n",
      "2 1.69 1.0 N 246 249 \n",
      "3 0.90 1.0 N 229 141 \n",
      "4 4.79 1.0 N 237 107 \n",
      ".. ... ... ... ... ... \n",
      "195 1.18 1.0 N 43 237 \n",
      "196 2.30 1.0 N 148 234 \n",
      "197 0.83 1.0 N 237 263 \n",
      "198 1.12 1.0 N 144 113 \n",
      "199 2.41 1.0 N 209 107 \n",
      "\n",
      " payment_type fare_amount extra mta_tax tip_amount tolls_amount \\\n",
      "0 1.0 12.0 1.0 0.5 4.08 0.0 \n",
      "1 2.0 10.0 0.5 0.5 0.00 0.0 \n",
      "2 2.0 8.5 0.0 0.5 0.00 0.0 \n",
      "3 1.0 4.5 3.0 0.5 1.65 0.0 \n",
      "4 1.0 19.5 0.0 0.5 5.70 0.0 \n",
      ".. ... ... ... ... ... ... \n",
      "195 1.0 10.0 0.0 0.5 2.16 0.0 \n",
      "196 1.0 9.5 0.5 0.5 2.15 0.0 \n",
      "197 1.0 5.0 0.0 0.5 1.16 0.0 \n",
      "198 2.0 7.0 0.0 0.5 0.00 0.0 \n",
      "199 1.0 10.5 0.0 0.5 1.00 0.0 \n",
      "\n",
      " improvement_surcharge total_amount congestion_surcharge \n",
      "0 0.3 20.38 2.5 \n",
      "1 0.3 13.80 2.5 \n",
      "2 0.3 11.80 2.5 \n",
      "3 0.3 9.95 2.5 \n",
      "4 0.3 28.50 2.5 \n",
      ".. ... ... ... \n",
      "195 0.3 12.96 0.0 \n",
      "196 0.3 12.95 0.0 \n",
      "197 0.3 6.96 0.0 \n",
      "198 0.3 7.80 0.0 \n",
      "199 0.3 12.30 0.0 \n",
      "\n",
      "[200 rows x 18 columns]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please note that automated translations may contain errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is recommended. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.9.7 64-bit ('venv': venv)"
  },
  "language_info": {
   "mimetype": "text/x-python",
   "name": "python",
   "pygments_lexer": "ipython3",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "version": "3.9.7",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "name": "04-nyc-taxi-join-weather-in-pandas",
  "notebookId": 1709144033725344,
  "interpreter": {
   "hash": "6b9b57232c4b57163d057191678da2030059e733b8becc68f245de5a75abe84e"
  },
  "coopTranslator": {
   "original_hash": "3bd4c20c4e8f3158f483f0f1cc543bb1",
   "translation_date": "2025-09-03T20:41:34+00:00",
   "source_file": "4-Data-Science-Lifecycle/14-Introduction/notebook.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
@@ -0,0 +1,154 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# NYC Taxi data in Winter and Summer\n",
    "\n",
    "Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to learn more about the columns that have been provided.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "#Install the pandas library\r\n",
    "!pip install pandas"
   ],
   "outputs": [],
   "metadata": {
    "scrolled": true
   }
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "source": [
    "import pandas as pd\r\n",
    "\r\n",
    "path = '../../data/taxi.csv'\r\n",
    "\r\n",
    "#Load the csv file into a dataframe\r\n",
    "df = pd.read_csv(path)\r\n",
    "\r\n",
    "#Print the dataframe\r\n",
    "print(df)\r\n"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      " VendorID tpep_pickup_datetime tpep_dropoff_datetime passenger_count \\\n",
      "0 2.0 2019-07-15 16:27:53 2019-07-15 16:44:21 3.0 \n",
      "1 2.0 2019-07-17 20:26:35 2019-07-17 20:40:09 6.0 \n",
      "2 2.0 2019-07-06 16:01:08 2019-07-06 16:10:25 1.0 \n",
      "3 1.0 2019-07-18 22:32:23 2019-07-18 22:35:08 1.0 \n",
      "4 2.0 2019-07-19 14:54:29 2019-07-19 15:19:08 1.0 \n",
      ".. ... ... ... ... \n",
      "195 2.0 2019-01-18 08:42:15 2019-01-18 08:56:57 1.0 \n",
      "196 1.0 2019-01-19 04:34:45 2019-01-19 04:43:44 1.0 \n",
      "197 2.0 2019-01-05 10:37:39 2019-01-05 10:42:03 1.0 \n",
      "198 2.0 2019-01-23 10:36:29 2019-01-23 10:44:34 2.0 \n",
      "199 2.0 2019-01-30 06:55:58 2019-01-30 07:07:02 5.0 \n",
      "\n",
      " trip_distance RatecodeID store_and_fwd_flag PULocationID DOLocationID \\\n",
      "0 2.02 1.0 N 186 233 \n",
      "1 1.59 1.0 N 141 161 \n",
      "2 1.69 1.0 N 246 249 \n",
      "3 0.90 1.0 N 229 141 \n",
      "4 4.79 1.0 N 237 107 \n",
      ".. ... ... ... ... ... \n",
      "195 1.18 1.0 N 43 237 \n",
      "196 2.30 1.0 N 148 234 \n",
      "197 0.83 1.0 N 237 263 \n",
      "198 1.12 1.0 N 144 113 \n",
      "199 2.41 1.0 N 209 107 \n",
      "\n",
      " payment_type fare_amount extra mta_tax tip_amount tolls_amount \\\n",
      "0 1.0 12.0 1.0 0.5 4.08 0.0 \n",
      "1 2.0 10.0 0.5 0.5 0.00 0.0 \n",
      "2 2.0 8.5 0.0 0.5 0.00 0.0 \n",
      "3 1.0 4.5 3.0 0.5 1.65 0.0 \n",
      "4 1.0 19.5 0.0 0.5 5.70 0.0 \n",
      ".. ... ... ... ... ... ... \n",
      "195 1.0 10.0 0.0 0.5 2.16 0.0 \n",
      "196 1.0 9.5 0.5 0.5 2.15 0.0 \n",
      "197 1.0 5.0 0.0 0.5 1.16 0.0 \n",
      "198 2.0 7.0 0.0 0.5 0.00 0.0 \n",
      "199 1.0 10.5 0.0 0.5 1.00 0.0 \n",
      "\n",
      " improvement_surcharge total_amount congestion_surcharge \n",
      "0 0.3 20.38 2.5 \n",
      "1 0.3 13.80 2.5 \n",
      "2 0.3 11.80 2.5 \n",
      "3 0.3 9.95 2.5 \n",
      "4 0.3 28.50 2.5 \n",
      ".. ... ... ... \n",
      "195 0.3 12.96 0.0 \n",
      "196 0.3 12.95 0.0 \n",
      "197 0.3 6.96 0.0 \n",
      "198 0.3 7.80 0.0 \n",
      "199 0.3 12.30 0.0 \n",
      "\n",
      "[200 rows x 18 columns]\n"
     ]
    }
   ],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "# Use the cells below to do your own Exploratory Data Analysis\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "name": "python3",
   "display_name": "Python 3.9.7 64-bit ('venv': venv)"
  },
  "language_info": {
   "mimetype": "text/x-python",
   "name": "python",
   "pygments_lexer": "ipython3",
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "version": "3.9.7",
   "nbconvert_exporter": "python",
   "file_extension": ".py"
  },
  "name": "04-nyc-taxi-join-weather-in-pandas",
  "notebookId": 1709144033725344,
  "interpreter": {
   "hash": "6b9b57232c4b57163d057191678da2030059e733b8becc68f245de5a75abe84e"
  },
  "coopTranslator": {
   "original_hash": "7bca1c1abc1e55842817b62e44e1a963",
   "translation_date": "2025-09-03T20:41:31+00:00",
   "source_file": "4-Data-Science-Lifecycle/15-analyzing/assignment.ipynb",
   "language_code": "en"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
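One way to start the exploratory analysis that the assignment notebook above asks for is to compare summer and winter trips. The following is a minimal sketch under stated assumptions: it uses only the Python standard library, the `SAMPLE` text is three columns of the ten rows printed above retyped by hand, and the `season` helper with its month ranges (December to February as winter, June to August as summer) is a hypothetical choice, not something the notebook defines. On the real file, pandas' `pd.to_datetime` and `DataFrame.groupby` would be the more usual tools.

```python
import csv
import io
import statistics
from datetime import datetime

# Three of the 18 columns, for the ten rows printed by print(df) above.
# Assumption: retyped by hand for illustration; the real file has 200 rows.
SAMPLE = """tpep_pickup_datetime,fare_amount,total_amount
2019-07-15 16:27:53,12.0,20.38
2019-07-17 20:26:35,10.0,13.80
2019-07-06 16:01:08,8.5,11.80
2019-07-18 22:32:23,4.5,9.95
2019-07-19 14:54:29,19.5,28.50
2019-01-18 08:42:15,10.0,12.96
2019-01-19 04:34:45,9.5,12.95
2019-01-05 10:37:39,5.0,6.96
2019-01-23 10:36:29,7.0,7.80
2019-01-30 06:55:58,10.5,12.30
"""


def season(timestamp):
    """Label a pickup time by season (hypothetical month ranges)."""
    month = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S").month
    if month in (12, 1, 2):
        return "winter"
    if month in (6, 7, 8):
        return "summer"
    return "other"


# Group total_amount values by the season of the pickup time.
totals = {}
for row in csv.DictReader(io.StringIO(SAMPLE)):
    label = season(row["tpep_pickup_datetime"])
    totals.setdefault(label, []).append(float(row["total_amount"]))

# Mean total per season.
mean_total = {label: statistics.mean(values) for label, values in totals.items()}

for label in sorted(mean_total):
    print(f"{label}: {len(totals[label])} trips, mean total_amount {mean_total[label]:.2f}")
```

Ten rows cannot support any conclusion about seasonal fares; the sketch only shows the group-then-summarize pattern that the same question would follow on the full dataset.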
@@ -0,0 +1,193 @@
{
 "cells": [
  {
   "cell_type": "markdown",
   "source": [
    "# Analyzing Data\n",
    "Examples of the Pandas functions mentioned in the [lesson](README.md).\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "source": [
    "import pandas as pd\r\n",
    "import glob\r\n",
    "\r\n",
    "#Loading the dataset\r\n",
    "path = '../../data/emails.csv'\r\n",
    "email_df = pd.read_csv(path)"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "source": [
    "# Using Describe on the email dataset\r\n",
    "print(email_df.describe())"
   ],
   "outputs": [
    {
     "output_type": "stream",
     "name": "stdout",
     "text": [
      " the to ect and for of \\\n",
      "count 406.000000 406.000000 406.000000 406.000000 406.000000 406.000000 \n",
      "mean 7.022167 6.519704 4.948276 3.059113 3.502463 2.662562 \n",
      "std 10.945522 9.801907 9.293820 6.267806 4.901372 5.443939 \n",
      "min 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 \n",
      "25% 1.000000 1.000000 1.000000 0.000000 1.000000 0.000000 \n",
      "50% 3.000000 3.000000 2.000000 1.000000 2.000000 1.000000 \n",
      "75% 9.000000 7.750000 4.000000 3.000000 4.750000 3.000000 \n",
      "max 99.000000 88.000000 79.000000 69.000000 39.000000 57.000000 \n",
      "\n",
      " a you in on is this \\\n",
      "count 406.000000 406.000000 406.000000 406.000000 406.000000 406.000000 \n",
      "mean 57.017241 2.394089 10.817734 11.591133 5.901478 1.485222 \n",
      "std 78.868243 4.067015 19.050972 16.407175 8.793103 2.912473 \n",
      "min 0.000000 0.000000 0.000000 0.000000 0.000000 0.000000 \n",
      "25% 15.000000 0.000000 1.250000 3.000000 1.000000 0.000000 \n",
      "50% 29.000000 1.000000 5.000000 6.000000 3.000000 0.000000 \n",
      "75% 61.000000 3.000000 12.000000 13.000000 7.000000 2.000000 \n",
      "max 843.000000 31.000000 223.000000 125.000000 61.000000 24.000000 \n",
      "\n",
      " i be that will \n",
      "count 406.000000 406.000000 406.000000 406.000000 \n",
      "mean 47.155172 2.950739 1.034483 0.955665 \n",
      "std 71.043009 4.297865 1.904846 2.042271 \n",
      "min 0.000000 0.000000 0.000000 0.000000 \n",
      "25% 11.000000 1.000000 0.000000 0.000000 \n",
      "50% 24.000000 1.000000 0.000000 0.000000 \n",
      "75% 50.750000 3.000000 1.000000 1.000000 \n",
      "max 754.000000 40.000000 14.000000 24.000000 \n"
     ]
    }
   ],
   "metadata": {}
  },
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 5,
|
||||||
|
"source": [
|
||||||
|
"# Sampling 10 emails\r\n",
|
||||||
|
"print(email_df.sample(10))"
|
||||||
|
],
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"output_type": "stream",
|
||||||
|
"name": "stdout",
|
||||||
|
"text": [
|
||||||
|
" Email No. the to ect and for of a you in on is this i \\\n",
|
||||||
|
"150 Email 151 0 1 2 0 3 0 15 0 0 5 0 0 7 \n",
|
||||||
|
"380 Email 5147 0 3 2 0 0 0 7 0 1 1 0 0 3 \n",
|
||||||
|
"19 Email 20 3 4 11 0 4 2 32 1 1 3 9 5 25 \n",
|
||||||
|
"300 Email 301 2 1 1 0 1 1 15 2 2 3 2 0 8 \n",
|
||||||
|
"307 Email 308 0 0 1 0 0 0 1 0 1 0 0 0 2 \n",
|
||||||
|
"167 Email 168 2 2 2 1 5 1 24 2 5 6 4 0 30 \n",
|
||||||
|
"320 Email 321 10 12 4 6 8 6 187 5 26 28 23 2 171 \n",
|
||||||
|
"61 Email 62 0 1 1 0 4 1 15 4 4 3 3 0 19 \n",
|
||||||
|
"26 Email 27 5 4 1 1 4 4 51 0 8 6 6 2 44 \n",
|
||||||
|
"73 Email 74 0 0 1 0 0 0 7 0 4 3 0 0 6 \n",
|
||||||
|
"\n",
|
||||||
|
" be that will \n",
|
||||||
|
"150 1 0 0 \n",
|
||||||
|
"380 0 0 0 \n",
|
||||||
|
"19 3 0 1 \n",
|
||||||
|
"300 0 0 0 \n",
|
||||||
|
"307 0 0 0 \n",
|
||||||
|
"167 2 0 0 \n",
|
||||||
|
"320 5 1 1 \n",
|
||||||
|
"61 2 0 0 \n",
|
||||||
|
"26 6 0 0 \n",
|
||||||
|
"73 0 0 0 \n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 14,
|
||||||
|
"source": [
|
||||||
|
"# Returns rows where there are more occurrences of \"to\" than \"the\"\r\n",
|
||||||
|
"print(email_df.query('the < to'))"
|
||||||
|
],
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"output_type": "stream",
|
||||||
|
"name": "stdout",
|
||||||
|
"text": [
|
||||||
|
" Email No. the to ect and for of a you in on is this i \\\n",
|
||||||
|
"1 Email 2 8 13 24 6 6 2 102 1 18 21 13 0 61 \n",
|
||||||
|
"3 Email 4 0 5 22 0 5 1 51 2 1 5 9 2 16 \n",
|
||||||
|
"5 Email 6 4 5 1 4 2 3 45 1 16 12 8 1 52 \n",
|
||||||
|
"7 Email 8 0 2 2 3 1 2 21 6 2 6 2 0 28 \n",
|
||||||
|
"13 Email 14 4 5 7 1 5 1 37 1 8 8 6 1 43 \n",
|
||||||
|
".. ... ... .. ... ... ... .. ... ... .. .. .. ... .. \n",
|
||||||
|
"390 Email 5157 4 13 1 0 3 1 48 2 8 26 9 1 45 \n",
|
||||||
|
"393 Email 5160 2 13 1 0 2 1 38 2 7 24 6 1 34 \n",
|
||||||
|
"396 Email 5163 2 3 1 2 1 2 32 0 7 3 2 0 26 \n",
|
||||||
|
"404 Email 5171 2 7 1 0 2 1 28 2 8 11 7 1 39 \n",
|
||||||
|
"405 Email 5172 22 24 5 1 6 5 148 8 23 13 5 4 99 \n",
|
||||||
|
"\n",
|
||||||
|
" be that will \n",
|
||||||
|
"1 4 2 0 \n",
|
||||||
|
"3 2 0 0 \n",
|
||||||
|
"5 2 0 0 \n",
|
||||||
|
"7 1 0 1 \n",
|
||||||
|
"13 1 0 1 \n",
|
||||||
|
".. .. ... ... \n",
|
||||||
|
"390 1 0 0 \n",
|
||||||
|
"393 1 0 0 \n",
|
||||||
|
"396 3 0 0 \n",
|
||||||
|
"404 1 0 0 \n",
|
||||||
|
"405 6 4 1 \n",
|
||||||
|
"\n",
|
||||||
|
"[169 rows x 17 columns]\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"orig_nbformat": 4,
|
||||||
|
"language_info": {
|
||||||
|
"name": "python",
|
||||||
|
"version": "3.9.7",
|
||||||
|
"mimetype": "text/x-python",
|
||||||
|
"codemirror_mode": {
|
||||||
|
"name": "ipython",
|
||||||
|
"version": 3
|
||||||
|
},
|
||||||
|
"pygments_lexer": "ipython3",
|
||||||
|
"nbconvert_exporter": "python",
|
||||||
|
"file_extension": ".py"
|
||||||
|
},
|
||||||
|
"kernelspec": {
|
||||||
|
"name": "python3",
|
||||||
|
"display_name": "Python 3.9.7 64-bit ('venv': venv)"
|
||||||
|
},
|
||||||
|
"interpreter": {
|
||||||
|
"hash": "6b9b57232c4b57163d057191678da2030059e733b8becc68f245de5a75abe84e"
|
||||||
|
},
|
||||||
|
"coopTranslator": {
|
||||||
|
"original_hash": "9d102c8c3cdbc8ea4e92fc32593462c6",
|
||||||
|
"translation_date": "2025-09-03T20:41:25+00:00",
|
||||||
|
"source_file": "4-Data-Science-Lifecycle/15-analyzing/notebook.ipynb",
|
||||||
|
"language_code": "en"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
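The three pandas calls used in the notebook above (`describe`, `sample`, `query`) can be tried without `emails.csv`. The sketch below is only an illustration: the tiny DataFrame, its values, and the `random_state` argument are assumptions made for the example, not part of the lesson's data.

```python
import pandas as pd

# Hypothetical stand-in for emails.csv: per-email counts of two words.
toy_df = pd.DataFrame({
    "Email No.": ["Email 1", "Email 2", "Email 3", "Email 4"],
    "the": [0, 8, 4, 2],
    "to": [0, 13, 5, 1],
})

# describe() summarizes numeric columns only, so "Email No." is left out.
summary = toy_df.describe()

# sample() draws random rows; random_state makes the draw repeatable.
subset = toy_df.sample(2, random_state=0)

# query() filters rows with an expression over column names.
more_to = toy_df.query("the < to")

# The same filter written as a boolean mask, for comparison.
mask_equivalent = toy_df[toy_df["the"] < toy_df["to"]]

print(summary)
print(more_to)
```

`query` is convenient when column names are valid Python identifiers; for a column such as `Email No.` the boolean-mask form (or backtick quoting inside the query string) is the usual alternative.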
|
@ -0,0 +1,323 @@
|
|||||||
|
{
|
||||||
|
"cells": [
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"# Data Science in the Cloud: The \"Azure ML SDK\" way \n",
|
||||||
|
"\n",
|
||||||
|
"## Introduction\n",
|
||||||
|
"\n",
|
||||||
|
"In this notebook, we will explore how to use the Azure ML SDK to train, deploy, and utilize a model through Azure ML.\n",
|
||||||
|
"\n",
|
||||||
|
"Prerequisites:\n",
|
||||||
|
"1. You have created an Azure ML workspace.\n",
|
||||||
|
"2. You have uploaded the [Heart Failure dataset](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) into Azure ML.\n",
|
||||||
|
"3. You have added this notebook to Azure ML Studio.\n",
|
||||||
|
"\n",
|
||||||
|
"The steps to follow are:\n",
|
||||||
|
"\n",
|
||||||
|
"1. Create an Experiment in an existing Workspace.\n",
|
||||||
|
"2. Set up a Compute cluster.\n",
|
||||||
|
"3. Load the dataset.\n",
|
||||||
|
"4. Configure AutoML using AutoMLConfig.\n",
|
||||||
|
"5. Execute the AutoML experiment.\n",
|
||||||
|
"6. Review the results and identify the best model.\n",
|
||||||
|
"7. Register the best model.\n",
|
||||||
|
"8. Deploy the best model.\n",
|
||||||
|
"9. Use the endpoint.\n",
|
||||||
|
"\n",
|
||||||
|
"## Azure Machine Learning SDK-specific imports\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"from azureml.core import Workspace, Experiment\n",
|
||||||
|
"from azureml.core.compute import AmlCompute\n",
|
||||||
|
"from azureml.train.automl import AutoMLConfig\n",
|
||||||
|
"from azureml.widgets import RunDetails\n",
|
||||||
|
"from azureml.core.model import InferenceConfig, Model\n",
|
||||||
|
"from azureml.core.webservice import AciWebservice"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Initialize Workspace\n",
|
||||||
|
"Initialize a workspace object using the saved configuration. Ensure the config file is located at .\\config.json\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"ws = Workspace.from_config()\n",
|
||||||
|
"print(ws.name, ws.resource_group, ws.location, ws.subscription_id, sep = '\\n')"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Create an Azure ML experiment\n",
|
||||||
|
"\n",
|
||||||
|
"Let's create an experiment named 'aml-experiment' in the workspace we just initialized.\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"experiment_name = 'aml-experiment'\n",
|
||||||
|
"experiment = Experiment(ws, experiment_name)\n",
|
||||||
|
"experiment"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Create a Compute Cluster\n",
|
||||||
|
"You need to create a [compute target](https://docs.microsoft.com/azure/machine-learning/concept-azure-machine-learning-architecture#compute-target) for your AutoML run.\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"aml_name = \"heart-f-cluster\"\n",
|
||||||
|
"try:\n",
|
||||||
|
"    aml_compute = AmlCompute(ws, aml_name)\n",
|
||||||
|
"    print('Found existing AML compute context.')\n",
|
||||||
|
"except Exception:  # cluster lookup failed, so provision a new one\n",
|
||||||
|
"    print('Creating new AML compute context.')\n",
|
||||||
|
"    aml_config = AmlCompute.provisioning_configuration(vm_size = \"Standard_D2_v2\", min_nodes=1, max_nodes=3)\n",
|
||||||
|
"    aml_compute = AmlCompute.create(ws, name = aml_name, provisioning_configuration = aml_config)\n",
|
||||||
|
"    aml_compute.wait_for_completion(show_output = True)\n",
|
||||||
|
"\n",
|
||||||
|
"cts = ws.compute_targets\n",
|
||||||
|
"compute_target = cts[aml_name]"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Data\n",
|
||||||
|
"Ensure that the dataset has been uploaded to Azure ML and that the key matches the dataset name exactly.\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"key = 'heart-failure-records'\n",
|
||||||
|
"dataset = ws.datasets[key]\n",
|
||||||
|
"df = dataset.to_pandas_dataframe()\n",
|
||||||
|
"df.describe()"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## AutoML Configuration\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"automl_settings = {\n",
|
||||||
|
"    \"experiment_timeout_minutes\": 20,\n",
|
||||||
|
"    \"max_concurrent_iterations\": 3,\n",
|
||||||
|
"    \"primary_metric\": 'AUC_weighted'\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"automl_config = AutoMLConfig(compute_target=compute_target,\n",
|
||||||
|
"                             task=\"classification\",\n",
|
||||||
|
"                             training_data=dataset,\n",
|
||||||
|
"                             label_column_name=\"DEATH_EVENT\",\n",
|
||||||
|
"                             enable_early_stopping=True,\n",
|
||||||
|
"                             featurization='auto',\n",
|
||||||
|
"                             debug_log=\"automl_errors.log\",\n",
|
||||||
|
"                             **automl_settings\n",
|
||||||
|
"                             )"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## AutoML Run\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"remote_run = experiment.submit(automl_config)"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"RunDetails(remote_run).show()"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": ["## Retrieve and Register the Best Model\n"],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run, fitted_model = remote_run.get_output()"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"best_run.get_properties()"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"model_name = best_run.properties['model_name']\n",
|
||||||
|
"script_file_name = 'inference/score.py'\n",
|
||||||
|
"best_run.download_file('outputs/scoring_file_v_1_0_0.py', 'inference/score.py')\n",
|
||||||
|
"description = \"aml heart failure project sdk\"\n",
|
||||||
|
"model = best_run.register_model(model_name = model_name,\n",
|
||||||
|
" description = description,\n",
|
||||||
|
" tags = None)"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Deploy the Best Model\n",
|
||||||
|
"\n",
|
||||||
|
"Run the following code to deploy the best model. You can check the deployment status in the Azure ML portal. This process may take a few minutes.\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"inference_config = InferenceConfig(entry_script=script_file_name, environment=best_run.get_environment())\n",
|
||||||
|
"\n",
|
||||||
|
"aciconfig = AciWebservice.deploy_configuration(cpu_cores = 1,\n",
|
||||||
|
" memory_gb = 1,\n",
|
||||||
|
" tags = {'type': \"automl-heart-failure-prediction\"},\n",
|
||||||
|
" description = 'Sample service for AutoML Heart Failure Prediction')\n",
|
||||||
|
"\n",
|
||||||
|
"aci_service_name = 'automl-hf-sdk'\n",
|
||||||
|
"aci_service = Model.deploy(ws, aci_service_name, [model], inference_config, aciconfig)\n",
|
||||||
|
"aci_service.wait_for_deployment(True)\n",
|
||||||
|
"print(aci_service.state)"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"source": [
|
||||||
|
"## Use the Endpoint\n",
|
||||||
|
"You can provide inputs based on the sample input below.\n"
|
||||||
|
],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"import json\n",
|
||||||
|
"\n",
|
||||||
|
"data = {\n",
|
||||||
|
" \"data\":\n",
|
||||||
|
" [\n",
|
||||||
|
" {\n",
|
||||||
|
" 'age': \"60\",\n",
|
||||||
|
" 'anaemia': \"false\",\n",
|
||||||
|
" 'creatinine_phosphokinase': \"500\",\n",
|
||||||
|
" 'diabetes': \"false\",\n",
|
||||||
|
" 'ejection_fraction': \"38\",\n",
|
||||||
|
" 'high_blood_pressure': \"false\",\n",
|
||||||
|
" 'platelets': \"260000\",\n",
|
||||||
|
" 'serum_creatinine': \"1.40\",\n",
|
||||||
|
" 'serum_sodium': \"137\",\n",
|
||||||
|
" 'sex': \"false\",\n",
|
||||||
|
" 'smoking': \"false\",\n",
|
||||||
|
" 'time': \"130\",\n",
|
||||||
|
" },\n",
|
||||||
|
" ],\n",
|
||||||
|
"}\n",
|
||||||
|
"\n",
|
||||||
|
"test_sample = str.encode(json.dumps(data))"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": null,
|
||||||
|
"source": [
|
||||||
|
"response = aci_service.run(input_data=test_sample)\n",
|
||||||
|
"response"
|
||||||
|
],
|
||||||
|
"outputs": [],
|
||||||
|
"metadata": {}
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
|
||||||
|
]
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"metadata": {
|
||||||
|
"orig_nbformat": 4,
|
||||||
|
"language_info": {
|
||||||
|
"name": "python"
|
||||||
|
},
|
||||||
|
"coopTranslator": {
|
||||||
|
"original_hash": "af42669556d5dc19fc4cc3866f7d2597",
|
||||||
|
"translation_date": "2025-09-03T20:36:29+00:00",
|
||||||
|
"source_file": "5-Data-Science-In-Cloud/19-Azure/notebook.ipynb",
|
||||||
|
"language_code": "en"
|
||||||
|
}
|
||||||
|
},
|
||||||
|
"nbformat": 4,
|
||||||
|
"nbformat_minor": 2
|
||||||
|
}
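The "Use the Endpoint" step builds a request body before calling the service. The sketch below shows only that payload step, using the standard library and making no Azure call. It assumes the scoring script accepts a JSON object of the form `{"data": [...]}`, as the notebook's sample input suggests; the shortened record is a hypothetical subset of the notebook's fields.

```python
import json

# Hypothetical, shortened patient record; the notebook's sample has more fields.
record = {
    "age": "60",
    "anaemia": "false",
    "ejection_fraction": "38",
    "serum_creatinine": "1.40",
}

# The service input wraps one or more records in a "data" list.
payload = {"data": [record]}

# Serialize to JSON text, then to UTF-8 bytes, as the notebook does
# with str.encode(json.dumps(data)).
test_sample = json.dumps(payload).encode("utf-8")

# Round-trip check: decoding the bytes gives back the same structure.
decoded = json.loads(test_sample.decode("utf-8"))
print(decoded["data"][0]["age"])
```

Note that `json` must be imported before `json.dumps` is called; the Azure ML imports at the top of the notebook do not bring it in.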
|
@ -1,40 +1,78 @@
|
|||||||
<!--
|
<!--
|
||||||
CO_OP_TRANSLATOR_METADATA:
|
CO_OP_TRANSLATOR_METADATA:
|
||||||
{
|
{
|
||||||
"original_hash": "2583a9894af7123b2fcae3376b14c035",
|
"original_hash": "8141e7195841682914be03ef930fe43d",
|
||||||
"translation_date": "2025-08-27T17:13:35+00:00",
|
"translation_date": "2025-09-03T20:09:19+00:00",
|
||||||
"source_file": "1-Introduction/01-defining-data-science/README.md",
|
"source_file": "1-Introduction/01-defining-data-science/README.md",
|
||||||
"language_code": "mr"
|
"language_code": "mr"
|
||||||
}
|
}
|
||||||
-->
|
-->
|
||||||
|
## Types of Data
|
||||||
|
|
||||||
You may argue that this approach is not ideal, because modules can differ in length. It would probably be fairer to divide the time by the module's length (in number of characters) and compare those values instead.
|
As we have already mentioned, data is everywhere. We just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. Structured data is typically represented in some well-structured form, often as a table or a number of tables, while unstructured data is just a collection of files. Sometimes we also talk about **semi-structured** data, which has some kind of structure that may vary greatly.
|
||||||
When we start analyzing the results of multiple-choice tests, we can determine which concepts students have difficulty understanding and use that information to improve the content. To do that, we need to design the tests so that each question maps to a specific concept or piece of knowledge.
|
|
||||||
|
|
||||||
If we want to get even more sophisticated, we can plot the time taken for each module against the age category of the students. We might find that for some age categories it takes an inordinately long time to complete a module, or that students drop out before completing it. This can help us provide age recommendations for the module and reduce dissatisfaction caused by wrong expectations.
|
| Structured | Semi-structured | Unstructured |
|
||||||
|
| ------------------------------------------------------------------------ | -------------------------------------------------------------------------------------------- | ----------------------------------- |
|
||||||
|
| List of people with their phone numbers | Wikipedia pages with links | Text of the Encyclopedia Britannica |
|
||||||
|
| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, date of publication, and abstract | File share with corporate documents |
|
||||||
|
| Data on the age and gender of all people entering the building | Internet pages | Raw video feed from a surveillance camera |
|
||||||
|
|
||||||
|
## Where to Get Data
|
||||||
|
|
||||||
|
There are many possible sources of data, and it would be impossible to list all of them! However, let's mention some of the typical places where you can get data:
|
||||||
|
|
||||||
|
* **Structured**
|
||||||
|
  - **Internet of Things** (IoT), including data from different sensors such as temperature or pressure sensors, provides a lot of useful data. For example, if an office building is equipped with IoT sensors, we can automatically control heating and lighting to minimize costs.
|
||||||
|
  - **Surveys** that we ask users to complete after a purchase or after visiting a website.
|
||||||
|
  - **Analysis of behavior** can, for example, help us understand how deeply a user goes into a site and what the typical reason for leaving the site is.
|
||||||
|
* **Unstructured**
|
||||||
|
  - **Texts** can be a rich source of insights, such as an overall **sentiment score**, or extracted keywords and semantic meaning.
|
||||||
|
  - **Images** or **Video**. A video from a surveillance camera can be used to estimate traffic on the road and inform people about potential traffic jams.
|
||||||
|
  - Web server **Logs** can be used to understand which pages of our site are visited most often, and for how long.
|
||||||
|
* Semi-structured
|
||||||
|
  - **Social Network** graphs can be great sources of data about user personalities and potential effectiveness in spreading information.
|
||||||
|
  - When we have a bunch of photographs from a party, we can try to extract **Group Dynamics** data by building a graph of people taking pictures with each other.
|
||||||
|
|
||||||
|
By knowing the different possible sources of data, you can think about different scenarios where data science techniques can be applied to understand the situation better and to improve business processes.
|
||||||
|
|
||||||
|
## What You Can Do with Data
|
||||||
|
|
||||||
|
In Data Science, we focus on the following steps of the data journey:
|
||||||
|
|
||||||
|
## Digitalization and Digital Transformation
|
||||||
|
|
||||||
|
In the last decade, many businesses have come to understand the importance of data when making business decisions. To apply data science principles to running a business, one first needs to collect some data, i.e. translate business processes into digital form. This is known as **digitalization**. Applying data science techniques to this data to guide decisions can lead to significant increases in productivity (or even a business pivot), called **digital transformation**.
|
||||||
|
|
||||||
|
Let's consider an example. Suppose we have a data science course (like this one) that we deliver online to students, and we want to use data science to improve it. How can we do it?
|
||||||
|
|
||||||
|
We can start by asking "What can be digitized?" The simplest way would be to measure the time it takes each student to complete each module, and to measure the knowledge gained by giving a multiple-choice test at the end of each module. By averaging the time-to-complete across all students, we can find out which modules cause the most difficulties for students and work on simplifying them.
|
||||||
|
You may argue that this approach is not ideal, because modules can be of different lengths. It is probably fairer to divide the time by the length of the module (in number of characters) and compare those values instead.
|
||||||
|
When we start analyzing the results of multiple-choice tests, we can try to determine which concepts students have difficulty understanding, and use that information to improve the content. To do that, we need to design the tests in such a way that each question maps to a specific concept or chunk of knowledge.
|
||||||
|
|
||||||
|
If we want to get even more sophisticated, we can relate the time taken for each module to the age category of the students. We might find that for some age categories it takes far too long to complete the module, or that students drop out before completing it. This can help us provide appropriate age recommendations for the module and minimize dissatisfaction from wrong expectations.
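
The normalization idea described above, dividing completion time by module length, can be sketched in a few lines. This is only an illustration: the module names, timings, and character counts are made-up assumptions, and `seconds_per_1000_chars` is a hypothetical helper, not something defined in the course.

```python
from statistics import mean

# Hypothetical measurements: seconds each student needed per module.
completion_seconds = {
    "intro": [600, 720, 660],
    "statistics": [1500, 1800, 1650],
}
# Hypothetical module lengths, in characters of lesson text.
module_chars = {"intro": 6000, "statistics": 11000}


def seconds_per_1000_chars(times, length_chars):
    """Average completion time scaled to 1000 characters of content."""
    return mean(times) / length_chars * 1000


# Normalized time for every module.
normalized = {
    name: seconds_per_1000_chars(times, module_chars[name])
    for name, times in completion_seconds.items()
}

# The module that takes longest per unit of content.
slowest = max(normalized, key=normalized.get)
print(normalized, slowest)
```

With these made-up numbers the raw averages differ by a factor of 2.5, while the per-character figures differ by much less, which is the reason for normalizing before comparing modules.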
|
||||||
|
|
||||||
## 🚀 Challenge
|
## 🚀 Challenge
|
||||||
|
|
||||||
In this challenge, we will try to find concepts relevant to the field of Data Science by looking at texts. We will take the Wikipedia article on Data Science, download and process the text, and then build a word cloud like this one:
|
In this challenge, we will try to find concepts relevant to the field of Data Science. To do so, we will take the Wikipedia article on Data Science, download and process the text, and then build a word cloud like the one below:
|
||||||
|
|
||||||

|

|
||||||
|
|
||||||
Visit [`notebook.ipynb`](../../../../../../../../../1-Introduction/01-defining-data-science/notebook.ipynb ':ignore') to read through the code. You can also run the code and see how it actually performs the data transformations.
|
Visit [`notebook.ipynb`](../../../../../../../../../1-Introduction/01-defining-data-science/notebook.ipynb ':ignore') to read through the code. You can also run the code and watch how it performs the data transformations in real time.
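
A word cloud is, at its core, a picture of word frequencies. As a rough illustration of that step only (not the notebook's own method, which may use dedicated keyword-extraction and plotting libraries), the sketch below counts the most frequent terms in a short sample text. The stop-word list, the sample sentence, and the `top_terms` helper are assumptions made for the example.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; real lists are much longer.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is", "that", "it", "from"}


def top_terms(text, n=5):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS and len(w) > 2)
    return counts.most_common(n)


# Made-up sample text standing in for the downloaded article.
sample = (
    "Data science is the study of data. Data science uses statistics "
    "and machine learning to extract knowledge from data."
)
print(top_terms(sample, 2))
```

A word-cloud renderer then scales each term's font size by its count, so the most frequent terms dominate the image.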
|
||||||
|
|
||||||
> If you do not know how to run code in a Jupyter notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
|
> If you do not know how to run code in a Jupyter Notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
|
||||||
|
|
||||||
## [Post-lecture quiz](https://purple-hill-04aebfb03.1.azurestaticapps.net/quiz/1)
|
## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ds/)
|
||||||
|
|
||||||
## Assignments
|
## Assignments
|
||||||
|
|
||||||
* **Task 1**: Modify the code above to find related concepts for the fields of **Big Data** and **Machine Learning**.
|
* **Task 1**: Make changes to the code above to find related concepts for the fields of **Big Data** and **Machine Learning**.
|
||||||
* **Task 2**: [Think About Data Science Scenarios](assignment.md)
|
* **Task 2**: [Reflect on Data Science Scenarios](assignment.md)
|
||||||
|
|
||||||
## Credits
|
## Credit
|
||||||
|
|
||||||
This lesson was authored with ♥️ by [Dmitry Soshnikov](http://soshnikov.com).
|
This lesson was authored with ♥️ by [Dmitry Soshnikov](http://soshnikov.com).
|
||||||
|
|
||||||
---
|
---
|
||||||
|
|
||||||
**Disclaimer**:
|
**Disclaimer**:
|
||||||
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
|
This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations resulting from the use of this translation.
|