From 2f0cf2c9a263f4b1f463454752c518a428d16281 Mon Sep 17 00:00:00 2001
From: Aditya Garg <61191738+AdityaGarg00@users.noreply.github.com>
Date: Sat, 2 Oct 2021 21:13:17 +0530
Subject: [PATCH 001/319] hindi translate
---
2-Working-With-Data/translations/README.hi.md | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
create mode 100644 2-Working-With-Data/translations/README.hi.md
diff --git a/2-Working-With-Data/translations/README.hi.md b/2-Working-With-Data/translations/README.hi.md
new file mode 100644
index 00000000..b58ca374
--- /dev/null
+++ b/2-Working-With-Data/translations/README.hi.md
@@ -0,0 +1,20 @@
+# डेटा के साथ काम करना
+
+
+> तस्वीर Alexander Sinn द्वारा Unsplash पर
+
+
+इन पाठों में, आप कुछ ऐसे तरीके सीखेंगे जिनसे डेटा को प्रबंधित, हेरफेर और अनुप्रयोगों में उपयोग किया जा सकता है। आप रिलेशनल और नॉन-रिलेशनल डेटाबेस के बारे में जानेंगे और उनमें डेटा कैसे स्टोर किया जा सकता है। आप डेटा को प्रबंधित करने के लिए पायथन के साथ काम करने के मूल सिद्धांतों को सीखेंगे, और आप कुछ ऐसे तरीकों की खोज करेंगे जिनसे आप डेटा को प्रबंधित करने और माइन करने के लिए पायथन के साथ काम कर सकते हैं।
+### विषय
+
+1. [संबंधपरक डेटाबेस](05-relational-databases/README.md)
+2. [गैर-संबंधपरक डेटाबेस](06-non-relational/README.md)
+3. [पायथन के साथ काम करना](07-python/README.md)
+4. [डेटा तैयार करना](08-data-preparation/README.md)
+
+### क्रेडिट
+
+These lessons were written with ❤️ by [Christopher Harrison](https://twitter.com/geektrainer), [Dmitry Soshnikov](https://twitter.com/shwars) and [Jasmine Greenaway](https://twitter.com/paladique)
+
+ये पाठ [क्रिस्टोफर हैरिसन](https://twitter.com/geektrainer), [दिमित्री सोशनिकोव](https://twitter.com/shwars) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा ❤️ के साथ लिखे गए थे।
\ No newline at end of file
From eac002b92c62b11f1ef074c94a1d0126f6993ef7 Mon Sep 17 00:00:00 2001
From: Aditya Garg <61191738+AdityaGarg00@users.noreply.github.com>
Date: Sat, 2 Oct 2021 21:14:11 +0530
Subject: [PATCH 002/319] Update README.hi.md
---
2-Working-With-Data/translations/README.hi.md | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/2-Working-With-Data/translations/README.hi.md b/2-Working-With-Data/translations/README.hi.md
index b58ca374..3da7704c 100644
--- a/2-Working-With-Data/translations/README.hi.md
+++ b/2-Working-With-Data/translations/README.hi.md
@@ -15,6 +15,4 @@
### क्रेडिट
-These lessons were written with ❤️ by [Christopher Harrison](https://twitter.com/geektrainer), [Dmitry Soshnikov](https://twitter.com/shwars) and [Jasmine Greenaway](https://twitter.com/paladique)
-
-ये पाठ [क्रिस्टोफर हैरिसन](https://twitter.com/geektrainer), [दिमित्री सोशनिकोव](https://twitter.com/shwars) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा ❤️ के साथ लिखे गए थे।
\ No newline at end of file
+ये पाठ [क्रिस्टोफर हैरिसन](https://twitter.com/geektrainer), [दिमित्री सोशनिकोव](https://twitter.com/shwars) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा ❤️ के साथ लिखे गए थे।
From f8e3f606b67840a30e876cbebbe881ef3b6538d3 Mon Sep 17 00:00:00 2001
From: Aditya Garg <61191738+AdityaGarg00@users.noreply.github.com>
Date: Sat, 2 Oct 2021 21:16:04 +0530
Subject: [PATCH 003/319] Update README.hi.md
---
2-Working-With-Data/translations/README.hi.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/2-Working-With-Data/translations/README.hi.md b/2-Working-With-Data/translations/README.hi.md
index 3da7704c..525b3efd 100644
--- a/2-Working-With-Data/translations/README.hi.md
+++ b/2-Working-With-Data/translations/README.hi.md
@@ -15,4 +15,4 @@
### क्रेडिट
-ये पाठ [क्रिस्टोफर हैरिसन](https://twitter.com/geektrainer), [दिमित्री सोशनिकोव](https://twitter.com/shwars) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा ❤️ के साथ लिखे गए थे।
+ये पाठ [क्रिस्टोफर हैरिसन](https://twitter.com/geektrainer), [दिमित्री सोशनिकोव](https://twitter.com/shwars) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा ❤️ से लिखे गए थे।
From a1bd78c2a6525cb1514a7dc1fa362a9d5a8c1e57 Mon Sep 17 00:00:00 2001
From: Aditya Garg <61191738+AdityaGarg00@users.noreply.github.com>
Date: Sun, 3 Oct 2021 21:36:03 +0530
Subject: [PATCH 004/319] Update README.hi.md
---
2-Working-With-Data/translations/README.hi.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/2-Working-With-Data/translations/README.hi.md b/2-Working-With-Data/translations/README.hi.md
index 525b3efd..4e8cd85c 100644
--- a/2-Working-With-Data/translations/README.hi.md
+++ b/2-Working-With-Data/translations/README.hi.md
@@ -8,10 +8,10 @@
इन पाठों में, आप कुछ ऐसे तरीके सीखेंगे जिनसे डेटा को प्रबंधित, हेरफेर और अनुप्रयोगों में उपयोग किया जा सकता है। आप रिलेशनल और नॉन-रिलेशनल डेटाबेस के बारे में जानेंगे और उनमें डेटा कैसे स्टोर किया जा सकता है। आप डेटा को प्रबंधित करने के लिए पायथन के साथ काम करने के मूल सिद्धांतों को सीखेंगे, और आप कुछ ऐसे तरीकों की खोज करेंगे जिनसे आप डेटा को प्रबंधित करने और माइन करने के लिए पायथन के साथ काम कर सकते हैं।
### विषय
-1. [संबंधपरक डेटाबेस](05-relational-databases/README.md)
-2. [गैर-संबंधपरक डेटाबेस](06-non-relational/README.md)
-3. [पायथन के साथ काम करना](07-python/README.md)
-4. [डेटा तैयार करना](08-data-preparation/README.md)
+1. [संबंधपरक डेटाबेस](../05-relational-databases/README.md)
+2. [गैर-संबंधपरक डेटाबेस](../06-non-relational/README.md)
+3. [पायथन के साथ काम करना](../07-python/README.md)
+4. [डेटा तैयार करना](../08-data-preparation/README.md)
### क्रेडिट
From 211acce83aeaece4696075761ea8a16537bafb81 Mon Sep 17 00:00:00 2001
From: Flex Zhong
Date: Mon, 4 Oct 2021 10:27:34 +0800
Subject: [PATCH 005/319] Create assignment.zh-cn.md
---
.../translations/assignment.zh-cn.md | 11 +++++++++++
1 file changed, 11 insertions(+)
create mode 100644 3-Data-Visualization/11-visualization-proportions/translations/assignment.zh-cn.md
diff --git a/3-Data-Visualization/11-visualization-proportions/translations/assignment.zh-cn.md b/3-Data-Visualization/11-visualization-proportions/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..a01c2494
--- /dev/null
+++ b/3-Data-Visualization/11-visualization-proportions/translations/assignment.zh-cn.md
@@ -0,0 +1,11 @@
+# 在 Excel 中试试
+
+## 指示
+
+你知道在 Excel 中可以创建圆环图、饼图和华夫饼图吗?使用你选择的数据集,直接在 Excel 电子表格中创建这三种图表。
+
+## 评分表
+
+| 优秀 | 一般 | 需要改进 |
+| ----------------------- | ------------------------ | ---------------------- |
+| 在 Excel 中制作了三种图表 | 在 Excel 中制作了两种图表 | 在 Excel 中只制作了一种图表 |
From 1c0db49931655b5b9cd987cc20ebd5ed351ce25d Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Sun, 3 Oct 2021 21:44:13 -0500
Subject: [PATCH 006/319] feat: section1 - Translate main readme file to
spanish [ES]
---
.../translations/README.es.md | 165 ++++++++++++++++++
1-Introduction/translations/README.es.md | 19 ++
2 files changed, 184 insertions(+)
diff --git a/1-Introduction/01-defining-data-science/translations/README.es.md b/1-Introduction/01-defining-data-science/translations/README.es.md
index e69de29b..873ad74f 100644
--- a/1-Introduction/01-defining-data-science/translations/README.es.md
+++ b/1-Introduction/01-defining-data-science/translations/README.es.md
@@ -0,0 +1,165 @@
+# Defining Data Science
+
+|![Defining Data Science](../../sketchnotes/01-Definitions.png)|
+|:---:|
+|Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+---
+
+[Video](https://youtu.be/pqqsm5reGvs)
+
+## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
+
+## What is Data?
+In our everyday life, we are constantly surrounded by data. The text you are reading now is data, the list of phone numbers of your friends in your smartphone is data, as well as the current time displayed on your watch. As human beings, we naturally operate with data by counting the money we have or writing letters to our friends.
+
+However, data became much more critical with the creation of computers. The primary role of computers is to perform computations, but they need data to operate on. Thus, we need to understand how computers store and process data.
+
+With the emergence of the Internet, the role of computers as data handling devices increased. If you think about it, we now use computers more and more for data processing and communication, rather than actual computations. When we write an e-mail to a friend or search for some information on the Internet - we are essentially creating, storing, transmitting, and manipulating data.
+> Can you remember the last time you used a computer to actually compute something?
+
+## What is Data Science?
+
+In [Wikipedia](https://en.wikipedia.org/wiki/Data_science), **Data Science** is defined as *a scientific field that uses scientific methods to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains*.
+
+This definition highlights the following important aspects of data science:
+
+* The main goal of data science is to **extract knowledge** from data, in other words - to **understand** data, find some hidden relationships and build a **model**.
+* Data science uses **scientific methods**, such as probability and statistics. In fact, when the term *data science* was first introduced, some people argued that data science is just a new fancy name for statistics. Nowadays it has become evident that the field is much broader.
+* The obtained knowledge should be applied to produce some **actionable insights**.
+* We should be able to operate on both **structured** and **unstructured** data. We will come back to discuss different types of data later in the course.
+* **Application domain** is an important concept, and a data scientist often needs at least some degree of expertise in the problem domain.
+
+> Another important aspect of Data Science is that it studies how data can be gathered, stored and operated upon using computers. While statistics gives us mathematical foundations, data science applies mathematical concepts to actually draw insights from data.
+
+One of the ways (attributed to [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) to look at data science is to consider it a separate paradigm of science:
+* **Empirical**, in which we rely mostly on observations and results of experiments
+* **Theoretical**, where new concepts emerge from existing scientific knowledge
+* **Computational**, where we discover new principles based on some computational experiments
+* **Data-Driven**, based on discovering relationships and patterns in the data
+
+## Other Related Fields
+
+Since data is a pervasive concept, data science itself is also a broad field, touching many other related disciplines.
+
+
+
+**Databases**
+
+The most obvious thing to consider is **how to store** the data, i.e. how to structure them in a way that allows faster processing. There are different types of databases that store structured and unstructured data, which [we will consider in our course](../../2-Working-With-Data/README.md).
+
+
+**Big Data**
+
+Often we need to store and process really large quantities of data with relatively simple structure. There are special approaches and tools to store that data in a distributed manner on a computer cluster, and process them efficiently.
+
+
+**Machine Learning**
+
+One of the ways to understand the data is to **build a model** that will be able to predict the desired outcome. Being able to learn such models from data is the area studied in **machine learning**. You may want to have a look at our [Machine Learning for Beginners](https://github.com/microsoft/ML-For-Beginners/) Curriculum to get deeper into that field.
+
+
+**Artificial Intelligence**
+
+Like machine learning, artificial intelligence also relies on data, and it involves building highly complex models that exhibit behavior similar to a human being's. Also, AI methods often allow us to turn unstructured data (e.g. natural language) into structured data by extracting some insights.
+
+
+**Visualization**
+
+Vast amounts of data are incomprehensible for a human being, but once we create useful visualizations - we can start making much more sense of data, and drawing some conclusions. Thus, it is important to know many ways to visualize information - something that we will cover in [Section 3](../../3-Data-Visualization/README.md) of our course. Related fields also include **Infographics**, and **Human-Computer Interaction** in general.
+
+
+
+## Types of Data
+
+As we have already mentioned - data is everywhere, we just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. The former is typically represented in some well-structured form, often as a table or a number of tables, while the latter is just a collection of files. Sometimes we can also talk about **semistructured** data, which has some sort of structure that may vary greatly.
+
+| Structured | Semi-structured | Unstructured |
+|----------- |-----------------|--------------|
+| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
+| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, date of publication, and abstract | File share with corporate documents |
+| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
+
+## Where to get Data
+
+There are many possible sources of data, and it will be impossible to list all of them! However, let's mention some of the typical places where you can get data:
+
+* **Structured**
+ - **Internet of Things**, including data from different sensors, such as temperature or pressure sensors, provides a lot of useful data. For example, if an office building is equipped with IoT sensors, we can automatically control heating and lighting in order to minimize costs.
+ - **Surveys** that we ask users after purchase of a good, or after visiting a web site.
+ - **Analysis of behavior** can, for example, help us understand how deeply a user goes into a site, and what is the typical reason for leaving the site.
+* **Unstructured**
+ - **Texts** can be a rich source of insights, starting from overall **sentiment score**, up to extracting keywords and even some semantic meaning.
+ - **Images** or **Video**. A video from surveillance camera can be used to estimate traffic on the road, and inform people about potential traffic jams.
+ - Web server **Logs** can be used to understand which pages of our site are most visited, and for how long.
+* Semi-structured
+ - **Social Network** graph can be a great source of data about user personality and potential effectiveness in spreading information around.
+ - When we have a bunch of photographs from a party, we can try to extract **Group Dynamics** data by building a graph of people taking pictures with each other.
+
+By knowing different possible sources of data, you can try to think about different scenarios where data science techniques can be applied to know the situation better, and to improve business processes.
+
+## What you can do with Data
+
+In Data Science, we focus on the following steps of data journey:
+
+
+
+**1) Data Acquisition**
+
+The first step is to collect the data. While in many cases it can be a straightforward process, like data coming into a database from a web application, sometimes we need to use special techniques. For example, data from IoT sensors can be overwhelming, and it is a good practice to use buffering endpoints such as IoT Hub to collect all the data before further processing.
+
+
+**2) Data Storage**
+
+Storing the data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would want later on to query them. There are several ways data can be stored:
+
+
+A **relational database** stores a collection of tables, and uses a special language called SQL to query them. Typically, tables would be connected to each other using some schema. In many cases we need to convert the data from its original form to fit the schema.
+
+A **NoSQL** database, such as CosmosDB, does not enforce a schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL databases do not have the rich querying capabilities of SQL, and cannot enforce referential integrity between data.
+
+**Data Lake** storage is used for large collections of data in raw form. Data lakes are often used with big data, where all the data cannot fit on one machine, and has to be stored and processed by a cluster. Parquet is a data format that is often used in conjunction with big data.
+
+
+
+**3) Data Processing**
+
+This is the most exciting part of the data journey, which involves processing the data from its original form into a form that can be used for visualization/model training. When dealing with unstructured data such as text or images, we may need to use some AI techniques to extract **features** from the data, thus converting it to structured form.
+
+
+**4) Visualization / Human Insights**
+
+Often, to understand the data, we need to visualize it. With many different visualization techniques in our toolbox, we can find the right view to gain an insight. Often, a data scientist needs to "play with data", visualizing it many times and looking for some relationships. Also, we may use techniques from statistics to test hypotheses or prove a correlation between different pieces of data.
+
+
+**5) Training a predictive model**
+
+Because the ultimate goal of data science is to be able to make decisions based on data, we may want to use the techniques of Machine Learning to build a predictive model that will be able to solve our problem.
+
+
+
+Of course, depending on the actual data, some steps might be missing (e.g. when we already have the data in the database, or when we do not need model training), or some steps might be repeated several times (such as data processing).
+
+## Digitalization and Digital Transformation
+
+In the last decade, many businesses have started to understand the importance of data when making business decisions. To apply data science principles to running a business, one first needs to collect some data, i.e. somehow turn business processes into digital form. This is known as **digitalization**; following it with data science techniques to guide decisions often leads to a significant increase in productivity (or even a business pivot), called **digital transformation**.
+
+Let's consider an example. Suppose, we have a data science course (like this one), which we deliver online to students, and we want to use data science to improve it. How can we do it?
+
+We can start by thinking "what can be digitized?". The simplest way would be to measure the time it takes each student to complete each module, and the knowledge obtained (e.g. by giving a multiple-choice test at the end of each module). By averaging time-to-complete across all students, we can find out which modules cause the most problems for students, and work on simplifying them.
+
+> You may argue that this approach is not ideal, because modules can be of different length. It is probably more fair to divide the time by the length of the module (in number of characters), and compare those values instead.
+
+When we start analyzing results of multiple-choice tests, we can try to find out specific concepts that students understand poorly, and improve the content. To do that, we need to design tests in such a way that each question maps to a certain concept or chunk of knowledge.
+
+If we want to get even more sophisticated, we can plot the time taken for each module against the age category of students. We might find out that for some age categories it takes an inappropriately long time to complete the module, or that students drop out at a certain point. This can help us provide age recommendations for the module, and minimize people's dissatisfaction from wrong expectations.
+
+## 🚀 Challenge
+
+In this challenge, we will try to find concepts relevant to the field of Data Science by looking at texts. We will take the Wikipedia article on Data Science, download and process its text, and then build a word cloud like this one:
+
+
+
+Visit [`notebook.ipynb`](notebook.ipynb) to read through the code. You can also run the code, and see how it performs all data transformations in real time.
+
+> If you do not know how to run code in Jupyter Notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
+
+
+
+## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/1)
+
+## Assignments
+
+* **Task 1**: Modify the code above to find out related concepts for the fields of **Big Data** and **Machine Learning**
+* **Task 2**: [Think About Data Science Scenarios](assignment.md)
+
+## Credits
+
+This lesson has been authored with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
diff --git a/1-Introduction/translations/README.es.md b/1-Introduction/translations/README.es.md
index e69de29b..e905bf11 100644
--- a/1-Introduction/translations/README.es.md
+++ b/1-Introduction/translations/README.es.md
@@ -0,0 +1,19 @@
+# Introduction to Data Science
+
+
+> Fotografía de Stephen Dawson en Unsplash
+
+En estas lecciones descubrirás cómo se define la Ciencia de Datos y aprenderás acerca de
+las consideraciones éticas que deben ser tomadas por un científico de datos. También aprenderás
+cómo se definen los datos y un poco de probabilidad y estadística, el núcleo académico de la Ciencia de Datos.
+
+### Temas
+
+1. [Definiendo la Ciencia de Datos](01-defining-data-science/README.md)
+2. [Ética de la Ciencia de Datos](02-ethics/README.md)
+3. [Definición de Datos](03-defining-data/README.md)
+4. [Introducción a la probabilidad y estadística](04-stats-and-probability/README.md)
+
+### Créditos
+
+Estas lecciones fueron escritas con ❤️ por [Nitya Narasimhan](https://twitter.com/nitya) y [Dmitry Soshnikov](https://twitter.com/shwars).
From 503468f6ea6ba12e59a63b05165e94209c2d14e4 Mon Sep 17 00:00:00 2001
From: Nirmalya Misra <39618712+nirmalya8@users.noreply.github.com>
Date: Mon, 4 Oct 2021 09:56:58 +0530
Subject: [PATCH 007/319] Added DataFrame.shape and DataFrame.columns
---
.../08-data-preparation/notebook.ipynb | 965 +++++++++++-------
1 file changed, 605 insertions(+), 360 deletions(-)
diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index e45a5cb5..b1c1d7a0 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -1,318 +1,510 @@
{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "anaconda-cloud": {},
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3",
+ "language": "python"
+ },
+ "language_info": {
+ "mimetype": "text/x-python",
+ "nbconvert_exporter": "python",
+ "name": "python",
+ "file_extension": ".py",
+ "version": "3.5.4",
+ "pygments_lexer": "ipython3",
+ "codemirror_mode": {
+ "version": 3,
+ "name": "ipython"
+ }
+ },
+ "colab": {
+ "name": "notebook.ipynb",
+ "provenance": []
+ }
+ },
"cells": [
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "rQ8UhzFpgRra"
+ },
"source": [
- "# Data Preparation\r\n",
- "\r\n",
- "[Original Notebook source from *Data Science: Introduction to Machine Learning for Data Science Python and Machine Learning Studio by Lee Stott*](https://github.com/leestott/intro-Datascience/blob/master/Course%20Materials/4-Cleaning_and_Manipulating-Reference.ipynb)\r\n",
- "\r\n",
- "## Exploring `DataFrame` information\r\n",
- "\r\n",
- "> **Learning goal:** By the end of this subsection, you should be comfortable finding general information about the data stored in pandas DataFrames.\r\n",
- "\r\n",
- "Once you have loaded your data into pandas, it will more likely than not be in a `DataFrame`. However, if the data set in your `DataFrame` has 60,000 rows and 400 columns, how do you even begin to get a sense of what you're working with? Fortunately, pandas provides some convenient tools to quickly look at overall information about a `DataFrame` in addition to the first few and last few rows.\r\n",
- "\r\n",
+ "# Data Preparation\n",
+ "\n",
+ "[Original Notebook source from *Data Science: Introduction to Machine Learning for Data Science Python and Machine Learning Studio by Lee Stott*](https://github.com/leestott/intro-Datascience/blob/master/Course%20Materials/4-Cleaning_and_Manipulating-Reference.ipynb)\n",
+ "\n",
+ "## Exploring `DataFrame` information\n",
+ "\n",
+ "> **Learning goal:** By the end of this subsection, you should be comfortable finding general information about the data stored in pandas DataFrames.\n",
+ "\n",
+ "Once you have loaded your data into pandas, it will more likely than not be in a `DataFrame`. However, if the data set in your `DataFrame` has 60,000 rows and 400 columns, how do you even begin to get a sense of what you're working with? Fortunately, pandas provides some convenient tools to quickly look at overall information about a `DataFrame` in addition to the first few and last few rows.\n",
+ "\n",
"In order to explore this functionality, we will import the Python scikit-learn library and use an iconic dataset that every data scientist has seen hundreds of times: British biologist Ronald Fisher's *Iris* data set used in his 1936 paper \"The use of multiple measurements in taxonomic problems\":"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "trusted": false,
+ "id": "hB1RofhdgRrp"
+ },
"source": [
- "import pandas as pd\r\n",
- "from sklearn.datasets import load_iris\r\n",
- "\r\n",
- "iris = load_iris()\r\n",
+ "import pandas as pd\n",
+ "from sklearn.datasets import load_iris\n",
+ "\n",
+ "iris = load_iris()\n",
"iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])"
],
- "outputs": [],
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
"metadata": {
- "collapsed": true,
- "trusted": false
- }
+ "id": "AGA0A_Y8hMdz"
+ },
+ "source": [
+ "### `DataFrame.shape`\n",
+ "We have loaded the Iris Dataset in the variable `iris_df`. Before diving into the data, it would be valuable to know the number of datapoints we have and the overall size of the dataset. It is useful to look at the volume of data we are dealing with. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "LOe5jQohhulf",
+ "outputId": "9cf67a6a-5779-453b-b2ed-58f4f1aab507",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "iris_df.shape"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(150, 4)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 2
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "smE7AGzOhxk2"
+ },
+ "source": [
+ "So, we are dealing with 150 rows and 4 columns of data. Each row represents one datapoint and each column represents a single feature associated with the data frame. So basically, there are 150 datapoints containing 4 features each.\n",
+ "\n",
+ "`shape` here is an attribute of the dataframe and not a function, which is why it doesn't end in a pair of parentheses. "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "d3AZKs0PinGP"
+ },
+ "source": [
+ "### `DataFrame.columns`\n",
+ "Let us now move into the 4 columns of data. What does each of them exactly represent? The `columns` attribute will give us the name of the columns in the dataframe. "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "YPGh_ziji-CY",
+ "outputId": "ca186194-a126-4348-f58e-aab7ebc8f7b7",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "iris_df.columns"
+ ],
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Index(['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)',\n",
+ " 'petal width (cm)'],\n",
+ " dtype='object')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "TsobcU_VjCC_"
+ },
+ "source": [
+        "As we can see, there are four (4) columns. The `columns` attribute tells us the names of the columns and basically nothing else. This attribute assumes importance when we want to identify the features a dataset contains."
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "2UTlvkjmgRrs"
+ },
"source": [
- "### `DataFrame.info`\r\n",
+ "### `DataFrame.info`\n",
"Let's take a look at this dataset to see what we have:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "dHHRyG0_gRrt",
+ "outputId": "ca9de335-9e65-486a-d1e2-3e73d060c701",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
"source": [
"iris_df.info()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "RangeIndex: 150 entries, 0 to 149\n",
+ "Data columns (total 4 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 sepal length (cm) 150 non-null float64\n",
+ " 1 sepal width (cm) 150 non-null float64\n",
+ " 2 petal length (cm) 150 non-null float64\n",
+ " 3 petal width (cm) 150 non-null float64\n",
+ "dtypes: float64(4)\n",
+ "memory usage: 4.8 KB\n"
+ ]
+ }
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "1XgVMpvigRru"
+ },
"source": [
"From this, we know that the *Iris* dataset has 150 entries in four columns. All of the data is stored as 64-bit floating-point numbers."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "-lviAu99gRrv"
+ },
"source": [
- "### `DataFrame.head`\r\n",
+ "### `DataFrame.head`\n",
"Next, let's see what the first few rows of our `DataFrame` look like:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "DZMJZh0OgRrw"
+ },
"source": [
"iris_df.head()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "oj7GkrTdgRry"
+ },
"source": [
- "### Exercise:\r\n",
- "\r\n",
+ "### Exercise:\n",
+ "\n",
"By default, `DataFrame.head` returns the first five rows of a `DataFrame`. In the code cell below, can you figure out how to get it to show more?"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "trusted": false,
+ "id": "EKRmRFFegRrz"
+ },
"source": [
"# Hint: Consult the documentation by using iris_df.head?"
],
- "outputs": [],
- "metadata": {
- "collapsed": true,
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "BJ_cpZqNgRr1"
+ },
"source": [
- "### `DataFrame.tail`\r\n",
+ "### `DataFrame.tail`\n",
"The flipside of `DataFrame.head` is `DataFrame.tail`, which returns the last five rows of a `DataFrame`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "heanjfGWgRr2"
+ },
"source": [
"iris_df.tail()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "31kBWfyLgRr3"
+ },
"source": [
- "In practice, it is useful to be able to easily examine the first few rows or the last few rows of a `DataFrame`, particularly when you are looking for outliers in ordered datasets.\r\n",
- "\r\n",
+ "In practice, it is useful to be able to easily examine the first few rows or the last few rows of a `DataFrame`, particularly when you are looking for outliers in ordered datasets.\n",
+ "\n",
"> **Takeaway:** Even just by looking at the metadata about the information in a DataFrame or the first and last few values in one, you can get an immediate idea about the size, shape, and content of the data you are dealing with."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "BvnoojWsgRr4"
+ },
"source": [
- "## Dealing with missing data\r\n",
- "\r\n",
- "> **Learning goal:** By the end of this subsection, you should know how to replace or remove null values from DataFrames.\r\n",
- "\r\n",
- "Most of the time the datasets you want to use (of have to use) have missing values in them. How missing data is handled carries with it subtle tradeoffs that can affect your final analysis and real-world outcomes.\r\n",
- "\r\n",
- "Pandas handles missing values in two ways. The first you've seen before in previous sections: `NaN`, or Not a Number. This is a actually a special value that is part of the IEEE floating-point specification and it is only used to indicate missing floating-point values.\r\n",
- "\r\n",
+ "## Dealing with missing data\n",
+ "\n",
+ "> **Learning goal:** By the end of this subsection, you should know how to replace or remove null values from DataFrames.\n",
+ "\n",
+ "Most of the time the datasets you want to use (of have to use) have missing values in them. How missing data is handled carries with it subtle tradeoffs that can affect your final analysis and real-world outcomes.\n",
+ "\n",
+ "Pandas handles missing values in two ways. The first you've seen before in previous sections: `NaN`, or Not a Number. This is a actually a special value that is part of the IEEE floating-point specification and it is only used to indicate missing floating-point values.\n",
+ "\n",
"For missing values apart from floats, pandas uses the Python `None` object. While it might seem confusing that you will encounter two different kinds of values that say essentially the same thing, there are sound programmatic reasons for this design choice and, in practice, going this route enables pandas to deliver a good compromise for the vast majority of cases. Notwithstanding this, both `None` and `NaN` carry restrictions that you need to be mindful of with regards to how they can be used."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "lOHqUlZFgRr5"
+ },
"source": [
- "### `None`: non-float missing data\r\n",
- "Because `None` comes from Python, it cannot be used in NumPy and pandas arrays that are not of data type `'object'`. Remember, NumPy arrays (and the data structures in pandas) can contain only one type of data. This is what gives them their tremendous power for large-scale data and computational work, but it also limits their flexibility. Such arrays have to upcast to the “lowest common denominator,” the data type that will encompass everything in the array. When `None` is in the array, it means you are working with Python objects.\r\n",
- "\r\n",
+ "### `None`: non-float missing data\n",
+ "Because `None` comes from Python, it cannot be used in NumPy and pandas arrays that are not of data type `'object'`. Remember, NumPy arrays (and the data structures in pandas) can contain only one type of data. This is what gives them their tremendous power for large-scale data and computational work, but it also limits their flexibility. Such arrays have to upcast to the “lowest common denominator,” the data type that will encompass everything in the array. When `None` is in the array, it means you are working with Python objects.\n",
+ "\n",
"To see this in action, consider the following example array (note the `dtype` for it):"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "QIoNdY4ngRr7"
+ },
"source": [
- "import numpy as np\r\n",
- "\r\n",
- "example1 = np.array([2, None, 6, 8])\r\n",
+ "import numpy as np\n",
+ "\n",
+ "example1 = np.array([2, None, 6, 8])\n",
"example1"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "pdlgPNbhgRr7"
+ },
"source": [
"The reality of upcast data types carries two side effects with it. First, operations will be carried out at the level of interpreted Python code rather than compiled NumPy code. Essentially, this means that any operations involving `Series` or `DataFrames` with `None` in them will be slower. While you would probably not notice this performance hit, for large datasets it might become an issue.\n",
"\n",
"The second side effect stems from the first. Because `None` essentially drags `Series` or `DataFrame`s back into the world of vanilla Python, using NumPy/pandas aggregations like `sum()` or `min()` on arrays that contain a ``None`` value will generally produce an error:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "gWbx-KB9gRr8"
+ },
"source": [
"example1.sum()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "LcEwO8UogRr9"
+ },
"source": [
"**Key takeaway**: Addition (and other operations) between integers and `None` values is undefined, which can limit what you can do with datasets that contain them."
- ],
- "metadata": {}
+ ]
},
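As a standalone sketch of the behavior described above (assuming only NumPy is available), summing an object-dtype array that contains `None` fails, while filtering the `None` entries out first works:

```python
import numpy as np

# An array containing None is upcast to dtype 'object'.
arr = np.array([2, None, 6, 8])
assert arr.dtype == np.dtype('object')

# Aggregating it performs Python-level addition of int and None,
# which raises a TypeError.
try:
    total = arr.sum()
except TypeError:
    total = None  # the aggregation failed, as described above

# Filtering out the None values first restores normal behavior.
clean = np.array([x for x in arr if x is not None])
clean_sum = int(clean.sum())  # 16
```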
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "pWvVHvETgRr9"
+ },
"source": [
"### `NaN`: missing float values\n",
"\n",
"In contrast to `None`, NumPy (and therefore pandas) supports `NaN` for its fast, vectorized operations and ufuncs. The bad news is that any arithmetic performed on `NaN` always results in `NaN`. For example:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "rcFYfMG9gRr9"
+ },
"source": [
"np.nan + 1"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "BW3zQD2-gRr-"
+ },
"source": [
"np.nan * 0"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "fU5IPRcCgRr-"
+ },
"source": [
"The good news: aggregations run on arrays with `NaN` in them don't pop errors. The bad news: the results are not uniformly useful:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "LCInVgSSgRr_"
+ },
"source": [
- "example2 = np.array([2, np.nan, 6, 8]) \r\n",
+ "example2 = np.array([2, np.nan, 6, 8]) \n",
"example2.sum(), example2.min(), example2.max()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
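One practical workaround, sketched here with plain NumPy: the `nan`-prefixed aggregations (`np.nansum`, `np.nanmin`, `np.nanmax`) ignore `NaN` instead of propagating it:

```python
import numpy as np

example = np.array([2, np.nan, 6, 8])

# Plain aggregations propagate NaN...
assert np.isnan(example.sum())

# ...while the nan-aware variants skip missing values entirely.
nan_sum = np.nansum(example)  # 16.0
nan_min = np.nanmin(example)  # 2.0
nan_max = np.nanmax(example)  # 8.0
```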
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "nhlnNJT7gRr_"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# What happens if you add np.nan and None together?\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "yan3QRaOgRr_"
+ },
+ "source": [
+ "# What happens if you add np.nan and None together?\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "_iDvIRC8gRsA"
+ },
"source": [
"Remember: `NaN` is just for missing floating-point values; there is no `NaN` equivalent for integers, strings, or Booleans."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "kj6EKdsAgRsA"
+ },
"source": [
"### `NaN` and `None`: null values in pandas\n",
"\n",
"Even though `NaN` and `None` can behave somewhat differently, pandas is nevertheless built to handle them interchangeably. To see what we mean, consider a `Series` of integers:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "Nji-KGdNgRsA"
+ },
"source": [
- "int_series = pd.Series([1, 2, 3], dtype=int)\r\n",
+ "int_series = pd.Series([1, 2, 3], dtype=int)\n",
"int_series"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "WklCzqb8gRsB"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# Now set an element of int_series equal to None.\r\n",
- "# How does that element show up in the Series?\r\n",
- "# What is the dtype of the Series?\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "Cy-gqX5-gRsB"
+ },
+ "source": [
+ "# Now set an element of int_series equal to None.\n",
+ "# How does that element show up in the Series?\n",
+ "# What is the dtype of the Series?\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
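One possible answer to the exercise above, as a self-contained sketch: under pandas' long-standing behavior (newer versions may warn about the implicit upcast), assigning `None` into an integer `Series` converts it to `float64`, and the missing entry shows up as `NaN`:

```python
import numpy as np
import pandas as pd

int_series = pd.Series([1, 2, 3], dtype=int)

# Assigning None forces an upcast: the Series can no longer be int64.
int_series[0] = None

dtype_after = str(int_series.dtype)     # 'float64'
is_nan = bool(np.isnan(int_series[0]))  # the None became NaN
```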
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "WjMQwltNgRsB"
+ },
"source": [
"In the process of upcasting data types to establish data homogeneity in `Seires` and `DataFrame`s, pandas will willingly switch missing values between `None` and `NaN`. Because of this design feature, it can be helpful to think of `None` and `NaN` as two different flavors of \"null\" in pandas. Indeed, some of the core methods you will use to deal with missing values in pandas reflect this idea in their names:\n",
"\n",
@@ -322,513 +514,566 @@
"- `fillna()`: Returns a copy of the data with missing values filled or imputed\n",
"\n",
"These are important methods to master and get comfortable with, so let's go over them each in some depth."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "Yh5ifd9FgRsB"
+ },
"source": [
"### Detecting null values\n",
"Both `isnull()` and `notnull()` are your primary methods for detecting null data. Both return Boolean masks over your data."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "collapsed": true,
+ "trusted": false,
+ "id": "e-vFp5lvgRsC"
+ },
"source": [
"example3 = pd.Series([0, np.nan, '', None])"
],
- "outputs": [],
- "metadata": {
- "collapsed": true,
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "1XdaJJ7PgRsC"
+ },
"source": [
"example3.isnull()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PaSZ0SQygRsC"
+ },
"source": [
"Look closely at the output. Does any of it surprise you? While `0` is an arithmetic null, it's nevertheless a perfectly good integer and pandas treats it as such. `''` is a little more subtle. While we used it in Section 1 to represent an empty string value, it is nevertheless a string object and not a representation of null as far as pandas is concerned.\n",
"\n",
"Now, let's turn this around and use these methods in a manner more like you will use them in practice. You can use Boolean masks directly as a ``Series`` or ``DataFrame`` index, which can be useful when trying to work with isolated missing (or present) values."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "PlBqEo3mgRsC"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# Try running example3[example3.notnull()].\r\n",
- "# Before you do so, what do you expect to see?\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "ggDVf5uygRsD"
+ },
+ "source": [
+ "# Try running example3[example3.notnull()].\n",
+ "# Before you do so, what do you expect to see?\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
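What you would likely see when running the exercise above, reproduced as a standalone sketch:

```python
import numpy as np
import pandas as pd

example3 = pd.Series([0, np.nan, '', None])

# Indexing with the Boolean mask from notnull() keeps only the
# entries pandas does not consider null: 0 and the empty string.
present = example3[example3.notnull()]

kept_index = list(present.index)  # [0, 2]
```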
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "D_jWN7mHgRsD"
+ },
"source": [
"**Key takeaway**: Both the `isnull()` and `notnull()` methods produce similar results when you use them in `DataFrame`s: they show the results and the index of those results, which will help you enormously as you wrestle with your data."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "3VaYC1TvgRsD"
+ },
"source": [
"### Dropping null values\n",
"\n",
"Beyond identifying missing values, pandas provides a convenient means to remove null values from `Series` and `DataFrame`s. (Particularly on large data sets, it is often more advisable to simply remove missing [NA] values from your analysis than deal with them in other ways.) To see this in action, let's return to `example3`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "7uIvS097gRsD"
+ },
"source": [
- "example3 = example3.dropna()\r\n",
+ "example3 = example3.dropna()\n",
"example3"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "hil2cr64gRsD"
+ },
"source": [
"Note that this should look like your output from `example3[example3.notnull()]`. The difference here is that, rather than just indexing on the masked values, `dropna` has removed those missing values from the `Series` `example3`.\n",
"\n",
"Because `DataFrame`s have two dimensions, they afford more options for dropping data."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "an-l74sPgRsE"
+ },
"source": [
- "example4 = pd.DataFrame([[1, np.nan, 7], \r\n",
- " [2, 5, 8], \r\n",
- " [np.nan, 6, 9]])\r\n",
+ "example4 = pd.DataFrame([[1, np.nan, 7], \n",
+ " [2, 5, 8], \n",
+ " [np.nan, 6, 9]])\n",
"example4"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "66wwdHZrgRsE"
+ },
"source": [
"(Did you notice that pandas upcast two of the columns to floats to accommodate the `NaN`s?)\n",
"\n",
"You cannot drop a single value from a `DataFrame`, so you have to drop full rows or columns. Depending on what you are doing, you might want to do one or the other, and so pandas gives you options for both. Because in data science, columns generally represent variables and rows represent observations, you are more likely to drop rows of data; the default setting for `dropna()` is to drop all rows that contain any null values:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "jAVU24RXgRsE"
+ },
"source": [
"example4.dropna()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "TrQRBuTDgRsE"
+ },
"source": [
"If necessary, you can drop NA values from columns. Use `axis=1` to do so:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "GrBhxu9GgRsE"
+ },
"source": [
"example4.dropna(axis='columns')"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "KWXiKTfMgRsF"
+ },
"source": [
"Notice that this can drop a lot of data that you might want to keep, particularly in smaller datasets. What if you just want to drop rows or columns that contain several or even just all null values? You specify those setting in `dropna` with the `how` and `thresh` parameters.\n",
"\n",
"By default, `how='any'` (if you would like to check for yourself or see what other parameters the method has, run `example4.dropna?` in a code cell). You could alternatively specify `how='all'` so as to drop only rows or columns that contain all null values. Let's expand our example `DataFrame` to see this in action."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "Bcf_JWTsgRsF"
+ },
"source": [
- "example4[3] = np.nan\r\n",
+ "example4[3] = np.nan\n",
"example4"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "oXXSfQFHgRsF"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# How might you go about dropping just column 3?\r\n",
- "# Hint: remember that you will need to supply both the axis parameter and the how parameter.\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "ExUwQRxpgRsF"
+ },
+ "source": [
+ "# How might you go about dropping just column 3?\n",
+ "# Hint: remember that you will need to supply both the axis parameter and the how parameter.\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "38kwAihWgRsG"
+ },
"source": [
"The `thresh` parameter gives you finer-grained control: you set the number of *non-null* values that a row or column needs to have in order to be kept:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "M9dCNMaagRsG"
+ },
"source": [
"example4.dropna(axis='rows', thresh=3)"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
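The `thresh` example above, reproduced as a self-contained sketch so the arithmetic is visible:

```python
import numpy as np
import pandas as pd

example4 = pd.DataFrame([[1, np.nan, 7],
                         [2, 5, 8],
                         [np.nan, 6, 9]])
example4[3] = np.nan  # add an all-NaN column, as in the cells above

# Rows 0 and 2 have only two non-null values each, so thresh=3
# drops them; row 1 has three non-null values and survives.
kept = example4.dropna(axis='rows', thresh=3)
kept_rows = list(kept.index)  # [1]
```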
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "fmSFnzZegRsG"
+ },
"source": [
"Here, the first and last row have been dropped, because they contain only two non-null values."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "mCcxLGyUgRsG"
+ },
"source": [
"### Filling null values\n",
"\n",
"Depending on your dataset, it can sometimes make more sense to fill null values with valid ones rather than drop them. You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "0ybtWLDdgRsG"
+ },
"source": [
- "example5 = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))\r\n",
+ "example5 = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))\n",
"example5"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "yrsigxRggRsH"
+ },
"source": [
"You can fill all of the null entries with a single value, such as `0`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "KXMIPsQdgRsH"
+ },
"source": [
"example5.fillna(0)"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "FI9MmqFJgRsH"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# What happens if you try to fill null values with a string, like ''?\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "af-ezpXdgRsH"
+ },
+ "source": [
+ "# What happens if you try to fill null values with a string, like ''?\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "kq3hw1kLgRsI"
+ },
"source": [
"You can **forward-fill** null values, which is to use the last valid value to fill a null:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "vO3BuNrggRsI"
+ },
"source": [
"example5.fillna(method='ffill')"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "nDXeYuHzgRsI"
+ },
"source": [
"You can also **back-fill** to propagate the next valid value backward to fill a null:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "4M5onHcEgRsI"
+ },
"source": [
"example5.fillna(method='bfill')"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "collapsed": true,
+ "id": "MbBzTom5gRsI"
+ },
"source": [
"As you might guess, this works the same with `DataFrame`s, but you can also specify an `axis` along which to fill null values:"
- ],
- "metadata": {
- "collapsed": true
- }
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "aRpIvo4ZgRsI"
+ },
"source": [
"example4"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "VM1qtACAgRsI"
+ },
"source": [
"example4.fillna(method='ffill', axis=1)"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "ZeMc-I1EgRsI"
+ },
"source": [
"Notice that when a previous value is not available for forward-filling, the null value remains."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "eeAoOU0RgRsJ"
+ },
"source": [
"### Exercise:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
- "source": [
- "# What output does example4.fillna(method='bfill', axis=1) produce?\r\n",
- "# What about example4.fillna(method='ffill') or example4.fillna(method='bfill')?\r\n",
- "# Can you think of a longer code snippet to write that can fill all of the null values in example4?\r\n"
- ],
- "outputs": [],
"metadata": {
"collapsed": true,
- "trusted": false
- }
+ "trusted": false,
+ "id": "e8S-CjW8gRsJ"
+ },
+ "source": [
+ "# What output does example4.fillna(method='bfill', axis=1) produce?\n",
+ "# What about example4.fillna(method='ffill') or example4.fillna(method='bfill')?\n",
+ "# Can you think of a longer code snippet to write that can fill all of the null values in example4?\n"
+ ],
+ "execution_count": null,
+ "outputs": []
},
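One possible approach to the last question in the exercise above (a sketch, using the `ffill`/`bfill` method shortcuts rather than the `method=` argument): chain a forward fill with a backward fill, then fill any still-empty column with a constant:

```python
import numpy as np
import pandas as pd

example4 = pd.DataFrame([[1, np.nan, 7],
                         [2, 5, 8],
                         [np.nan, 6, 9]])
example4[3] = np.nan

# ffill and bfill cover nulls that have a neighbor in their column;
# the all-NaN column 3 needs a constant to finish the job.
filled = example4.ffill().bfill().fillna(0)

no_nulls_left = int(filled.isnull().sum().sum())  # 0
```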
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "YHgy0lIrgRsJ"
+ },
"source": [
"You can be creative about how you use `fillna`. For example, let's look at `example4` again, but this time let's fill the missing values with the average of all of the values in the `DataFrame`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "OtYVErEygRsJ"
+ },
"source": [
"example4.fillna(example4.mean())"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "zpMvCkLSgRsJ"
+ },
"source": [
"Notice that column 3 is still valueless: the default direction is to fill values row-wise.\n",
"\n",
"> **Takeaway:** There are multiple ways to deal with missing values in your datasets. The specific strategy you use (removing them, replacing them, or even how you replace them) should be dictated by the particulars of that data. You will develop a better sense of how to deal with missing values the more you handle and interact with datasets."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "K8UXOJYRgRsJ"
+ },
"source": [
"## Removing duplicate data\n",
"\n",
"> **Learning goal:** By the end of this subsection, you should be comfortable identifying and removing duplicate values from DataFrames.\n",
"\n",
"In addition to missing data, you will often encounter duplicated data in real-world datasets. Fortunately, pandas provides an easy means of detecting and removing duplicate entries."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "qrEG-Wa0gRsJ"
+ },
"source": [
"### Identifying duplicates: `duplicated`\n",
"\n",
"You can easily spot duplicate values using the `duplicated` method in pandas, which returns a Boolean mask indicating whether an entry in a `DataFrame` is a duplicate of an ealier one. Let's create another example `DataFrame` to see this in action."
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "ZLu6FEnZgRsJ"
+ },
"source": [
- "example6 = pd.DataFrame({'letters': ['A','B'] * 2 + ['B'],\r\n",
- " 'numbers': [1, 2, 1, 3, 3]})\r\n",
+ "example6 = pd.DataFrame({'letters': ['A','B'] * 2 + ['B'],\n",
+ " 'numbers': [1, 2, 1, 3, 3]})\n",
"example6"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "cIduB5oBgRsK"
+ },
"source": [
"example6.duplicated()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "0eDRJD4SgRsK"
+ },
"source": [
"### Dropping duplicates: `drop_duplicates`\n",
"`drop_duplicates` simply returns a copy of the data for which all of the `duplicated` values are `False`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "w_YPpqIqgRsK"
+ },
"source": [
"example6.drop_duplicates()"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "69AqoCZAgRsK"
+ },
"source": [
"Both `duplicated` and `drop_duplicates` default to consider all columnsm but you can specify that they examine only a subset of columns in your `DataFrame`:"
- ],
- "metadata": {}
+ ]
},
{
"cell_type": "code",
- "execution_count": null,
+ "metadata": {
+ "trusted": false,
+ "id": "BILjDs67gRsK"
+ },
"source": [
"example6.drop_duplicates(['letters'])"
],
- "outputs": [],
- "metadata": {
- "trusted": false
- }
+ "execution_count": null,
+ "outputs": []
},
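The subset behavior above, as a standalone sketch comparing full-row deduplication with deduplication on `letters` alone:

```python
import pandas as pd

example6 = pd.DataFrame({'letters': ['A', 'B'] * 2 + ['B'],
                         'numbers': [1, 2, 1, 3, 3]})

# Full-row duplicates: ('A', 1) and ('B', 3) each appear twice.
rows_all = len(example6.drop_duplicates())  # 3 rows remain

# Considering only 'letters', every repeat of A or B is a duplicate.
rows_letters = len(example6.drop_duplicates(['letters']))  # 2 rows
```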
{
"cell_type": "markdown",
+ "metadata": {
+ "id": "GvX4og1EgRsL"
+ },
"source": [
"> **Takeaway:** Removing duplicate data is an essential part of almost every data-science project. Duplicate data can change the results of your analyses and give you inaccurate results!"
- ],
- "metadata": {}
+ ]
}
- ],
- "metadata": {
- "anaconda-cloud": {},
- "kernelspec": {
- "name": "python3",
- "display_name": "Python 3",
- "language": "python"
- },
- "language_info": {
- "mimetype": "text/x-python",
- "nbconvert_exporter": "python",
- "name": "python",
- "file_extension": ".py",
- "version": "3.5.4",
- "pygments_lexer": "ipython3",
- "codemirror_mode": {
- "version": 3,
- "name": "ipython"
- }
- }
- },
- "nbformat": 4,
- "nbformat_minor": 1
+ ]
}
\ No newline at end of file
From c1476a87d2a4dbbf0e55df660da084d9a78a647f Mon Sep 17 00:00:00 2001
From: Jen Looper
Date: Mon, 4 Oct 2021 10:57:17 -0400
Subject: [PATCH 008/319] updating readme, as in a table we have to use html
markup
---
1-Introduction/01-defining-data-science/README.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/1-Introduction/01-defining-data-science/README.md b/1-Introduction/01-defining-data-science/README.md
index 873ad74f..ccfe6ef7 100644
--- a/1-Introduction/01-defining-data-science/README.md
+++ b/1-Introduction/01-defining-data-science/README.md
@@ -45,7 +45,7 @@ Since data is a pervasive concept, data science itself is also a broad field, to
Databases
-The most obvious thing to consider is **how to store** the data, i.e. how to structure them in a way that allows faster processing. There are different types of databases that store structured and unstructured data, which [we will consider in our course](../../2-Working-With-Data/README.md).
+The most obvious thing to consider is **how to store** the data, i.e. how to structure them in a way that allows faster processing. There are different types of databases that store structured and unstructured data, which we will consider in our course.
Big Data
@@ -53,7 +53,7 @@ Often we need to store and process really large quantities of data with relative
Machine Learning
-One of the ways to understand the data is to **build a model** that will be able to predict desired outcome. Being able to learn such models from data is the area studied in **machine learning**. You may want to have a look at our [Machine Learning for Beginners](https://github.com/microsoft/ML-For-Beginners/) Curriculum to get deeper into that field.
+One of the ways to understand the data is to **build a model** that will be able to predict the desired outcome. Being able to learn such models from data is the area studied in **machine learning**. You may want to have a look at our Machine Learning for Beginners Curriculum to get deeper into that field.
Artificial Intelligence
@@ -61,7 +61,7 @@ As machine learning, artificial intelligence also relies on data, and it involve
Visualization
-Vast amounts of data are incomprehensible for a human being, but once we create useful visualizations - we can start making much more sense of data, and drawing some conclusions. Thus, it is important to know many ways to visualize information - something that we will cover in [Section 3](../../3-Data-Visualization/README.md) of our course. Related fields also include **Infographics**, and **Human-Computer Interaction** in general.
+Vast amounts of data are incomprehensible for a human being, but once we create useful visualizations - we can start making much more sense of data, and drawing some conclusions. Thus, it is important to know many ways to visualize information - something that we will cover in Section 3 of our course. Related fields also include **Infographics**, and **Human-Computer Interaction** in general.
From 943172fd55d7d9b08d8bee906086cf43402041af Mon Sep 17 00:00:00 2001
From: Nirmalya Misra <39618712+nirmalya8@users.noreply.github.com>
Date: Mon, 4 Oct 2021 21:50:02 +0530
Subject: [PATCH 009/319] Added DataFrame.describe() and elaborated on some of
the existing explanations.
---
.../08-data-preparation/notebook.ipynb | 372 ++++++++++++++++--
1 file changed, 350 insertions(+), 22 deletions(-)
diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index b1c1d7a0..c6ca05dc 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -76,10 +76,10 @@
"cell_type": "code",
"metadata": {
"id": "LOe5jQohhulf",
- "outputId": "9cf67a6a-5779-453b-b2ed-58f4f1aab507",
"colab": {
"base_uri": "https://localhost:8080/"
- }
+ },
+ "outputId": "968f9fb0-6cb7-4985-c64b-b332c086bdbf"
},
"source": [
"iris_df.shape"
@@ -123,15 +123,15 @@
"cell_type": "code",
"metadata": {
"id": "YPGh_ziji-CY",
- "outputId": "ca186194-a126-4348-f58e-aab7ebc8f7b7",
"colab": {
"base_uri": "https://localhost:8080/"
- }
+ },
+ "outputId": "ffad1c9f-06b4-49d9-b409-5e4cc1b9f19b"
},
"source": [
"iris_df.columns"
],
- "execution_count": 4,
+ "execution_count": 3,
"outputs": [
{
"output_type": "execute_result",
@@ -143,7 +143,7 @@
]
},
"metadata": {},
- "execution_count": 4
+ "execution_count": 3
}
]
},
@@ -163,7 +163,7 @@
},
"source": [
"### `DataFrame.info`\n",
- "Let's take a look at this dataset to see what we have:"
+ "The amount of data(given by the `shape` attribute) and the name of the features or columns(given by the `columns` attribute) tell us something about the dataset. Now, we would want to dive deeper into the dataset. The `DataFrame.info()` function is quite useful for this. "
]
},
{
@@ -171,15 +171,15 @@
"metadata": {
"trusted": false,
"id": "dHHRyG0_gRrt",
- "outputId": "ca9de335-9e65-486a-d1e2-3e73d060c701",
"colab": {
"base_uri": "https://localhost:8080/"
- }
+ },
+ "outputId": "325edd04-3809-4d71-b6c3-94c65b162882"
},
"source": [
"iris_df.info()"
],
- "execution_count": 3,
+ "execution_count": 4,
"outputs": [
{
"output_type": "stream",
@@ -206,7 +206,150 @@
"id": "1XgVMpvigRru"
},
"source": [
- "From this, we know that the *Iris* dataset has 150 entries in four columns. All of the data is stored as 64-bit floating-point numbers."
+ "From here, we can make a few observations:\n",
+ "1. The data type of each column: in this dataset, all of the data is stored as 64-bit floating-point numbers.\n",
+ "2. The number of non-null values: dealing with null values is an important step in data preparation, and it will be handled later in the notebook."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IYlyxbpWFEF4"
+ },
+ "source": [
+ "### `DataFrame.describe`\n",
+ "Say we have a lot of numerical data in our dataset. Univariate statistical calculations such as the mean, median, and quartiles can be computed on each of the columns individually. The `DataFrame.describe()` function provides us with a statistical summary of the numerical columns of a dataset.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tWV-CMstFIRA",
+ "outputId": "7c5cd72f-51d8-474c-966b-d2fbbdb7b7fc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 297
+ }
+ },
+ "source": [
+ "iris_df.describe()"
+ ],
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>sepal length (cm)</th><th>sepal width (cm)</th><th>petal length (cm)</th><th>petal width (cm)</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>count</th><td>150.000000</td><td>150.000000</td><td>150.000000</td><td>150.000000</td></tr>\n",
+ "    <tr><th>mean</th><td>5.843333</td><td>3.057333</td><td>3.758000</td><td>1.199333</td></tr>\n",
+ "    <tr><th>std</th><td>0.828066</td><td>0.435866</td><td>1.765298</td><td>0.762238</td></tr>\n",
+ "    <tr><th>min</th><td>4.300000</td><td>2.000000</td><td>1.000000</td><td>0.100000</td></tr>\n",
+ "    <tr><th>25%</th><td>5.100000</td><td>2.800000</td><td>1.600000</td><td>0.300000</td></tr>\n",
+ "    <tr><th>50%</th><td>5.800000</td><td>3.000000</td><td>4.350000</td><td>1.300000</td></tr>\n",
+ "    <tr><th>75%</th><td>6.400000</td><td>3.300000</td><td>5.100000</td><td>1.800000</td></tr>\n",
+ "    <tr><th>max</th><td>7.900000</td><td>4.400000</td><td>6.900000</td><td>2.500000</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n",
+ "count 150.000000 150.000000 150.000000 150.000000\n",
+ "mean 5.843333 3.057333 3.758000 1.199333\n",
+ "std 0.828066 0.435866 1.765298 0.762238\n",
+ "min 4.300000 2.000000 1.000000 0.100000\n",
+ "25% 5.100000 2.800000 1.600000 0.300000\n",
+ "50% 5.800000 3.000000 4.350000 1.300000\n",
+ "75% 6.400000 3.300000 5.100000 1.800000\n",
+ "max 7.900000 4.400000 6.900000 2.500000"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zjjtW5hPGMuM"
+ },
+ "source": [
+ "The output above shows the total number of data points, the mean, standard deviation, minimum, lower quartile (25%), median (50%), upper quartile (75%), and the maximum value of each column."
]
},
{
@@ -216,20 +359,117 @@
},
"source": [
"### `DataFrame.head`\n",
- "Next, let's see what the first few rows of our `DataFrame` look like:"
+ "With all of the above functions and attributes, we have a top-level view of the dataset: we know how many data points and features there are, the data type of each feature, and the number of non-null values in each feature.\n",
+ "\n",
+ "Now it's time to look at the data itself. Let's see what the first few rows (the first few data points) of our `DataFrame` look like:"
]
},
{
"cell_type": "code",
"metadata": {
"trusted": false,
- "id": "DZMJZh0OgRrw"
+ "id": "DZMJZh0OgRrw",
+ "outputId": "c12ac408-abdb-48a5-ca3f-93b02f963b2f",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
},
"source": [
"iris_df.head()"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>sepal length (cm)</th><th>sepal width (cm)</th><th>petal length (cm)</th><th>petal width (cm)</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>5.1</td><td>3.5</td><td>1.4</td><td>0.2</td></tr>\n",
+ "    <tr><th>1</th><td>4.9</td><td>3.0</td><td>1.4</td><td>0.2</td></tr>\n",
+ "    <tr><th>2</th><td>4.7</td><td>3.2</td><td>1.3</td><td>0.2</td></tr>\n",
+ "    <tr><th>3</th><td>4.6</td><td>3.1</td><td>1.5</td><td>0.2</td></tr>\n",
+ "    <tr><th>4</th><td>5.0</td><td>3.6</td><td>1.4</td><td>0.2</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n",
+ "0 5.1 3.5 1.4 0.2\n",
+ "1 4.9 3.0 1.4 0.2\n",
+ "2 4.7 3.2 1.3 0.2\n",
+ "3 4.6 3.1 1.5 0.2\n",
+ "4 5.0 3.6 1.4 0.2"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EBHEimZuEFQK"
+ },
+ "source": [
+ "In the output above, we can see five entries of the dataset. If we look at the index on the left, we find that these are the first five rows."
+ ]
},
{
"cell_type": "markdown",
@@ -239,7 +479,7 @@
"source": [
"### Exercise:\n",
"\n",
- "By default, `DataFrame.head` returns the first five rows of a `DataFrame`. In the code cell below, can you figure out how to get it to show more?"
+ "From the example given above, it is clear that, by default, `DataFrame.head` returns the first five rows of a `DataFrame`. In the code cell below, can you figure out a way to display more than five rows?"
]
},
{
@@ -252,7 +492,7 @@
"source": [
"# Hint: Consult the documentation by using iris_df.head?"
],
- "execution_count": null,
+ "execution_count": 6,
"outputs": []
},
{
@@ -262,20 +502,106 @@
},
"source": [
"### `DataFrame.tail`\n",
- "The flipside of `DataFrame.head` is `DataFrame.tail`, which returns the last five rows of a `DataFrame`:"
+ "Another way of looking at the data is from the end (instead of the beginning). The flipside of `DataFrame.head` is `DataFrame.tail`, which returns the last five rows of a `DataFrame`:"
]
},
{
"cell_type": "code",
"metadata": {
"trusted": false,
- "id": "heanjfGWgRr2"
+ "id": "heanjfGWgRr2",
+ "outputId": "2930cf87-bfeb-4ddc-8be1-53d0e57a06b3",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
},
"source": [
"iris_df.tail()"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>sepal length (cm)</th><th>sepal width (cm)</th><th>petal length (cm)</th><th>petal width (cm)</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>145</th><td>6.7</td><td>3.0</td><td>5.2</td><td>2.3</td></tr>\n",
+ "    <tr><th>146</th><td>6.3</td><td>2.5</td><td>5.0</td><td>1.9</td></tr>\n",
+ "    <tr><th>147</th><td>6.5</td><td>3.0</td><td>5.2</td><td>2.0</td></tr>\n",
+ "    <tr><th>148</th><td>6.2</td><td>3.4</td><td>5.4</td><td>2.3</td></tr>\n",
+ "    <tr><th>149</th><td>5.9</td><td>3.0</td><td>5.1</td><td>1.8</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)\n",
+ "145 6.7 3.0 5.2 2.3\n",
+ "146 6.3 2.5 5.0 1.9\n",
+ "147 6.5 3.0 5.2 2.0\n",
+ "148 6.2 3.4 5.4 2.3\n",
+ "149 5.9 3.0 5.1 1.8"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
},
{
"cell_type": "markdown",
@@ -283,7 +609,9 @@
"id": "31kBWfyLgRr3"
},
"source": [
- "In practice, it is useful to be able to easily examine the first few rows or the last few rows of a `DataFrame`, particularly when you are looking for outliers in ordered datasets.\n",
+ "In practice, it is useful to be able to easily examine the first few rows or the last few rows of a `DataFrame`, particularly when you are looking for outliers in ordered datasets. \n",
+ "\n",
+ "All of the functions and attributes shown above, together with the code examples, help us get a look and feel of the data.\n",
"\n",
"> **Takeaway:** Even just by looking at the metadata about the information in a DataFrame or the first and last few values in one, you can get an immediate idea about the size, shape, and content of the data you are dealing with."
]
From c89a0fc36358116b867e7bfedb59f1fbf3210956 Mon Sep 17 00:00:00 2001
From: INDRASHIS PAUL
Date: Mon, 4 Oct 2021 22:34:07 +0530
Subject: [PATCH 010/319] Enhance README with notebook content
---
.../08-data-preparation/README.md | 103 ++++++++++++++++++
1 file changed, 103 insertions(+)
diff --git a/2-Working-With-Data/08-data-preparation/README.md b/2-Working-With-Data/08-data-preparation/README.md
index e1de8260..5bb6c3d8 100644
--- a/2-Working-With-Data/08-data-preparation/README.md
+++ b/2-Working-With-Data/08-data-preparation/README.md
@@ -28,6 +28,109 @@ Depending on its source, raw data may contain some inconsistencies that will cau
- **Missing Data**: Missing data can cause inaccuracies as well as weak or biased results. Sometimes these can be resolved by a "reload" of the data, filling in the missing values with computation and code like Python, or simply just removing the value and corresponding data. There are numerous reasons for why data may be missing and the actions that are taken to resolve these missing values can be dependent on how and why they went missing in the first place.
+## Exploring DataFrame information
+> **Learning goal:** By the end of this subsection, you should be comfortable finding general information about the data stored in pandas DataFrames.
+
+Once you have loaded your data into pandas, it will more likely than not be in a DataFrame (refer to the previous [lesson](https://github.com/IndraP24/Data-Science-For-Beginners/tree/main/2-Working-With-Data/07-python#dataframe) for a detailed overview). However, if the dataset in your DataFrame has 60,000 rows and 400 columns, how do you even begin to get a sense of what you're working with? Fortunately, [pandas](https://pandas.pydata.org/) provides some convenient tools to quickly look at overall information about a DataFrame, in addition to the first few and last few rows.
+
+In order to explore this functionality, we will import the Python scikit-learn library and use an iconic dataset: the **Iris data set**.
+```python
+import pandas as pd
+from sklearn.datasets import load_iris
+
+iris = load_iris()
+iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])
+```
+| |sepal length (cm)|sepal width (cm)|petal length (cm)|petal width (cm)|
+|----------------------------------------|-----------------|----------------|-----------------|----------------|
+|0 |5.1 |3.5 |1.4 |0.2 |
+|1 |4.9 |3.0 |1.4 |0.2 |
+|2 |4.7 |3.2 |1.3 |0.2 |
+|3 |4.6 |3.1 |1.5 |0.2 |
+|4 |5.0 |3.6 |1.4 |0.2 |
+
+- **DataFrame.info**: To start off, the `info()` method is used to print a summary of the content present in a `DataFrame`. Let's take a look at this dataset to see what we have:
+```python
+iris_df.info()
+```
+```
+<class 'pandas.core.frame.DataFrame'>
+RangeIndex: 150 entries, 0 to 149
+Data columns (total 4 columns):
+ # Column Non-Null Count Dtype
+--- ------ -------------- -----
+ 0 sepal length (cm) 150 non-null float64
+ 1 sepal width (cm) 150 non-null float64
+ 2 petal length (cm) 150 non-null float64
+ 3 petal width (cm) 150 non-null float64
+dtypes: float64(4)
+memory usage: 4.8 KB
+```
+From this, we know that the *Iris* dataset has 150 entries in four columns with no null entries. All of the data is stored as 64-bit floating-point numbers.
+
+- **DataFrame.head()**: Next, to check the actual content of the `DataFrame`, we use the `head()` method. Let's see what the first few rows of our `iris_df` look like:
+```python
+iris_df.head()
+```
+```
+ sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
+0 5.1 3.5 1.4 0.2
+1 4.9 3.0 1.4 0.2
+2 4.7 3.2 1.3 0.2
+3 4.6 3.1 1.5 0.2
+4 5.0 3.6 1.4 0.2
+```
+- **DataFrame.tail()**: Conversely, to check the last few rows of the `DataFrame`, we use the `tail()` method:
+```python
+iris_df.tail()
+```
+```
+ sepal length (cm) sepal width (cm) petal length (cm) petal width (cm)
+145 6.7 3.0 5.2 2.3
+146 6.3 2.5 5.0 1.9
+147 6.5 3.0 5.2 2.0
+148 6.2 3.4 5.4 2.3
+149 5.9 3.0 5.1 1.8
+```
+> **Takeaway:** Even just by looking at the metadata about the information in a DataFrame or the first and last few values in one, you can get an immediate idea about the size, shape, and content of the data you are dealing with.
+
+## Dealing with Missing Data
+> **Learning goal:** By the end of this subsection, you should know how to replace or remove null values from DataFrames.
+
+Most of the time the datasets you want to use (or have to use) have missing values in them. How missing data is handled carries subtle tradeoffs that can affect your final analysis and real-world outcomes.
+
+Pandas handles missing values in two ways. The first you've seen before in previous sections: `NaN`, or Not a Number. This is actually a special value that is part of the IEEE floating-point specification, and it is only used to indicate missing floating-point values.
+
+For missing values apart from floats, pandas uses the Python `None` object. While it might seem confusing that you will encounter two different kinds of values that say essentially the same thing, there are sound programmatic reasons for this design choice and, in practice, going this route enables pandas to deliver a good compromise for the vast majority of cases. Notwithstanding this, both `None` and `NaN` carry restrictions that you need to be mindful of with regards to how they can be used.
+
+Check out more about `NaN` and `None` from the [notebook](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/4-Data-Science-Lifecycle/15-analyzing/notebook.ipynb)!
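As a quick sketch of how the two sentinels behave in practice (a minimal example, assuming only that `pandas` and `numpy` are installed):

```python
import numpy as np
import pandas as pd

# Float data uses NaN for missing values...
s_float = pd.Series([1.0, np.nan, 3.0])
print(s_float.dtype)   # float64

# ...while object data stores None as-is.
s_obj = pd.Series(["a", None, "c"])
print(s_obj.dtype)     # object

# Integers are upcast to floats so that None can become NaN.
s_int = pd.Series([1, 2, None])
print(s_int.dtype)     # float64

# Either way, pandas counts both sentinels as missing.
print(s_float.isnull().sum(), s_obj.isnull().sum(), s_int.isnull().sum())  # 1 1 1
```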
+
+- **Detecting null values**: In `pandas`, the `isnull()` and `notnull()` methods are your primary methods for detecting null data. Both return Boolean masks over your data. We will be using `numpy` for `NaN` values:
+```python
+import numpy as np
+
+example1 = pd.Series([0, np.nan, '', None])
+example1.isnull()
+```
+```
+0 False
+1 True
+2 False
+3 True
+dtype: bool
+```
+Look closely at the output. Does any of it surprise you? While `0` is an arithmetic null, it's nevertheless a perfectly good integer and pandas treats it as such. `''` is a little more subtle. While we used it in Section 1 to represent an empty string value, it is nevertheless a string object and not a representation of null as far as pandas is concerned.
+
+Now, let's turn this around and use these methods in a manner more like you will use them in practice. You can use Boolean masks directly as a ``Series`` or ``DataFrame`` index, which can be useful when trying to work with isolated missing (or present) values.
+
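For example, a mask from `isnull()` can pull out just the missing (or just the present) entries. A minimal sketch, reusing the `example1` series defined above:

```python
import numpy as np
import pandas as pd

example1 = pd.Series([0, np.nan, '', None])

# Index with the mask from isnull() to keep only the null entries.
missing = example1[example1.isnull()]
print(missing.index.tolist())   # [1, 3]

# notnull() gives the complement: only the non-null entries survive.
present = example1[example1.notnull()]
print(present.index.tolist())   # [0, 2]
```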
+> **Takeaway**: Both the `isnull()` and `notnull()` methods produce similar results when you use them in `DataFrame`s: they show the results and the index of those results, which will help you enormously as you wrestle with your data.
+
+- **Dropping null values**: Beyond identifying missing values, pandas provides a convenient means to remove null values from `Series` and `DataFrame`s. (Particularly on large data sets, it is often more advisable to simply remove missing [NA] values from your analysis than deal with them in other ways.) To see this in action, let's return to `example1`:
+```python
+example1 = example1.dropna()
+example1
+```
+
+
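Putting the pieces together, a self-contained sketch of the `dropna` call above and what it leaves behind:

```python
import numpy as np
import pandas as pd

example1 = pd.Series([0, np.nan, '', None])

# dropna() removes the NaN and None entries; 0 and '' are kept,
# since pandas does not consider them null.
example1 = example1.dropna()
print(example1.index.tolist())  # [0, 2]
```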
## 🚀 Challenge
From bf8efbd42a49fd13aea1af5f8eb31fa28be6d5bd Mon Sep 17 00:00:00 2001
From: Oscar RG
Date: Mon, 4 Oct 2021 20:21:02 +0200
Subject: [PATCH 011/319] feature: Spanish translation for quiz-app added
---
.../src/assets/translations/es/group-1.json | 418 ++++++++++++++
.../src/assets/translations/es/group-2.json | 410 ++++++++++++++
.../src/assets/translations/es/group-3.json | 513 ++++++++++++++++++
.../src/assets/translations/es/group-4.json | 305 +++++++++++
.../src/assets/translations/es/group-5.json | 316 +++++++++++
.../src/assets/translations/es/group-6.json | 108 ++++
quiz-app/src/assets/translations/es/index.js | 20 +
quiz-app/src/assets/translations/index.js | 8 +-
8 files changed, 2095 insertions(+), 3 deletions(-)
create mode 100644 quiz-app/src/assets/translations/es/group-1.json
create mode 100644 quiz-app/src/assets/translations/es/group-2.json
create mode 100644 quiz-app/src/assets/translations/es/group-3.json
create mode 100644 quiz-app/src/assets/translations/es/group-4.json
create mode 100644 quiz-app/src/assets/translations/es/group-5.json
create mode 100644 quiz-app/src/assets/translations/es/group-6.json
create mode 100644 quiz-app/src/assets/translations/es/index.js
diff --git a/quiz-app/src/assets/translations/es/group-1.json b/quiz-app/src/assets/translations/es/group-1.json
new file mode 100644
index 00000000..7fb4a262
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-1.json
@@ -0,0 +1,418 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 0,
+ "title": "Definición de la ciencia de los datos - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Por qué la palabra _Ciencia_ en Ciencia de Datos?",
+ "answerOptions": [{
+ "answerText": "Utiliza métodos científicos para analizar los datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Sólo las personas con títulos académicos pueden entenderlo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Para que suene bien",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "El aprendizaje de la ciencia de los datos sólo es útil para los desarrolladores",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué necesitamos para demostrar que los jugadores de baloncesto son más altos que la gente media?",
+ "answerOptions": [{
+ "answerText": "Recopilar datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Conocer algo de probabilidad y estadística",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 1,
+ "title": "Definir la ciencia de los datos: Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Qué áreas están estrechamente relacionadas con la ciencia de los datos?",
+ "answerOptions": [{
+ "answerText": "Inteligencia Artificial",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Aprendizaje automático",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál de las siguientes representaciones es un ejemplo de datos no estructurados?",
+ "answerOptions": [{
+ "answerText": "Una lista de alumnos en clase",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Una colección de ensayos de estudiantes",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un gráfico de amigos de usuarios en redes sociales",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es el objetivo principal de la ciencia de datos?",
+ "answerOptions": [{
+ "answerText": "recoger datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "procesar los datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ser capaz de tomar decisiones basadas en datos",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 2,
+ "title": "Ética - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Qué es la consideración de la ética en la Ciencia de los Datos?",
+ "answerOptions": [{
+ "answerText": "Recogida de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Diseño de algoritmos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles de los siguientes elementos forman parte de los desafíos éticos?",
+ "answerOptions": [{
+ "answerText": "La transparencia",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La privacidad",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "La seguridad y fiabilidad",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La ética de los datos también incluye la IA y los algoritmos de aprendizaje automático",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso, están separados",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 3,
+ "title": "Ética - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Cuál es la principal diferencia entre la ética y la ética aplicada?",
+ "answerOptions": [{
+ "answerText": "La ética aplicada es un tipo específico de ética",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La ética aplicada no es un término real",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La ética aplicada es el proceso de encontrar y corregir problemas éticos",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es la diferencia entre reglamento de ética y principios de ética?",
+ "answerOptions": [{
+ "answerText": "La normativa ética se centra en la ética habitual",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "No hay ninguna diferencia",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Los principios éticos no están relacionados con una ley concreta, mientras que los reglamentos sí.",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles son los principios y prácticas éticas reales que se pueden aplicar?",
+ "answerOptions": [{
+ "answerText": "Sesgo de recogida",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Cumplimiento de la normativa ética",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 4,
+ "title": "Definición de datos - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Cuáles de estos datos podrían ser cuantitativos?",
+ "answerOptions": [{
+ "answerText": "Fotos de perros",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Reseñas de hoteles",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Notas o calificaciones de estudiantes",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles de estos datos podrían ser datos cualitativos?",
+ "answerOptions": [{
+ "answerText": "Lista de salarios de los empleados",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Reseñas de hoteles",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Notas o calificaciones de estudiantes",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es el objetivo principal de la clasificación de datos?",
+ "answerOptions": [{
+ "answerText": "El almacenamiento correcto de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Dar un nombre propio a los datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Saber cuál es el mejor método para organizarlo para la legibilidad y el análisis",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 5,
+ "title": "Definición de los datos - Cuestionario final",
+ "quiz": [{
+ "questionText": "El profesor está revisando el número de respuestas correctas de los alumnos, ¿de qué tipo de datos se trata?",
+ "answerOptions": [{
+ "answerText": "Datos cualitativos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Datos cuantitativos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Una empresa está recogiendo encuestas de sus clientes para mejorar sus productos. ¿De qué tipo de fuente de datos se trata?",
+ "answerOptions": [{
+ "answerText": "Primario",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Secundario",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Terciario",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un estudiante está recogiendo datos mediante consultas. ¿Qué fuente de datos podría ser?",
+ "answerOptions": [{
+ "answerText": "Archivos locales",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "API",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Base de datos",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 6,
+ "title": "Estadística y Probabilidad - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Por qué la estadística y la probabilidad son importantes para la ciencia de los datos?",
+ "answerOptions": [{
+ "answerText": "Porque no se puede operar con datos sin saber matemáticas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Porque la ciencia de los datos es una ciencia y tiene una sólida base formal",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Porque queremos evitar que la gente sin formación se dedique a la ciencia de los datos",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Puedes sacar cara 10 veces seguidas al lanzar una moneda?",
+ "answerOptions": [{
+ "answerText": "Sí",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "No",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Al lanzar un dado, ¿cuál es la probabilidad de obtener un número par?",
+ "answerOptions": [{
+ "answerText": "1/2",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "1/3",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Es imposible de predecir",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 7,
+ "title": "Estadística y Probabilidad - Cuestionario final",
+ "quiz": [{
+ "questionText": "Queremos demostrar que los jugadores de baloncesto son más altos que la gente media. Hemos recogido las estaturas de 20 personas de ambos grupos. ¿Qué tenemos que hacer?",
+ "answerOptions": [{
+ "answerText": "comparar la media",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Recoger más datos, ¡20 no es suficiente!",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "utilizar la prueba t de Student",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cómo podemos demostrar que los ingresos de una persona dependen del nivel de educación?",
+ "answerOptions": [{
+ "answerText": "calcular el coeficiente de correlación",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "dividir en grupos con y sin estudios y calcular las medias",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "utilizar la prueba t de Student",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Lanzamos los dados 100 veces y calculamos el valor medio. ¿Cuál sería la distribución del resultado?",
+ "answerOptions": [{
+ "answerText": "uniforme",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "normal",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ninguna de las respuestas anteriores",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/group-2.json b/quiz-app/src/assets/translations/es/group-2.json
new file mode 100644
index 00000000..3056ccbb
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-2.json
@@ -0,0 +1,410 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 8,
+ "title": "Bases de datos relacionales - Precuestionario",
+ "quiz": [{
+ "questionText": "Una base de datos puede considerarse una tabla con columnas y filas",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La mayoría de las bases de datos se componen de",
+ "answerOptions": [{
+ "answerText": "una tabla",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "una hoja de cálculo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "muchas tablas",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Puede evitar la duplicación de nombres de columnas si",
+ "answerOptions": [{
+ "answerText": "creando muchas tablas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "creando tablas con relaciones incorporadas",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "creando una única tabla gigante",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 9,
+ "title": "Bases de datos relacionales - Cuestionario final",
+ "quiz": [{
+ "questionText": "Una clave primaria es",
+ "answerOptions": [{
+ "answerText": "un valor utilizado para identificar una fila específica en una tabla",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "un valor utilizado para hacer que los valores sean únicos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "un valor utilizado para forzar la capitalización",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Una columna numérica 'ID' sería una buena clave primaria",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Una clave foránea se utiliza para",
+ "answerOptions": [{
+ "answerText": "valores que hacen referencia a los identificadores en una tabla separada",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "valores que hacen referencia a las cadenas en una tabla separada",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "conservar los valores que cambian con el tiempo",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 10,
+ "title": "Bases de datos no relacionales - Precuestionario",
+ "quiz": [{
+ "questionText": "Las hojas de cálculo son datos no relacionales",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Identifica la diferencia entre datos no relacionales y relacionales.",
+ "answerOptions": [{
+ "answerText": "Las bases de datos relacionales siempre contienen columnas y filas",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Algunos tipos de datos no relacionales utilizan columnas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "No hay ninguna diferencia",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué significa NoSQL?",
+ "answerOptions": [{
+ "answerText": "Nope SQL",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "No sólo SQL",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "No más SQL",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 11,
+ "title": "Bases de datos no relacionales - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Cuál de estos NO es un tipo de NoSQL?",
+ "answerOptions": [{
+ "answerText": "Object oriented",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Clave-valor",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Columnas",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué utilizas para hacer cálculos en las hojas de cálculo?",
+ "answerOptions": [{
+ "answerText": "Python",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Alias",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Fórmulas",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿A qué otro tipo de datos no relacionales se parece un documento en una base de datos de documentos?",
+ "answerOptions": [{
+ "answerText": "Claves",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Columnas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "JSON",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 12,
+ "title": "Python - Precuestionario",
+ "quiz": [{
+ "questionText": "Python es un buen lenguaje para",
+ "answerOptions": [{
+ "answerText": "La ciencia de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Principiantes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Python es un buen lenguaje para la Ciencia de Datos porque",
+ "answerOptions": [{
+ "answerText": "tiene muchas bibliotecas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "es un lenguaje rico pero sencillo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "No puedes hacer ciencia de datos si no sabes Python",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 13,
+ "title": "Python - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Qué biblioteca utilizarías para representar una lista de notas de los alumnos en clase?",
+ "answerOptions": [{
+ "answerText": "Numpy",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "SciPy",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pandas",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Tienes un DataFrame con una lista de alumnos, su número de grupo y la nota media. ¿Qué operación utilizarías para calcular la nota media por grupo?",
+ "answerOptions": [{
+ "answerText": "average",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "avg",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "groupby",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Tienes 100 amigos y quieres representar la información sobre la frecuencia con la que se hacen fotos entre ellos. ¿Qué estructura de datos utilizarías?",
+ "answerOptions": [{
+ "answerText": "Numpy array",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Pandas DataFrame",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pandas Series",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 14,
+ "title": "Preparación de datos - Precuestionario",
+ "quiz": [{
+ "questionText": "¿Cuál de ellos forma parte del proceso de preparación de datos?",
+ "answerOptions": [{
+ "answerText": "Validación del modelo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Clasificación",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Limpieza de los datos",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Por qué es tan importante la preparación de los datos?",
+ "answerOptions": [{
+ "answerText": "Hace que los modelos sean más precisos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "No es importante",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Los ordenadores pueden ayudar a limpiar los datos",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es el objetivo de la limpieza de datos?",
+ "answerOptions": [{
+ "answerText": "Corrección de problemas de formato",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Fijar los tipos de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 15,
+ "title": "Preparación de los datos - Cuestionario final",
+ "quiz": [{
+ "questionText": "Fusionar o unir dos conjuntos de datos en uno solo puede afectar a la coherencia de los datos",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué es lo primero que hay que hacer ante la falta de datos?",
+ "answerOptions": [{
+ "answerText": "Borrar los datos relacionados a esa falta",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Evaluar por qué falta",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Intenta rellenar los valores vacíos",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La unión de dos o más conjuntos de datos puede provocar los siguientes problemas",
+ "answerOptions": [{
+ "answerText": "Duplicados",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un formato inconsistente",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/group-3.json b/quiz-app/src/assets/translations/es/group-3.json
new file mode 100644
index 00000000..54a2ea47
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-3.json
@@ -0,0 +1,513 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 16,
+ "title": "Visualización de cantidades - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Una biblioteca útil para las visualizaciones de datos es:",
+ "answerOptions": [{
+ "answerText": "Matplotlib",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Matchartlib",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matgraphtlib",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Puede visualizar las cantidades utilizando:",
+ "answerOptions": [{
+ "answerText": "gráficos de dispersión",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "gráfico de líneas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Los gráficos de barras son útiles para visualizar la cantidad",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 17,
+ "title": "Visualización de cantidades - Cuestionario final",
+ "quiz": [{
+ "questionText": "Puede analizar las tendencias a lo largo del tiempo utilizando un:",
+ "answerOptions": [{
+ "answerText": "gráfico de líneas",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "gráfico de barras",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "gráfico circular",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Puede comparar valores utilizando este tipo de gráfico:",
+ "answerOptions": [{
+ "answerText": "gráfico de líneas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "gráfico de barras",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Muestra cómo se relacionan las partes con un todo utilizando este tipo de gráfico:",
+ "answerOptions": [{
+ "answerText": "líneas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "columnas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "circular",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 18,
+ "title": "Visualización de distribuciones - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Los histogramas son en general tipos de gráficos más sofisticados que los de dispersión",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Puede visualizar las distribuciones utilizando este tipo de gráfico:",
+ "answerOptions": [{
+ "answerText": "gráfico de dispersión",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "gráfico circular",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "gráfico de columnas",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Esta biblioteca es especialmente útil para construir histogramas",
+ "answerOptions": [{
+ "answerText": "TensorFlow",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "PyTorch",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matplotlib",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 19,
+ "title": "Visualización de distribuciones - Cuestionario final",
+ "quiz": [{
+ "questionText": "Los histogramas pueden utilizarse para analizar este tipo de datos:",
+ "answerOptions": [{
+ "answerText": "textual",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "numérico",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "En un histograma, un \"bin\" se refiere a:",
+ "answerOptions": [{
+ "answerText": "una clase de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "una agrupación de datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "datos desechables",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Para construir un gráfico de densidad suave, utiliza esta biblioteca:",
+ "answerOptions": [{
+ "answerText": "Seaborn",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Matplotlib",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "PyTorch",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 20,
+ "title": "Visualización de las proporciones - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Para visualizar las proporciones, utiliza este tipo de gráfico:",
+ "answerOptions": [{
+ "answerText": "circular",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "waffle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ninguna de las respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Una herramienta gratuita para visualizar sus datos es:",
+ "answerOptions": [{
+ "answerText": "Popculator",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Graphculator",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Charticulator",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Utiliza `plt.pie` para mostrar un:",
+ "answerOptions": [{
+ "answerText": "gráfico circular",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "gráfico de gofres",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "gráfico de dispersión",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 21,
+ "title": "Visualización de las proporciones - Cuestionario final",
+ "quiz": [{
+ "questionText": "Puedes editar los colores de tus gráficos de gofres",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Utiliza esta biblioteca para construir gráficos de gofres:",
+ "answerOptions": [{
+ "answerText": "edwaffle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "pywaffle",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "yumwaffle",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "En un gráfico de rosca, construye el círculo central utilizando esta sintaxis:",
+ "answerOptions": [{
+ "answerText": "plt.Oval",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "plt.Circle",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "plt.Edit",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 22,
+ "title": "Visualización de las relaciones - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Utiliza esta biblioteca para visualizar las relaciones:",
+ "answerOptions": [{
+ "answerText": "Seaborn",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Merborn",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Reborn",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Utiliza la función `relplot` de Seaborn para visualizar",
+ "answerOptions": [{
+ "answerText": "relaciones categóricas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "relaciones estadísticas",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "relaciones especiales",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Al editar el tono de un gráfico de dispersión, puedes:",
+ "answerOptions": [{
+ "answerText": "mostrar la distribución de un conjunto de datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "mostrar los colores de los artículos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "mostrar las temperaturas",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 23,
+ "title": "Visualización de las relaciones - Cuestionario final",
+ "quiz": [{
+ "questionText": "Una forma más accesible de mostrar la distribución es:",
+ "answerOptions": [{
+ "answerText": "Uso de la variación de la forma de los puntos de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Empleo de la variación del tamaño de los puntos de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Utilizando cualquiera de las respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Utilizando la función `relplot` de Seaborn puedes ver una agregación de datos en torno a un gráfico de líneas",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Las cuadrículas de facetas ayudan a visualizar",
+ "answerOptions": [{
+ "answerText": "una faceta determinada de los datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "valores atípicos en los datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "progresión en los datos",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 24,
+ "title": "Visualizaciones significativas - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Es relativamente fácil engañar a los usuarios editando gráficos",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La selección del tipo de gráfico que se va a construir depende del tipo de datos que se tenga y de la historia que se cuente sobre ellos",
+ "answerOptions": [{
+ "answerText": "verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Para crear un gráfico engañoso, algunos creadores manipulan:",
+ "answerOptions": [{
+ "answerText": "los colores del gráfico",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "los ejes X e Y",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "cualquiera de las respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 25,
+ "title": "Visualizaciones significativas - Cuestionario final",
+ "quiz": [{
+ "questionText": "Asegúrate de que tus gráficos sean:",
+ "answerOptions": [{
+ "answerText": "accesibles",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "legibles",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "cualquiera de las respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "D3 es una excelente biblioteca de gráficos que se puede utilizar para crear:",
+ "answerOptions": [{
+ "answerText": "visualizaciones animadas",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "infografías",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "aprendizaje avanzado",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Asegura la legibilidad de tu gráfico mediante:",
+ "answerOptions": [{
+ "answerText": "añadir etiquetas",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "agregar colores",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "alinear correctamente las etiquetas",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/group-4.json b/quiz-app/src/assets/translations/es/group-4.json
new file mode 100644
index 00000000..913d1113
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-4.json
@@ -0,0 +1,305 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 26,
+ "title": "Ciclo de vida de la ciencia de datos - Introducción - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Cuál es el primer paso del ciclo de vida de la ciencia de datos?",
+ "answerOptions": [{
+ "answerText": "Análisis de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Limpieza de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Adquisición de datos y problema a resolver",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Una vez alcanzado el siguiente paso del ciclo de vida de la ciencia de datos, no se puede volver a los pasos anteriores",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué consideraciones hay que tener en cuenta a la hora de conservar los datos?",
+ "answerOptions": [{
+ "answerText": "Limpieza de los datos",
+ "isCorrect": "false"
+ },
+
+ {
+ "answerText": "Mantener la seguridad de los datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 27,
+ "title": "Ciclo de vida de la ciencia de datos - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Qué paso del ciclo de vida de la ciencia de datos es más probable que produzca un modelo?",
+ "answerOptions": [{
+ "answerText": "Procesamiento",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Mantenimiento",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Captura",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué preguntas se haría un científico de datos en la fase de captura del ciclo de vida de la ciencia de datos?",
+ "answerOptions": [{
+ "answerText": "¿Cuáles son las limitaciones?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "¿Tienen sentido los datos?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "¿Tiene sentido el modelo?",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles son las técnicas más comunes en la etapa de procesamiento?",
+ "answerOptions": [{
+ "answerText": "Agrupación",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Inteligencia artificial",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 28,
+ "title": "Ciclo de vida de la ciencia de los datos - Análisis previo",
+ "quiz": [{
+ "questionText": "Analizar puede referirse a analizar modelos o datos",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué importancia tiene la exploración de los datos antes de utilizarlos en un modelo o en un análisis posterior?",
+ "answerOptions": [{
+ "answerText": "Para eliminar los datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Identificar los desafíos en los datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Explorar los datos no es importante",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué hace un científico de datos cuando explora los datos?",
+ "answerOptions": [{
+ "answerText": "Muestreo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Visualización",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 29,
+ "title": "Ciclo de vida de la ciencia de los datos - Análisis - Cuestionario final",
+ "quiz": [{
+ "questionText": "La visualización nunca forma parte del análisis exploratorio de datos (AED)",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué función de Pandas proporciona un perfil de datos básico?",
+ "answerOptions": [{
+ "answerText": "pandas()",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "isnull()",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "describe()",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Para qué sirve el muestreo de datos?",
+ "answerOptions": [{
+ "answerText": "El muestreo es mejor que la elaboración de perfiles de datos cuando se buscan errores",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "El muestreo no sirve para nada",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "El muestreo se utiliza para analizar los datos de un gran conjunto de datos porque es difícil analizarlos todos",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 30,
+ "title": "Ciclo de vida de la ciencia de los datos - Comunicación - Cuestionario previo",
+ "quiz": [{
+ "questionText": "Cuando se comunican datos a un público, es una buena práctica centrarse únicamente en las cifras y no en la historia.",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es un ejemplo de comunicación bidireccional?",
+ "answerOptions": [{
+ "answerText": "Cuando un presentador se dirige a un público y deja la palabra abierta para preguntas y comentarios.",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Cuando un líder lanza un mensaje de televisión o radio a sus electores.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Cuando una marca se anuncia a los clientes potenciales.",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Cuando una persona hace una presentación ante un público, y éste no proporciona ninguna información, ¿quién actúa como receptor de la información?",
+ "answerOptions": [{
+ "answerText": "El presentador",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La audiencia",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Miembros específicos de la audiencia",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 31,
+ "title": "Ciclo de vida de la ciencia de los datos - Comunicación - Cuestionario final",
+ "quiz": [{
+ "questionText": "A la hora de comunicar datos, los colores pueden servir para evocar emociones en el público.",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "De las siguientes opciones, ¿cuál es la explicación más significativa del progreso anual de una empresa?",
+ "answerOptions": [{
+ "answerText": "Hemos tenido un año excelente. Nuestros usuarios han crecido un 30%, nuestros ingresos han aumentado un 21% y hemos incorporado 100 nuevos miembros al equipo.",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Hemos tenido un gran año. Nuestros usuarios han crecido enormemente, nuestros ingresos han aumentado mucho e incluso hemos incorporado varios miembros nuevos al equipo en todas las divisiones.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Nuestro año de progreso fue simplemente fenomenal. No puedo exagerar el aumento que hemos tenido con el crecimiento de nuestros usuarios, nuestros ingresos o nuestro equipo.",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A la hora de comunicar datos a una audiencia ejecutiva, ¿cuál sería una estrategia aceptable?",
+ "answerOptions": [{
+ "answerText": "Profundiza en los pequeños detalles y dedica menos tiempo a la importancia de los datos.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Concéntrate principalmente en la importancia de los datos y en los próximos pasos recomendados sobre la base de los datos.",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Explica todo el contexto e intenta hacer una lluvia de ideas sobre posibles soluciones con los ejecutivos.",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/group-5.json b/quiz-app/src/assets/translations/es/group-5.json
new file mode 100644
index 00000000..44b196eb
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-5.json
@@ -0,0 +1,316 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 32,
+ "title": "Ciencia de los datos en la nube - Introducción - Cuestionario previo",
+ "quiz": [{
+ "questionText": "¿Qué es la nube?",
+ "answerOptions": [{
+ "answerText": "Una colección de bases de datos para almacenar big data.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un conjunto de servicios informáticos de pago por Internet.",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Una masa visible de partículas suspendidas en el aire.",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué es la computación en nube?",
+ "answerOptions": [{
+ "answerText": "La prestación de servicios informáticos a través de Internet.",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Crear su propio centro de datos.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Usar internet.",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles son las ventajas de la nube?",
+ "answerOptions": [{
+ "answerText": "Flexibilidad, escalabilidad, fiabilidad y seguridad",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Flexibilidad, escalabilidad, variabilidad, seguridad",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Claridad, escalabilidad, fiabilidad, variabilidad",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 33,
+ "title": "Ciencia de datos en la nube - Introducción - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Cuál NO es necesariamente una buena razón para elegir la nube?",
+ "answerOptions": [{
+ "answerText": "Uso de servicios de aprendizaje automático e inteligencia de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Procesamiento de grandes cantidades de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Almacenamiento de datos gubernamentales sensibles/confidenciales",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿De qué manera utilizan los científicos de datos la nube?",
+ "answerOptions": [{
+ "answerText": "Tareas de infraestructura",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Almacenamiento de grandes cantidades de datos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Configurar las opciones de red",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué servicio de computación en nube proporciona acceso a aplicaciones de software sin necesidad de mantener el software?",
+ "answerOptions": [{
+ "answerText": "Infraestructura como servicio",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Plataforma como servicio",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Software como servicio",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 34,
+ "title": "Ciencia de datos en la nube - Low-Code",
+ "quiz": [{
+ "questionText": "Una de las ventajas de utilizar Jupyter Notebooks es que se pueden crear rápidamente prototipos de modelos.",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Por qué un científico de datos utilizaría Azure Machine Learning?",
+ "answerOptions": [{
+ "answerText": "Para ahorrar tiempo en la exploración y el preprocesamiento de datos.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Para producir modelos precisos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "No es necesario tener experiencia programando para utilizar Azure ML.",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso, programar siempre es necesario",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 35,
+ "title": "Ciencia de datos en la nube - Low-Code - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Qué hay que crear antes de acceder a Azure ML Studio?",
+ "answerOptions": [{
+ "answerText": "Un espacio de trabajo",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Una instancia de proceso",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un clúster",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuáles de las siguientes tareas son compatibles con el ML automatizado?",
+ "answerOptions": [{
+ "answerText": "Creación de imágenes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Clasificación",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Generación de lenguaje natural",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿En qué caso necesitas la GPU en lugar de la CPU?",
+ "answerOptions": [{
+ "answerText": "Cuando se tienen datos tabulares",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Cuando tengas suficiente dinero para permitírtelo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Cuando trabajas con Deep Learning",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 36,
+ "title": "Ciencia de datos en la nube - Azure",
+ "quiz": [{
+ "questionText": "¿Cuál de ellos se vería afectado por el aumento del tamaño de la agrupación?",
+ "answerOptions": [{
+ "answerText": "Capacidad de respuesta",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Coste",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Rendimiento del modelo",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál es la ventaja de utilizar herramientas de bajo código?",
+ "answerOptions": [{
+ "answerText": "No se requiere experiencia en código",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Etiquetar automáticamente el conjunto de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Mayor seguridad del modelo",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Qué es AutoML?",
+ "answerOptions": [{
+ "answerText": "Una herramienta para automatizar el preprocesamiento de datos",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Una herramienta para automatizar el despliegue de modelos",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 37,
+ "title": "Ciencia de datos en la nube - Azure - Cuestionario final",
+ "quiz": [{
+ "questionText": "¿Cuál es la razón para crear una AutoMLConfig?",
+ "answerOptions": [{
+ "answerText": "Es donde se dividen los datos de entrenamiento y de prueba",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Es donde se especifica el modelo a entrenar",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Proporciona todos los detalles de su experimento AutoML",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál de las siguientes métricas es soportada por Automated ML para una tarea de clasificación?",
+ "answerOptions": [{
+ "answerText": "Precisión",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "r2_score",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "normalized_root_mean_error",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "¿Cuál NO es una ventaja de utilizar el SDK?",
+ "answerOptions": [{
+ "answerText": "Puede utilizarse para automatizar múltiples tareas y ejecuciones",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Facilita la edición programada de las ejecuciones",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Puede utilizarse a través de una interfaz gráfica de usuario",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/group-6.json b/quiz-app/src/assets/translations/es/group-6.json
new file mode 100644
index 00000000..ad094d6a
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/group-6.json
@@ -0,0 +1,108 @@
+[{
+ "title": "Ciencia de datos para principiantes: Cuestionarios",
+ "complete": "¡Enhorabuena, has completado el cuestionario!",
+ "error": "Lo siento, inténtalo de nuevo",
+ "quizzes": [{
+ "id": 38,
+ "title": "Ciencia de los datos en la naturaleza",
+ "quiz": [{
+ "questionText": "La ciencia de los datos puede utilizarse en muchos sectores, como",
+ "answerOptions": [{
+ "answerText": "Finanzas y Humanidades",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Agricultura y manufactura",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "La ciencia de los datos en el contexto de la investigación puede centrarse en:",
+ "answerOptions": [{
+ "answerText": "oportunidades de innovación",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "gestión de errores",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "impulsar las ventas",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "El 'Estudio de los matices de género' se centró en",
+ "answerOptions": [{
+ "answerText": "transformar el discurso de género",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "los sesgos inherentes al análisis facial",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Ninguna de las respuestas anteriores",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 39,
+ "title": "Ciencia de los datos en la naturaleza - Cuestionario final",
+ "quiz": [{
+ "questionText": "Las humanidades digitales son una práctica que combina los métodos informáticos con la investigación humanística",
+ "answerOptions": [{
+ "answerText": "Verdadero",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Falso",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La ciencia de los datos puede utilizarse en la investigación sobre sostenibilidad para:",
+ "answerOptions": [{
+ "answerText": "Estudiar la deforestación",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Estudio de los datos del cambio climático",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "En finanzas, se puede utilizar la ciencia de los datos para",
+ "answerOptions": [{
+ "answerText": "Investigación sobre sus servicios financieros",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Personalizar las opciones de fondos de inversión en función del tipo de cliente",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Las dos respuestas anteriores",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
diff --git a/quiz-app/src/assets/translations/es/index.js b/quiz-app/src/assets/translations/es/index.js
new file mode 100644
index 00000000..ff9602b0
--- /dev/null
+++ b/quiz-app/src/assets/translations/es/index.js
@@ -0,0 +1,20 @@
+import es0 from './group-1.json';
+import es1 from './group-2.json';
+import es2 from './group-3.json';
+
+import es3 from './group-4.json';
+
+import es4 from './group-5.json';
+
+import es5 from './group-6.json';
+
+const quiz = {
+ 0: es0[0],
+ 1: es1[0],
+ 2: es2[0],
+ 3: es3[0],
+ 4: es4[0],
+ 5: es5[0],
+};
+
+export default quiz;
diff --git a/quiz-app/src/assets/translations/index.js b/quiz-app/src/assets/translations/index.js
index 35040cda..7b1c4125 100644
--- a/quiz-app/src/assets/translations/index.js
+++ b/quiz-app/src/assets/translations/index.js
@@ -1,8 +1,10 @@
-import englishQuizzes from "./en/";
-import frenchQuizzes from "./fr/";
+import englishQuizzes from './en/';
+import frenchQuizzes from './fr/';
+import spanishQuizzes from './es/';
const messages = {
en: englishQuizzes,
- fr: frenchQuizzes
+ fr: frenchQuizzes,
+ es: spanishQuizzes,
};
export default messages;
From 41086f1f75798da23c2cfb049894f1c99c0c8469 Mon Sep 17 00:00:00 2001
From: Jen Looper
Date: Mon, 4 Oct 2021 17:45:46 -0400
Subject: [PATCH 012/319] fixing the build and adding 'es' to dropdown
---
quiz-app/src/App.vue | 2 +-
quiz-app/src/assets/translations/es/group-3.json | 3 +--
2 files changed, 2 insertions(+), 3 deletions(-)
diff --git a/quiz-app/src/App.vue b/quiz-app/src/App.vue
index 4c6c8df8..65b54a08 100644
--- a/quiz-app/src/App.vue
+++ b/quiz-app/src/App.vue
@@ -5,7 +5,7 @@
{{ questions[locale][0].title }}
diff --git a/quiz-app/src/assets/translations/es/group-3.json b/quiz-app/src/assets/translations/es/group-3.json
index 54a2ea47..45ba5657 100644
--- a/quiz-app/src/assets/translations/es/group-3.json
+++ b/quiz-app/src/assets/translations/es/group-3.json
@@ -173,8 +173,7 @@
]
},
{
- "questionText": "En un histograma, un "
- bin " se refiere a:",
+ "questionText": "En un histograma, un 'bin' se refiere a:",
"answerOptions": [{
"answerText": "una clase de datos",
"isCorrect": "false"
From 40706007057840bca37292598e90a12aea7c6133 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 4 Oct 2021 19:50:05 -0500
Subject: [PATCH 013/319] feat: Translation for module 1 section 1wq
---
.../translations/README.es.md | 190 +++++++++---------
1 file changed, 100 insertions(+), 90 deletions(-)
diff --git a/1-Introduction/01-defining-data-science/translations/README.es.md b/1-Introduction/01-defining-data-science/translations/README.es.md
index 873ad74f..20011f7c 100644
--- a/1-Introduction/01-defining-data-science/translations/README.es.md
+++ b/1-Introduction/01-defining-data-science/translations/README.es.md
@@ -1,165 +1,175 @@
-# Defining Data Science
+# Definiendo la Ciencia de Datos
-| ](../../sketchnotes/01-Definitions.png)|
+| ](../../sketchnotes/01-Definitions.png)|
|:---:|
-|Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+|Definiendo la Ciencia de Datos - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
---
-[](https://youtu.be/pqqsm5reGvs)
+[](https://youtu.be/pqqsm5reGvs)
-## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
+## [Examen previo a la lección](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
-## What is Data?
-In our everyday life, we are constantly surrounded by data. The text you are reading now is data, the list of phone numbers of your friends in your smartphone is data, as well as the current time displayed on your watch. As human beings, we naturally operate with data by counting the money we have or writing letters to our friends.
+## ¿Qué son los Datos?
+En nuestra vida diaria, estamos constantemente rodeados por datos. El texto que estás leyendo ahora son datos,
+la lista de números telefónicos de tus amigos en tu móvil son datos, así como la hora actual que se muestra en tu reloj.
+Como seres humanos, operamos naturalmente con datos, contando el dinero que tenemos o escribiendo cartas a nuestros amigos.
-However, data became much more critical with the creation of computers. The primary role of computers is to perform computations, but they need data to operate on. Thus, we need to understand how computers store and process data.
+Sin embargo, los datos se vuelven más críticos con la creación de las computadoras. El rol principal de las computadoras
+es realizar cálculos, pero éstas necesitan datos para operar. Por lo cual, necesitamos entender cómo las computadoras
+almacenan y procesan los datos.
-With the emergence of the Internet, the role of computers as data handling devices increased. If you think of it, we now use computers more and more for data processing and communication, rather than actual computations. When we write an e-mail to a friend or search for some information on the Internet - we are essentially creating, storing, transmitting, and manipulating data.
-> Can you remember the last time you have used computers to actually compute something?
+Con el surgimiento de internet, el rol de las computadoras como dispositivos para la manipulación de datos incrementó.
+Si lo piensas, ahora usamos computadoras mucho más para la comunicación y el procesamiento de datos, en lugar de para hacer cálculos. Cuando escribimos un correo electrónico a un amigo o buscamos alguna información en internet - estamos
+creando, almacenando, transmitiendo y manipulando datos.
-## What is Data Science?
+> ¿Recuerdas la última vez que usaste una computadora para realmente calcular algo?
-In [Wikipedia](https://en.wikipedia.org/wiki/Data_science), **Data Science** is defined as *a scientific field that uses scientific methods to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains*.
+## ¿Qué es Ciencia de Datos?
-This definition highlights the following important aspects of data science:
+En [Wikipedia](https://en.wikipedia.org/wiki/Data_science), se define la **Ciencia de Datos** como *un campo de las ciencias que usa métodos científicos para extraer conocimiento y perspectivas de datos estructurados y no estructurados, y
+aplicar el conocimiento y las perspectivas accionables de los datos a través de un amplio rango de dominios de aplicación*.
-* The main goal of data science is to **extract knowledge** from data, in order words - to **understand** data, find some hidden relationships and build a **model**.
-* Data science uses **scientific methods**, such as probability and statistics. In fact, when the term *data science* was first introduced, some people argued that data science is just a new fancy name for statistics. Nowadays it has become evident that the field is much broader.
-* Obtained knowledge should be applied to produce some **actionable insights**.
-* We should be able to operate on both **structured** and **unstructured** data. We will come back to discuss different types of data later in the course.
-* **Application domain** is an important concept, and data scientist often needs at least some degree of expertise in the problem domain.
+Esta definición destaca los siguientes aspectos importantes de la ciencia de datos:
-> Another important aspect of Data Science is that it studies how data can be gathered, stored and operated upon using computers. While statistics gives us mathematical foundations, data science applies mathematical concepts to actually draw insights from data.
+* El objetivo principal para la ciencia de datos es **extraer conocimiento** de los datos, en otras palabras - **entender** los datos, encontrar relaciones ocultas y construir un **modelo**.
+* La ciencia de datos usa **métodos científicos**, como la probabilidad y estadística. De hecho, cuando el término **ciencia de datos** fue usado por primera vez, algunas personas argumentaron que la ciencia de datos era solo un nuevo nombre elegante para estadística. En estos días se ha vuelto evidente que es un campo mucho más amplio.
+* El conocimiento obtenido puede ser aplicado para producir **conocimiento práctico**.
+* Debemos ser capaces de operar tanto con datos **estructurados** como **no estructurados**. Más adelante en el curso discutiremos los diferentes tipos de datos.
+* El **dominio de la aplicación** es un concepto importante, y un científico de datos necesita al menos cierta experiencia en el dominio del problema.
-One of the ways (attributed to [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) to look at the data science is to consider it to be a separate paradigm of science:
-* **Empyrical**, in which we rely mostly on observations and results of experiments
-* **Theoretical**, where new concepts emerge from existing scientific knowledge
-* **Computational**, where we discover new principles based on some computational experiments
-* **Data-Driven**, based on discovering relationships and patterns in the data
+> Otro aspecto importante de la Ciencia de Datos es que estudia cómo los datos pueden ser obtenidos, almacenados y operados usando computadoras. Mientras la estadística nos da los fundamentos matemáticos, la ciencia de datos aplica los conceptos matemáticos para realmente extraer conocimiento de los datos.
-## Other Related Fields
+Una de las formas (atribuidas a [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) de ver a la ciencia de datos es considerarla como un paradigma separado de la ciencia:
+* **Empírica**, en la que confiamos mayormente en observaciones y resultados de experimentos
+* **Teórica**, donde surgen nuevos conceptos desde el conocimiento científico existente
+* **Computacional**, donde descubrimos nuevos principios basados en algunos experimentos computacionales
+* **Dirigida por datos**, basada en el descubrimiento de relaciones y patrones en los datos
-Since data is a pervasive concept, data science itself is also a broad field, touching many other related disciplines.
+## Otros campos relacionados
+
+Ya que los datos son un concepto omnipresente, la ciencia de datos en sí misma también es un campo amplio, que toca muchas otras disciplinas relacionadas.
-
Databases
+
Bases de datos
-The most obvious thing to consider is **how to store** the data, i.e. how to structure them in a way that allows faster processing. There are different types of databases that store structured and unstructured data, which [we will consider in our course](../../2-Working-With-Data/README.md).
+Lo más obvio a considerar es **cómo almacenar** los datos, es decir, cómo estructurarlos de tal forma que se procesen más rápido. Existen distintos tipos de bases de datos que almacenan datos estructurados y no estructurados, los
+cuales [consideraremos en este curso](../../2-Working-With-Data/README.md).
Big Data
-Often we need to store and process really large quantities of data with relatively simple structure. There are special approaches and tools to store that data in a distributed manner on a computer cluster, and process them efficiently.
+Usualmente necesitamos almacenar y procesar enormes cantidades de datos con estructuras relativamente simples. Existen
+enfoques y herramientas especiales para almacenar esos datos de forma distribuida en un clúster de computadoras, y procesarlos eficientemente.
-
Machine Learning
+
Aprendizaje automático
-One of the ways to understand the data is to **build a model** that will be able to predict desired outcome. Being able to learn such models from data is the area studied in **machine learning**. You may want to have a look at our [Machine Learning for Beginners](https://github.com/microsoft/ML-For-Beginners/) Curriculum to get deeper into that field.
+Una de las formas de entender los datos es **construir un modelo** que sea capaz de predecir el resultado deseado. Ser capaces de aprender esos modelos a partir de los datos es el área de estudio del **aprendizaje automático**. Puede que quieras dar un vistazo a nuestro currículum de [Aprendizaje automático para principiantes](https://github.com/microsoft/ML-For-Beginners/) para profundizar en ese campo.
-
Artificial Intelligence
+
+
Inteligencia artificial
-As machine learning, artificial intelligence also relies on data, and it involves building high complexity models that will exhibit the behavior similar to a human being. Also, AI methods often allow us to turn unstructured data (eg. natural language) into structured by extracting some insights.
+Así como el aprendizaje automático, la inteligencia artificial también depende de los datos, e involucra la construcción de modelos altamente complejos que exhibirán un comportamiento similar al de un ser humano. Además, los métodos de IA usualmente nos permiten convertir datos no estructurados (por ejemplo, lenguaje natural) en datos estructurados extrayendo conocimiento útil.
-
Visualization
+
Visualización
-Vast amounts of data are incomprehensible for a human being, but once we create useful visualizations - we can start making much more sense of data, and drawing some conclusions. Thus, it is important to know many ways to visualize information - something that we will cover in [Section 3](../../3-Data-Visualization/README.md) of our course. Related fields also include **Infographics**, and **Human-Computer Interaction** in general.
+Cantidades descomunales de datos son incomprensibles para un ser humano, pero una vez que creamos visualizaciones útiles - podemos empezar a darle mucho más sentido a los datos, y extraer algunas conclusiones. Por lo tanto, es importante conocer diversas formas de visualizar la información - lo cual cubriremos en la [Sección 3](../../3-Data-Visualization/README.md) de nuestro curso. Campos relacionados incluyen la **infografía**, y la **interacción humano-computadora** en general.
-## Types of Data
+## Tipos de datos
-As we have already mentioned - data is everywhere, we just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. The former are typically represented in some well-structured form, often as a table or number of tables, while latter is just a collection of files. Sometimes we can also talk about **semistructured** data, that have some sort of a structure that may vary greatly.
+Como ya se ha mencionado - los datos están en todas partes, ¡sólo necesitamos capturarlos en la forma correcta! Es útil distinguir entre datos **estructurados** y **no estructurados**. Los primeros típicamente son representados en una forma bien estructurada, usualmente como una tabla o conjunto de tablas, mientras que los últimos son sólo una colección de archivos. Algunas veces podemos hablar de datos **semi-estructurados**, que tienen cierta estructura la cual podría variar mucho.
-| Structured | Semi-structured | Unstructured |
-|----------- |-----------------|--------------|
-| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
-| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, data of publication, and abstract | File share with corporate documents |
-| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
+| Estructurado | Semi-estructurado | No estructurado |
+|------------- |-------------------|-----------------|
+| Lista de personas con sus números telefónicos | Páginas de Wikipedia con enlaces | Texto de la Enciclopedia Británica |
+| Temperatura en todas las habitaciones de un edificio a cada minuto por los últimos 20 años | Colección de documentos científicos en formato JSON con autores, fecha de publicación, y resumen | Recurso compartido de archivos con documentos corporativos |
+| Datos de edad y género de todas las personas que entran al edificio | Páginas de internet | Vídeo sin procesar de cámara de vigilancia |
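La diferencia entre datos estructurados y semi-estructurados se aprecia al procesarlos con código. Un boceto mínimo en Python, con datos de ejemplo inventados (no forman parte de la lección original):

```python
import csv
import io
import json

# Datos estructurados: cada fila tiene exactamente las mismas columnas
tabla = io.StringIO("nombre,telefono\nAna,555-1234\nLuis,555-5678\n")
filas = list(csv.DictReader(tabla))
print(filas[0]["telefono"])  # cada registro comparte el mismo esquema

# Datos semi-estructurados: hay cierta estructura, pero puede variar entre registros
articulos = [
    '{"autores": ["Ana"], "fecha": "2021-10-04", "resumen": "..."}',
    '{"autores": ["Luis", "Eva"], "resumen": "sin fecha de publicación"}',
]
for doc in articulos:
    registro = json.loads(doc)
    # no todos los documentos tienen los mismos campos
    fecha = registro.get("fecha", "desconocida")
    print(registro["autores"], fecha)
```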
-## Where to get Data
+## Dónde obtener datos
-There are many possible sources of data, and it will be impossible to list all of them! However, let's mention some of the typical places where you can get data:
+Hay múltiples fuentes de datos, y ¡sería imposible listarlas todas! Sin embargo, mencionemos algunos de los lugares típicos en dónde obtener datos:
-* **Structured**
- - **Internet of Things**, including data from different sensors, such as temperature or pressure sensors, provides a lot of useful data. For example, if an office building is equipped with IoT sensors, we can automatically control heating and lighting in order to minimize costs.
- - **Surveys** that we ask users after purchase of a good, or after visiting a web site.
- - **Analysis of behavior** can, for example, help us understand how deeply a user goes into a site, and what is the typical reason for leaving the site.
-* **Unstructured**
- - **Texts** can be a rich source of insights, starting from overall **sentiment score**, up to extracting keywords and even some semantic meaning.
- - **Images** or **Video**. A video from surveillance camera can be used to estimate traffic on the road, and inform people about potential traffic jams.
- - Web server **Logs** can be used to understand which pages of our site are most visited, and for how long.
-* Semi-structured
- - **Social Network** graph can be a great source of data about user personality and potential effectiveness in spreading information around.
- - When we have a bunch of photographs from a party, we can try to extract **Group Dynamics** data by building a graph of people taking pictures with each other.
+* **Estructurados**
+ - El **Internet de las cosas**, incluyendo datos de distintos sensores, como sensores de temperatura o presión, provee muchos datos útiles. Por ejemplo, si una oficina es equipada con sensores IoT, podemos controlar automáticamente la calefacción e iluminación para minimizar costos.
+ - **Encuestas** que realizamos a los usuarios después de pagar un producto o después de visitar un sitio web.
+ - El **Análisis de comportamiento** puede, por ejemplo, ayudarnos a entender qué tanto profundiza un usuario en un sitio, y cuál es la razón típica por la que lo abandona.
+* **No estructurados**
+ - Los **Textos** pueden ser una rica fuente de perspectivas, empezando por la **puntuación general de sentimiento**, hasta la extracción de palabras clave e incluso algún significado semántico.
+ - **Imágenes** o **Video**. Un video de una cámara de vigilancia puede ser usado para estimar el tráfico en carretera, e informar a las personas acerca de posibles embotellamientos.
+ - **Bitácoras** de servidores web pueden ser usadas para entender qué páginas de nuestro sitio son las más visitadas y por cuánto tiempo.
+* **Semi-estructurados**
+ - Grafos de **redes sociales** pueden ser una gran fuente de datos acerca de la personalidad del usuario y efectividad potencial de difusión de la información.
+ - Cuando tenemos un conjunto de fotografías de una fiesta, podemos intentar extraer datos de la **dinámica de grupos** construyendo un grafo de personas tomándose fotos unas a otras.
-By knowing different possible sources of data, you can try to think about different scenarios where data science techniques can be applied to know the situation better, and to improve business processes.
+Conociendo las diversas fuentes de datos posibles, puedes intentar pensar en distintos escenarios donde se pueden aplicar técnicas de ciencia de datos para conocer mejor la situación, y mejorar los procesos de negocio.
-## What you can do with Data
+## Qué puedes hacer con los datos
-In Data Science, we focus on the following steps of data journey:
+En la ciencia de datos, nos enfocamos en los siguientes pasos del viaje de los datos:
-
1) Data Acquisition
+
1) Adquisición de datos
-First step is to collect the data. While in many cases it can be a straightforward process, like data coming to a database from web application, sometimes we need to use special techniques. For example, data from IoT sensors can be overwhelming, and it is a good practice to use buffering endpoints such as IoT Hub to collect all the data before further processing.
+El primer paso es reunir los datos. Mientras que en muchos casos esto puede ser un proceso simple, como los datos que llegan a una base de datos desde una aplicación web, algunas veces necesitamos usar técnicas especiales. Por ejemplo, los datos obtenidos desde sensores IoT pueden ser inmensos, y es una buena práctica el uso de endpoints búfer como IoT Hub para reunir todos los datos antes de procesarlos.
-
2) Data Storage
+
2) Almacenamiento de datos
-Storing the data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would want later on to query them. There are several ways data can be stored:
+Almacenar los datos puede ser desafiante, especialmente si hablamos de big data. Al decidir cómo almacenar los datos, tiene sentido anticipar la forma en la cual serán consultados después. Existen varias formas de almacenar los datos:
-
Relational database stores a collection of tables, and uses a special language called SQL to query them. Typically, tables would be connected to each other using some schema. In many cases we need to convert the data from original form to fit the schema.
-
NoSQL database, such as CosmosDB, does not enforce schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL database does not have rich querying capabilities of SQL, and cannot enforce referential integrity between data.
-
Data Lake storage is used for large collections of data in raw form. Data lakes are often used with big data, where all data cannot fit into one machine, and has to be stored and processed by a cluster. Parquet is the data format that is often used in conjunction with big data.
+
+
Las bases de datos relacionales almacenan una colección de tablas, y usan un lenguaje especial llamado SQL para consultarlas. Típicamente, las tablas estarían conectadas unas a otras mediante un esquema. En muchas ocasiones necesitamos convertir los datos desde su forma original para que se ajusten al esquema.
+
+
Las bases de datos NoSQL, como CosmosDB, no exigen un esquema de datos, y permiten almacenar datos más complejos, por ejemplo, documentos JSON jerárquicos o grafos. Sin embargo, las bases de datos NoSQL no tienen las sofisticadas capacidades de consulta de SQL, y no pueden garantizar la integridad referencial entre los datos.
+
+
El almacenamiento en un lago de datos (data lake) se usa para grandes colecciones de datos en bruto. Los lagos de datos suelen usarse con big data, donde todos los datos no caben en un único equipo, y tienen que ser almacenados y procesados por un clúster. Parquet es un formato de datos que se utiliza comúnmente en conjunto con big data.
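Como boceto ilustrativo (no forma parte de la lección original), el contraste entre el almacenamiento relacional y el de documentos puede verse con la biblioteca estándar de Python: `sqlite3` exige un esquema y se consulta con SQL, mientras que un documento JSON puede tener forma libre y jerárquica:

```python
import json
import sqlite3

# Relacional: esquema fijo, consultas SQL
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE estudiantes (nombre TEXT, edad INTEGER)")
con.execute("INSERT INTO estudiantes VALUES (?, ?)", ("Ana", 21))
(edad,) = con.execute(
    "SELECT edad FROM estudiantes WHERE nombre = ?", ("Ana",)
).fetchone()
print(edad)  # 21

# Estilo documento/NoSQL: sin esquema fijo, datos jerárquicos
doc = json.loads('{"nombre": "Ana", "cursos": [{"id": 1, "nota": 9.5}]}')
print(doc["cursos"][0]["nota"])  # 9.5
```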
-
3) Data Processing
+
3) Procesamiento de datos
-This is the most exciting part of data journey, which involved processing the data from its original form to the form that can be used for visualization/model training. When dealing with unstructured data such as text or images, we may need to use some AI techniques to extract **features** from the data, thus converting it to structured form.
+Esta es la parte más emocionante del viaje de los datos, la cual involucra procesar los datos desde su forma original hasta una forma que pueda usarse para visualización o para el entrenamiento de modelos. Cuando tratamos con datos no estructurados como texto o imágenes, podemos necesitar usar algunas técnicas de IA para extraer **características** de los datos, y así convertirlos a una forma estructurada.
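Por ejemplo (un boceto hipotético, no parte de la lección), convertir un texto libre en una fila estructurada de características puede ser tan simple como:

```python
def extraer_caracteristicas(texto: str) -> dict:
    """Convierte un texto no estructurado en una fila estructurada de características."""
    palabras = texto.split()
    return {
        "num_caracteres": len(texto),
        "num_palabras": len(palabras),
        "palabra_mas_larga": max(palabras, key=len) if palabras else "",
    }

fila = extraer_caracteristicas("La ciencia de datos extrae conocimiento de los datos")
print(fila)
```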
-
4) Visualization / Human Insights
+
4) Visualización / entendimiento humano
-Often to understand the data we need to visualize them. Having many different visualization techniques in our toolbox we can find the right view to make an insight. Often, data scientist needs to "play with data", visualizing it many times and looking for some relationships. Also, we may use techniques from statistics to test some hypotheses or prove correlation between different pieces of data.
+Usualmente, para entender los datos necesitamos visualizarlos. Teniendo diversas técnicas de visualización en nuestro arsenal, podemos encontrar la vista adecuada para obtener una perspectiva. Comúnmente, un científico de datos necesita "jugar con los datos", visualizándolos varias veces y buscando alguna relación. Además, podemos usar técnicas de estadística para probar algunas hipótesis o demostrar la correlación entre distintas porciones de datos.
-
5) Training predictive model
+
5) Entrenando modelos predictivos
-Because the ultimate goal of data science is to be able to take decisions based on data, we may want to use the techniques of Machine Learning to build predictive model that will be able to solve our problem.
+Ya que el principal objetivo de la ciencia de datos es ser capaces de tomar decisiones basándonos en los datos, podemos querer usar técnicas de aprendizaje automático para construir modelos predictivos que sean capaces de resolver nuestro problema.
-Of course, depending on the actual data some steps might be missing (eg., when we already have the data in the database, or when we do not need model training), or some steps might be repeated several times (such as data processing).
+Por supuesto, dependiendo de los datos reales, algunos pasos podrían omitirse (por ejemplo, cuando ya tenemos los datos en la base de datos, o cuando no necesitamos entrenar un modelo), o algunos pasos podrían repetirse varias veces (como el procesamiento de datos).
-## Digitalization and Digital Transformation
+## Digitalización y transformación digital
-In the last decade, many businesses started to understand the importance of data when making business decisions. To apply data science principles to running a business one first needs to collect some data, i.e. somehow turn business processes into digital form. This is known as **digitalization**, and followed by using data science techniques to guide decisions it often leads to significant increase of productivity (or even business pivot), called **digital transformation**.
+En la última década, muchos negocios comenzaron a entender la importancia de los datos al tomar decisiones de negocio. Para aplicar los principios de la ciencia de datos a la dirección de un negocio, primero se necesita reunir algunos datos, es decir, convertir de alguna forma los procesos de negocio a forma digital. Esto se conoce como **digitalización**; cuando va seguida del uso de técnicas de ciencia de datos para guiar decisiones, suele conducir a un incremento significativo de la productividad (o incluso a un giro del negocio), lo que se llama **transformación digital**.
-Let's consider an example. Suppose, we have a data science course (like this one), which we deliver online to students, and we want to use data science to improve it. How can we do it?
+Consideremos un ejemplo. Supongamos que tenemos un curso de ciencia de datos (como éste), el cual ofrecemos en línea a los estudiantes, y queremos usar la ciencia de datos para mejorarlo. ¿Cómo podemos hacerlo?
-We can start with thinking "what can be digitized?". The simplest way would be to measure time it takes each student to complete each module, and the obtained knowledge (eg. by giving multiple-choice test at the end of each module). By averaging time-to-complete across all students, we can find out which modules cause the most problems to students, and work on simplifying them.
+Podemos comenzar pensando "¿qué puede ser digitalizado?". La forma más simple sería medir el tiempo que le toma a cada estudiante completar cada módulo, y el conocimiento obtenido (por ejemplo, realizando exámenes de opción múltiple al final de cada módulo). Promediando el tiempo en concluir de todos los estudiantes, podemos encontrar qué módulos causan más problemas a los estudiantes, y trabajar en simplificarlos.
-> You may argue that this approach is not ideal, because modules can be of different length. It is probably more fair to divide the time by the length of the module (in number of characters), and compare those values instead.
+> Podrías argumentar que este enfoque no es idóneo, porque los módulos pueden tener distinta duración. Probablemente es más justo dividir el tiempo por la longitud del módulo (en número de caracteres), y comparar esos valores en su lugar.
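La idea de promediar y normalizar por longitud puede esbozarse en unas líneas de Python (los tiempos y longitudes son datos inventados a modo de ejemplo):

```python
# Tiempos (en minutos) que tardaron varios estudiantes en cada módulo (inventados)
tiempos = {
    "modulo-1": [30, 45, 38],
    "modulo-2": [90, 120, 100],
}
# Longitud de cada módulo en número de caracteres (también inventada)
longitudes = {"modulo-1": 12000, "modulo-2": 15000}

for modulo, lista in tiempos.items():
    promedio = sum(lista) / len(lista)
    # Normalizar por la longitud del módulo para comparar módulos de distinta duración
    por_mil_caracteres = promedio / longitudes[modulo] * 1000
    print(f"{modulo}: {promedio:.1f} min en promedio, "
          f"{por_mil_caracteres:.2f} min por cada 1000 caracteres")
```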
-When we start analyzing results of multiple-choice tests, we can try to find out specific concepts that students understand poorly, and improve the content. To do that, we need to design tests in such a way that each question maps to a certain concept or chunk of knowledge.
+Cuando comencemos a analizar los resultados de los exámenes de opción múltiple, podemos intentar encontrar los conceptos específicos que los estudiantes entienden mal, y mejorar el contenido. Para hacerlo, necesitamos diseñar los exámenes de tal forma que cada pregunta se relacione con un concepto concreto o porción de conocimiento.
-If we want to get even more complicated, we can plot the time taken for each module against the age category of students. We might find out that for some age categories it takes inappropriately long time to complete the module, or students drop out at certain point. This can help us provide age recommendation for the module, and minimize people's dissatisfaction from wrong expectations.
+Si queremos hacerlo aún más complejo, podemos trazar el tiempo invertido en cada módulo contra la categoría de edad de los estudiantes. Podríamos encontrar que a algunas categorías de edad les toma un tiempo desproporcionadamente largo completar el módulo, o que los estudiantes abandonan en cierto punto. Esto nos puede ayudar a dar recomendaciones de edad para el módulo, y así minimizar el descontento de la gente por falsas expectativas.
-## 🚀 Challenge
+## 🚀 Desafío
-In this challenge, we will try to find concepts relevant to the field of Data Science by looking at texts. We will take Wikipedia article on Data Science, download and process the text, and then build a word cloud like this one:
+En este desafío, intentaremos encontrar conceptos relevantes para el campo de la Ciencia de Datos examinando algunos textos. Tomaremos un artículo de Wikipedia sobre Ciencia de Datos, descargaremos y procesaremos el texto, y luego construiremos una nube de palabras como esta:
-
+
-Visit [`notebook.ipynb`](notebook.ipynb) to read through the code. You can also run the code, and see how it performs all data transformations in real time.
+Visita [`notebook.ipynb`](notebook.ipynb) para leer el código. También puedes ejecutarlo y ver cómo realiza todas las transformaciones de los datos en tiempo real.
-> If you do not know how to run code in Jupyter Notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
+> Si no sabes cómo ejecutar el código en un Jupyter Notebook, da un vistazo a [este artículo](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
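Sin entrar al notebook, el núcleo del desafío — contar frecuencias de palabras para construir la nube — puede esbozarse así (boceto con un texto de ejemplo en lugar del artículo real de Wikipedia, y sin la biblioteca gráfica de nubes de palabras):

```python
import re
from collections import Counter

texto = (
    "Data science is an interdisciplinary field that uses scientific methods "
    "to extract knowledge and insights from data. Data science is related "
    "to data mining, machine learning and big data."
)

# Normalizar, separar en palabras y descartar palabras muy comunes ("stop words")
stop = {"is", "an", "to", "and", "from", "that", "the"}
palabras = [p for p in re.findall(r"[a-z]+", texto.lower()) if p not in stop]
frecuencias = Counter(palabras)
print(frecuencias.most_common(3))  # "data" debería dominar la nube
```

En el notebook real, el texto vendría del artículo descargado de Wikipedia, y las frecuencias alimentarían el dibujo de la nube de palabras.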
-## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/1)
+## [Cuestionario posterior a la lección](https://red-water-0103e7a0f.azurestaticapps.net/quiz/1)
-## Assignments
+## Ejercicios
-* **Task 1**: Modify the code above to find out related concepts for the fields of **Big Data** and **Machine Learning**
-* **Task 2**: [Think About Data Science Scenarios](assignment.md)
+* **Tarea 1**: Modifica el código anterior para encontrar conceptos relacionados para los campos de **Big Data** y **Machine Learning**
+* **Tarea 2**: [Piensa en los escenarios para la Ciencia de Datos](assignment.md)
-## Credits
+## Créditos
-This lesson has been authored with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
+Esta lección ha sido escrita con ♥️ por [Dmitry Soshnikov](http://soshnikov.com)
From 1f95d73ca3d7c1a5f55b98893cd6edf7e029a575 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 4 Oct 2021 19:50:53 -0500
Subject: [PATCH 014/319] feat: Module 1 section 2 - Add file to be translated
---
.../02-ethics/translations/README.es.md | 263 ++++++++++++++++++
1 file changed, 263 insertions(+)
diff --git a/1-Introduction/02-ethics/translations/README.es.md b/1-Introduction/02-ethics/translations/README.es.md
index e69de29b..d7442aaa 100644
--- a/1-Introduction/02-ethics/translations/README.es.md
+++ b/1-Introduction/02-ethics/translations/README.es.md
@@ -0,0 +1,263 @@
+# Introduction to Data Ethics
+
+| ](../../sketchnotes/02-Ethics.png)|
+|:---:|
+| Data Science Ethics - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+---
+
+We are all data citizens living in a datafied world.
+
+Market trends tell us that by 2022, 1-in-3 large organizations will buy and sell their data through online [Marketplaces and Exchanges](https://www.gartner.com/smarterwithgartner/gartner-top-10-trends-in-data-and-analytics-for-2020/). As **App Developers**, we'll find it easier and cheaper to integrate data-driven insights and algorithm-driven automation into daily user experiences. But as AI becomes pervasive, we'll also need to understand the potential harms caused by the [weaponization](https://www.youtube.com/watch?v=TQHs8SA1qpk) of such algorithms at scale.
+
+Trends also indicate that we will create and consume over [180 zettabytes](https://www.statista.com/statistics/871513/worldwide-data-created/) of data by 2025. As **Data Scientists**, this gives us unprecedented levels of access to personal data. This means we can build behavioral profiles of users and influence decision-making in ways that create an [illusion of free choice](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) while potentially nudging users towards outcomes we prefer. It also raises broader questions on data privacy and user protections.
+
+Data ethics are now _necessary guardrails_ for data science and engineering, helping us minimize potential harms and unintended consequences from our data-driven actions. The [Gartner Hype Cycle for AI](https://www.gartner.com/smarterwithgartner/2-megatrends-dominate-the-gartner-hype-cycle-for-artificial-intelligence-2020/) identifies relevant trends in digital ethics, responsible AI, and AI governance as key drivers for larger megatrends around _democratization_ and _industrialization_ of AI.
+
+
+
+In this lesson, we'll explore the fascinating area of data ethics - from core concepts and challenges, to case studies and applied AI concepts like governance - that help establish an ethics culture in teams and organizations that work with data and AI.
+
+
+
+
+## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/2) 🎯
+
+## Basic Definitions
+
+Let's start by understanding the basic terminology.
+
+The word "ethics" comes from the [Greek word "ethikos"](https://en.wikipedia.org/wiki/Ethics) (and its root "ethos") meaning _character or moral nature_.
+
+**Ethics** is about the shared values and moral principles that govern our behavior in society. Ethics is based not on laws but on
+widely accepted norms of what is "right vs. wrong". However, ethical considerations can influence corporate governance initiatives and government regulations that create more incentives for compliance.
+
+**Data Ethics** is a [new branch of ethics](https://royalsocietypublishing.org/doi/full/10.1098/rsta.2016.0360#sec-1) that "studies and evaluates moral problems related to _data, algorithms and corresponding practices_". Here, **"data"** focuses on actions related to generation, recording, curation, processing, dissemination, sharing, and usage, **"algorithms"** focuses on AI, agents, machine learning, and robots, and **"practices"** focuses on topics like responsible innovation, programming, hacking, and ethics codes.
+
+**Applied Ethics** is the [practical application of moral considerations](https://en.wikipedia.org/wiki/Applied_ethics). It's the process of actively investigating ethical issues in the context of _real-world actions, products and processes_, and taking corrective measures to make sure that these remain aligned with our defined ethical values.
+
+**Ethics Culture** is about [_operationalizing_ applied ethics](https://hbr.org/2019/05/how-to-design-an-ethical-organization) to make sure that our ethical principles and practices are adopted in a consistent and scalable manner across the entire organization. Successful ethics cultures define organization-wide ethical principles, provide meaningful incentives for compliance, and reinforce ethics norms by encouraging and amplifying desired behaviors at every level of the organization.
+
+
+## Ethics Concepts
+
+In this section, we'll discuss concepts like **shared values** (principles) and **ethical challenges** (problems) for data ethics - and explore **case studies** that help you understand these concepts in real-world contexts.
+
+### 1. Ethics Principles
+
+Every data ethics strategy begins by defining _ethical principles_ - the "shared values" that describe acceptable behaviors, and guide compliant actions, in our data & AI projects. You can define these at an individual or team level. However, most large organizations outline these in an _ethical AI_ mission statement or framework that is defined at corporate levels and enforced consistently across all teams.
+
+**Example:** Microsoft's [Responsible AI](https://www.microsoft.com/en-us/ai/responsible-ai) mission statement reads: _"We are committed to the advancement of AI driven by ethical principles that put people first"_ - identifying 6 ethical principles in the framework below:
+
+
+
+Let's briefly explore these principles. _Transparency_ and _accountability_ are foundational values that other principles build upon - so let's begin there:
+
+* [**Accountability**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) makes practitioners _responsible_ for their data & AI operations, and compliance with these ethical principles.
+* [**Transparency**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) ensures that data and AI actions are _understandable_ (interpretable) to users, explaining the what and why behind decisions.
+* [**Fairness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1%3aprimaryr6) - focuses on ensuring AI treats _all people_ fairly, addressing any systemic or implicit socio-technical biases in data and systems.
+* [**Reliability & Safety**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - ensures that AI behaves _consistently_ with defined values, minimizing potential harms or unintended consequences.
+* [**Privacy & Security**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about understanding data lineage, and providing _data privacy and related protections_ to users.
+* [**Inclusiveness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about designing AI solutions with intention, adapting them to meet a _broad range of human needs_ & capabilities.
+
+> 🚨 Think about what your data ethics mission statement could be. Explore ethical AI frameworks from other organizations - here are examples from [IBM](https://www.ibm.com/cloud/learn/ai-ethics), [Google](https://ai.google/principles), and [Facebook](https://ai.facebook.com/blog/facebooks-five-pillars-of-responsible-ai/). What shared values do they have in common? How do these principles relate to the AI product or industry they operate in?
+
+### 2. Ethics Challenges
+
+Once we have ethical principles defined, the next step is to evaluate our data and AI actions to see if they align with those shared values. Think about your actions in two categories: _data collection_ and _algorithm design_.
+
+With data collection, actions will likely involve **personal data** or personally identifiable information (PII) for identifiable living individuals. This includes [diverse items of non-personal data](https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en) that _collectively_ identify an individual. Ethical challenges can relate to _data privacy_, _data ownership_, and related topics like _informed consent_ and _intellectual property rights_ for users.
+
+With algorithm design, actions will involve collecting & curating **datasets**, then using them to train & deploy **data models** that predict outcomes or automate decisions in real-world contexts. Ethical challenges can arise from _dataset bias_, _data quality_ issues, _unfairness_, and _misrepresentation_ in algorithms - including some issues that are systemic in nature.
+
+In both cases, ethics challenges highlight areas where our actions may conflict with our shared values. To detect, mitigate, minimize, or eliminate these concerns - we need to ask moral "yes/no" questions related to our actions, then take corrective actions as needed. Let's take a look at some ethical challenges and the moral questions they raise:
+
+
+#### 2.1 Data Ownership
+
+Data collection often involves personal data that can identify the data subjects. [Data ownership](https://permission.io/blog/data-ownership) is about _control_ and [_user rights_](https://permission.io/blog/data-ownership) related to the creation, processing, and dissemination of data.
+
+The moral questions we need to ask are:
+ * Who owns the data? (user or organization)
+ * What rights do data subjects have? (ex: access, erasure, portability)
+ * What rights do organizations have? (ex: rectify malicious user reviews)
+
+#### 2.2 Informed Consent
+
+[Informed consent](https://legaldictionary.net/informed-consent/) defines the act of users agreeing to an action (like data collection) with a _full understanding_ of relevant facts including the purpose, potential risks, and alternatives.
+
+Questions to explore here are:
+ * Did the user (data subject) give permission for data capture and usage?
+ * Did the user understand the purpose for which that data was captured?
+ * Did the user understand the potential risks from their participation?
+
+#### 2.3 Intellectual Property
+
+[Intellectual property](https://en.wikipedia.org/wiki/Intellectual_property) refers to intangible creations resulting from human initiative that may _have economic value_ to individuals or businesses.
+
+Questions to explore here are:
+ * Did the collected data have economic value to a user or business?
+ * Does the **user** have intellectual property here?
+ * Does the **organization** have intellectual property here?
+ * If these rights exist, how are we protecting them?
+
+#### 2.4 Data Privacy
+
+[Data privacy](https://www.northeastern.edu/graduate/blog/what-is-data-privacy/) or information privacy refers to the preservation of user privacy and protection of user identity with respect to personally identifiable information.
+
+Questions to explore here are:
+ * Is users' (personal) data secured against hacks and leaks?
+ * Is users' data accessible only to authorized users and contexts?
+ * Is users' anonymity preserved when data is shared or disseminated?
+ * Can a user be de-identified from anonymized datasets?
+
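The last question - whether a user can be de-identified from an "anonymized" dataset - can be probed with a k-anonymity check: the smallest group size over the quasi-identifier columns. A minimal sketch with hypothetical records, where `zip` and `age` act as quasi-identifiers:

```python
from collections import Counter

def k_anonymity(rows, quasi_identifiers):
    """Smallest group size over the quasi-identifier combination.
    A low k means individuals may be re-identifiable even after direct
    identifiers (name, email) have been removed."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values())

dataset = [
    {"zip": "02138", "age": 34, "rating": 5},
    {"zip": "02138", "age": 34, "rating": 3},
    {"zip": "02139", "age": 51, "rating": 4},
]
print(k_anonymity(dataset, ["zip", "age"]))  # 1 -> the 02139/51 row is unique
```

A k of 1, as in the Netflix prize case study below, means at least one record is uniquely re-identifiable by joining on external data.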
+
+#### 2.5 Right To Be Forgotten
+
+The [Right To Be Forgotten](https://en.wikipedia.org/wiki/Right_to_be_forgotten) or [Right to Erasure](https://www.gdpreu.org/right-to-be-forgotten/) provides additional personal data protection to users. Specifically, it gives users the right to request deletion or removal of personal data from Internet searches and other locations, _under specific circumstances_ - allowing them a fresh start online without past actions being held against them.
+
+Questions to explore here are:
+ * Does the system allow data subjects to request erasure?
+ * Should the withdrawal of user consent trigger automated erasure?
+ * Was data collected without consent or by unlawful means?
+ * Are we compliant with government regulations for data privacy?
+
+
+#### 2.6 Dataset Bias
+
+Dataset or [Collection Bias](http://researcharticles.com/index.php/bias-in-data-collection-in-research/) is about selecting a _non-representative_ subset of data for algorithm development, creating potential unfairness in result outcomes for diverse groups. Types of bias include selection or sampling bias, volunteer bias, and instrument bias.
+
+Questions to explore here are:
+ * Did we recruit a representative set of data subjects?
+ * Did we test our collected or curated dataset for various biases?
+ * Can we mitigate or remove any discovered biases?
+
+#### 2.7 Data Quality
+
+[Data Quality](https://lakefs.io/data-quality-testing/) looks at the validity of the curated dataset used to develop our algorithms, checking to see if features and records meet requirements for the level of accuracy and consistency needed for our AI purpose.
+
+Questions to explore here are:
+ * Did we capture valid _features_ for our use case?
+ * Was data captured _consistently_ across diverse data sources?
+ * Is the dataset _complete_ for diverse conditions or scenarios?
+ * Is information captured _accurately_ in reflecting reality?
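Several of these checks can be automated over a curated dataset. A minimal sketch of a completeness and validity report, with hypothetical field names and ranges:

```python
def quality_report(rows, required, valid_ranges):
    """Count records failing completeness (missing required fields)
    and validity (values outside an allowed range) checks."""
    missing = sum(1 for r in rows if any(r.get(f) in (None, "") for f in required))
    out_of_range = sum(
        1 for r in rows
        for f, (lo, hi) in valid_ranges.items()
        if r.get(f) is not None and not lo <= r[f] <= hi
    )
    return {"missing": missing, "out_of_range": out_of_range}

rows = [
    {"age": 29, "score": 0.8},
    {"age": None, "score": 0.5},   # incomplete record
    {"age": 212, "score": 0.9},    # implausible age
]
print(quality_report(rows, required=["age", "score"], valid_ranges={"age": (0, 120)}))
```

Reports like this, run on every dataset refresh, turn the questions above into repeatable checks.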
+
+#### 2.8 Algorithm Fairness
+
+[Algorithm Fairness](https://towardsdatascience.com/what-is-algorithm-fairness-3182e161cf9f) checks to see if the algorithm design systematically discriminates against specific subgroups of data subjects leading to [potential harms](https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml) in _allocation_ (where resources are denied or withheld from that group) and _quality of service_ (where AI is not as accurate for some subgroups as it is for others).
+
+Questions to explore here are:
+ * Did we evaluate model accuracy for diverse subgroups and conditions?
+ * Did we scrutinize the system for potential harms (e.g., stereotyping)?
+ * Can we revise data or retrain models to mitigate identified harms?
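The first question - evaluating model accuracy for diverse subgroups - can be sketched as a simple per-group accuracy tally. The `results` tuples of (group, predicted, actual) below are illustrative:

```python
def accuracy_by_group(examples):
    """examples: iterable of (group, predicted, actual).
    Returns accuracy per subgroup so quality-of-service gaps become visible."""
    correct, total = {}, {}
    for group, pred, actual in examples:
        total[group] = total.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (pred == actual)
    return {g: correct[g] / total[g] for g in total}

results = [
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 1, 0),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 0),
]
print(accuracy_by_group(results))  # a 0.75 vs 0.5 gap worth scrutinizing
```

This is the disaggregated-evaluation step that the Gender Shades study (below) performed for commercial gender classifiers.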
+
+Explore resources like [AI Fairness checklists](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA) to learn more.
+
+#### 2.9 Misrepresentation
+
+[Data Misrepresentation](https://www.sciencedirect.com/topics/computer-science/misrepresentation) is about asking whether we are communicating insights from honestly reported data in a deceptive manner to support a desired narrative.
+
+Questions to explore here are:
+ * Are we reporting incomplete or inaccurate data?
+ * Are we visualizing data in a manner that drives misleading conclusions?
+ * Are we using selective statistical techniques to manipulate outcomes?
+ * Are there alternative explanations that may offer a different conclusion?
+
+#### 2.10 Free Choice
+The [Illusion of Free Choice](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) occurs when system "choice architectures" use decision-making algorithms to nudge people towards taking a preferred outcome while seeming to give them options and control. These [dark patterns](https://www.darkpatterns.org/) can cause social and economic harm to users. Because user decisions impact behavior profiles, these actions potentially drive future choices that can amplify or extend the impact of these harms.
+
+Questions to explore here are:
+ * Did the user understand the implications of making that choice?
+ * Was the user aware of (alternative) choices and the pros & cons of each?
+ * Can the user reverse an automated or influenced choice later?
+
+### 3. Case Studies
+
+To put these ethical challenges in real-world contexts, it helps to look at case studies that highlight the potential harms and consequences to individuals and society, when such ethics violations are overlooked.
+
+Here are a few examples:
+
+| Ethics Challenge | Case Study |
+|--- |--- |
+| **Informed Consent** | 1972 - [Tuskegee Syphilis Study](https://en.wikipedia.org/wiki/Tuskegee_Syphilis_Study) - African American men who participated in the study were promised free medical care _but deceived_ by researchers who failed to inform subjects of their diagnosis or about availability of treatment. Many subjects died & partners or children were affected; the study lasted 40 years. |
+| **Data Privacy** | 2007 - The [Netflix data prize](https://www.wired.com/2007/12/why-anonymous-data-sometimes-isnt/) provided researchers with _10M anonymized movie rankings from 50K customers_ to help improve recommendation algorithms. However, researchers were able to correlate anonymized data with personally-identifiable data in _external datasets_ (e.g., IMDb comments) - effectively "de-anonymizing" some Netflix subscribers.|
+| **Collection Bias** | 2013 - The City of Boston [developed Street Bump](https://www.boston.gov/transportation/street-bump), an app that let citizens report potholes, giving the city better roadway data to find and fix issues. However, [people in lower income groups had less access to cars and phones](https://hbr.org/2013/04/the-hidden-biases-in-big-data), making their roadway issues invisible in this app. Developers worked with academics to address _equitable access and digital divide_ issues for fairness. |
+| **Algorithmic Fairness** | 2018 - The MIT [Gender Shades Study](http://gendershades.org/overview.html) evaluated the accuracy of gender classification AI products, exposing gaps in accuracy for women and persons of color. The [2019 Apple Card](https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/) algorithm seemed to offer less credit to women than men. Both illustrated issues in algorithmic bias leading to socio-economic harms.|
+| **Data Misrepresentation** | 2020 - The [Georgia Department of Public Health released COVID-19 charts](https://www.vox.com/covid-19-coronavirus-us-response-trump/2020/5/18/21262265/georgia-covid-19-cases-declining-reopening) that appeared to mislead citizens about trends in confirmed cases with non-chronological ordering on the x-axis. This illustrates misrepresentation through visualization tricks. |
+| **Illusion of free choice** | 2020 - Learning app [ABCmouse paid $10M to settle an FTC complaint](https://www.washingtonpost.com/business/2020/09/04/abcmouse-10-million-ftc-settlement/) where parents were trapped into paying for subscriptions they couldn't cancel. This illustrates dark patterns in choice architectures, where users were nudged towards potentially harmful choices. |
+| **Data Privacy & User Rights** | 2021 - Facebook [Data Breach](https://www.npr.org/2021/04/09/986005820/after-data-breach-exposes-530-million-facebook-says-it-will-not-notify-users) exposed data from 530M users, resulting in a $5B settlement to the FTC. It, however, refused to notify users of the breach, violating user rights around data transparency and access. |
+
+Want to explore more case studies? Check out these resources:
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - ethics dilemmas across diverse industries.
+* [Data Science Ethics course](https://www.coursera.org/learn/data-science-ethics#syllabus) - landmark case studies explored.
+* [Where things have gone wrong](https://deon.drivendata.org/examples/) - deon checklist with examples
+
+> 🚨 Think about the case studies you've seen - have you experienced, or been affected by, a similar ethical challenge in your life? Can you think of at least one other case study that illustrates one of the ethical challenges we've discussed in this section?
+
+## Applied Ethics
+
+We've talked about ethics concepts, challenges, and case studies in real-world contexts. But how do we get started _applying_ ethical principles and practices in our projects? And how do we _operationalize_ these practices for better governance? Let's explore some real-world solutions:
+
+### 1. Professional Codes
+
+Professional Codes offer one option for organizations to "incentivize" members to support their ethical principles and mission statement. Codes are _moral guidelines_ for professional behavior, helping employees or members make decisions that align with their organization's principles. They are only as good as the voluntary compliance from members; however, many organizations offer additional rewards and penalties to motivate compliance from members.
+
+Examples include:
+
+ * [Oxford Munich](http://www.code-of-ethics.org/code-of-conduct/) Code of Ethics
+ * [Data Science Association](http://datascienceassn.org/code-of-conduct.html) Code of Conduct (created 2013)
+ * [ACM Code of Ethics and Professional Conduct](https://www.acm.org/code-of-ethics) (since 1993)
+
+> 🚨 Do you belong to a professional engineering or data science organization? Explore their site to see if they define a professional code of ethics. What does this say about their ethical principles? How are they "incentivizing" members to follow the code?
+
+### 2. Ethics Checklists
+
+While professional codes define required _ethical behavior_ from practitioners, they [have known limitations](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md) in enforcement, particularly in large-scale projects. Instead, many data science experts [advocate for checklists](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md) that can **connect principles to practices** in more deterministic and actionable ways.
+
+Checklists convert questions into "yes/no" tasks that can be operationalized, allowing them to be tracked as part of standard product release workflows.
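As an illustration (not the actual API of any tool listed below), a checklist can be modeled as explicit yes/no items whose sign-off gates a release:

```python
# A hypothetical ethics checklist as trackable yes/no tasks; None = unanswered.
checklist = {
    "Did we obtain informed consent for data collection?": None,
    "Did we test the dataset for sampling bias?": None,
    "Did we evaluate model accuracy across subgroups?": None,
}

def sign_off(checklist, question, answer):
    """Record an explicit True/False answer as part of the release workflow."""
    checklist[question] = answer

def release_ready(checklist):
    """Release is blocked until every item is explicitly answered 'yes'."""
    return all(answer is True for answer in checklist.values())

sign_off(checklist, "Did we obtain informed consent for data collection?", True)
print(release_ready(checklist))  # False: two items are still unanswered
```

Tools like Deon generate a fuller version of such a checklist from industry-standard items so it can live alongside the project's code.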
+
+Examples include:
+ * [Deon](https://deon.drivendata.org/) - a general-purpose data ethics checklist created from [industry recommendations](https://deon.drivendata.org/#checklist-citations) with a command-line tool for easy integration.
+ * [Privacy Audit Checklist](https://cyber.harvard.edu/ecommerce/privacyaudit.html) - provides general guidance for information handling practices from legal and social exposure perspectives.
+ * [AI Fairness Checklist](https://www.microsoft.com/en-us/research/project/ai-fairness-checklist/) - created by AI practitioners to support the adoption and integration of fairness checks into AI development cycles.
+ * [22 questions for ethics in data and AI](https://medium.com/the-organization/22-questions-for-ethics-in-data-and-ai-efb68fd19429) - a more open-ended framework, structured for initial exploration of ethical issues in design, implementation, and organizational contexts.
+
+### 3. Ethics Regulations
+
+Ethics is about defining shared values and doing the right thing _voluntarily_. **Compliance** is about _following the law_ if and where defined. **Governance** broadly covers all the ways in which organizations operate to enforce ethical principles and comply with established laws.
+
+Today, governance takes two forms within organizations. First, it's about defining **ethical AI** principles and establishing practices to operationalize adoption across all AI-related projects in the organization. Second, it's about complying with all government-mandated **data protection regulations** for regions it operates in.
+
+Examples of data protection and privacy regulations:
+
+ * `1974`, [US Privacy Act](https://www.justice.gov/opcl/privacy-act-1974) - regulates _federal govt._ collection, use, and disclosure of personal information.
+ * `1996`, [US Health Insurance Portability & Accountability Act (HIPAA)](https://www.cdc.gov/phlp/publications/topic/hipaa.html) - protects personal health data.
+ * `1998`, [US Children's Online Privacy Protection Act (COPPA)](https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule) - protects data privacy of children under 13.
+ * `2018`, [General Data Protection Regulation (GDPR)](https://gdpr-info.eu/) - provides user rights, data protection, and privacy.
+ * `2018`, [California Consumer Privacy Act (CCPA)](https://www.oag.ca.gov/privacy/ccpa) - gives consumers more _rights_ over their (personal) data.
+ * `2021`, China's [Personal Information Protection Law](https://www.reuters.com/world/china/china-passes-new-personal-data-privacy-law-take-effect-nov-1-2021-08-20/) just passed, creating one of the strongest online data privacy regulations worldwide.
+
+> 🚨 The European Union's GDPR (General Data Protection Regulation) remains one of the most influential data privacy regulations today. Did you know it also defines [8 user rights](https://www.freeprivacypolicy.com/blog/8-user-rights-gdpr) to protect citizens' digital privacy and personal data? Learn about what these are, and why they matter.
+
+
+### 4. Ethics Culture
+
+Note that there remains an intangible gap between _compliance_ (doing enough to meet "the letter of the law") and addressing [systemic issues](https://www.coursera.org/learn/data-science-ethics/home/week/4) (like ossification, information asymmetry, and distributional unfairness) that can speed up the weaponization of AI.
+
+The latter requires [collaborative approaches to defining ethics cultures](https://towardsdatascience.com/why-ai-ethics-requires-a-culture-driven-approach-26f451afa29f) that build emotional connections and consistent shared values _across organizations_ in the industry. This calls for more [formalized data ethics cultures](https://www.codeforamerica.org/news/formalizing-an-ethical-data-culture/) in organizations - allowing _anyone_ to [pull the Andon cord](https://en.wikipedia.org/wiki/Andon_(manufacturing)) (to raise ethics concerns early in the process) and making _ethical assessments_ (e.g., in hiring) a core criterion for team formation in AI projects.
+
+---
+## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/3) 🎯
+## Review & Self Study
+
+Courses and books help with understanding core ethics concepts and challenges, while case studies and tools help with applied ethics practices in real-world contexts. Here are a few resources to start with.
+
+* [Machine Learning For Beginners](https://github.com/microsoft/ML-For-Beginners/blob/main/1-Introduction/3-fairness/README.md) - lesson on Fairness, from Microsoft.
+* [Principles of Responsible AI](https://docs.microsoft.com/en-us/learn/modules/responsible-ai-principles/) - free learning path from Microsoft Learn.
+* [Ethics and Data Science](https://resources.oreilly.com/examples/0636920203964) - O'Reilly EBook (M. Loukides, H. Mason et al.)
+* [Data Science Ethics](https://www.coursera.org/learn/data-science-ethics#syllabus) - online course from the University of Michigan.
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - case studies from the University of Texas.
+
+# Assignment
+
+[Write A Data Ethics Case Study](assignment.md)
From 3856c3bb631da2c48948c1bc74f43cc28d7ab16b Mon Sep 17 00:00:00 2001
From: Heril Changwal
Date: Tue, 5 Oct 2021 10:36:11 +0530
Subject: [PATCH 015/319] Added Translation README.hi
---
4-Data-Science-Lifecycle/translations/README.hi.md | 13 +++++++++++++
1 file changed, 13 insertions(+)
create mode 100644 4-Data-Science-Lifecycle/translations/README.hi.md
diff --git a/4-Data-Science-Lifecycle/translations/README.hi.md b/4-Data-Science-Lifecycle/translations/README.hi.md
new file mode 100644
index 00000000..27463b08
--- /dev/null
+++ b/4-Data-Science-Lifecycle/translations/README.hi.md
@@ -0,0 +1,13 @@
+# डेटा विज्ञान जीवनचक्र
+
+>तस्वीर Headway द्वारा Unsplashपर
+
+इन पाठों में, आप डेटा विज्ञान जीवनचक्र के कुछ पहलुओं का पता लगाएंगे, जिसमें डेटा के आसपास विश्लेषण और संचार शामिल है।
+
+### विषय
+1. [परिचय] (../14-Introduction/README.md)
+2. [विश्लेषण] (../15-Analyzing/README.md)
+3. [संचार] (../16-Communication/README.md)
+
+### क्रेडिट
+ये पाठ ❤ के साथ [जालेन मैक्गी](https://twitter.com/JalenMCG) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा लिखे गए थे।
\ No newline at end of file
From 97356ea5c16fe89920b15b9197880a14e11bf07e Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Tue, 5 Oct 2021 10:41:59 +0530
Subject: [PATCH 016/319] Update README.hi.md
---
4-Data-Science-Lifecycle/translations/README.hi.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/4-Data-Science-Lifecycle/translations/README.hi.md b/4-Data-Science-Lifecycle/translations/README.hi.md
index 27463b08..71811e7b 100644
--- a/4-Data-Science-Lifecycle/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/translations/README.hi.md
@@ -1,5 +1,5 @@
# डेटा विज्ञान जीवनचक्र
-
+
>तस्वीर Headway द्वारा Unsplashपर
इन पाठों में, आप डेटा विज्ञान जीवनचक्र के कुछ पहलुओं का पता लगाएंगे, जिसमें डेटा के आसपास विश्लेषण और संचार शामिल है।
@@ -7,7 +7,7 @@
### विषय
1. [परिचय] (../14-Introduction/README.md)
2. [विश्लेषण] (../15-Analyzing/README.md)
-3. [संचार] (../16-Communication/README.md)
+3. [संचार] (../16-communication/README.md)
### क्रेडिट
-ये पाठ ❤ के साथ [जालेन मैक्गी](https://twitter.com/JalenMCG) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा लिखे गए थे।
\ No newline at end of file
+ये पाठ ❤ के साथ [जालेन मैक्गी](https://twitter.com/JalenMCG) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा लिखे गए थे।
From 829b38f6f0ebb713d43691006ac4a388640d8bd3 Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Tue, 5 Oct 2021 10:44:32 +0530
Subject: [PATCH 017/319] Update README.hi.md
---
4-Data-Science-Lifecycle/translations/README.hi.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/4-Data-Science-Lifecycle/translations/README.hi.md b/4-Data-Science-Lifecycle/translations/README.hi.md
index 71811e7b..11f31faf 100644
--- a/4-Data-Science-Lifecycle/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/translations/README.hi.md
@@ -1,12 +1,12 @@
# डेटा विज्ञान जीवनचक्र

->तस्वीर Headway द्वारा Unsplashपर
+>तस्वीर Headway द्वारा Unsplash पर
इन पाठों में, आप डेटा विज्ञान जीवनचक्र के कुछ पहलुओं का पता लगाएंगे, जिसमें डेटा के आसपास विश्लेषण और संचार शामिल है।
### विषय
1. [परिचय] (../14-Introduction/README.md)
-2. [विश्लेषण] (../15-Analyzing/README.md)
+2. [विश्लेषण] (../15-analyzing/README.md)
3. [संचार] (../16-communication/README.md)
### क्रेडिट
From 4d7f576b72a588e2dc4191ca0c9eced39a520243 Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Tue, 5 Oct 2021 10:45:20 +0530
Subject: [PATCH 018/319] Update README.hi.md
---
4-Data-Science-Lifecycle/translations/README.hi.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/4-Data-Science-Lifecycle/translations/README.hi.md b/4-Data-Science-Lifecycle/translations/README.hi.md
index 11f31faf..90097585 100644
--- a/4-Data-Science-Lifecycle/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/translations/README.hi.md
@@ -5,7 +5,7 @@
इन पाठों में, आप डेटा विज्ञान जीवनचक्र के कुछ पहलुओं का पता लगाएंगे, जिसमें डेटा के आसपास विश्लेषण और संचार शामिल है।
### विषय
-1. [परिचय] (../14-Introduction/README.md)
+1. [परिचय] (https://github.com/microsoft/Data-Science-For-Beginners/blob/main/4-Data-Science-Lifecycle/14-Introduction/README.md)
2. [विश्लेषण] (../15-analyzing/README.md)
3. [संचार] (../16-communication/README.md)
From 2aa308da3363b8266b725870bc005a072561346f Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Tue, 5 Oct 2021 10:46:44 +0530
Subject: [PATCH 019/319] Update README.hi.md
---
4-Data-Science-Lifecycle/translations/README.hi.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/4-Data-Science-Lifecycle/translations/README.hi.md b/4-Data-Science-Lifecycle/translations/README.hi.md
index 90097585..512fea7b 100644
--- a/4-Data-Science-Lifecycle/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/translations/README.hi.md
@@ -5,9 +5,9 @@
इन पाठों में, आप डेटा विज्ञान जीवनचक्र के कुछ पहलुओं का पता लगाएंगे, जिसमें डेटा के आसपास विश्लेषण और संचार शामिल है।
### विषय
-1. [परिचय] (https://github.com/microsoft/Data-Science-For-Beginners/blob/main/4-Data-Science-Lifecycle/14-Introduction/README.md)
-2. [विश्लेषण] (../15-analyzing/README.md)
-3. [संचार] (../16-communication/README.md)
+1. [परिचय](../14-Introduction/README.md)
+2. [विश्लेषण](../15-analyzing/README.md)
+3. [संचार](../16-communication/README.md)
### क्रेडिट
ये पाठ ❤ के साथ [जालेन मैक्गी](https://twitter.com/JalenMCG) और [जैस्मीन ग्रीनवे](https://twitter.com/paladique) द्वारा लिखे गए थे।
From 52452e59ea521a9c208d143d7ce047c7e33ad362 Mon Sep 17 00:00:00 2001
From: INDRASHIS PAUL
Date: Tue, 5 Oct 2021 11:07:14 +0530
Subject: [PATCH 020/319] Add the contents from the notebook to README
1. Copied almost all content (both code and explanations) from the notebook to the README in a proper format.
2. Left out extra portions and exercises in the notebook for the readers to try out.
---
.../08-data-preparation/README.md | 187 +++++++++++++++++-
1 file changed, 184 insertions(+), 3 deletions(-)
diff --git a/2-Working-With-Data/08-data-preparation/README.md b/2-Working-With-Data/08-data-preparation/README.md
index 5bb6c3d8..58c2e528 100644
--- a/2-Working-With-Data/08-data-preparation/README.md
+++ b/2-Working-With-Data/08-data-preparation/README.md
@@ -33,7 +33,8 @@ Depending on its source, raw data may contain some inconsistencies that will cau
Once you have loaded your data into pandas, it will more likely than not be in a DataFrame(refer to the previous [lesson](https://github.com/IndraP24/Data-Science-For-Beginners/tree/main/2-Working-With-Data/07-python#dataframe) for detailed overview). However, if the data set in your DataFrame has 60,000 rows and 400 columns, how do you even begin to get a sense of what you're working with? Fortunately, [pandas](https://pandas.pydata.org/) provides some convenient tools to quickly look at overall information about a DataFrame in addition to the first few and last few rows.
-In order to explore this functionality, we will import the Python scikit-learn library and use an iconic dataset: the **Iris data set **.
+In order to explore this functionality, we will import the Python scikit-learn library and use an iconic dataset: the **Iris data set**.
+
```python
import pandas as pd
from sklearn.datasets import load_iris
@@ -122,19 +123,199 @@ Look closely at the output. Does any of it surprise you? While `0` is an arithme
Now, let's turn this around and use these methods in a manner more like you will use them in practice. You can use Boolean masks directly as a ``Series`` or ``DataFrame`` index, which can be useful when trying to work with isolated missing (or present) values.
-> **Tkeaway**: Both the `isnull()` and `notnull()` methods produce similar results when you use them in `DataFrame`s: they show the results and the index of those results, which will help you enormously as you wrestle with your data.
+> **Takeaway**: Both the `isnull()` and `notnull()` methods produce similar results when you use them in `DataFrame`s: they show the results and the index of those results, which will help you enormously as you wrestle with your data.
- **Dropping null values**: Beyond identifying missing values, pandas provides a convenient means to remove null values from `Series` and `DataFrame`s. (Particularly on large data sets, it is often more advisable to simply remove missing [NA] values from your analysis than deal with them in other ways.) To see this in action, let's return to `example1`:
```python
example1 = example1.dropna()
example1
```
+```
+0 0
+2
+dtype: object
+```
+Note that this should look like your output from `example3[example3.notnull()]`. The difference here is that, rather than just indexing on the masked values, `dropna` has removed those missing values from the `Series` `example1`.
+
+Because `DataFrame`s have two dimensions, they afford more options for dropping data.
+
+```python
+example2 = pd.DataFrame([[1, np.nan, 7],
+ [2, 5, 8],
+ [np.nan, 6, 9]])
+example2
+```
+| | 0 | 1 | 2 |
+|------|---|---|---|
+|0 |1.0|NaN|7 |
+|1 |2.0|5.0|8 |
+|2 |NaN|6.0|9 |
+
+(Did you notice that pandas upcast two of the columns to floats to accommodate the `NaN`s?)
+
+You cannot drop a single value from a `DataFrame`, so you have to drop full rows or columns. Depending on what you are doing, you might want to do one or the other, and so pandas gives you options for both. Because in data science, columns generally represent variables and rows represent observations, you are more likely to drop rows of data; the default setting for `dropna()` is to drop all rows that contain any null values:
+
+```python
+example2.dropna()
+```
+```
+ 0 1 2
+1 2.0 5.0 8
+```
+If necessary, you can drop NA values from columns. Use `axis=1` to do so:
+```python
+example2.dropna(axis='columns')
+```
+```
+ 2
+0 7
+1 8
+2 9
+```
+Notice that this can drop a lot of data that you might want to keep, particularly in smaller datasets. What if you just want to drop rows or columns that contain several or even just all null values? You can specify those settings in `dropna` with the `how` and `thresh` parameters.
+
+By default, `how='any'` (if you would like to check for yourself or see what other parameters the method has, run `example2.dropna?` in a code cell). You could alternatively specify `how='all'` so as to drop only rows or columns that contain all null values. Let's expand our example `DataFrame` to see this in action.
+
+```python
+example2[3] = np.nan
+example2
+```
+| |0 |1 |2 |3 |
+|------|---|---|---|---|
+|0 |1.0|NaN|7 |NaN|
+|1 |2.0|5.0|8 |NaN|
+|2 |NaN|6.0|9 |NaN|
+
+The `thresh` parameter gives you finer-grained control: you set the number of *non-null* values that a row or column needs to have in order to be kept:
+```python
+example2.dropna(axis='rows', thresh=3)
+```
+```
+ 0 1 2 3
+1 2.0 5.0 8 NaN
+```
+Here, the first and last row have been dropped, because they contain only two non-null values.
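The `how='all'` option described above is never demonstrated in the lesson itself; a minimal sketch (recreating `example2` with its all-null column 3) might look like this:

```python
import numpy as np
import pandas as pd

# Recreate the example2 DataFrame from above, including the all-null column 3
example2 = pd.DataFrame([[1, np.nan, 7],
                         [2, 5, 8],
                         [np.nan, 6, 9]])
example2[3] = np.nan

# how='all' drops only rows or columns whose values are ALL null,
# so only column 3 is removed; rows with some nulls survive
result = example2.dropna(axis='columns', how='all')
print(result)
```

Only column 3 is removed, since every other column still holds at least one non-null value.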
+
+- **Filling null values**: Depending on your dataset, it can sometimes make more sense to fill null values with valid ones rather than drop them. You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice.
+```python
+example3 = pd.Series([1, np.nan, 2, None, 3], index=list('abcde'))
+example3
+```
+```
+a 1.0
+b NaN
+c 2.0
+d NaN
+e 3.0
+dtype: float64
+```
+You can fill all of the null entries with a single value, such as `0`:
+```python
+example3.fillna(0)
+```
+```
+a 1.0
+b 0.0
+c 2.0
+d 0.0
+e 3.0
+dtype: float64
+```
+You can **forward-fill** null values, which is to use the last valid value to fill a null:
+```python
+example3.fillna(method='ffill')
+```
+```
+a 1.0
+b 1.0
+c 2.0
+d 2.0
+e 3.0
+dtype: float64
+```
+You can also **back-fill** to propagate the next valid value backward to fill a null:
+```python
+example3.fillna(method='bfill')
+```
+```
+a 1.0
+b 2.0
+c 2.0
+d 3.0
+e 3.0
+dtype: float64
+```
+As you might guess, this works the same with `DataFrame`s, but you can also specify an `axis` along which to fill null values. Taking the previously used `example2` again:
+```python
+example2.fillna(method='ffill', axis=1)
+```
+```
+ 0 1 2 3
+0 1.0 1.0 7.0 7.0
+1 2.0 5.0 8.0 8.0
+2 NaN 6.0 9.0 9.0
+```
+Notice that when a previous value is not available for forward-filling, the null value remains.
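Beyond forward- and back-filling, another common strategy (not shown in the lesson above) is to fill each column's nulls with that column's mean; a small sketch using the same `example2` values:

```python
import numpy as np
import pandas as pd

example2 = pd.DataFrame([[1, np.nan, 7],
                         [2, 5, 8],
                         [np.nan, 6, 9]])

# Passing a Series to fillna replaces each column's NaN with the
# corresponding value: here, that column's own mean
filled = example2.fillna(example2.mean())
print(filled)
```

Because `example2.mean()` is a `Series` indexed by column, each `NaN` is replaced column by column rather than with a single global value.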
+
+> **Takeaway:** There are multiple ways to deal with missing values in your datasets. The specific strategy you use (removing them, replacing them, or even how you replace them) should be dictated by the particulars of that data. You will develop a better sense of how to deal with missing values the more you handle and interact with datasets.
+
+## Removing duplicate data
+
+> **Learning goal:** By the end of this subsection, you should be comfortable identifying and removing duplicate values from DataFrames.
+
+In addition to missing data, you will often encounter duplicated data in real-world datasets. Fortunately, `pandas` provides an easy means of detecting and removing duplicate entries.
+
+- **Identifying duplicates: `duplicated`**: You can easily spot duplicate values using the `duplicated` method in pandas, which returns a Boolean mask indicating whether an entry in a `DataFrame` is a duplicate of an earlier one. Let's create another example `DataFrame` to see this in action.
+```python
+example4 = pd.DataFrame({'letters': ['A','B'] * 2 + ['B'],
+ 'numbers': [1, 2, 1, 3, 3]})
+example4
+```
+| |letters|numbers|
+|------|-------|-------|
+|0 |A |1 |
+|1 |B |2 |
+|2 |A |1 |
+|3 |B |3 |
+|4 |B |3 |
+
+```python
+example4.duplicated()
+```
+```
+0 False
+1 False
+2 True
+3 False
+4 True
+dtype: bool
+```
+- **Dropping duplicates: `drop_duplicates`**: `drop_duplicates` simply returns a copy of the data for which all of the `duplicated` values are `False`:
+```python
+example4.drop_duplicates()
+```
+```
+ letters numbers
+0 A 1
+1 B 2
+3 B 3
+```
+Both `duplicated` and `drop_duplicates` default to considering all columns, but you can specify that they examine only a subset of columns in your `DataFrame`:
+```python
+example4.drop_duplicates(['letters'])
+```
+```
+  letters numbers
+0 A 1
+1 B 2
+```
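Both methods also accept a `keep` parameter (a standard pandas option, shown here as a supplementary sketch) that controls which of the duplicated rows survives:

```python
import pandas as pd

example4 = pd.DataFrame({'letters': ['A', 'B'] * 2 + ['B'],
                         'numbers': [1, 2, 1, 3, 3]})

# keep='last' retains the LAST occurrence of each duplicate instead
# of the first, which is the default
deduped = example4.drop_duplicates(['letters'], keep='last')
print(deduped)
```

With `keep='last'`, rows 2 and 4 survive instead of rows 0 and 1.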
+> **Takeaway:** Removing duplicate data is an essential part of almost every data-science project. Duplicate data can change the results of your analyses and give you inaccurate results!
## 🚀 Challenge
-Give the exercises in the [notebook](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/4-Data-Science-Lifecycle/15-analyzing/notebook.ipynb) a try!
+All of the material discussed above is provided as a [Jupyter Notebook](https://github.com/microsoft/Data-Science-For-Beginners/blob/main/2-Working-With-Data/08-data-preparation/notebook.ipynb). Additionally, there are exercises after each section; give them a try!
## [Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/15)
From 083760cdf7416cafbd7dfb56f74c0aeadfb3e6d0 Mon Sep 17 00:00:00 2001
From: IndraP24
Date: Tue, 5 Oct 2021 11:13:08 +0530
Subject: [PATCH 021/319] Update the notebook with outputs to be used as
reference for the README
---
.../08-data-preparation/notebook.ipynb | 1220 +++++++++++++++--
1 file changed, 1114 insertions(+), 106 deletions(-)
diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index e45a5cb5..93c37c35 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -3,28 +3,28 @@
{
"cell_type": "markdown",
"source": [
- "# Data Preparation\r\n",
- "\r\n",
- "[Original Notebook source from *Data Science: Introduction to Machine Learning for Data Science Python and Machine Learning Studio by Lee Stott*](https://github.com/leestott/intro-Datascience/blob/master/Course%20Materials/4-Cleaning_and_Manipulating-Reference.ipynb)\r\n",
- "\r\n",
- "## Exploring `DataFrame` information\r\n",
- "\r\n",
- "> **Learning goal:** By the end of this subsection, you should be comfortable finding general information about the data stored in pandas DataFrames.\r\n",
- "\r\n",
- "Once you have loaded your data into pandas, it will more likely than not be in a `DataFrame`. However, if the data set in your `DataFrame` has 60,000 rows and 400 columns, how do you even begin to get a sense of what you're working with? Fortunately, pandas provides some convenient tools to quickly look at overall information about a `DataFrame` in addition to the first few and last few rows.\r\n",
- "\r\n",
- "In order to explore this functionality, we will import the Python scikit-learn library and use an iconic dataset that every data scientist has seen hundreds of times: British biologist Ronald Fisher's *Iris* data set used in his 1936 paper \"The use of multiple measurements in taxonomic problems\":"
+ "# Data Preparation\n",
+ "\n",
+ "[Original Notebook source from *Data Science: Introduction to Machine Learning for Data Science Python and Machine Learning Studio by Lee Stott*](https://github.com/leestott/intro-Datascience/blob/master/Course%20Materials/4-Cleaning_and_Manipulating-Reference.ipynb)\n",
+ "\n",
+ "## Exploring `DataFrame` information\n",
+ "\n",
+ "> **Learning goal:** By the end of this subsection, you should be comfortable finding general information about the data stored in pandas DataFrames.\n",
+ "\n",
+ "Once you have loaded your data into pandas, it will more likely than not be in a `DataFrame`. \n",
+ "\n",
+ "In order to explore our `DataFramme`, we will import the Python `scikit-learn` library and use an iconic dataset that every data scientist has seen hundreds of times: British biologist Ronald Fisher's **Iris data set** used in his 1936 paper \"*The use of multiple measurements in taxonomic problems*\":"
],
"metadata": {}
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 1,
"source": [
- "import pandas as pd\r\n",
- "from sklearn.datasets import load_iris\r\n",
- "\r\n",
- "iris = load_iris()\r\n",
+ "import pandas as pd\n",
+ "from sklearn.datasets import load_iris\n",
+ "\n",
+ "iris = load_iris()\n",
"iris_df = pd.DataFrame(data=iris['data'], columns=iris['feature_names'])"
],
"outputs": [],
@@ -43,11 +43,29 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 2,
"source": [
"iris_df.info()"
],
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\n",
+ "RangeIndex: 150 entries, 0 to 149\n",
+ "Data columns (total 4 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 sepal length (cm) 150 non-null float64\n",
+ " 1 sepal width (cm) 150 non-null float64\n",
+ " 2 petal length (cm) 150 non-null float64\n",
+ " 3 petal width (cm) 150 non-null float64\n",
+ "dtypes: float64(4)\n",
+ "memory usage: 4.8 KB\n"
+ ]
+ }
+ ],
"metadata": {
"trusted": false
}
@@ -69,11 +87,92 @@
},
{
"cell_type": "code",
- "execution_count": null,
+ "execution_count": 3,
"source": [
"iris_df.head()"
],
- "outputs": [],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 2\n",
+ "0 7\n",
+ "1 8\n",
+ "2 9"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ]
},
{
"cell_type": "markdown",
@@ -1197,21 +1394,104 @@
"source": [
"Notice that this can drop a lot of data that you might want to keep, particularly in smaller datasets. What if you just want to drop rows or columns that contain several or even just all null values? You can specify those settings in `dropna` with the `how` and `thresh` parameters.\n",
"\n",
- "By default, `how='any'` (if you would like to check for yourself or see what other parameters the method has, run `example4.dropna?` in a code cell). You could alternatively specify `how='all'` so as to drop only rows or columns that contain all null values. Let's expand our example `DataFrame` to see this in action."
+ "By default, `how='any'` (if you would like to check for yourself or see what other parameters the method has, run `example4.dropna?` in a code cell). You could alternatively specify `how='all'` so as to drop only rows or columns that contain all null values. Let's expand our example `DataFrame` to see this in action in the next exercise."
]
},
{
"cell_type": "code",
"metadata": {
"trusted": false,
- "id": "Bcf_JWTsgRsF"
+ "id": "Bcf_JWTsgRsF",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 142
+ },
+ "outputId": "07e8f4eb-18c8-4e5d-9317-6a9a3db38b73"
},
"source": [
"example4[3] = np.nan\n",
"example4"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 25,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 0 1 2 3\n",
+ "0 1.0 NaN 7 NaN\n",
+ "1 2.0 5.0 8 NaN\n",
+ "2 NaN 6.0 9 NaN"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 25
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pNZer7q9JPNC"
+ },
+ "source": [
+ "> Key takeaways: \n",
+ "1. Dropping null values is a good idea only if the dataset is large enough.\n",
+ "2. Full rows or columns can be dropped if they have most of their data missing.\n",
+ "3. The `DataFrame.dropna(axis=)` method helps in dropping null values. The `axis` argument signifies whether rows or columns are to be dropped. \n",
+ "4. The `how` argument can also be used. By default it is set to `any`. So, it drops only those rows/columns which contain any null values. It can be set to `all` to specify that we will drop only those rows/columns where all values are null."
+ ]
},
{
"cell_type": "markdown",
@@ -1233,7 +1513,7 @@
"# How might you go about dropping just column 3?\n",
"# Hint: remember that you will need to supply both the axis parameter and the how parameter.\n"
],
- "execution_count": null,
+ "execution_count": 26,
"outputs": []
},
{
@@ -1249,13 +1529,67 @@
"cell_type": "code",
"metadata": {
"trusted": false,
- "id": "M9dCNMaagRsG"
+ "id": "M9dCNMaagRsG",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 80
+ },
+ "outputId": "b2c00415-95a6-4a5c-e3f9-781ff5cc8625"
},
"source": [
"example4.dropna(axis='rows', thresh=3)"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 27,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 0 1 2 3\n",
+ "1 2.0 5.0 8 NaN"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 27
+ }
+ ]
},
{
"cell_type": "markdown",
@@ -1274,7 +1608,10 @@
"source": [
"### Filling null values\n",
"\n",
- "Depending on your dataset, it can sometimes make more sense to fill null values with valid ones rather than drop them. You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice."
+ "It sometimes makes sense to fill in missing values with ones which could be valid. There are a few techniques to fill null values. The first is using domain knowledge (knowledge of the subject on which the dataset is based) to somehow approximate the missing values. \n",
+ "\n",
+ "\n",
+ "You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice."
]
},
{
From 87ef4f0875c32b8eaaabcf5a904e3cedb10825c1 Mon Sep 17 00:00:00 2001
From: Nirmalya Misra <39618712+nirmalya8@users.noreply.github.com>
Date: Tue, 5 Oct 2021 12:28:48 +0530
Subject: [PATCH 024/319] fillna for Categorical columns added
---
.../08-data-preparation/notebook.ipynb | 294 ++++++++++++++++++
1 file changed, 294 insertions(+)
diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index ac9bab82..3e8ae01e 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -1614,6 +1614,300 @@
"You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice."
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CE8S7louLezV"
+ },
+ "source": [
+ "First, let us consider non-numeric data. In datasets, we have columns with categorical data, e.g. gender, or True or False.\n",
+ "\n",
+ "In most of these cases, we replace missing values with the `mode` of the column. Say we have 100 data points: 90 have said True, 8 have said False, and 2 have not answered. Then, we can fill in the 2 with True, considering the full column. \n",
+ "\n",
+ "Again, we can use domain knowledge here. Let us consider an example of filling with the mode."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "MY5faq4yLdpQ",
+ "outputId": "c3838b07-0d15-471e-8dad-370de91d4bdc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
+ },
+ "source": [
+ "fill_with_mode = pd.DataFrame([[1,2,\"True\"],\n",
+ " [3,4,None],\n",
+ " [5,6,\"False\"],\n",
+ " [7,8,\"True\"],\n",
+ " [9,10,\"True\"]])\n",
+ "\n",
+ "fill_with_mode"
+ ],
+ "execution_count": 28,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 0 1 2\n",
+ "0 1 2 True\n",
+ "1 3 4 True\n",
+ "2 5 6 False\n",
+ "3 7 8 True\n",
+ "4 9 10 True"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 31
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SktitLxxOR16"
+ },
+ "source": [
+ "As we can see, the null value has been replaced. Needless to say, we could have written anything in place of `'True'` and it would have been substituted."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "heYe1I0dOmQ_"
+ },
+ "source": [
+ "Now, coming to numeric data, we have two common ways of replacing missing values:\n",
+ "\n",
+ "1. Replace with the median of the column\n",
+ "2. Replace with the mean of the column \n",
+ "\n",
+ "We replace with the median in the case of skewed data with outliers, because the median is robust to outliers.\n",
+ "\n",
+ "When the data is normally distributed, we can use the mean, as in that case the mean and median would be pretty close."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "09HM_2feOj5Y"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
{
"cell_type": "code",
"metadata": {
From f5bef705d7480b57f61f4f8e63db49e9cd2e276c Mon Sep 17 00:00:00 2001
From: Nirmalya Misra <39618712+nirmalya8@users.noreply.github.com>
Date: Tue, 5 Oct 2021 12:43:18 +0530
Subject: [PATCH 025/319] fillna with mean added
---
.../08-data-preparation/notebook.ipynb | 261 +++++++++++++++++-
1 file changed, 247 insertions(+), 14 deletions(-)
diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index 3e8ae01e..b5a6bac0 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -1630,12 +1630,12 @@
{
"cell_type": "code",
"metadata": {
- "id": "MY5faq4yLdpQ",
- "outputId": "c3838b07-0d15-471e-8dad-370de91d4bdc",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
- }
+ },
+ "id": "MY5faq4yLdpQ",
+ "outputId": "c3838b07-0d15-471e-8dad-370de91d4bdc"
},
"source": [
"fill_with_mode = pd.DataFrame([[1,2,\"True\"],\n",
@@ -1736,11 +1736,11 @@
{
"cell_type": "code",
"metadata": {
- "id": "WKy-9Y2tN5jv",
- "outputId": "41f5064e-502d-4aec-dc2d-86f885068b4f",
"colab": {
"base_uri": "https://localhost:8080/"
- }
+ },
+ "id": "WKy-9Y2tN5jv",
+ "outputId": "41f5064e-502d-4aec-dc2d-86f885068b4f"
},
"source": [
"fill_with_mode[2].value_counts()"
@@ -1784,12 +1784,12 @@
{
"cell_type": "code",
"metadata": {
- "id": "tvas7c9_OPWE",
- "outputId": "7282c4f7-0e59-4398-b4f2-5919baf61164",
"colab": {
"base_uri": "https://localhost:8080/",
"height": 204
- }
+ },
+ "id": "tvas7c9_OPWE",
+ "outputId": "7282c4f7-0e59-4398-b4f2-5919baf61164"
},
"source": [
"fill_with_mode"
@@ -1894,19 +1894,252 @@
"\n",
"We replace with the median in the case of skewed data with outliers, because the median is robust to outliers.\n",
"\n",
- "When the data is normally distributed, we can use the mean, as in that case the mean and median would be pretty close."
+ "When the data is normally distributed, we can use the mean, as in that case the mean and median would be pretty close.\n",
+ "\n",
+ "First, let us take a column which is normally distributed and fill the missing value with the mean of the column. "
]
},
{
"cell_type": "code",
"metadata": {
- "id": "09HM_2feOj5Y"
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ },
+ "id": "09HM_2feOj5Y",
+ "outputId": "ade42fec-dc40-45d0-e22c-974849ea8664"
},
"source": [
- ""
+ "fill_with_mean = pd.DataFrame([[-2,0,1],\n",
+ " [-1,2,3],\n",
+ " [np.nan,4,5],\n",
+ " [1,6,7],\n",
+ " [2,8,9]])\n",
+ "\n",
+ "fill_with_mean"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 33,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 0 1 2 3\n",
+ "0 1.0 1.0 7.0 7.0\n",
+ "1 2.0 5.0 8.0 8.0\n",
+ "2 NaN 6.0 9.0 9.0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 48
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZeMc-I1EgRsI"
+ },
+ "source": [
+ "Notice that when a previous value is not available for forward-filling, the null value remains."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eeAoOU0RgRsJ"
+ },
+ "source": [
+ "### Exercise:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "collapsed": true,
+ "trusted": false,
+ "id": "e8S-CjW8gRsJ"
+ },
+ "source": [
+ "# What output does example4.fillna(method='bfill', axis=1) produce?\n",
+ "# What about example4.fillna(method='ffill') or example4.fillna(method='bfill')?\n",
+ "# Can you think of a longer code snippet to write that can fill all of the null values in example4?\n"
+ ],
+ "execution_count": 49,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YHgy0lIrgRsJ"
+ },
+ "source": [
+ "You can be creative about how you use `fillna`. For example, let's look at `example4` again, but this time let's fill the missing values with the average of all of the values in the `DataFrame`:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "trusted": false,
+ "id": "OtYVErEygRsJ",
+ "outputId": "ad5f4520-cf88-4e3e-fa16-54bda5efa417",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 142
+ }
+ },
+ "source": [
+ "example4.fillna(example4.mean())"
+ ],
+ "execution_count": 50,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " 0 1 2 3\n",
+ "0 1.0 5.5 7 NaN\n",
+ "1 2.0 5.0 8 NaN\n",
+ "2 1.5 6.0 9 NaN"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 50
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zpMvCkLSgRsJ"
+ },
+ "source": [
+ "Notice that column 3 is still valueless: it contains only null values, so its mean is `NaN`, and filling with `NaN` leaves the nulls in place.\n",
+ "\n",
+ "> **Takeaway:** There are multiple ways to deal with missing values in your datasets. The specific strategy you use (removing them, replacing them, or even how you replace them) should be dictated by the particulars of that data. You will develop a better sense of how to deal with missing values the more you handle and interact with datasets."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "bauDnESIl9FH"
+ },
+ "source": [
+ "### Encoding Categorical Data\n",
+ "\n",
+ "Machine learning models only deal with numbers and other forms of numeric data. A model won't be able to tell the difference between a Yes and a No, but it would be able to distinguish between 0 and 1. So, after filling in the missing values, we need to encode the categorical data in some numeric form for the model to understand.\n",
+ "\n",
+ "Encoding can be done in two ways. We will be discussing them next.\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uDq9SxB7mu5i"
+ },
+ "source": [
+ "**LABEL ENCODING**\n",
+ "\n",
+ "\n",
+ "Label encoding is basically converting each category to a number. For example, say we have a dataset of airline passengers and there is a column containing their class among the following: ['business class', 'economy class', 'first class']. If label encoding is done on this, it would be transformed to [0, 1, 2]. Let us see an example via code. As we will be learning `scikit-learn` in the upcoming notebooks, we won't use it here."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "1vGz7uZyoWHL",
+ "outputId": "5003c8cd-ff07-4399-a5b2-621b45184511",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ }
+ },
+ "source": [
+ "label = pd.DataFrame([\n",
+ " [10,'business class'],\n",
+ " [20,'first class'],\n",
+ " [30, 'economy class'],\n",
+ " [40, 'economy class'],\n",
+ " [50, 'economy class'],\n",
+ " [60, 'business class']\n",
+ "],columns=['ID','class'])\n",
+ "label"
+ ],
+ "execution_count": 70,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
\n",
+ "\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
\n",
+ "
ID
\n",
+ "
class
\n",
+ "
\n",
+ " \n",
+ " \n",
+ "
\n",
+ "
0
\n",
+ "
10
\n",
+ "
business class
\n",
+ "
\n",
+ "
\n",
+ "
1
\n",
+ "
20
\n",
+ "
first class
\n",
+ "
\n",
+ "
\n",
+ "
2
\n",
+ "
30
\n",
+ "
economy class
\n",
+ "
\n",
+ "
\n",
+ "
3
\n",
+ "
40
\n",
+ "
economy class
\n",
+ "
\n",
+ "
\n",
+ "
4
\n",
+ "
50
\n",
+ "
economy class
\n",
+ "
\n",
+ "
\n",
+ "
5
\n",
+ "
60
\n",
+ "
business class
\n",
+ "
\n",
+ " \n",
+ "
\n",
+ "
"
+ ],
+ "text/plain": [
+ " ID class\n",
+ "0 10 business class\n",
+ "1 20 first class\n",
+ "2 30 economy class\n",
+ "3 40 economy class\n",
+ "4 50 economy class\n",
+ "5 60 business class"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 70
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IDHnkwTYov-h"
},
"source": [
- "Notice that when a previous value is not available for forward-filling, the null value remains."
+ "To perform label encoding on the 'class' column, we first have to define a mapping from each class to a number, before replacing the values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ZC5URJG3o1ES",
+ "outputId": "c75465b2-169e-417c-8769-680aaf1cd268",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ }
+ },
+ "source": [
+ "class_labels = {'business class':0,'economy class':1,'first class':2}\n",
+ "label['class'] = label['class'].replace(class_labels)\n",
+ "label"
+ ],
+ "execution_count": 71,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+        "<table>…</table>"
+ ],
+ "text/plain": [
+ " ID class\n",
+ "0 10 0\n",
+ "1 20 2\n",
+ "2 30 1\n",
+ "3 40 1\n",
+ "4 50 1\n",
+ "5 60 0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 71
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ftnF-TyapOPt"
+ },
+ "source": [
+ "As we can see, the output matches what we thought would happen. So, when do we use label encoding? Label encoding is used in either or both of the following cases :\n",
+ "1. When the number of categories is large\n",
+ "2. When the categories are in order. "
+ ]
+ },
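A quick cross-check, sketched outside the lesson: pandas' categorical dtype derives the same integer codes automatically, because categories are sorted alphabetically and therefore coincide with the manual mapping above ('business class' → 0, 'economy class' → 1, 'first class' → 2):

```python
import pandas as pd

# Same data as the lesson's `label` DataFrame.
label = pd.DataFrame([
    [10, 'business class'],
    [20, 'first class'],
    [30, 'economy class'],
    [40, 'economy class'],
    [50, 'economy class'],
    [60, 'business class'],
], columns=['ID', 'class'])

# Converting to a categorical dtype assigns each distinct category an
# integer code; categories are sorted alphabetically here, so the codes
# match {'business class': 0, 'economy class': 1, 'first class': 2}.
label['class'] = label['class'].astype('category').cat.codes
print(label)
```

Note that `cat.codes` only matches a hand-written mapping when that mapping happens to follow the sorted category order, as it does in this example.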
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "eQPAPVwsqWT7"
+ },
+ "source": [
+ "**ONE HOT ENCODING**\n",
+ "\n",
+ "Another type of encoding is One Hot Encoding. In this type of encoding, each category of the column gets added as a separate column and each datapoint will get a 0 or a 1 based on whether it contains that category. So, if there are n different categories, n columns will be appended to the dataframe.\n",
+ "\n",
+ "For example, let us take the same aeroplane class example. The categories were: ['business class', 'economy class','first class'] . So, if we perform one hot encoding, the following three columns will be added to the dataset: ['class_business class','class_economy class','class_first class']."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ZM0eVh0ArKUL",
+ "outputId": "cba4258f-a6c3-45e0-dd69-32b73b2cd735",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 235
+ }
+ },
+ "source": [
+ "one_hot = pd.DataFrame([\n",
+ " [10,'business class'],\n",
+ " [20,'first class'],\n",
+ " [30, 'economy class'],\n",
+ " [40, 'economy class'],\n",
+ " [50, 'economy class'],\n",
+ " [60, 'business class']\n",
+ "],columns=['ID','class'])\n",
+ "one_hot"
+ ],
+ "execution_count": 67,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
"
+ ],
+ "text/plain": [
+ " ID class_business class class_economy class class_first class\n",
+ "0 10 1 0 0\n",
+ "1 20 0 0 1\n",
+ "2 30 0 1 0\n",
+ "3 40 0 1 0\n",
+ "4 50 0 1 0\n",
+ "5 60 1 0 0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 69
+ }
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "eeAoOU0RgRsJ"
+ "id": "_zXRLOjXujdA"
},
"source": [
- "### Exercise:"
+ "Each one hot encoded column contains 0 or 1, which specifies whether that category exists for that datapoint."
]
},
- {
- "cell_type": "code",
- "metadata": {
- "collapsed": true,
- "trusted": false,
- "id": "e8S-CjW8gRsJ"
- },
- "source": [
- "# What output does example4.fillna(method='bfill', axis=1) produce?\n",
- "# What about example4.fillna(method='ffill') or example4.fillna(method='bfill')?\n",
- "# Can you think of a longer code snippet to write that can fill all of the null values in example4?\n"
- ],
- "execution_count": null,
- "outputs": []
- },
{
"cell_type": "markdown",
"metadata": {
- "id": "YHgy0lIrgRsJ"
+ "id": "bDnC4NQOu0qr"
},
"source": [
- "You can be creative about how you use `fillna`. For example, let's look at `example4` again, but this time let's fill the missing values with the average of all of the values in the `DataFrame`:"
+ "When do we use one hot encoding? One hot encoding is used in either or both of the following cases :\n",
+ "\n",
+ "1. When the number of categories and the size of the dataset is smaller.\n",
+ "2. When the categories follow no particular order."
]
},
- {
- "cell_type": "code",
- "metadata": {
- "trusted": false,
- "id": "OtYVErEygRsJ"
- },
- "source": [
- "example4.fillna(example4.mean())"
- ],
- "execution_count": null,
- "outputs": []
- },
{
"cell_type": "markdown",
"metadata": {
- "id": "zpMvCkLSgRsJ"
+ "id": "XnUmci_4uvyu"
},
"source": [
- "Notice that column 3 is still valueless: the default direction is to fill values row-wise.\n",
- "\n",
- "> **Takeaway:** There are multiple ways to deal with missing values in your datasets. The specific strategy you use (removing them, replacing them, or even how you replace them) should be dictated by the particulars of that data. You will develop a better sense of how to deal with missing values the more you handle and interact with datasets."
+ "> Key Takeaways:\n",
+ "1. Encoding is done to convert non-numeric data to numeric data.\n",
+ "2. There are two types of encoding: Label encoding and One Hot encoding, both of which can be performed based on the demands of the dataset. "
]
},
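To tie the takeaways together, a small round-trip sketch (the variable names here are illustrative, not from the lesson): label-encode with an explicit mapping, then invert the mapping to recover the original categories:

```python
import pandas as pd

class_labels = {'business class': 0, 'economy class': 1, 'first class': 2}
df = pd.DataFrame({'class': ['business class', 'first class', 'economy class']})

# Encode: non-numeric -> numeric, as in the lesson.
df['code'] = df['class'].replace(class_labels)

# Decode: invert the dictionary to map codes back to category names.
inverse = {code: name for name, code in class_labels.items()}
df['decoded'] = df['code'].replace(inverse)
print(df)
```

Because label encoding is just a lookup table, keeping the mapping around lets you translate model outputs back into human-readable categories.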
{
@@ -2366,27 +3428,121 @@
"cell_type": "code",
"metadata": {
"trusted": false,
- "id": "ZLu6FEnZgRsJ"
+ "id": "ZLu6FEnZgRsJ",
+ "outputId": "d62ede23-a8ba-412b-f666-6fc1a43af424",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
},
"source": [
"example6 = pd.DataFrame({'letters': ['A','B'] * 2 + ['B'],\n",
" 'numbers': [1, 2, 1, 3, 3]})\n",
"example6"
],
- "execution_count": null,
- "outputs": []
+ "execution_count": 72,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "
+# علم داده برای مبتدیان - برنامه درسی
+
+
+[](https://github.com/microsoft/Data-Science-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/network/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/stargazers/)
+
+
From 13262cc93bdde3b9f14f3873bd172e0661c42e3d Mon Sep 17 00:00:00 2001
From: alirezaAsadi2018 <44492608+alirezaAsadi2018@users.noreply.github.com>
Date: Wed, 6 Oct 2021 18:39:05 +0330
Subject: [PATCH 033/319] Translate "Getting Started" and "Introduction"
---
translations/README.fa.md | 23 ++++++++++++++++++++++-
1 file changed, 22 insertions(+), 1 deletion(-)
diff --git a/translations/README.fa.md b/translations/README.fa.md
index d6ad626a..b582b8f4 100644
--- a/translations/README.fa.md
+++ b/translations/README.fa.md
@@ -13,4 +13,25 @@
[](https://GitHub.com/microsoft/Data-Science-For-Beginners/network/)
[](https://GitHub.com/microsoft/Data-Science-For-Beginners/stargazers/)
-
+طرفداران Azure Cloud در مایکروسافت مفتخر هستند که یک برنامه درسی 10 هفته ای و 20 درسی درباره علم داده ارائه دهند. هر درس شامل کوییزهای پیش از درس و پس از درس، دستورالعمل های کتبی برای تکمیل درس، راه حل و تکلیف است. آموزش پروژه محور ما به شما این امکان را می دهد در حین ساختن یاد بگیرید، راهی ثابت شده جهت "ماندگاری" مهارت های جدید.
+
+**تشکر از صمیم قلب از نویسندگانمان:** [Jasmine Greenaway](https://www.twitter.com/paladique), [Dmitry Soshnikov](http://soshnikov.com), [Nitya Narasimhan](https://twitter.com/nitya), [Jalen McGee](https://twitter.com/JalenMcG), [Jen Looper](https://twitter.com/jenlooper), [Maud Levy](https://twitter.com/maudstweets), [Tiffany Souterre](https://twitter.com/TiffanySouterre), [Christopher Harrison](https://www.twitter.com/geektrainer).
+
+**🙏 تشکر ویژه 🙏 از نویسندگان سفیر دانشجویی مایکروسافت، بازبینی کنندگان، و مشارکت کنندگان در محتوا،** به ویژه [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
+
+ | ](../sketchnotes/00-Title.png)|
+|:---:|
+| علم داده برای مبتدیان - طرح از [@nitya](https://twitter.com/nitya)_ |
+
+
+# شروع به کار
+
+> **معلمان**، ما در مورد نحوه استفاده از این برنامه درسی [برخی از پیشنهادات را درج کرده ایم](../for-teachers.md). بسیار خوشحال می شویم که بازخوردهای شما را در [انجمن بحث و گفت و گوی](https://github.com/microsoft/Data-Science-For-Beginners/discussions) خود داشته باشیم!
+
+> **دانش آموزان**، اگر قصد دارید به تنهایی از این برنامه درسی استفاده کنید، کل ریپو را فورک کنید و تمرینات را خودتان به تنهایی انجام دهید. ابتدا با آزمون قبل از درس آغاز کنید، سپس درسنامه را خوانده و باقی فعالیت ها را تکمیل کنید. سعی کنید به جای کپی کردن کد راه حل، خودتان پروژه ها را با درک مفاهیم درسنامه ایجاد کنید. با این حال،کد راه حل در پوشه های /solutions داخل هر درس پروژه محور موجود می باشد. ایده دیگر تشکیل گروه مطالعه با دوستان است تا بتوانید مطالب را با هم مرور کنید، پیشنهاد ما [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-40229-cxa) می باشد.
+
+
+
+
From 8481565c0a0415b676c000bf5362f05c5a76781c Mon Sep 17 00:00:00 2001
From: Dmitri Soshnikov
Date: Wed, 6 Oct 2021 18:46:28 +0300
Subject: [PATCH 034/319] Correct the link to Intro to DS video
---
1-Introduction/01-defining-data-science/README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/01-defining-data-science/README.md b/1-Introduction/01-defining-data-science/README.md
index aa92a9c9..bedbb1e7 100644
--- a/1-Introduction/01-defining-data-science/README.md
+++ b/1-Introduction/01-defining-data-science/README.md
@@ -6,7 +6,7 @@
---
-[](https://youtu.be/pqqsm5reGvs)
+[](https://youtu.be/beZ7Mb_oz9I)
## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
From 9f414d6975a8d8ecca59ae67c4706b7408999deb Mon Sep 17 00:00:00 2001
From: Dmitri Soshnikov
Date: Wed, 6 Oct 2021 18:48:12 +0300
Subject: [PATCH 035/319] Correct link to intro to DS video on home page
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 7b2a9e8e..4843d1f8 100644
--- a/README.md
+++ b/README.md
@@ -64,7 +64,7 @@ In addition, a low-stakes quiz before a class sets the intention of the student
| Lesson Number | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Author |
| :-----------: | :----------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :----: |
-| 01 | Defining Data Science | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind data science and how it’s related to artificial intelligence, machine learning, and big data. | [lesson](1-Introduction/01-defining-data-science/README.md) [video](https://youtu.be/pqqsm5reGvs) | [Dmitry](http://soshnikov.com) |
+| 01 | Defining Data Science | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind data science and how it’s related to artificial intelligence, machine learning, and big data. | [lesson](1-Introduction/01-defining-data-science/README.md) [video](https://youtu.be/beZ7Mb_oz9I) | [Dmitry](http://soshnikov.com) |
| 02 | Data Science Ethics | [Introduction](1-Introduction/README.md) | Data Ethics Concepts, Challenges & Frameworks. | [lesson](1-Introduction/02-ethics/README.md) | [Nitya](https://twitter.com/nitya) |
| 03 | Defining Data | [Introduction](1-Introduction/README.md) | How data is classified and its common sources. | [lesson](1-Introduction/03-defining-data/README.md) | [Jasmine](https://www.twitter.com/paladique) |
| 04 | Introduction to Statistics & Probability | [Introduction](1-Introduction/README.md) | The mathematical techniques of probability and statistics to understand data. | [lesson](1-Introduction/04-stats-and-probability/README.md) [video](https://youtu.be/Z5Zy85g4Yjw) | [Dmitry](http://soshnikov.com) |
From 4884470175a2b2156c76065389943d3489050e09 Mon Sep 17 00:00:00 2001
From: alirezaAsadi2018 <44492608+alirezaAsadi2018@users.noreply.github.com>
Date: Wed, 6 Oct 2021 21:46:48 +0330
Subject: [PATCH 036/319] Complete Pedagogy, Each lesson includes, and Lessons
part
---
translations/README.fa.md | 57 ++++++++++++++++++++++++++++++++++++++-
1 file changed, 56 insertions(+), 1 deletion(-)
diff --git a/translations/README.fa.md b/translations/README.fa.md
index b582b8f4..d3da65e2 100644
--- a/translations/README.fa.md
+++ b/translations/README.fa.md
@@ -21,7 +21,7 @@
| ](../sketchnotes/00-Title.png)|
|:---:|
-| علم داده برای مبتدیان - طرح از [@nitya](https://twitter.com/nitya)_ |
+| علم داده برای مبتدیان - یادداشت بصری (sketchnote) از [@nitya](https://twitter.com/nitya)_ |
# شروع به کار
@@ -34,4 +34,59 @@
> 🎥 برای مشاهده ویدیویی در مورد این پروژه و افرادی که آن را ایجاد کرده اند، روی تصویر بالا کلیک کنید!-->
+
+## آموزش
+
+ما هنگام تدوین این برنامه درسی دو اصل آموزشی را انتخاب کرده ایم: اطمینان حاصل کنیم که پروژه محور است و شامل آزمونهای مکرر می باشد. دانش آموزان به محض تکمیل این سری آموزشی، اصول اولیه علم داده، شامل اصول اخلاقی، آماده سازی داده ها، روش های مختلف کار با داده ها، تصویرسازی داده ها، تجزیه و تحلیل داده ها، موارد استفاده از علم داده در دنیای واقعی و بسیاری مورد دیگر را فرا می گیرند.
+
+علاوه بر این، یک کوییز با امتیاز کم قبل از کلاس، مقصود دانش آموز درجهت یادگیری یک موضوع را مشخص می کند، در حالی که کوییز دوم بعد از کلاس ماندگاری بیشتر مطالب را تضمین می کند. این برنامه درسی طوری طراحی شده است که انعطاف پذیر و سرگرم کننده باشد و می تواند به طور کامل یا جزئی مورد استفاده قرار گیرد. پروژه از کوچک شروع می شوند و تا پایان چرخه ۱۰ هفته ای همینطور پیچیده تر می شوند.
+
+> دستورالعمل های ما را درباره [کد رفتار](../CODE_OF_CONDUCT.md), [مشارکت](../CONTRIBUTING.md), [ترجمه](../TRANSLATIONS.md) ببینید. ما از بازخورد سازنده شما استقبال می کنیم!
+
+ ## هر درس شامل:
+
+- یادداشت های بصری (sketchnote) اختیاری
+- فیلم های مکمل اختیاری
+- کوییز های دست گرمی قبل از درس
+- درسنامه مکتوب
+- راهنمای گام به گام نحوه ساخت پروژه برای درس های مبتنی بر پروژه
+- بررسی دانش
+- یک چالش
+- منابع خواندنی مکمل
+- تمرین
+- کوییز پس از درس
+
+> **نکته ای در مورد آزمونها**: همه آزمون ها در [این برنامه](https://red-water-0103e7a0f.azurestaticapps.net/) موجود هستند، برای در مجموع ۴۰ کوییز که هرکدام شامل سه سوال می باشد. کوییزها از داخل درسنامه لینک داده شده اند اما برنامه کوییز را می توان به صورت لوکال اجرا کرد. برای اینکار، دستورالعمل موجود در پوشه `quiz-app` را دنبال کنید. سوالات به تدریج در حال لوکال سازی هستند.
+
+## Lessons
+
+
+| ](../sketchnotes/00-Roadmap.png)|
+|:---:|
+| علم داده برای مبتدیان: نقشه راه - یادداشت بصری از [@nitya](https://twitter.com/nitya)_ |
+
+
+| شماره درس | موضوع | گروه بندی درس | اهداف یادگیری | درس پیوند شده | نویسنده |
+| :-----------: | :----------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :----: |
+| ۱ | تعریف علم داده | [معرفی](../1-Introduction/README.md) | مفاهیم اساسی علم داده و نحوه ارتباط آن با هوش مصنوعی، یادگیری ماشین و کلان داده را بیاموزید. | [درسنامه](../1-Introduction/01-defining-data-science/README.md) [ویدیو](https://youtu.be/pqqsm5reGvs) | [Dmitry](http://soshnikov.com) |
+| ۲ | اصول اخلاقی علم داده | [معرفی](../1-Introduction/README.md) | مفاهیم اخلاق داده ها، چالش ها و چارچوب ها. | [درسنامه](../1-Introduction/02-ethics/README.md) | [Nitya](https://twitter.com/nitya) |
+| ۳ | تعریف داده | [معرفی](../1-Introduction/README.md) | نحوه دسته بندی داده ها و منابع رایج آن. | [درسنامه](../1-Introduction/03-defining-data/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| ۴ | مقدمه ای بر آمار و احتمال | [معرفی](../1-Introduction/README.md) | تکنیک های ریاضی آمار و احتمال برای درک داده ها. | [درسنامه](../1-Introduction/04-stats-and-probability/README.md) [ویدیو](https://youtu.be/Z5Zy85g4Yjw) | [Dmitry](http://soshnikov.com) |
+| ۵ | کار با داده های رابطه ای | [کار با داده ها](../2-Working-With-Data/README.md) | مقدمه ای بر داده های رابطه ای و مبانی اکتشاف و تجزیه و تحلیل داده های رابطه ای با زبان پرس و جوی ساختار یافته ، که به SQL نیز معروف است (تلفظ کنید “see-quell”). | [درسنامه](../2-Working-With-Data/05-relational-databases/README.md) | [Christopher](https://www.twitter.com/geektrainer) | | |
+| ۶ | کار با داده های NoSQL | [کار با داده ها](../2-Working-With-Data/README.md) | مقدمه ای بر داده های غیر رابطه ای، انواع مختلف آن و مبانی کاوش و تجزیه و تحلیل پایگاه داده های اسناد(document databases). | [درسنامه](../2-Working-With-Data/06-non-relational/README.md) | [Jasmine](https://twitter.com/paladique)|
+| ۷ | کار با پایتون | [کار با داده ها](../2-Working-With-Data/README.md) | اصول استفاده از پایتون برای کاوش داده با کتابخانه هایی مانند Pandas. توصیه می شود مبانی برنامه نویسی پایتون را بلد باشید. | [درسنامه](../2-Working-With-Data/07-python/README.md) [ویدیو](https://youtu.be/dZjWOGbsN4Y) | [Dmitry](http://soshnikov.com) |
+| ۸ | آماده سازی داده ها | [کار با داده ها](../2-Working-With-Data/README.md) | مباحث مربوط به تکنیک های داده ای برای پاکسازی و تبدیل داده ها به منظور رسیدگی به چالش های داده های مفقود شده، نادرست یا ناقص. | [درسنامه](../2-Working-With-Data/08-data-preparation/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| ۹ | تصویرسازی مقادیر | [تصویرسازی داده ها](../3-Data-Visualization/README.md) | نحوه استفاده از Matplotlib برای تصویرسازی داده های پرندگان را می آموزید. 🦆 | [درسنامه](../3-Data-Visualization/09-visualization-quantities/README.md) | [Jen](https://twitter.com/jenlooper) |
+| ۱۰ | تصویرسازی توزیع داده ها | [تصویرسازی داده ها](../3-Data-Visualization/README.md) | تصویرسازی مشاهدات و روندها در یک بازه زمانی. | [درسنامه](../3-Data-Visualization/10-visualization-distributions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| ۱۱ | تصویرسازی نسبت ها | [تصویرسازی داده ها](../3-Data-Visualization/README.md) | تصویرسازی درصدهای مجزا و گروهی. | [درسنامه](../3-Data-Visualization/11-visualization-proportions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| ۱۲ | تصویرسازی روابط | [تصویرسازی داده ها](../3-Data-Visualization/README.md) | تصویرسازی ارتباطات و همبستگی بین مجموعه داده ها و متغیرهای آنها. | [درسنامه](../3-Data-Visualization/12-visualization-relationships/README.md) | [Jen](https://twitter.com/jenlooper) |
+| ۱۳ | تصویرسازی های معنی دار | [تصویرسازی داده ها](../3-Data-Visualization/README.md) | تکنیک ها و راهنمایی هایی برای تبدیل تصویرسازی های شما به خروجی های ارزشمندی جهت حل موثرتر مشکلات و بینش ها. | [درسنامه](../3-Data-Visualization/13-meaningful-visualizations/README.md) | [Jen](https://twitter.com/jenlooper) |
+| ۱۴ | مقدمه ای بر چرخه حیات علم داده | [چرخه حیات](../4-Data-Science-Lifecycle/README.md) | مقدمه ای بر چرخه حیات علم داده و اولین گام آن برای دستیابی به داده ها و استخراج آن ها. | [درسنامه](../4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
+| ۱۵ | تجزیه و تحلیل | [چرخه حیات](../4-Data-Science-Lifecycle/README.md) | این مرحله از چرخه حیات علم داده بر تکنیک های تجزیه و تحلیل داده ها متمرکز است. | [درسنامه](../4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
+| ۱۶ | ارتباطات | [چرخه حیات](../4-Data-Science-Lifecycle/README.md) | این مرحله از چرخه حیات علم داده بر روی ارائه بینش از داده ها به نحوی که درک آنها را برای تصمیم گیرندگان آسان تر بکند، متمرکز است. | [درسنامه](../4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
+| ۱۷ | علم داده در فضای ابری | [داده های ابری](../5-Data-Science-In-Cloud/README.md) | این سری از درسنامه ها علم داده در فضای ابری و مزایای آن را معرفی می کند. | [درسنامه](../5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) و [Maud](https://twitter.com/maudstweets) |
+| ۱۸ | علم داده در فضای ابری | [داده های ابری](../5-Data-Science-In-Cloud/README.md) | آموزش مدل ها با استفاده از ابزارهای کد کمتر(low code). |[درسنامه](../5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) و [Maud](https://twitter.com/maudstweets) |
+| ۱۹ | علم داده در فضای ابری | [داده های ابری](../5-Data-Science-In-Cloud/README.md) | استقرار(Deploy) مدل ها با استفاده از استودیوی یادگیری ماشین آژور(Azure Machine Learning Studio). | [درسنامه](../5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) و [Maud](https://twitter.com/maudstweets) |
+| ۲۰ | علم داده در طبیعت | [در طبیعت](../6-Data-Science-In-Wild/README.md) | پروژه های علم داده در دنیای واقعی. | [درسنامه](../6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
+
From a74f0d5aa74d504cda4dc3d3a64dffc9fffa0528 Mon Sep 17 00:00:00 2001
From: alirezaAsadi2018 <44492608+alirezaAsadi2018@users.noreply.github.com>
Date: Wed, 6 Oct 2021 21:48:10 +0330
Subject: [PATCH 037/319] translate header "lessons" that was forgotten
---
translations/README.fa.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/translations/README.fa.md b/translations/README.fa.md
index d3da65e2..f986ade0 100644
--- a/translations/README.fa.md
+++ b/translations/README.fa.md
@@ -58,7 +58,7 @@
> **نکته ای در مورد آزمونها**: همه آزمون ها در [این برنامه](https://red-water-0103e7a0f.azurestaticapps.net/) موجود هستند، برای در مجموع ۴۰ کوییز که هرکدام شامل سه سوال می باشد. کوییزها از داخل درسنامه لینک داده شده اند اما برنامه کوییز را می توان به صورت لوکال اجرا کرد. برای اینکار، دستورالعمل موجود در پوشه `quiz-app` را دنبال کنید. سوالات به تدریج در حال لوکال سازی هستند.
-## Lessons
+## درسنامه
| ](../sketchnotes/00-Roadmap.png)|
From 0ffd49b93fa3dc58a0ce3b8c7be5d36972589523 Mon Sep 17 00:00:00 2001
From: alirezaAsadi2018 <44492608+alirezaAsadi2018@users.noreply.github.com>
Date: Wed, 6 Oct 2021 22:12:26 +0330
Subject: [PATCH 038/319] Finish translating Offline access, PDF, Help Wanted,
and Other Curricula
---
translations/README.fa.md | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
diff --git a/translations/README.fa.md b/translations/README.fa.md
index f986ade0..d8de4c52 100644
--- a/translations/README.fa.md
+++ b/translations/README.fa.md
@@ -28,7 +28,7 @@
> **معلمان**، ما در مورد نحوه استفاده از این برنامه درسی [برخی از پیشنهادات را درج کرده ایم](../for-teachers.md). بسیار خوشحال می شویم که بازخوردهای شما را در [انجمن بحث و گفت و گوی](https://github.com/microsoft/Data-Science-For-Beginners/discussions) خود داشته باشیم!
-> **دانش آموزان**، اگر قصد دارید به تنهایی از این برنامه درسی استفاده کنید، کل ریپو را فورک کنید و تمرینات را خودتان به تنهایی انجام دهید. ابتدا با آزمون قبل از درس آغاز کنید، سپس درسنامه را خوانده و باقی فعالیت ها را تکمیل کنید. سعی کنید به جای کپی کردن کد راه حل، خودتان پروژه ها را با درک مفاهیم درسنامه ایجاد کنید. با این حال،کد راه حل در پوشه های /solutions داخل هر درس پروژه محور موجود می باشد. ایده دیگر تشکیل گروه مطالعه با دوستان است تا بتوانید مطالب را با هم مرور کنید، پیشنهاد ما [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-40229-cxa) می باشد.
+> **دانش آموزان**، اگر قصد دارید به تنهایی از این برنامه درسی استفاده کنید، کل مخزن را فورک کنید و تمرینات را خودتان به تنهایی انجام دهید. ابتدا با آزمون قبل از درس آغاز کنید، سپس درسنامه را خوانده و باقی فعالیت ها را تکمیل کنید. سعی کنید به جای کپی کردن کد راه حل، خودتان پروژه ها را با درک مفاهیم درسنامه ایجاد کنید. با این حال،کد راه حل در پوشه های /solutions داخل هر درس پروژه محور موجود می باشد. ایده دیگر تشکیل گروه مطالعه با دوستان است تا بتوانید مطالب را با هم مرور کنید، پیشنهاد ما [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-40229-cxa) می باشد.
+
+## Pédagogie
+
+Nous avons choisi deux principes pédagogiques lors de la création de ce programme d'études : veiller à ce qu'il soit basé sur des projets et à ce qu'il comprenne des quiz fréquents. À la fin de cette série, les élèves auront appris les principes de base de la data science, notamment les concepts éthiques, la préparation des données, les différentes façons de travailler avec les données, la visualisation des données, l'analyse des données, des cas d'utilisation réels de data science, etc.
+
+De plus, un quiz à faible enjeu à réaliser avant chaque cours permet de préparer l'étudiant à l'apprentissage du sujet, et un second quiz après le cours permet de fixer encore davantage le contenu dans l'esprit des apprenants. Ce curriculum se veut flexible et amusant et il peut être suivi dans son intégralité ou en partie. Les premiers projets sont modestes et deviennent de plus en plus ardus.
+
+> Quelques liens utiles : [Code de conduite](CODE_OF_CONDUCT.md), [Comment contribuer](CONTRIBUTING.md), [Traductions](TRANSLATIONS.md). Tout feedback constructif sera le bienvenu !
+
+## Chaque cours comprend :
+
+- Un sketchnote optionnel
+- Une vidéo complémentaire optionnelle
+- Un quiz préalable
+- Un cours écrit
+- Pour les cours basés sur des projets à réaliser : un guide de création du projet
+- Des vérifications de connaissances
+- Un challenge
+- De la lecture complémentaire
+- Un exercice
+- Un quiz de fin
+
+> **Concernant les quiz** : Vous pourrez retrouver tous les quiz [dans cette application](https://red-water-0103e7a0f.azurestaticapps.net/). Il y a 40 quiz, avec trois questions chacun. Vous les retrouverez dans chaque cours correspondant, mais vous pouvez aussi utiliser l'application de quiz en local en suivant les instructions disponibles dans le dossier `quiz-app`. Les quiz sont en cours de localisation.
+
+## Cours
+
+
+| ](./sketchnotes/00-Roadmap.png)|
+|:---:|
+| Data Science For Beginners: Roadmap - _Sketchnote réalisé par [@nitya](https://twitter.com/nitya)_ |
+
+
+| Numéro du cours | Topic | Chapitre | Objectifs d'apprentissage | Liens vers les cours | Auteurs |
+| :-----------: | :----------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :----: |
+| 01 | Qu'est-ce que la Data Science ? | [Introduction](1-Introduction/README.md) | Apprenez les concepts de base de la data science et le lien entre la data science, l'intelligence artificielle, le machine learning et la big data. | [cours](1-Introduction/01-defining-data-science/README.md) [vidéo](https://youtu.be/pqqsm5reGvs) | [Dmitry](http://soshnikov.com) |
+| 02 | Data Science et éthique | [Introduction](1-Introduction/README.md) | Les concepts d'éthique dans le domaine des données, les challenges et les principes d'encadrement. | [cours](1-Introduction/02-ethics/README.md) | [Nitya](https://twitter.com/nitya) |
+| 03 | Définition de la data | [Introduction](1-Introduction/README.md) | Comment classifier les données et d'où viennent-elles principalement ? | [cours](1-Introduction/03-defining-data/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 04 | Introduction aux statistiques et aux probabilités | [Introduction](1-Introduction/README.md) | Techniques mathématiques de probabilités et de statistiques au service de la data. | [cours](1-Introduction/04-stats-and-probability/README.md) [vidéo](https://youtu.be/Z5Zy85g4Yjw) | [Dmitry](http://soshnikov.com) |
+| 05 | Utilisation de données relationnelles | [Exploiter des données](2-Working-With-Data/README.md) | Introduction aux données relationnelles et aux bases d'exploration et d'analyse des données relationnelles avec le Structured Query Language, alias SQL (pronouncé “sicouel”). | [cours](2-Working-With-Data/05-relational-databases/README.md) | [Christopher](https://www.twitter.com/geektrainer) | | |
+| 06 | Utilisation de données NoSQL | [Exploiter des données](2-Working-With-Data/README.md) | Présentation des données non relationelles, les types de données et les fondamentaux de l'exploration et de l'analyse de bases de données documentaires. | [cours](2-Working-With-Data/06-non-relational/README.md) | [Jasmine](https://twitter.com/paladique)|
+| 07 | Utilisation de Python | [Exploiter des données](2-Working-With-Data/README.md) | Les principes de base de Python pour l'exploration de données, et les librairies courantes telles que Pandas. Des connaissances de base de la programmation Python sont recommandées pour ce cours.| [cours](2-Working-With-Data/07-python/README.md) [vidéo](https://youtu.be/dZjWOGbsN4Y) | [Dmitry](http://soshnikov.com) |
+| 08 | Préparation des données | [Exploiter des données](2-Working-With-Data/README.md) | Techniques de nettoyage et de transformation des données pour gérer des données manquantes, inexactes ou incomplètes. | [cours](2-Working-With-Data/08-data-preparation/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 09 | Visualisation des quantités | [Data Visualization](3-Data-Visualization/README.md) | Apprendre à utiliser Matplotlib pour visualiser des données sur les oiseaux 🦆 | [cours](3-Data-Visualization/09-visualization-quantities/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 10 | Visualisation de la distribution des données | [Data Visualization](3-Data-Visualization/README.md) | Visualisation d'observations et de tendances dans un intervalle. | [cours](3-Data-Visualization/10-visualization-distributions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 11 | Visualiser des proportions | [Data Visualization](3-Data-Visualization/README.md) | Visualisation de pourcentages discrets et groupés. | [cours](3-Data-Visualization/11-visualization-proportions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 12 | Visualisation de relations | [Data Visualization](3-Data-Visualization/README.md) | Visualisation de connections et de corrélations entre différents sets de données et leurs variables. | [cours](3-Data-Visualization/12-visualization-relationships/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 13 | Visualisations significatives | [Data Visualization](3-Data-Visualization/README.md) | Techniques et conseils pour donner de la valeur à vos visualisations, les rendre utiles à la compréhension et à la résolution de problèmes. | [cours](3-Data-Visualization/13-meaningful-visualizations/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 14 | Introduction au cycle de vie de la Data Science | [Cycle de vie](4-Data-Science-Lifecycle/README.md) | Présentation du cycle de la data science et des premières étapes d'acquisition et d'extraction des données. | [cours](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
+| 15 | Analyse | [Cycle de vie](4-Data-Science-Lifecycle/README.md) | Cette étape du cycle de vie de la data science se concentre sur les techniques d'analysation des données. | [cours](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
+| 16 | Communication | [Cycle de vie](4-Data-Science-Lifecycle/README.md) | Cette étape du cycle de vie de la data science se concentre sur la présentation des informations tirées des données de manière à faciliter la compréhension d'une situation par des décisionnaires. | [cours](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
+| 17 | La Data Science dans le Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Ce cours présente le Cloud et l'intérêt du Cloud pour la Data Science. | [cours](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) et [Maud](https://twitter.com/maudstweets) |
+| 18 | La Data Science dans le Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Entraîner un modèle avec des outils de low code. |[cours](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) et [Maud](https://twitter.com/maudstweets) |
+| 19 | La Data Science dans le Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Déployer des modèles avec Azure Machine Learning Studio. | [cours](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) et [Maud](https://twitter.com/maudstweets) |
+| 20 | La Data Science dans la nature | [In the Wild](6-Data-Science-In-Wild/README.md) | Des projets concrets de data science sur le terrain. | [cours](6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
+## Accès hors ligne
+
+Vous pouvez retrouver cette documentation hors ligne à l'aide de [Docsify](https://docsify.js.org/#/). Forkez ce repository, [installez Docsify](https://docsify.js.org/#/quickstart) sur votre machine locale, et tapez `docsify serve` dans le dossier racine de ce repository. Vous retrouverez le site web sur le port 3000 de votre localhost : `localhost:3000`.
+
+> Remarque : vous ne pourrez pas utiliser de notebook avec Docsify. Si vous souhaitez utiliser un notebook, vous pouvez le faire séparément à l'aide d'un kernel Python dans VS Code.
+## PDF
+
+Vous trouverez un PDF contenant tous les cours du curriculum [ici](https://microsoft.github.io/Data-Science-For-Beginners/pdf/readme.pdf).
+
+## Appel à contribution
+
+Si vous souhaitez traduire le curriculum entier ou en partie, veuillez suivre notre guide de [traduction](TRANSLATIONS.md).
+
+## Autres Curricula
+
+Notre équipe a créé d'autres cours ! Ne manquez pas :
+
+- [Le Machine Learning pour les débutants](https://aka.ms/ml-beginners)
+- [L'IoT pour les débutants](https://aka.ms/iot-beginners)
+- [Le développement Web pour les débutants](https://aka.ms/webdev-beginners)
From 633fbe6576ba18e309863724c07c076147446a7f Mon Sep 17 00:00:00 2001
From: Maud
Date: Thu, 7 Oct 2021 10:47:45 +0200
Subject: [PATCH 045/319] Edit image path
---
translations/README.fr.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/translations/README.fr.md b/translations/README.fr.md
index c6ec2c0b..59116f49 100644
--- a/translations/README.fr.md
+++ b/translations/README.fr.md
@@ -16,7 +16,7 @@ L'équipe Azure Cloud Advocates de Microsoft a le plaisir de vous offrir un curr
**🙏 Nous remercions également particulièrement 🙏 les auteurs, correcteurs et contributeurs membres du programme Microsoft Learn Student Ambassadors**, notamment [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
-|![](./sketchnotes/00-Title.png)|
+|![](../sketchnotes/00-Title.png)|
|:---:|
| Data Science For Beginners - _Sketchnote réalisé par [@nitya](https://twitter.com/nitya)_ |
@@ -57,7 +57,7 @@ De plus, un quiz à faible enjeu à réaliser avant chaque cours permet de prép
## Cours
-|![](./sketchnotes/00-Roadmap.png)|
+|![](../sketchnotes/00-Roadmap.png)|
|:---:|
| Data Science For Beginners: Roadmap - _Sketchnote réalisé par [@nitya](https://twitter.com/nitya)_ |
From 20727d12c2389a45dda3c3e0e7687672620c3070 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 14:36:09 +0300
Subject: [PATCH 046/319] main readme translated
---
translations/README.ru.md | 110 ++++++++++++++++++++++++++++++++++++++
1 file changed, 110 insertions(+)
create mode 100644 translations/README.ru.md
diff --git a/translations/README.ru.md b/translations/README.ru.md
new file mode 100644
index 00000000..ddaf5af8
--- /dev/null
+++ b/translations/README.ru.md
@@ -0,0 +1,110 @@
+# Наука о данных для начинающих - Учебный план
+
+[](https://github.com/microsoft/Data-Science-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/network/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/stargazers/)
+
+Команда Azure Cloud Advocates от компании Microsoft рада представить вам десятинедельный учебный курс по науке о данных, разбитый на 20 уроков. Каждый урок содержит вступительный и проверочный тесты, инструкции для прохождения, решение и домашнее задание. Мы выбрали методику проектно-ориентированного обучения как проверенный способ освоения новых навыков. Она помогает Вам учиться в процессе работы над проектом.
+
+**Выражаем благодарность нашим авторам:** [Jasmine Greenaway](https://www.twitter.com/paladique), [Dmitry Soshnikov](http://soshnikov.com), [Nitya Narasimhan](https://twitter.com/nitya), [Jalen McGee](https://twitter.com/JalenMcG), [Jen Looper](https://twitter.com/jenlooper), [Maud Levy](https://twitter.com/maudstweets), [Tiffany Souterre](https://twitter.com/TiffanySouterre), [Christopher Harrison](https://www.twitter.com/geektrainer).
+
+**🙏 Отдельная благодарность 🙏 нашей команде авторов Microsoft Student Ambassador и редакторам,** в особенности [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
+
+|![](./sketchnotes/00-Title.png)|
+|:---:|
+| Data Science For Beginners - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+
+# Начало работы
+
+> **Дорогие учителя**, мы [добавили наши рекомендации](for-teachers.md) по работе с курсом. Мы будем рады получить ваши отзывы [на нашем форуме](https://github.com/microsoft/Data-Science-For-Beginners/discussions)!
+
+> **Дорогие студенты**, для самостоятельного прохождения курса сделайте форк всего репозитория, выполните задания самостоятельно, начиная со вступительных тестов, а после прочтения лекции, выполните оставшуюся часть урока. Постарайтесь достигнуть понимания при выполнении заданий и избегайте копирования решения, несмотря на то, что решение доступно в папке `/solutions` для каждого мини-проекта. Отличной идеей также является организовать учебную группу со своими друзьями и пройти этот курс вместе. Для дальнейшего обучения мы рекомендуем портал [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-40229-cxa).
+
+
+
+
+## О методике обучения
+
+Мы выбрали два ключевых принципа при разработке данного учебного курса: проектная ориентированность и частая проверка знаний. К концу занятий учащиеся изучат основные принципы науки о данных, среди которых этические аспекты работы с данными, подготовка данных, различные способы обработки данных, визуализация данных, анализ данных, примеры практического использования науки о данных и многое другое.
+
+В дополнение к этому, незначительные тесты перед началом урока помогут мотивировать учеников к изучению темы, а заключительный тест проверит усвоение материала. Мы постарались сделать данный курс гибким и нескучным, поэтому вы можете пройти его полностью или только некоторые разделы. По мере прохождения десятинедельного курса проекты будут становиться всё сложнее.
+
+> Ознакомьтесь с нашими [правилами поведения](CODE_OF_CONDUCT.md), [сотрудничества](CONTRIBUTING.md), [перевода](TRANSLATIONS.md). Мы приветствуем конструктивную критику.
+
+## Каждый урок включает в себя:
+
+- Небольшой скетч (необязательно)
+- Вспомогательное видео (необязательно)
+- Вступительный тест
+- Учебный материал
+- Пошаговую инструкцию для выполнения проекта (для проектно-ориентированных уроков)
+- Проверку знаний
+- Задачу для выполнения
+- Дополнительные материалы
+- Домашнее задание
+- Проверочный тест
+
+> **О тестах**: Все тесты Вы можете найти [в этом приложении](https://red-water-0103e7a0f.azurestaticapps.net/), их всего 40 по три вопроса в каждом. Ссылки на них находятся внутри уроков, однако приложение можно запустить и локально: следуйте инструкциям в папке `quiz-app`. Постепенно тесты будут локализованы.
+
+## Содержание уроков
+
+
+|![](./sketchnotes/00-Roadmap.png)|
+|:---:|
+| Data Science For Beginners: Roadmap - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+
+| Номер урока | Тема | Раздел | Цели | Ссылка | Автор |
+| :-----------: | :----------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :----: |
+| 01 | Что такое наука о данных | [Введение](1-Introduction/README.md) | Изучить основные понятия науки о данных и её связь с искусственным интеллектом, машинным обучением и большими данными. | [урок](1-Introduction/01-defining-data-science/README.md) [видео](https://youtu.be/beZ7Mb_oz9I) | [Dmitry](http://soshnikov.com) |
+| 02 | Этика и наука о данных | [Введение](1-Introduction/README.md) | Этические аспекты в области науки о данных. | [урок](1-Introduction/02-ethics/README.md) | [Nitya](https://twitter.com/nitya) |
+| 03 | Что такое данные | [Введение](1-Introduction/README.md) | Классификация данных и их источники. | [урок](1-Introduction/03-defining-data/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 04 | Введение в статистику и теорию вероятности | [Введение](1-Introduction/README.md) | Вероятностные и статистические приёмы для изучения данных.| [урок](1-Introduction/04-stats-and-probability/README.md) [видео](https://youtu.be/Z5Zy85g4Yjw) | [Dmitry](http://soshnikov.com) |
+| 05 | Работа с реляционными данными | [Работа с данными](2-Working-With-Data/README.md) | Введение в реляционные данные, основы изучения и анализа реляционных данных при помощи структурированного языка запросов, также известного как SQL (произносится “си-квел”). | [урок](2-Working-With-Data/05-relational-databases/README.md) | [Christopher](https://www.twitter.com/geektrainer) | | |
+| 06 | Работа с NoSQL данными | [Работа с данными](2-Working-With-Data/README.md) | Введение в нереляционные данные, их разнообразие и основы работы с документоориентированными базами данных. | [урок](2-Working-With-Data/06-non-relational/README.md) | [Jasmine](https://twitter.com/paladique)|
+| 07 | Работа с языком программирования Python | [Работа с данными](2-Working-With-Data/README.md) | Основы использования языка Python при исследовании данных на примере библиотеки Pandas. Рекомендуется предварительно познакомиться с Python. | [урок](2-Working-With-Data/07-python/README.md) [видео](https://youtu.be/dZjWOGbsN4Y) | [Dmitry](http://soshnikov.com) |
+| 08 | Подготовка данных | [Работа с данными](2-Working-With-Data/README.md) | Методы очистки и трансформации данных для работы с пропусками, ошибками и неполными данными. | [урок](2-Working-With-Data/08-data-preparation/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 09 | Визуализация количественных данных | [Визуализация данных](3-Data-Visualization/README.md) | Использование библиотеки Matplotlib для визуализации данных о разнообразии птиц 🦆 | [урок](3-Data-Visualization/09-visualization-quantities/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 10 | Визуализация распределения данных | [Визуализация данных](3-Data-Visualization/README.md) | Визуализация наблюдений и трендов на временнóм интервале | [урок](3-Data-Visualization/10-visualization-distributions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 11 | Визуализация пропорций | [Визуализация данных](3-Data-Visualization/README.md) | Визуализация дискретных и сгруппированных процентных соотношений. | [урок](3-Data-Visualization/11-visualization-proportions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 12 | Визуализация связей | [Визуализация данных](3-Data-Visualization/README.md) | Визуализация связей и корреляций между наборами данных и их переменными. | [урок](3-Data-Visualization/12-visualization-relationships/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 13 | Выразительная визуализация | [Визуализация данных](3-Data-Visualization/README.md) | Методы и рекомендации по построению визуализаций для эффективного решения проблем и получения инсайтов. | [урок](3-Data-Visualization/13-meaningful-visualizations/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 14 | Введение в жизненный цикл проекта в области науки о данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Введение в жизненный цикл проекта в области науки о данных и его первый этап получения и извлечения данных. | [урок](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
+| 15 | Анализ данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сосредоточен на методах анализа данных. | [урок](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
+| 16 | Взаимодействие | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сфокусирован на презентации инсайтов в данных в виде, легком для понимания лицами, принимающими решения. | [урок](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
+| 17 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Данная серия уроков знакомит с применением облачных технологий в науке о данных и их преимуществами. | [урок](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 18 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Обучение моделей с минимальным использованием программирования. |[урок](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 19 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Развёртывание моделей с использованием Azure Machine Learning Studio. | [урок](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 20 | Наука о данных на практике | [На практике](6-Data-Science-In-Wild/README.md) | Проекты в области науки о данных на практике. | [урок](6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
+
+## Оффлайн доступ
+
+Вы можете запустить данную документацию используя [Docsify](https://docsify.js.org/#/). Сделайте форк данного репозитория, [установите Docsify](https://docsify.js.org/#/quickstart) на Вашем компьютере, и затем введите команду `docsify serve` в корневом разделе репозитория. Веб-сайт будет доступен на порте 3000 Вашей локальной машины: `localhost:3000`.
+
+
+> Отмечаем, что Docsify не поддерживает Jupyter-ноутбуки. Для работы с ними используйте VS Code с запуском ядра Python.
+
+## PDF файлы
+
+PDF файлы всех уроков Вы можете найти [здесь](https://microsoft.github.io/Data-Science-For-Beginners/pdf/readme.pdf).
+
+## Ищем помощников!
+
+Если вы хотите поучаствовать в переводе курса, прочтите нашу [инструкцию по переводу](TRANSLATIONS.md).
+
+## Другие учебные курсы
+
+Наша команда разрабатывает и другие курсы. Познакомьтесь с ними:
+
+- [Машинное обучение для начинающих](https://aka.ms/ml-beginners)
+- [Интернет вещей для начинающих](https://aka.ms/iot-beginners)
+- [Веб-разработка для начинающих](https://aka.ms/webdev-beginners)
From 9cfcd091539a60f70810511c628d00fe67051520 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 14:56:22 +0300
Subject: [PATCH 047/319] picture bug fixed
---
translations/README.ru.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/translations/README.ru.md b/translations/README.ru.md
index ddaf5af8..a8e91f34 100644
--- a/translations/README.ru.md
+++ b/translations/README.ru.md
@@ -16,7 +16,7 @@
**🙏 Отдельная благодарность 🙏 нашей команде авторов Microsoft Student Ambassador и редакторам,** в особенности [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
-|![](./sketchnotes/00-Title.png)|
+|![](../sketchnotes/00-Title.png)|
|:---:|
| Data Science For Beginners - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
@@ -58,7 +58,7 @@
## Содержание уроков
-|![](./sketchnotes/00-Roadmap.png)|
+|![](../sketchnotes/00-Roadmap.png)|
|:---:|
| Data Science For Beginners: Roadmap - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
From 89f7d6ef1c88444465f13998d91c197473c62149 Mon Sep 17 00:00:00 2001
From: Izael
Date: Thu, 7 Oct 2021 09:15:59 -0300
Subject: [PATCH 048/319] Translated the Base README.md to pt-br
---
translations/README.pt-br.md | 106 +++++++++++++++++++++++++++++++++++
1 file changed, 106 insertions(+)
create mode 100644 translations/README.pt-br.md
diff --git a/translations/README.pt-br.md b/translations/README.pt-br.md
new file mode 100644
index 00000000..06100182
--- /dev/null
+++ b/translations/README.pt-br.md
@@ -0,0 +1,106 @@
+# Ciência de Dados para Iniciantes - Um Currículo
+
+[](https://github.com/microsoft/Data-Science-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/network/)
+[](https://GitHub.com/microsoft/Data-Science-For-Beginners/stargazers/)
+
+Consultores da Azure Cloud na Microsoft estão felizes em oferecer um currículo de 10 semanas com 20 aulas sobre Ciência de Dados. Cada aula inclui quizzes pré e pós aula, instruções sobre como completar cada aula, uma solução, e uma tarefa. Nossa pedagogia baseada em projetos permite que você aprenda enquanto constrói, uma maneira comprovada para novas habilidades "grudarem".
+
+**Muito obrigado aos nossos autores:** [Jasmine Greenaway](https://www.twitter.com/paladique), [Dmitry Soshnikov](http://soshnikov.com), [Nitya Narasimhan](https://twitter.com/nitya), [Jalen McGee](https://twitter.com/JalenMcG), [Jen Looper](https://twitter.com/jenlooper), [Maud Levy](https://twitter.com/maudstweets), [Tiffany Souterre](https://twitter.com/TiffanySouterre), [Christopher Harrison](https://www.twitter.com/geektrainer).
+
+**🙏 Agradecimentos especiais 🙏 para nossos autores, revisores e contribuidores de conteúdo Estudantes Embaixadores da Microsoft,** notavelmente [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
+
+|![](./sketchnotes/00-Title.png)|
+|:---:|
+| Ciência de Dados para Iniciantes - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
+
+
+# Primeiros Passos
+
+> **Professores**, nós [incluímos algumas sugestões](for-teachers.md) em como usar esse currículo. Nós adoraríamos ouvir o seu feedback [no nosso fórum de discussão](https://github.com/microsoft/Data-Science-For-Beginners/discussions)!
+
+> **Estudantes**, para usar esse currículo por conta própria, dê fork nesse repositório, complete os exercícios por sua conta, começando com um quiz pré aula, então leia a aula completando o resto das atividades. Tente criar os projetos compreendendo as aulas ao invés de copiar o código da solução; no entanto o código está disponível na pasta /solutions em cada aula baseada em projeto. Outra ideia seria formar um grupo de estudo com seus amigos e ler o conteúdo juntos. Para mais estudos, nós recomendamos [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/qprpajyoy3x0g7?WT.mc_id=academic-40229-cxa).
+
+
+
+## Pedagogia
+
+Nós escolhemos dois princípios pedagógicos enquanto construíamos esse currículo: garantir que seja baseado em projeto e que possua quizzes frequentes. Ao final dessa série, estudantes terão aprendido o básico dos princípios de ciência de dados, incluindo conceitos éticos, preparação dos dados, maneiras diferentes de trabalhar com os dados, visualização de dados, análise de dados, casos de uso de ciência de dados no mundo real, e mais.
+
+Além do mais, um quiz com valor baixo antes da aula define a intenção do estudante em relação à aprendizagem de um tópico, enquanto um segundo quiz depois da aula garante uma retenção maior. Esse currículo foi desenhado para ser flexível e divertido e pode ser feito por inteiro ou em partes. Os projetos começam pequenos e vão ficando mais complexos ao final do ciclo de 10 semanas.
+
+> Encontre nossos guias de [Código de Conduta](CODE_OF_CONDUCT.md), [Contribuindo](CONTRIBUTING.md), [Tradução](TRANSLATIONS.md). Nós agradecemos seu feedback construtivo!
+
+## Cada aula inclui:
+
+- Nota de esboço opcional
+- Vídeo suplementar opcional
+- Quiz de aquecimento pré-aula
+- Aula escrita
+- Para aulas baseadas em projetos, guias passo-a-passo sobre como construir o projeto
+- Verificação de conhecimento
+- Um desafio
+- Leituras suplementares
+- Tarefa
+- Quiz pós-aula
+
+> **Nota sobre os quizzes**: Todos os quizzes estão [aqui](https://red-water-0103e7a0f.azurestaticapps.net/), num total de 40 quizzes de três questões cada. Os links deles estão dentro de cada aula, mas o "quiz-app" pode ser executado localmente; siga as instruções na pasta `quiz-app`. Eles estão sendo localizados gradualmente.
+
+## Aulas
+
+
+|![](./sketchnotes/00-Roadmap.png)|
+|:---:|
+| Ciência de Dados para Iniciantes: Roadmap - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
+
+
+| Número da Aula | Tópico | Agrupamento de Aulas | Objetivos de Aprendizado | Link da Aula | Autor |
+| :-----------: | :----------------------------------------: | :--------------------------------------------------: | :-----------------------------------------------------------------------------------------------------------------------------------------------------------------------: | :---------------------------------------------------------------------: | :----: |
+| 01 | Definindo Ciência de Dados | [Introdução](1-Introduction/README.md) | Aprenda os conceitos básicos por trás de ciência de dados e como se relaciona com inteligência artificial, aprendizado de máquina, e big data. | [aula](1-Introduction/01-defining-data-science/README.md) [vídeo](https://youtu.be/pqqsm5reGvs) | [Dmitry](http://soshnikov.com) |
+| 02 | Ética de Ciência de Dados | [Introdução](1-Introduction/README.md) | Conceitos de Ética em Ciência de Dados, Desafios e Frameworks. | [aula](1-Introduction/02-ethics/README.md) | [Nitya](https://twitter.com/nitya) |
+| 03 | Definindo Dados | [Introdução](1-Introduction/README.md) | Como dados são classificados e suas fontes de origem comuns. | [aula](1-Introduction/03-defining-data/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 04 | Introdução à Probabilidade e Estatística | [Introdução](1-Introduction/README.md) | As técnicas matemáticas de probabilidade e estatística para entender dados. | [aula](1-Introduction/04-stats-and-probability/README.md) [vídeo](https://youtu.be/Z5Zy85g4Yjw) | [Dmitry](http://soshnikov.com) |
+| 05 | Trabalhando com Dados Relacionais | [Trabalhando com Dados](2-Working-With-Data/README.md) | Introdução a dados relacionais e o básico de exploração e análise de dados relacionais com a Linguagem de Consulta Estruturada (Structured Query Language), também conhecida como SQL (pronunciada “see-quell”). | [aula](2-Working-With-Data/05-relational-databases/README.md) | [Christopher](https://www.twitter.com/geektrainer) | | |
+| 06 | Trabalhando com Dados NoSQL | [Trabalhando com Dados](2-Working-With-Data/README.md) | Introdução a dados não relacionais, seus variados tipos e o básico de exploração e análise de bancos de dados de documentos. | [aula](2-Working-With-Data/06-non-relational/README.md) | [Jasmine](https://twitter.com/paladique)|
+| 07 | Trabalhando com Python | [Trabalhando com Dados](2-Working-With-Data/README.md) | Básico de Python para exploração de dados com bibliotecas como o Pandas. Compreensão fundamental de Python é recomendado. | [aula](2-Working-With-Data/07-python/README.md) [vídeo](https://youtu.be/dZjWOGbsN4Y) | [Dmitry](http://soshnikov.com) |
+| 08 | Preparação dos Dados | [Trabalhando com Dados](2-Working-With-Data/README.md) | Tópicos sobre técnicas de limpeza e transformação de dados para lidar com desafios de dados ausentes, imprecisos ou incompletos. | [aula](2-Working-With-Data/08-data-preparation/README.md) | [Jasmine](https://www.twitter.com/paladique) |
+| 09 | Visualizando Quantidades | [Visualização de Dados](3-Data-Visualization/README.md) | Aprenda a como usar o Matplotlib para visualizar dados sobre pássaros 🦆 | [aula](3-Data-Visualization/09-visualization-quantities/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 10 | Visualizando Distribuições de Dados | [Visualização de Dados](3-Data-Visualization/README.md) | Visualizando observações e tendências dentro de um intervalo. | [aula](3-Data-Visualization/10-visualization-distributions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 11 | Visualizando Proporções | [Visualização de Dados](3-Data-Visualization/README.md) | Visualizando porcentagens discretas e agrupadas. | [aula](3-Data-Visualization/11-visualization-proportions/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 12 | Visualizando Relações | [Visualização de Dados](3-Data-Visualization/README.md) | Visualizando conexões e correlações entre conjuntos de dados e suas variáveis. | [aula](3-Data-Visualization/12-visualization-relationships/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 13 | Visualizações Significativas | [Visualização de Dados](3-Data-Visualization/README.md) | Técnicas e orientações para tornar suas visualizações valiosas na resolução eficaz de problemas e na obtenção de insights. | [aula](3-Data-Visualization/13-meaningful-visualizations/README.md) | [Jen](https://twitter.com/jenlooper) |
+| 14 | Introdução ao ciclo de vida de Ciência de Dados | [Ciclo de Vida](4-Data-Science-Lifecycle/README.md) | Introdução ao ciclo de vida de ciência de dados e seu primeiro passo de adquirir e extrair dados. | [aula](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
+| 15 | Análise | [Ciclo de Vida](4-Data-Science-Lifecycle/README.md) | Essa fase do ciclo de vida de ciência de dados foca nas técnicas de análise dados. | [aula](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
+| 16 | Comunicação | [Ciclo de Vida](4-Data-Science-Lifecycle/README.md) | Essa fase do ciclo de vida de ciência de dados foca em apresentar as intuições dos dados de uma forma que fique fácil para tomadores de decisão entenderem. | [aula](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
+| 17 | Ciência de Dados na Nuvem | [Dados na Nuvem](5-Data-Science-In-Cloud/README.md) | Esse conjunto de aulas introduz a ciência de dados na nuvem e seus benefícios. | [aula](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) e [Maud](https://twitter.com/maudstweets) |
+| 18 | Ciência de Dados na Nuvem | [Dados na Nuvem](5-Data-Science-In-Cloud/README.md) | Treinando modelos usando ferramentas de Low Code. |[aula](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) e [Maud](https://twitter.com/maudstweets) |
+| 19 | Ciência de Dados na Nuvem | [Dados na Nuvem](5-Data-Science-In-Cloud/README.md) | Implantando modelos com Azure Machine Learning Studio. | [aula](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) e [Maud](https://twitter.com/maudstweets) |
+| 20 | Ciência de Dados na Selva | [Na Selva](6-Data-Science-In-Wild/README.md) | Projetos de Ciência de Dados no mundo real. | [aula](6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
+## Acesso offline
+
+Você pode executar essa documentação offline usando [Docsify](https://docsify.js.org/#/). Dê fork nesse repositório, [instale o Docsify](https://docsify.js.org/#/quickstart) na sua máquina local e depois, na pasta raiz desse repositório, digite `docsify serve`. O website vai usar a porta 3000 no seu localhost: `localhost:3000`.
+
+> Observação: notebooks não serão renderizados via Docsify; quando precisar rodar um notebook, faça isso separadamente no VS Code rodando um kernel Python.
+## PDF
+
+Um PDF com todas as aulas pode ser encontrado [aqui](https://microsoft.github.io/Data-Science-For-Beginners/pdf/readme.pdf).
+
+## Procura-se Ajuda!
+
+Se você quer traduzir todo o currículo ou parte dele, por favor siga o nosso guia de [Tradução](TRANSLATIONS.md).
+
+## Outros Currículos
+
+Nosso time produz outros currículos! Confira:
+
+- [Aprendizado de Máquina para Iniciantes](https://aka.ms/ml-beginners)
+- [IoT para Iniciantes](https://aka.ms/iot-beginners)
+- [Desenvolvimento Web para Iniciantes](https://aka.ms/webdev-beginners)
From 6d0b978baeb9f582e84c19ce514645de6c254019 Mon Sep 17 00:00:00 2001
From: Izael
Date: Thu, 7 Oct 2021 09:26:44 -0300
Subject: [PATCH 049/319] Translated Introduction Base README and path fixes
---
1-Introduction/translations/README.pt-br.md | 17 +++++++++++++++++
translations/README.pt-br.md | 4 ++--
2 files changed, 19 insertions(+), 2 deletions(-)
create mode 100644 1-Introduction/translations/README.pt-br.md
diff --git a/1-Introduction/translations/README.pt-br.md b/1-Introduction/translations/README.pt-br.md
new file mode 100644
index 00000000..ecba7b04
--- /dev/null
+++ b/1-Introduction/translations/README.pt-br.md
@@ -0,0 +1,17 @@
+# Introdução à Ciência de Dados
+
+
+> Foto por Stephen Dawson em Unsplash
+
+Nessas aulas, você irá descobrir como a Ciência de Dados é definida e aprenderá sobre as considerações éticas que devem ser levadas em conta por um cientista de dados. Você também irá aprender como dados são definidos e um pouco sobre estatística e probabilidade, os principais domínios acadêmicos da Ciência de Dados.
+
+### Tópicos
+
+1. [Definindo Ciência de Dados](01-defining-data-science/README.md)
+2. [Ética da Ciência de Dados](02-ethics/README.md)
+3. [Definindo Dados](03-defining-data/README.md)
+4. [Introdução à Estatística e Probabilidade](04-stats-and-probability/README.md)
+
+### Créditos
+
+Essas aulas foram escritas com ❤️ por [Nitya Narasimhan](https://twitter.com/nitya) e [Dmitry Soshnikov](https://twitter.com/shwars).
diff --git a/translations/README.pt-br.md b/translations/README.pt-br.md
index 06100182..aa5b1ad4 100644
--- a/translations/README.pt-br.md
+++ b/translations/README.pt-br.md
@@ -16,7 +16,7 @@ Consultores da Azure Cloud na Microsoft estão felizes em oferecer um currículo
**🙏 Agradecimentos especiais 🙏 para nossos autores, revisores e contribuidores de conteúdo Estudantes Embaixadores da Microsoft,** notavelmente [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
-| ](./sketchnotes/00-Title.png)|
+| ](../sketchnotes/00-Title.png)|
|:---:|
| Ciência de Dados para Iniciantes - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
@@ -57,7 +57,7 @@ Além do mais, um quiz com valor baixo antes da aula define a intenção do estu
## Tarefas
-| ](./sketchnotes/00-Roadmap.png)|
+| ](../sketchnotes/00-Roadmap.png)|
|:---:|
| Ciência de Dados para Iniciantes: Roadmap - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
From d802b3066fe0653ce931d8891e0eff2bbe195615 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 16:07:06 +0300
Subject: [PATCH 050/319] 1-base readme translated
---
1-Introduction/translations/README.ru.md | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
create mode 100644 1-Introduction/translations/README.ru.md
diff --git a/1-Introduction/translations/README.ru.md b/1-Introduction/translations/README.ru.md
new file mode 100644
index 00000000..abff417a
--- /dev/null
+++ b/1-Introduction/translations/README.ru.md
@@ -0,0 +1,17 @@
+# Введение в науку о данных
+
+
+> Photo by Stephen Dawson on Unsplash
+
+Пройдя данные уроки, Вы узнаете, что такое наука о данных, и изучите этические аспекты, которые должен учитывать каждый дата-сайентист. Вы также узнаете, что такое данные, и немного познакомитесь со статистикой и теорией вероятностей, центральными областями науки о данных.
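Идею оценки вероятности по данным можно проиллюстрировать минимальным наброском на Python (пример условный, не из уроков): частота события в большой серии испытаний приближается к его вероятности.

```python
import random

# Условный пример: оцениваем вероятность "орла" у честной монеты
# по частоте в серии из 10 000 симулированных бросков.
random.seed(0)  # фиксируем генератор для воспроизводимости
flips = [random.random() < 0.5 for _ in range(10_000)]
freq = sum(flips) / len(flips)
print(freq)  # значение будет близко к 0.5
```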
+
+### Разделы
+
+1. [Что такое наука о данных](01-defining-data-science/README.md)
+2. [Этика и наука о данных](02-ethics/README.md)
+3. [Что такое данные](03-defining-data/README.md)
+4. [Введение в статистику и теорию вероятностей](04-stats-and-probability/README.md)
+
+### Благодарности
+
+Данные уроки были написаны с ❤️ [Nitya Narasimhan](https://twitter.com/nitya) и [Dmitry Soshnikov](https://twitter.com/shwars).
From 791bd227c15aa3441bf5822bb48fd23b26bb6906 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 16:36:02 +0300
Subject: [PATCH 051/319] 2-base readme translated
---
2-Working-With-Data/translations/README.ru.md | 17 +++++++++++++++++
1 file changed, 17 insertions(+)
create mode 100644 2-Working-With-Data/translations/README.ru.md
diff --git a/2-Working-With-Data/translations/README.ru.md b/2-Working-With-Data/translations/README.ru.md
new file mode 100644
index 00000000..0d0e865f
--- /dev/null
+++ b/2-Working-With-Data/translations/README.ru.md
@@ -0,0 +1,17 @@
+# Работа с данными
+
+
+> Photo by Alexander Sinn on Unsplash
+
+На этих уроках Вы изучите способы управления данными, методы работы с ними и как данные могут быть использованы в приложениях. Вы познакомитесь с реляционными и нереляционными базами данных и с тем, как они хранят данные. Вы овладеете основами обработки данных при помощи языка программирования Python.
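Самую простую обработку данных на Python можно набросать даже средствами стандартной библиотеки (данные в примере условные):

```python
# Условный пример: "таблица" как список словарей и базовая операция
# обработки - выбор столбца и расчёт среднего значения.
rows = [
    {"city": "Moscow", "temp": 21},
    {"city": "Kazan", "temp": 18},
    {"city": "Sochi", "temp": 27},
]

temps = [row["temp"] for row in rows]   # выбираем столбец
mean_temp = sum(temps) / len(temps)     # агрегируем
print(mean_temp)  # → 22.0
```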
+
+### Разделы
+
+1. [Реляционные базы данных](05-relational-databases/README.md)
+2. [Нереляционные базы данных](06-non-relational/README.md)
+3. [Работа с языком программирования Python](07-python/README.md)
+4. [Подготовка данных](08-data-preparation/README.md)
+
+### Благодарности
+
+Данные уроки были написаны с ❤️ [Christopher Harrison](https://twitter.com/geektrainer), [Dmitry Soshnikov](https://twitter.com/shwars) и [Jasmine Greenaway](https://twitter.com/paladique)
From ce9f44685c477ab0aaca90a98b5b627583c158fd Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 17:50:48 +0300
Subject: [PATCH 052/319] 3-base readme translated
---
3-Data-Visualization/translations/README.md | 29 +++++++++++++++++++++
1 file changed, 29 insertions(+)
create mode 100644 3-Data-Visualization/translations/README.md
diff --git a/3-Data-Visualization/translations/README.md b/3-Data-Visualization/translations/README.md
new file mode 100644
index 00000000..f2ae42be
--- /dev/null
+++ b/3-Data-Visualization/translations/README.md
@@ -0,0 +1,29 @@
+# Визуализация данных
+
+
+> Photo by Jenna Lee on Unsplash
+
+
+Визуализация данных - это одна из важнейших задач дата-сайентиста. Одним графиком можно заменить тысячу слов. Именно визуализация может помочь Вам распознать все особенности Ваших данных, такие как всплески, выбросы, группы, тренды и др., и понять, какую историю хранят в себе Ваши данные.
+
+В этих пяти уроках Вам предлагается исследовать данные о природе и создать красивую визуализацию с использованием различных инструментов.
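Идею того, как визуализация делает количества наглядными, можно показать игрушечным наброском (данные условные): даже текстовая гистограмма читается быстрее, чем столбец чисел.

```python
# Условный пример: текстовая гистограмма количества по категориям.
counts = {"apples": 5, "pears": 3, "plums": 8}
bars = {name: "#" * n for name, n in counts.items()}
for name, bar in bars.items():
    print(f"{name:8} {bar}")  # у plums будет самый длинный "столбец"
```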
+
+### Разделы
+
+1. [Визуализация количественных данных](09-visualization-quantities/README.md)
+1. [Визуализация распределения данных](10-visualization-distributions/README.md)
+1. [Визуализация пропорций](11-visualization-proportions/README.md)
+1. [Визуализация связей](12-visualization-relationships/README.md)
+1. [Выразительная визуализация](13-meaningful-visualizations/README.md)
+
+### Благодарности
+
+Данные уроки были написаны с 🌸 [Джен Лупер](https://twitter.com/jenlooper).
+
+🍯 Данные о производстве мёда в США хранятся в проекте Джессики Ли на портале [Kaggle](https://www.kaggle.com/jessicali9530/honey-production). [Данные](https://usda.library.cornell.edu/concern/publications/rn301137d) были получены от [министерства сельского хозяйства США](https://www.nass.usda.gov/About_NASS/index.php).
+
+🍄 Данные о разнообразии грибов выложены при содействии Хаттерас Дантон и также хранятся на портале [Kaggle](https://www.kaggle.com/hatterasdunton/mushroom-classification-updated-dataset). Данный датасет содержит экземпляры 23 видов Агариковых (Пластинчатых) грибов семейства Шампиньоновые. Описания грибов взяты из книги "The Audubon Society Field Guide to North American Mushrooms" 1981 года. Данный датасет был передан репозиторию UCI ML в 1987 году.
+
+🦆 Данные о разнообразии птиц Миннесоты расположены на портале [Kaggle](https://www.kaggle.com/hannahcollins/minnesota-birds) и были собраны с сайта [Wikipedia](https://en.wikipedia.org/wiki/List_of_birds_of_Minnesota) Ханной Коллинс.
+
+Все датасеты распространяются по лицензии [CC0: Creative Commons](https://creativecommons.org/publicdomain/zero/1.0/).
\ No newline at end of file
From 69fbc7cd2ee95c23634acb36eff5fed7005fab55 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 18:02:23 +0300
Subject: [PATCH 053/319] naming bug fixed
---
3-Data-Visualization/translations/{README.md => README.ru.md} | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename 3-Data-Visualization/translations/{README.md => README.ru.md} (100%)
diff --git a/3-Data-Visualization/translations/README.md b/3-Data-Visualization/translations/README.ru.md
similarity index 100%
rename from 3-Data-Visualization/translations/README.md
rename to 3-Data-Visualization/translations/README.ru.md
From 147b4d294fbcc2514a1d09ed51c90977ae9b6bf8 Mon Sep 17 00:00:00 2001
From: Jasmine Greenaway
Date: Thu, 7 Oct 2021 11:05:08 -0400
Subject: [PATCH 054/319] Update README.md
---
README.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/README.md b/README.md
index 86c34bc3..679626f1 100644
--- a/README.md
+++ b/README.md
@@ -14,7 +14,7 @@ Azure Cloud Advocates at Microsoft are pleased to offer a 10-week, 20-lesson cur
**Hearty thanks to our authors:** [Jasmine Greenaway](https://www.twitter.com/paladique), [Dmitry Soshnikov](http://soshnikov.com), [Nitya Narasimhan](https://twitter.com/nitya), [Jalen McGee](https://twitter.com/JalenMcG), [Jen Looper](https://twitter.com/jenlooper), [Maud Levy](https://twitter.com/maudstweets), [Tiffany Souterre](https://twitter.com/TiffanySouterre), [Christopher Harrison](https://www.twitter.com/geektrainer).
-**🙏 Special thanks 🙏 to our Microsoft Student Ambassador authors, reviewers and content contributors,** notably [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), Yogendrasingh Pawar, Max Blum, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
+**🙏 Special thanks 🙏 to our Microsoft Student Ambassador authors, reviewers and content contributors,** notably [Raymond Wangsa Putra](https://www.linkedin.com/in/raymond-wp/), [Ankita Singh](https://www.linkedin.com/in/ankitasingh007), [Rohit Yadav](https://www.linkedin.com/in/rty2423), [Arpita Das](https://www.linkedin.com/in/arpitadas01/), [Mohamma Iftekher (Iftu) Ebne Jalal](https://twitter.com/iftu119), [Dishita Bhasin](https://www.linkedin.com/in/dishita-bhasin-7065281bb), [Miguel Correa](https://www.linkedin.com/in/miguelmque/), [Nawrin Tabassum](https://www.linkedin.com/in/nawrin-tabassum), [Sanya Sinha](https://www.linkedin.com/mwlite/in/sanya-sinha-13aab1200), [Majd Safi](https://www.linkedin.com/in/majd-s/), [Sheena Narula](https://www.linkedin.com/in/sheena-narula-n/), [Anupam Mishra](https://www.linkedin.com/in/anupam--mishra/), [Dibri Nsofor](https://www.linkedin.com/in/dibrinsofor), [Aditya Garg](https://github.com/AdityaGarg00), [Alondra Sanchez](https://www.linkedin.com/in/alondra-sanchez-molina/), [Max Blum](https://www.linkedin.com/in/max-blum-6036a1186/), Yogendrasingh Pawar, Samridhi Sharma, Tauqeer Ahmad, Aaryan Arora, ChhailBihari Dubey
| ](./sketchnotes/00-Title.png)|
|:---:|
From f8256d292c8cafaa714b4fa94b1b29b1c3855d3d Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 18:17:11 +0300
Subject: [PATCH 055/319] 4-base readme translated, base readme improved
---
.../translations/README.ru.md | 16 ++++++++++++++++
translations/README.ru.md | 2 +-
2 files changed, 17 insertions(+), 1 deletion(-)
create mode 100644 4-Data-Science-Lifecycle/translations/README.ru.md
diff --git a/4-Data-Science-Lifecycle/translations/README.ru.md b/4-Data-Science-Lifecycle/translations/README.ru.md
new file mode 100644
index 00000000..cf05568e
--- /dev/null
+++ b/4-Data-Science-Lifecycle/translations/README.ru.md
@@ -0,0 +1,16 @@
+# Введение в жизненный цикл проекта в области науки о данных
+
+
+> Photo by Headway on Unsplash
+
+В данных уроках вы познакомитесь с этапами жизненного цикла проекта в области науки о данных, включая анализ данных и взаимодействие на их основе.
+
+### Разделы
+
+1. [Введение в жизненный цикл проекта в области науки о данных](14-Introduction/README.md)
+2. [Анализ данных](15-Analyzing/README.md)
+3. [Взаимодействие на основе данных](16-Communication/README.md)
+
+### Благодарности
+
+Данные уроки были написаны с ❤️ [Jalen McGee](https://twitter.com/JalenMCG) и [Jasmine Greenaway](https://twitter.com/paladique)
diff --git a/translations/README.ru.md b/translations/README.ru.md
index a8e91f34..04991d13 100644
--- a/translations/README.ru.md
+++ b/translations/README.ru.md
@@ -80,7 +80,7 @@
| 13 | Выразительная визуализация | [Визуализация данных](3-Data-Visualization/README.md) | Методы и инструкция для построения визуализации для эффективного решения проблем и получения инсайтов. | [урок](3-Data-Visualization/13-meaningful-visualizations/README.md) | [Jen](https://twitter.com/jenlooper) |
| 14 | Введение в жизненный цикл проекта в области науки о данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Введение в жизненный цикл проекта в области науки о данных и его первый этап получения и извлечения данных. | [урок](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
| 15 | Анализ данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сосредоточен на методах анализа данных. | [урок](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
-| 16 | Взаимодействие | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сфокусирован на презентацию инсайтов в данных в виде, легком для понимания лицам, принимающим решения. | [урок](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
+| 16 | Взаимодействие на основе данных| [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сфокусирован на презентацию инсайтов в данных в виде, легком для понимания лицам, принимающим решения. | [урок](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
| 17 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Данная серия уроков знакомит с применением облачных технологии в науке о данных и его преимуществах. | [урок](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
| 18 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Обучение моделей с минимальным использованием программирования. |[урок](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
| 19 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Развёртывание моделей с использованием Azure Machine Learning Studio. | [урок](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
From 7e3c9eb01744fc0db76bbf1bde3fd5f50a721d7e Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 19:26:48 +0300
Subject: [PATCH 056/319] 5-base readme translated
---
.../translations/README.ru.md | 22 +++++++++++++++++++
1 file changed, 22 insertions(+)
create mode 100644 5-Data-Science-In-Cloud/translations/README.ru.md
diff --git a/5-Data-Science-In-Cloud/translations/README.ru.md b/5-Data-Science-In-Cloud/translations/README.ru.md
new file mode 100644
index 00000000..cade9559
--- /dev/null
+++ b/5-Data-Science-In-Cloud/translations/README.ru.md
@@ -0,0 +1,22 @@
+# Наука о данных в облачной инфраструктуре
+
+
+
+> Photo by [Jelleke Vanooteghem](https://unsplash.com/@ilumire) from [Unsplash](https://unsplash.com/s/photos/cloud?orientation=landscape)
+
+Когда приходит время анализировать по-настоящему большие данные, использование облачных технологий может обеспечить неоспоримое преимущество. В следующих трёх уроках вы узнаете, что такое облачная инфраструктура и чем она может быть полезна. Для этого мы исследуем набор данных о сердечной недостаточности и построим модель оценки вероятности появления данной болезни. Мы применим все преимущества облачных технологий для тренировки, развёртывания и использования модели двумя способами. Первый способ - это использование только пользовательского интерфейса с минимальным применением программирования, второй - использование инструмента под названием Azure Machine Learning Software Development Kit (Azure ML SDK).
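Саму идею «модель оценивает риск по данным» можно проиллюстрировать условным наброском на Python (пациенты и порог вымышленные; в уроках модель строится средствами Azure ML):

```python
# Вымышленные пациенты: фракция выброса (ejection fraction) и исход.
patients = [
    {"ejection_fraction": 20, "died": 1},
    {"ejection_fraction": 60, "died": 0},
    {"ejection_fraction": 25, "died": 1},
    {"ejection_fraction": 55, "died": 0},
]

def predict(p, threshold=35):
    # Простейшее "правило-модель": низкая фракция выброса - выше риск.
    return 1 if p["ejection_fraction"] < threshold else 0

# Доля верных предсказаний на этих игрушечных данных.
accuracy = sum(predict(p) == p["died"] for p in patients) / len(patients)
print(accuracy)  # → 1.0
```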
+
+
+
+### Разделы
+
+1. [Преимущества облачной инфраструктуры для науки о данных.](17-Introduction/README.md)
+2. [Наука о данных в облачной инфраструктуре: подходы с минимальным использованием программирования и без него.](18-Low-Code/README.md)
+3. [Наука о данных в облачной инфраструктуре: применение Azure ML SDK](19-Azure/README.md)
+
+### Благодарности
+Данные уроки были написаны с ☁️ и 💕 [Maud Levy](https://twitter.com/maudstweets) и [Tiffany Souterre](https://twitter.com/TiffanySouterre)
+
+
+Данные для прогнозирования сердечной недостаточности были собраны [Larxel](https://www.kaggle.com/andrewmvd) и хранятся на портале [Kaggle](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data). Датасет распространяется по лицензии [Attribution 4.0 International (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/)
From 2c198fac816fe75f652b41d8ff9ec72caba87917 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 19:27:21 +0300
Subject: [PATCH 057/319] naming bug fixed, translation updated
---
README.md | 6 +++---
translations/README.ru.md | 6 +++---
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/README.md b/README.md
index 4843d1f8..b2b03ed8 100644
--- a/README.md
+++ b/README.md
@@ -80,9 +80,9 @@ In addition, a low-stakes quiz before a class sets the intention of the student
| 14 | Introduction to the Data Science lifecycle | [Lifecycle](4-Data-Science-Lifecycle/README.md) | Introduction to the data science lifecycle and its first step of acquiring and extracting data. | [lesson](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
| 15 | Analyzing | [Lifecycle](4-Data-Science-Lifecycle/README.md) | This phase of the data science lifecycle focuses on techniques to analyze data. | [lesson](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
| 16 | Communication | [Lifecycle](4-Data-Science-Lifecycle/README.md) | This phase of the data science lifecycle focuses on presenting the insights from the data in a way that makes it easier for decision makers to understand. | [lesson](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
-| 17 | Data Science in the Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | This series of lessons introduces data science in the cloud and its benefits. | [lesson](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
-| 18 | Data Science in the Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Training models using Low Code tools. |[lesson](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
-| 19 | Data Science in the Cloud | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Deploying models with Azure Machine Learning Studio. | [lesson](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 17 | Why use Cloud for Data Science? | [Cloud Data](5-Data-Science-In-Cloud/README.md) | This series of lessons introduces data science in the cloud and its benefits. | [lesson](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 18 | Data Science in the Cloud: The "Low code/No code" way | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Training models using Low Code tools. |[lesson](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 19 | Data Science in the Cloud: The "Azure ML SDK" way | [Cloud Data](5-Data-Science-In-Cloud/README.md) | Deploying models with Azure Machine Learning Studio. | [lesson](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
| 20 | Data Science in the Wild | [In the Wild](6-Data-Science-In-Wild/README.md) | Data science driven projects in the real world. | [lesson](6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
## Offline access
diff --git a/translations/README.ru.md b/translations/README.ru.md
index 04991d13..56aeb7cc 100644
--- a/translations/README.ru.md
+++ b/translations/README.ru.md
@@ -81,9 +81,9 @@
| 14 | Введение в жизненный цикл проекта в области науки о данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Введение в жизненный цикл проекта в области науки о данных и его первый этап получения и извлечения данных. | [урок](4-Data-Science-Lifecycle/14-Introduction/README.md) | [Jasmine](https://twitter.com/paladique) |
| 15 | Анализ данных | [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сосредоточен на методах анализа данных. | [урок](4-Data-Science-Lifecycle/15-Analyzing/README.md) | [Jasmine](https://twitter.com/paladique) | | |
| 16 | Взаимодействие на основе данных| [Жизненный цикл проекта](4-Data-Science-Lifecycle/README.md) | Данный этап жизненного цикла сфокусирован на презентацию инсайтов в данных в виде, легком для понимания лицам, принимающим решения. | [урок](4-Data-Science-Lifecycle/16-Communication/README.md) | [Jalen](https://twitter.com/JalenMcG) | | |
-| 17 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Данная серия уроков знакомит с применением облачных технологии в науке о данных и его преимуществах. | [урок](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
-| 18 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Обучение моделей с минимальным использованием программирования. |[урок](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
-| 19 | Наука о данных в облачной инфраструктуре | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Развёртывание моделей с использованием Azure Machine Learning Studio. | [урок](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 17 | Преимущества облачной инфраструктуры для науки о данных | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Данная серия уроков знакомит с применением облачных технологий в науке о данных и их преимуществами. | [урок](5-Data-Science-In-Cloud/17-Introduction/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 18 | Наука о данных в облачной инфраструктуре: подходы с минимальным использованием программирования и без него. | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Обучение моделей с минимальным использованием программирования. |[урок](5-Data-Science-In-Cloud/18-Low-Code/README.md) | [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
+| 19 | Наука о данных в облачной инфраструктуре: применение Azure ML SDK | [Облачные данные](5-Data-Science-In-Cloud/README.md) | Развёртывание моделей с использованием Azure Machine Learning Studio. | [урок](5-Data-Science-In-Cloud/19-Azure/README.md)| [Tiffany](https://twitter.com/TiffanySouterre) and [Maud](https://twitter.com/maudstweets) |
| 20 | Наука о данных на практике | [На практике](6-Data-Science-In-Wild/README.md) | Проекты в области науки о данных на практике. | [урок](6-Data-Science-In-Wild/20-Real-World-Examples/README.md) | [Nitya](https://twitter.com/nitya) |
## Оффлайн доступ
From ad0fc0ab250117e47c2fddfc737c66b3f081ed84 Mon Sep 17 00:00:00 2001
From: Mikhail Sadiakhmatov
Date: Thu, 7 Oct 2021 19:30:47 +0300
Subject: [PATCH 058/319] 6-base readme translated
---
6-Data-Science-In-Wild/translations/README.ru.md | 11 +++++++++++
1 file changed, 11 insertions(+)
create mode 100644 6-Data-Science-In-Wild/translations/README.ru.md
diff --git a/6-Data-Science-In-Wild/translations/README.ru.md b/6-Data-Science-In-Wild/translations/README.ru.md
new file mode 100644
index 00000000..235ab191
--- /dev/null
+++ b/6-Data-Science-In-Wild/translations/README.ru.md
@@ -0,0 +1,11 @@
+# Наука о данных на практике
+
+Примеры реального использования науки о данных в приложениях во многих отраслях.
+
+### Разделы
+
+1. [Наука о данных на практике](20-Real-World-Examples/README.md)
+
+### Благодарности
+
+Написано с ❤️ [Nitya Narasimhan](https://twitter.com/nitya)
From 53f9296d65e2dcb1ebacf8e05565585de2e97dd2 Mon Sep 17 00:00:00 2001
From: Izael
Date: Fri, 8 Oct 2021 09:08:05 -0300
Subject: [PATCH 059/319] Translated 1-01 Defining Data Science
---
.../translations/README.pt-br.md | 165 ++++++++++++++++++
1 file changed, 165 insertions(+)
create mode 100644 1-Introduction/01-defining-data-science/translations/README.pt-br.md
diff --git a/1-Introduction/01-defining-data-science/translations/README.pt-br.md b/1-Introduction/01-defining-data-science/translations/README.pt-br.md
new file mode 100644
index 00000000..b0c065a9
--- /dev/null
+++ b/1-Introduction/01-defining-data-science/translations/README.pt-br.md
@@ -0,0 +1,165 @@
+# Definindo Ciência de Dados
+
+| ](../../../sketchnotes/01-Definitions.png)|
+|:---:|
+|Definindo Ciência de Dados - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
+
+---
+
+[](https://youtu.be/pqqsm5reGvs)
+
+## [Quiz pré-aula](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
+
+## O que são Dados?
+Na nossa vida cotidiana, nós estamos constantemente cercados por dados. O texto que você está lendo agora é um dado, a lista de telefones dos seus amigos no seu celular é um dado, assim como o horário atual mostrado no seu relógio. Como seres humanos, nós operamos naturalmente com dados: contando o dinheiro que temos ou escrevendo cartas para os nossos amigos.
+
+No entanto, os dados se tornaram muito mais críticos com a criação de computadores. O papel principal dos computadores é realizar computações, mas eles precisam de dados sobre os quais operar. Portanto, nós precisamos entender como os computadores armazenam e processam dados.
+
+Com o surgimento da Internet, o papel dos computadores como dispositivos de manipulação de dados aumentou. Se você parar para pensar, agora nós usamos computadores cada vez mais para processamento de dados e comunicação, ao invés de cálculos reais. Quando escrevemos um e-mail para um amigo ou procuramos por alguma informação na Internet - nós estamos essencialmente criando, armazenando, transmitindo, e manipulando dados.
+> Você consegue se lembrar da última vez que usou computadores para de fato computar algo?
+
+## O que é Ciência de Dados?
+
+Na [Wikipedia](https://en.wikipedia.org/wiki/Data_science), **Ciência de Dados** é definida como *um campo científico que utiliza métodos científicos para extrair conhecimento e insights de dados estruturados e não estruturados, e aplicar esse conhecimento e esses insights acionáveis em uma ampla gama de domínios de aplicação*.
+
+Essa definição destaca os seguintes aspectos importantes da ciência de dados:
+
+* O principal objetivo da ciência de dados é **extrair conhecimento** dos dados, em outras palavras - **entender** os dados, encontrar alguma relação escondida e construir um **modelo**.
+* Ciência de dados utiliza **métodos científicos**, como probabilidade e estatística. Na verdade, quando o termo *ciência de dados* foi introduzido pela primeira vez, algumas pessoas argumentaram que ciência de dados é apenas um nome chique para estatística. Hoje em dia ficou mais evidente que esse campo é muito mais amplo.
+* O conhecimento adquirido deve ser aplicado para produzir alguns **insights acionáveis**.
+* Nós devemos ser capazes de operar tanto nos dados **estruturados** quanto nos **não estruturados**. Nós voltaremos a discutir diferentes tipos de dados mais para a frente no curso.
+* **Domínio de aplicação** é um conceito importante, e cientistas de dados frequentemente precisam de pelo menos algum grau de perícia no domínio do problema.
+
+> Outro aspecto importante da Ciência de Dados é que ela estuda como os dados podem ser coletados, armazenados e operados por meio de computadores. Enquanto a estatística nos fornece fundações matemáticas, a ciência de dados aplica conceitos matemáticos para de fato extrair percepções a partir dos dados.
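A diferença entre dados estruturados e não estruturados pode ser ilustrada com um esboço mínimo em Python (frase e padrão hipotéticos): extraímos um valor estruturado de um texto livre.

```python
import re

# Texto não estruturado (hipotético) do qual extraímos um dado estruturado.
texto = "O sensor registrou 23.5 graus no fim da tarde."
m = re.search(r"(\d+\.\d+) graus", texto)
temperatura = float(m.group(1))  # agora é um número, pronto para análise
print(temperatura)  # → 23.5
```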
+
+Uma das formas (atribuída a [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) de olhar para a ciência de dados é considerá-la um paradigma separado da ciência:
+* **Empírico**, onde nos baseamos majoritariamente nas observações e resultados dos experimentos
+* **Teórico**, onde novos conceitos surgem a partir de conhecimentos científicos já existentes
+* **Computacional**, onde nós descobrimos novos princípios baseado em algum experimento computacional
+* **Orientado por Dados**, baseado na descoberta de relações e padrões nos dados
+
+## Outros Campos Relacionados
+
+Já que dados são um conceito difundido, a ciência de dados em si também é um campo amplo, abrangendo muitas outras disciplinas relacionadas.
+
+
+
+**Banco de Dados**
+
+A coisa mais óbvia a considerar é **como armazenar** os dados, isto é, como estruturá-los de uma forma que permita um processamento rápido. Existem diferentes tipos de banco de dados que armazenam dados estruturados e não estruturados, que nós vamos considerar nesse curso.
+
+
+**Big Data**
+
+Frequentemente precisamos armazenar e processar quantidades muito grandes de dados com estruturas relativamente simples. Existem algumas abordagens e ferramentas especiais para armazenar esses dados de forma distribuída em um cluster de computadores, e processá-los de forma eficiente.
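
A ideia de processar dados distribuídos sem reuni-los em uma única máquina pode ser esboçada com o padrão map-reduce. Abaixo, um esboço hipotético em Python puro (partições e leituras inventadas apenas para ilustração): cada "nó" calcula agregados locais, que depois são combinados em um resultado global.

```python
from functools import reduce

# Cada "nó" do cluster processa apenas a sua partição dos dados (valores fictícios)
particoes = [
    ["22.1", "21.9", "22.4"],  # leituras no nó 1
    ["21.8", "22.0"],          # leituras no nó 2
]

# map: cada nó calcula agregados locais (soma, contagem)
parciais = [(sum(map(float, p)), len(p)) for p in particoes]

# reduce: os agregados locais são combinados em um resultado global
soma, contagem = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), parciais)
print(round(soma / contagem, 2))  # média global sem mover os dados brutos de lugar
```

Ferramentas reais de big data (como Spark e Hadoop) aplicam essa mesma ideia, só que com tolerância a falhas e em escala muito maior.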
+
+
+**Aprendizado de Máquina**
+
+Uma das maneiras de entender dados é **construir um modelo** que será capaz de predizer o resultado esperado. Ser capaz de aprender esses modelos a partir de dados é a área estudada em **aprendizado de máquina**. Você talvez queira olhar o nosso Currículo de Aprendizado de Máquina para Iniciantes para ir mais a fundo nessa área.
+
+
+**Inteligência Artificial**
+
+Como o aprendizado de máquina, a inteligência artificial também se baseia em dados, e envolve construir modelos de alta complexidade que exibem um comportamento similar ao dos seres humanos. Além disso, métodos de IA frequentemente nos permitem transformar dados não estruturados (ex. linguagem natural) em dados estruturados, extraindo deles algumas percepções.
+
+
+**Visualização**
+
+Vastas quantidades de dados são incompreensíveis para o ser humano, mas, uma vez que criamos visualizações úteis, nós podemos começar a dar muito mais sentido aos dados e tirar algumas conclusões. Portanto, é importante conhecer várias formas de visualizar informação - algo que vamos cobrir na Seção 3 do nosso curso. Áreas relacionadas também incluem **Infográficos** e **Interação Humano-Computador** no geral.
+
+
+
+## Tipos de Dados
+
+Como nós já mencionamos - dados estão em todos os lugares, nós só precisamos coletá-los da maneira certa! É útil distinguir entre dados **estruturados** e **não estruturados**. Os primeiros são tipicamente representados de alguma forma bem estruturada, frequentemente como uma ou várias tabelas, enquanto os segundos são apenas uma coleção de arquivos. Algumas vezes nós também podemos falar de dados **semi-estruturados**, que possuem alguma estrutura que pode variar muito.
+
+| Estruturado | Semi-estruturado | Não estruturado |
+|----------- |-----------------|--------------|
+| Lista de pessoas com seus números de telefones | Páginas da Wikipédia com links | Texto da Encyclopædia Britannica |
+| Temperatura de todos os quartos de um prédio a cada minuto nos últimos 20 anos | Coleções de artigos científicos em formato JSON com autores, datas de publicação e abstract | Compartilhamento de arquivos com documentos corporativos |
+| Dados para idades e gêneros de todas as pessoas entrando em um prédio | Páginas da Internet | Feed de vídeo bruto da câmera de vigilância |
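
Para tornar a distinção concreta, um esboço em Python com dados fictícios: num CSV estruturado cada campo tem nome e posição definidos; num texto livre, qualquer estrutura precisa ser extraída por nós.

```python
import csv
import io

# Dados estruturados: cada registro tem os mesmos campos, com nome e posição definidos
csv_bruto = """nome,telefone
Ana,555-0101
Bruno,555-0102
"""
registros = list(csv.DictReader(io.StringIO(csv_bruto)))
print(registros[0]["telefone"])

# Dados não estruturados: apenas texto livre; qualquer estrutura precisa ser extraída
texto = "A Ana ligou ontem do número 555-0101 para falar do relatório."
tem_telefone = "555-0101" in texto
print(tem_telefone)
```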
+
+## Onde conseguir Dados
+
+Existem muitas fontes possíveis de dados, e será impossível listar todas elas. No entanto, vamos mencionar alguns dos lugares típicos onde você pode obter dados:
+
+* **Estruturado**
+ - **Internet das Coisas**, incluindo dados de diferentes sensores, como sensores de temperatura ou de pressão, fornece muitos dados úteis. Por exemplo, se um escritório de um prédio é equipado com sensores IoT, nós podemos automaticamente controlar o aquecimento e a iluminação com o objetivo de minimizar custos.
+ - **Pesquisas** que podemos fazer para os usuários depois de uma compra ou de uma visita a um site.
+ - **Análise de comportamento** pode, por exemplo, nos ajudar a entender o quão longe um usuário vai dentro de um site, e qual é, tipicamente, a razão para ele abandonar o site.
+* **Não estruturado**
+ - **Textos** podem ser uma fonte rica de insights, desde a **pontuação geral de sentimento** (sentiment score) até a extração de palavras-chave e de algum significado semântico.
+ - **Imagens** ou **Vídeo**. Um vídeo de uma câmera de vigilância pode ser usado para estimar o tráfego na rua e informar as pessoas sobre possíveis engarrafamentos.
+ - **Logs** de servidores web podem ser usados para entender quais páginas do nosso site são mais visitadas, e por quanto tempo.
+* **Semi-estruturado**
+ - Grafos das **Redes Sociais** podem ser uma boa fonte de dados sobre a personalidade do usuário e a eficácia potencial em espalhar informações.
+ - Quando nós temos um monte de fotos de uma festa, nós podemos tentar extrair dados sobre **Dinâmicas de Grupo** construindo um grafo de pessoas tirando fotos umas das outras.
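
Por exemplo, transformar **logs** de um servidor web em contagens de visitas por página pode ser esboçado assim (as linhas de log abaixo são fictícias, num formato simplificado de log de acesso):

```python
from collections import Counter

# Linhas de log fictícias (formato simplificado de log de acesso)
logs = [
    '10.0.0.1 - - [01/Oct/2021:10:00:00] "GET /index.html HTTP/1.1" 200',
    '10.0.0.2 - - [01/Oct/2021:10:00:05] "GET /precos.html HTTP/1.1" 200',
    '10.0.0.1 - - [01/Oct/2021:10:01:00] "GET /index.html HTTP/1.1" 200',
]

# Extrai o caminho requisitado (segundo token dentro das aspas)
paginas = [linha.split('"')[1].split()[1] for linha in logs]
visitas = Counter(paginas)
print(visitas.most_common(1))  # página mais visitada
```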
+
+Conhecendo as diferentes fontes possíveis de dados, você pode tentar pensar em diferentes cenários onde técnicas de ciência de dados podem ser aplicadas para entender melhor a situação e melhorar os processos de negócio.
+
+## O que você pode fazer com Dados
+
+Em Ciência de Dados, nós focamos nos seguintes passos da jornada dos dados:
+
+
+
+**1) Aquisição de Dados**
+
+O primeiro passo é coletar os dados. Enquanto em muitos casos isso pode ser um processo direto, como dados chegando a um banco de dados a partir de uma aplicação web, algumas vezes nós precisamos usar técnicas especiais. Por exemplo, dados de sensores de IoT podem ser muito volumosos, e é uma boa prática usar endpoints de buffering, como o Hub de IoT, para coletar todos os dados antes de processá-los.
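
O padrão de "acumular antes de processar" pode ser esboçado com uma classe simples (nomes e números hipotéticos, apenas para ilustrar; um Hub de IoT real faz isso de forma distribuída e em escala muito maior):

```python
from statistics import mean

class BufferDeLeituras:
    """Acumula leituras de sensores e as processa em lotes (esboço hipotético)."""

    def __init__(self, tamanho_lote, processar):
        self.tamanho_lote = tamanho_lote
        self.processar = processar  # função chamada a cada lote completo
        self._lote = []

    def receber(self, leitura):
        self._lote.append(leitura)
        if len(self._lote) >= self.tamanho_lote:
            self.processar(self._lote)
            self._lote = []

# A cada 3 leituras de temperatura, calculamos a média do lote
medias = []
buffer = BufferDeLeituras(tamanho_lote=3, processar=lambda lote: medias.append(mean(lote)))
for temperatura in [21.0, 21.5, 22.0, 22.5, 23.0, 23.5]:
    buffer.receber(temperatura)
print(medias)  # uma média por lote de 3 leituras
```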
+
+
+**2) Armazenamento de Dados**
+
+Armazenar os dados pode ser desafiador, especialmente se estamos falando de big data. Ao decidir como armazenar os dados, faz sentido antecipar a forma como você gostaria de consultá-los mais tarde. Existem diversas formas de como os dados podem ser armazenados:
+
+
+**Bancos de dados relacionais** armazenam uma coleção de tabelas, e utilizam uma linguagem especial chamada SQL para consultá-las. Tipicamente, as tabelas são conectadas umas às outras usando algum schema. Em vários casos nós precisamos converter os dados da forma original para se ajustarem ao schema.
+
+**Bancos de dados NoSQL**, como o CosmosDB, não impõem um schema aos dados, e permitem o armazenamento de dados mais complexos, como por exemplo documentos hierárquicos JSON ou grafos. No entanto, bancos de dados NoSQL não possuem a rica capacidade de consulta do SQL, e não podem impor integridade referencial entre os dados.
+
+**Armazenamento em Data Lake** é usado para grandes coleções de dados na forma bruta. Data lakes são frequentemente usados para big data, quando os dados não cabem em uma única máquina e precisam ser armazenados e processados por um cluster. Parquet é o formato de dados frequentemente usado em conjunção com big data.
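
Um exemplo mínimo do caso relacional, usando o módulo `sqlite3` da biblioteca padrão do Python (tabela e dados fictícios):

```python
import sqlite3

# Banco relacional em memória: tabelas com schema fixo, consultadas via SQL
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE pessoas (nome TEXT, telefone TEXT)")
con.executemany(
    "INSERT INTO pessoas VALUES (?, ?)",
    [("Ana", "555-0101"), ("Bruno", "555-0102")],
)
con.commit()

# SQL permite consultas declarativas sobre esse schema
(total,) = con.execute("SELECT COUNT(*) FROM pessoas").fetchone()
print(total)
```

Bancos NoSQL e data lakes trocam esse schema fixo por mais flexibilidade, ao custo da capacidade de consulta descrita acima.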
+
+
+
+**3) Processamento de Dados**
+
+Essa é a parte mais emocionante da jornada dos dados, que envolve processar os dados de sua forma original para a forma que pode ser usada para visualização/treinamento do modelo. Ao lidar com dados não estruturados como textos ou imagens, nós podemos precisar de algumas técnicas de IA para extrair **features** dos dados, convertendo-os então para a forma estruturada.
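
Um esboço bem simples dessa conversão de não estruturado para estruturado: transformar texto livre em contagens de palavras (bag of words), usando só a biblioteca padrão do Python:

```python
from collections import Counter
import string

def extrair_features(texto):
    """Converte texto livre em uma feature estruturada: contagem de palavras."""
    sem_pontuacao = texto.translate(str.maketrans("", "", string.punctuation))
    return Counter(sem_pontuacao.lower().split())

features = extrair_features("Dados, dados e mais dados!")
print(features["dados"])
```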
+
+
+**4) Visualização / Percepções Humanas**
+
+Frequentemente, para entender os dados, precisamos visualizá-los. Tendo várias técnicas de visualização diferentes na nossa caixa de ferramentas, nós podemos encontrar a visualização certa para termos um insight. Frequentemente, cientistas de dados precisam "brincar com os dados", visualizando-os várias vezes e procurando alguma relação. Também podemos usar técnicas estatísticas para testar alguma hipótese ou provar uma correlação entre pedaços diferentes de dados.
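
Mesmo sem uma biblioteca gráfica, a ideia pode ser ilustrada com um gráfico de barras em texto (função e valores fictícios, apenas para mostrar o princípio):

```python
def grafico_de_barras(valores):
    """Visualização mínima em texto: uma barra de '#' por categoria (esboço hipotético)."""
    largura = max(len(rotulo) for rotulo in valores)
    linhas = [
        f"{rotulo.ljust(largura)} | {'#' * valor} {valor}"
        for rotulo, valor in valores.items()
    ]
    return "\n".join(linhas)

acessos = {"seg": 5, "ter": 8, "qua": 3}
saida = grafico_de_barras(acessos)
print(saida)
```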
+
+
+**5) Treinando modelos preditivos**
+
+Já que o maior objetivo da ciência de dados é ser capaz de tomar decisões baseadas em dados, nós podemos querer usar técnicas de Aprendizado de Máquina para construir modelos preditivos que serão capazes de resolver nosso problema.
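
O modelo preditivo mais simples possível, uma regressão linear por mínimos quadrados, cabe em poucas linhas de Python puro (os dados abaixo são fictícios, com relação exatamente linear para facilitar a conferência):

```python
def regressao_linear(xs, ys):
    """Ajusta y = a*x + b por mínimos quadrados (o modelo preditivo mais simples)."""
    n = len(xs)
    media_x = sum(xs) / n
    media_y = sum(ys) / n
    a = sum((x - media_x) * (y - media_y) for x, y in zip(xs, ys)) / sum(
        (x - media_x) ** 2 for x in xs
    )
    b = media_y - a * media_x
    return a, b

# Horas de estudo -> nota (dados fictícios, relação exatamente linear)
a, b = regressao_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # ajusta y = 2x + 1
predicao = a * 5 + b
print(predicao)
```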
+
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 4\n",
+ "\\begin{tabular}{r|llll}\n",
+ " & A & B & DivA & LenB\\\\\n",
+ " & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 1 & I & -4 & 1\\\\\n",
+ "\t2 & 2 & like & -3 & 4\\\\\n",
+ "\t3 & 3 & to & -2 & 2\\\\\n",
+ "\t4 & 4 & use & -1 & 3\\\\\n",
+ "\t5 & 5 & Python & 0 & 6\\\\\n",
+ "\t6 & 6 & and & 1 & 3\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 4\n",
+ "\n",
+ "| | A <int> | B <chr> | DivA <dbl> | LenB <int> |\n",
+ "|---|---|---|---|---|\n",
+ "| 1 | 1 | I | -4 | 1 |\n",
+ "| 2 | 2 | like | -3 | 4 |\n",
+ "| 3 | 3 | to | -2 | 2 |\n",
+ "| 4 | 4 | use | -1 | 3 |\n",
+ "| 5 | 5 | Python | 0 | 6 |\n",
+ "| 6 | 6 | and | 1 | 3 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " A B DivA LenB\n",
+ "1 1 I -4 1 \n",
+ "2 2 like -3 4 \n",
+ "3 3 to -2 2 \n",
+ "4 4 use -1 3 \n",
+ "5 5 Python 0 6 \n",
+ "6 6 and 1 3 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "id": "515c95b2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAAMFBMVEUAAABNTU1oaGh8fHyM\njIyampqnp6eysrK9vb3Hx8fQ0NDZ2dnh4eHp6enw8PD////QFLu4AAAACXBIWXMAABJ0AAAS\ndAHeZh94AAAVuklEQVR4nO3djVYbuxWAUZn/ULDf/22LDYm5CQbbc0Y6kvZeC4c2C0Yj6cNg\n0absgMVK6wHACIQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEASqEVKAzV+zy+HAaXAIiCQkCCAkCCAkCCAkC\nCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAnO9c3/olxIcJ5DRadS\nEhKcp3x6PPGXV3y+VQmJdMpff379t1d8whUJiXTKp7ev//aKT7gyIZFO+diXQoIlyjcdCQnO\n5FU7WGxfkHMkWOanLSkk+NmPO1JI8KOfN6SQ4Cdn7EchwQ/O2Y5Cgu+dtRuFBN86bzMKCb5x\n7r9pKSQ47eydKCQ46fyNKCQ45YJ9KCQ44ZJtKCT42kW7UEjwpcs2oZDgKxfuQSHBv849Pjp+\nwBXXuPxDEl4CTrt8AwoJ/lanCiExtmu2n5Dgv67afUKC/7hu8wkJPrty7wkJPrl26wkJ/rj4\n+Oj4kVU+JOEl4B8L9p2Q4MOSbSckeLdo1wkJDpZtOiHB3sI9JyTYLd9yQoIFL3v/+QxVPiTh\nJeCPgP0mJKYXsd2ExOxCdpuQmFzMZhMScwvaa0JialFbTUjMLGynCYl5LT8+On6qKh+S8BIQ\nuc2ExKxCd5mQmFTsJhMScwreY0JiStFbTEjMKHyHCYkJxW8wITGdwOOj4+es8iEJL8G0Vtld\nQmIy62wuITGXlfaWkJjKWltLSMxktZ0lJCay3sYSEvNYcV8JiVmscXx0/ORVPiThJZjNuptK\nSMxh5T0lJKaw9pYSEjNYfUcJiQmsv6GExPhybtmco4JTamynmiG93pfN4273dFM2DytdAj4p\n+6OjVY+Pjpeq8iEH283bfZWnx/1juV3lEnB0KKhORlVDeihvz0MPm3K/3W0P78dfAo4Ou2jA\nkDYfN7Y9/LFZ4xLwR/l4qLOZKoZUPn+F+PcLRfnsykvAH+XTW52rrf8hB5tPIW09I7GycnxS\nqnO1Ch9y8PtnpIftx/vxl4CjUrEjr9oxrFFftXOOREXvR0jVdpLfbGBItbeQkBhR9R0kJAZU\nfwMJifE02D9CYjgtto+QGE2T3SMkBtNm8wiJsTTaO0JiJM1+31lIDKTdxhES42i4b4TEMFpu\nGyExiqa7RkgMou2mERJjaLxnhMQQWm8ZITGC5jtGSPQvwf/tlJDoXobtIiR6l2K3CInO5dgs\nQqJvSfaKkOhalq0iJHqWZqcIiY7l2ShCol+J9omQ6FWCY9gjIdGpXJtESPQp2R4REl3KtkWE\nRI/S7RAh0aF8G0RI9Cfh/hAS3cm4PYREb1LuDiHRl1THsEdCoitZt4aQ6EnanSEkOpJ3YwiJ\nfiTeF0KiG5m3hZDoRepdISQ6kXtTCIk+JN8TQqIHSY9hj4REB/JvCCGRXwf7
QUik18N2EBLZ\ndbEbhERyfWwGIZFbJ3tBSKTWy1YQEpl1sxOERF7pj2GPhERaPW0DIZFVV7tASCTV1yYQEjl1\ntgeEREq9bQEhkVF3O0BIJNTfBhAS+XS4/kIim46OYY+ERB5l31Cfiy8ksjg8E3X5dLQTEnkc\nVl1IkTqdTJYoHw99Lr6QSKJ8euuPkEiiHJ+UOiQksigddyQk0vCqXbhOJ5MFDkdIvWYkJJLo\nfcmFRAbdr7iQSKD/BRcS7Q2w3kKiuRGWW0i0NsRqC4nGxlhsIdHWIGstJFrq9wT2L0KioXEW\nWki0M9A6C4lmRlpmIdHKUKssJBoZa5GFRBuDrbGQaGK0JRYSLQy3wkKivmGOYY+ERHUjLq+Q\nqG3I1RUSlY25uEKirkHXVkhUNerSComahl1ZIVHRuAsrJOoZeF2FRC0DHsMeCYlKxl5UIVHH\n4GsqJKoYfUmFRA3Dr6iQqGD8BRUS65tgPYXE6mZYTiGxtilWU0isa+hj2CMhsapZllJIrGma\nlRQSK5pnIYXEeiZaRyGxmpmWUUisZapVFBIrmWsRhcQ6JltDIbGGSY5hj4TECuZbQCERb8L1\nExLhZlw+IRFtytUTEsHmXDwhEWvStRMSoWZdOiERadqVExJxpjuGPRISYWZeNiERZepVExJB\n5l40IRFj8jUTEiFmXzIhEWH6FRMSASyYkFjOegmJxSY+hj2qGdL2YfP2+HhTyu2vlS5BVWXf\nkMXaqxjS6+Zt2rdvD3u3q1yCmg7PRJ6O3lUM6b7cbd8e7l/fmrovD2tcgpoOqySkdxVDKmX7\n8fD2XV7ZrHEJKiofDxZrr2pIbw+b8uk//PXXn1x5CSoqn96o+q3dy273uH/YPyN9+0OStelA\nOT4pUTOkl7J5eNndbd5Ker4pz2tcgpqKjo5qvvz9vDl+7/a4ziWoyKt2n9Q9kP11f7Ov6O7x\ndbVLUMnhCElGv/nNBq5iif5LSFzDCv1FSFzBAv1NSFzO+vxDSFzM8vxLSFzK6nxBSFzI4nxF\nSFzEydHXhMQlrMwJQuICFuYUIXE+63KSkDibZTlNSJzLqnxDSJzJonxHSJzHmnxLSJzD8dEP\nhMQZLMhPhMTPrMePhMSPLMfPhMRPrMYZhMQPLMY5hMT3rMVZhMS3LMV5hMQ3HB+dS0icZh3O\nJiROsgznExKnWIULCIkTLMIlhMTXrMFFhMSXLMFlhMQXvOx9KSHxL/N/MSHxD9N/OSHxN7N/\nBSHxF5N/DSHxX+b+KkLiP0z9dYTEZ2b+SkLiyPHR1YTEH6b9ekLiN7O+gJD4YNKXEBLvzPki\nQuLAlC8jJPbM+EJCYmfClxMSjo8CCAmzHUBI0zPZEYQ0O3MdQkiTM9UxhDQ3Mx1ESFMz0VGE\nNDPzHEZI83J8FEhI0zLJkYQ0K3McSkiTMsWxhDQnMxxMSFMywdGCQnp52Cweyg+XII75DRcR\n0uvjTSlC6ofpjbc4pO2vt4rK7XPQeL66BFHK/ujI8dEaFob067bsvYaN599LEOVQkIzWsSSk\n5/u3hjYPL/FrY7HXcJhVIa1jQUibfUX/262xNhZ7BeXjweSuYUFIpTz8fidsOH9dgkDl0xvR\nPCNNoxyflAgX8DPS/4TUh6Kj9XjVbh5etVtR0DnSnXOk7N6PkMzsSvxmwyRM6br8rt0czOjK\n/Pb3FEzo2oQ0A/O5OiFNwHSuT0jjM5sVCGl4JrMGIY3OXFYhpLE5ga1ESEMzkbUIaWTmsRoh\nDcw01iOkcZnFioQ0LJNYk5BGZQ6rEtKgTGFdQhqTGaxMSCNyDFudkAZk+uoT0njMXgNCGo7J\na0FIozF3TQhpMKauDSGNxcw1IqShmLhWhDQS89aMkMbhGLYhIQ3DpLUkpFGYs6aENAhT1paQ\nxmDGGhPSEExYa0IagflqTkgDMF3tCal/ZisBIfXOMWwKQuqcqcpBSH0zU0kIqW
smKgsh9cw8\npSGkjpmmPITUL7OUiJC6ZZIyEVKvzFEqQuqTY9hkhNQlE5SNkHpkftIRUodMTz5C6o/ZSUhI\n3TE5GQmpN+YmJSF1xtTkJKS+mJmkhNQTx7BpCakjpiUvIfXDrCQmpG6YlMyE1AtzkpqQOmFK\nchNSH8xIckLqggnJTkg9MB/pCSk/x7AdaBLSjzvDznlX9jNlMnogpLwO0+TpqA8VQyr/tcYl\nxnKYBSH1oWJI/9sI6RLl48Fk9KDmt3bbu3L7evgMX32KsyubRfn0RnZ1f0b6VcqvnZ+RzlOO\nT0qkV/nFhtfbcrcV0nmKjvpR/VW7x7J5FtJZvGrXkfovf7/c/PwzkM3zXpCfFrvR4hzpXkg/\nMwV98StCOZmBzggppeknoDtCymj2+++QkBKa/Pa7JKR85r77TgkpnalvvltCymbme++YkHJx\nAtspIaUy7Y13T0iZzHrfAxBSIpPe9hCElMecdz0IIaUx5U0PQ0hZzHjPAxFSEhPe8lCElMN8\ndzwYIWXgGLZ7QkpgstsdkpDam+tuByWk5qa62WEJqbWZ7nVgQmpsolsdmpDamudOByekpqa5\n0eEJqaVZ7nMCQmrHMexAhNTMFDc5DSG1MsM9TkRIjUxwi1MRUhvj3+FkhNTE8Dc4HSG1MPr9\nTUhIDQx+e1MSUn1j392khFSbY9ghCamygW9takKqa9w7m5yQqhr2xqYnpJpGvS+EVNOgt8VO\nSDWNeVccCKmaIW+KD0KqZcR74g8h1eEYdnBCqmK4G+IvQqphtPvhH0KqYLDb4QtCWt9Yd8OX\nhLS6oW6GE4S0tpHuhZOEtLKBboVvCGld49wJ3xLSmhzDTkNIKxrkNjiDkNYzxl1wFiGtZoib\n4ExCWssI98DZhLSSAW6BCwhpHf3fARcR0iq6vwEuJKQ19D5+LiakeI5hJySkSGXfUK+DZwkh\nxTk8E3k6mpOQ4hxGLaQ5CSlM+XjocvAsJKQw5dMbsxFSmHJ8UmI6QopTdDQvIcXxqt3EhBTl\ncIQko1kJKUiHQyaQkGL0N2JCCSlEdwMmmJAi9DZewgkpQGfDZQVCWq6v0bIKIS3W1WBZiZAW\ncnLEnpCW6WekrEpIi3QzUFYmpCV6GSerE9ICnQyTCoR0vT5GSRVCuloXg6QSIV2rhzFSjZCu\n4/iI/xDSVdIPkMqEdI3s46M6IV0h+fBoQEiXyz06mhDSxVIPjkaEdKnMY6MZIV0o8dBoSEgX\ncXzE14R0iazjojkhXSDpsEhASOfLOSpSENLZUg6KJIR0roxjIg0hnSnhkEhESGfxsjffE9I5\nso2HdIR0hmTDISEh/SzXaEhJSD9KNRiSEtJPMo2FtIT0g0RDITEhfS/PSEhNSN9xfMSZhPSN\nJMOgA0I6Lcco6IKQTkoxCDohpFMyjIFuCCnvEOiIkLKOgK4IKecA6IyQvrq8jriQkLJdnS4J\nKdfF6ZSQMl2bbgkpz6XpmJCyXJmuCSnHhemckDJcl+4J6dNVdcS1hNT2ogxCSC2vyTCE1O6S\nDERIra7IUITU5oIMRkgtrsdwhFT/cgyoZkjb+1Junz8+ybefpdbOLvthOD5iuYohbTdl7+79\nkyQI6TAGGRGhYkgP5emtpqfN7eGTZAjp54HAeSqGtHn/wNfNzWuKkMrHg5JYrmJIv9vZ3t5+\nFVL57MpLXDaeT2+wTMWQbsr293u3SZ6Rfj8pwUIVQ3oq9x/vvZbbBCF9xKwjAtR8+fvhTz3P\nP3z35lU7OlP1QPbl7vd7r/fNQ3o/QpIRIab9zQYFEWnWkHREqElD0hGx5gxJRwSbMiQdEW3G\nkHREuAlD0hHx5gtJR6xgtpCcwLKKyUKSEeuYKyQdsZKpQtIRa5kpJB2xmolC0hHrmSckHbGi\naULSEWuaJSQdsao5QnIMy8qmCElGrG2GkHTE6iYISUesb/yQdEQFw4ekI2oYPSQd
UcXgIemI\nOsYOSUdUMnJIjmGpZuCQZEQ944akIyoaNiQdUdOoIemIqgYNSUfUNWZIOqKyIUPSEbWNGJKO\nqG68kBzD0sBwIcmIFkYLSUc0MVhIOqKNsULSEY0MFZKOaGWkkHREMwOFpCPaGSckHdHQKCE5\nhqWpQUKSEW2NEZKOaGyIkHREayOEpCOaGyAkHdFe/yHpiAS6D0lHZNB7SDoihb5DcgxLEl2H\nJCOy6DkkHZFGxyHpiDz6DUlHJNJtSDoik15D0hGpdBqSjsilz5B0RDI9huQYlnT6CqnsG5IR\n+fQU0uGZyNMRGXUV0uFBSCTUUUjl40FJ5NNZSOXUX0JTnYVU6fpwoY5C+vhvdURCXYXkVTuy\n6imkj3MkyKevkCApIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEAIUEA\nIUEAIUEAIUGApCFBZ67Y5fHhNJP9XoxvmdTjSz24C2W/F+NbJvX4Ug/uQtnvxfiWST2+1IO7\nUPZ7Mb5lUo8v9eAulP1ejG+Z1ONLPbgLZb8X41sm9fhSD+5C2e/F+JZJPb7Ug7tQ9nsxvmVS\njy/14C6U/V6Mb5nU40s9uAtlvxfjWyb1+FIP7kLZ78X4lkk9vtSDu1D2ezG+ZVKPL/XgoBdC\nggBCggBCggBCggBCggBCggBCggBCggBCggBCggBCggBCggBCggBCggBCggDDhPR0UzYP29aj\n+Nb/Mk/2y30p96+tR3HS9mGTe30zr+0lHg7/iMAm8UzvtpvEk/2ce/5eN+/jy1t64rW9xEu5\nf9sDT+W+9UC+cXfNvxZSy2bzstvelYfW4zjh/jCyh8Trm3htL3H3fh+Zt+qvq/7ZnUp+HTbq\ntmxaD+SEkn59847sGokn+rXcJh7dfXlpPYRvfXxXnDb0wULaltvWQzjptrwmDumm7B43h2+P\nc3r8+NbusfVATsq7tld4Ks+th3DKY/mV+fmylLvDD/Otx3HS0/7Vhs1T62GclndtL/e6uWs9\nhFNeyl3qbzzfNunLbnuf9yv+4+FVu7TDGyqk7SbvN3Y3+xeWU4e0/xnptdy0HsgJT/tv7d5C\nz/uUlHdtL3abdRfsf5bff8+ZOqTPf+RzU/Y/vm3Thj5QSK83t4lP6xb8u/NVZD8+yB76OCE9\nJ37BroOQHg9Pma9pJ/H95e+851zDhJR3C3ySNqPDT0fb/c8gv1oP5ISHsv89u4e0v3kxTEj3\nyb/iH2Qe3furYnm/Gt0mH98oIWX/1ukg9eieb8sm79f7t2ejTe7xZV5b6IaQIICQIICQIICQ\nIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQ\nIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQelPK613ZPB7e\nf7opN0+Nx8OBkHpTyqa82Zd0u3+n3LYeETsh9eetnO3uqdzsdr/K5mX3sim/Wg8JIfWnlP8d\nHne7u/L89t6zp6QMhNSbUn4/vr/3+w+asgi9EVJKFqE3QkrJIvTmGNLvn5HuGo+InZD6cwzJ\nq3aJCKk3x5CcIyUipN58Cmn3tPGbDUkICQIICQIICQIICQIICQIICQIICQIICQIICQIICQII\nCQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQIICQII\nCQIICQIICQL8H5sEkT1X9RA0AAAAAElFTkSuQmCC",
+ "text/plain": [
+ "plot without title"
+ ]
+ },
+ "metadata": {
+ "image/png": {
+ "height": 420,
+ "width": 420
+ }
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(df$A,type = 'o',xlab = \"no\",ylab = \"A\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "id": "41b872c9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAAM1BMVEUAAABNTU1oaGh8fHyM\njIyampqnp6eysrK9vb2+vr7Hx8fQ0NDZ2dnh4eHp6enw8PD////ojgWfAAAACXBIWXMAABJ0\nAAASdAHeZh94AAAaE0lEQVR4nO3d63LeRrIsUFD3I1ki3/9pt6SxtrWPjYHBTnxINNb6QdOe\nYFVFMzM4shQzywswbDn7AJiBIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGA\nIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGA\nIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGA\nIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGA\nIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGA\nIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAIkGAInFly4joIclh8GDL/3s9RYI/\nKRIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIE\nKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIE\nKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIEKBIE\nKBIEKBIEKBIEKBIEKBIEKBIEKBIEXL9I394vTx9fXj69WZ4+JA+CHS5fpOen5btPH398XN4m\nL4J/7/JF+rB8/zn04Wl5//zy/PNzOMHli/T08wuX5fnnX55i98Aely/Ssvz18ddf/s9//JvX\nX8eslhG/z7l6kZ5+K9Lzf/+JpEj8TaoAly/Sr18jfXj+8/P8CiamSH/a8W/tFIm/UaRf/v3v\nIykSf6NIlSu4GkWqXMHVKFLlCq5GkSpXcDWKVLmCq1GkyhVcjSJVruBqFKlyBVejSJUruBpF\nqlzB1ShS5QquRpEqV3A1ilS5gqtRpMoVXI0iVa7gahSpcgVXo0iVK7gaRapcwdUoUuUKrkaR\nKldwNYpUuYKrUaTKFVyNIlWu4GoUqXIFV6NIlSu4GkWqXMHVKFLlCq5GkSpXcDWKVLmCq1Gk\nyhVcjSJVruBqFKlyBVejSJUruBpFqlzB1ShS5QquRpEqV3A1ilS5gqtRpMoVXI0iVa7gahSp\ncgVXo0iVK7gaRapcwdUoUuUKrkaRKldwNYpUuYKrUaTKFVyNIlWu4GoUqXIFV6NIlSu4GkWq\nXMHVKFLlCq5GkSpXcDWKVLmCq1GkyhVcjSJVruBqFKlyBVejSJUruBpFqlzB1ShS5QquRpEq\nV3A1ilS5gqtRpMoVXI0iVa7gahSpcgUPsgz5fZAiNa7gQUZye0gBFIkrUqTVl0kOO20FD6JI\nqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2GkreBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWS\nw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JIqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2Gkr\neBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWSw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JI\nqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2GkreBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWS\nw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JIqy+THHbaCh5EkVZfJjnstBU8
iCKtvkxy2Gkr\neBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWSw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JI\nqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2GkreBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWS\nw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JIqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2Gkr\neBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWSw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JI\nqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2GkreBBFWn2Z5LDTVvAgirT6Mslhp63gQRRp9WWS\nw05bwYMo0urLJIedtoIHUaTVl0kOO20FD6JIqy+THHbaCh5EkVZfJjnstBU8iCKtvkxy2Gkr\neBBFWn2Z5LDTVvAgirT6Mq/9wucPT98/fnyzLG8/H7SCOoq0+jKv/LpvT8vy8vz9ww9vD1lB\nH0VafZlXft375d3z9w/vv33v1PvlwxEr6KNIqy/z2q9bnv/88P2/5S1PR6ygjyKtvsxrv+7H\nFz4tv/3N//cf/+b11xGyjPh9zkBuFemfvF++vrx8/PHhx0+k//qLJEU6X0NuFemffF2ePnx9\neff0vUlf3ixfjlhBTkNuFekffXn662f/x2NWENOQW0Va8fn9mx8tevfx22ErCGnIrSL1r2BD\nQ24VqX8FGxpyq0j9K9jQkFtF6l/BhobcKlL/CjY05FaR+lewoSG3itS/gg0NuVWk/hVsaMit\nIvWvYENDbhWpfwUbGnKrSP0r2NCQW0XqX8GGhtwqUv8KNjTkVpH6V7ChIbeK1L+CDQ25VaT+\nFWxoyK0i9a9gQ0NuFal/BRsacqtI/SvY0JBbRepfwYaG3CpS/wo2NORWkfpXsKEht4rUv4IN\nDblVpP4VbGjIrSL1r2BDQ24VqX8FGxpyq0j9K9jQkFtF6l/BhobcKlL/CjY05FaR+lewoSG3\nitS/gg0NuVWk/hVsaMitIvWvYENDbhWpfwUbGnKrSP0r2NCQW0XqX8GGhtwqUv8KNjTkVpH6\nV7ChIbeK1L+CDQ25VaT+FWxoyK0i9a9gQ0NuFal/BRsacqtI/SvY0JBbRepfwYaG3CpS/wo2\nNORWkfpXsKEht4rUv4INDblVpP4VbGjIrSL1r2BDQ24VqX8FGxpyq0j9K9jQkFtF6l/Bhobc\nKlL/CjY05FaR+lewoSG3itS/gg0NuVWk/hVsaMitIvWvYENDbhWpfwUbGnKrSP0r2NCQW0Xq\nX8GGhtwqUv8KNjTkVpH6V7ChIbeK1L+CDQ25VaT+FbNahvw+qCC3itS/YlZT5bbuIEW6jaly\nW3eQIt3GVLmtO0iRbmOq3NYdpEi3MVVu6w5SpNuYKrd1BynSbUyV27qDFOk2pspt3UGKdBtT\n5bbuIEW6jalyW3eQIt3GVLmtO0iRbmOq3NYdpEi3MVVu6w5SpNuYKrd1BynSbUyV27qDFOk2\npspt3UGKdBtT5bbuIEW6jalyW3eQIt3GVLmtO0iRbmOq3NYdpEi3MVVu6w5SpNuYKrd1BynS\nbUyV27qDFOk2pspt3UGKdBtT5bbuIEW6jalyW3eQIt3GVLmtO0iRbmOq3NYdpEi3MVVu6w5S\npNuYKrd1Bx1epK8fno5ewb8yVW7rDjq2SN8+vlkWReowVW7rDjqwSM+fv7doefsluUGRXm+q\n3NYddFiRPr/9+X/09i05/0WRBkyV27qDjinSl/ffO/T04esSz70ivdpUua076JAiPf1o0R8/\n/oEi9Zgqt3UHHVKkZfnw65Pk+N9XsNtUua07yE+k25gqt3UHHfprpD8UqchUua07yL+1u42p\nclt30OG/j/TO7yOVmCq3dQf5kw23MVVu6w7yZ+1uY6rc1h3kT3/fxlS5rTtIkW
5jqtzWHaRI\ntzFVbusOUqTbmCq3dQcp0m1Mldu6gxTpNqbKbd1BinQbU+W27iBFuo2pclt3kCLdxlS5rTtI\nkW5jqtzWHaRItzFVbusOUqTbmCq3dQcp0m1Mldu6gxTpNqbKbd1BinQbU+W27iBFuo2pclt3\nkCLdxlS5rTtIkW5jqtzWHaRItzFVbusOUqTbmCq3dQeVFWnzf1FSkV5tqtzWHaRItzFVbusO\nqijS8n8dsYK5clt3UEWR/nhSpONNldu6gyqK9PL8bnn7838l/B9b9K9bNqdlxO9zBmJSl9u6\ngzqK9PLyeVk+v/g10j9piEldbusOainSy7e3y7tnRfoHDTGpy23dQTVFenn5uDx9UaS/a4hJ\nXW7rDioq0svXN9u/BlKkU2JSl9u6g5qK9PLyXpH+riEmdbmtO6irSBUr2jTEpC63dQcpUr2G\nmNTltu4gRarXEJO63NYdpEj1GmJSl9u6gxSpXkNM6nJbd5Ai1WuISV1u6w5SpHoNManLbd1B\nilSvISZ1ua07SJHqNcSkLrd1BylSvYaY1OW27iBFqtcQk7rc1h2kSPUaYlKX27qDFKleQ0zq\nclt3kCLVa4hJXW7rDlKkeg0xqctt3UGKVK8hJnW5rTtIkeo1xKQut3UHKVK9hpjU5bbuIEWq\n1xCTutzWHaRI9RpiUpfbuoMUqV5DTOpyW3eQItVriEldbusOUqR6DTGpy23dQYpUryEmdbmt\nO0iR6jXEpC63dQcpUr2GmNTltu4gRarXEJO63NYdpEj1GmJSl9u6gxSpXkNM6nJbd5Ai1WuI\nSV1u6w5SpHoNManLbd1BilSvISZ1ua07SJHqNcSkLrd1BylSvYaY1OW27iBFqtcQk7rc1h2k\nSPUaYlKX27qDFKleQ0zqclt3kCLVa4hJXW7rDlKkeg0xqctt3UGKVK8hJnW5rTtIkeo1xKQu\nt3UHKVK9hpjU5bbuIEWq1xCTutzWHaRI9RpiUpfbuoMUqV5DTOpyW3eQItVriEldbusOUqR6\nDTGpy23dQYpUryEmdbmtO0iR6jXEpC63dQcpUr2GmNTltu4gRarXEJO63NYdpEj1GmJSl9u6\ngxSpXkNM6nJbd5Ai1WuISV1u6w5SpHoNManLbd1BilSvISZ1ua07SJHqNcSkLrd1BylSvYaY\n1OW27iBFqtcQk7rc1h2kSPUaYlKX27qDFKleQ0zqclt3kCLVa4hJXW7rDlKkeg0xqctt3UGK\ndJRlyO+DCmJSl9u6gxTpKFPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc5\n7LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxn\nJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvm\njGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRB\ne+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPF\nxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUh\nU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0\nFSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc5\n7LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxn\nJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUh
U8XEQXvm\njGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRB\ne+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPF\nxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUh\nU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0\nFSFTxcRBe+aMZyc57LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc5\n7LQVIVPFxEF75oxnJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZyc57LQVIVPFxEF75oxn\nJznstBUhU8XEQXvmjGcnOey0FSFTxcRBe+aMZ+e1X/j8flnefvlzyH+dokinzHHQnjnj2Xnl\n1z0/LT+8+88QReqLiYP2zBnPziu/7sPy6XubPj29/TlEkfpi4qA9c8az88qve/rPF357evNN\nkQ779jroUXPGs/Par/vzC5/fvv2nIi2/Gzjv3x4z4Pc5A9+Vupg4aM+cYa8d9mZ5/vXZ29N/\nIjV8V+pi4qA9c4a9dtin5f2fn31b3ipScJCDzpgz7NXDPvxve75s/Lc3RXJQ5UEdRXr5+u7X\nZ9/eK1JfTBy0Z86wKf5kQ8N3pS4mDtozZ5gihebUxcRBe+YMU6TQnLqYOGjPnGGKFJpTFxMH\n7ZkzTJFCc+pi4qA9c4YpUmhOXUwctGfOMEUKzamLiYP2zBmmSKE5dTFx0J45wxQpNKcuJg7a\nM2eYIoXm1MXEQXvmDFOk0Jy6mDhoz5xhihSaUxcTB+2ZM0yRQnPqYuKgPXOGKVJoTl1MHLRn\nzjBFCs2pi4mD9swZpkihOXUxcdCeOcMUKTSnLiYO2jNnmCKF5tTFxEF75gxTpNCcupg4aM+c\nYYoUmlMXEwftmTNMkUJz6mLioD1zhilSaE5dTBy0Z84wRQrNqYuJg/bMGaZIoTl1MXHQnjnD\nFCk0py4mDtozZ5gihebUxcRBe+YMU6TQnLqYOGjPnGGKFJpTFxMH7ZkzTJFCc+pi4qA9c4Yp\nUmhOXUwctGfOMEUKzamLiYP2zBmmSKE5dTFx0J45wxQpNKcuJg7aM2eYIoXm1MXEQXvmDFOk\n0Jy6mDhoz5xhihSaUxcTB+2ZM0yRQnPqYuKgPXOGKVJoTl1MHLRnzjBFCs2pi4mD9swZpkih\nOXUxcdCeOcMUKTSnLiYO2jNnmCKF5tTFxEF75gxTpNCcupg4aM+cYYoUmlMXEwftmTNMkUJz\n6mLioD1zhilSaE5dTBy0Z84wRQrNqYuJg/bMGaZIoTl1MXHQnjnDFCk0py4mDtozZ5gihebU\nxcRBe+YMU6TQnLqYOGjPnGGKFJpTFxMH7ZkzTJFCc+pi4qA9c4YpUmhOXUwctGfOMEUKzamL\niYP2zBmmSKE5dTFx0J45wxQpNKcuJg7aM2eYIoXm1MXEQXvmDFOk0Jy6mDhoz5xhihSaUxcT\nB+2ZM0yRQnPqYuKgPXOGKVJoTl1MHLRnzjBFCs2pi4mD9swZdmaRliG/Dyr4rtTFxEF75gw7\ntUgDj1D3XXHQ5Q5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiR\njnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQ\nrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU6\n6jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpK
Ne00G3\nOkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo\n13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93q\nIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe\n00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuD\nFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpN\nB93qIEU66jUddKuDFOmo13TQrQ5SpKNe00G3OkiRjnpNB93qoJIi/fHx3fLDuw9/vHLFVN8V\nB13uoIoiPb9Z/vL2dSum+q446HIHVRTpw/L0+evPz759eVo+vGrFVN8VB13uoIoiPS1f//fz\nr8vTq1ZM9V1x0OUOqijSsqz9zZ//5DfrM0akBjnotgetBfM1HvATCeY38GukL99+frb5aySY\n36t/vL397Ufkm+fkSXA9A7+P9OHn7yM9vfu48ftIML8H/MkGmJ8iQYAiQYAiQYAiQYAiQYAi\nQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAi\nQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAi\nQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAi\nQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAiQYAi\nQYAiQYAiQYAiQYAiXc2yfHu3PH38+fmnN8ubTyffw0+KdDXL8rR896NJb398srw9+yJeFOl6\nvjfn+eXT8ubl5fPy9PXl69Py+eyTUKTrWZY/fn58eXm3fPn+2Rc/khoo0tUsy6+P//ns1184\nlW/C1ShSJd+Eq1GkSr4JV/NXkX79GundyRfxokjX81eR/Fu7Iop0NX8Vye8jFVGkq/mtSC+f\nnvzJhhKKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGK\nBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAGKBAH/Ayzv44rlEgIU\nAAAAAElFTkSuQmCC",
+ "text/plain": [
+ "plot without title"
+ ]
+ },
+ "metadata": {
+ "image/png": {
+ "height": 420,
+ "width": 420
+ }
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "barplot(df$A, ylab = 'A',xlab = 'no')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "11001454",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "670db495",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "4.1.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
From 02921e0c97393b1d8004cc3b21afbf52d376da7b Mon Sep 17 00:00:00 2001
From: Kaushal Joshi <53049546+joshi-kaushal@users.noreply.github.com>
Date: Sun, 10 Oct 2021 22:30:49 +0530
Subject: [PATCH 078/319] Updated README.hi.md - 2
---
1-Introduction/03-defining-data/translations/README.hi.md | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/1-Introduction/03-defining-data/translations/README.hi.md b/1-Introduction/03-defining-data/translations/README.hi.md
index ef6b2932..49c16833 100644
--- a/1-Introduction/03-defining-data/translations/README.hi.md
+++ b/1-Introduction/03-defining-data/translations/README.hi.md
@@ -1,5 +1,5 @@
# डेटा का अवलोकन
-|![](../../sketchnotes/03-DefiningData.png)|
+|![](../../../sketchnotes/03-DefiningData.png)|
|:---:|
|Defining Data - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
@@ -16,3 +16,7 @@
### परिमाणात्मक डेटा
परिमाणात्मक डेटा मतलब डेटासेट में उपलब्ध ऐसा संख्यात्मक डेटा जिसका इस्तेमाल विश्लेषण, मापन और गणितीय चीज़ों के लिए हो सकता है। परिमाणात्मक डेटा के यह कुछ उदाहरण हैं: देश की जनसंख्या, इंसान की ऊंचाई या कंपनी की तिमाही कमाई। थोड़े अधिक विश्लेषण के बाद परिमाणात्मक डेटा से मौसम के अनुसार वायु गुणवत्ता सूचकांक (Air Quality Index) के बदलाव का पता करना या फिर किसी सामान्य कार्यदिवस पर भीड़भाड़ वाले घंटे के ट्रैफिक की संभावना का अनुमान लगाना मुमकिन है।
+
+### गुणात्मक डेटा
+गुणात्मक डेटा, जिसे वर्गीकृत डेटा भी कहा जाता है, डेटा का ऐसा प्रकार है जिसे परिमाणात्मक डेटा की तरह वस्तुनिष्ठ तरह से मापा नहीं जा सकता। यह आम तौर पर अलग-अलग प्रकार का आत्मनिष्ठ डेटा होता है जिससे किसी उत्पाद या प्रक्रिया की गुणवत्ता का पता चलता है। कभी-कभार गुणात्मक डेटा सांख्यिक स्वरूप में होकर भी गणितीय कारणों के लिए इस्तेमाल नहीं किया जा सकता, जैसे कि फोन नंबर या समय।
+गुणात्मक डेटा के यह कुछ उदाहरण हो सकते हैं: वीडियो की टिप्पणियाँ, या आपके करीबी दोस्तों की गाड़ियों के पसंदीदा रंगों का सर्वेक्षण। गुणात्मक डेटा का इस्तेमाल करके यह पता लगाया जा सकता है कि ग्राहकों को कौनसा उत्पाद सबसे ज्यादा पसंद आ रहा है, या फिर नौकरी के आवेदन के रिज्यूमे में सबसे ज्यादा इस्तेमाल होने वाले शब्द ढूँढे जा सकते हैं।
From 4ff6ee6b2ece8e2bd337d6b2020256b5ee8ea918 Mon Sep 17 00:00:00 2001
From: Fernanda Kawasaki <50497814+fernandakawasaki@users.noreply.github.com>
Date: Sun, 10 Oct 2021 15:15:00 -0300
Subject: [PATCH 079/319] Add translations up to 'filter your data' section
---
.../translations/README.pt-br.md | 49 ++++++++++---------
1 file changed, 26 insertions(+), 23 deletions(-)
diff --git a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
index afa601e3..e777ac07 100644
--- a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
+++ b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
@@ -11,23 +11,24 @@ Nessa aula você irá explorar como usar uma das muitas bibliotecas disponíveis
Uma biblioteca excelente para criar gráficos simples e sofisticados de diversos tipos é o [Matplotlib](https://matplotlib.org/stable/index.html). Em geral, o processo de plotar dados com essas bibliotecas inclui identificar as partes do seu dataframe em que você quer focar, aplicar quaisquer transformações necessárias a esses dados, atribuir os valores dos eixos x e y, decidir qual tipo de gráfico mostrar e, então, mostrar o gráfico. O Matplotlib oferece uma grande variedade de visualizações, mas, nesta aula, iremos focar nas mais apropriadas para visualizar quantidades: gráfico de linha, gráfico de dispersão e gráfico de barra.
-> ✅ Use the best chart to suit your data's structure and the story you want to tell.
-> - To analyze trends over time: line
-> - To compare values: bar, column, pie, scatterplot
-> - To show how parts relate to a whole: pie
-> - To show distribution of data: scatterplot, bar
-> - To show trends: line, column
-> - To show relationships between values: line, scatterplot, bubble
+> ✅ Use o melhor gráfico para se adaptar à estrutura dos seus dados e à história que você quer contar.
+> - Para analisar tendências ao longo do tempo: linha
+> - Para comparar valores: barra, coluna, pizza, dispersão
+> - Para mostrar como as partes se relacionam com o todo: pizza
+> - Para mostrar a distribuição dos dados: dispersão, barra
+> - Para mostrar tendências: linha, coluna
+> - Para mostrar relações entre valores: linha, dispersão, bolha
-If you have a dataset and need to discover how much of a given item is included, one of the first tasks you have at hand will be to inspect its values.
+Se você tem um dataset e precisa descobrir quanto de um dado item está presente, uma das primeiras coisas que você precisará fazer é examinar seus valores.
-✅ There are very good 'cheat sheets' available for Matplotlib [here](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-1.png) and [here](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-2.png).
-## Build a line plot about bird wingspan values
+✅ Existem dicas ('cheat sheets') muito boas disponíveis para o Matplotlib [aqui](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-1.png) e [aqui](https://github.com/matplotlib/cheatsheets/blob/master/cheatsheets-2.png).
-Open the `notebook.ipynb` file at the root of this lesson folder and add a cell.
+## Construir um gráfico de linhas sobre os valores de envergadura de pássaros
-> Note: the data is stored in the root of this repo in the `/data` folder.
+Abra o arquivo `notebook.ipynb` na raiz da pasta dessa aula e adicione uma célula.
+
+> Nota: os dados estão armazenados na raiz deste repositório na pasta `/data`.
```python
import pandas as pd
@@ -35,7 +36,7 @@ import matplotlib.pyplot as plt
birds = pd.read_csv('../../data/birds.csv')
birds.head()
```
-This data is a mix of text and numbers:
+Esses dados são uma mistura de texto e números:
| | Name | ScientificName | Category | Order | Family | Genus | ConservationStatus | MinLength | MaxLength | MinBodyMass | MaxBodyMass | MinWingspan | MaxWingspan |
@@ -46,19 +47,19 @@ This data is a mix of text and numbers:
| 3 | Ross's goose | Anser rossii | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 57.3 | 64 | 1066 | 1567 | 113 | 116 |
| 4 | Greater white-fronted goose | Anser albifrons | Ducks/Geese/Waterfowl | Anseriformes | Anatidae | Anser | LC | 64 | 81 | 1930 | 3310 | 130 | 165 |
-Let's start by plotting some of the numeric data using a basic line plot. Suppose you wanted a view of the maximum wingspan for these interesting birds.
+Vamos começar plotando alguns dados numéricos com um simples gráfico de linhas. Suponha que você quer uma visualização da envergadura máxima desses pássaros interessantes.
```python
wingspan = birds['MaxWingspan']
wingspan.plot()
```
-
+
-What do you notice immediately? There seems to be at least one outlier - that's quite a wingspan! A 2300 centimeter wingspan equals 23 meters - are there Pterodactyls roaming Minnesota? Let's investigate.
+O que é possível perceber imediatamente? Aparentemente existe pelo menos um outlier - e que envergadura! Uma envergadura de 2300 centímetros equivale a 23 metros - têm pterodáctilos voando em Minnesota? Vamos investigar.
-While you could do a quick sort in Excel to find those outliers, which are probably typos, continue the visualization process by working from within the plot.
+Você poderia fazer uma ordenação rápida no Excel para encontrar esses outliers, que provavelmente são erros de digitação. No entanto, vamos continuar o processo de visualização trabalhando no gráfico.
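Essa ordenação rápida também pode ser feita direto no pandas, sem sair do notebook. Um esboço mínimo, usando um DataFrame hipotético no lugar do `birds.csv` real:

```python
import pandas as pd

# DataFrame ilustrativo (hipotético): no notebook, `birds` vem de birds.csv
birds = pd.DataFrame({
    'Name': ['Bald eagle', 'Prairie falcon', "Ross's goose"],
    'MaxWingspan': [2300.0, 1120.0, 116.0],
})

# Ordena pela envergadura máxima, em ordem decrescente,
# para colocar os possíveis outliers no topo
maiores = birds.sort_values('MaxWingspan', ascending=False)
print(maiores.head())
```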
-Add labels to the x-axis to show what kind of birds are in question:
+Adicione labels (identificadores) no eixo x para mostrar quais tipos de pássaros estão sendo analisados:
```
plt.title('Max Wingspan in Centimeters')
@@ -72,9 +73,9 @@ plt.plot(x, y)
plt.show()
```
-
+
-Even with the rotation of the labels set to 45 degrees, there are too many to read. Let's try a different strategy: label only those outliers and set the labels within the chart. You can use a scatter chart to make more room for the labeling:
+Mesmo com a rotação das labels em 45 graus, existem muitas para ler. Vamos tentar outra estratégia: identificar somente os outliers e colocar as labels dentro do gráfico. Você pode usar um gráfico de dispersão para abrir mais espaço para a identificação:
```python
plt.title('Max Wingspan in Centimeters')
@@ -90,12 +91,14 @@ for i in range(len(birds)):
plt.show()
```
-What's going on here? You used `tick_params` to hide the bottom labels and then created a loop over your birds dataset. Plotting the chart with small round blue dots by using `bo`, you checked for any bird with a maximum wingspan over 500 and displayed their label next to the dot if so. You offset the labels a little on the y axis (`y * (1 - 0.05)`) and used the bird name as a label.
-What did you discover?
+O que aconteceu aqui? Você usou `tick_params` para esconder as labels de baixo e então criou um loop sobre o dataset dos pássaros. Depois, plotou o gráfico com pequenos círculos azuis usando `bo`, procurou por pássaros com envergadura máxima maior que 500 e, nesse caso, exibiu a label ao lado do círculo. Você deslocou um pouco as labels no eixo y (`y * (1 - 0.05)`) e usou o nome do pássaro como label.
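O loop descrito acima pode ser esboçado de forma autocontida assim (dados hipotéticos no lugar do CSV real; a lógica de `tick_params`, `bo` e do limite de 500 é a mesma):

```python
import matplotlib
matplotlib.use('Agg')  # backend sem janela, para rodar em qualquer ambiente
import matplotlib.pyplot as plt
import pandas as pd

# Dados ilustrativos (hipotéticos); no notebook, `birds` vem de birds.csv
birds = pd.DataFrame({
    'Name': ['Bald eagle', 'Bee hummingbird', 'Prairie falcon'],
    'MaxWingspan': [2300.0, 33.0, 1120.0],
})

plt.title('Max Wingspan in Centimeters')
plt.ylabel('Wingspan (CM)')
plt.tick_params(axis='both', which='both', labelbottom=False, bottom=False)

rotulados = []
for i in range(len(birds)):
    x = birds['Name'][i]
    y = birds['MaxWingspan'][i]
    plt.plot(x, y, 'bo')  # círculo azul pequeno em cada ponto
    if y > 500:           # só os outliers recebem label...
        rotulados.append(x)
        plt.text(x, y * (1 - 0.05), x, fontsize=12)  # ...deslocada no eixo y
```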
+
+O que você descobriu?

-## Filter your data
+
+## Filtrar seus dados
Both the Bald Eagle and the Prairie Falcon, while probably very large birds, appear to be mislabeled, with an extra `0` added to their maximum wingspan. It's unlikely that you'll meet a Bald Eagle with a 25 meter wingspan, but if so, please let us know! Let's create a new dataframe without those two outliers:
From 3b8aefd5fb937b705dd6783fc4ae5b2a9905d81f Mon Sep 17 00:00:00 2001
From: Fernanda Kawasaki <50497814+fernandakawasaki@users.noreply.github.com>
Date: Sun, 10 Oct 2021 15:18:07 -0300
Subject: [PATCH 080/319] update images path
---
.../translations/README.pt-br.md | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
index e777ac07..9e620a8e 100644
--- a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
+++ b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
@@ -1,6 +1,6 @@
# Visualizando Quantidades
-| ](../../sketchnotes/09-Visualizing-Quantities.png)|
+| ](../../../sketchnotes/09-Visualizing-Quantities.png)|
|:---:|
| Visualizando quantidades - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
@@ -53,7 +53,7 @@ Vamos começar plotando alguns dados numéricos com um simples gráfico de linha
wingspan = birds['MaxWingspan']
wingspan.plot()
```
-
+
O que é possível perceber imediatamente? Aparentemente existe pelo menos um outlier - e que envergadura! Uma envergadura de 2300 centímetros equivale a 23 metros - têm pterodáctilos voando em Minnesota? Vamos investigar.
@@ -73,7 +73,7 @@ plt.plot(x, y)
plt.show()
```
-
+
Mesmo com a rotação das labels em 45 graus, existem muitas para ler. Vamos tentar outra estratégia: identificar somente os outliers e colocar as labels dentro do gráfico. Você pode usar um gráfico de dispersão para abrir mais espaço para a identificação:
@@ -96,7 +96,7 @@ O que aconteceu aqui? Você usou `tick_params` para esconder as labels debaixo e
O que você descobriu?
-
+
## Filtrar seus dados
@@ -117,7 +117,7 @@ plt.show()
By filtering out outliers, your data is now more cohesive and understandable.
-
+
Now that we have a cleaner dataset at least in terms of wingspan, let's discover more about these birds.
@@ -143,7 +143,7 @@ birds.plot(x='Category',
title='Birds of Minnesota')
```
-
+
This bar chart, however, is unreadable because there is too much non-grouped data. You need to select only the data that you want to plot, so let's look at the length of birds based on their category.
@@ -158,7 +158,7 @@ category_count = birds.value_counts(birds['Category'].values, sort=True)
plt.rcParams['figure.figsize'] = [6, 12]
category_count.plot.barh()
```
-
+
This bar chart shows a good view of the number of birds in each category. In a blink of an eye, you see that the largest number of birds in this region are in the Ducks/Geese/Waterfowl category. Minnesota is the 'land of 10,000 lakes' so this isn't surprising!
@@ -174,7 +174,7 @@ plt.barh(y=birds['Category'], width=maxlength)
plt.rcParams['figure.figsize'] = [6, 12]
plt.show()
```
-
+
Nothing is surprising here: hummingbirds have the least MaxLength compared to Pelicans or Geese. It's good when data makes logical sense!
@@ -192,7 +192,7 @@ plt.show()
```
In this plot, you can see the range per bird category of the Minimum Length and Maximum length. You can safely say that, given this data, the bigger the bird, the larger its length range. Fascinating!
-
+
## 🚀 Challenge
From 3d4a7c588acd16954bc1ad2b0214baea1aad453c Mon Sep 17 00:00:00 2001
From: Fernanda Kawasaki <50497814+fernandakawasaki@users.noreply.github.com>
Date: Sun, 10 Oct 2021 21:31:03 -0300
Subject: [PATCH 081/319] Fully translated text
Fix links afterwards (link to the translated files, after they are created)
---
.../translations/README.pt-br.md | 64 ++++++++++---------
1 file changed, 34 insertions(+), 30 deletions(-)
diff --git a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
index 9e620a8e..90838098 100644
--- a/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
+++ b/3-Data-Visualization/09-visualization-quantities/translations/README.pt-br.md
@@ -100,7 +100,7 @@ O que você descobriu?
## Filtrar seus dados
-Both the Bald Eagle and the Prairie Falcon, while probably very large birds, appear to be mislabeled, with an extra `0` added to their maximum wingspan. It's unlikely that you'll meet a Bald Eagle with a 25 meter wingspan, but if so, please let us know! Let's create a new dataframe without those two outliers:
+Apesar de grandes, tanto a Bald Eagle quanto o Prairie Falcon parecem ter valores errados, com um `0` a mais na envergadura máxima. É improvável que você encontre uma Bald Eagle com envergadura de 25 metros, mas, se encontrar, por favor nos diga! Agora, vamos criar um novo dataframe sem esses dois outliers:
```python
plt.title('Max Wingspan in Centimeters')
@@ -115,26 +115,27 @@ for i in range(len(birds)):
plt.show()
```
-By filtering out outliers, your data is now more cohesive and understandable.
+Ao remover esses outliers, seus dados ficaram mais coesos e compreensíveis.
-
+
-Now that we have a cleaner dataset at least in terms of wingspan, let's discover more about these birds.
+Agora que temos um dataset mais limpo, ao menos em termos de envergadura, vamos descobrir mais sobre esses pássaros.
-While line and scatter plots can display information about data values and their distributions, we want to think about the values inherent in this dataset. You could create visualizations to answer the following questions about quantity:
+Enquanto gráficos de linha e dispersão conseguem mostrar informações sobre valores e suas distribuições, nós queremos pensar sobre os valores intrínsecos a esse dataset. Você poderia criar visualizações para responder as seguintes perguntas sobre quantidade:
-> How many categories of birds are there, and what are their numbers?
-> How many birds are extinct, endangered, rare, or common?
-> How many are there of the various genus and orders in Linnaeus's terminology?
-## Explore bar charts
+> Quantas categorias de pássaros existem, e quais são seus números?
+> Quantos pássaros estão extintos, em risco de extinção, raros ou comuns?
+> Quantos gêneros e ordens da taxonomia de Lineu (nome científico) existem no dataset?
-Bar charts are practical when you need to show groupings of data. Let's explore the categories of birds that exist in this dataset to see which is the most common by number.
+## Explorar gráfico de barras
-In the notebook file, create a basic bar chart
+Gráficos de barras são práticos quando se precisa mostrar agrupamentos de dados. Vamos explorar as categorias de pássaros que existem nesse dataset para observar qual é a mais comum em quantidade.
-✅ Note, you can either filter out the two outlier birds we identified in the previous section, edit the typo in their wingspan, or leave them in for these exercises which do not depend on wingspan values.
+No arquivo notebook, crie um gráfico de barras simples
-If you want to create a bar chart, you can select the data you want to focus on. Bar charts can be created from raw data:
+✅ Note que você pode remover os dois pássaros outliers identificados anteriormente, corrigir o erro de digitação na envergadura deles ou deixá-los como estão nestes exercícios, que não dependem dos valores de envergadura.
+
+Se você quer criar um gráfico de barras, você pode selecionar os dados em que quer focar. Gráficos de barras podem ser criados a partir de dados brutos:
```python
birds.plot(x='Category',
@@ -145,13 +146,13 @@ birds.plot(x='Category',
```

-This bar chart, however, is unreadable because there is too much non-grouped data. You need to select only the data that you want to plot, so let's look at the length of birds based on their category.
+No entanto, esse gráfico de barras é ilegível porque existem muitos dados não agrupados. Você precisa selecionar somente os dados que quer plotar, então vamos olhar o comprimento de pássaros com base na sua categoria.
-Filter your data to include only the bird's category.
+Filtre os dados para incluir somente a categoria do pássaro.
-✅ Notice that that you use Pandas to manage the data, and then let Matplotlib do the charting.
+✅ Note que você usa o Pandas para lidar com os dados, e deixa a criação de gráficos para o Matplotlib.
-Since there are many categories, you can display this chart vertically and tweak its height to account for all the data:
+Já que existem muitas categorias, você pode mostrar esse gráfico verticalmente e ajustar sua altura para acomodar todos os dados:
```python
category_count = birds.value_counts(birds['Category'].values, sort=True)
@@ -160,13 +161,13 @@ category_count.plot.barh()
```

-This bar chart shows a good view of the number of birds in each category. In a blink of an eye, you see that the largest number of birds in this region are in the Ducks/Geese/Waterfowl category. Minnesota is the 'land of 10,000 lakes' so this isn't surprising!
+Esse gráfico de barras mostra uma boa visão do número de pássaros em cada categoria. Em um piscar de olhos, você vê que a maior quantidade de pássaros nessa região pertence à categoria de Ducks/Geese/Waterfowl (patos/gansos/cisnes). Minnesota é 'a terra de 10.000 lagos', então isso não é surpreendente!
-✅ Try some other counts on this dataset. Does anything surprise you?
+✅ Tente contar outras quantidades nesse dataset. Algo te surpreende?
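Por exemplo, uma contagem por status de conservação pode ser esboçada assim (valores hipotéticos; a coluna `ConservationStatus` existe no dataset real):

```python
import pandas as pd

# DataFrame ilustrativo (hipotético) com a coluna ConservationStatus
birds = pd.DataFrame({
    'Name': ['a', 'b', 'c', 'd'],
    'ConservationStatus': ['LC', 'LC', 'NT', 'LC'],
})

# Conta quantos pássaros há em cada status de conservação
status_count = birds['ConservationStatus'].value_counts()
print(status_count)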
-## Comparing data
+## Comparando dados
-You can try different comparisons of grouped data by creating new axes. Try a comparison of the MaxLength of a bird, based on its category:
+Você pode tentar diferentes comparações de dados agrupados criando novos eixos. Tente comparar o comprimento máximo de um pássaro, baseado na sua categoria:
```python
maxlength = birds['MaxLength']
@@ -176,9 +177,9 @@ plt.show()
```

-Nothing is surprising here: hummingbirds have the least MaxLength compared to Pelicans or Geese. It's good when data makes logical sense!
+Nada é surpreendente aqui: hummingbirds (beija-flores) têm o menor comprimento comparados com pelicans (pelicanos) ou geese (gansos). É muito bom quando os dados fazem sentido!
-You can create more interesting visualizations of bar charts by superimposing data. Let's superimpose Minimum and Maximum Length on a given bird category:
+Você pode criar visualizações mais interessantes de gráficos de barras ao sobrepor dados. Vamos sobrepor o comprimento mínimo e máximo de uma dada categoria de pássaros:
```python
minLength = birds['MinLength']
@@ -190,18 +191,21 @@ plt.barh(category, minLength)
plt.show()
```
-In this plot, you can see the range per bird category of the Minimum Length and Maximum length. You can safely say that, given this data, the bigger the bird, the larger its length range. Fascinating!
+
+Nesse gráfico, você pode ver o intervalo de comprimento mínimo e máximo por categoria de pássaro. Você pode seguramente dizer, a partir desses dados, que quanto maior o pássaro, maior seu intervalo de comprimento. Fascinante!

-## 🚀 Challenge
+## 🚀 Desafio
+
+Esse dataset de pássaros oferece uma riqueza de informações sobre os diferentes tipos de pássaros de um ecossistema particular. Tente achar na internet outros datasets com dados sobre pássaros. Pratique construir gráficos sobre esses pássaros e tente descobrir fatos que você ainda não havia percebido.
+
+## [Quiz pós-aula](https://red-water-0103e7a0f.azurestaticapps.net/quiz/17)
-This bird dataset offers a wealth of information about different types of birds within a particular ecosystem. Search around the internet and see if you can find other bird-oriented datasets. Practice building charts and graphs around these birds to discover facts you didn't realize.
-## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/17)
+## Revisão e autoestudo
-## Review & Self Study
+Essa primeira aula lhe deu informações sobre como usar o Matplotlib para visualizar quantidades. Pesquise outras formas de trabalhar com datasets para visualização. [Plotly](https://github.com/plotly/plotly.py) é uma biblioteca que não será abordada nestas aulas, então dê uma olhada no que ela pode oferecer.
-This first lesson has given you some information about how to use Matplotlib to visualize quantities. Do some research around other ways to work with datasets for visualization. [Plotly](https://github.com/plotly/plotly.py) is one that we won't cover in these lessons, so take a look at what it can offer.
-## Assignment
+## Tarefa
[Lines, Scatters, and Bars](assignment.md)
From 5eb0795db32f92be987082cd1bfaaf063609790f Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Sun, 10 Oct 2021 21:25:51 -0500
Subject: [PATCH 082/319] Revert "feat: Module 1 section 4 - Add file content
to be translated"
This reverts commit def34abe88b682ab5faa37ebee9cbbc4897633cf.
---
.../translations/README.es.md | 263 ------------------
1 file changed, 263 deletions(-)
diff --git a/1-Introduction/04-stats-and-probability/translations/README.es.md b/1-Introduction/04-stats-and-probability/translations/README.es.md
index 3a4a4ae9..e69de29b 100644
--- a/1-Introduction/04-stats-and-probability/translations/README.es.md
+++ b/1-Introduction/04-stats-and-probability/translations/README.es.md
@@ -1,263 +0,0 @@
-# A Brief Introduction to Statistics and Probability
-
-| ](../../sketchnotes/04-Statistics-Probability.png)|
-|:---:|
-| Statistics and Probability - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
-
-Statistics and Probability Theory are two highly related areas of Mathematics that are highly relevant to Data Science. It is possible to operate with data without deep knowledge of mathematics, but it is still better to know at least some basic concepts. Here we will present a short introduction that will help you get started.
-
-[](https://youtu.be/Z5Zy85g4Yjw)
-
-
-## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/6)
-
-## Probability and Random Variables
-
-**Probability** is a number between 0 and 1 that expresses how probable an **event** is. It is defined as a number of positive outcomes (that lead to the event), divided by total number of outcomes, given that all outcomes are equally probable. For example, when we roll a dice, the probability that we get an even number is 3/6 = 0.5.
-
-When we talk about events, we use **random variables**. For example, the random variable that represents a number obtained when rolling a dice would take values from 1 to 6. Set of numbers from 1 to 6 is called **sample space**. We can talk about the probability of a random variable taking a certain value, for example P(X=3)=1/6.
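The dice probability above can be sanity-checked with a quick simulation (a minimal sketch):

```python
import random

# Estimate P(even) for a fair die by simulation; the theoretical value is 3/6 = 0.5
random.seed(0)
rolls = [random.randint(1, 6) for _ in range(100_000)]
p_even = sum(1 for r in rolls if r % 2 == 0) / len(rolls)
print(round(p_even, 2))
```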
-
-The random variable in previous example is called **discrete**, because it has a countable sample space, i.e. there are separate values that can be enumerated. There are cases when sample space is a range of real numbers, or the whole set of real numbers. Such variables are called **continuous**. A good example is the time when the bus arrives.
-
-## Probability Distribution
-
-In the case of discrete random variables, it is easy to describe the probability of each event by a function P(X). For each value *s* from sample space *S* it will give a number from 0 to 1, such that the sum of all values of P(X=s) for all events would be 1.
-
-The most well-known discrete distribution is **uniform distribution**, in which there is a sample space of N elements, with equal probability of 1/N for each of them.
-
-It is more difficult to describe the probability distribution of a continuous variable, with values drawn from some interval [a,b], or the whole set of real numbers ℝ. Consider the case of bus arrival time. In fact, for each exact arrival time $t$, the probability of a bus arriving at exactly that time is 0!
-
-> Now you know that events with 0 probability happen, and very often! At least each time when the bus arrives!
-
-We can only talk about the probability of a variable falling in a given interval of values, eg. P(t1≤X<t2). In this case, probability distribution is described by a **probability density function** p(x), such that
-
-P(t1≤X<t2) = ∫[t1,t2] p(x)dx
-
-## Mean and Variance
-
-Suppose we have a sequence of n values x1, x2, ..., xn. We can define the **mean** (or **arithmetic average**) value of the sequence in the traditional way as (x1+x2+...+xn)/n. As we grow the size of the sample (i.e. take the limit with n→∞), we will obtain the mean (also called **expectation**) of the distribution. We will denote expectation by **E**(x).
-
-> It can be demonstrated that for any discrete distribution with values {x1, x2, ..., xN} and corresponding probabilities p1, p2, ..., pN, the expectation would equal to E(X)=x1p1+x2p2+...+xNpN.
-
-To identify how far the values are spread, we can compute the variance σ2 = ∑(xi - μ)2/n, where μ is the mean of the sequence. The value σ is called **standard deviation**, and σ2 is called a **variance**.
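These definitions map directly onto Python's `statistics` module (illustrative values):

```python
import statistics

# Mean, population variance and standard deviation of a small sample
xs = [180.0, 215.0, 210.0, 188.0, 176.0]
mu = statistics.mean(xs)          # arithmetic average
var = statistics.pvariance(xs)    # variance σ² = ∑(xi - μ)²/n
sigma = statistics.pstdev(xs)     # standard deviation σ
print(mu, var, sigma)
```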
-
-## Mode, Median and Quartiles
-
-Sometimes, mean does not adequately represent the "typical" value for data. For example, when there are a few extreme values that are completely out of range, they can affect the mean. Another good indication is a **median**, a value such that half of data points are lower than it, and another half - higher.
-
-To help us understand the distribution of data, it is helpful to talk about **quartiles**:
-
-* First quartile, or Q1, is a value, such that 25% of the data fall below it
-* Third quartile, or Q3, is a value that 75% of the data fall below it
-
-Graphically we can represent relationship between median and quartiles in a diagram called the **box plot**:
-
-
-
-Here we also compute **inter-quartile range** IQR=Q3-Q1, and so-called **outliers** - values, that lie outside the boundaries [Q1-1.5*IQR,Q3+1.5*IQR].
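A minimal sketch of the IQR-based outlier rule (note that quartile conventions vary slightly between tools; this uses the default 'exclusive' method of `statistics.quantiles`):

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # 100 is an obvious outlier
q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1
lo, hi = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = [x for x in data if x < lo or x > hi]
print(outliers)
```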
-
-For a finite distribution that contains a small number of possible values, a good "typical" value is the one that occurs most frequently, which is called the **mode**. It is often applied to categorical data, such as colors. Consider a situation when we have two groups of people - some that strongly prefer red, and others who prefer blue. If we code colors by numbers, the mean value for a favorite color would be somewhere in the orange-green spectrum, which does not indicate the actual preference of either group. However, the mode would be either one of the colors, or both colors, if the number of people voting for them is equal (in this case we call the sample **multimodal**).
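The color-preference example can be sketched with the standard library, which also handles the multimodal case:

```python
import statistics

# Two colors tied for most votes -> a multimodal sample
votes = ['red', 'red', 'blue', 'blue', 'green']
modes = statistics.multimode(votes)
print(modes)
```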
-## Real-world Data
-
-When we analyze data from real life, they often are not random variables as such, in a sense that we do not perform experiments with unknown result. For example, consider a team of baseball players, and their body data, such as height, weight and age. Those numbers are not exactly random, but we can still apply the same mathematical concepts. For example, a sequence of people's weights can be considered to be a sequence of values drawn from some random variable. Below is the sequence of weights of actual baseball players from [Major League Baseball](http://mlb.mlb.com/index.jsp), taken from [this dataset](http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_MLB_HeightsWeights) (for your convenience, only first 20 values are shown):
-
-```
-[180.0, 215.0, 210.0, 210.0, 188.0, 176.0, 209.0, 200.0, 231.0, 180.0, 188.0, 180.0, 185.0, 160.0, 180.0, 185.0, 197.0, 189.0, 185.0, 219.0]
-```
-
-> **Note**: To see the example of working with this dataset, have a look at the [accompanying notebook](notebook.ipynb). There are also a number of challenges throughout this lesson, and you may complete them by adding some code to that notebook. If you are not sure how to operate on data, do not worry - we will come back to working with data using Python at a later time. If you do not know how to run code in Jupyter Notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
-
-Here is the box plot showing mean, median and quartiles for our data:
-
-
-
-Since our data contains information about different player **roles**, we can also do the box plot by role - it will allow us to get the idea on how parameters values differ across roles. This time we will consider height:
-
-
-
-This diagram suggests that, on average, the height of first basemen is higher than the height of second basemen. Later in this lesson we will learn how we can test this hypothesis more formally, and how to demonstrate that our data is statistically significant enough to show that.
-
-> When working with real-world data, we assume that all data points are samples drawn from some probability distribution. This assumption allows us to apply machine learning techniques and build working predictive models.
-
-To see what the distribution of our data is, we can plot a graph called a **histogram**. X-axis would contain a number of different weight intervals (so-called **bins**), and the vertical axis would show the number of times our random variable sample was inside a given interval.
-
-
-
-From this histogram you can see that all values are centered around a certain mean weight, and the further we go from that weight, the fewer weights of that value are encountered. I.e., it is very improbable that the weight of a baseball player would be very different from the mean weight. The variance of weights shows the extent to which weights are likely to differ from the mean.
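Binning can be sketched without plotting at all (the mean and standard deviation below are illustrative, not the real dataset):

```python
import numpy as np

# Generate normally distributed "weights" and bin them as a histogram would
np.random.seed(0)
weights = np.random.normal(200, 15, 1000)
counts, bin_edges = np.histogram(weights, bins=10)

# The most populated bin should sit near the center, around the mean
print(counts, counts.argmax())
```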
-
-> If we take weights of other people, not from the baseball league, the distribution is likely to be different. However, the shape of the distribution will be the same, but mean and variance would change. So, if we train our model on baseball players, it is likely to give wrong results when applied to students of a university, because the underlying distribution is different.
-## Normal Distribution
-
-The distribution of weights that we have seen above is very typical, and many measurements from real world follow the same type of distribution, but with different mean and variance. This distribution is called **normal distribution**, and it plays a very important role in statistics.
-
-Using normal distribution is a correct way to generate random weights of potential baseball players. Once we know mean weight `mean` and standard deviation `std`, we can generate 1000 weight samples in the following way:
-```python
-samples = np.random.normal(mean,std,1000)
-```
-
-If we plot the histogram of the generated samples we will see the picture very similar to the one shown above. And if we increase the number of samples and the number of bins, we can generate a picture of a normal distribution that is more close to ideal:
-
-
-
-*Normal Distribution with mean=0 and std.dev=1*
-
-## Confidence Intervals
-
-When we talk about weights of baseball players, we assume that there is certain **random variable W** that corresponds to ideal probability distribution of weights of all baseball players (so-called **population**). Our sequence of weights corresponds to a subset of all baseball players that we call **sample**. An interesting question is, can we know the parameters of distribution of W, i.e. mean and variance of the population?
-
-The easiest answer would be to calculate mean and variance of our sample. However, it could happen that our random sample does not accurately represent complete population. Thus it makes sense to talk about **confidence interval**.
-
-> **Confidence interval** is the estimation of the true mean of the population given our sample, which is accurate with a certain probability (or **level of confidence**).
-
-Suppose we have a sample X1, ..., Xn from our distribution. Each time we draw a sample from our distribution, we would end up with different mean value μ. Thus μ can be considered to be a random variable. A **confidence interval** with confidence p is a pair of values (Lp,Rp), such that **P**(Lp≤μ≤Rp) = p, i.e. a probability of measured mean value falling within the interval equals to p.
-
-It goes beyond our short intro to discuss in detail how those confidence intervals are calculated. Some more details can be found [on Wikipedia](https://en.wikipedia.org/wiki/Confidence_interval). In short, we define the distribution of the computed sample mean relative to the true mean of the population, which is called the **student distribution**.
-
-> **Interesting fact**: Student distribution is named after mathematician William Sealy Gosset, who published his paper under the pseudonym "Student". He worked in the Guinness brewery, and, according to one of the versions, his employer did not want general public to know that they were using statistical tests to determine the quality of raw materials.
-
-If we want to estimate the mean μ of our population with confidence p, we need to take the *(1-p)/2-th percentile* of a Student distribution A, which can either be taken from tables, or computed using some built-in functions of statistical software (eg. Python, R, etc.). Then the interval for μ would be given by X±A*D/√n, where X is the obtained mean of the sample, and D is the standard deviation.
-
-> **Note**: We also omit the discussion of an important concept of [degrees of freedom](https://en.wikipedia.org/wiki/Degrees_of_freedom_(statistics)), which is important in relation to Student distribution. You can refer to more complete books on statistics to understand this concept deeper.
-
-An example of calculating confidence interval for weights and heights is given in the [accompanying notebooks](notebook.ipynb).
-
-| p | Weight mean |
-|-----|-----------|
-| 0.85 | 201.73±0.94 |
-| 0.90 | 201.73±1.08 |
-| 0.95 | 201.73±1.28 |
-
-Notice that the higher is the confidence probability, the wider is the confidence interval.
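A minimal sketch of this effect, using t-values looked up from a Student-t table for 7 degrees of freedom (the sample below is illustrative, not the full dataset):

```python
import math
import statistics

sample = [180.0, 215.0, 210.0, 188.0, 176.0, 209.0, 200.0, 231.0]
mean = statistics.mean(sample)
sd = statistics.stdev(sample)  # sample standard deviation
n = len(sample)

# Approximate two-sided t-values for n-1 = 7 degrees of freedom (from tables)
t_values = {0.85: 1.617, 0.90: 1.895, 0.95: 2.365}
for p, t in t_values.items():
    margin = t * sd / math.sqrt(n)  # half-width of the confidence interval
    print(f"p={p}: {mean:.2f}±{margin:.2f}")
```

Higher confidence requires a larger t-value, so the printed intervals widen as p grows.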
-
-## Hypothesis Testing
-
-In our baseball players dataset, there are different player roles, that can be summarized below (look at the [accompanying notebook](notebook.ipynb) to see how this table can be calculated):
-
-| Role | Height | Weight | Count |
-|------|--------|--------|-------|
-| Catcher | 72.723684 | 204.328947 | 76 |
-| Designated_Hitter | 74.222222 | 220.888889 | 18 |
-| First_Baseman | 74.000000 | 213.109091 | 55 |
-| Outfielder | 73.010309 | 199.113402 | 194 |
-| Relief_Pitcher | 74.374603 | 203.517460 | 315 |
-| Second_Baseman | 71.362069 | 184.344828 | 58 |
-| Shortstop | 71.903846 | 182.923077 | 52 |
-| Starting_Pitcher | 74.719457 | 205.163636 | 221 |
-| Third_Baseman | 73.044444 | 200.955556 | 45 |
-
-We can notice that the mean height of first basemen is higher than that of second basemen. Thus, we may be tempted to conclude that **first basemen are taller than second basemen**.
-
-> This statement is called **a hypothesis**, because we do not know whether the fact is actually true or not.
-
-However, it is not always obvious whether we can make this conclusion. From the discussion above we know that each mean has an associated confidence interval, and thus this difference can just be a statistical error. We need some more formal way to test our hypothesis.
-
-Let's compute confidence intervals separately for heights of first and second basemen:
-
-| Confidence | First Basemen | Second Basemen |
-|------------|---------------|----------------|
-| 0.85 | 73.62..74.38 | 71.04..71.69 |
-| 0.90 | 73.56..74.44 | 70.99..71.73 |
-| 0.95 | 73.47..74.53 | 70.92..71.81 |
-
-We can see that at no confidence level do the intervals overlap. That supports our hypothesis that first basemen are taller than second basemen.
-
-More formally, the problem we are solving is to see if **two probability distributions are the same**, or at least have the same parameters. Depending on the distribution, we need to use different tests for that. If we know that our distributions are normal, we can apply the **[Student's t-test](https://en.wikipedia.org/wiki/Student%27s_t-test)**.
-
-In Student's t-test, we compute the so-called **t-value**, which indicates the difference between means, taking the variance into account. It can be shown that the t-value follows **Student's t-distribution**, which allows us to get the threshold value for a given confidence level **p** (this can be computed, or looked up in numerical tables). We then compare the t-value to this threshold to accept or reject the hypothesis.
-
-In Python, we can use the **SciPy** package, which includes the `ttest_ind` function (in addition to many other useful statistical functions!). It computes the t-value for us, and also computes the corresponding p-value, so that we can just look at the p-value to draw a conclusion.
-
-For example, comparing the heights of first and second basemen gives us the following results:
-```python
-from scipy.stats import ttest_ind
-
-tval, pval = ttest_ind(df.loc[df['Role']=='First_Baseman',['Height']], df.loc[df['Role']=='Second_Baseman',['Height']], equal_var=False)
-print(f"T-value = {tval[0]:.2f}\nP-value: {pval[0]}")
-```
-```
-T-value = 7.65
-P-value: 9.137321189738925e-12
-```
-In our case, the p-value is very low, meaning that there is strong evidence that first basemen are taller.
-
-There are also other types of hypotheses that we might want to test, for example:
-* To test that a given sample follows some distribution. In our case we have assumed that heights are normally distributed, but that needs formal statistical verification.
-* To test that the mean value of a sample corresponds to some predefined value
-* To compare the means of a number of samples (e.g. what is the difference in happiness levels among different age groups)
-
-## Law of Large Numbers and Central Limit Theorem
-
-One of the reasons why the normal distribution is so important is the so-called **central limit theorem**. Suppose we have a large sample of N independent values X1, ..., XN, sampled from any distribution with mean μ and variance σ². Then, for sufficiently large N (in other words, when N→∞), the sample mean (X1+...+XN)/N is approximately normally distributed, with mean μ and variance σ²/N.
-
-> Another way to interpret the central limit theorem is to say that regardless of the original distribution, when you compute the mean of a large number of values of any random variable you end up with a normal distribution.
-
-It also follows from the central limit theorem that, as N→∞, the probability of the sample mean being close to μ approaches 1; in other words, the sample mean converges to μ. This is known as **the law of large numbers**.
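A quick simulation illustrating both statements (the uniform distribution here is an arbitrary non-normal choice):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma2 = 0.5, 1 / 12     # mean and variance of Uniform(0, 1)

trials, N = 5000, 300
# 5000 sample means, each computed from N uniform values
means = rng.uniform(0, 1, size=(trials, N)).mean(axis=1)

# CLT: the means are approximately normal with mean mu and variance sigma2/N
print(f"mean of sample means: {means.mean():.4f} (mu = {mu})")
print(f"variance of sample means: {means.var():.6f} (sigma2/N = {sigma2 / N:.6f})")
```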
-
-## Covariance and Correlation
-
-One of the things Data Science does is find relations between data. We say that two sequences **correlate** when they exhibit similar behavior at the same time, i.e. they either rise/fall simultaneously, or one sequence rises when the other one falls and vice versa. In other words, there seems to be some relation between the two sequences.
-
-> Correlation does not necessarily indicate a causal relationship between two sequences; sometimes both variables can depend on some external cause, or the two sequences may correlate purely by chance. However, strong mathematical correlation is a good indication that two variables are somehow connected.
-
-Mathematically, the main concept that shows the relation between two random variables is **covariance**, which is computed as Cov(X,Y) = **E**\[(X-**E**(X))(Y-**E**(Y))\]. We compute the deviation of both variables from their mean values, and then the product of those deviations. If both variables deviate together, the product will always be a positive value, adding up to a positive covariance. If both variables deviate out of sync (i.e. one falls below average when the other one rises above average), we will always get negative values, adding up to a negative covariance. If the deviations are independent, they will add up to roughly zero.
-
-The absolute value of covariance does not tell us much about how strong the correlation is, because it depends on the magnitude of the actual values. To normalize it, we can divide covariance by the standard deviations of both variables, to get **correlation**. The good thing is that correlation is always in the range [-1,1], where 1 indicates strong positive correlation between values, -1 strong negative correlation, and 0 no correlation at all (variables are independent).
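A quick check of this normalization on synthetic data (the variables here are made up for illustration): dividing the covariance by both standard deviations gives the same number as `np.corrcoef`:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(70, 3, 500)               # e.g. heights
y = 2.5 * x + rng.normal(0, 10, 500)     # a value loosely dependent on x

# Cov(X,Y) = E[(X - E(X)) * (Y - E(Y))]
cov = np.mean((x - x.mean()) * (y - y.mean()))
corr = cov / (x.std() * y.std())         # normalize into [-1, 1]

print(f"covariance  = {cov:.2f}")
print(f"correlation = {corr:.3f}")
```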
-
-**Example**: We can compute correlation between weights and heights of baseball players from the dataset mentioned above:
-```python
-print(np.corrcoef(weights,heights))
-```
-As a result, we get **correlation matrix** like this one:
-```
-array([[1. , 0.52959196],
- [0.52959196, 1. ]])
-```
-
-> The correlation matrix C can be computed for any number of input sequences S1, ..., Sn. The value of Cij is the correlation between Si and Sj, and the diagonal elements are always 1 (the self-correlation of Si).
-
-In our case, the value 0.53 indicates that there is some correlation between weight and height of a person. We can also make the scatter plot of one value against the other to see the relationship visually:
-
-
-
-> More examples of correlation and covariance can be found in [accompanying notebook](notebook.ipynb).
-
-## Conclusion
-
-In this section, we have learnt:
-
-* basic statistical properties of data, such as mean, variance, mode and quartiles
-* different distributions of random variables, including normal distribution
-* how to find correlation between different properties
-* how to use the sound apparatus of mathematics and statistics in order to test hypotheses
-* how to compute confidence intervals for a random variable given a data sample
-
-While this is definitely not an exhaustive list of topics within probability and statistics, it should be enough to give you a good start in this course.
-
-## 🚀 Challenge
-
-Use the sample code in the notebook to test other hypotheses:
-1. First basemen are older than second basemen
-2. First basemen are taller than third basemen
-3. Shortstops are taller than second basemen
-
-## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/7)
-
-## Review & Self Study
-
-Probability and statistics is such a broad topic that it deserves its own course. If you are interested to go deeper into theory, you may want to continue reading some of the following books:
-
-1. [Carlos Fernandez-Granda](https://cims.nyu.edu/~cfgranda/) from New York University has great lecture notes [Probability and Statistics for Data Science](https://cims.nyu.edu/~cfgranda/pages/stuff/probability_stats_for_DS.pdf) (available online)
-1. [Peter and Andrew Bruce. Practical Statistics for Data Scientists.](https://www.oreilly.com/library/view/practical-statistics-for/9781491952955/) [[sample code in R](https://github.com/andrewgbruce/statistics-for-data-scientists)].
-1. [James D. Miller. Statistics for Data Science](https://www.packtpub.com/product/statistics-for-data-science/9781788290678) [[sample code in R](https://github.com/PacktPublishing/Statistics-for-Data-Science)]
-
-## Assignment
-
-[Small Diabetes Study](assignment.md)
-
-## Credits
-
-This lesson has been authored with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
From 1808d332d98e6090323738eba65956a2f48dbe7d Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Sun, 10 Oct 2021 21:31:50 -0500
Subject: [PATCH 083/319] fix: (translation): Revert translation on module 1
* revert translations for sections 1 & 2
---
.../translations/README.es.md | 175 ------------
.../02-ethics/translations/README.es.md | 263 ------------------
2 files changed, 438 deletions(-)
diff --git a/1-Introduction/01-defining-data-science/translations/README.es.md b/1-Introduction/01-defining-data-science/translations/README.es.md
index 20011f7c..e69de29b 100644
--- a/1-Introduction/01-defining-data-science/translations/README.es.md
+++ b/1-Introduction/01-defining-data-science/translations/README.es.md
@@ -1,175 +0,0 @@
-# Defining Data Science
-
-| ![Defining Data Science](../../sketchnotes/01-Definitions.png)|
-|:---:|
-|Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
-
----
-
-[](https://youtu.be/pqqsm5reGvs)
-
-## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/0)
-
-## What is Data?
-In our everyday lives, we are constantly surrounded by data. The text you are reading right now is data,
-the list of your friends' phone numbers on your smartphone is data, as is the current time displayed on your watch.
-As human beings, we operate with data naturally, counting the money we have or writing letters to our friends.
-
-However, data became far more critical with the creation of computers. The primary role of computers
-is to perform computations, but they need data to operate on. Thus, we need to understand how computers
-store and process data.
-
-With the emergence of the Internet, the role of computers as data-handling devices increased.
-If you think about it, we now use computers more for communication and data processing than for actual computation. When we write an e-mail to a friend or search for some information on the Internet - we are
-creating, storing, transmitting, and manipulating data.
-
-> Can you remember the last time you used a computer to actually compute something?
-
-## What is Data Science?
-
-[Wikipedia](https://en.wikipedia.org/wiki/Data_science) defines **Data Science** as *a scientific field that uses scientific methods to extract knowledge and insights from structured and unstructured data, and apply knowledge and actionable insights from data across a broad range of application domains*.
-
-This definition highlights the following important aspects of data science:
-
-* The main goal of data science is to **extract knowledge** from data, in other words - to **understand** the data, find hidden relationships, and build a **model**.
-* Data science uses **scientific methods**, such as probability and statistics. In fact, when the term **data science** was first used, some people argued that data science was just a fancy new name for statistics. Nowadays it has become evident that the field is much broader.
-* The knowledge obtained can be applied to produce **actionable insights**.
-* We should be able to operate on both **structured** and **unstructured** data. We will discuss the different types of data later in the course.
-* The **application domain** is an important concept, and a data scientist needs at least some expertise in the problem domain.
-
-> Another important aspect of data science is that it studies how data is gathered, stored, and operated upon using computers. While statistics gives us the mathematical foundations, data science applies mathematical concepts to actually extract insights from data.
-
-One of the ways (attributed to [Jim Gray](https://en.wikipedia.org/wiki/Jim_Gray_(computer_scientist))) to look at data science is to consider it a separate paradigm of science:
-* **Empirical**, in which we rely mostly on observations and the results of experiments
-* **Theoretical**, where new concepts emerge from existing scientific knowledge
-* **Computational**, where we discover new principles based on some computational experiments
-* **Data-Driven**, based on discovering relationships and patterns in data
-
-## Other Related Fields
-
-Since data is such a pervasive concept, data science itself is also a broad field, touching many related disciplines.
-
-**Databases**
-
-The most obvious thing to consider is **how to store** the data, i.e. how to structure it so that it can be processed faster. There are different types of databases that store structured and unstructured data, which [we will consider in this course](../../2-Working-With-Data/README.md).
-
-
-**Big Data**
-
-Quite often we need to store and process very large quantities of data with a relatively simple structure. There are special approaches and tools to store that data in a distributed manner on a computer cluster, and process it efficiently.
-
-
-**Machine Learning**
-
-One of the ways to understand data is to **build a model** that will be able to predict a desired outcome. Being able to learn such models from data is the area of study of **machine learning**. You may want to have a look at our [Machine Learning for Beginners](https://github.com/microsoft/ML-For-Beginners/) curriculum to go deeper into that field.
-
-
-**Artificial Intelligence**
-
-Like machine learning, artificial intelligence also relies on data, and it involves building highly complex models that exhibit behavior similar to that of human beings. In addition, AI methods often allow us to turn unstructured data (e.g. natural language) into structured data by extracting useful insights.
-
-
-**Visualization**
-
-Huge amounts of data are incomprehensible to a human being, but once we create useful visualizations - we can start making much more sense of the data and draw some conclusions. Thus, it is important to know many ways to visualize information - something that we will cover in [Section 3](../../3-Data-Visualization/README.md) of our course. Related fields also include **infographics** and **human-computer interaction** in general.
-
-
-
-## Types of Data
-
-As we have already mentioned - data is everywhere, we just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. The former is typically represented in some well-structured form, often as a table or set of tables, while the latter is just a collection of files. Sometimes we can also talk about **semi-structured** data, which has some sort of structure that may vary greatly.
-
-| Structured | Semi-structured | Unstructured |
-|------------|-----------------|--------------|
-| List of people with their phone numbers | Wikipedia pages with links | Text of the Encyclopædia Britannica |
-| Temperature in all rooms of a building every minute for the last 20 years | Collection of scientific papers in JSON format with authors, publication dates, and abstracts | File share with corporate documents |
-| Data on the age and gender of all people entering the building | Internet pages | Raw video feed from a surveillance camera |
-
-## Where to Get Data
-
-There are many possible sources of data, and it would be impossible to list them all! However, let us mention some of the typical places where you can get data:
-
-* **Structured**
-  - **Internet of Things**, including data from different sensors, such as temperature or pressure sensors, provides a lot of useful data. For example, if an office building is equipped with IoT sensors, we can automatically control heating and lighting in order to minimize costs.
-  - **Surveys** that we ask users to complete after a purchase, or after visiting a web site.
-  - **Analysis of behavior** can, for example, help us understand how deeply a user goes into a site, and what the typical reason for leaving the site is.
-* **Unstructured**
-  - **Texts** can be a rich source of insights, from the overall **sentiment score** to extracting keywords and even some semantic meaning.
-  - **Images** or **Video**. A video from a surveillance camera can be used to estimate traffic on the road, and inform people about potential traffic jams.
-  - Web server **Logs** can be used to understand which pages of our site are most often visited, and for how long.
-* **Semi-structured**
-  - **Social network** graphs can be a great source of data about user personality and potential effectiveness in spreading information around.
-  - When we have a bunch of photographs from a party, we can try to extract **group dynamics** data by building a graph of people taking pictures of each other.
-
-By knowing different possible sources of data, you can try to think about different scenarios where data science techniques can be applied to understand the situation better, and improve business processes.
-
-## What Can You Do with Data?
-
-In data science, we focus on the following steps of the data journey:
-
-**1) Data acquisition**
-
-The first step is to collect the data. While in many cases this can be a straightforward process, like data coming into a database from a web application, sometimes we need special techniques. For example, data from IoT sensors can be overwhelming, and it is a good practice to use buffering endpoints such as IoT Hub to collect all the data before further processing.
-
-
-**2) Data storage**
-
-Storing the data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way the data will be queried later on. There are several ways data can be stored:
-
-* **Relational databases** store a collection of tables, and use a special language called SQL to query them. Typically, the tables are connected to each other using a schema. In many cases we need to convert the data from its original form to fit the schema.
-* **NoSQL databases**, such as CosmosDB, do not enforce a schema on the data, and allow storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL databases do not have the rich querying capabilities of SQL, and cannot enforce referential integrity between the data.
-* **Data lake** storage is used for large collections of data in raw form. Data lakes are often used with big data, where all the data cannot fit on a single machine, and has to be stored and processed by a cluster. Parquet is a data format that is often used in conjunction with big data.
-
-
-
-**3) Data processing**
-
-This is the most exciting part of the data journey, which involves processing the data from its original form into a form that can be used for visualization/model training. When dealing with unstructured data such as text or images, we may need to use some AI techniques to extract **features** from the data, thus converting it into structured form.
-
-**4) Visualization / human insight**
-
-Often, to understand the data, we need to visualize it. Having many different visualization techniques in our toolbox, we can find the right view to gain an insight. Often, a data scientist needs to "play with the data", visualizing it many times and looking for relationships. We may also use statistical techniques to test a hypothesis or to prove a correlation between different pieces of data.
-
-**5) Training predictive models**
-
-Because the ultimate goal of data science is to be able to make decisions based on data, we may want to use machine learning techniques to build predictive models that will be able to solve our problems.
-
-
-
-Of course, depending on the actual data some steps may be missing (e.g., when we already have the data in a database, or when we do not need model training), or some steps may be repeated several times (such as data processing).
-
-## Digitalization and Digital Transformation
-
-In the last decade, many businesses started to understand the importance of data when making business decisions. To apply data science principles to running a business, one first needs to collect some data, i.e. to somehow digitize the business processes. This is known as **digitalization**, and following it up with data science techniques to guide decisions often leads to a significant increase in productivity (or even a business pivot), called **digital transformation**.
-
-Let's consider an example. Suppose we have a data science course (like this one) that we deliver online to students, and we want to use data science to improve it. How can we do it?
-
-We can start by asking "what can be digitized?". The simplest way would be to measure the time it takes each student to complete each module, and the knowledge gained (e.g., by giving a multiple-choice test at the end of each module). By averaging the time-to-complete across all students, we can find out which modules cause the most difficulty, and work on simplifying them.
-
-> You may argue that this approach is not ideal, because modules can be of different lengths. It is probably fairer to divide the time by the length of the module (in number of characters), and compare those values instead.
-
-When we start analyzing the results of multiple-choice tests, we can try to determine which specific concepts students have difficulty understanding, and improve the content. To do that, we need to design tests in such a way that each question maps to a certain concept or chunk of knowledge.
-
-If we want to get even more sophisticated, we can plot the time taken for each module against the age category of the students. We might find that for some age categories it takes an inappropriately long time to complete the module, or that students drop out at a certain point. This can help us provide age recommendations for the module, and minimize people's dissatisfaction due to wrong expectations.
-
-## 🚀 Challenge
-
-In this challenge, we will try to find concepts relevant to the field of Data Science by looking at texts. We will take a Wikipedia article on Data Science, download and process the text, and then build a word cloud like this one:
-
-
-
-Visit [`notebook.ipynb`](notebook.ipynb) to read through the code. You can also run it and see how it performs all the data transformations in real time.
-
-> If you don't know how to run code in a Jupyter Notebook, have a look at [this article](https://soshnikov.com/education/how-to-execute-notebooks-from-github/).
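As a minimal sketch of the first step of that pipeline (counting word frequencies with the standard library only; the text snippet and the `len(w) > 3` stop-word filter are illustrative simplifications of what the notebook does):

```python
import re
from collections import Counter

# Hypothetical snippet standing in for the downloaded Wikipedia article text
text = """Data science is an interdisciplinary field that uses scientific
methods, processes, algorithms and systems to extract knowledge and insights
from data. Data science is related to data mining and big data."""

# Tokenize, lowercase, and drop very short words before counting
words = re.findall(r"[a-z]+", text.lower())
freq = Counter(w for w in words if len(w) > 3)

print(freq.most_common(3))
```

The resulting frequency table is exactly what word cloud libraries take as input to size each word.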
-
-
-
-## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/1)
-
-## Assignments
-
-* **Task 1**: Modify the code above to find related concepts for the fields of **Big Data** and **Machine Learning**
-* **Task 2**: [Think about Data Science scenarios](assignment.md)
-
-## Credits
-
-This lesson has been written with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
diff --git a/1-Introduction/02-ethics/translations/README.es.md b/1-Introduction/02-ethics/translations/README.es.md
index 8d883b1b..e69de29b 100644
--- a/1-Introduction/02-ethics/translations/README.es.md
+++ b/1-Introduction/02-ethics/translations/README.es.md
@@ -1,263 +0,0 @@
-# Introduction to Data Ethics
-
-| ![Data Science Ethics](../../sketchnotes/02-Ethics.png)|
-|:---:|
-| Data Science Ethics - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
-
----
-
-We are all data citizens living in a datafied world.
-
-Market trends tell us that by 2022, 1 in 3 large organizations will buy and sell their data through online [marketplaces and exchanges](https://www.gartner.com/smarterwithgartner/gartner-top-10-trends-in-data-and-analytics-for-2020/). As **app developers**, we will find it easier and cheaper to integrate data-driven insights and algorithm-driven automation into everyday user experiences. But as AI becomes more pervasive, we will also need to understand the potential harms caused by the [weaponization](https://www.youtube.com/watch?v=TQHs8SA1qpk) of such algorithms at scale.
-
-Trends also indicate that we will create and consume over [180 zettabytes](https://www.statista.com/statistics/871513/worldwide-data-created/) of data by 2025. As **data scientists**, this gives us unprecedented levels of access to personal data. This means we can build behavioral profiles of users and influence decision-making in ways that create an [illusion of free choice](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) while potentially nudging users towards outcomes we prefer. It also raises broader questions about data privacy and user protections.
-
-Data ethics is now a _necessary guardrail_ for data science and engineering, helping us minimize potential harms and unintended consequences from our data-driven actions. The [Gartner Hype Cycle for AI](https://www.gartner.com/smarterwithgartner/2-megatrends-dominate-the-gartner-hype-cycle-for-artificial-intelligence-2020/) identifies relevant trends in digital ethics, responsible AI, and AI governance as key drivers for larger megatrends around the _democratization_ and _industrialization_ of AI.
-
-
-
-In this lesson, we'll explore the fascinating area of data ethics - from core concepts and challenges, to case studies and applied AI concepts like governance - that help establish an ethics culture in teams and organizations that work with data and AI.
-
-
-
-
-## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/2) 🎯
-
-## Basic Definitions
-
-Let's start by understanding the basic terminology.
-
-The word "ethics" comes from the [Greek word "ethikos"](https://en.wikipedia.org/wiki/Ethics) (and its root "ethos") meaning _character or moral nature_.
-
-**Ethics** is about the shared values and moral principles that govern our behavior in society. Ethics is based not on laws but on widely accepted norms of what is "right vs. wrong". However, ethical considerations can influence corporate governance initiatives and government regulations that create more incentives for compliance.
-
-**Data ethics** is a [new branch of ethics](https://royalsocietypublishing.org/doi/full/10.1098/rsta.2016.0360#sec-1) that "studies and evaluates moral problems related to _data, algorithms and corresponding practices_". Here, **"data"** focuses on actions related to generation, recording, curation, processing, dissemination, sharing, and usage; **"algorithms"** focuses on AI, agents, machine learning, and robots; and **"practices"** focuses on topics like responsible innovation, programming, hacking, and ethics codes.
-
-**Applied ethics** is the [practical application of moral considerations](https://en.wikipedia.org/wiki/Applied_ethics). It's the process of actively investigating ethical issues in the context of _real-world actions, products, and processes_, and taking corrective measures to make sure these remain aligned with our defined ethical values.
-
-**Ethics culture** is about [_operationalizing_ applied ethics](https://hbr.org/2019/05/how-to-design-an-ethical-organization) to make sure that our ethical principles and practices are adopted in a consistent and scalable manner across the entire organization. Successful ethics cultures define organization-wide ethical principles, provide meaningful incentives for compliance, and reinforce ethics norms by encouraging and amplifying desired behaviors at every level of the organization.
-
-
-## Ethics Concepts
-
-In this section, we'll discuss concepts like **shared values** (principles) and **ethical challenges** (problems) for data ethics - and explore **case studies** that help you understand these concepts in real-world contexts.
-
-### 1. Ethics Principles
-
-Every data ethics strategy begins by defining _ethical principles_ - the "shared values" that describe acceptable behaviors and guide compliant actions in our data and AI projects. You can define these at an individual or team level. However, most large organizations outline these in an _ethical AI_ mission statement or framework that is defined at corporate levels and enforced consistently across all teams.
-
-**Example:** Microsoft's [Responsible AI](https://www.microsoft.com/en-us/ai/responsible-ai) mission statement reads: _"We are committed to the advancement of AI driven by ethical principles that put people first"_ - identifying 6 ethical principles in the framework below:
-
-
-
-Let's briefly explore these principles. _Transparency_ and _accountability_ are foundational values that other principles build upon - so let's begin there:
-
-* [**Accountability**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) makes practitioners _responsible_ for their data and AI operations, and for compliance with these ethical principles.
-* [**Transparency**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) ensures that data and AI actions are _understandable_ (interpretable) to users, explaining the what and why behind decisions.
-* [**Fairness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1%3aprimaryr6) - focuses on ensuring AI treats _all people_ fairly, addressing any systemic or implicit socio-technical biases in data and systems.
-* [**Reliability & Safety**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - ensures that AI behaves _consistently_ with defined values, minimizing potential harms or unintended consequences.
-* [**Privacy & Security**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about understanding data lineage, and providing _data privacy and related protections_ to users.
-* [**Inclusiveness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about designing AI solutions with intention, adapting them to meet a _broad range of human needs_ and capabilities.
-
-> 🚨 Think about what your data ethics mission statement could be. Explore ethical AI frameworks from other organizations - here are examples from [IBM](https://www.ibm.com/cloud/learn/ai-ethics), [Google](https://ai.google/principles), and [Facebook](https://ai.facebook.com/blog/facebooks-five-pillars-of-responsible-ai/). What shared values do they have in common? How do these principles relate to the AI product or industry they operate in?
-
-### 2. Desafíos éticos
-
-Una vez que tenemos principios éticos definidos, el siguiente paso es evaluar nuestros datos y acciones de AI para ver si se alinean con esos valores compartidos. Piensa en tus acciones en dos categorías: _recolección de datos_ y _diseño de algoritmos_.
-
-Con la recolección de datos, las acciones probablemente involucren **datos personales** o información de identificación personal (PII) para individuos vivos identificables. Esto incluye [diversos artículos de datos no personales](https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en) que _colectivamente_ identifican a un individuo. Los desafíos éticos pueden relacionarse con _privacidad de datos_, _propiedad de los datos_, y temas relacionados como _consentimiento informado_ y _derechos de propiedad intelectual_ para los usuarios.
-
-Con el diseño de algoritmos, las acciones involucran la recolección y curación de **conjuntos de datos**, para luego usarlos para entrenar y desplegar **modelos de datos** que predicen resultados o automatizan decisiones en contextos del mundo real. Los desafíos éticos pueden surgir de _conjuntos de datos sesgados_, problemas de _calidad de los datos_, _injusticia_ y _malinterpretación_ en los algoritmos - incluyendo algunos problemas que son sistémicos por naturaleza.
-
-En ambos casos, los desafíos éticos destacan áreas donde nuestras acciones pueden entrar en conflicto con nuestros valores compartidos. Para detectar, mitigar, minimizar o eliminar estas preocupaciones - necesitamos hacernos preguntas morales de "sí/no" sobre nuestras acciones, y luego tomar acciones correctivas según sea necesario. Demos un vistazo a algunos desafíos éticos y las preguntas morales que plantean:
-
-
-#### 2.1 Propiedad de los datos
-
-La recolección de datos suele involucrar datos que pueden identificar a los sujetos de datos. La [propiedad de los datos](https://permission.io/blog/data-ownership) trata del _control_ y los [_derechos de usuario_](https://permission.io/blog/data-ownership) relacionados con la creación, el procesamiento y la difusión de los datos.
-
-Las preguntas morales que debemos hacer son:
- * ¿Quién posee los datos? (usuario u organización)
- * ¿Qué derechos tienen los sujetos de datos? (ejemplo: acceso, eliminación, portabilidad)
- * ¿Qué derechos tienen las organizaciones? (ejemplo: rectificar revisiones de usuarios maliciosos)
-
-#### 2.2 Consentimiento informado
-
-[Consentimiento informado](https://legaldictionary.net/informed-consent/) define el acto de los usuarios al aceptar una acción (como la recolección de datos) con un _completo entendimiento_ de hechos relevantes incluyendo el propósito, riesgos potenciales y alternativas.
-
-Las preguntas a explorar son:
- * ¿El usuario (sujeto de datos) otorgó su permiso para la captura y el uso de los datos?
- * ¿El usuario entendió el propósito para el cual los datos fueron capturados?
- * ¿El usuario entendió los riesgos potenciales de su participación?
-
-#### 2.3 Propiedad intelectual
-
-La [propiedad intelectual](https://en.wikipedia.org/wiki/Intellectual_property) se refiere a creaciones intangibles resultado de la iniciativa humana, que pueden _tener valor económico_ para individuos o negocios.
-
-Las preguntas a explorar son:
- * ¿Los datos recolectados tienen valor económico para un usuario o negocio?
- * ¿El **usuario** tiene propiedad intelectual en este ámbito?
- * ¿La **organización** tiene propiedad intelectual en este ámbito?
- * Si estos derechos existen, ¿cómo los protegemos?
-
-#### 2.4 Privacidad de datos
-
-La [privacidad de datos](https://www.northeastern.edu/graduate/blog/what-is-data-privacy/) o privacidad de la información se refiere a la preservación de la privacidad del usuario y la protección de la identidad del usuario respecto a información de identificación personal.
-
-Las preguntas a explorar son:
- * ¿Están los datos (personales) de los usuarios seguros contra hackeos y filtraciones?
- * ¿Están los datos de usuario accesibles sólo para usuarios y contextos autorizados?
- * ¿Se preserva el anonimato de los usuarios cuando los datos son compartidos o esparcidos?
- * ¿Puede un usuario ser re-identificado a partir de conjuntos de datos anonimizados?
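Para ilustrar la última pregunta, un boceto mínimo en Python (con datos y nombres de campos hipotéticos) que comprueba el _k-anonimato_ de un conjunto de registros: si alguna combinación de cuasi-identificadores aparece en menos de k registros, esos individuos podrían ser re-identificados aun sin nombres:

```python
from collections import Counter

def k_anonimato(registros, cuasi_identificadores):
    """Tamaño del grupo más pequeño que comparte la misma
    combinación de cuasi-identificadores (el "k" del k-anonimato)."""
    grupos = Counter(
        tuple(r[campo] for campo in cuasi_identificadores) for r in registros
    )
    return min(grupos.values())

# Registros hipotéticos "anonimizados" (sin nombres, pero con cuasi-identificadores)
registros = [
    {"codigo_postal": "28001", "edad": 34, "genero": "F"},
    {"codigo_postal": "28001", "edad": 34, "genero": "F"},
    {"codigo_postal": "28002", "edad": 51, "genero": "M"},  # combinación única
]

k = k_anonimato(registros, ["codigo_postal", "edad", "genero"])
# k == 1: al menos un individuo es único y podría ser re-identificado
# al cruzar estos datos con fuentes externas
```

Un valor de k igual a 1 indica que al menos un registro es único en el conjunto y, por tanto, potencialmente re-identificable.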
-
-
-#### 2.5 Derecho al olvido
-
-El [derecho al olvido](https://en.wikipedia.org/wiki/Right_to_be_forgotten) o [derecho a la eliminación](https://www.gdpreu.org/right-to-be-forgotten/) provee protección adicional a los datos personales de los usuarios. En concreto, brinda a los usuarios el derecho a solicitar la eliminación o remoción de sus datos personales de las búsquedas de internet y otras ubicaciones, _bajo circunstancias específicas_ - permitiéndoles un nuevo comienzo en línea sin que sus acciones pasadas sean usadas en su contra.
-
-Las preguntas a explorar son:
- * ¿El sistema permite a los sujetos de datos solicitar eliminación?
- * ¿La remoción del consentimiento del usuario debería disparar la eliminación automatizada?
- * ¿Se recolectaron los datos sin consentimiento o por medios no legítimos?
- * ¿Cumplimos con las regulaciones gubernamentales de privacidad de los datos?
-
-
-#### 2.6 Sesgo del conjunto de datos
-
-El sesgo del conjunto de datos o [sesgo de recopilación](http://researcharticles.com/index.php/bias-in-data-collection-in-research/) se refiere a la selección de un subconjunto _no representativo_ de datos para el desarrollo de un algoritmo, creando una potencial injusticia en los resultados para distintos grupos. Los tipos de sesgos incluyen el sesgo de selección o muestreo, el sesgo de voluntarios y el sesgo de instrumento.
-
-Las preguntas a explorar son:
- * ¿Reclutamos un conjunto representativo de sujetos de datos?
- * ¿Probamos nuestros conjuntos de datos recolectados o curados en busca de distintos sesgos?
- * ¿Podemos mitigar o eliminar los sesgos descubiertos?
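Como boceto ilustrativo de las dos primeras preguntas, una comprobación simple en Python de la representatividad de los sujetos reclutados frente a la composición (hipotética) de la población objetivo:

```python
from collections import Counter

def proporciones(registros, campo):
    """Proporción de cada grupo presente en el conjunto de datos."""
    conteo = Counter(r[campo] for r in registros)
    total = sum(conteo.values())
    return {grupo: n / total for grupo, n in conteo.items()}

# Sujetos hipotéticos reclutados para el estudio
sujetos = [{"grupo": "A"}] * 90 + [{"grupo": "B"}] * 10

observado = proporciones(sujetos, "grupo")
esperado = {"A": 0.5, "B": 0.5}  # composición hipotética de la población objetivo

# Señalar grupos con menos del 80% de su proporción esperada
sub_representados = [g for g in esperado if observado.get(g, 0) < esperado[g] * 0.8]
# sub_representados == ["B"]: el grupo B está sub-representado en la muestra
```

El umbral del 80% es arbitrario; lo importante es comparar la muestra contra la población que el algoritmo pretende servir.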
-
-#### 2.7 Calidad de los datos
-
-[La calidad de los datos](https://lakefs.io/data-quality-testing/) se enfoca en la validez de los conjuntos de datos curados que se usan para desarrollar nuestros algoritmos, comprobando si las características y registros cumplen los requerimientos para el nivel de precisión necesario para nuestros propósitos de AI.
-
-Las preguntas a explorar son:
- * ¿Capturamos _características_ válidas para nuestro caso de uso?
- * ¿Los datos fueron capturados _consistentemente_ a través de las distintas fuentes de datos?
- * ¿Están _completos_ los conjuntos de datos para las distintas condiciones o escenarios?
- * ¿La información es capturada de forma _precisa_ reflejando la realidad?
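Estas preguntas pueden traducirse en comprobaciones automatizables. Un boceto mínimo en Python, con registros y reglas de validación hipotéticos, que señala valores faltantes o fuera de rango:

```python
def comprobar_calidad(registros, reglas):
    """Devuelve (índice, campo, problema) por cada valor faltante o fuera de rango."""
    problemas = []
    for i, registro in enumerate(registros):
        for campo, (minimo, maximo) in reglas.items():
            valor = registro.get(campo)
            if valor is None:
                problemas.append((i, campo, "faltante"))
            elif not (minimo <= valor <= maximo):
                problemas.append((i, campo, "fuera de rango"))
    return problemas

# Registros hipotéticos de pacientes
registros = [
    {"edad": 42, "presion": 120},
    {"edad": -5, "presion": 118},   # edad imposible
    {"edad": 37, "presion": None},  # medición faltante
]

problemas = comprobar_calidad(registros, {"edad": (0, 120), "presion": (60, 250)})
# problemas == [(1, "edad", "fuera de rango"), (2, "presion", "faltante")]
```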
-
-#### 2.8 Justicia del algoritmo
-
-[La justicia del algoritmo](https://towardsdatascience.com/what-is-algorithm-fairness-3182e161cf9f) verifica si el diseño del algoritmo discrimina sistemáticamente contra subgrupos específicos de sujetos de datos, lo que puede conllevar [daños potenciales](https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml) en la _asignación_ (donde los recursos son negados o retenidos para ese grupo) y en la _calidad del servicio_ (donde la AI no es tan precisa para algunos subgrupos como lo es para otros).
-
-Las preguntas a explorar son:
- * ¿Evaluamos la precisión del modelo para distintos subgrupos y condiciones?
- * ¿Examinamos el sistema en busca de daños potenciales (por ejemplo, estereotipos)?
- * ¿Podemos revisar los datos o re-entrenar los modelos para mitigar daños potenciales?
-
-Explora recursos como las [listas de comprobación de justicia de AI](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA) para aprender más.
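La primera de las preguntas anteriores (evaluar la precisión del modelo por subgrupos) puede esbozarse en unas pocas líneas de Python; los resultados de evaluación son hipotéticos:

```python
from collections import defaultdict

def precision_por_grupo(ejemplos):
    """Precisión del modelo desglosada por subgrupo de sujetos de datos."""
    aciertos, totales = defaultdict(int), defaultdict(int)
    for grupo, real, prediccion in ejemplos:
        totales[grupo] += 1
        aciertos[grupo] += int(real == prediccion)
    return {g: aciertos[g] / totales[g] for g in totales}

# Resultados hipotéticos de evaluación: (subgrupo, etiqueta real, predicción)
ejemplos = [
    ("grupo_1", 1, 1), ("grupo_1", 0, 0), ("grupo_1", 1, 1), ("grupo_1", 0, 0),
    ("grupo_2", 1, 0), ("grupo_2", 0, 0), ("grupo_2", 1, 0), ("grupo_2", 1, 1),
]

precision = precision_por_grupo(ejemplos)
brecha = max(precision.values()) - min(precision.values())
# precision == {"grupo_1": 1.0, "grupo_2": 0.5}: una brecha grande
# sugiere un daño potencial en la calidad del servicio para grupo_2
```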
-
-#### 2.9 Malinterpretación
-
-[La malinterpretación de datos](https://www.sciencedirect.com/topics/computer-science/misrepresentation) se pregunta si estamos comunicando conclusiones de datos reportados honestamente de una forma engañosa para apoyar la narrativa deseada.
-
-Las preguntas a explorar son:
- * ¿Estamos reportando datos incompletos o inexactos?
- * ¿Estamos visualizando los datos de tal forma que conlleven a conclusiones engañosas?
- * ¿Estamos usando técnicas estadísticas selectivas para manipular los resultados?
- * ¿Existen explicaciones alternativas que puedan ofrecer una conclusión distinta?
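Un ejemplo numérico mínimo, con mediciones hipotéticas, de cómo el "cherry-picking" de datos puede sostener una narrativa engañosa:

```python
from statistics import mean

# Mediciones mensuales hipotéticas (por ejemplo, casos reportados)
datos = [100, 120, 90, 130, 95, 140, 85, 150]

media_completa = mean(datos)               # conclusión con todos los datos: 113.75
media_selectiva = mean(sorted(datos)[:3])  # solo los tres valores más bajos: 90

# Reportar únicamente la media selectiva sugeriría una situación mucho
# más favorable que la que muestran los datos completos.
```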
-
-#### 2.10 Libertad de elección
-La [ilusión de libertad de elección](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) ocurre cuando las "arquitecturas de elección" de un sistema usan algoritmos de toma de decisiones para empujar a la gente hacia un resultado preferido, mientras aparentan darles opciones y control. Estos [patrones oscuros](https://www.darkpatterns.org/) pueden causar daño social y económico a los usuarios. Ya que las decisiones del usuario impactan en los perfiles de comportamiento, estas acciones influyen potencialmente en futuras elecciones que pueden amplificar y extender el impacto de estos daños.
-
-Las preguntas a explorar son:
- * ¿El usuario entendió las implicaciones de realizar dicha elección?
- * ¿El usuario estaba consciente de las opciones (alternativas) y los pros y contras de cada una?
- * ¿El usuario puede revertir una elección influenciada o automatizada posteriormente?
-
-### 3. Casos de estudio
-
-Para poner estos desafíos éticos en contexto del mundo real, ayuda ver casos de estudio que destacan el daño potencial y las consecuencias a individuos y sociedad, cuando dichas violaciones éticas son pasadas por alto.
-
-Aquí hay algunos ejemplos:
-
-| Desafío de ética | Caso de estudio |
-|--- |--- |
-| **Consentimiento informado** | 1972 - [Estudio de sífilis de Tuskegee](https://en.wikipedia.org/wiki/Tuskegee_Syphilis_Study) - A los hombres afroamericanos que participaron en el estudio se les prometió tratamiento médico gratuito, _pero fueron engañados_ por los investigadores, quienes nunca informaron a los sujetos de su diagnóstico ni de la disponibilidad del tratamiento. Muchos sujetos murieron y sus parejas e hijos resultaron afectados; el estudio duró 40 años. |
-| **Privacidad de los datos** | 2007 - El [premio de datos de Netflix](https://www.wired.com/2007/12/why-anonymous-data-sometimes-isnt/) proporcionó a los investigadores _10M de clasificaciones anónimas de películas de 50K clientes_ para ayudar a mejorar los algoritmos de recomendación. Sin embargo, los investigadores fueron capaces de correlacionar los datos anónimos con datos personalmente identificables en _conjuntos de datos externos_ (por ejemplo, comentarios en IMDb) - efectivamente "des-anonimizando" a algunos suscriptores de Netflix.|
-| **Sesgo de recopilación** | 2013 - La ciudad de Boston [desarrolló Street Bump](https://www.boston.gov/transportation/street-bump), una app que permite a los ciudadanos reportar baches, dando a la ciudad mejores datos de las carreteras para encontrar y reparar desperfectos. Sin embargo, [la gente de los grupos con menores ingresos tenía menos acceso a autos y teléfonos](https://hbr.org/2013/04/the-hidden-biases-in-big-data), haciendo sus problemas de carretera invisibles para la app. Los desarrolladores trabajaron con académicos para abordar los problemas de _acceso equitativo y brecha digital_ y hacer la app más justa. |
-| **Justicia de algoritmos** | 2018 - El [estudio Gender Shades](http://gendershades.org/overview.html) del MIT evaluó la precisión de productos de AI para clasificación de género, exponiendo brechas en la precisión para mujeres y personas de color. Una [tarjeta Apple Card de 2019](https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/) parecía ofrecer menos crédito a mujeres que a hombres. Ambos casos ilustraron cómo el sesgo algorítmico conlleva daños socio-económicos. |
-| **Malinterpretación de datos** | 2020 - El [departamento de salud pública de Georgia publicó gráficos de COVID-19](https://www.vox.com/covid-19-coronavirus-us-response-trump/2020/5/18/21262265/georgia-covid-19-cases-declining-reopening) que parecían desinformar a los ciudadanos acerca de las tendencias en los casos confirmados, al usar un eje x sin orden cronológico. Esto ilustra la malinterpretación a través de visualizaciones engañosas. |
-| **Ilusión de libertad de elección** | 2020 - La aplicación de aprendizaje [ABCmouse pagó $10M para resolver una queja de la FTC](https://www.washingtonpost.com/business/2020/09/04/abcmouse-10-million-ftc-settlement/) según la cual los padres fueron inducidos a pagar suscripciones que no podían cancelar. Esto ilustra los patrones oscuros en las arquitecturas de elección, donde los usuarios fueron empujados hacia elecciones potencialmente dañinas. |
-| **Privacidad de los datos y derechos de usuario** | 2021 - Una [filtración de datos](https://www.npr.org/2021/04/09/986005820/after-data-breach-exposes-530-million-facebook-says-it-will-not-notify-users) de Facebook expuso los datos de 530M de usuarios, resultando en un acuerdo de $5B con la FTC. Sin embargo, Facebook se negó a notificar a los usuarios de la filtración, violando sus derechos en torno a la transparencia y el acceso a los datos. |
-
-¿Quieres explorar más casos de estudio? Revisa los siguientes recursos:
-* [Ética desenvuelta](https://ethicsunwrapped.utexas.edu/case-studies) - dilemas éticos en diversas industrias.
-* [Curso de ética en ciencia de datos](https://www.coursera.org/learn/data-science-ethics#syllabus) - con referencias a los casos de estudio explorados.
-* [Donde las cosas han ido mal](https://deon.drivendata.org/examples/) - lista de comprobación de deon con ejemplos
-
-> 🚨 Piensa en los casos de estudio que has visto - ¿has experimentado o sido afectado por un desafío ético similar en tu vida? ¿Puedes pensar en al menos otro caso de estudio que ilustre uno de los desafíos éticos que discutimos en esta sección?
-
-## Ética aplicada
-
-Hemos hablado de conceptos éticos, desafíos y casos de estudio en contextos del mundo real. Pero ¿cómo podemos _aplicar_ los principios éticos y prácticas en nuestros proyectos? y ¿cómo _aplicamos_ estas prácticas para una mejor gobernanza? Exploremos algunas soluciones del mundo real:
-
-### 1. Códigos profesionales
-
-Los códigos profesionales ofrecen una opción para que las organizaciones "incentiven" a sus miembros a apoyar sus principios éticos y su misión. Los códigos son _guías morales_ para el comportamiento profesional, que ayudan a los empleados o miembros a tomar decisiones que se alineen con los principios de su organización. Estos códigos son tan buenos como el cumplimiento voluntario de sus miembros; sin embargo, muchas organizaciones ofrecen incentivos y penalizaciones adicionales para motivar su cumplimiento.
-
-Los ejemplos incluyen:
-
- * Código de ética de [Oxford Munich](http://www.code-of-ethics.org/code-of-conduct/)
- * Código de conducta de la [Asociación de ciencia de datos](http://datascienceassn.org/code-of-conduct.html) (creado en 2013)
- * [Código de ética y conducta profesional de ACM](https://www.acm.org/code-of-ethics) (desde 1993)
-
-> 🚨 ¿Perteneces a una organización profesional de ingeniería o ciencia de datos? Explora su sitio para ver si definen un código de ética profesional. ¿Qué te dice acerca de sus principios éticos? ¿Cómo "incentivan" a los miembros para que sigan el código?
-
-### 2. Listas de comprobación de ética
-
-Mientras que los códigos profesionales definen los _comportamientos éticos_ requeridos de sus practicantes, estos tienen [limitaciones conocidas](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md) en su aplicación, particularmente en proyectos a gran escala. En su lugar, muchos expertos en ciencia de datos [abogan por listas de comprobación](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md), que pueden **conectar principios con prácticas** de formas más determinísticas y accionables.
-
-Las listas de comprobación convierten las preguntas en tareas de "sí/no" operacionalizables, permitiendo darles seguimiento como parte de los flujos de trabajo estándar de liberación de productos.
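Como boceto de esta idea, con elementos de lista hipotéticos, las preguntas éticas pueden modelarse como tareas rastreables en código dentro del flujo de liberación:

```python
# Elementos hipotéticos de una lista de comprobación de ética de datos
lista = [
    {"id": "consentimiento", "pregunta": "¿Se obtuvo consentimiento informado?", "hecho": True},
    {"id": "sesgo", "pregunta": "¿Se probaron los datos en busca de sesgos?", "hecho": False},
    {"id": "privacidad", "pregunta": "¿Se anonimizaron los datos personales?", "hecho": True},
]

pendientes = [item["id"] for item in lista if not item["hecho"]]
listo_para_liberar = not pendientes
# pendientes == ["sesgo"]: la liberación debería bloquearse hasta completar la lista
```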
-
-Los ejemplos incluyen:
- * [Deon](https://deon.drivendata.org/) - una lista de comprobación de ética de datos de propósito general creada a partir de [recomendaciones de la industria](https://deon.drivendata.org/#checklist-citations) con una herramienta de línea de comandos para su fácil integración.
- * [Lista de comprobación de auditoría de privacidad](https://cyber.harvard.edu/ecommerce/privacyaudit.html) - provee orientación general para prácticas de manejo de la información desde perspectivas legales y sociales.
- * [Lista de comprobación de justicia de AI](https://www.microsoft.com/en-us/research/project/ai-fairness-checklist/) - creada por practicantes de AI para soportar la adopción e integración de controles justos en los ciclos de desarrollo de AI.
- * [22 preguntas para ética en datos y AI](https://medium.com/the-organization/22-questions-for-ethics-in-data-and-ai-efb68fd19429) - un marco de trabajo más abierto, estructurado para la exploración inicial de problemas éticos en contextos de diseño, implementación y organización.
-
-### 3. Regulaciones éticas
-
-La ética trata de definir valores compartidos y hacer lo correcto _voluntariamente_. El **cumplimiento** trata de _seguir la ley_ donde se define. La **gobernanza** cubre ampliamente todas las formas en las cuales las organizaciones operan para hacer cumplir los principios éticos y seguir las leyes establecidas.
-
-Hoy en día, la gobernanza toma dos formas dentro de las organizaciones. Primero, se trata de definir principios de **ética de AI** y establecer prácticas para promover su adopción en todos los proyectos relacionados con AI en la organización. Segundo, se trata de cumplir con todos los mandatos gubernamentales de **regulaciones de protección de datos** en las regiones donde opera.
-
-Ejemplos de protección de datos y regulaciones de privacidad:
-
- * `1974`, [Ley de privacidad de EE.UU.](https://www.justice.gov/opcl/privacy-act-1974) - regula la recolección, uso y divulgación de información personal por parte del _gobierno federal_.
- * `1996`, [Ley de responsabilidad y portabilidad de seguro de salud de EE.UU. (HIPAA)](https://www.cdc.gov/phlp/publications/topic/hipaa.html) - protege los datos de salud personales.
- * `1998`, [Ley de protección de la privacidad en línea para niños de EE.UU. (COPPA)](https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule) - protege la privacidad de los datos para menores de 13 años.
- * `2018`, [Regulación de protección general de los datos (GDPR)](https://gdpr-info.eu/) - provee derechos de usuario, protección de datos y privacidad.
- * `2018`, [Ley de privacidad para los consumidores de California (CCPA)](https://www.oag.ca.gov/privacy/ccpa) da a los consumidores más _derechos_ sobre sus datos (personales).
- * `2021`, [Ley China de protección de la información personal](https://www.reuters.com/world/china/china-passes-new-personal-data-privacy-law-take-effect-nov-1-2021-08-20/) recién establecida, crea una de las regulaciones más grandes a nivel mundial respecto a privacidad de los datos.
-
-> 🚨 La Unión Europea definió la GDPR (regulación general de protección de datos) quedando como una de las regulaciones a la privacidad de los datos más influyentes de hoy en día. ¿Sabías que también define [8 derechos de usuario](https://www.freeprivacypolicy.com/blog/8-user-rights-gdpr) para la protección de la privacidad digital de los ciudadanos y datos personales? Aprende más acerca de qué son y porqué importan.
-
-
-### 4. Cultura ética
-
-Nota que existe una brecha intangible entre el _cumplimiento_ (hacer lo suficiente para cumplir "la letra de la ley") y atender [problemas sistémicos](https://www.coursera.org/learn/data-science-ethics/home/week/4) (como la osificación, la asimetría de la información y la injusticia distribucional) que pueden acelerar el uso de la AI como arma.
-
-Lo segundo requiere [enfoques colaborativos para definir culturas de ética](https://towardsdatascience.com/why-ai-ethics-requires-a-culture-driven-approach-26f451afa29f) que construyan conexiones emocionales y valores compartidos consistentes _a través de las organizaciones_ de la industria. Esto exige [culturas de ética de datos más formalizadas](https://www.codeforamerica.org/news/formalizing-an-ethical-data-culture/) en las organizaciones - permitiendo a _cualquiera_ tirar del [cordón de Andon](https://en.wikipedia.org/wiki/Andon_(manufacturing)) (para plantear cuestiones éticas temprano en el proceso) y haciendo de las _evaluaciones éticas_ (por ejemplo, en la contratación) un criterio principal en la formación de equipos de proyectos de AI.
-
----
-## [Examen posterior a la lección](https://red-water-0103e7a0f.azurestaticapps.net/quiz/3) 🎯
-## Revisión y auto-estudio
-
-Los siguientes cursos y libros te facilitarán el entendimiento de conceptos éticos principales y desafíos, mientras que los casos de estudio y herramientas te ayudarán con las prácticas éticas aplicadas en contextos del mundo real. Aquí tienes algunos recursos con los que comenzar.
-
-* [Aprendizaje automático para principiantes](https://github.com/microsoft/ML-For-Beginners/blob/main/1-Introduction/3-fairness/README.md) - lecciones de justicia, de Microsoft.
-* [Principios de AI responsable](https://docs.microsoft.com/en-us/learn/modules/responsible-ai-principles/) - ruta de aprendizaje gratuita de Microsoft Learn.
-* [Ética y Ciencia de Datos](https://resources.oreilly.com/examples/0636920203964) - libro electrónico de O'Reilly (M. Loukides, H. Mason et al.)
-* [Ética de Ciencia de Datos](https://www.coursera.org/learn/data-science-ethics#syllabus) - curso en línea de la Universidad de Michigan.
-* [Ética desenvuelta](https://ethicsunwrapped.utexas.edu/case-studies) - casos de estudio de la Universidad de Texas.
-
-# Asignación
-
-[Escribe un caso de estudio de ética de datos](assignment.md)
From b68683d233501d2f39ec7748f49a08d6846c8a14 Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Mon, 11 Oct 2021 11:07:11 +0530
Subject: [PATCH 084/319] Update README.hi.md
---
.../16-communication/translations/README.hi.md | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
index 234072d7..fcfab7f7 100644
--- a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
@@ -1,7 +1,8 @@
# डेटा विज्ञान के जीवनचक्र: संचार
-| ](../../sketchnotes/16-Communicating.png)|
+
+|](../../sketchnotes/16-Communicating.png)|
|:---:|
-| डेटा विज्ञान के जीवनचक्र: संचार - [@nitya](https://twitter.com/nitya)_द्वारा स्केचनोट _ |
+| डेटा विज्ञान के जीवनचक्र: संचार - _[@nitya](https://twitter.com/nitya) द्वारा स्केचनोट_|
## [प्री-लेक्चर क्विज ](https://red-water-0103e7a0f.azurestaticapps.net/quiz/30)
ऊपर दिए गए प्री-लेक्चर क्विज़ के साथ क्या करना है, इसके बारे में अपने ज्ञान का परीक्षण करें!
@@ -37,7 +38,7 @@
### 1. अपने दर्शकों, अपने चैनल और अपनी संचार पद्धति को समझें
जिस तरह से आप परिवार के सदस्यों के साथ संवाद करते हैं, वह आपके दोस्तों के साथ संवाद करने के तरीके से अलग होने की संभावना है। आप शायद अलग-अलग शब्दों और वाक्यांशों का उपयोग करते हैं जिन्हें आप जिन लोगों से बात कर रहे हैं, उनके समझने की अधिक संभावना है। डेटा संचार करते समय आपको वही दृष्टिकोण अपनाना चाहिए। इस बारे में सोचें कि आप किससे संवाद कर रहे हैं। उनके लक्ष्यों और उस संदर्भ के बारे में सोचें जो उनके पास उस स्थिति के आसपास है जो आप उन्हें समझा रहे हैं।
-आप संभावित रूप से अपने अधिकांश दर्शकों को एक श्रेणी में समूहित कर सकते हैं। एक _Harvard Business Review_ लेख में, "[डेटा के साथ कहानी कैसे सुनाएं] (http://blogs.hbr.org/2013/04/how-to-tell-a-story-with-data/)," डेल कार्यकारी रणनीतिकार जिम स्टिकलेदर दर्शकों की पांच श्रेणियों की पहचान करता है।
+आप संभावित रूप से अपने अधिकांश दर्शकों को एक श्रेणी में समूहित कर सकते हैं। एक _Harvard Business Review_ लेख में, “[डेटा के साथ कहानी कैसे बताएं](http://blogs.hbr.org/2013/04/how-to-tell-a-story-with-data/),” डेल कार्यकारी रणनीतिकार जिम स्टिकलेदर दर्शकों की पांच श्रेणियों की पहचान करता है।
- **नौसिखिया**: विषय से पहली बार परिचित है, लेकिन अति सरलीकरण नहीं चाहता
@@ -66,9 +67,9 @@
आप अंत को ध्यान में रखकर कैसे शुरू करते हैं? अपने डेटा को संप्रेषित करने से पहले, अपने मुख्य निष्कर्ष लिख लें। फिर, जिस तरह से आप कहानी तैयार कर रहे हैं, जिस तरह से आप अपने डेटा के साथ बताना चाहते हैं, अपने आप से पूछें, "यह मेरे द्वारा बताई जा रही कहानी में कैसे एकीकृत होता है?"
-सावधान रहें - अंत को ध्यान में रखते हुए शुरुआत करना आदर्श है, आप केवल उस डेटा को संप्रेषित नहीं करना चाहते जो आपके इच्छित takeaways का समर्थन करता है। ऐसा करने को चेरी-पिकिंग कहा जाता है, जो तब होता है जब एक संचारक केवल उस डेटा का संचार करता है जो उस बिंदु का समर्थन करता है जिसे वे बनाने के लिए बांध रहे हैं और अन्य सभी डेटा को अनदेखा करते हैं।
+सावधान रहें - अंत को ध्यान में रखते हुए शुरुआत करना आदर्श है, आप केवल उस डेटा को संप्रेषित नहीं करना चाहते जो आपके इच्छित टेकअवे का समर्थन करता है। ऐसा करने को चेरी-पिकिंग कहा जाता है, जो तब होता है जब एक संचारक केवल उस डेटा का संचार करता है जो उस बिंदु का समर्थन करता है जिसे वे बनाने के लिए बांध रहे हैं और अन्य सभी डेटा को अनदेखा करते हैं।
-यदि आपके द्वारा एकत्र किया गया सभी डेटा स्पष्ट रूप से आपके इच्छित टेकअवे का समर्थन करता है, तो बढ़िया। लेकिन अगर आपके द्वारा एकत्र किया गया डेटा है जो आपके टेकअवे का समर्थन नहीं करता है, या यहां तक कि आपके प्रमुख टेकअवे के खिलाफ तर्क का समर्थन करता है, तो आपको उस डेटा को भी संप्रेषित करना चाहिए। अगर ऐसा होता है, तो अपने दर्शकों के साथ खुलकर बात करें और उन्हें बताएं कि आप अपनी कहानी के साथ बने रहने का विकल्प क्यों चुन रहे हैं, भले ही सभी डेटा इसका समर्थन न करें।
+यदि आपके द्वारा एकत्र किया गया सभी डेटा स्पष्ट रूप से आपके इच्छित टेकअवे का समर्थन करता है, तो बढ़िया। लेकिन अगर आपके द्वारा एकत्र किया गया डेटा है जो आपके टेकअवे का समर्थन नहीं करता है, या यहां तक कि आपके प्रमुख टेकअवे के खिलाफ तर्क का समर्थन करता है, तो आपको उस डेटा को भी संप्रेषित करना चाहिए। अगर ऐसा होता है, तो अपने दर्शकों के साथ खुलकर बात करें और उन्हें बताएं कि आप अपनी कहानी के साथ बने रहने का विकल्प क्यों चुन रहे हैं, भले ही सभी डेटा इसका समर्थन न करें।
### 3. इसे एक वास्तविक कहानी की तरह देखें
@@ -208,4 +209,4 @@
## कार्यभार
-[एक कहानी बताओ] (assignment.md)
\ No newline at end of file
+[एक कहानी बताओ](assignment.md)
From 2011504c98059a6d5540ad45b39a9aa51374c7c6 Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Mon, 11 Oct 2021 11:16:09 +0530
Subject: [PATCH 085/319] Update README.hi.md
---
.../16-communication/translations/README.hi.md | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
index fcfab7f7..85ae92d4 100644
--- a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
@@ -1,6 +1,6 @@
# डेटा विज्ञान के जीवनचक्र: संचार
-|](../../sketchnotes/16-Communicating.png)|
+|](https://github.com/Heril18/Data-Science-For-Beginners/raw/main/sketchnotes/16-Communicating.png)|
|:---:|
| डेटा विज्ञान के जीवनचक्र: संचार - _[@nitya](https://twitter.com/nitya) द्वारा स्केचनोट_|
@@ -147,7 +147,7 @@
**क्लाइमेक्स** आधार तैयार करने के बाद, इमर्सन 5 या इतने मिनट के लिए चरमोत्कर्ष पर जा सकता था।
-इमर्सन प्रस्तावित समाधानों को पेश कर सकता है, यह बता सकता है कि वे समाधान कैसे उल्लिखित मुद्दों को संबोधित करेंगे, उन समाधानों को मौजूदा वर्कफ़्लो में कैसे लागू किया जा सकता है, समाधानों की लागत कितनी है, समाधानों का आरओआई क्या होगा, और शायद कुछ स्क्रीनशॉट भी दिखा सकते हैं या लागू होने पर समाधान कैसे दिखेंगे, इसके वायरफ्रेम। एमर्सन उन उपयोगकर्ताओं के प्रशंसापत्र भी साझा कर सकते हैं, जिन्होंने अपनी शिकायत को संबोधित करने में 48 घंटे से अधिक समय लिया, और यहां तक कि कंपनी के भीतर एक मौजूदा ग्राहक सेवा प्रतिनिधि से एक प्रशंसापत्र भी, जिसने वर्तमान टिकट प्रणाली पर टिप्पणी की है।
+इमर्सन प्रस्तावित समाधानों को पेश कर सकता है, यह बता सकता है कि वे समाधान कैसे उल्लिखित मुद्दों को संबोधित करेंगे, उन समाधानों को मौजूदा वर्कफ़्लो में कैसे लागू किया जा सकता है, समाधानों की लागत कितनी है, समाधानों का आरओआई क्या होगा, और शायद कुछ स्क्रीनशॉट भी दिखा सकते हैं या लागू होने पर समाधान कैसे दिखेंगे, इसके वायरफ्रेम। एमर्सन उन उपयोगकर्ताओं के प्रशंसापत्र भी साझा कर सकते हैं, जिन्होंने अपनी शिकायत को संबोधित करने में 48 घंटे से अधिक समय लिया, और यहां तक कि कंपनी के भीतर एक मौजूदा ग्राहक सेवा प्रतिनिधि से एक प्रशंसापत्र भी, जिसने वर्तमान टिकट प्रणाली पर टिप्पणी की है।
**क्लोजर** अब इमर्सन कंपनी के सामने आने वाली समस्याओं को दूर करने में 5 मिनट बिता सकता है, प्रस्तावित समाधानों पर फिर से विचार कर सकता है और समीक्षा कर सकता है कि वे समाधान सही क्यों हैं।
@@ -175,7 +175,7 @@
[डेटा के साथ कहानी कैसे सुनाएं (hbr.org)](https://hbr.org/2013/04/how-to-tell-a-story-with-data)
-[टू-वे कम्युनिकेशन: अधिक व्यस्त कार्यस्थल के लिए 4 टिप्स (yourविचारपार्टनर.कॉम)](https://www.your Thoughtpartner.com/blog/bid/59576/4-steps-to-increase-employee-engagement-through- दो तरफ से संचार)
+[टू-वे कम्युनिकेशन: अधिक व्यस्त कार्यस्थल के लिए 4 टिप्स (yourthoughtpartner.com)](https://www.yourthoughtpartner.com/blog/bid/59576/4-steps-to-increase-employee-engagement-through-two-way-communication)
[महान डेटा स्टोरीटेलिंग के लिए 6 संक्षिप्त चरण - बार्नराइज़र, एलएलसी (barnraisersllc.com)](https://barnraisersllc.com/2021/05/02/6-succinct-steps-to-great-data-storytelling/)
From f9218ecd3029e4ec2d4e02315ec899a23cc07f8b Mon Sep 17 00:00:00 2001
From: Dhruv Krishna Vaid
Date: Mon, 11 Oct 2021 13:08:02 +0530
Subject: [PATCH 086/319] Added Hindi translation
---
.../translations/README.hi.md | 20 +++++++++++++++++++
1 file changed, 20 insertions(+)
create mode 100644 5-Data-Science-In-Cloud/translations/README.hi.md
diff --git a/5-Data-Science-In-Cloud/translations/README.hi.md b/5-Data-Science-In-Cloud/translations/README.hi.md
new file mode 100644
index 00000000..552ea6ee
--- /dev/null
+++ b/5-Data-Science-In-Cloud/translations/README.hi.md
@@ -0,0 +1,20 @@
+# क्लाउड में डेटा साइंस
+
+
+
+> [Unsplash](https://unsplash.com/s/photos/cloud?orientation=landscape) से [जेलेके वनूटेघम](https://unsplash.com/@ilumire) द्वारा फोटो।
+
+जब बड़े डेटा के साथ डेटा साइंस करने की बात आती है, तो क्लाउड गेम चेंजर हो सकता है। अगले तीन पाठों में हम यह देखने जा रहे हैं कि क्लाउड क्या है और यह इतना मददगार क्यों हो सकता है। हम हृद्पात (दिल की धड़कन रुकना) के डेटासेट का भी पता लगाने जा रहे हैं और किसी के हृद्पात की संभावना का आकलन करने में मदद करने के लिए एक मॉडल का निर्माण करने जा रहे हैं। हम दो अलग-अलग तरीकों से एक मॉडल को प्रशिक्षित करने, डिप्लॉय करने और उपभोग करने के लिए क्लाउड की शक्ति का उपयोग करेंगे। एक तरीका कम कोड/नो कोड फैशन में केवल यूजर इंटरफेस का उपयोग करके, दूसरा तरीका एज़ूर मशीन लर्निंग सॉफ्टवेयर डेवलपर किट (एज़ूर एमएल एस.डी.के) का उपयोग करके।
+
+
+
+### विषय
+
+1. [डेटा साइंस के लिए क्लाउड का उपयोग क्यों करें?](../17-Introduction/README.md)
+2. [क्लाउड में डेटा साइंस: "लो कोड/नो कोड" तरीका](../18-Low-Code/README.md)
+3. [क्लाउड में डेटा साइंस: "एज़ूर एमएल एस.डी.के" तरीका](../19-Azure/README.md)
+
+### आभार सूची
+ये पाठ [मौड लेवी](https://twitter.com/maudstweets) और [टिफ़नी सॉटर्रे](https://twitter.com/TiffanySouterre) द्वारा ☁️ और 💕 के साथ लिखे गए थे।
+
+हार्ट फेल्योर प्रेडिक्शन प्रोजेक्ट के लिए डेटा [कागल](https://www.kaggle.com/andrewmvd/heart-failure-clinical-data) पर [लारक्सेल](https://www.kaggle.com/andrewmvd) से प्राप्त किया गया है। इसे [एट्रिब्यूशन 4.0 इंटरनेशनल (CC BY 4.0)](https://creativecommons.org/licenses/by/4.0/) के तहत लाइसेंस दिया गया है।
\ No newline at end of file
From ce293650bddd05f67f9b44b2b1aa047e234f8764 Mon Sep 17 00:00:00 2001
From: Izael
Date: Mon, 11 Oct 2021 10:25:55 -0300
Subject: [PATCH 087/319] Translated Introduction -> Ethics to PT-BR
---
.../02-ethics/translations/RAEDME.pt-br.md | 262 ++++++++++++++++++
1 file changed, 262 insertions(+)
create mode 100644 1-Introduction/02-ethics/translations/RAEDME.pt-br.md
diff --git a/1-Introduction/02-ethics/translations/RAEDME.pt-br.md b/1-Introduction/02-ethics/translations/RAEDME.pt-br.md
new file mode 100644
index 00000000..ec921d8f
--- /dev/null
+++ b/1-Introduction/02-ethics/translations/RAEDME.pt-br.md
@@ -0,0 +1,262 @@
+# Introduction to Data Ethics
+
+|![Data Science Ethics - Sketchnote](../../../sketchnotes/02-Ethics.png)|
+|:---:|
+| Data Science Ethics - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+---
+
+We are all data citizens living in a world of data.
+
+Market trends tell us that by 2022, 1 in 3 large organizations will buy and sell their data through online [Marketplaces and Exchanges](https://www.gartner.com/smarterwithgartner/gartner-top-10-trends-in-data-and-analytics-for-2020/). As **App Developers**, we'll find it easier and cheaper to integrate data-driven insights and algorithm-driven automation into daily user experiences. But as AI becomes more pervasive, we'll also need to understand the potential harms caused by the use of such algorithms [as a weapon](https://www.youtube.com/watch?v=TQHs8SA1qpk).
+
+Trends also indicate that we will create and consume over [180 zettabytes](https://www.statista.com/statistics/871513/worldwide-data-created/) of data by 2025. As **Data Scientists**, this gives us unprecedented levels of access to personal data. This means we can build behavioral profiles of users and influence decision-making in ways that create an [illusion of free choice](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) while potentially nudging users towards outcomes we prefer. It also raises broader questions on data privacy and user protections.
+
+Data ethics is now a _necessary guardrail_ for data science and engineering, helping us minimize potential harms and unintended consequences from our data-driven actions. The [Gartner Hype Cycle for AI](https://www.gartner.com/smarterwithgartner/2-megatrends-dominate-the-gartner-hype-cycle-for-artificial-intelligence-2020/) identifies relevant trends in digital ethics, responsible AI, and AI governance as key drivers for larger megatrends around the _democratization_ and _industrialization_ of AI.
+
+
+
+In this lesson, we'll explore the fascinating area of data ethics - from core concepts and challenges, to case studies and applied AI concepts like governance - that help establish an ethics culture in teams and organizations working with data and AI.
+
+
+
+
+## [Pre-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/2) 🎯
+
+## Basic Definitions
+
+Let's start by understanding the basic terminology.
+
+The word "ethics" comes from the [Greek word "ethikos"](https://en.wikipedia.org/wiki/Ethics) (and its root "ethos") meaning _character or moral nature_.
+
+**Ethics** is about the shared values and moral principles that govern our behavior in society. Ethics is based not on laws but on widely accepted norms of what is "right vs. wrong". However, ethical considerations can influence corporate governance initiatives and government regulations that create more incentives for compliance.
+
+**Data Ethics** is a [new branch of ethics](https://royalsocietypublishing.org/doi/full/10.1098/rsta.2016.0360#sec-1) that "studies and evaluates moral problems related to _data, algorithms and corresponding practices_". Here, **"data"** focuses on actions related to generation, recording, curation, processing, dissemination, sharing, and usage; **"algorithms"** focuses on AI, agents, machine learning, and robots; and **"practices"** focuses on topics like responsible innovation, programming, hacking, and codes of ethics.
+
+**Applied Ethics** is the [practical application of moral considerations](https://en.wikipedia.org/wiki/Applied_ethics). It's the process of actively investigating ethical issues in the context of _real-world actions, products, and processes_, and taking corrective measures to keep these aligned with our defined ethical values.
+
+**Ethics Culture** is about [operationalizing applied ethics](https://hbr.org/2019/05/how-to-design-an-ethical-organization) to make sure our ethical principles and practices are adopted in a consistent and scalable manner across the entire organization. Successful ethics cultures define organization-wide ethical principles, provide meaningful incentives for compliance, and reinforce ethics norms by encouraging and amplifying desired behaviors at every level of the organization.
+
+
+## Ethics Concepts
+
+In this section, we'll discuss concepts like **shared values** (principles) and **ethical challenges** (problems) for data ethics - and explore **case studies** that help you understand these concepts in real-world contexts.
+
+### 1. Ethics Principles
+
+Every data ethics strategy begins by defining _ethical principles_ - the "shared values" that describe acceptable behaviors and guide compliant actions in our data and AI projects. You can define these at an individual or team level. However, most large organizations outline these in an _ethical AI_ mission statement or framework that is defined at corporate levels and enforced consistently across all teams.
+
+**Example:** Microsoft's [Responsible AI](https://www.microsoft.com/pt-br/ai/responsible-ai?activetab=pivot1:primaryr6) mission statement reads: _"We are committed to the advancement of AI driven by ethical principles that put people first"_ - identifying 6 ethical principles in the framework below:
+
+
+
+Let's briefly explore these principles. _Transparency_ and _accountability_ are foundational values that the other principles build upon - so let's begin there:
+
+* [**Accountability**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) makes practitioners _responsible_ for their data and AI operations, and for compliance with these ethical principles.
+* [**Transparency**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) ensures that data and AI actions are _understandable_ (interpretable) to users, explaining the what and why behind each decision.
+* [**Fairness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1%3aprimaryr6) - focuses on ensuring AI _treats_ all people fairly, addressing any implicit or systemic socio-technical biases in data and systems.
+* [**Reliability & Safety**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - ensures that AI behaves _consistently_ with defined values, minimizing potential harms or unintended consequences.
+* [**Privacy & Security**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about understanding data lineage, and providing _data privacy and related protections_ to users.
+* [**Inclusiveness**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - is about designing AI solutions with intention, adapting them to meet a _broad range of human needs_ & capabilities.
+
+> 🚨 Think about what your data ethics mission statement could be. Explore ethical AI frameworks from other organizations - here are examples from [IBM](https://www.ibm.com/cloud/learn/ai-ethics), [Google](https://ai.google/principles), and [Facebook](https://ai.facebook.com/blog/facebooks-five-pillars-of-responsible-ai/). What shared values do they have in common? How do these principles relate to the AI product or industry they operate in?
+
+### 2. Ethics Challenges
+
+Once we have ethical principles defined, the next step is to evaluate our data and AI actions to see if they align with those shared values. Think about your actions in two categories: _data collection_ and _algorithm design_.
+
+With data collection, actions will likely involve **personal data** or personally identifiable information (PII) for identifiable living individuals. This includes [diverse items of non-personal data](https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en) that _collectively_ identify an individual. Ethical challenges can relate to _data privacy_, _data ownership_, and related topics like _informed consent_ and _intellectual property rights_ for users.
+
+With algorithm design, actions will involve collecting and curating **datasets**, then using them to train and deploy **data models** that predict outcomes or automate decisions in real-world contexts. Ethical challenges can arise from _dataset bias_, _data quality_ issues, _unfairness_, and _misrepresentation_ in algorithms - including some issues that are systemic in nature.
+
+In both cases, ethics challenges highlight areas where our actions may conflict with our shared values. To detect, mitigate, minimize, or eliminate these concerns - we need to ask moral "yes/no" questions related to our actions, then take corrective actions as needed. Let's take a look at some ethical challenges and the moral questions they raise:
+
+
+#### 2.1 Data Ownership
+
+Data collection often involves personal data that can identify the data subjects. [Data ownership](https://permission.io/blog/data-ownership) is about the _control_ and [_user rights_](https://permission.io/blog/data-ownership) related to the creation, processing, and dissemination of data.
+
+The moral questions we need to ask are:
+ * Who owns the data? (user or organization)
+ * What rights do data subjects have? (e.g., access, erasure, portability)
+ * What rights do organizations have? (e.g., rectify malicious user reviews)
+
+#### 2.2 Informed Consent
+
+[Informed consent](https://legaldictionary.net/informed-consent/) defines the act of users agreeing to an action (like data collection) with a _full understanding_ of relevant facts, including the purpose, potential risks, and alternatives.
+
+Questions to explore here are:
+ * Did the user (data subject) grant permission for data capture and usage?
+ * Did the user understand the purpose for which that data was captured?
+ * Did the user understand the potential risks of their participation?
+
+#### 2.3 Intellectual Property
+
+[Intellectual property](https://en.wikipedia.org/wiki/Intellectual_property) refers to intangible creations resulting from human initiative that may _have economic value_ to individuals or businesses.
+
+Questions to explore here are:
+ * Does the collected data have economic value to a user or business?
+ * Does the **user** have intellectual property here?
+ * Does the **organization** have intellectual property here?
+ * If these rights exist, how are we protecting them?
+
+#### 2.4 Data Privacy
+
+[Data privacy](https://www.northeastern.edu/graduate/blog/what-is-data-privacy/) or information privacy refers to the preservation of user privacy and the protection of user identity with respect to personally identifiable information.
+
+Questions to explore here are:
+ * Is users' (personal) data secured against hacks and leaks?
+ * Is users' data accessible only to authorized users and contexts?
+ * Is users' anonymity preserved when data is shared or disseminated?
+ * Can a user be de-identified from anonymized datasets?
+
+
+#### 2.5 Right To Be Forgotten
+
+The [Right To Be Forgotten](https://en.wikipedia.org/wiki/Right_to_be_forgotten) or [Right to Erasure](https://www.gdpreu.org/right-to-be-forgotten/) provides additional data protections to users. Specifically, it gives users the right to request deletion or removal of personal data from Internet searches and other locations, _under specific circumstances_ - allowing them a fresh start online without past actions being held against them.
+
+Questions to explore here are:
+ * Does the system allow data subjects to request erasure of their data?
+ * Should withdrawal of user consent trigger automated erasure?
+ * Was data collected without consent or by unlawful means?
+ * Are we compliant with government regulations for data privacy?
+
+
+#### 2.6 Dataset Bias
+
+Dataset or [collection bias](http://researcharticles.com/index.php/bias-in-data-collection-in-research/) is about selecting a _non-representative_ subset of data for algorithm development, creating potential unfairness in outcomes for diverse groups. Types of bias include selection or sampling bias, volunteer bias, and instrument bias.
+
+Questions to explore here are:
+ * Did we recruit a representative set of data subjects?
+ * Did we test our collected or curated dataset for various biases?
+ * Can we mitigate or remove any biases we discovered?
+
+#### 2.7 Data Quality
+
+[Data quality](https://lakefs.io/data-quality-testing/) looks at the validity of the curated dataset used to develop our algorithms, checking to see if features and records meet the requirements for the level of accuracy and consistency needed for our AI purpose.
+
+Questions to explore here are:
+ * Did we capture valid _features_ for our use case?
+ * Was data captured _consistently_ across diverse data sources?
+ * Is the dataset _complete_ for diverse conditions and scenarios?
+ * Does the captured information _accurately_ reflect reality?
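The questions above can be turned into simple programmatic validations. The following is a minimal, illustrative sketch (the records, field names, and rules are invented for this example, not from any real validation library or dataset):

```python
# Hypothetical sketch: basic data-quality checks over a curated dataset.
# Records and rules below are illustrative only.
records = [
    {"age": 63, "ejection_fraction": 38},
    {"age": 55, "ejection_fraction": None},   # incomplete record
    {"age": -4, "ejection_fraction": 42},     # inaccurate value
]

def check_quality(rows):
    """Return (row_index, problem) pairs for rows that fail a check."""
    issues = []
    for i, row in enumerate(rows):
        if any(value is None for value in row.values()):
            issues.append((i, "incomplete"))
        elif not 0 <= row["age"] <= 120:
            issues.append((i, "age out of valid range"))
    return issues

print(check_quality(records))  # flags rows 1 and 2
```

Running such checks before model training makes the "is the dataset complete and accurate?" questions answerable with evidence rather than intuition.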
+
+#### 2.8 Algorithm Fairness
+
+[Algorithm fairness](https://towardsdatascience.com/what-is-algorithm-fairness-3182e161cf9f) checks whether the algorithm design systematically discriminates against specific subgroups of data subjects, leading to [potential harms](https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml) in _allocation_ (where resources are denied or withheld from that group) and _quality of service_ (where the AI is not as accurate for some subgroups as it is for others).
+
+Questions to explore here are:
+ * Did we evaluate model accuracy for diverse subgroups and conditions?
+ * Did we scrutinize the system for potential harms (e.g., stereotyping)?
+ * Can we revise the data or retrain the models to mitigate identified harms?
+
+Explore resources like [AI Fairness checklists](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA) to learn more.
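Evaluating accuracy per subgroup, as the first question suggests, can be as simple as grouping predictions before averaging. A toy sketch (the subgroup names and predictions below are made up for illustration):

```python
from collections import defaultdict

# Toy sketch: compare model accuracy across subgroups to surface
# quality-of-service gaps. Data below is invented for illustration.
samples = [
    # (subgroup, predicted, actual)
    ("group_a", 1, 1), ("group_a", 0, 0), ("group_a", 1, 1), ("group_a", 0, 1),
    ("group_b", 1, 0), ("group_b", 0, 1), ("group_b", 1, 1), ("group_b", 0, 1),
]

correct = defaultdict(int)
total = defaultdict(int)
for group, predicted, actual in samples:
    total[group] += 1
    correct[group] += int(predicted == actual)

for group in sorted(total):
    print(group, correct[group] / total[group])  # group_a: 0.75, group_b: 0.25
```

A large gap between subgroup accuracies, as in this toy data, is exactly the kind of signal that should trigger a data revision or retraining step.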
+
+#### 2.9 Misrepresentation
+
+[Data misrepresentation](https://www.sciencedirect.com/topics/computer-science/misrepresentation) is about asking whether we are communicating insights from honestly reported data in a deceptive manner to support a desired narrative.
+
+Questions to explore here are:
+ * Are we reporting incomplete or inaccurate data?
+ * Are we visualizing data in a manner that drives misleading conclusions?
+ * Are we using selective statistical techniques to manipulate outcomes?
+ * Are there alternative explanations that may offer a different conclusion?
+
+#### 2.10 Free Choice
+The [illusion of free choice](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) occurs when a system's "choice architectures" use decision-making algorithms to nudge people towards a preferred outcome while seeming to give them options and control. These [dark patterns](https://www.darkpatterns.org/) can cause social and economic harm to users. Because user decisions feed behavior profiles, these actions potentially drive future choices that can amplify or extend the impact of those harms.
+
+Questions to explore here are:
+ * Did the user understand the implications of making that choice?
+ * Was the user aware of the (alternative) choices and the pros and cons of each?
+ * Can the user reverse an automated or influenced choice later?
+
+### 3. Case Studies
+
+To put these ethical challenges in real-world contexts, it helps to look at case studies that highlight the potential harms and consequences to individuals and society when such ethics violations are overlooked.
+
+Here are a few examples:
+
+| Ethics Challenge | Case Study |
+|--- |--- |
+| **Informed Consent** | 1972 - [Tuskegee Syphilis Study](https://en.wikipedia.org/wiki/Tuskegee_Syphilis_Study) - African American men who participated in the study were promised free medical care _but deceived_ by researchers who failed to inform subjects of their diagnosis or about the availability of treatment. Many subjects died, and partners and children were affected; the study lasted 40 years. |
+| **Data Privacy** | 2007 - The [Netflix data prize](https://www.wired.com/2007/12/why-anonymous-data-sometimes-isnt/) provided researchers with _10M anonymized movie ratings from 50K customers_ to help improve recommendation algorithms. However, researchers were able to correlate the anonymized data with personally identifiable data in _external datasets_ (e.g., IMDb comments) - effectively "de-anonymizing" some Netflix subscribers. |
+| **Dataset Bias** | 2013 - The City of Boston [developed Street Bump](https://www.boston.gov/transportation/street-bump), an app that let citizens report potholes, giving the city better roadway data to find and fix problems. However, [people in lower income groups had less access to cars and phones](https://hbr.org/2013/04/the-hidden-biases-in-big-data), making their roadway issues invisible in this app. Developers worked with academics on _equitable access and digital divides_ issues for fairness. |
+| **Algorithmic Fairness** | 2018 - [MIT's Gender Shades Study](http://gendershades.org/overview.html) evaluated the accuracy of gender-classification AI products, exposing accuracy gaps for women and people of color. A [2019 Apple Card](https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/) seemed to offer less credit to women than to men. Both illustrated issues of algorithmic bias leading to socio-economic harms. |
+| **Data Misrepresentation** | 2020 - The [Georgia Department of Public Health released COVID-19 charts](https://www.vox.com/covid-19-coronavirus-us-response-trump/2020/5/18/21262265/georgia-covid-19-cases-declining-reopening) that appeared to mislead citizens about trends in confirmed cases by ordering the x-axis non-chronologically. This illustrates misrepresentation through visualization tricks. |
+| **Illusion of Free Choice** | 2020 - Learning app [ABCmouse paid $10M to settle an FTC complaint](https://www.washingtonpost.com/business/2020/09/04/abcmouse-10-million-ftc-settlement/) where parents were trapped into paying for subscriptions they couldn't cancel. This illustrates "dark patterns" in choice architectures, where users were nudged towards potentially harmful choices. |
+| **Data Privacy & User Rights** | 2021 - The [Facebook data breach](https://www.npr.org/2021/04/09/986005820/after-data-breach-exposes-530-million-facebook-says-it-will-not-notify-users) exposed data from over 530M users, resulting in a $5B settlement with the FTC (Federal Trade Commission). However, Facebook refused to notify users of the breach, violating user rights to data transparency and access. |
+
+Want to explore more case studies? Check out:
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - ethics dilemmas across diverse industries.
+* [Data Science Ethics course](https://www.coursera.org/learn/data-science-ethics#syllabus) - landmark case studies explored.
+* [Where things have gone wrong](https://deon.drivendata.org/examples/) - deon checklists with examples
+
+> 🚨 Think about the case studies you've seen - have you experienced, or been affected by, a similar ethical challenge in your life? Can you think of at least one other case study that illustrates one or more of the ethical challenges we've discussed in this section?
+
+## Applied Ethics
+
+We've talked about ethics concepts, challenges, and case studies in real-world contexts. But how do we get started _applying_ these ethical principles in our projects? And how do we _operationalize_ these practices for better governance? Let's explore some real-world solutions:
+
+### 1. Professional Codes
+
+Professional codes offer one option for organizations to "incentivize" members to support their ethical principles and mission statement. Codes are _moral guidelines_ for professional behavior, helping employees or members make decisions that align with their organization's principles. They are only as good as members' voluntary compliance; however, many organizations offer additional rewards and penalties to motivate compliance from members.
+
+Examples include:
+
+ * [Oxford Munich](http://www.code-of-ethics.org/code-of-conduct/) Code of Ethics
+ * [Data Science Association](http://datascienceassn.org/code-of-conduct.html) Code of Conduct (created 2013)
+ * [ACM Code of Ethics and Professional Conduct](https://www.acm.org/code-of-ethics) (since 1993)
+
+> 🚨 Do you belong to a professional engineering or data science organization? Explore their site to see if they define a professional code of ethics. What does it say about their ethical principles? How are they "incentivizing" members to follow the code?
+
+### 2. Ethics Checklists
+
+While professional codes define the _ethical behavior_ required of practitioners, they [have known limitations](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md) in enforcement, particularly in large-scale projects. Instead, many data science experts [advocate for checklists](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md), which can **connect principles to practices** in more deterministic and actionable ways.
+
+Checklists convert questions into "yes/no" tasks that can be operationalized, allowing them to be tracked as part of standard product release workflows.
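As a minimal sketch of that idea (the checklist items and gate function below are hypothetical, not taken from any real checklist tool), yes/no items can be tracked so that a release step fails until every concern is resolved:

```python
# Hypothetical sketch: an ethics checklist operationalized as yes/no tasks
# that a release workflow can verify before shipping.
checklist = {
    "Did the data subjects grant informed consent?": True,
    "Was the dataset tested for selection bias?": True,
    "Can users request erasure of their personal data?": False,
}

def release_gate(items):
    """Return the list of unresolved questions; an empty list means 'ship'."""
    return [question for question, answered_yes in items.items() if not answered_yes]

unresolved = release_gate(checklist)
if unresolved:
    print("Release blocked by", len(unresolved), "open ethics item(s):")
    for question in unresolved:
        print(" -", question)
```

Wiring such a gate into a build pipeline is one way checklists become part of "standard product release workflows" rather than a document nobody reads.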
+
+Examples include:
+ * [Deon](https://deon.drivendata.org/) - a general-purpose data ethics checklist created from [industry recommendations](https://deon.drivendata.org/#checklist-citations), with a command-line tool for easy integration.
+ * [Privacy Audit Checklist](https://cyber.harvard.edu/ecommerce/privacyaudit.html) - provides general guidance for information-handling practices from legal and social exposure perspectives.
+ * [AI Fairness Checklist](https://www.microsoft.com/en-us/research/project/ai-fairness-checklist/) - created by AI practitioners to support the adoption and integration of fairness checks into AI development cycles.
+ * [22 questions for ethics in data and AI](https://medium.com/the-organization/22-questions-for-ethics-in-data-and-ai-efb68fd19429) - a more open-ended framework, structured for initial exploration of ethical issues in design, implementation, and organizational contexts.
+
+### 3. Ethics Regulations
+
+Ethics is about defining shared values and doing the right thing _voluntarily_. **Compliance** is about _following the law_ if and where defined. **Governance** broadly covers all the ways in which organizations operate to enforce ethical principles and comply with established laws.
+
+Today, governance takes two forms within organizations. First, it's about defining **ethical AI** principles and establishing practices to operationalize their adoption across all AI-related projects in the organization. Second, it's about complying with all government-mandated **data protection regulations** for the regions in which the organization operates.
+
+Examples of data protection and privacy regulations:
+
+ * `1974`, [US Privacy Act](https://www.justice.gov/opcl/privacy-act-1974) - regulates the collection, use, and disclosure of personal information by the _federal government_.
+ * `1996`, [US Health Insurance Portability & Accountability Act (HIPAA)](https://www.cdc.gov/phlp/publications/topic/hipaa.html) - protects personal health data.
+ * `1998`, [US Children's Online Privacy Protection Act (COPPA)](https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule) - protects the data privacy of children under 13.
+ * `2018`, [General Data Protection Regulation (GDPR)](https://gdpr-info.eu/) - provides user rights, data protection, and privacy.
+ * `2018`, [California Consumer Privacy Act (CCPA)](https://www.oag.ca.gov/privacy/ccpa) - gives consumers more _rights_ over their (personal) data.
+ * `2021`, China's [Personal Information Protection Law](https://www.reuters.com/world/china/china-passes-new-personal-data-privacy-law-take-effect-nov-1-2021-08-20/) just passed, creating one of the strongest online data privacy regulations in the world.
+
+> 🚨 The European Union's GDPR (General Data Protection Regulation) remains one of the most influential data privacy regulations today. Did you know it also defines [8 user rights](https://www.freeprivacypolicy.com/blog/8-user-rights-gdpr) to protect citizens' privacy and personal data? Learn about what these are, and why they matter.
+
+
+### 4. Ethics Culture
+
+Note that there remains an intangible gap between _compliance_ (doing just enough to meet "the letter of the law") and addressing [systemic issues](https://www.coursera.org/learn/data-science-ethics/home/week/4) (like ossification, information asymmetry, and distributional unfairness) that can accelerate the weaponization of AI.
+
+The latter requires [collaborative approaches to defining ethics cultures](https://towardsdatascience.com/why-ai-ethics-requires-a-culture-driven-approach-26f451afa29f) that build emotional connections and consistent shared values _across organizations_ in the industry. This calls for more [formalized data ethics cultures](https://www.codeforamerica.org/news/formalizing-an-ethical-data-culture/) in organizations - allowing _anyone_ to [pull the Andon cord](https://en.wikipedia.org/wiki/Andon_(manufacturing)) (to raise ethics concerns early in the process) and making _ethical assessments_ (e.g., in hiring) a core criterion for team formation in AI projects.
+
+---
+## [Post-lecture quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/3) 🎯
+## Review & Self Study
+
+Courses and books help with understanding core ethics concepts and challenges, while case studies and tools help with applied ethics practices in real-world contexts. Here are a few resources to start with.
+
+* [Machine Learning For Beginners](https://github.com/microsoft/ML-For-Beginners/blob/main/1-Introduction/3-fairness/README.md) - lesson on Fairness, from Microsoft.
+* [Principles of Responsible AI](https://docs.microsoft.com/en-us/learn/modules/responsible-ai-principles/) - free learning path from Microsoft Learn.
+* [Ethics and Data Science](https://resources.oreilly.com/examples/0636920203964) - O'Reilly EBook (M. Loukides, H. Mason et al.)
+* [Data Science Ethics](https://www.coursera.org/learn/data-science-ethics#syllabus) - online course from the University of Michigan.
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - case studies from the University of Texas.
+
+# Assignment
+
+[Write A Data Ethics Case Study](assignment.md)
From 27898fa02ffa2479953ada0d367509a1ac0be629 Mon Sep 17 00:00:00 2001
From: Izael
Date: Mon, 11 Oct 2021 12:00:53 -0300
Subject: [PATCH 088/319] Translated Introduction -> Defining Data to PT-BR
---
.../translations/README.pt-br.md | 67 +++++++++++++++++++
1 file changed, 67 insertions(+)
create mode 100644 1-Introduction/03-defining-data/translations/README.pt-br.md
diff --git a/1-Introduction/03-defining-data/translations/README.pt-br.md b/1-Introduction/03-defining-data/translations/README.pt-br.md
new file mode 100644
index 00000000..f76deed3
--- /dev/null
+++ b/1-Introduction/03-defining-data/translations/README.pt-br.md
@@ -0,0 +1,67 @@
+# Defining Data
+
+|![Defining Data - Sketchnote](../../../sketchnotes/03-DefiningData.png)|
+|:---:|
+|Defining Data - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+
+Data is facts, information, observations and measurements that are used to make discoveries and to support informed decisions. A data point is a single unit of data within a dataset, which is a collection of data points. Datasets may come in different formats and structures, usually based on their source, or where the data came from. For example, a company's monthly earnings might be in a spreadsheet, but hourly heart rate data from a smartwatch may be in [JSON](https://stackoverflow.com/a/383699) format. It's common for data scientists to work with different types of data within a dataset.
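To make the smartwatch example concrete, here is a minimal sketch of what such a JSON-formatted dataset might look like, parsed with Python's standard `json` module (the field names are illustrative, not from any real device):

```python
import json

# Hypothetical smartwatch export: hourly heart-rate readings as JSON.
# Field names are illustrative, not from any real device API.
raw = '''
{
  "device": "smartwatch",
  "readings": [
    {"hour": "09:00", "heart_rate_bpm": 72},
    {"hour": "10:00", "heart_rate_bpm": 85}
  ]
}
'''

dataset = json.loads(raw)
for point in dataset["readings"]:   # each entry is one data point
    print(point["hour"], point["heart_rate_bpm"])
```

Each object in `readings` is a single data point; the whole parsed document is the dataset.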
+
+This lesson focuses on identifying and classifying data based on its characteristics and its sources.
+
+## [Pre-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/4)
+## How Data is Described
+**Raw data** is data that has come from its source in its initial state and has not been analyzed or organized. In order to make sense of what is happening with a dataset, it needs to be organized into a format that can be understood by humans as well as by the technology used to analyze it further. The structure of a dataset describes how it is organized, and can be classified as structured, unstructured, and semi-structured. These types of structure will vary depending on the source, but will ultimately fit into these three categories.
+
+### Qualitative Data
+Qualitative data, also known as categorical data, is data that cannot be measured objectively like observations of quantitative data. It's generally various formats of subjective data that capture the quality of something, such as a product or process. Sometimes qualitative data is numerical but wouldn't typically be used mathematically, like phone numbers or timestamps. Some examples of qualitative data are: video comments, the make and model of a car, and your closest friend's favorite color. Qualitative data can be used to understand which products consumers like best, or to identify popular keywords in job application resumes.
+
+### Dados Estruturados
+Dados estruturados são dados que estão organizados em linhas e colunas, onde cada linha tem a mesma quantidade de colunas. Colunas representam um valor de um tipo particular e são identificadas com um nome descrevendo o que aquele valor representa, enquanto cada linha contém o valor. Colunas geralmente vão possuir um conjunto específico de regras e restrições nesses valores, para garantir que os valores representam precisamente a coluna. Por exemplo, imagine uma planilha de clientes onde cada linha deve ter um número de telefone e o mesmo nunca pode conter caractéres alfabéticos. Podem existir regras aplicadas na coluna do número de telefone para garantir que nunca esteja vazio e contenha apenas números.
+
+Um benefício de dados estruturados é que podem ser organizados de uma forma que pode ser relacionada a um outro dado estruturado. No entanto, devido ao fato dos dados serem feitos para serem organizados de uma forma específica, fazer mudanças na estrutura em geral pode requerer muito esforço. Por exemplo, adicionar uma coluna de email na planilha de clientes que não pode ser vazia, significa que você terá que decidir como você irá adicionar os valores nas linhas já existentes no dataset.
+
+Exemplos de dados estruturados: planilhas/spreadsheets, bancos de dados relacionais, números de telefone, extratos bancários
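The column constraints described above can be sketched in plain Python (the customer rows and the digits-and-dashes phone rule here are hypothetical, for illustration only):

```python
import re

# Hypothetical structured dataset: every row has the same columns.
customers = [
    {"name": "Ana", "phone": "555-0100"},
    {"name": "Bruno", "phone": "555-0101"},
]

# Constraint on the phone column: non-empty, digits and dashes only.
phone_rule = re.compile(r"^[0-9-]+$")
bad_rows = [c for c in customers if not phone_rule.match(c["phone"])]
print(len(bad_rows))  # 0 rows violate the constraint
```

In a real system such a rule would typically be enforced by the database schema itself; this sketch only illustrates what a column constraint means.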
+
+### Unstructured Data
+Unstructured data typically cannot be categorized into rows and columns, and does not follow a fixed format or set of rules. Because unstructured data has fewer restrictions on its structure, it is easier to add new information to than a structured dataset. If a sensor capturing barometric pressure every 2 minutes receives an update that now allows it to also measure and record temperature, the existing data does not need to be altered if it is unstructured. However, this can make analyzing or investigating this kind of data take longer. For example, a scientist who wants to find the average temperature for the previous month from the sensor's data discovers that the sensor recorded an "e" in some of its entries to indicate that it was broken, instead of a typical number, which means the data is incomplete.
+
+Examples of unstructured data: text files, text messages, video files
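The broken-sensor scenario above can be illustrated with a short sketch (the readings are hypothetical, with "e" marking entries recorded while the sensor was broken):

```python
# Hypothetical unstructured sensor log: values recorded every 2 minutes,
# with "e" written whenever the sensor was broken.
readings = ["1012.3", "1011.8", "e", "1013.1", "e"]

# To compute an average, the broken entries must first be filtered out.
valid = [float(r) for r in readings if r != "e"]
average = sum(valid) / len(valid)
print(round(average, 1))  # 1012.4
```

This is the extra work the lesson refers to: with no schema guaranteeing a number in every entry, every analysis has to start by cleaning the data.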
+
+### Semi-structured Data
+Semi-structured data has features that make it a combination of structured and unstructured data. It does not typically conform to rows and columns, but is organized in a way that is considered structured, and may follow a fixed format or set of rules. The structure varies between sources, ranging from a well-defined hierarchy to something more flexible that allows easy integration of new information. Metadata are indicators that help decide how the data is organized and stored, and go by various names depending on the type of data. Some common names for metadata are tags, elements, entities, and attributes. For example, a typical email message has a subject, a body, and a set of recipients, and can be organized by whom it was sent to or when it was sent.
+
+Examples of semi-structured data: HTML, CSV files, JavaScript Object Notation (JSON)
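The email example above can be sketched as JSON, one common semi-structured format (the field names and addresses here are hypothetical):

```python
import json

# An email as semi-structured data: the metadata (subject, recipients)
# gives it structure, while the body remains free text.
email = json.loads("""
{
  "subject": "Quarterly report",
  "recipients": ["ana@example.com", "bruno@example.com"],
  "body": "Please find the report attached."
}
""")

# The metadata lets us organize messages, e.g. by number of recipients.
print(email["subject"], len(email["recipients"]))  # Quarterly report 2
```

Note how a new field could be added to one message without changing the others, which is the flexibility semi-structured formats provide.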
+
+## Data Sources
+
+A data source is the initial location where the data was generated, or where it "lives," and will vary based on how and when it was collected. Data generated by its users is known as primary data, while secondary data comes from a source that has collected data for general use. For example, a group of scientists collecting observations in a rainforest would be generating primary data; if they decide to share it with other scientists, it becomes secondary data to those who use it.
+
+Databases are common data sources and rely on a database management system to host and maintain the data, where users explore it with commands called queries. Files serving as data sources can be audio, image, and video files, as well as spreadsheets like Excel. Internet sources are common locations for hosting data, where both databases and files can be found. Application programming interfaces, or APIs, allow programmers to create ways to share data with external users through the internet, while web scraping extracts data from a web page. The [lessons in Working with Data](/2-Working-With) focus on how to use various data sources.
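A minimal sketch of a database as a data source, using an in-memory SQLite table with hypothetical field-observation data, shows what the "queries" mentioned above look like:

```python
import sqlite3

# A database as a data source: a management system hosts the data,
# and users explore it with commands called queries (here, SQL).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE observations (species TEXT, sightings INTEGER)")
conn.executemany(
    "INSERT INTO observations VALUES (?, ?)",
    [("toucan", 3), ("jaguar", 1)],
)

# A query summarizing the primary data collected in the field.
(total,) = conn.execute("SELECT SUM(sightings) FROM observations").fetchone()
print(total)  # 4
```

If these scientists published the table for others to analyze, the same rows would become secondary data for those users.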
+
+## Conclusion
+
+In this lesson we learned:
+
+- What data is
+- How data is described
+- How data is classified and categorized
+- Where data can be found
+
+## 🚀 Challenge
+
+Kaggle is an excellent source of open datasets. Use the [dataset search tool](https://www.kaggle.com/datasets) to find some interesting datasets and classify three to five of them with these criteria:
+
+- Is the data quantitative or qualitative?
+- Is the data structured, unstructured, or semi-structured?
+
+## [Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/5)
+
+
+
+## Review & Self Study
+
+- This Microsoft Learn unit, titled [Classify your Data](https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/2-classify-data), has a detailed breakdown of structured, semi-structured, and unstructured data.
+
+## Assignment
+
+[Classifying Datasets](assignment.md)
From f49f9312e170d95e8da23baa652399e711a22f90 Mon Sep 17 00:00:00 2001
From: Jen Looper
Date: Mon, 11 Oct 2021 15:33:27 -0400
Subject: [PATCH 089/319] editing tracking code for Dmitry to 'academic'
---
.../01-defining-data-science/README.md | 18 ++--
2-Working-With-Data/07-python/README.md | 90 +++++++++----------
.../07-python/notebook-papers.ipynb | 2 +-
3 files changed, 55 insertions(+), 55 deletions(-)
diff --git a/1-Introduction/01-defining-data-science/README.md b/1-Introduction/01-defining-data-science/README.md
index bedbb1e7..ca25810d 100644
--- a/1-Introduction/01-defining-data-science/README.md
+++ b/1-Introduction/01-defining-data-science/README.md
@@ -1,8 +1,8 @@
# Defining Data Science
-| ](../../sketchnotes/01-Definitions.png)|
-|:---:|
-|Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+|  ](../../sketchnotes/01-Definitions.png) |
+| :----------------------------------------------------------------------------------------------------: |
+| Defining Data Science - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
---
@@ -69,11 +69,11 @@ Vast amounts of data are incomprehensible for a human being, but once we create
As we have already mentioned - data is everywhere, we just need to capture it in the right way! It is useful to distinguish between **structured** and **unstructured** data. The former are typically represented in some well-structured form, often as a table or number of tables, while latter is just a collection of files. Sometimes we can also talk about **semistructured** data, that have some sort of a structure that may vary greatly.
-| Structured | Semi-structured | Unstructured |
-|----------- |-----------------|--------------|
-| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
-| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, data of publication, and abstract | File share with corporate documents |
-| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
+| Structured | Semi-structured | Unstructured |
+| ---------------------------------------------------------------------------- | ---------------------------------------------------------------------------------------------- | --------------------------------------- |
+| List of people with their phone numbers | Wikipedia pages with links | Text of Encyclopaedia Britannica |
+| Temperature in all rooms of a building at every minute for the last 20 years | Collection of scientific papers in JSON format with authors, data of publication, and abstract | File share with corporate documents |
+| Data for age and gender of all people entering the building | Internet pages | Raw video feed from surveillance camera |
## Where to get Data
@@ -107,7 +107,7 @@ First step is to collect the data. While in many cases it can be a straightforwa
Storing the data can be challenging, especially if we are talking about big data. When deciding how to store data, it makes sense to anticipate the way you would want later on to query them. There are several ways data can be stored:
Relational database stores a collection of tables, and uses a special language called SQL to query them. Typically, tables would be connected to each other using some schema. In many cases we need to convert the data from original form to fit the schema.
-
NoSQL database, such as CosmosDB, does not enforce schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL database does not have rich querying capabilities of SQL, and cannot enforce referential integrity between data.
+
NoSQL database, such as CosmosDB, does not enforce schema on data, and allows storing more complex data, for example, hierarchical JSON documents or graphs. However, NoSQL database does not have rich querying capabilities of SQL, and cannot enforce referential integrity between data.
Data Lake storage is used for large collections of data in raw form. Data lakes are often used with big data, where all data cannot fit into one machine, and has to be stored and processed by a cluster. Parquet is the data format that is often used in conjunction with big data.
diff --git a/2-Working-With-Data/07-python/README.md b/2-Working-With-Data/07-python/README.md
index 53e4bc84..ab1d209b 100644
--- a/2-Working-With-Data/07-python/README.md
+++ b/2-Working-With-Data/07-python/README.md
@@ -1,8 +1,8 @@
# Working with Data: Python and the Pandas Library
-| ](../../sketchnotes/07-WorkWithPython.png)|
-|:---:|
-|Working With Python - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
+|  ](../../sketchnotes/07-WorkWithPython.png) |
+| :-------------------------------------------------------------------------------------------------------: |
+| Working With Python - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
[](https://youtu.be/dZjWOGbsN4Y)
@@ -16,7 +16,7 @@ Data processing can be programmed in any programming language, but there are cer
In this lesson, we will focus on using Python for simple data processing. We will assume basic familiarity with the language. If you want a deeper tour of Python, you can refer to one of the following resources:
* [Learn Python in a Fun Way with Turtle Graphics and Fractals](https://github.com/shwars/pycourse) - GitHub-based quick intro course into Python Programming
-* [Take your First Steps with Python](https://docs.microsoft.com/en-us/learn/paths/python-first-steps/?WT.mc_id=acad-31812-dmitryso) Learning Path on [Microsoft Learn](http://learn.microsoft.com/?WT.mc_id=acad-31812-dmitryso)
+* [Take your First Steps with Python](https://docs.microsoft.com/en-us/learn/paths/python-first-steps/?WT.mc_id=academic-31812-dmitryso) Learning Path on [Microsoft Learn](http://learn.microsoft.com/?WT.mc_id=academic-31812-dmitryso)
Data can come in many forms. In this lesson, we will consider three forms of data - **tabular data**, **text** and **images**.
@@ -97,10 +97,10 @@ b = pd.Series(["I","like","to","play","games","and","will","not","change"],index
df = pd.DataFrame([a,b])
```
This will create a horizontal table like this:
-| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
-|---|---|---|---|---|---|---|---|---|---|
-| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
-| 1 | I | like | to | use | Python | and | Pandas | very | much |
+| | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 |
+| --- | --- | ---- | --- | --- | ------ | --- | ------ | ---- | ---- |
+| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+| 1 | I | like | to | use | Python | and | Pandas | very | much |
We can also use Series as columns, and specify column names using dictionary:
```python
@@ -108,17 +108,17 @@ df = pd.DataFrame({ 'A' : a, 'B' : b })
```
This will give us a table like this:
-| | A | B |
-|---|---|---|
-| 0 | 1 | I |
-| 1 | 2 | like |
-| 2 | 3 | to |
-| 3 | 4 | use |
-| 4 | 5 | Python |
-| 5 | 6 | and |
-| 6 | 7 | Pandas |
-| 7 | 8 | very |
-| 8 | 9 | much |
+| | A | B |
+| --- | --- | ------ |
+| 0 | 1 | I |
+| 1 | 2 | like |
+| 2 | 3 | to |
+| 3 | 4 | use |
+| 4 | 5 | Python |
+| 5 | 6 | and |
+| 6 | 7 | Pandas |
+| 7 | 8 | very |
+| 8 | 9 | much |
**Note** that we can also get this table layout by transposing the previous table, eg. by writing
```python
@@ -154,17 +154,17 @@ df['LenB'] = df['B'].apply(len)
After operations above, we will end up with the following DataFrame:
-| | A | B | DivA | LenB |
-|---|---|---|---|---|
-| 0 | 1 | I | -4.0 | 1 |
-| 1 | 2 | like | -3.0 | 4 |
-| 2 | 3 | to | -2.0 | 2 |
-| 3 | 4 | use | -1.0 | 3 |
-| 4 | 5 | Python | 0.0 | 6 |
-| 5 | 6 | and | 1.0 | 3 |
-| 6 | 7 | Pandas | 2.0 | 6 |
-| 7 | 8 | very | 3.0 | 4 |
-| 8 | 9 | much | 4.0 | 4 |
+| | A | B | DivA | LenB |
+| --- | --- | ------ | ---- | ---- |
+| 0 | 1 | I | -4.0 | 1 |
+| 1 | 2 | like | -3.0 | 4 |
+| 2 | 3 | to | -2.0 | 2 |
+| 3 | 4 | use | -1.0 | 3 |
+| 4 | 5 | Python | 0.0 | 6 |
+| 5 | 6 | and | 1.0 | 3 |
+| 6 | 7 | Pandas | 2.0 | 6 |
+| 7 | 8 | very | 3.0 | 4 |
+| 8 | 9 | much | 4.0 | 4 |
**Selecting rows based on numbers** can be done using `iloc` construct. For example, to select first 5 rows from the DataFrame:
```python
@@ -183,13 +183,13 @@ df.groupby(by='LenB') \
```
This gives us the following table:
-| LenB | Count | Mean |
-|------|-------|------|
-| 1 | 1 | 1.000000 |
-| 2 | 1 | 3.000000 |
-| 3 | 2 | 5.000000 |
-| 4 | 3 | 6.333333 |
-| 6 | 2 | 6.000000 |
+| LenB | Count | Mean |
+| ---- | ----- | -------- |
+| 1 | 1 | 1.000000 |
+| 2 | 1 | 3.000000 |
+| 3 | 2 | 5.000000 |
+| 4 | 3 | 6.333333 |
+| 6 | 2 | 6.000000 |
### Getting Data
@@ -230,7 +230,7 @@ While data very often comes in tabular form, in some cases we need to deal with
In this challenge, we will continue with the topic of COVID pandemic, and focus on processing scientific papers on the subject. There is [CORD-19 Dataset](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge) with more than 7000 (at the time of writing) papers on COVID, available with metadata and abstracts (and for about half of them there is also full text provided).
-A full example of analyzing this dataset using [Text Analytics for Health](https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health/?WT.mc_id=acad-31812-dmitryso) cognitive service is described [in this blog post](https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/). We will discuss simplified version of this analysis.
+A full example of analyzing this dataset using [Text Analytics for Health](https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health/?WT.mc_id=academic-31812-dmitryso) cognitive service is described [in this blog post](https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/). We will discuss simplified version of this analysis.
> **NOTE**: We do not provide a copy of the dataset as part of this repository. You may first need to download the [`metadata.csv`](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge?select=metadata.csv) file from [this dataset on Kaggle](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge). Registration with Kaggle may be required. You may also download the dataset without registration [from here](https://ai2-semanticscholar-cord-19.s3-us-west-2.amazonaws.com/historical_releases.html), but it will include all full texts in addition to metadata file.
@@ -242,15 +242,15 @@ Open [`notebook-papers.ipynb`](notebook-papers.ipynb) and read it from top to bo
Recently, very powerful AI models have been developed that allow us to understand images. There are many tasks that can be solved using pre-trained neural networks, or cloud services. Some examples include:
-* **Image Classification**, which can help you categorize the image into one of the pre-defined classes. You can easily train your own image classifiers using services such as [Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=acad-31812-dmitryso)
-* **Object Detection** to detect different objects in the image. Services such as [computer vision](https://azure.microsoft.com/services/cognitive-services/computer-vision/?WT.mc_id=acad-31812-dmitryso) can detect a number of common objects, and you can train [Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=acad-31812-dmitryso) model to detect some specific objects of interest.
-* **Face Detection**, including Age, Gender and Emotion detection. This can be done via [Face API](https://azure.microsoft.com/services/cognitive-services/face/?WT.mc_id=acad-31812-dmitryso).
+* **Image Classification**, which can help you categorize the image into one of the pre-defined classes. You can easily train your own image classifiers using services such as [Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=academic-31812-dmitryso)
+* **Object Detection** to detect different objects in the image. Services such as [computer vision](https://azure.microsoft.com/services/cognitive-services/computer-vision/?WT.mc_id=academic-31812-dmitryso) can detect a number of common objects, and you can train [Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=academic-31812-dmitryso) model to detect some specific objects of interest.
+* **Face Detection**, including Age, Gender and Emotion detection. This can be done via [Face API](https://azure.microsoft.com/services/cognitive-services/face/?WT.mc_id=academic-31812-dmitryso).
-All those cloud services can be called using [Python SDKs](https://docs.microsoft.com/samples/azure-samples/cognitive-services-python-sdk-samples/cognitive-services-python-sdk-samples/?WT.mc_id=acad-31812-dmitryso), and thus can be easily incorporated into your data exploration workflow.
+All those cloud services can be called using [Python SDKs](https://docs.microsoft.com/samples/azure-samples/cognitive-services-python-sdk-samples/cognitive-services-python-sdk-samples/?WT.mc_id=academic-31812-dmitryso), and thus can be easily incorporated into your data exploration workflow.
Here are some examples of exploring data from Image data sources:
-* In the blog post [How to Learn Data Science without Coding](https://soshnikov.com/azure/how-to-learn-data-science-without-coding/) we explore Instagram photos, trying to understand what makes people give more likes to a photo. We first extract as much information from pictures as possible using [computer vision](https://azure.microsoft.com/services/cognitive-services/computer-vision/?WT.mc_id=acad-31812-dmitryso), and then use [Azure Machine Learning AutoML](https://docs.microsoft.com/azure/machine-learning/concept-automated-ml/?WT.mc_id=acad-31812-dmitryso) to build interpretable model.
-* In [Facial Studies Workshop](https://github.com/CloudAdvocacy/FaceStudies) we use [Face API](https://azure.microsoft.com/services/cognitive-services/face/?WT.mc_id=acad-31812-dmitryso) to extract emotions on people on photographs from events, in order to try to understand what makes people happy.
+* In the blog post [How to Learn Data Science without Coding](https://soshnikov.com/azure/how-to-learn-data-science-without-coding/) we explore Instagram photos, trying to understand what makes people give more likes to a photo. We first extract as much information from pictures as possible using [computer vision](https://azure.microsoft.com/services/cognitive-services/computer-vision/?WT.mc_id=academic-31812-dmitryso), and then use [Azure Machine Learning AutoML](https://docs.microsoft.com/azure/machine-learning/concept-automated-ml/?WT.mc_id=academic-31812-dmitryso) to build interpretable model.
+* In [Facial Studies Workshop](https://github.com/CloudAdvocacy/FaceStudies) we use [Face API](https://azure.microsoft.com/services/cognitive-services/face/?WT.mc_id=academic-31812-dmitryso) to extract emotions on people on photographs from events, in order to try to understand what makes people happy.
## Conclusion
@@ -271,7 +271,7 @@ Whether you already have structured or unstructured data, using Python you can p
**Learning Python**
* [Learn Python in a Fun Way with Turtle Graphics and Fractals](https://github.com/shwars/pycourse)
-* [Take your First Steps with Python](https://docs.microsoft.com/learn/paths/python-first-steps/?WT.mc_id=acad-31812-dmitryso) Learning Path on [Microsoft Learn](http://learn.microsoft.com/?WT.mc_id=acad-31812-dmitryso)
+* [Take your First Steps with Python](https://docs.microsoft.com/learn/paths/python-first-steps/?WT.mc_id=academic-31812-dmitryso) Learning Path on [Microsoft Learn](http://learn.microsoft.com/?WT.mc_id=academic-31812-dmitryso)
## Assignment
diff --git a/2-Working-With-Data/07-python/notebook-papers.ipynb b/2-Working-With-Data/07-python/notebook-papers.ipynb
index 6bcd7053..c08b60eb 100644
--- a/2-Working-With-Data/07-python/notebook-papers.ipynb
+++ b/2-Working-With-Data/07-python/notebook-papers.ipynb
@@ -7,7 +7,7 @@
"\r\n",
"In this challenge, we will continue with the topic of COVID pandemic, and focus on processing scientific papers on the subject. There is [CORD-19 Dataset](https://www.kaggle.com/allen-institute-for-ai/CORD-19-research-challenge) with more than 7000 (at the time of writing) papers on COVID, available with metadata and abstracts (and for about half of them there is also full text provided).\r\n",
"\r\n",
- "A full example of analyzing this dataset using [Text Analytics for Health](https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health/?WT.mc_id=acad-31812-dmitryso) cognitive service is described [in this blog post](https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/). We will discuss simplified version of this analysis."
+ "A full example of analyzing this dataset using [Text Analytics for Health](https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health/?WT.mc_id=academic-31812-dmitryso) cognitive service is described [in this blog post](https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/). We will discuss simplified version of this analysis."
],
"metadata": {}
},
From c61b3810a40ed66ce26b373a23f49ffe2d6dbaa1 Mon Sep 17 00:00:00 2001
From: Keshav Sharma
Date: Mon, 11 Oct 2021 13:59:27 -0700
Subject: [PATCH 090/319] Delete pandas.ipynb
---
2-Working-With-Data/R/pandas.ipynb | 978 -----------------------------
1 file changed, 978 deletions(-)
delete mode 100644 2-Working-With-Data/R/pandas.ipynb
diff --git a/2-Working-With-Data/R/pandas.ipynb b/2-Working-With-Data/R/pandas.ipynb
deleted file mode 100644
index cb928833..00000000
--- a/2-Working-With-Data/R/pandas.ipynb
+++ /dev/null
@@ -1,978 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "304296e3",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n",
- "Attaching package: 'dplyr'\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:stats':\n",
- "\n",
- " filter, lag\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:base':\n",
- "\n",
- " intersect, setdiff, setequal, union\n",
- "\n",
- "\n",
- "-- \u001b[1mAttaching packages\u001b[22m ------------------------------------------------------------------------------- tidyverse 1.3.1 --\n",
- "\n",
- "\u001b[32mv\u001b[39m \u001b[34mggplot2\u001b[39m 3.3.5 \u001b[32mv\u001b[39m \u001b[34mpurrr \u001b[39m 0.3.4\n",
- "\u001b[32mv\u001b[39m \u001b[34mtibble \u001b[39m 3.1.5 \u001b[32mv\u001b[39m \u001b[34mstringr\u001b[39m 1.4.0\n",
- "\u001b[32mv\u001b[39m \u001b[34mtidyr \u001b[39m 1.1.4 \u001b[32mv\u001b[39m \u001b[34mforcats\u001b[39m 0.5.1\n",
- "\u001b[32mv\u001b[39m \u001b[34mreadr \u001b[39m 2.0.2 \n",
- "\n",
- "-- \u001b[1mConflicts\u001b[22m ---------------------------------------------------------------------------------- tidyverse_conflicts() --\n",
- "\u001b[31mx\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mfilter()\u001b[39m masks \u001b[34mstats\u001b[39m::filter()\n",
- "\u001b[31mx\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mlag()\u001b[39m masks \u001b[34mstats\u001b[39m::lag()\n",
- "\n"
- ]
- }
- ],
- "source": [
- "library(dplyr)\n",
- "library(tidyverse)"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d786e051",
- "metadata": {},
- "source": [
- "## Series"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "f659f553",
- "metadata": {},
- "outputs": [],
- "source": [
- "a<- 1:9"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "9acc193d",
- "metadata": {},
- "outputs": [],
- "source": [
- "b = c(\"I\",\"like\",\"to\",\"use\",\"Python\",\"and\",\"Pandas\",\"very\",\"much\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "f577ec14",
- "metadata": {},
- "outputs": [],
- "source": [
- "a1 = length(a)\n",
- "b1 = length(b)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "31e069a0",
- "metadata": {},
- "outputs": [],
- "source": [
- "a = data.frame(a,row.names = c(1:a1))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "29ce166e",
- "metadata": {},
- "outputs": [],
- "source": [
- "b = data.frame(b,row.names = c(1:b1))"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "945feffd",
- "metadata": {},
- "source": [
- "## DataFrame"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "88a435ec",
- "metadata": {},
- "outputs": [],
- "source": [
- "a = data.frame(a,row.names = c(1:a1))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "c4e2a6c1",
- "metadata": {},
- "outputs": [],
- "source": [
- "b = data.frame(b,row.names = c(1:b1))"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 13,
- "id": "2bb5177c",
- "metadata": {},
- "outputs": [],
- "source": [
- "df<- data.frame(a,b)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 14,
- "id": "8f45d3a5",
- "metadata": {},
- "outputs": [],
- "source": [
- "df = \n",
- " rename(df,\n",
- " A = a,\n",
- " B = b,\n",
- " )"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 15,
- "id": "0efbf2d4",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "
\n"
+ ],
+ "text/latex": [
+ "A data.frame: 6 × 4\n",
+ "\\begin{tabular}{r|llll}\n",
+ " & A & B & DivA & LenB\\\\\n",
+ " & & & & \\\\\n",
+ "\\hline\n",
+ "\t1 & 1 & I & -4 & 1\\\\\n",
+ "\t2 & 2 & like & -3 & 4\\\\\n",
+ "\t3 & 3 & to & -2 & 2\\\\\n",
+ "\t4 & 4 & use & -1 & 3\\\\\n",
+ "\t5 & 5 & Python & 0 & 6\\\\\n",
+ "\t6 & 6 & and & 1 & 3\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/markdown": [
+ "\n",
+ "A data.frame: 6 × 4\n",
+ "\n",
+ "| | A <int> | B <chr> | DivA <dbl> | LenB <int> |\n",
+ "|---|---|---|---|---|\n",
+ "| 1 | 1 | I | -4 | 1 |\n",
+ "| 2 | 2 | like | -3 | 4 |\n",
+ "| 3 | 3 | to | -2 | 2 |\n",
+ "| 4 | 4 | use | -1 | 3 |\n",
+ "| 5 | 5 | Python | 0 | 6 |\n",
+ "| 6 | 6 | and | 1 | 3 |\n",
+ "\n"
+ ],
+ "text/plain": [
+ " A B DivA LenB\n",
+ "1 1 I -4 1 \n",
+ "2 2 like -3 4 \n",
+ "3 3 to -2 2 \n",
+ "4 4 use -1 3 \n",
+ "5 5 Python 0 6 \n",
+ "6 6 and 1 3 "
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "head(df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "id": "515c95b2",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAAAMFBMVEUAAABNTU1oaGh8fHyM\njIyampqnp6eysrK9vb3Hx8fQ0NDZ2dnh4eHp6enw8PD////QFLu4AAAACXBIWXMAABJ0AAAS\ndAHeZh94AAAVuklEQVR4nO3djVYbuxWAUZn/ULDf/22LDYm5CQbbc0Y6kvZeC4c2C0Yj6cNg\n0absgMVK6wHACIQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQE\nAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEAYQEASqEVKAzV+zy+HAaXAIiCQkCCAkCCAkCCAkC\nCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAkCCAnO9c3/olxIcJ5DRadS\nEhKcp3x6PPGXV3y+VQmJdMpff379t1d8whUJiXTKp7ev//aKT7gyIZFO+diXQoIlyjcdCQnO\n5FU7WGxfkHMkWOanLSkk+NmPO1JI8KOfN6SQ4Cdn7EchwQ/O2Y5Cgu+dtRuFBN86bzMKCb5x\n7r9pKSQ47eydKCQ46fyNKCQ45YJ9KCQ44ZJtKCT42kW7UEjwpcs2oZDgKxfuQSHBv849Pjp+\nwBXXuPxDEl4CTrt8AwoJ/lanCiExtmu2n5Dgv67afUKC/7hu8wkJPrty7wkJPrl26wkJ/rj4\n+Oj4kVU+JOEl4B8L9p2Q4MOSbSckeLdo1wkJDpZtOiHB3sI9JyTYLd9yQoIFL3v/+QxVPiTh\nJeCPgP0mJKYXsd2ExOxCdpuQmFzMZhMScwvaa0JialFbTUjMLGynCYl5LT8+On6qKh+S8BIQ\nuc2ExKxCd5mQmFTsJhMScwreY0JiStFbTEjMKHyHCYkJxW8wITGdwOOj4+es8iEJL8G0Vtld\nQmIy62wuITGXlfaWkJjKWltLSMxktZ0lJCay3sYSEvNYcV8JiVmscXx0/ORVPiThJZjNuptK\nSMxh5T0lJKaw9pYSEjNYfUcJiQmsv6GExPhybtmco4JTamynmiG93pfN4273dFM2DytdAj4p\n+6OjVY+Pjpeq8iEH283bfZWnx/1juV3lEnB0KKhORlVDeihvz0MPm3K/3W0P78dfAo4Ou2jA\nkDYfN7Y9/LFZ4xLwR/l4qLOZKoZUPn+F+PcLRfnsykvAH+XTW52rrf8hB5tPIW09I7GycnxS\nqnO1Ch9y8PtnpIftx/vxl4CjUrEjr9oxrFFftXOOREXvR0jVdpLfbGBItbeQkBhR9R0kJAZU\nfwMJifE02D9CYjgtto+QGE2T3SMkBtNm8wiJsTTaO0JiJM1+31lIDKTdxhES42i4b4TEMFpu\nGyExiqa7RkgMou2mERJjaLxnhMQQWm8ZITGC5jtGSPQvwf/tlJDoXobtIiR6l2K3CInO5dgs\nQqJvSfaKkOhalq0iJHqWZqcIiY7l2ShCol+J9omQ6FWCY9gjIdGpXJtESPQp2R4REl3KtkWE\nRI/S7RAh0aF8G0RI9Cfh/hAS3cm4PYREb1LuDiHRl1THsEdCoitZt4aQ6EnanSEkOpJ3YwiJ\nfiTeF0KiG5m3hZDoRepdISQ6kXtTCIk+JN8TQqIHSY9hj4REB/JvCCGRXwf7
QUik18N2EBLZ\ndbEbhERyfWwGIZFbJ3tBSKTWy1YQEpl1sxOERF7pj2GPhERaPW0DIZFVV7tASCTV1yYQEjl1\ntgeEREq9bQEhkVF3O0BIJNTfBhAS+XS4/kIim46OYY+ERB5l31Cfiy8ksjg8E3X5dLQTEnkc\nVl1IkTqdTJYoHw99Lr6QSKJ8euuPkEiiHJ+UOiQksigddyQk0vCqXbhOJ5MFDkdIvWYkJJLo\nfcmFRAbdr7iQSKD/BRcS7Q2w3kKiuRGWW0i0NsRqC4nGxlhsIdHWIGstJFrq9wT2L0KioXEW\nWki0M9A6C4lmRlpmIdHKUKssJBoZa5GFRBuDrbGQaGK0JRYSLQy3wkKivmGOYY+ERHUjLq+Q\nqG3I1RUSlY25uEKirkHXVkhUNerSComahl1ZIVHRuAsrJOoZeF2FRC0DHsMeCYlKxl5UIVHH\n4GsqJKoYfUmFRA3Dr6iQqGD8BRUS65tgPYXE6mZYTiGxtilWU0isa+hj2CMhsapZllJIrGma\nlRQSK5pnIYXEeiZaRyGxmpmWUUisZapVFBIrmWsRhcQ6JltDIbGGSY5hj4TECuZbQCERb8L1\nExLhZlw+IRFtytUTEsHmXDwhEWvStRMSoWZdOiERadqVExJxpjuGPRISYWZeNiERZepVExJB\n5l40IRFj8jUTEiFmXzIhEWH6FRMSASyYkFjOegmJxSY+hj2qGdL2YfP2+HhTyu2vlS5BVWXf\nkMXaqxjS6+Zt2rdvD3u3q1yCmg7PRJ6O3lUM6b7cbd8e7l/fmrovD2tcgpoOqySkdxVDKmX7\n8fD2XV7ZrHEJKiofDxZrr2pIbw+b8uk//PXXn1x5CSoqn96o+q3dy273uH/YPyN9+0OStelA\nOT4pUTOkl7J5eNndbd5Ker4pz2tcgpqKjo5qvvz9vDl+7/a4ziWoyKt2n9Q9kP11f7Ov6O7x\ndbVLUMnhCElGv/nNBq5iif5LSFzDCv1FSFzBAv1NSFzO+vxDSFzM8vxLSFzK6nxBSFzI4nxF\nSFzEydHXhMQlrMwJQuICFuYUIXE+63KSkDibZTlNSJzLqnxDSJzJonxHSJzHmnxLSJzD8dEP\nhMQZLMhPhMTPrMePhMSPLMfPhMRPrMYZhMQPLMY5hMT3rMVZhMS3LMV5hMQ3HB+dS0icZh3O\nJiROsgznExKnWIULCIkTLMIlhMTXrMFFhMSXLMFlhMQXvOx9KSHxL/N/MSHxD9N/OSHxN7N/\nBSHxF5N/DSHxX+b+KkLiP0z9dYTEZ2b+SkLiyPHR1YTEH6b9ekLiN7O+gJD4YNKXEBLvzPki\nQuLAlC8jJPbM+EJCYmfClxMSjo8CCAmzHUBI0zPZEYQ0O3MdQkiTM9UxhDQ3Mx1ESFMz0VGE\nNDPzHEZI83J8FEhI0zLJkYQ0K3McSkiTMsWxhDQnMxxMSFMywdGCQnp52Cweyg+XII75DRcR\n0uvjTSlC6ofpjbc4pO2vt4rK7XPQeL66BFHK/ujI8dEaFob067bsvYaN599LEOVQkIzWsSSk\n5/u3hjYPL/FrY7HXcJhVIa1jQUibfUX/262xNhZ7BeXjweSuYUFIpTz8fidsOH9dgkDl0xvR\nPCNNoxyflAgX8DPS/4TUh6Kj9XjVbh5etVtR0DnSnXOk7N6PkMzsSvxmwyRM6br8rt0czOjK\n/Pb3FEzo2oQ0A/O5OiFNwHSuT0jjM5sVCGl4JrMGIY3OXFYhpLE5ga1ESEMzkbUIaWTmsRoh\nDcw01iOkcZnFioQ0LJNYk5BGZQ6rEtKgTGFdQhqTGaxMSCNyDFudkAZk+uoT0njMXgNCGo7J\na0FIozF3TQhpMKauDSGNxcw1IqShmLhWhDQS89aMkMbhGLYhIQ3DpLUkpFGYs6aENAhT1paQ\nxmDGGhPSEExYa0IagflqTkgDMF3tCal/ZisBIfXOMWwKQuqcqcpBSH0zU0kIqW
<base64 PNG plot data truncated>",
+ "text/plain": [
+ "plot without title"
+ ]
+ },
+ "metadata": {
+ "image/png": {
+ "height": 420,
+ "width": 420
+ }
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plot(df$A,type = 'o',xlab = \"no\",ylab = \"A\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "id": "41b872c9",
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "<base64 PNG plot data omitted>",
+ "text/plain": [
+ "plot without title"
+ ]
+ },
+ "metadata": {
+ "image/png": {
+ "height": 420,
+ "width": 420
+ }
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "barplot(df$A, ylab = 'A',xlab = 'no')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "11001454",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "id": "670db495",
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "4.1.1"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
From 69b9b4d2cf36bfe2badf6a959c1fc2b9967b82f5 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:09:21 -0500
Subject: [PATCH 092/319] fix: Solve issue with broken reference on sketchdoc
image
---
1-Introduction/03-defining-data/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/03-defining-data/translations/README.es.md b/1-Introduction/03-defining-data/translations/README.es.md
index 16d11409..c30a37d7 100644
--- a/1-Introduction/03-defining-data/translations/README.es.md
+++ b/1-Introduction/03-defining-data/translations/README.es.md
@@ -1,6 +1,6 @@
# Definiendo datos
-| ](../../sketchnotes/03-DefiningData.png)|
+| ](../../../sketchnotes/03-DefiningData.png)|
|:---:|
|Definiendo datos - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
From ef2294e698452233a9c917e4e04806f6bf56afd3 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:10:59 -0500
Subject: [PATCH 093/319] feat:(translation) Improve title for module 1 section
3
---
1-Introduction/03-defining-data/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/03-defining-data/translations/README.es.md b/1-Introduction/03-defining-data/translations/README.es.md
index c30a37d7..4f709ece 100644
--- a/1-Introduction/03-defining-data/translations/README.es.md
+++ b/1-Introduction/03-defining-data/translations/README.es.md
@@ -1,4 +1,4 @@
-# Definiendo datos
+# Definiendo los datos
| ](../../../sketchnotes/03-DefiningData.png)|
|:---:|
From 1466c6fbfbfe12bf1e3189ca0970a05bb0b7f595 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:31:34 -0500
Subject: [PATCH 094/319] fix(translation): Fix broken link on assignment
reference
---
1-Introduction/03-defining-data/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/03-defining-data/translations/README.es.md b/1-Introduction/03-defining-data/translations/README.es.md
index 4f709ece..c59b5718 100644
--- a/1-Introduction/03-defining-data/translations/README.es.md
+++ b/1-Introduction/03-defining-data/translations/README.es.md
@@ -66,4 +66,4 @@ Kaggle es una fuente excelente de conjuntos de datos abiertos. Usa los [conjunto
## Assignación
-[Clasificación de conjuntos de datos](assignment.md)
+[Clasificación de los conjuntos de datos](../assignment.md)
From 0998692372bce42a7b12329bed6b01351e8ca139 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:36:46 -0500
Subject: [PATCH 095/319] fix(translation): Translate title on main readme
---
1-Introduction/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/translations/README.es.md b/1-Introduction/translations/README.es.md
index e905bf11..9e29302b 100644
--- a/1-Introduction/translations/README.es.md
+++ b/1-Introduction/translations/README.es.md
@@ -1,4 +1,4 @@
-# Introduction to Data Science
+# Introducción a la Ciencia de Datos

> Fotografía de Stephen Dawson en Unsplash
From 7ac9533ebbd21677c827317e6a5e77736f0e5156 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:38:35 -0500
Subject: [PATCH 096/319] fix:(translation): Add missing word (article) on
link's label
---
1-Introduction/03-defining-data/translations/README.es.md | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/1-Introduction/03-defining-data/translations/README.es.md b/1-Introduction/03-defining-data/translations/README.es.md
index c59b5718..656be467 100644
--- a/1-Introduction/03-defining-data/translations/README.es.md
+++ b/1-Introduction/03-defining-data/translations/README.es.md
@@ -2,14 +2,14 @@
| ](../../../sketchnotes/03-DefiningData.png)|
|:---:|
-|Definiendo datos - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
+|Definiendo los datos - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
Los datos son hechos, información, observaciones y mediciones que son usados para realizar descubrimientos y soportar decisiones informadas. Un punto de datos es una unidad simple de datos dentro de un conjunto de datos, lo cual es una colección de puntos de datos. Los conjuntos de datos pueden venir en distintos formatos y estructuras, y comúnmente se basan en su fuente, o de donde provienen los datos. Por ejemplo, las ganancias mensuales de una compañía pueden estar en una hoja de cálculo, pero los datos del ritmo cardiaco por hora de un reloj inteligente pueden estar en un formato [JSON](https://stackoverflow.com/a/383699). Es algo común para los científicos de datos el trabajar con distintos tipos de datos dentro de un conjunto de datos.
Esta lección se enfoca en la identificación y clasificación de datos por sus características y sus fuentes.
## [Examen previo a la lección](https://red-water-0103e7a0f.azurestaticapps.net/quiz/4)
-## Como se describen los datos
+## Cómo se describen los datos
Los **datos en crudo** son datos que provienen de su fuente en su estado inicial y estos no han sido analizados u organizados. Con el fin de que tenga sentido lo que sucede con un conjunto de datos, es necesario organizarlos en un formato que pueda ser entendido tanto por humanos como por la tecnología usada para analizarla a mayor detalle. La estructura de un conjunto de datos describe como está organizado y puede ser clasificado de forma estructurada, no estructurada y semi-estructurada. Estos tipos de estructuras podrían variar, dependiendo de la fuente pero finalmente caerá en una de estas categorías.
### Datos cuantitativos
Los datos cuantitativos son observaciones numéricas en un conjunto de datos que puede ser típicamente analizados, medidos y usados matemáticamente. Algunos ejemplos de datos cuantitativos son: la población de un país, la altura de una persona o las ganancias trimestrales de una compañía. Con algo de análisis adicional, los datos cuantitativos podrían ser usados para descubrir tendencias de temporada en el índice de calidad del aire (AQI) o estimar la probabilidad la hora pico de embotellamiento vial en un día laboral típico.
From 86e2690581a0786cdd6e748cafe31d83ccd4c29d Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:43:44 -0500
Subject: [PATCH 097/319] fix: Solve issue with broken link on first image
* set right path for main image on module 1 main page
---
1-Introduction/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/translations/README.es.md b/1-Introduction/translations/README.es.md
index 9e29302b..75d3b4cc 100644
--- a/1-Introduction/translations/README.es.md
+++ b/1-Introduction/translations/README.es.md
@@ -1,6 +1,6 @@
# Introducción a la Ciencia de Datos
-
+
> Fotografía de Stephen Dawson en Unsplash
En estas lecciones descubrirás cómo se define la Ciencia de Datos y aprenderás acerca de
From 8cce81d30007bb27a71144ca34beafc25e1deed4 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Mon, 11 Oct 2021 21:47:13 -0500
Subject: [PATCH 098/319] fix(translation): Fix typo on module 1 section 3
Change Lean with Learn
---
1-Introduction/03-defining-data/translations/README.es.md | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/1-Introduction/03-defining-data/translations/README.es.md b/1-Introduction/03-defining-data/translations/README.es.md
index 656be467..27ba6743 100644
--- a/1-Introduction/03-defining-data/translations/README.es.md
+++ b/1-Introduction/03-defining-data/translations/README.es.md
@@ -62,7 +62,7 @@ Kaggle es una fuente excelente de conjuntos de datos abiertos. Usa los [conjunto
## Revisión y auto-estudio
-- Esta unidad de Microsoft Lean, titulada [clasifica tus datos](https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/2-classify-data) tiene un desglose detallado de datos estructurados, semi-estructurados y no estructurados.
+- Esta unidad de Microsoft Learn, titulada [clasifica tus datos](https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/2-classify-data) tiene un desglose detallado de datos estructurados, semi-estructurados y no estructurados.
## Assignación
From c0f0147743d7371e31a78ccae69fb03c2889143d Mon Sep 17 00:00:00 2001
From: Subh Chaturvedi
Date: Tue, 12 Oct 2021 09:53:45 +0530
Subject: [PATCH 099/319] Added the translation for the ethics README
---
.../02-ethics/translations/README.hi.md | 260 ++++++++++++++++++
1 file changed, 260 insertions(+)
create mode 100644 1-Introduction/02-ethics/translations/README.hi.md
diff --git a/1-Introduction/02-ethics/translations/README.hi.md b/1-Introduction/02-ethics/translations/README.hi.md
new file mode 100644
index 00000000..a83a7e03
--- /dev/null
+++ b/1-Introduction/02-ethics/translations/README.hi.md
@@ -0,0 +1,260 @@
+# डेटा नैतिकता का परिचय
+
+| ](../../../sketchnotes/02-Ethics.png)|
+|:---:|
+| डेटा विज्ञान नैतिकता - _[@nitya](https://twitter.com/nitya) द्वारा स्केचनोट_ |
+
+---
+
+हम सब इस डेटा-फाइड दुनिया में रहने वाले डेटा-नागरिक हैं ।
+
+बाजार के रुझान यह दर्शाते हैं कि २०२२ तक, तीन में से एक बड़ी संस्था अपने डेटा की खरीद और बिक्री ऑनलाइन [दुकानों](https://www.gartner.com/smarterwithgartner/gartner-top-10-trends-in-data-and-analytics-for-2020/) के माध्यम से करेगी । **ऐप डेवलपर** के रूप में, हम डेटा-संचालित अंतर्दृष्टि और एल्गोरिथम-चालित स्वचालन को दैनिक उपयोगकर्ता अनुभवों में एकीकृत करना आसान और सस्ता पाएंगे। लेकिन जैसे-जैसे AI व्यापक होता जाएगा, हमें इस तरह के एल्गोरिदम के [हथियारीकरण](https://www.youtube.com/watch?v=TQHs8SA1qpk) से होने वाले संभावित नुकसान को भी समझना होगा ।
+
+रुझान यह भी संकेत देते हैं कि हम २०२५ तक [180 zettabytes](https://www.statista.com/statistics/871513/worldwide-data-created/) डेटा का निर्माण और उपभोग करेंगे । **डेटा वैज्ञानिक** के रूप में, यह हमें व्यक्तिगत डेटा तक पहुंचने का अभूतपूर्व स्तर प्रदान करता है । इसका मतलब है कि हम उपयोगकर्ताओं के व्यवहार संबंधी प्रोफाइल बना सकते हैं और निर्णय लेने को इस तरह प्रभावित कर सकते हैं जो संभावित रूप से [मुक्त इच्छा का भ्रम](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) पैदा करता है, जबकि उपयोगकर्ताओं को हमारे पसंदीदा परिणामों की ओर प्रेरित करता है । यह डेटा गोपनीयता और उपयोगकर्ता की सुरक्षा पर भी व्यापक प्रश्न उठाता है ।
+
+डेटा नैतिकता अब डेटा विज्ञान और इंजीनियरिंग की _आवश्यक रक्षक_ है, जो हमें अपने डेटा-संचालित कार्यों से होने वाले संभावित नुकसान और अनपेक्षित परिणामों को कम करने में मदद करती है । [AI के लिए गार्टनर हाइप साइकिल](https://www.gartner.com/smarterwithgartner/2-megatrends-dominate-the-gartner-hype-cycle-for-artificial-intelligence-2020/) डिजिटल नैतिकता, जिम्मेदार AI और AI शासन को AI के _democratization_ (लोकतंत्रीकरण) और _industrialization_ (औद्योगीकरण) के बड़े मेगाट्रेंड के प्रमुख चालकों के रूप में पहचानता है ।
+
+
+
+
+इस पाठ में, हम डेटा नैतिकता के आकर्षक क्षेत्र के बारे में सीखेंगे - मूल अवधारणाओं और चुनौतियों से लेकर केस-स्टडी और शासन जैसी एप्लाइड AI अवधारणाओं तक - जो डेटा और AI के साथ काम करने वाली समूह और संगठनों में नैतिकता संस्कृति स्थापित करने में मदद करते हैं ।
+
+## [पाठ से पहले की प्रश्नोत्तरी](https://red-water-0103e7a0f.azurestaticapps.net/quiz/2) 🎯
+
+## मूल परिभाषाएं
+
+आइए बुनियादी शब्दावली को समझना शुरू करें ।
+
+"नैतिकता" [ग्रीक शब्द "एथिकोस"](https://en.wikipedia.org/wiki/Ethics) (और इसकी जड़ "एथोस") से आया है जिसका अर्थ _चरित्र या नैतिक प्रकृति_ होता है ।
+
+**नैतिकता** उन साझा मूल्यों और नैतिक सिद्धांतों के बारे में है जो समाज में हमारे व्यवहार को नियंत्रित करते हैं । नैतिकता कानूनों पर नहीं बल्कि "सही बनाम गलत" के व्यापक रूप से स्वीकृत मानदंड पर आधारित है । लेकिन , नैतिक विचार कॉर्पोरेट प्रशासन की पहल और अनुपालन के लिए अधिक प्रोत्साहन पैदा करने वाले सरकारी नियमों को प्रभावित कर सकते हैं ।
+
+**डेटा नैतिकता** एक [नैतिकता की नई शाखा](https://royalsocietypublishing.org/doi/full/10.1098/rsta.2016.0360#sec-1) है जो "_डेटा, एल्गोरिदम और संबंधित अभ्यासों से जुड़ी नैतिक समस्याओं का अध्ययन और मूल्यांकन करती है_" । यहां, **"डेटा"** निर्माण, रिकॉर्डिंग, क्यूरेशन, प्रसंस्करण, प्रसार, साझाकरण और उपयोग से संबंधित कार्यों पर केंद्रित है, **"एल्गोरिदम"** AI, एजेंटों, मशीन लर्निंग और रोबोट पर केंद्रित है, और **"अभ्यास"** जिम्मेदार नवाचार, प्रोग्रामिंग, हैकिंग और नैतिकता कोड जैसे विषयों पर केंद्रित है ।
+
+**एप्लाइड नैतिकता** [नैतिक विचारों का व्यावहारिक अनुप्रयोग](https://en.wikipedia.org/wiki/Applied_ethics) है । यह _वास्तविक दुनिया की कार्रवाइयों, उत्पादों और प्रक्रियाओं_ के संदर्भ में नैतिक मुद्दों की सक्रिय रूप से जांच करने और सुधारात्मक उपाय करने की प्रक्रिया है ताकि ये हमारे परिभाषित नैतिक मूल्यों के साथ संरेखित रहें ।
+
+**नैतिकता संस्कृति** यह सुनिश्चित करने के लिए [_operationalizing_ एप्लाइड नैतिकता](https://hbr.org/2019/05/how-to-design-an-ethical-organization) के बारे में है कि हमारे नैतिक सिद्धांतों और प्रथाओं को पूरे संगठन में एक सुसंगत और मापनीय तरीके से अपनाया जाए । सफल नैतिक संस्कृतियाँ संगठन-व्यापी नैतिक सिद्धांतों को परिभाषित करती हैं, अनुपालन के लिए सार्थक प्रोत्साहन प्रदान करती हैं, और संगठन के हर स्तर पर वांछित व्यवहारों को प्रोत्साहित और प्रवर्धित करके नैतिक मानदंडों को सुदृढ़ करती हैं ।
+
+
+## नैतिकता की अवधारणाएं
+
+इस खंड में, हम डेटा नैतिकता के लिए साझा मूल्यों (सिद्धांतों) और नैतिक चुनौतियों (समस्याओं) जैसी अवधारणाओं पर चर्चा करेंगे - और मामले के अध्ययन का पता लगाएंगे जो आपको वास्तविक दुनिया के संदर्भों में इन अवधारणाओं को समझने में मदद करते हैं ।
+
+### 1. नैतिक सिद्धांत
+
+प्रत्येक डेटा नैतिकता रणनीति _नैतिक सिद्धांतों_ को परिभाषित करके शुरू होती है - "साझा मूल्य" जो स्वीकार्य व्यवहारों का वर्णन करते हैं, और हमारे डेटा और AI परियोजनाओं में अनुपालन कार्यों का मार्गदर्शन करते हैं । लेकिन, अधिकांश बड़े संगठन इन्हें एक _नैतिक AI_ मिशन स्टेटमेंट या फ्रेमवर्क में रेखांकित करते हैं जो कॉर्पोरेट स्तर पर परिभाषित होता है और सभी टीमों में लगातार लागू होता है ।
+
+**उदाहरण:** माइक्रोसॉफ्ट की [Responsible AI](https://www.microsoft.com/en-us/ai/responsible-ai) मिशन स्टेटमेंट कहती है : _"हम नैतिक सिद्धांतों द्वारा संचालित AI की उन्नति के लिए प्रतिबद्ध हैं जो लोगों को सबसे पहले रखते हैं |"_ - नीचे दिए गए ढांचे में 6 नैतिक सिद्धांतों का वर्णन किया गया है :
+
+
+
+आइए संक्षेप में इन सिद्धांतों के बारे में सीखे | _पारदर्शिता_ और _जवाबदेही_ वह मूलभूत मूल्य हैं जिन पर अन्य सिद्धांतों का निर्माण किया गया है - तो चलिए वहां शुरु करते हैं :
+
+* [**जवाबदेही**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) उपयोगकर्ताओं को उनके डेटा और AI संचालन, और इन नैतिक सिद्धांतों के अनुपालन के लिए _जिम्मेदार_ बनाती है ।
+* [**पारदर्शिता**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) सुनिश्चित करती है कि डेटा और AI क्रियाएं उपयोगकर्ताओं के लिए _समझने योग्य_ (व्याख्या योग्य) हैं, यह बताते हुए कि निर्णयों के पीछे क्या और क्यों है ।
+* [**निष्पक्षता**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1%3aprimaryr6) - यह सुनिश्चित करने पर ध्यान केंद्रित करती है कि AI डेटा और सिस्टम में किसी भी प्रणालीगत या निहित सामाजिक-तकनीकी पूर्वाग्रहों को संबोधित करते हुए _सभी लोगों_ के साथ उचित व्यवहार करता है ।
+* [**विश्वसनीयता और अहनिकारकता**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - सुनिश्चित करती है कि AI- संभावित नुकसान या अनपेक्षित परिणामों को कम करते हुए परिभाषित मूल्यों के साथ _लगातार_ काम करता है ।
+* [**निजता एवं सुरक्षा**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - डेटा वंश को समझने, और उपयोगकर्ताओं को _डेटा गोपनीयता और संबंधित सुरक्षा_ प्रदान करने के बारे में है ।
+* [**समग्रता**](https://www.microsoft.com/en-us/ai/responsible-ai?activetab=pivot1:primaryr6) - AI समाधानों को इरादे से डिजाइन करना एवं उन्हें _मानवीय आवश्यकताओं की एक विस्तृत श्रृंखला_ और क्षमताओं को पूरा करने के लिए अनुकूलित करने के बारे में है ।
+
+> 🚨 अपने डेटा नैतिकता मिशन वक्तव्य के बारे में सोचें | अन्य संगठनों से नैतिक AI ढांचों का अन्वेषण करें - ये हैं कुछ उदाहरण [IBM](https://www.ibm.com/cloud/learn/ai-ethics), [Google](https://ai.google/principles) ,एवं [Facebook](https://ai.facebook.com/blog/facebooks-five-pillars-of-responsible-ai/) | इनके बीच क्या साझा मूल्य हैं? ये सिद्धांत उनके द्वारा संचालित AI उत्पाद या उद्योग से कैसे संबंधित हैं ?
+
+### 2. नैतिकता से जुडी चुनौतियां
+
+एक बार जब हमारे पास नैतिक सिद्धांत परिभाषित हो जाते हैं, तो अगला कदम यह देखने के लिए हमारे डेटा और एआई कार्यों का मूल्यांकन करना है कि क्या वे उन साझा मूल्यों के साथ संरेखित हैं । अपने कार्यों के बारे में दो श्रेणियों में सोचें: _डेटा संग्रह_ और _एल्गोरिदम डिज़ाइन_ |
+
+डेटा संग्रह के साथ, कार्रवाइयों में संभवतः पहचान योग्य जीवित व्यक्तियों के लिए **व्यक्तिगत डेटा** या व्यक्तिगत रूप से पहचान योग्य जानकारी शामिल होगी । इसमें [गैर-व्यक्तिगत डेटा के विविध आइटम](https://ec.europa.eu/info/law/law-topic/data-protection/reform/what-personal-data_en) शामिल हैं, जो _collectively_ किसी व्यक्ति की पहचान करते हैं । नैतिक चुनौतियां _डेटा गोपनीयता_, _डेटा स्वामित्व_, और उपयोगकर्ताओं के लिए _सूचित सहमति_ और _बौद्धिक संपदा अधिकार_ जैसे संबंधित विषयों से संबंधित हो सकती हैं ।
+
+एल्गोरिथम डिज़ाइन के साथ, क्रियाओं में **डेटासेट** एकत्र करना और क्यूरेट करना शामिल होगा, फिर उनका उपयोग **डेटा मॉडल** को प्रशिक्षित और तैनात करने के लिए किया जाएगा जो वास्तविक दुनिया के संदर्भों में परिणामों की भविष्यवाणी या स्वचालित निर्णय लेते हैं ।
+
+दोनों ही मामलों में, नैतिकता की चुनौतियाँ उन क्षेत्रों को उजागर करती हैं जहाँ हमारे कार्यों का हमारे साझा मूल्यों के साथ टकराव हो सकता है । इन चिंताओं का पता लगाने, सामना करने, कम करने या समाप्त करने के लिए - हमें अपने कार्यों से संबंधित नैतिक "हां या नहीं" प्रश्न पूछने की जरूरत है, फिर आवश्यकतानुसार सुधारात्मक कार्रवाई करें । आइए कुछ नैतिक चुनौतियों और उनके द्वारा उठाए गए नैतिक प्रश्नों पर एक नज़र डालें :
+
+
+#### 2.1 डेटा स्वामित्व
+
+डेटा संग्रह में अक्सर व्यक्तिगत डेटा शामिल होता है जो डेटा विषयों की पहचान कर सकता है । [डेटा स्वामित्व](https://permission.io/blog/data-ownership) डेटा के _नियंत्रण_ और उन [_उपयोगकर्ता अधिकारों_](https://permission.io/blog/data-ownership) से संबंधित है जो डेटा के निर्माण, प्रसंस्करण और प्रसार से जुड़े हैं ।
+
+हमें जो नैतिक प्रश्न पूछने चाहिए, वे हैं :
+ * डेटा का मालिक कौन है ? (उपयोगकर्ता या संगठन)
+ * डेटा विषयों के पास क्या अधिकार हैं ? (उदा: पहुंच, मिटाना, सुवाह्यता)
+ * संगठनों के पास क्या अधिकार हैं ? (उदा: दुर्भावनापूर्ण उपयोगकर्ता समीक्षाओं का सुधार)
+
+#### 2.2 सूचित सहमति
+
+[सूचित सहमति](https://legaldictionary.net/informed-consent/) उद्देश्य, संभावित जोखिमों और विकल्पों सहित प्रासंगिक तथ्यों की _पूर्ण समझ_ के साथ कार्रवाई (जैसे डेटा संग्रह) के लिए सहमत होने वाले उपयोगकर्ताओं के कार्य को परिभाषित करता है ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या उपयोगकर्ता (डेटा विषय) ने डेटा कैप्चर और उपयोग के लिए अनुमति दी थी ?
+ * क्या उपयोगकर्ता को वह उद्देश्य समझ में आया जिसके लिए उस डेटा को कैप्चर किया गया था ?
+ * क्या उपयोगकर्ता ने उनकी भागीदारी से संभावित जोखिमों को समझा ?
+
+#### 2.3 बौद्धिक संपदा
+
+[बौद्धिक संपदा](https://en.wikipedia.org/wiki/Intellectual_property) मानव पहल से उत्पन्न अमूर्त कृतियों को संदर्भित करता है, जिनका व्यक्तियों या व्यवसायों के लिए _आर्थिक_ महत्व हो सकता है ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या जमा किए गए डेटा का किसी उपयोगकर्ता या व्यवसाय के लिए आर्थिक महत्व है ?
+ * क्या **उपयोगकर्ता** के पास यहां बौद्धिक संपदा है ?
+ * क्या **संगठन** के पास यहां बौद्धिक संपदा है ?
+ * अगर ये अधिकार मौजूद हैं, तो हम उनकी रक्षा कैसे कर रहे हैं ?
+
+#### 2.4 डाटा गोपनीयता
+
+[डेटा गोपनीयता](https://www.northeastern.edu/graduate/blog/what-is-data-privacy/) या सूचना गोपनीयता व्यक्तिगत रूप से पहचान योग्य जानकारी के संबंध में उपयोगकर्ता की गोपनीयता के संरक्षण और उपयोगकर्ता की पहचान की सुरक्षा को संदर्भित करती है ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या उपयोगकर्ताओं का (व्यक्तिगत) डेटा हैक और लीक से सुरक्षित है ?
+ * क्या उपयोगकर्ताओं का डेटा केवल अधिकृत उपयोगकर्ताओं और संदर्भों के लिए सुलभ है ?
+ * क्या डेटा साझा या प्रसारित होने पर उपयोगकर्ताओं की गोपनीयता बनी रहती है ?
+ * क्या किसी उपयोगकर्ता की पहचान अज्ञात डेटासेट से की जा सकती है ?
+
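साझा करने से पहले व्यक्तिगत पहचानकर्ताओं को छद्मनाम (pseudonym) में बदलने का एक छोटा-सा Python स्केच नीचे दिया गया है । यहाँ फ़ील्ड नाम, ईमेल और salt सब काल्पनिक हैं, और ध्यान रहे कि नेटफ्लिक्स जैसे मामलों में देखा गया है कि केवल छद्मनामीकरण पुनः-पहचान (re-identification) से पूरी सुरक्षा नहीं देता :

```python
import hashlib

SALT = "gupt-salt-badal-dein"  # काल्पनिक salt: असली प्रोजेक्ट में इसे गुप्त रखें


def pseudonymize(value: str) -> str:
    """पहचानकर्ता (जैसे ईमेल) को एक स्थिर, अपरिवर्तनीय छद्मनाम में बदलता है ।"""
    return hashlib.sha256((SALT + value).encode("utf-8")).hexdigest()[:12]


record = {"email": "user@example.com", "rating": 5}
# मूल ईमेल हटाकर उसकी जगह छद्मनाम रखें; एक ही उपयोगकर्ता का छद्मनाम स्थिर रहता है
safe_record = {**record, "email": pseudonymize(record["email"])}
```

यह केवल एक न्यूनतम उदाहरण है; वास्तविक प्रणालियों में k-anonymity जैसी तकनीकों और बाहरी डेटासेट से सहसंबंध के जोखिम पर भी विचार करना चाहिए ।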
+
+#### 2.5 भूला दिया जाने का अधिकार
+
+[भूला दिया जाने का अधिकार](https://en.wikipedia.org/wiki/Right_to_be_forgotten) उपयोगकर्ताओं को व्यक्तिगत डेटा की अतिरिक्त सुरक्षा प्रदान करता है । विशेष रूप से, यह उपयोगकर्ताओं को _विशिष्ट परिस्थितियों में_ इंटरनेट खोजों और अन्य स्थानों से व्यक्तिगत डेटा हटवाने का अनुरोध करने का अधिकार देता है - जिससे उन्हें पिछली कार्रवाइयों को उनके विरुद्ध इस्तेमाल किए बिना ऑनलाइन एक नई शुरुआत मिल सके ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या सिस्टम डेटा विषयों को अपना डेटा मिटाने का अनुरोध करने की अनुमति देता है ?
+ * क्या उपयोगकर्ता की सहमति वापस लेने से स्वचालित डेटा मिटाना शुरू हो जाएगा ?
+ * क्या डेटा सहमति के बिना या गैरकानूनी तरीके से एकत्र किया गया था ?
+ * क्या हम डेटा गोपनीयता के लिए सरकारी नियमों का अनुपालन करते हैं ?
+
+
+#### 2.6 डेटासेट पूर्वाग्रह
+
+डेटासेट या [संग्रह पूर्वाग्रह](http://researcharticles.com/index.php/bias-in-data-collection-in-research/) एल्गोरिथम विकास के लिए डेटा के _गैर-प्रतिनिधि_ सबसेट के चयन के बारे में है, जिससे विभिन्न समूहों के लिए संभावित अनुचितता और भेदभाव पैदा हो सकता है । पूर्वाग्रह के प्रकारों में चयन या नमूना पूर्वाग्रह, स्वयंसेवी पूर्वाग्रह और साधन पूर्वाग्रह शामिल हैं ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या हमने डेटा विषयों के प्रतिनिधि सेट की भर्ती की ?
+ * क्या हमने विभिन्न पूर्वाग्रहों के लिए अपने एकत्रित या क्यूरेट किए गए डेटासेट का परीक्षण किया ?
+ * क्या हम खोजे गए पूर्वाग्रहों को कम कर सकते हैं या हटा सकते हैं ?
+
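संग्रह पूर्वाग्रह की एक बुनियादी जाँच यह हो सकती है कि डेटासेट में हर उपसमूह का अनुपात गिना जाए । नीचे काल्पनिक डेटा और एक मनमाने थ्रेशोल्ड के साथ एक छोटा-सा स्केच है :

```python
from collections import Counter


def representation(samples, key):
    """डेटासेट में हर उपसमूह का अनुपात (0 से 1) लौटाता है ।"""
    counts = Counter(s[key] for s in samples)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


# काल्पनिक डेटा: एक समूह बाकियों की तुलना में बहुत अधिक दर्ज हुआ है
samples = [{"city": "Boston"}] * 80 + [{"city": "Dorchester"}] * 20
shares = representation(samples, "city")
skewed = max(shares.values()) > 0.6  # सरल, मनमाना थ्रेशोल्ड - संदर्भ के अनुसार चुनें
```

ऐसी गिनती पूर्वाग्रह का प्रमाण नहीं, केवल संकेत देती है; असली विश्लेषण में जनसंख्या के वास्तविक अनुपात से तुलना जरूरी है ।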
+#### 2.7 डेटा की गुणवत्ता
+
+[डेटा गुणवत्ता](https://lakefs.io/data-quality-testing/) हमारे एल्गोरिदम को विकसित करने के लिए उपयोग किए गए क्यूरेट किए गए डेटासेट की वैधता को देखती है, और यह जाँचती है कि सुविधाएँ और रिकॉर्ड हमारे AI उद्देश्य के लिए आवश्यक सटीकता और स्थिरता के स्तर को पूरा करते हैं या नहीं ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या हमने अपने उपयोग के मामले में मान्य _features_ को कैप्चर किया ?
+ * क्या डेटा विविध डेटा स्रोतों से _लगातार_ कैप्चर किया गया था ?
+ * क्या विविध स्थितियों या परिदृश्यों के लिए डेटासेट _पूर्ण_ है ?
+ * क्या वास्तविकता को प्रतिबिंबित करने में जानकारी _सटीक_ रूप से कैप्चर की गई है ?
+
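पूर्णता और वैधता जैसी बुनियादी गुणवत्ता-जाँचों को कोड में उतारा जा सकता है । नीचे का स्केच काल्पनिक सेंसर रिकॉर्ड (फ़ील्ड नाम `temp`, `city` यहाँ केवल उदाहरण के लिए हैं) पर दो सरल जाँचें दिखाता है :

```python
def quality_report(records, required_fields):
    """पूर्णता और वैधता की बुनियादी जाँच: कितने रिकॉर्ड पूर्ण हैं और कितनों में मान्य तापमान है ।"""
    complete = sum(
        all(r.get(f) not in (None, "") for f in required_fields) for r in records
    )
    valid_temp = sum(isinstance(r.get("temp"), (int, float)) for r in records)
    return {"total": len(records), "complete": complete, "valid_temp": valid_temp}


records = [
    {"temp": 21.5, "city": "Pune"},
    {"temp": "e", "city": "Delhi"},  # सेंसर त्रुटि: संख्या की जगह 'e' दर्ज हुआ
    {"temp": 19.0, "city": ""},      # अधूरा रिकॉर्ड: city खाली है
]
report = quality_report(records, ["temp", "city"])
```

ऐसी रिपोर्ट से यह तय करना आसान होता है कि डेटासेट मॉडल-प्रशिक्षण के लिए तैयार है या पहले सफाई चाहिए ।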
+#### 2.8 एल्गोरिथम की निष्पक्षता
+
+[एल्गोरिदम निष्पक्षता](https://towardsdatascience.com/what-is-algorithm-fairness-3182e161cf9f) यह जांचती है कि क्या एल्गोरिथम डिज़ाइन व्यवस्थित रूप से डेटा विषयों के विशिष्ट उपसमूहों के साथ भेदभाव करता है, जिससे _आवंटन_ (जहां उस समूह से संसाधन अस्वीकार या रोक दिए जाते हैं) और _सेवा की गुणवत्ता_ (जहां AI कुछ उपसमूहों के लिए उतना सटीक नहीं है जितना दूसरों के लिए) में [संभावित नुकसान](https://docs.microsoft.com/en-us/azure/machine-learning/concept-fairness-ml) होते हैं ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या हमने विविध उपसमूहों और स्थितियों के लिए मॉडल सटीकता का मूल्यांकन किया ?
+ * क्या हमने संभावित नुकसान (जैसे, स्टीरियोटाइपिंग) के लिए सिस्टम की जांच की ?
+ * क्या हम पहचाने गए नुकसान को कम करने के लिए डेटा को संशोधित कर सकते हैं या मॉडल को फिर से प्रशिक्षित कर सकते हैं ?
+
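'सेवा की गुणवत्ता' के अंतर को पकड़ने का एक सरल तरीका है हर उपसमूह की सटीकता अलग-अलग मापना । नीचे पूरी तरह काल्पनिक भविष्यवाणियों के साथ एक स्केच है :

```python
def accuracy_by_group(rows):
    """हर उपसमूह के लिए मॉडल की सटीकता अलग-अलग मापता है ।"""
    stats = {}
    for r in rows:
        correct, total = stats.get(r["group"], (0, 0))
        stats[r["group"]] = (correct + (r["pred"] == r["label"]), total + 1)
    return {g: c / t for g, (c, t) in stats.items()}


# काल्पनिक भविष्यवाणियाँ: समूह A पर मॉडल 90% सही, समूह B पर केवल 60%
rows = (
    [{"group": "A", "pred": 1, "label": 1}] * 9
    + [{"group": "A", "pred": 0, "label": 1}]
    + [{"group": "B", "pred": 1, "label": 1}] * 6
    + [{"group": "B", "pred": 0, "label": 1}] * 4
)
acc = accuracy_by_group(rows)
gap = acc["A"] - acc["B"]  # बड़ा अंतर 'सेवा की गुणवत्ता' की समस्या का संकेत है
```

यह केवल शुरुआती जाँच है; Fairlearn जैसे समर्पित टूल कई और निष्पक्षता मेट्रिक देते हैं ।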
+अधिक जानने के लिए [AI फेयरनेस चेकलिस्ट](https://query.prod.cms.rt.microsoft.com/cms/api/am/binary/RE4t6dA) जैसे संसाधनों का अन्वेषण करें ।
+
+#### 2.9 मिथ्या निरूपण
+
+[डेटा मिसरिप्रेजेंटेशन](https://www.sciencedirect.com/topics/computer-science/misrepresentation) यह पूछने के बारे में है कि क्या हम एक वांछित कथा का समर्थन करने के लिए भ्रामक तरीके से ईमानदारी से रिपोर्ट किए गए डेटा से अंतर्दृष्टि का संचार कर रहे हैं ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या हम अपूर्ण या गलत डेटा की रिपोर्ट कर रहे हैं ?
+ * क्या हम डेटा को इस तरह से देख रहे हैं जिससे भ्रामक निष्कर्ष निकलते हैं ?
+ * क्या हम परिणामों में हेरफेर करने के लिए चुनिंदा सांख्यिकीय तकनीकों का उपयोग कर रहे हैं ?
+ * क्या ऐसे वैकल्पिक स्पष्टीकरण हैं जो एक अलग निष्कर्ष प्रस्तुत कर सकते हैं ?
+
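सांख्यिकीय तकनीक का चुनाव ही कथा बदल सकता है - उदाहरण के लिए, विषम (skewed) डेटा पर माध्य और माध्यिका बहुत अलग 'कहानियाँ' बताते हैं । नीचे काल्पनिक वेतन डेटा के साथ एक छोटा-सा उदाहरण है :

```python
import statistics

# काल्पनिक वेतन डेटा: एक outlier पूरे सारांश को खींच लेता है
salaries = [30_000, 32_000, 35_000, 38_000, 40_000, 250_000]

mean_salary = statistics.mean(salaries)      # outlier से ऊपर खिंचता है
median_salary = statistics.median(salaries)  # विशिष्ट मूल्य के करीब रहता है
```

"औसत वेतन ₹70,000 से ऊपर है" और "आधे लोग ₹36,500 से कम कमाते हैं" - दोनों वाक्य एक ही डेटा से सच हैं; कौन-सा चुना गया, यही मिथ्या निरूपण का प्रश्न है ।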
+#### 2.10 मुक्त चयन
+
+[इल्यूज़न ऑफ़ फ्री चॉइस](https://www.datasciencecentral.com/profiles/blogs/the-illusion-of-choice) तब होता है जब सिस्टम के "चॉइस आर्किटेक्चर" निर्णय लेने वाले एल्गोरिदम का उपयोग करके लोगों को पसंदीदा परिणाम की ओर प्रेरित करते हैं, जबकि देखने में ऐसा लगता है कि उन्हें विकल्प और नियंत्रण दिया गया है । ये [डार्क पैटर्न](https://www.darkpatterns.org/) उपयोगकर्ताओं को सामाजिक और आर्थिक नुकसान पहुंचा सकते हैं । चूंकि उपयोगकर्ता के निर्णय व्यवहार प्रोफाइल को प्रभावित करते हैं, इसलिए ये कार्रवाइयां संभावित रूप से भविष्य के विकल्पों को भी प्रेरित करती हैं, जो इन नुकसानों के प्रभाव को बढ़ा या विस्तारित कर सकती हैं ।
+
+यहां देखने लायक प्रश्न हैं :
+ * क्या उपयोगकर्ता ने उस विकल्प को बनाने के निहितार्थों को समझा ?
+ * क्या उपयोगकर्ता (वैकल्पिक) विकल्पों और प्रत्येक के पेशेवरों और विपक्षों से अवगत था ?
+ * क्या उपयोगकर्ता किसी स्वचालित या प्रभावित विकल्प को बाद में उलट सकता है ?
+
+### 3. केस स्टडी
+
+इन नैतिक चुनौतियों को वास्तविक दुनिया के संदर्भों में रखने के लिए, ऐसे मामलों के अध्ययन को देखने में मदद मिलती है जो व्यक्तियों और समाज को संभावित नुकसान और परिणामों को उजागर करते हैं, जब ऐसे नैतिकता उल्लंघनों की अनदेखी की जाती है ।
+
+कुछ उदाहरण निम्नलिखित हैं :
+
+| नैतिकता चुनौती | मामले का अध्ययन |
+|--- |--- |
+| **सूचित सहमति** | १९७२ - [टस्केगी सिफलिस अध्ययन](https://en.wikipedia.org/wiki/Tuskegee_Syphilis_Study) - अध्ययन में भाग लेने वाले अफ्रीकी अमेरिकी पुरुषों को उन शोधकर्ताओं द्वारा मुफ्त चिकित्सा देखभाल का वादा किया गया था जो उनके निदान या उपचार की उपलब्धता के बारे में विषयों को सूचित करने में विफल रहे। कई विषयों की मृत्यु हो गई और साथी या बच्चे प्रभावित हुए; अध्ययन 40 साल तक चला । |
+| **डाटा प्राइवेसी** | २००७ - [नेटफ्लिक्स डेटा प्राइज](https://www.wired.com/2007/12/why-anonymous-data-only-isnt/) ने शोधकर्ताओं को सिफारिश एल्गोरिदम बेहतर बनाने में मदद के लिए _50K ग्राहकों_ से _10M अनाम मूवी रैंकिंग_ प्रदान कीं । हालांकि, शोधकर्ता अज्ञात डेटा को _बाहरी डेटासेट_ (उदाहरण के लिए, IMDb टिप्पणियों) में मौजूद व्यक्तिगत रूप से पहचाने जाने योग्य डेटा के साथ सहसंबंधित करने में सक्षम रहे - प्रभावी रूप से कुछ नेटफ्लिक्स ग्राहकों का "डी-अनामीकरण" करते हुए ।|
+| **संग्रह पूर्वाग्रह** | २०१३ - सिटी ऑफ़ बोस्टन ने [स्ट्रीट बम्प](https://www.boston.gov/transportation/street-bump) विकसित किया, एक ऐप जो नागरिकों को गड्ढों की रिपोर्ट करने देता है, जिससे शहर को समस्याएँ खोजने और ठीक करने के लिए बेहतर रोडवे डेटा मिलता है । हालांकि, [निम्न आय वर्ग के लोगों के पास कारों और फोन तक कम पहुंच थी](https://hbr.org/2013/04/the-hidden-biases-in-big-data), जिससे इस ऐप में उनके सड़क संबंधी मुद्दे अदृश्य हो गए । डेवलपर्स ने निष्पक्षता के लिए _न्यायसंगत पहुंच और डिजिटल विभाजन_ के मुद्दों पर शिक्षाविदों के साथ काम किया । |
+| **एल्गोरिथम निष्पक्षता** | २०१८ - एमआईटी की [जेंडर शेड्स स्टडी](http://gendershades.org/overview.html) ने लिंग वर्गीकरण एआई उत्पादों की सटीकता का मूल्यांकन किया, और महिलाओं तथा अश्वेत व्यक्तियों के लिए सटीकता के अंतराल उजागर किए । [2019 में ऐप्पल कार्ड](https://www.wired.com/story/the-apple-card-didnt-see-genderand-thats-the-problem/) ने पुरुषों की तुलना में महिलाओं को कम क्रेडिट दिया । दोनों मामलों ने एल्गोरिथम पूर्वाग्रह से होने वाले सामाजिक-आर्थिक नुकसान के मुद्दों को दर्शाया ।|
+| **डेटा गलत बयानी** | २०२० - [जॉर्जिया डिपार्टमेंट ऑफ पब्लिक हेल्थ ने जारी किया COVID-19 चार्ट](https://www.vox.com/covid-19-coronavirus-us-response-trump/2020/5/18/21262265/georgia-covid-19-cases-declining-reopening) जो एक्स-अक्ष पर गैर-कालानुक्रमिक क्रम के साथ पुष्टि किए गए मामलों में रुझानों के बारे में नागरिकों को गुमराह करने के लिए प्रकट हुए। यह विज़ुअलाइज़ेशन ट्रिक्स के माध्यम से गलत बयानी दिखाता है । |
+| **स्वतंत्र चुनाव का भ्रम** | २०२० - लर्निंग ऐप [एबीसीमाउस ने एफटीसी शिकायत को निपटाने के लिए 10 मिलियन डॉलर का भुगतान किया](https://www.washingtonpost.com/business/2020/09/04/abcmouse-10-million-ftc-settlement/) जहां माता-पिता ऐसी सदस्यताओं का भुगतान करने में फंस गए थे जिन्हें वे रद्द नहीं कर सके । यह चॉइस आर्किटेक्चर में डार्क पैटर्न दिखाता है, जहां उपयोगकर्ताओं को संभावित रूप से हानिकारक विकल्पों की ओर झुकाया जा रहा था । |
+| **डेटा गोपनीयता और उपयोगकर्ता अधिकार** | २०२१ - फेसबुक [डेटा ब्रीच](https://www.npr.org/2021/04/09/986005820/after-data-breach-exposes-530-million-facebook-says-it-will-not-notify-users) ने 530M उपयोगकर्ताओं का डेटा उजागर किया, जिसके परिणामस्वरूप FTC के साथ $5B का समझौता हुआ । फिर भी फेसबुक ने उपयोगकर्ताओं को उल्लंघन की सूचना देने से इनकार कर दिया, जिससे डेटा पारदर्शिता और पहुंच से जुड़े उपयोगकर्ता अधिकारों का उल्लंघन हुआ । |
+
+अधिक केस स्टडी के बारे में चाहते हैं ? इन संसाधनों की जाँच करें :
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - विविध उद्योगों में नैतिकता की दुविधा ।
+* [Data Science Ethics course](https://www.coursera.org/learn/data-science-ethics#syllabus) - ऐतिहासिक मामले का अध्ययन ।
+* [Where things have gone wrong](https://deon.drivendata.org/examples/) - उदाहरणों के साथ डीऑन चेकलिस्ट ।
+
+> 🚨 आपके द्वारा देखी गई केस स्टडी के बारे में सोचें - क्या आपने अपने जीवन में इसी तरह की नैतिक चुनौती का अनुभव किया है, या इससे प्रभावित हुए हैं ? क्या आप कम से कम एक अन्य केस स्टडी के बारे में सोच सकते हैं जो इस खंड में चर्चा की गई नैतिक चुनौतियों में से एक को दर्शाती है ?
+
+## एप्लाइड नैतिकता
+
+हमने वास्तविक दुनिया के संदर्भों में नैतिक अवधारणाओं, चुनौतियों और केस स्टडी के बारे में बात की है । लेकिन हम अपनी परियोजनाओं में नैतिक सिद्धांतों और प्रथाओं को _लागू करना_ कैसे शुरू करते हैं ? और बेहतर शासन के लिए इन प्रथाओं का _संचालनीकरण_ कैसे करते हैं ? आइए कुछ वास्तविक दुनिया के समाधान देखें :
+
+### 1. व्यावसायिक कोड
+
+व्यावसायिक कोड संगठनों के लिए सदस्यों को उनके नैतिक सिद्धांतों और मिशन वक्तव्य का समर्थन करने के लिए "प्रोत्साहित" करने के लिए एक विकल्प प्रदान करते हैं । पेशेवर व्यवहार के लिए कोड _नैतिक दिशानिर्देश_ हैं, जो कर्मचारियों या सदस्यों को उनके संगठन के सिद्धांतों के अनुरूप निर्णय लेने में मदद करते हैं । वे केवल उतने ही अच्छे हैं जितने सदस्यों से स्वैच्छिक अनुपालन; हालांकि, कई संगठन सदस्यों से अनुपालन को प्रेरित करने के लिए अतिरिक्त पुरस्कार और दंड प्रदान करते हैं ।
+
+उदाहरणों में शामिल :
+
+ * [ऑक्सफोर्ड म्यूनिख](http://www.code-of-ethics.org/code-of-conduct/) आचार संहिता
+ * [डेटा साइंस एसोसिएशन](http://datascienceassn.org/code-of-conduct.html) आचार संहिता (2013 में बनाया गया)
+ * [एसीएम आचार संहिता और व्यावसायिक आचरण](https://www.acm.org/code-of-ethics) (1993 से)
+
+> 🚨 क्या आप एक पेशेवर इंजीनियरिंग या डेटा विज्ञान संगठन से संबंधित हैं ? यह देखने के लिए कि क्या वे पेशेवर आचार संहिता को परिभाषित करते हैं, उनकी साइट का अन्वेषण करें । यह उनके नैतिक सिद्धांतों के बारे में क्या कहता है ? वे सदस्यों को कोड का पालन करने के लिए "प्रोत्साहित" कैसे कर रहे हैं ?
+
+### 2. नैतिकता चेकलिस्ट
+
+जबकि पेशेवर कोड चिकित्सकों से अपेक्षित _नैतिक व्यवहार_ को परिभाषित करते हैं, प्रवर्तन में उनकी [ज्ञात सीमाएं हैं](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md), विशेष रूप से बड़े पैमाने की परियोजनाओं में । इसके बजाय, कई डेटा विज्ञान विशेषज्ञ [चेकलिस्ट की वकालत करते हैं](https://resources.oreilly.com/examples/0636920203964/blob/master/of_oaths_and_checklists.md), जो **सिद्धांतों को अभ्यासों से** अधिक नियतात्मक और कार्रवाई योग्य तरीकों से जोड़ सकती हैं ।
+
+चेकलिस्ट प्रश्नों को "हां/नहीं" कार्यों में परिवर्तित करते हैं जिन्हें संचालित किया जा सकता है, जिससे उन्हें मानक उत्पाद रिलीज वर्कफ़्लो के हिस्से के रूप में ट्रैक किया जा सकता है ।
+
+उदाहरणों में शामिल :
 * [Deon](https://deon.drivendata.org/) - आसान एकीकरण के लिए कमांड-लाइन टूल के साथ [उद्योग अनुशंसाओं](https://deon.drivendata.org/#checklist-citations) से बनाई गई एक सामान्य-उद्देश्य डेटा नैतिकता चेकलिस्ट ।
+ * [Privacy Audit Checklist](https://cyber.harvard.edu/ecommerce/privacyaudit.html) - कानूनी और सामाजिक जोखिम के दृष्टिकोण से सूचना प्रबंधन प्रथाओं के लिए सामान्य मार्गदर्शन प्रदान करता है ।
+ * [AI Fairness Checklist](https://www.microsoft.com/en-us/research/project/ai-fairness-checklist/) - एआई विकास चक्रों में निष्पक्षता जांच को अपनाने और एकीकरण का समर्थन करने के लिए एआई चिकित्सकों द्वारा बनाया गया ।
 * [22 questions for ethics in data and AI](https://medium.com/the-organization/22-questions-for-ethics-in-data-and-ai-efb68fd19429) - डिजाइन, कार्यान्वयन और संगठनात्मक संदर्भों में नैतिक मुद्दों की प्रारंभिक खोज के लिए अधिक खुला, संरचित ढांचा ।
+
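चेकलिस्ट प्रश्नों को "हां/नहीं" कार्यों में बदलने का विचार कोड में भी उतारा जा सकता है । नीचे एक काल्पनिक, न्यूनतम चेकलिस्ट-ट्रैकर का स्केच है (असली प्रोजेक्ट में Deon जैसे समर्पित टूल का उपयोग बेहतर है) :

```python
# काल्पनिक चेकलिस्ट: हर प्रविष्टि एक "हां/नहीं" जाँच है
checklist = {
    "सूचित सहमति प्राप्त है": True,
    "डेटासेट पूर्वाग्रह के लिए परखा गया": True,
    "डेटा मिटाने का अनुरोध संभव है": False,
}


def pending_items(items):
    """वे जाँचें लौटाता है जो अभी 'नहीं' हैं, ताकि रिलीज़ से पहले उन्हें ट्रैक किया जा सके ।"""
    return [name for name, done in items.items() if not done]


todo = pending_items(checklist)  # रिलीज़ वर्कफ़्लो में इस सूची को खाली होना चाहिए
```

ऐसी सूची को CI या रिलीज़-गेट में जोड़कर नैतिकता जाँचों को मानक वर्कफ़्लो का हिस्सा बनाया जा सकता है ।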
+### 3. नैतिकता विनियम
+
+नैतिकता साझा मूल्यों को परिभाषित करने और _स्वेच्छा_ से सही काम करने के बारे में है । **अनुपालन** _कानून का पालन करने के बारे में है_ यदि और जहां परिभाषित किया गया है । **शासन** मोटे तौर पर उन सभी तरीकों को शामिल करता है जिनमें संगठन नैतिक सिद्धांतों को लागू करने और स्थापित कानूनों का पालन करने के लिए काम करते हैं ।
+
+आज, संगठनों के भीतर शासन दो रूप लेता है । सबसे पहले, यह **नैतिक एआई** सिद्धांतों को परिभाषित करने और संगठन में सभी एआई-संबंधित परियोजनाओं में गोद लेने के संचालन के लिए प्रथाओं को स्थापित करने के बारे में है । दूसरा, यह उन क्षेत्रों के लिए सरकार द्वारा अनिवार्य सभी **डेटा सुरक्षा नियमों** का अनुपालन करने के बारे में है जहां यह संचालित होता है ।
+
+डेटा सुरक्षा और गोपनीयता नियमों के उदाहरण :
+
+ * `१९७४`, [US Privacy Act](https://www.justice.gov/opcl/privacy-act-1974) - व्यक्तिगत जानकारी के संग्रह, उपयोग और प्रकटीकरण को नियंत्रित करता है ।
+ * `१९९६`, [US Health Insurance Portability & Accountability Act (HIPAA)](https://www.cdc.gov/phlp/publications/topic/hipaa.html) - व्यक्तिगत स्वास्थ्य डेटा की सुरक्षा करता है ।
+ * `१९९८`, [US Children's Online Privacy Protection Act (COPPA)](https://www.ftc.gov/enforcement/rules/rulemaking-regulatory-reform-proceedings/childrens-online-privacy-protection-rule) - 13 साल से कम उम्र के बच्चों की डेटा गोपनीयता की रक्षा करता है ।
+ * `२०१८`, [General Data Protection Regulation (GDPR)](https://gdpr-info.eu/) - उपयोगकर्ता अधिकार, डेटा सुरक्षा और गोपनीयता प्रदान करता है ।
+ * `२०१८`, [California Consumer Privacy Act (CCPA)](https://www.oag.ca.gov/privacy/ccpa) उपभोक्ताओं को उनके (व्यक्तिगत) डेटा पर अधिक _अधिकार_ देता है ।
+ * `२०२१`, चीन का [Personal Information Protection Law](https://www.reuters.com/world/china/china-passes-new-personal-data-privacy-law-take-effect-nov-1-2021-08-20/) अभी-अभी पारित हुआ, दुनिया भर में सबसे मजबूत ऑनलाइन डेटा गोपनीयता नियमों में से एक बना ।
+
> 🚨 यूरोपीय संघ द्वारा परिभाषित GDPR (जनरल डेटा प्रोटेक्शन रेगुलेशन) आज सबसे प्रभावशाली डेटा गोपनीयता नियमों में से एक है । क्या आप जानते हैं कि यह नागरिकों की डिजिटल गोपनीयता और व्यक्तिगत डेटा की सुरक्षा के लिए [8 उपयोगकर्ता अधिकार](https://www.freeprivacypolicy.com/blog/8-user-rights-gdpr) भी परिभाषित करता है ? जानें कि ये क्या हैं, और क्यों मायने रखते हैं ।
+
+
+### 4. नैतिकता संस्कृति
+
+ध्यान दें कि _अनुपालन_ ("कानून के अक्षरशः पालन के लिए पर्याप्त प्रयास करना") और उन [प्रणालीगत मुद्दों](https://www.coursera.org/learn/data-science-ethics/home/week/4) (जैसे ossification, सूचना विषमता और वितरण संबंधी अनुचितता) को संबोधित करने के बीच एक महत्वपूर्ण अंतर है, जो AI के शस्त्रीकरण को गति दे सकते हैं ।
+
+बाद वाले के लिए [नैतिक संस्कृतियों को परिभाषित करने के सहयोगात्मक दृष्टिकोण](https://towardsdatascience.com/why-ai-ethics-requires-a-culture-drive-approach-26f451afa29f) की आवश्यकता होती है, जो पूरे संगठनों में भावनात्मक संबंध और सुसंगत साझा मूल्यों का निर्माण करते हैं । इसके लिए संगठनों में अधिक [औपचारिक डेटा नैतिकता संस्कृतियों](https://www.codeforamerica.org/news/formalizing-an-ethical-data-culture/) की जरूरत है - जो _किसी को भी_ [एंडोन कॉर्ड खींचने](https://en.wikipedia.org/wiki/Andon_(manufacturing)) (यानी प्रक्रिया में नैतिकता संबंधी चिंताओं को जल्दी उठाने) की अनुमति दें और एआई परियोजनाओं में टीम गठन (उदाहरण के लिए, भर्ती) के लिए _नैतिक मूल्यांकन_ को एक मुख्य मानदंड बनाएं ।
+
+---
+## [व्याख्यान के बाद प्रश्नोत्तरी](https://red-water-0103e7a0f.azurestaticapps.net/quiz/3) 🎯
+## समीक्षा और स्व अध्ययन
+
+पाठ्यक्रम और पुस्तकें मूल नैतिकता अवधारणाओं और चुनौतियों को समझने में मदद करती हैं, जबकि केस स्टडी और उपकरण वास्तविक दुनिया के संदर्भों में लागू नैतिकता प्रथाओं के साथ मदद करते हैं। शुरू करने के लिए यहां कुछ संसाधन दिए गए हैं।
+
+* [Machine Learning For Beginners](https://github.com/microsoft/ML-For-Beginners/blob/main/1-Introduction/3-fairness/README.md) - Microsoft से निष्पक्षता पर पाठ ।
+* [Principles of Responsible AI](https://docs.microsoft.com/en-us/learn/modules/responsible-ai-principles/) - माइक्रोसॉफ्ट लर्न की ओर से फ्री लर्निंग पाथ ।
+* [Ethics and Data Science](https://resources.oreilly.com/examples/0636920203964) - O'Reilly EBook (M. Loukides, H. Mason et. al)
+* [Data Science Ethics](https://www.coursera.org/learn/data-science-ethics#syllabus) - मिशिगन विश्वविद्यालय से ऑनलाइन पाठ्यक्रम ।
+* [Ethics Unwrapped](https://ethicsunwrapped.utexas.edu/case-studies) - टेक्सास विश्वविद्यालय से केस स्टडीज ।
+
+# कार्यभार
+
+[डेटा एथिक्स केस स्टडी लिखें](assignment.md)
From b3a86d9ffaeb14189336b3593d45c6252dc2f257 Mon Sep 17 00:00:00 2001
From: Heril Changwal <76246330+Heril18@users.noreply.github.com>
Date: Tue, 12 Oct 2021 10:01:07 +0530
Subject: [PATCH 100/319] Update README.hi.md
All Links Fixed.
The link : [Microsoft Word - Persuasive Instructions.doc (tpsnva.org)](https://www.tpsnva.org/teach/lq/016/persinstr.pdf) is not working even in the original Readme file.
So please have a look to this Link.
---
.../16-communication/translations/README.hi.md | 11 +++++------
1 file changed, 5 insertions(+), 6 deletions(-)
diff --git a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
index 85ae92d4..681d2923 100644
--- a/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
+++ b/4-Data-Science-Lifecycle/16-communication/translations/README.hi.md
@@ -169,10 +169,9 @@
## [व्याख्यान के बाद प्रश्नोत्तरी](https://red-water-0103e7a0f.azurestaticapps.net/quiz/31)
### स्व अध्ययन के लिए अनुशंसित संसाधन
-[द फाइव सी ऑफ़ स्टोरीटेलिंग - आर्टिक्यूलेट पर्सुएशन](http://articlepersuasion.com/the-five-cs-of-storytelling/)
-
-[१.४ एक संचारक के रूप में आपकी जिम्मेदारियां - सफलता के लिए व्यावसायिक संचार (umn.edu)](https://open.lib.umn.edu/businesscommunication/chapter/1-4-your-responsibility-as-a-communicator/)
+[द फाइव सी ऑफ़ स्टोरीटेलिंग - आर्टिक्यूलेट पर्सुएशन](http://articulatepersuasion.com/the-five-cs-of-storytelling/)
+[१.४ एक संचारक के रूप में आपकी जिम्मेदारियां - सफलता के लिए व्यावसायिक संचार (umn.edu)](https://open.lib.umn.edu/businesscommunication/chapter/1-4-your-responsibilities-as-a-communicator/)
[डेटा के साथ कहानी कैसे सुनाएं (hbr.org)](https://hbr.org/2013/04/how-to-tell-a-story-with-data)
[टू-वे कम्युनिकेशन: अधिक व्यस्त कार्यस्थल के लिए 4 टिप्स (yourthoughtpartner.com)](https://www.yourthoughtpartner.com/blog/bid/59576/4-steps-to-increase-employee-engagement-through-two-way-communication)
@@ -181,7 +180,7 @@
[डेटा के साथ कहानी कैसे सुनाएं | ल्यूसिडचार्ट ब्लॉग](https://www.lucidchart.com/blog/how-to-tell-a-story-with-data)
-[6 Cs ऑफ़ इफेक्टिव स्टोरीटेलिंग ऑन सोशल मीडिया | कूलर इनसाइट्स](https://coolerinsights.com/2018/06/efffect-storytelling-social-media/)
+[6 Cs ऑफ़ इफेक्टिव स्टोरीटेलिंग ऑन सोशल मीडिया | कूलर इनसाइट्स](https://coolerinsights.com/2018/06/effective-storytelling-social-media/)
[प्रस्तुतिकरण में भावनाओं का महत्व | Ethos3 - एक प्रस्तुति प्रशिक्षण और डिजाइन एजेंसी](https://ethos3.com/2015/02/the-importance-of-emotions-in-presentations/)
@@ -193,7 +192,7 @@
[डेटा कैसे प्रस्तुत करें [१० विशेषज्ञ युक्तियाँ] | ऑब्जर्वप्वाइंट](https://resources.observepoint.com/blog/10-tips-for-presenting-data)
-[माइक्रोसॉफ्ट वर्ड - प्रेरक निर्देश.doc (tpsnva.org)](https://www.tpsnva.org/teach/lq/016/persinstr.pdf)
+[Microsoft Word - Persuasive Instructions.doc (tpsnva.org)](https://www.tpsnva.org/teach/lq/016/persinstr.pdf)
[द पावर ऑफ स्टोरी फॉर योर डेटा (थिंकहडी.कॉम)](https://www.thinkhdi.com/library/supportworld/2019/power-story-your-data.aspx)
@@ -209,4 +208,4 @@
## कार्यभार
-[एक कहानी बताओ](assignment.md)
+[एक कहानी बताओ](../assignment.md)
From f111820331016bd613d680c365f61c9b0fdc8401 Mon Sep 17 00:00:00 2001
From: Kaushal Joshi <53049546+joshi-kaushal@users.noreply.github.com>
Date: Tue, 12 Oct 2021 18:24:32 +0530
Subject: [PATCH 101/319] Updared README.hi.md - 3
---
.../translations/README.hi.md | 61 ++++++++++++++++---
1 file changed, 54 insertions(+), 7 deletions(-)
diff --git a/1-Introduction/03-defining-data/translations/README.hi.md b/1-Introduction/03-defining-data/translations/README.hi.md
index 49c16833..e4f41811 100644
--- a/1-Introduction/03-defining-data/translations/README.hi.md
+++ b/1-Introduction/03-defining-data/translations/README.hi.md
@@ -1,22 +1,69 @@
# डेटा का अवलोकन
-| ](../../../sketchnotes/03-DefiningData.png)|
+|![डेटा का अवलोकन](../../../sketchnotes/03-DefiningData.png)|
|:---:|
-|Defining Data - _Sketchnote by [@nitya](https://twitter।com/nitya)_ |
+|डेटा का अवलोकन - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
डेटा मतलब तथ्य, माहिती और अनुभव है जिनका इस्तमाल करके नए खोज और सूचित निर्णयोंका समर्थन किया जाता है।
-डेटा पॉइंट यह डेटासेट का सबसे छोटा प्रमाण है। डेटासेट यह एक डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बोहोत सारे अलगअलग प्रकार और स्ट्रक्चर का होता है, और बोहोत बार किसी स्त्रोतपे आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट मैं सेव्ह की हो सकती है मगर प्रति घंटे के दिल की धकड़न की गति [JSON](https://stackoverflow।com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप मैं हो सकती है। डेटा वैज्ञानिकोकेलिए अलग अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
+डेटा पॉइंट यह डेटासेट का सबसे छोटा प्रमाण है। डेटासेट यह एक डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बोहोत सारे अलगअलग प्रकार और संरचनाका होता है, और बोहोत बार किसी स्त्रोतपे आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट मैं जतन की हो सकती है मगर प्रति घंटे के दिल की धकड़न की गति [JSON](https://stackoverflow।com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप मैं हो सकती है। डेटा वैज्ञानिकोकेलिए अलग अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
यह पाठ डेटा को उसके स्त्रोत के हिसाब से पहचानने और वर्गीकृत करने पे केंद्रित है।
## [पाठ के पाहिले की परीक्षा](https://red-water-0103e7a0f.azurestaticapps.net/quiz/4)
## डेटा का वर्णन कैसे किया जाता है
-**अपक्व डेटा** ऐसे प्रकार का डेटा होता जो उसके स्त्रोत से आते वक्त जिस अवस्था मैं था वैसे ही है और उसका विश्लेषण या वर्गीकरण नहीं किया गया है। ऐसे डेटासेट से जरूरी जानकारी निकलने के लिए उसे ऐसे प्रकार मे लाना आवश्यक है जो इंसान समज सके और जिस टैकनोलजीका उपयोग डेटा के विश्लेषण मे किया जाएगा उसको भी समज आये। डेटाबेस का स्ट्रक्चर हमे बताती है की डेटा किस प्रकार से वर्गीकृत किया गया है और उसका वर्गीकरण कैसे किया जाता है। डेटा का वर्गीकरण संरचित, मिश्र संरचित और असंरचित प्रकार मै किया जा सकता है। संरचना के प्रकार डेटा के स्त्रोत के अनुसार बदल सकते है मगर आखिर मै इन तीनो मैं से एक प्रकार के हो सकते है।
+**अपक्व डेटा** ऐसे प्रकार का डेटा होता जो उसके स्त्रोत से आते वक्त जिस अवस्था मैं था वैसे ही है और उसका विश्लेषण या वर्गीकरण नहीं किया गया है। ऐसे डेटासेट से जरूरी जानकारी निकलने के लिए उसे ऐसे प्रकार मे लाना आवश्यक है जो इंसान समज सके और जिस टैकनोलजीका उपयोग डेटा के विश्लेषण मे किया जाएगा उसको भी समज आये। डेटाबेसकी संरचना हमे बताती है की डेटा किस प्रकार से वर्गीकृत किया गया है और उसका संरचित, मिश्र संरचित और असंरचित प्रकार मै वर्गीकरण कैसे किया जाता है। संरचना के प्रकार डेटा के स्त्रोत के अनुसार बदल सकते है मगर आखिर मै इन तीनो मैं से एक प्रकार के हो सकते है।
### परिमाणात्मक डेटा
-परिमाणात्मक डेटा मतलब डेटासेट मे उपलब्ध होने वाला ऐसा संख्यात्मक डेटा जिसका इस्तमाल विश्लेषण,मापन और गणितीय चीजोंकेलिए हो सकता है। परिमाणात्मक डेटा के यह कुछ उदाहरण है: देश की जनसंख्या, इंसान की ऊंचाई या कंपनी की तिमाही कमाई। थोड़े अधिक विश्लेषण के बाद परिणामात्मक डेटा से मौसम के अनुसार वायु गुणवत्ता सूचकांक(Air Quality Index) के बदलाव पता करना या फिर किसी सामान्य कार्यदिवस पर भीड़भाड़ वाले घंटे के ट्रैफिक की संभावना का अनुमान लगना मुमकिन है
+परिमाणात्मक डेटा मतलब डेटासेट मे उपलब्ध होने वाला ऐसा संख्यात्मक डेटा जिसका इस्तमाल विश्लेषण, मापन और गणितीय चीजोंकेलिए हो सकता है। परिमाणात्मक डेटा के यह कुछ उदाहरण है: देश की जनसंख्या, इंसान की ऊंचाई या कंपनी की तिमाही कमाई। थोड़े अधिक विश्लेषण के बाद परिणामात्मक डेटा से मौसम के अनुसार वायु गुणवत्ता सूचकांक (Air Quality Index) के बदलाव पता करना या फिर किसी सामान्य कार्यदिवस पर भीड़भाड़ वाले घंटे के ट्रैफिक की संभावना का अनुमान लगाना मुमकिन है।
### गुणात्मक डेटा
-गुणात्मक डेटा, जिसे वर्गीकृत देता भी कहा जाता है, यह एक डेटा का ऐसा प्रकार है जिसे परिमाणात्मक डेटा की तरह वस्तुनिष्ठ तरहसे मापा नहीं जा सकता। यह आम तौर पर अलग अलग प्रकार का आत्मनिष्ठ डेटा होता है जिस से किसी उत्पादन या प्रक्रिया की गुणवत्ता। कभी कभार गुणात्मक डेटा सांखिक स्वरुपमैं होके भी गणितीय कारणों के लिए इस्तमल नहीं किया जा सकता, जैसे की फोन नंबर या समय।
-गुणात्मक डेटा के यह कुछ उदाहरण हो सकते है: विडिओकी टिप्पणियाँ, आपके करीबी दोस्त के गाड़ी के पसंदिता रंग का नमूना बनाना। गुणात्मक डेटा का इस्तमाल करके ग्राहकोंको कोनसा उत्पादन सबसे ज्यादा पसंद आ रहा है या फिर नौकरी आवेदन के रिज्यूमे मैं सबसे ज्यादा इस्तमाल होने वाले शब्द ढूंढ़ना।
+गुणात्मक डेटा, जिसे वर्गीकृत डेटा भी कहा जाता है, यह एक डेटा का ऐसा प्रकार है जिसे परिमाणात्मक डेटा की तरह वस्तुनिष्ठ तरहसे नापा नहीं जा सकता। यह आम तौर पर अलग अलग प्रकार का आत्मनिष्ठ डेटा होता है जिस से किसी उत्पादन या प्रक्रिया की गुणवत्ता। कभी कभार गुणात्मक डेटा सांखिक स्वरुपमैं होके भी गणितीय कारणों के लिए इस्तमल नहीं किया जा सकता, जैसे की फोन नंबर या समय। गुणात्मक डेटा के यह कुछ उदाहरण हो सकते है: विडिओकी टिप्पणियाँ, आपके करीबी दोस्त के गाड़ी के पसंदिता रंग का नमूना बनाना। गुणात्मक डेटा का इस्तमाल करके ग्राहकोंको कोनसा उत्पादन सबसे ज्यादा पसंद आ रहा है या फिर नौकरी आवेदन के रिज्यूमे मैं सबसे ज्यादा इस्तमाल होने वाले शब्द ढूंढ़ना।
+
+### संरचित डेटा
+संरचित डेटा वह डेटा है जो पंक्तियों और स्तंभों में संगठित होता है, जिसमें हर पंक्ति के समान स्तंभ होते है। हर स्तंभ एक विशिष्ट प्रकार के मूल्य को बताता है और उस मूल्य को दर्शाने वाले नाम के साथ जाना जाता है, जबकि पंक्तियों में वास्तविक मूल्य होते है। हर मूल्य सही स्तम्भ का प्रतिनिधित्व करता है की नहीं ये निश्चित करने के लिए स्तंभ में अक्सर मूल्यों पर नियमों का प्रतिबन्ध लगा रहता है। उदाहरणार्थ, कल्पना कीजिये ग्राहकों की जानकारी वाली एक स्प्रेडशीट फ़ाइल, जिसकी हर पंक्ति में फोन क्रमांक होना जरुरी है और फोन क्रमांक में कभी भी वर्ण नहीं रहते। तो फिर फोन क्रमांक के स्तंभ पर ऐसा नियम लगा होना चाहिए जिससे ये निश्चित हो की वह कभी भी खाली नहीं है और उसमें सिर्फ अंक ही है।
+
+सरंचित डेटा का यह फायदा है की उसे स्तंभ और पंक्तियों में वर्गीकृत किया जा सकता है। तथापि, डेटा एक विशिष्ट संरचना में पहले से आयोजित होने की वजह से पूरी संरचना में बदलाव करना बोहोत मुश्किल काम होता है। जैसे की ग्राहकों की जानकारी वाली स्प्रेडशीट फ़ाइल में अगर हमें ईमेल आयडी का खाली न रहने वाला नया स्तंभ जोड़ना हो तो हमें ये पता करना होगा की पहले से जो मूल्य इस डेटासेट में है उनका क्या होगा।
+
+संरचित डेटा के यह कुछ उदाहरण है: स्प्रेडशीट, रिलेशनल डेटाबेस, फोन नंबर, बैंक स्टेटमेंट
+
+### असंरचित डेटा
+असंरचित डेटा आम तौर पर स्तंभ और पंक्तियों में वर्गीकृत नहीं किया जा सकता और किसी नियमों से बंधा भी नहीं रहता। संरचित डेटा की तुलना में असंरचित डेटा में कम नियम होने के कारण उसमें नया डेटा डालना बोहोत आसान होता है। अगर कोई सेंसर जो हर दो मिनिट बाद वायुदाब दर्ज करता है, उसे तापमान भी मापने और दर्ज करने की सुविधा मिल जाए, तो डेटा असंरचित होने के कारण डेटाबेस में पहले से मौजूद डेटा को बदलने की आवश्यकता नहीं है। तथापि, ऐसे डेटा का विश्लेषण और जांच करने में ज्यादा समय लग सकता है।
+जैसे की, एक वैज्ञानिक जिसे सेंसर के डेटा से पिछले महीने के तापमान का औसत ढूंढ़ना हो, वह देखता है की सेंसर ने कुछ जगह अधूरे डेटा को दर्शाने के लिए आम क्रमांक के बजाय 'e' दर्ज किया है, जिसका मतलब है की डेटा अधूरा है।
+असंरचित डेटा के उदाहरण: टेक्स्ट फ़ाइलें, टेक्स्ट मेसेजेस, विडिओ फ़ाइलें।
+
+### मिश्र संरचित डेटा
+
+मिश्र संरचित डेटा के ऐसे कुछ गुण है जिनकी वजह से उसे संरचित और असंरचित डेटा का मिश्रण कहा जा सकता है। वह हमेशा स्तंभ और पंक्तियों के अनुरूप नहीं रहता मगर ऐसी तरह संयोजित किया गया होता है की उसे संरचित कहा जा सकता है और शायद किसी ठराविक नियमों का पालन भी करता है। डेटा की संरचना उसके स्त्रोत के ऊपर निर्भर होती है, जैसे की स्पष्ट अनुक्रम या फिर थोड़ा लचीला ढांचा जिसमें नया डेटा जोड़ना आसान हो। मेटाडेटा ऐसे संकेतक होते है जिनसे डेटा का संयोजन और संग्रहण करने में सहायता होती है, और उन्हें डेटा के प्रकार के अनुरूप नाम भी दिए जा सकते है। मेटाडेटा के आम उदाहरण है: टैग्स, एलिमेंट्स, एंटिटीज और एट्रीब्यूट्स। उदाहरणार्थ: एक सामान्य ईमेल में विषय, मुख्य भाग और प्राप्तकर्ताओं की सूची होती है और उसे किसने कब भेजा इसके हिसाब से संयोजित किया जा सकता है।
+
+मिश्र संरचित डेटा के उदाहरण: एचटीएमएल, सीइसव्ही फाइलें, जेसन(JSON)
+
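संरचित (CSV) और मिश्र संरचित (JSON) डेटा का फर्क Python की मानक लाइब्रेरी से आसानी से देखा जा सकता है । नीचे का डेटा पूरी तरह काल्पनिक है :

```python
import csv
import io
import json

# संरचित डेटा: CSV की हर पंक्ति में वही स्तंभ (name, phone) होते हैं
csv_text = "name,phone\nAsha,9876543210\nRavi,9123456780\n"
rows = list(csv.DictReader(io.StringIO(csv_text)))

# मिश्र संरचित डेटा: JSON में हर रिकॉर्ड के फ़ील्ड अलग-अलग हो सकते हैं
json_text = '[{"name": "Asha", "phone": "9876543210"}, {"name": "Ravi", "tags": ["friend"]}]'
records = json.loads(json_text)
# दूसरे रिकॉर्ड में phone नहीं है और एक नया फ़ील्ड tags है - JSON इसे बिना किसी स्कीमा-बदलाव के स्वीकार करता है
```

यही लचीलापन मिश्र संरचित डेटा में नया डेटा जोड़ना आसान बनाता है, पर विश्लेषण के समय हर रिकॉर्ड की जाँच जरूरी कर देता है ।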
+## डेटा के स्त्रोत
+
+डेटा का स्त्रोत मतलब वो जगह जहाँ डेटा सबसे पहली बार निर्माण हुआ था, और यह हमेशा कहाँ और कब जमा किया गया इस पर आधारित रहेगा। उपयोगकर्ता द्वारा निर्माण किये हुए डेटा को प्राथमिक डेटा कहा जाता है जबकि गौण डेटा ऐसे स्त्रोत से आता है जिसने सामान्य काम के लिए डेटा जमा किया था। उदाहरण के लिए, वैज्ञानिकों का समूह वर्षावन में टिप्पणियाँ और सूची जमा कर रहा है तो वह प्राथमिक डेटा होगा और अगर उन्होंने वह डेटा बाकी वैज्ञानिकों के साथ बाँटा तो वह गौण डेटा कहलाएगा।
+
+डेटाबेस यह एक सामान्य स्त्रोत है और वह होस्टिंग और डेटाबेस मेंटेनन्स सिस्टिम निर्भर होता है। डेटाबेस मेंटेनन्स सिस्टिममे उपयोगकर्ता कमांड्स जिन्हें क्वेरीज़ कहा जाता है इस्तमाल करके डेटाबेस का डेटा खोज सकते है. डेटा स्त्रोत फ़ाइल स्वरुप मे हो तो आवाज, चित्र, वीडियो, स्प्रेडशीट ऐसे प्रकार मे हो सकता है। आंतरजाल के स्त्रोत डेटा होस्ट करने के बोहोत आम तरीका है। यहां डेटाबेस तथा फाइलें खोजी जा सकती है। अप्लीकेशन प्रोगरामिंग इंटरफेस, जिन्हे 'एपीआय'(API) के नाम से जाना जाता है, उसकी मद्त से प्रोग्रामर्स डेटाको बहार के उपयोगकर्ताओको आंतरजाल द्वारा इस्तमाल करने के लिए भेज सकते है. जबकि वेब स्क्रैपिंग नामक प्रक्रियासे आंतरजाल के वेब पेज का डेटा अलग किया जा सकता है. [डेटा के साथ काम करना](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data) यह पथ अलग अलग डेटा का इस्तमाल करनेपर ध्यान देता है..
+
+## निष्कर्ष
+यह पथ मे हमने पढ़ा की:
+- डेटा क्या होता है
+- डेटा का वर्णन कैसे किया जाता है
+- डेटा का वर्गीकरण कैसे किया जाता है
+- डेटा कहा मिलता है
+
+## 🚀 चुनौती
+Kaggle मुफ्त डेटासेट का बोहोत अच्छा स्त्रोत है। [सर्च टूल](https://www.kaggle.com/datasets) का इस्तमाल करके कुछ मजेदार डेटासेट ढूंढे और उनमें से तीन-चार डेटासेट का ऐसे वर्गीकरण कीजिए:
+- डेटा परिमाणात्मक है या गुणात्मक?
+- डेटा संरचित, असंरचित या फिर मिश्र संरचित है?
+
+## [पाठ के बाद वाली परीक्षा](https://red-water-0103e7a0f.azurestaticapps.net/quiz/5)
+
+## समीक्षा और स्वअध्ययन
+- माइक्रोसॉफ्ट लर्न का [Classify your data](https://docs.microsoft.com/en-us/learn/modules/choose-storage-approach-in-azure/2-classify-data) पाठ संरचित, असंरचित और मिश्र संरचित डेटा के बारे मे और अच्छेसे बताता है।
+
+## अभ्यास
+[डेटा का वर्गीकरण](../assignment.md)
From 9f8b29749a2320db81c215116e837d41a6905030 Mon Sep 17 00:00:00 2001
From: Kaushal Joshi <53049546+joshi-kaushal@users.noreply.github.com>
Date: Tue, 12 Oct 2021 18:27:45 +0530
Subject: [PATCH 102/319] Updated README.hi.md - 4
---
1-Introduction/03-defining-data/translations/README.hi.md | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/1-Introduction/03-defining-data/translations/README.hi.md b/1-Introduction/03-defining-data/translations/README.hi.md
index e4f41811..34f3f0e8 100644
--- a/1-Introduction/03-defining-data/translations/README.hi.md
+++ b/1-Introduction/03-defining-data/translations/README.hi.md
@@ -5,7 +5,7 @@
डेटा मतलब तथ्य, माहिती और अनुभव है जिनका इस्तमाल करके नए खोज और सूचित निर्णयोंका समर्थन किया जाता है।
-डेटा पॉइंट यह डेटासेट का सबसे छोटा प्रमाण है। डेटासेट यह एक डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बोहोत सारे अलगअलग प्रकार और संरचनाका होता है, और बोहोत बार किसी स्त्रोतपे आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट मैं जतन की हो सकती है मगर प्रति घंटे के दिल की धकड़न की गति [JSON](https://stackoverflow।com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप मैं हो सकती है। डेटा वैज्ञानिकोकेलिए अलग अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
+डेटा पॉइंट यह डेटासेट का सबसे छोटा प्रमाण है। डेटासेट यह एक डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बोहोत सारे अलगअलग प्रकार और संरचनाका होता है, और बोहोत बार किसी स्त्रोतपे आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट मैं जतन की हो सकती है मगर प्रति घंटे के दिल की धकड़न की गति [JSON](https://stackoverflow.com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप मैं हो सकती है। डेटा वैज्ञानिकोकेलिए अलग अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
यह पाठ डेटा को उसके स्त्रोत के हिसाब से पहचानने और वर्गीकृत करने पे केंद्रित है।
@@ -43,11 +43,9 @@
डेटा का स्त्रोत मतलब मतलब वो जगह जहाँ डेटा सबसे पहिली बार निर्माण हुवा था, और हमेशा कहा और कब जमा किया था इसके ऊपर आधारित रहेगा। उपयोगकर्ताके द्वारा निर्माण किये हुवे डेटा को प्रार्थमिक डेटा के नाम से पहचाना जाता है जबकि गौण डेटा ऐसे स्त्रोत से आता है जिसने सामान्य काम के लिए डेटा जमा किया था। उदाहरण के लिए, वैज्ञानिकों का समूह वर्षावनमे टिप्पणियों और सूचि कमा कर रहे है तो वोप्रार्थमिक डेटा होगा और अगर उन्होंने उस डेटा को बाकि के वैज्ञनिकोके साथ बाँटना चाहा तो वो गौण डेटा कहलाया जायेगा।
## डेटा के स्त्रोत
-
डेटा का स्त्रोत मतलब मतलब वो जगह जहाँ डेटा सबसे पहिली बार निर्माण हुवा था, और हमेशा कहा और कब जमा किया था इसके ऊपर आधारित रहेगा। उपयोगकर्ताके द्वारा निर्माण किये हुवे डेटा को प्रार्थमिक डेटा के नाम से पहचाना जाता है जबकि गौण डेटा ऐसे स्त्रोत से आता है जिसने सामान्य काम के लिए डेटा जमा किया था। उदाहरण के लिए, वैज्ञानिकों का समूह वर्षावनमे टिप्पणियों और सूचि कमा कर रहे है तो वोप्रार्थमिक डेटा होगा और अगर उन्होंने उस डेटा को बाकि के वैज्ञनिकोके साथ बाँटना चाहा तो वो गौण डेटा कहलाया जायेगा।
-डेटाबेस यह एक सामान्य स्त्रोत है और वह होस्टिंग और डेटाबेस मेंटेनन्स सिस्टिम निर्भर होता है। डेटाबेस मेंटेनन्स सिस्टिममे उपयोगकर्ता कमांड्स जिन्हें क्वेरीज़ कहा जाता है इस्तमाल करके डेटाबेस का डेटा खोज सकते है. डेटा स्त्रोत फ़ाइल स्वरुप मे हो तो आवाज, चित्र, वीडियो, स्प्रेडशीट ऐसे प्रकार मे हो सकता है। आंतरजाल के स्त्रोत डेटा होस्ट करने के बोहोत आम तरीका है। यहां डेटाबेस तथा फाइलें खोजी जा सकती है। अप्लीकेशन प्रोगरामिंग इंटरफेस, जिन्हे 'एपीआय'(API) के नाम से जाना जाता है, उसकी मद्त से प्रोग्रामर्स डेटाको बहार के उपयोगकर्ताओको आंतरजाल द्वारा इस्तमाल करने के लिए भेज सकते है. जबकि वेब स्क्रैपिंग नामक प्रक्रियासे आंतरजाल के वेब पेज का डेटा अलग किया जा सकता है. [डेटा के साथ काम करना](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data) यह पथ अलग अलग डेटा का इस्तमाल करनेपर ध्यान देता है..
-
+डेटाबेस यह एक सामान्य स्त्रोत है और वह होस्टिंग और डेटाबेस मेंटेनन्स सिस्टिम निर्भर होता है। डेटाबेस मेंटेनन्स सिस्टिममे उपयोगकर्ता कमांड्स जिन्हें क्वेरीज़ कहा जाता है इस्तमाल करके डेटाबेस का डेटा खोज सकते है। डेटा स्त्रोत फ़ाइल स्वरुप मे हो तो आवाज, चित्र, वीडियो, स्प्रेडशीट ऐसे प्रकार मे हो सकता है। आंतरजाल के स्त्रोत डेटा होस्ट करने के बोहोत आम तरीका है। यहां डेटाबेस तथा फाइलें खोजी जा सकती है।अप्लीकेशन प्रोगरामिंग इंटरफेस, जिन्हे 'एपीआय'(API) के नाम से जाना जाता है, उसकी मद्त से प्रोग्रामर्स डेटाको बहार के उपयोगकर्ताओको आंतरजाल द्वारा इस्तमाल करने के लिए भेज सकते है। जबकि वेब स्क्रैपिंग नामक प्रक्रियासे आंतरजाल के वेब पेज का डेटा अलग किया जा सकता है। [डेटा के साथ काम करना](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data) यह पथ अलग अलग डेटा का इस्तमाल करनेपर ध्यान देता है।
## निष्कर्ष
यह पथ मे हमने पढ़ा की:
- डेटा क्या होता है
@@ -56,7 +54,7 @@
- डेटा कहा मिलता है
## 🚀 चुनौती
-Kaggle यह के मुफ्त के डेटाबेस का बोहोत अच्छा स्त्रोत है. [सर्च टूल ](https://www.kaggle.com/datasets) का इस्तमाल करके कुछ मजेदार डेटासेट ढूंढे और उनमेसे तीन-चार डेटाबेस का ऐसे वर्गीकरण कीजिए:
+Kaggle यह के मुफ्त के डेटाबेस का बोहोत अच्छा स्त्रोत है। [सर्च टूल ](https://www.kaggle.com/datasets) का इस्तमाल करके कुछ मजेदार डेटासेट ढूंढे और उनमेसे तीन-चार डेटाबेस का ऐसे वर्गीकरण कीजिए:
- डेटा परिमाणात्मक है या गुणात्मक?
- डेटा संरचित, असंरचित या फिर मिश्र संरचित है?
From 36a21b89a39a82e4a4aa2bff2c95e7802ed0d3e7 Mon Sep 17 00:00:00 2001
From: Keshav Sharma
Date: Tue, 12 Oct 2021 06:58:20 -0700
Subject: [PATCH 103/319] added
---
.../{R => 07-python/R(Bonus Lesson)}/Notebook.ipynb | 0
1 file changed, 0 insertions(+), 0 deletions(-)
rename 2-Working-With-Data/{R => 07-python/R(Bonus Lesson)}/Notebook.ipynb (100%)
diff --git a/2-Working-With-Data/R/Notebook.ipynb b/2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb
similarity index 100%
rename from 2-Working-With-Data/R/Notebook.ipynb
rename to 2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb
From 1858ea407d55d5174a949adbcd93b9c7a0d97de8 Mon Sep 17 00:00:00 2001
From: Kaushal Joshi <53049546+joshi-kaushal@users.noreply.github.com>
Date: Wed, 13 Oct 2021 00:54:24 +0530
Subject: [PATCH 104/319] README.hi.md: Proof reading - 1
---
.../translations/README.hi.md | 28 +++++++++----------
1 file changed, 14 insertions(+), 14 deletions(-)
diff --git a/1-Introduction/03-defining-data/translations/README.hi.md b/1-Introduction/03-defining-data/translations/README.hi.md
index 34f3f0e8..e21a638d 100644
--- a/1-Introduction/03-defining-data/translations/README.hi.md
+++ b/1-Introduction/03-defining-data/translations/README.hi.md
@@ -3,34 +3,34 @@
|:---:|
|डेटा का अवलोकन - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
-डेटा मतलब तथ्य, माहिती और अनुभव है जिनका इस्तमाल करके नए खोज और सूचित निर्णयोंका समर्थन किया जाता है।
+डेटा मतलब तथ्य, ज्ञान और अनुभव है जिनका इस्तेमाल करके नई खोजों और सूचित निर्णयों का समर्थन किया जाता है।
-डेटा पॉइंट यह डेटासेट का सबसे छोटा प्रमाण है। डेटासेट यह एक डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बोहोत सारे अलगअलग प्रकार और संरचनाका होता है, और बोहोत बार किसी स्त्रोतपे आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट मैं जतन की हो सकती है मगर प्रति घंटे के दिल की धकड़न की गति [JSON](https://stackoverflow.com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप मैं हो सकती है। डेटा वैज्ञानिकोकेलिए अलग अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
+डेटा पॉइंट डेटासेट का सबसे छोटा हिस्सा है। डेटासेट डेटा पॉइंट्स का बड़ा संग्रह होता है। डेटासेट बहुत सारे अलग-अलग प्रकार और संरचना का होता है, और अक्सर किसी स्त्रोत पर आधारित होता है। उदाहरण के लिए, किसी कम्पनी की कमाई स्प्रेडशीट में सहेजी हो सकती है मगर प्रति घंटे के दिल की धड़कन की गति [JSON](https://stackoverflow.com/questions/383692/what-is-json-and-what-is-it-used-for/383699#383699) रूप में हो सकती है। डेटा वैज्ञानिकों के लिए अलग-अलग प्रकार के डेटा और डेटासेट के साथ काम करना आम बात होती है।
-यह पाठ डेटा को उसके स्त्रोत के हिसाब से पहचानने और वर्गीकृत करने पे केंद्रित है।
+यह पाठ डेटा को उसके स्त्रोत के हिसाब से पहचानने और वर्गीकृत करने पर केंद्रित है।
## [पाठ के पाहिले की परीक्षा](https://red-water-0103e7a0f.azurestaticapps.net/quiz/4)
## डेटा का वर्णन कैसे किया जाता है
-**अपक्व डेटा** ऐसे प्रकार का डेटा होता जो उसके स्त्रोत से आते वक्त जिस अवस्था मैं था वैसे ही है और उसका विश्लेषण या वर्गीकरण नहीं किया गया है। ऐसे डेटासेट से जरूरी जानकारी निकलने के लिए उसे ऐसे प्रकार मे लाना आवश्यक है जो इंसान समज सके और जिस टैकनोलजीका उपयोग डेटा के विश्लेषण मे किया जाएगा उसको भी समज आये। डेटाबेसकी संरचना हमे बताती है की डेटा किस प्रकार से वर्गीकृत किया गया है और उसका संरचित, मिश्र संरचित और असंरचित प्रकार मै वर्गीकरण कैसे किया जाता है। संरचना के प्रकार डेटा के स्त्रोत के अनुसार बदल सकते है मगर आखिर मै इन तीनो मैं से एक प्रकार के हो सकते है।
+**अपरिपक्व डेटा** ऐसे प्रकार का डेटा होता है जो उसके स्त्रोत से आते वक्त जिस अवस्था में था वैसे ही है और उसका विश्लेषण या वर्गीकरण नहीं किया गया है। ऐसे डेटासेट से जरूरी जानकारी निकालने के लिए उसे ऐसे प्रकार में लाना आवश्यक है जो इंसान समझ सके और जिस तकनीक का उपयोग डेटा के विश्लेषण में किया जाएगा उसको भी समझ आये। डेटाबेस की संरचना हमें बताती है कि डेटा किस प्रकार से वर्गीकृत किया गया है और उसका संरचित, मिश्र संरचित और असंरचित प्रकार में वर्गीकरण कैसे किया जाता है। संरचना के प्रकार डेटा के स्त्रोत के अनुसार बदल सकते हैं मगर आखिर में इन तीनों में से एक प्रकार के हो सकते हैं।
### परिमाणात्मक डेटा
-परिमाणात्मक डेटा मतलब डेटासेट मे उपलब्ध होने वाला ऐसा संख्यात्मक डेटा जिसका इस्तमाल विश्लेषण,मापन और गणितीय चीजोंकेलिए हो सकता है। परिमाणात्मक डेटा के यह कुछ उदाहरण है: देश की जनसंख्या, इंसान की ऊंचाई या कंपनी की तिमाही कमाई। थोड़े अधिक विश्लेषण के बाद परिणामात्मक डेटा से मौसम के अनुसार वायु गुणवत्ता सूचकांक(Air Quality Index) के बदलाव पता करना या फिर किसी सामान्य कार्यदिवस पर भीड़भाड़ वाले घंटे के ट्रैफिक की संभावना का अनुमान लगना मुमकिन है.
+परिमाणात्मक डेटा मतलब डेटासेट में उपलब्ध होने वाला ऐसा संख्यात्मक डेटा जिसका उपयोग विश्लेषण, मापन और गणितीय चीजों के लिए हो सकता है। परिमाणात्मक डेटा के यह कुछ उदाहरण हैं: देश की जनसंख्या, इंसान का कद या कंपनी की तिमाही कमाई। थोड़े अधिक विश्लेषण के बाद, मौसम के अनुसार वायु गुणवत्ता सूचकांक (Air Quality Index) का बदलाव पता करना या फिर किसी सामान्य कार्यदिवस पर व्यस्त समय के ट्रैफिक की संभावना का अनुमान लगाना मुमकिन है।
### गुणात्मक डेटा
-गुणात्मक डेटा, जिसे वर्गीकृत डेटा भी कहा जाता है, यह एक डेटा का ऐसा प्रकार है जिसे परिमाणात्मक डेटा की तरह वस्तुनिष्ठ तरहसे नापा नहीं जा सकता। यह आम तौर पर अलग अलग प्रकार का आत्मनिष्ठ डेटा होता है जिस से किसी उत्पादन या प्रक्रिया की गुणवत्ता। कभी कभार गुणात्मक डेटा सांखिक स्वरुपमैं होके भी गणितीय कारणों के लिए इस्तमल नहीं किया जा सकता, जैसे की फोन नंबर या समय। गुणात्मक डेटा के यह कुछ उदाहरण हो सकते है: विडिओकी टिप्पणियाँ, आपके करीबी दोस्त के गाड़ी के पसंदिता रंग का नमूना बनाना। गुणात्मक डेटा का इस्तमाल करके ग्राहकोंको कोनसा उत्पादन सबसे ज्यादा पसंद आ रहा है या फिर नौकरी आवेदन के रिज्यूमे मैं सबसे ज्यादा इस्तमाल होने वाले शब्द ढूंढ़ना।
+गुणात्मक डेटा, जिसे वर्गीकृत डेटा भी कहा जाता है, डेटा का ऐसा प्रकार है जिसे परिमाणात्मक डेटा की तरह वस्तुनिष्ठ तरीके से नापा नहीं जा सकता। यह आम तौर पर अलग-अलग प्रकार का आत्मनिष्ठ डेटा होता है, जैसे किसी उत्पाद या प्रक्रिया की गुणवत्ता। कभी-कभी गुणात्मक डेटा सांख्यिक स्वरूप में होकर भी गणितीय कारणों के लिए इस्तेमाल नहीं किया जा सकता, जैसे कि फोन नंबर या समय। गुणात्मक डेटा के कुछ उदाहरण: विडियो की टिप्पणियाँ, किसी गाड़ी का मॉडल या आपके प्रिय दोस्त का पसंदीदा रंग। गुणात्मक डेटा का इस्तेमाल करके पता लगाया जा सकता है कि ग्राहकों को कौनसा उत्पाद सबसे ज्यादा पसंद आता है या नौकरी आवेदन के रिज्यूमे में सबसे ज्यादा इस्तेमाल होने वाले शब्द ढूँढे जा सकते हैं।
### संरचित डेटा
-संरचित डेटा वह डेटा है जो पंक्तियों और स्तंभों में संगठित होता है, जिसके हर पंक्तिमे समान स्तंभ होते है. हर स्तंभ एक विशिष्ट प्रकार के मूल्य को बताता है और उस मूल्यको दर्शाने वाले अनाम के साथ जाना जाता है. जबकी पंक्तियाँ मे वास्तविक मूल्य होते है। हर मूल्य सही स्तम्भ का प्रतिनिधित्व करते है की नहीं ये निश्चित करने के लिए स्तंभमे अक्सर मूल्यों पर नियमोंका प्रतिबन्ध लगा रहता है. उदाहरणार्थ कल्पना कीजिये ग्राहकोंकी जानकारी होने वाला एक स्प्रेडशीट फ़ाइल जीके हर पंक्तिमे फोन क्रमांक होना जरुरी है और फोन क्रमांकमे कभीभी वर्ण या व्यंजन नहीं रहते। तो फिर फोन क्रमांक के स्तंभ पर ऐसा नियम लगा होना चाहिए जिससे ये निश्चित हो की बह कभीभी खाली नहीं है और उसमें सिर्फ आकड़े ही है.
+संरचित डेटा वह डेटा है जो पंक्तियों और स्तंभों में संगठित होता है, जिसकी हर पंक्ति में समान स्तंभ होते हैं। हर स्तंभ एक विशिष्ट प्रकार के मूल्य को बताता है और उस मूल्य को दर्शाने वाले नाम के साथ जाना जाता है, जबकि पंक्तियों में वास्तविक मूल्य होते हैं। हर मूल्य सही स्तंभ का प्रतिनिधित्व करता है कि नहीं यह निश्चित करने के लिए स्तंभ में अक्सर मूल्यों पर नियमों का प्रतिबन्ध लगा रहता है। उदाहरणार्थ, कल्पना कीजिये ग्राहकों की जानकारी वाली एक स्प्रेडशीट फ़ाइल जिसकी हर पंक्ति में फोन नंबर होना जरुरी है और फोन नंबर में कभी भी अक्षर नहीं रहते। तो फिर फोन नंबर के स्तंभ पर ऐसा नियम लगा होना चाहिए जिससे यह निश्चित हो कि वह कभी भी खाली नहीं रहता और उसमें सिर्फ अंक ही हैं।
-सरंचित डेटा का यह फायदा है की उसे स्तंभ और पंक्तियोंमे वर्गीकृत किया जा सकता है. तथापि, डेटा को एक विशिष्ट प्रकार मै वर्गीकृत करने के लिए आयोजित किये जाने के बजह से पुरे संरचना मे बदल करना बोहोत मुश्किल का काम होता है. जैसे की ग्राहकोंके जानकारी वाले स्प्रेडशीट फ़ाइलमे अगर हमें ईमेल आयडी खाली ना होने वाला नया स्तंभ जोड़ना हो तो हमे ये पता करना होगा की पहिलेसे जो मूल्य इस डेटासेट मे है उनका क्या होगा.
+संरचित डेटा का यह फायदा है कि उसे स्तंभ और पंक्तियों में संयोजित किया जा सकता है। तथापि, डेटा को एक विशिष्ट प्रकार में संयोजित किये जाने की वजह से पूरी संरचना में बदलाव करना बहुत मुश्किल काम होता है। जैसे कि ग्राहकों की जानकारी वाली स्प्रेडशीट फ़ाइल में अगर हमें ईमेल आयडी का खाली न रहने वाला नया स्तंभ जोड़ना हो, तो हमें यह पता करना होगा कि पहले से जो मूल्य इस डेटासेट में हैं उनका क्या होगा?
-संरचित डेटा के यह कुछ उदाहरण है: स्प्रेडशीट, रिलेशनल डेटाबेस, फोन नंबर, बैंक स्टेटमेंट
+संरचित डेटा के कुछ उदाहरण हैं: स्प्रेडशीट, रिलेशनल डेटाबेस, फोन नंबर एवं बैंक स्टेटमेंट।
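ऊपर बताया गया स्तंभ-स्तर का नियम (फोन नंबर खाली न हो और उसमें सिर्फ अंक हों) pandas से कुछ इस तरह जाँचा जा सकता है — यह केवल एक काल्पनिक उदाहरण है, जिसमें कॉलम के नाम भी काल्पनिक हैं:

```python
import pandas as pd

# काल्पनिक ग्राहक डेटा
grahak = pd.DataFrame({
    'naam': ['Asha', 'Ravi', 'Meena'],
    'phone': ['9876543210', '98765abc42', ''],
})

# नियम: फोन नंबर खाली न हो और उसमें सिर्फ अंक हों
valid = grahak['phone'].str.fullmatch(r'\d+')
print(valid.tolist())
```

जो पंक्तियाँ नियम तोड़ती हैं उन्हें `grahak[~valid]` से अलग निकाला जा सकता है।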
### असंरचित डेटा
-असंरचित डेटा आम तौर पर स्तंभ और पांक्तोयोंमे वर्गीकृत नहीं किया जा सकता और किसी नियमोंसे बंधित भी नहीं रहता। संरचित डेटा के तुलना से असंरचित डेटा मैं कम नियम होने के कारण उसमे नया डेटा डालना बोहोत आसान होता है। अगर कोही सेंसर जो तापमानमापक के प्रेशर का दबाव हर दो मिनिट बाद दर्ज करता है, जिसके बजह से वह तापमान को मापके दर्ज कर सकता है, तो उसे असंरचित डेटा होने के कारण डेटाबेसमे पहिलेसे होने वाले डेटा को बदलने की आवश्यकता नहीं है। तथापि, ऐसे डेटा का विश्लेषण और जांच करने को ज्यादा समय लग सकता है।
-जैसे की, एक वैज्ञानिक जिसे सेंसर के डेटा से पिछले महीने के तापमान का औसत ढूंढ़ना हो, मगर वो देखता है की सेंसर ने कुछ जगह आधेअधूरे डेटा को दर्ज करने के लिए आम क्रमांक के बजाय 'e' दर्ज किया है, जिसका मतलब है की डेटा अधूरा है।
-असंरचित डेटा के उदाहरण: टेक्स्ट फ़ाइलें, टेक्स्ट मेसेजेस, विडिओ फ़ाइलें।
+असंरचित डेटा आम तौर पर स्तंभ और पंक्तियों में वर्गीकृत नहीं किया जा सकता और किसी नियमों से बंधा भी नहीं रहता। संरचित डेटा की तुलना में असंरचित डेटा में कम नियम होने के कारण उसमें नया डेटा जोड़ना बहुत आसान होता है। अगर कोई सेंसर बैरोमीटर के दबाव को हर दो मिनट बाद दर्ज करता है, जिसकी वजह से वह दाब को मापकर दर्ज कर सकता है, तो उसे असंरचित डेटा होने के कारण डेटाबेस में पहले से उपलब्ध डेटा को बदलने की आवश्यकता नहीं है। तथापि, ऐसे डेटा का विश्लेषण और जाँच करने में ज्यादा समय लग सकता है।
+जैसे कि, एक वैज्ञानिक जिसे सेंसर के डेटा से पिछले महीने के तापमान का औसत ढूँढना हो, देखता है कि सेंसर ने कुछ जगह आधे-अधूरे डेटा को दर्ज करने के लिए आम अंकों के बजाय 'e' दर्ज किया है, जिसका मतलब है कि डेटा अपूर्ण है।
+असंरचित डेटा के उदाहरण: टेक्स्ट फ़ाइलें, टेक्स्ट मेसेजेस, विडियो फ़ाइलें।
### मिश्र संरचित डेटा
@@ -45,7 +45,7 @@
## डेटा के स्त्रोत
डेटा का स्त्रोत मतलब मतलब वो जगह जहाँ डेटा सबसे पहिली बार निर्माण हुवा था, और हमेशा कहा और कब जमा किया था इसके ऊपर आधारित रहेगा। उपयोगकर्ताके द्वारा निर्माण किये हुवे डेटा को प्रार्थमिक डेटा के नाम से पहचाना जाता है जबकि गौण डेटा ऐसे स्त्रोत से आता है जिसने सामान्य काम के लिए डेटा जमा किया था। उदाहरण के लिए, वैज्ञानिकों का समूह वर्षावनमे टिप्पणियों और सूचि कमा कर रहे है तो वोप्रार्थमिक डेटा होगा और अगर उन्होंने उस डेटा को बाकि के वैज्ञनिकोके साथ बाँटना चाहा तो वो गौण डेटा कहलाया जायेगा।
-डेटाबेस यह एक सामान्य स्त्रोत है और वह होस्टिंग और डेटाबेस मेंटेनन्स सिस्टिम निर्भर होता है। डेटाबेस मेंटेनन्स सिस्टिममे उपयोगकर्ता कमांड्स जिन्हें क्वेरीज़ कहा जाता है इस्तमाल करके डेटाबेस का डेटा खोज सकते है। डेटा स्त्रोत फ़ाइल स्वरुप मे हो तो आवाज, चित्र, वीडियो, स्प्रेडशीट ऐसे प्रकार मे हो सकता है। आंतरजाल के स्त्रोत डेटा होस्ट करने के बोहोत आम तरीका है। यहां डेटाबेस तथा फाइलें खोजी जा सकती है।अप्लीकेशन प्रोगरामिंग इंटरफेस, जिन्हे 'एपीआय'(API) के नाम से जाना जाता है, उसकी मद्त से प्रोग्रामर्स डेटाको बहार के उपयोगकर्ताओको आंतरजाल द्वारा इस्तमाल करने के लिए भेज सकते है। जबकि वेब स्क्रैपिंग नामक प्रक्रियासे आंतरजाल के वेब पेज का डेटा अलग किया जा सकता है। [डेटा के साथ काम करना](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data) यह पथ अलग अलग डेटा का इस्तमाल करनेपर ध्यान देता है।
+डेटाबेस एक सामान्य स्त्रोत है और वह होस्टिंग और डेटाबेस मेंटेनन्स सिस्टम पर निर्भर होता है। डेटाबेस मेंटेनन्स सिस्टम में उपयोगकर्ता कमांड्स, जिन्हें क्वेरीज़ कहा जाता है, इस्तेमाल करके डेटाबेस का डेटा खोज सकते हैं। डेटा स्त्रोत फ़ाइल स्वरूप में हो तो आवाज, चित्र, वीडियो, स्प्रेडशीट ऐसे प्रकार में हो सकता है। इंटरनेट के स्त्रोत डेटा होस्ट करने का बहुत आम तरीका है; यहां डेटाबेस तथा फाइलें खोजी जा सकती हैं। एप्लीकेशन प्रोग्रामिंग इंटरफेस, जिन्हें 'एपीआई' (API) के नाम से जाना जाता है, की मदद से प्रोग्रामर्स डेटा को बाहर के उपयोगकर्ताओं को इंटरनेट द्वारा इस्तेमाल करने के लिए भेज सकते हैं, जबकि वेब स्क्रैपिंग नामक प्रक्रिया से वेब पेज का डेटा अलग किया जा सकता है। [डेटा के साथ काम करना](https://github.com/microsoft/Data-Science-For-Beginners/tree/main/2-Working-With-Data) यह पाठ अलग-अलग डेटा का इस्तेमाल करने पर ध्यान देता है।
## निष्कर्ष
यह पथ मे हमने पढ़ा की:
- डेटा क्या होता है
@@ -54,7 +54,7 @@
- डेटा कहा मिलता है
## 🚀 चुनौती
-Kaggle यह के मुफ्त के डेटाबेस का बोहोत अच्छा स्त्रोत है। [सर्च टूल ](https://www.kaggle.com/datasets) का इस्तमाल करके कुछ मजेदार डेटासेट ढूंढे और उनमेसे तीन-चार डेटाबेस का ऐसे वर्गीकरण कीजिए:
+Kaggle मुफ्त डेटासेट का बहुत अच्छा स्त्रोत है। [सर्च टूल](https://www.kaggle.com/datasets) का इस्तेमाल करके कुछ मजेदार डेटासेट ढूँढें और उनमें से तीन-चार डेटासेट का ऐसे वर्गीकरण कीजिए:
- डेटा परिमाणात्मक है या गुणात्मक?
- डेटा संरचित, असंरचित या फिर मिश्र संरचित है?
From 9ea38061baba033e3b4e9971f15cf017c36c2298 Mon Sep 17 00:00:00 2001
From: Fernanda Kawasaki <50497814+fernandakawasaki@users.noreply.github.com>
Date: Tue, 12 Oct 2021 18:15:07 -0300
Subject: [PATCH 105/319] Create assignment.pt-br.md
---
.../translations/assignment.pt-br.md | 11 +++++++++++
1 file changed, 11 insertions(+)
create mode 100644 3-Data-Visualization/09-visualization-quantities/translations/assignment.pt-br.md
diff --git a/3-Data-Visualization/09-visualization-quantities/translations/assignment.pt-br.md b/3-Data-Visualization/09-visualization-quantities/translations/assignment.pt-br.md
new file mode 100644
index 00000000..fb5b62a3
--- /dev/null
+++ b/3-Data-Visualization/09-visualization-quantities/translations/assignment.pt-br.md
@@ -0,0 +1,11 @@
+# Linhas, dispersão e barras
+
+## Instruções
+
+Nessa aula, você trabalhou com gráficos de linha, dispersão e barras para mostrar fatos interessantes sobre esse dataset. Nessa tarefa, explore mais a fundo o dataset para descobrir algo sobre um dado tipo de pássaro. Por exemplo, crie um notebook que mostre visualizações de todos os fatos interessantes que encontrar sobre os Snow Geese (gansos-das-neves). Use os três tipos de gráficos mencionados anteriormente para contar uma história em seu notebook.
+
+## Rubrica
+
+Exemplar | Adequado | Precisa melhorar
+--- | --- | --- |
+O notebook foi apresentado com boas anotações, contação de histórias (storytelling) sólida e gráficos cativantes | O notebook não tem um desses elementos | O notebook não tem dois desses elementos
From 2125afcd3054502a0deb71eb75cbe7837293ae56 Mon Sep 17 00:00:00 2001
From: Angel Mendez
Date: Tue, 12 Oct 2021 20:48:09 -0500
Subject: [PATCH 106/319] fix:(translation) Fix broken links on main page
module 1
---
1-Introduction/translations/README.es.md | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/1-Introduction/translations/README.es.md b/1-Introduction/translations/README.es.md
index 75d3b4cc..592da390 100644
--- a/1-Introduction/translations/README.es.md
+++ b/1-Introduction/translations/README.es.md
@@ -9,10 +9,10 @@ cómo se definen los datos y un poco de probabilidad y estadística, el núcleo
### Temas
-1. [Definiendo la Ciencia de Datos](01-defining-data-science/README.md)
-2. [Ética de la Ciencia de Datos](02-ethics/README.md)
-3. [Definición de Datos](03-defining-data/README.md)
-4. [introducción a la probabilidad y estadística](04-stats-and-probability/README.md)
+1. [Definiendo la Ciencia de Datos](../01-defining-data-science/README.md)
+2. [Ética de la Ciencia de Datos](../02-ethics/README.md)
+3. [Definición de Datos](../03-defining-data/translations/README.es.md)
+4. [introducción a la probabilidad y estadística](../04-stats-and-probability/README.md)
### Créditos
From 373a22af6191bf4725eb05eb49b961f19b8d06eb Mon Sep 17 00:00:00 2001
From: Fernanda Kawasaki <50497814+fernandakawasaki@users.noreply.github.com>
Date: Wed, 13 Oct 2021 00:41:05 -0300
Subject: [PATCH 107/319] Create README.pt-bt.md
---
.../translations/README.pt-br.md | 199 ++++++++++++++++++
1 file changed, 199 insertions(+)
create mode 100644 3-Data-Visualization/10-visualization-distributions/translations/README.pt-br.md
diff --git a/3-Data-Visualization/10-visualization-distributions/translations/README.pt-br.md b/3-Data-Visualization/10-visualization-distributions/translations/README.pt-br.md
new file mode 100644
index 00000000..1ce0f827
--- /dev/null
+++ b/3-Data-Visualization/10-visualization-distributions/translations/README.pt-br.md
@@ -0,0 +1,199 @@
+# Visualizando distribuições
+
+|](../../sketchnotes/10-Visualizing-Distributions.png)|
+|:---:|
+| Visualizando distribuições - _Sketchnote por [@nitya](https://twitter.com/nitya)_ |
+
+Na aula anterior, você aprendeu fatos interessantes sobre um dataset de pássaros de Minnesota. Você encontrou dados incorretos ao visualizar outliers e olhou as diferenças entre categorias de pássaros com base no seu comprimento máximo.
+
+## [Quiz pré-aula](https://red-water-0103e7a0f.azurestaticapps.net/quiz/18)
+## Explore o dataset de pássaros
+
+Outra forma de explorar os dados é olhar para sua distribuição, ou como os dados estão organizados ao longo do eixo. Por exemplo, talvez você gostaria de aprender sobre a distribuição geral, nesse dataset, do máximo de envergadura (wingspan) ou máximo de massa corporal (body mass) dos pássaros de Minnesota.
+
+Vamos descobrir alguns fatos sobre as distribuições de dados nesse dataset. No arquivo _notebook.ipynb_ na raiz do diretório dessa aula, importe Pandas, Matplotlib, e seus dados:
+
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+birds = pd.read_csv('../../data/birds.csv')
+birds.head()
+```
+
+Geralmente, você pode olhar para a forma como os dados estão distribuídos usando um gráfico de dispersão (scatter plot) como fizemos na aula anterior:
+
+```python
+birds.plot(kind='scatter',x='MaxLength',y='Order',figsize=(12,8))
+
+plt.title('Max Length per Order')
+plt.ylabel('Order')
+plt.xlabel('Max Length')
+
+plt.show()
+```
+
+Isso nos dá uma visão geral da distribuição de comprimento de corpo por Ordem do pássaro, mas não é a forma ótima de mostrar a distribuição real. Essa tarefa geralmente é realizada usando um histograma.
+
+## Trabalhando com histogramas
+
+O Matplotlib oferece formas muito boas de visualizar a distribuição dos dados usando histogramas. Esse tipo de gráfico é parecido com um gráfico de barras, onde a distribuição pode ser vista por meio da subida e descida das barras. Para construir um histograma, você precisa de dados numéricos e pode plotar um gráfico definindo o tipo (kind) como 'hist'. Esse gráfico mostra a distribuição de massa corporal máxima (MaxBodyMass) para todo o intervalo numérico dos dados. Ao dividir um certo vetor de dados em intervalos (bins) menores, vemos a distribuição dos valores:
+
+```python
+birds['MaxBodyMass'].plot(kind = 'hist', bins = 10, figsize = (12,12))
+plt.show()
+```
+
+
+
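Por baixo dos panos, o histograma apenas conta quantos valores caem em cada intervalo. Um esboço só com NumPy, usando valores hipotéticos no lugar de `birds['MaxBodyMass']`, deixa isso visível:

```python
import numpy as np

# valores hipotéticos no lugar de birds['MaxBodyMass']
valores = np.array([1.5, 2.0, 3.2, 3.8, 10.0, 12.5, 30.0, 55.0])

# as mesmas contagens que o Matplotlib desenharia com bins=4
contagens, bordas = np.histogram(valores, bins=4)
print(contagens)  # quantos valores caem em cada intervalo
print(bordas)     # limites (bordas) dos intervalos
```

Mudar `bins` aqui tem o mesmo efeito do parâmetro `bins` do `plot(kind='hist')`: mais intervalos, mais detalhe.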
+Como você pode ver, a maior parte dos mais de 400 pássaros cai no intervalo de menos de 2000 para a massa corporal máxima. Obtenha mais conhecimento dos dados mudando o parâmetro de intervalo (`bins`) para um número maior, como 30:
+
+```python
+birds['MaxBodyMass'].plot(kind = 'hist', bins = 30, figsize = (12,12))
+plt.show()
+```
+
+
+
+Esse gráfico mostra a distribuição de forma mais detalhada. Um gráfico menos concentrado na esquerda pode ser criado garantindo que você só seleciona os dados dentro de um certo intervalo:
+
+Filtre seus dados para obter somente os pássaros que possuem menos de 60 de massa corporal, e mostre 40 intervalos (`bins`):
+
+```python
+filteredBirds = birds[(birds['MaxBodyMass'] > 1) & (birds['MaxBodyMass'] < 60)]
+filteredBirds['MaxBodyMass'].plot(kind = 'hist',bins = 40,figsize = (12,12))
+plt.show()
+```
+
+
+✅ Tente outros filtros e pontos de dados (data points). Para ver a distribuição completa dos dados, remova o filtro `['MaxBodyMass']` para mostrar as distribuições com identificação.
+
+O histograma também oferece algumas cores legais e identificadores (labels) melhorados:
+
+Crie um histograma 2D para comparar a relação entre duas distribuições. Vamos comparar massa corporal máxima vs. comprimento máximo (`MaxBodyMass` vs. `MaxLength`). O Matplotlib possui uma forma integrada de mostrar convergência usando cores mais vivas:
+
+```python
+x = filteredBirds['MaxBodyMass']
+y = filteredBirds['MaxLength']
+
+fig, ax = plt.subplots(tight_layout=True)
+hist = ax.hist2d(x, y)
+```
+
+Aparentemente, existe uma suposta correlação entre esses dois elementos ao longo de um eixo esperado, com um forte ponto de convergência:
+
+
+
+Por definição, os histogramas funcionam para dados numéricos. E se você precisar ver distribuições de dados textuais?
+
+## Explore o dataset e busque por distribuições usando dados textuais
+
+Esse dataset também inclui informações relevantes sobre a categoria de pássaro e seu gênero, espécie e família, assim como seu status de conservação. Vamos explorar mais a fundo essa informação sobre conservação. Qual é a distribuição dos pássaros de acordo com seu status de conservação?
+
+> ✅ No dataset, são utilizados vários acrônimos para descrever o status de conservação. Esses acrônimos vêm da [IUCN Red List Categories](https://www.iucnredlist.org/), uma organização que cataloga os status das espécies.
+>
+> - CR: Critically Endangered (Criticamente em perigo)
+> - EN: Endangered (Em perigo)
+> - EX: Extinct (Extinto)
+> - LC: Least Concern (Pouco preocupante)
+> - NT: Near Threatened (Quase ameaçada)
+> - VU: Vulnerable (Vulnerável)
+
+Esses são valores textuais, então será preciso transformá-los para criar um histograma. Usando o dataframe filteredBirds, mostre seu status de conservação juntamente com sua envergadura mínima (MinWingspan). O que você vê?
+
+```python
+x1 = filteredBirds.loc[filteredBirds.ConservationStatus=='EX', 'MinWingspan']
+x2 = filteredBirds.loc[filteredBirds.ConservationStatus=='CR', 'MinWingspan']
+x3 = filteredBirds.loc[filteredBirds.ConservationStatus=='EN', 'MinWingspan']
+x4 = filteredBirds.loc[filteredBirds.ConservationStatus=='NT', 'MinWingspan']
+x5 = filteredBirds.loc[filteredBirds.ConservationStatus=='VU', 'MinWingspan']
+x6 = filteredBirds.loc[filteredBirds.ConservationStatus=='LC', 'MinWingspan']
+
+kwargs = dict(alpha=0.5, bins=20)
+
+plt.hist(x1, **kwargs, color='red', label='Extinct')
+plt.hist(x2, **kwargs, color='orange', label='Critically Endangered')
+plt.hist(x3, **kwargs, color='yellow', label='Endangered')
+plt.hist(x4, **kwargs, color='green', label='Near Threatened')
+plt.hist(x5, **kwargs, color='blue', label='Vulnerable')
+plt.hist(x6, **kwargs, color='gray', label='Least Concern')
+
+plt.gca().set(title='Conservation Status', xlabel='MinWingspan', ylabel='Count')
+plt.legend();
+```
+
+
+
+Aparentemente não existe uma correlação forte entre a envergadura mínima e o status de conservação. Teste outros elementos do dataset usando esse método. Você também pode tentar outros filtros. Você encontrou alguma correlação?
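Para uma visão rápida da parte puramente categórica (quantos pássaros há em cada status, sem o eixo numérico), uma contagem simples com pandas também ajuda — esboço com valores hipotéticos no lugar da coluna `ConservationStatus`:

```python
import pandas as pd

# valores hipotéticos imitando a coluna ConservationStatus
status = pd.Series(['LC', 'LC', 'NT', 'VU', 'LC', 'EN', 'NT', 'LC'])

contagem = status.value_counts()  # distribuição por categoria
print(contagem)
# para um gráfico de barras direto: contagem.plot(kind='bar')
```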
+
+## Gráfico de densidade (Estimativa de densidade kernel)
+
+Você pode ter percebido que até agora os histogramas são quebrados em degraus e não fluem de forma suave em uma curva. Para mostrar um gráfico de densidade mais 'fluido', você pode tentar usar a estimativa de densidade kernel (kde).
+
+Para trabalhar com gráficos de densidade, acostume-se com uma nova biblioteca de gráficos, [Seaborn](https://seaborn.pydata.org/generated/seaborn.kdeplot.html).
+
+Depois de carregar o Seaborn, tente um gráfico de densidade básico:
+
+```python
+import seaborn as sns
+import matplotlib.pyplot as plt
+sns.kdeplot(filteredBirds['MinWingspan'])
+plt.show()
+```
+
+
+Você consegue ver como o gráfico reflete o anterior (de envergadura mínima); só é mais fluido/suave. De acordo com a documentação do Seaborn, "Em comparação com o histograma, o KDE pode produzir um gráfico que é menos confuso e mais legível, especialmente quando plotamos múltiplas distribuições. Mas pode potencialmente introduzir distorções se a distribuição usada é limitada ou não suave. Como um histograma, a qualidade da representação também depende da escolha de bons parâmetros suavizadores (smoothing parameters)." [créditos](https://seaborn.pydata.org/generated/seaborn.kdeplot.html) Em outras palavras, dados discrepantes (outliers) vão fazer seus gráficos se comportarem mal, como sempre.
+
+Se você quer revisitar a linha irregular/dentada MaxBodyMass (massa corporal máxima) no segundo gráfico construído, você pode suavizá-la muito bem recriando o seguinte método:
+
+```python
+sns.kdeplot(filteredBirds['MaxBodyMass'])
+plt.show()
+```
+
+
+Se você quer uma linha suave, mas não tão suave, mude o parâmetro `bw_adjust`:
+
+```python
+sns.kdeplot(filteredBirds['MaxBodyMass'], bw_adjust=.2)
+plt.show()
+```
+
+
+✅ Leia sobre os parâmetros disponíveis para esse tipo de gráfico e experimente!
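Para ter uma intuição do que esses parâmetros controlam, o cálculo por trás de um gráfico de densidade pode ser esboçado com scipy — os dados abaixo são hipotéticos, e o `bw_method` tem papel análogo ao `bw_adjust` do Seaborn:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
dados = rng.normal(20, 5, 200)  # valores hipotéticos, ex.: MaxBodyMass

kde = gaussian_kde(dados, bw_method=0.3)  # largura de banda menor = curva menos suave
grade = np.linspace(dados.min(), dados.max(), 100)
densidade = kde(grade)                    # curva de densidade avaliada em 100 pontos
```

A área sob a curva resultante fica próxima de 1, como toda densidade de probabilidade.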
+
+Esse tipo de gráfico oferece visualizações bonitas e esclarecedoras. Com algumas linhas de código, por exemplo, você pode mostrar a densidade de massa corporal máxima por pássaro por Ordem:
+
+```python
+sns.kdeplot(
+ data=filteredBirds, x="MaxBodyMass", hue="Order",
+ fill=True, common_norm=False, palette="crest",
+ alpha=.5, linewidth=0,
+)
+```
+
+
+
+Você também pode mapear a densidade de várias variáveis em um só gráfico. Teste usar o comprimento máximo (MaxLength) e mínimo (MinLength) de um pássaro comparado com seu status de conservação:
+
+```python
+sns.kdeplot(data=filteredBirds, x="MinLength", y="MaxLength", hue="ConservationStatus")
+```
+
+
+
+Talvez valha a pena pesquisar mais a fundo se o cluster de pássaros vulneráveis ('Vulnerable') de acordo com seus comprimentos tem significado ou não.
+
+## 🚀 Desafio
+
+Histogramas são um tipo mais sofisticado de gráfico em relação a simples gráficos de dispersão, barras ou linhas. Pesquise na internet bons exemplos de uso de histogramas: como eles são usados, o que demonstram e em quais áreas ou campos de pesquisa costumam aparecer.
+
+## [Quiz pós-aula](https://red-water-0103e7a0f.azurestaticapps.net/quiz/19)
+
+## Revisão e autoestudo
+
+Nessa aula, você usou o Matplotlib e começou a trabalhar com o Seaborn para mostrar gráficos mais avançados. Pesquise sobre o `kdeplot` no Seaborn, uma "curva de densidade de probabilidade contínua em uma ou mais dimensões". Leia a [documentação](https://seaborn.pydata.org/generated/seaborn.kdeplot.html) para entender como funciona.
+
+## Tarefa
+
+[Use suas habilidades](assignment.md)
From 1a44378334a819d17da9b1e938614a9d3b6aef1a Mon Sep 17 00:00:00 2001
From: Dhruv Krishna Vaid
Date: Wed, 13 Oct 2021 11:46:51 +0530
Subject: [PATCH 108/319] Added Hindi translations
---
.../17-Introduction/translations/README.hi.md | 100 ++++++++++++++++++
1 file changed, 100 insertions(+)
create mode 100644 5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
diff --git a/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md b/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
new file mode 100644
index 00000000..8b15fd41
--- /dev/null
+++ b/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
@@ -0,0 +1,100 @@
+# क्लाउड में डेटा साइंस का परिचय
+
+|](../../../sketchnotes/17-DataScience-Cloud.png)|
+|:---:|
+| क्लाउड में डेटा साइंस: परिचय - _[@nitya](https://twitter.com/nitya) द्वारा स्केचनोट_ |
+
+
+इस पाठ में, आप क्लाउड के मूलभूत सिद्धांतों को जानेंगे, फिर आप देखेंगे कि आपके डेटा साइंस परियोजनाओं को चलाने के लिए क्लाउड सेवाओं का उपयोग करना आपके लिए दिलचस्प क्यों हो सकता है और हम क्लाउड में चलने वाले डेटा साइंस प्रोजेक्ट के कुछ उदाहरण देखेंगे।
+
+
+## [प्री-लेक्चर क्विज़](https://red-water-0103e7a0f.azurestaticapps.net/quiz/32)
+
+
+## क्लाउड क्या है?
+
+क्लाउड, या क्लाउड कंप्यूटिंग, इंटरनेट पर एक बुनियादी ढांचे पर होस्ट की जाने वाली पे-एज़-यू-गो कंप्यूटिंग सेवाओं की एक विस्तृत श्रृंखला की डिलीवरी है। सेवाओं में स्टोरेज, डेटाबेस, नेटवर्किंग, सॉफ्टवेयर, एनालिटिक्स और इंटेलिजेंट सर्विसेज जैसे समाधान शामिल हैं।
+
+हम आम तौर पर पब्लिक, प्राइवेट और हाइब्रिड क्लाउड में ऐसे अंतर करते हैं:
+
+* पब्लिक क्लाउड: एक पब्लिक क्लाउड का स्वामित्व और संचालन तीसरे पक्ष के क्लाउड सेवा प्रदाता के पास होता है जो इंटरनेट पर अपने कंप्यूटिंग संसाधनों को जनता तक पहुंचाता है।
+* प्राइवेट क्लाउड: एक ही व्यवसाय या संगठन द्वारा विशेष रूप से उपयोग किए जाने वाले क्लाउड कंप्यूटिंग संसाधनों को संदर्भित करता है, जिसमें सेवाओं और निजी नेटवर्क पर बनाए रखा गया इंफ्रास्ट्रक्चर होता है।
+* हाइब्रिड क्लाउड: हाइब्रिड क्लाउड एक ऐसा सिस्टम है जो पब्लिक और प्राइवेट क्लाउड को जोड़ता है। उपयोगकर्ता ऑन-प्रिमाइसेस डेटासेंटर रखते हुए डेटा और एप्लिकेशन को एक या अधिक पब्लिक क्लाउड पर चला सकते हैं।
+
+Most cloud computing services fall into three categories: Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS).
+
+* Infrastructure as a Service (IaaS): users rent IT infrastructure such as servers and virtual machines (VMs), storage, networks, and operating systems.
+* Platform as a Service (PaaS): users rent an environment for developing, testing, delivering, and managing software applications. Users don't need to worry about setting up or managing the underlying infrastructure of servers, storage, networks, and databases needed for development.
+* Software as a Service (SaaS): users get access to software applications over the internet, usually on demand and on a subscription basis. Users don't need to worry about hosting and managing the software application, the underlying infrastructure, or maintenance such as software upgrades and security patching.
+
+Some of the largest cloud providers are Amazon Web Services, Google Cloud Platform, and Microsoft Azure.
+
+## Why Choose the Cloud for Data Science?
+
+Developers and IT professionals choose to work with the cloud for many reasons, including the following:
+
+* Innovation: you can power your applications by integrating innovative services built by cloud providers directly into your apps.
+* Flexibility: you only pay for the services you need and can choose from a wide range of services. You typically pay as you go and adapt your services according to your evolving needs.
+* Budget: you don't need to make initial investments to purchase hardware and software or to set up and run on-site datacenters, and you pay only for what you use.
+* Scalability: your resources can scale according to your project's needs, which means your apps can use more or less computing power, storage, and bandwidth at any time, adapting to external factors.
+* Productivity: you can focus on your business rather than spending time on tasks that someone else can manage, such as running datacenters.
+* Reliability: cloud computing offers several ways to continuously back up your data, and you can set up disaster recovery plans to keep your business and services running even in times of crisis.
+* Security: you can benefit from policies, technologies, and controls that strengthen the security of your project.
+
+These are some of the most common reasons why people choose to use cloud services. Now that we have a better understanding of what the cloud is and what its main benefits are, let's look more specifically at the jobs of data scientists and developers working with data, and at how the cloud can help them with several challenges they face:
+
+* Storing large amounts of data: instead of buying, managing, and protecting big servers, you can store your data directly in the cloud, with solutions such as Azure Cosmos DB, Azure SQL Database, and Azure Data Lake Storage.
+* Performing data integration: data integration is an essential part of data science that lets you make the transition from data collection to taking action. With the data integration services offered in the cloud, you can collect, transform, and integrate data from various sources with Data Factory.
+* Processing data: processing vast amounts of data requires a lot of computing power, and not everyone has access to machines powerful enough for that, which is why many people choose to harness the cloud's huge computing power directly to run and deploy their solutions.
+* Using data analytics services: cloud services such as Azure Synapse Analytics, Azure Stream Analytics, and Azure Databricks help you turn your data into actionable insights.
+* Using machine learning and data intelligence services: instead of starting from scratch, you can use machine learning algorithms offered by the cloud provider, with services such as AzureML. You can also use cognitive services such as speech-to-text, text-to-speech, computer vision, and more.
+
+## Examples of Data Science in the Cloud
+
+Let's make this more tangible by looking at a couple of scenarios.
+
+### Real-time social media sentiment analysis
+
+We'll start with a scenario commonly studied by people beginning with machine learning: social media sentiment analysis in real time.
+
+Let's say you run a news media website and want to leverage live data to understand what content your readers might be interested in. To learn more about this, you can build a program that performs real-time sentiment analysis of data from Twitter publications, on topics that are relevant to your readers.
+
+The key indicators you will look at are the volume of tweets on specific topics (hashtags) and sentiment, which is established using analytics tools that perform sentiment analysis around the specified topics.
+
+The steps necessary to create this project are as follows:
+
+* Create an Event Hub for streaming input, which will collect data from Twitter
+* Configure and start a Twitter client application, which will call the Twitter Streaming APIs
+* Create a Stream Analytics job
+* Specify the job input and query
+* Create an output sink and specify the job output
+* Start the job
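
Azure runs each of these steps as a managed service, but the computation at the heart of the Stream Analytics query can be sketched locally: per-hashtag tweet volume plus a sentiment score, the two key indicators above. This is a conceptual stand-in with toy tweets and a hand-made word lexicon, not the actual Azure pipeline:

```python
from collections import defaultdict

# Tiny hand-made lexicon for illustration; a real pipeline would use a
# trained model or a cognitive service to score sentiment.
POSITIVE = {"great", "love", "good"}
NEGATIVE = {"bad", "terrible", "hate"}

def sentiment(text):
    """Score a tweet: +1 per positive word, -1 per negative word."""
    words = text.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def aggregate(tweets):
    """Per-hashtag tweet volume and total sentiment (the two key indicators)."""
    stats = defaultdict(lambda: {"volume": 0, "sentiment": 0})
    for text in tweets:
        score = sentiment(text)
        for tag in (w for w in text.lower().split() if w.startswith("#")):
            stats[tag]["volume"] += 1
            stats[tag]["sentiment"] += score
    return dict(stats)

tweets = [
    "Great coverage of the election #politics",
    "Terrible debate last night #politics",
    "I love this new gadget #tech",
]
print(aggregate(tweets))
```

In the real project, this aggregation is expressed as a Stream Analytics query over the Event Hub input rather than as Python code.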
+
+To see the full process, check out the [documentation](https://docs.microsoft.com/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends?WT.mc_id=academic-40229-cxa&ocid=AID30411099).
+
+### Scientific papers analysis
+
+Let's take another example with a project created by [Dmitry Soshnikov](http://soshnikov.com), one of the authors of this course.
+
+Dmitry created a tool that analyzes COVID papers. By reviewing this project, you will see how you can create a tool that extracts knowledge from scientific papers, gains insights, and helps researchers navigate through large collections of papers in an efficient way.
+
+Let's look at the different steps used for this:
+
+* Extracting and pre-processing information with [Text Analytics for Health](https://docs.microsoft.com/azure/cognitive-services/text-analytics/how-tos/text-analytics-for-health?WT.mc_id=academic-40229-cxa&ocid=AID3041109)
+* Using [AzureML](https://azure.microsoft.com/services/machine-learning?WT.mc_id=academic-40229-cxa&ocid=AID3041109) to parallelize the processing
+* Storing and querying the information with [Cosmos DB](https://azure.microsoft.com/services/cosmos-db?WT.mc_id=academic-40229-cxa&ocid=AID3041109)
+* Creating an interactive dashboard for data exploration and visualization using Power BI
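
As a rough local stand-in for the first step (the real project calls the Text Analytics for Health service to extract entities), here is a naive dictionary-based extractor. The term list and abstracts are invented for illustration only:

```python
from collections import Counter

# Invented mini term list; the real project gets entities from the
# Text Analytics for Health service rather than a hand-made dictionary.
MEDICATION_TERMS = {"hydroxychloroquine", "remdesivir", "dexamethasone"}

def extract_medications(abstract):
    """Return the known medication terms mentioned in one abstract."""
    words = {w.strip(".,;").lower() for w in abstract.split()}
    return words & MEDICATION_TERMS

abstracts = [
    "We evaluate remdesivir and dexamethasone in hospitalized patients.",
    "No benefit was observed for hydroxychloroquine in this trial.",
    "Dexamethasone reduced mortality in ventilated patients.",
]

# Count how often each medication is mentioned across the corpus; in the
# real project these per-paper entities are stored in Cosmos DB and queried.
mentions = Counter(term for a in abstracts for term in extract_medications(a))
print(mentions.most_common())
```

A named-entity service improves on this sketch by handling spelling variants, multi-word terms, and entity types (medications, diagnoses, dosages) without a hand-curated list.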
+
+To see the full process, visit [Dmitry's blog](https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/).
+
+As you can see, we can leverage cloud services in many ways to perform data science.
+
+## Footnote
+
+Sources:
+* https://azure.microsoft.com/overview/what-is-cloud-computing?ocid=AID3041109
+* https://docs.microsoft.com/azure/stream-analytics/stream-analytics-twitter-sentiment-analysis-trends?ocid=AID3041109
+* https://soshnikov.com/science/analyzing-medical-papers-with-azure-and-text-analytics-for-health/
+
+## Post-Lecture Quiz
+
+[Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/33)
+
+## Assignment
+
+[Market Research](../assignment.md)
From 5b06f6a5b3b825939bcaef61de572abd6c05afa7 Mon Sep 17 00:00:00 2001
From: Dhruv Krishna Vaid
Date: Wed, 13 Oct 2021 12:53:19 +0530
Subject: [PATCH 109/319] Added assignment translation
---
.../17-Introduction/translations/README.hi.md | 2 +-
.../17-Introduction/translations/assignment.hi.md | 10 ++++++++++
2 files changed, 11 insertions(+), 1 deletion(-)
create mode 100644 5-Data-Science-In-Cloud/17-Introduction/translations/assignment.hi.md
diff --git a/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md b/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
index 8b15fd41..e06add10 100644
--- a/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
+++ b/5-Data-Science-In-Cloud/17-Introduction/translations/README.hi.md
@@ -97,4 +97,4 @@
 ## Assignment
-[Market Research](../assignment.md)
+[Market Research](./assignment.hi.md)
diff --git a/5-Data-Science-In-Cloud/17-Introduction/translations/assignment.hi.md b/5-Data-Science-In-Cloud/17-Introduction/translations/assignment.hi.md
new file mode 100644
index 00000000..8317d126
--- /dev/null
+++ b/5-Data-Science-In-Cloud/17-Introduction/translations/assignment.hi.md
@@ -0,0 +1,10 @@
+# Market Research
+
+## Instructions
+
+In this lesson, you learned that there are several important cloud providers. Do some market research to discover what each can offer a data scientist. Are their subscriptions comparable? Write a paper describing the offerings of three or more of these cloud providers.
+
+## Rubric
+
+Exemplary | Adequate | Needs Improvement
+--- | --- | ---
+A one-page paper describes the data science offerings of three cloud providers and differentiates between them. | A shorter paper is presented. | A paper is presented without completing the analysis.
\ No newline at end of file
From b3aabac0fe1a461e012c5be6b3bc1056cdea3891 Mon Sep 17 00:00:00 2001
From: Keshav Sharma
Date: Wed, 13 Oct 2021 04:58:59 -0700
Subject: [PATCH 110/319] Add Pandas in R
---
.../07-python/R(Bonus Lesson)/Notebook.ipynb | 2249 -----------------
.../07-python/R/notebook.ipynb | 2131 ++++++++++++++++
2 files changed, 2131 insertions(+), 2249 deletions(-)
delete mode 100644 2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb
create mode 100644 2-Working-With-Data/07-python/R/notebook.ipynb
diff --git a/2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb b/2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb
deleted file mode 100644
index eec49104..00000000
--- a/2-Working-With-Data/07-python/R(Bonus Lesson)/Notebook.ipynb
+++ /dev/null
@@ -1,2249 +0,0 @@
-{
- "cells": [
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "304296e3",
- "metadata": {},
- "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\n",
- "Attaching package: 'dplyr'\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:stats':\n",
- "\n",
- " filter, lag\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:base':\n",
- "\n",
- " intersect, setdiff, setequal, union\n",
- "\n",
- "\n",
- "-- \u001b[1mAttaching packages\u001b[22m ------------------------------------------------------------------------------- tidyverse 1.3.1 --\n",
- "\n",
- "\u001b[32mv\u001b[39m \u001b[34mggplot2\u001b[39m 3.3.5 \u001b[32mv\u001b[39m \u001b[34mpurrr \u001b[39m 0.3.4\n",
- "\u001b[32mv\u001b[39m \u001b[34mtibble \u001b[39m 3.1.5 \u001b[32mv\u001b[39m \u001b[34mstringr\u001b[39m 1.4.0\n",
- "\u001b[32mv\u001b[39m \u001b[34mtidyr \u001b[39m 1.1.4 \u001b[32mv\u001b[39m \u001b[34mforcats\u001b[39m 0.5.1\n",
- "\u001b[32mv\u001b[39m \u001b[34mreadr \u001b[39m 2.0.2 \n",
- "\n",
- "-- \u001b[1mConflicts\u001b[22m ---------------------------------------------------------------------------------- tidyverse_conflicts() --\n",
- "\u001b[31mx\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mfilter()\u001b[39m masks \u001b[34mstats\u001b[39m::filter()\n",
- "\u001b[31mx\u001b[39m \u001b[34mdplyr\u001b[39m::\u001b[32mlag()\u001b[39m masks \u001b[34mstats\u001b[39m::lag()\n",
- "\n",
- "\n",
- "Attaching package: 'lubridate'\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:base':\n",
- "\n",
- " date, intersect, setdiff, union\n",
- "\n",
- "\n",
- "\n",
- "Attaching package: 'zoo'\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:base':\n",
- "\n",
- " as.Date, as.Date.numeric\n",
- "\n",
- "\n",
- "\n",
- "Attaching package: 'xts'\n",
- "\n",
- "\n",
- "The following objects are masked from 'package:dplyr':\n",
- "\n",
- " first, last\n",
- "\n",
- "\n"
- ]
- }
- ],
- "source": [
- "library(dplyr)\n",
- "library(tidyverse)\n",
- "library('lubridate')\n",
- "library('zoo')\n",
- "library('xts')"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "d786e051",
- "metadata": {},
- "source": [
- "## Series"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "f659f553",
- "metadata": {},
- "outputs": [],
- "source": [
- "a<- 1:9"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "9acc193d",
- "metadata": {},
- "outputs": [],
- "source": [
- "b = c(\"I\",\"like\",\"to\",\"use\",\"Python\",\"and\",\"Pandas\",\"very\",\"much\")"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "f577ec14",
- "metadata": {},
- "outputs": [],
- "source": [
- "a1 = length(a)\n",
- "b1 = length(b)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "31e069a0",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " a\n",
- "1 1\n",
- "2 2\n",
- "3 3\n",
- "4 4\n",
- "5 5\n",
- "6 6\n",
- "7 7\n",
- "8 8\n",
- "9 9\n"
- ]
- }
- ],
- "source": [
- "a = data.frame(a,row.names = c(1:a1))\n",
- "print(a)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "29ce166e",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- " b\n",
- "1 I\n",
- "2 like\n",
- "3 to\n",
- "4 use\n",
- "5 Python\n",
- "6 and\n",
- "7 Pandas\n",
- "8 very\n",
- "9 much\n"
- ]
- }
- ],
- "source": [
- "b = data.frame(b,row.names = c(1:b1))\n",
- "print(b)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "eeb683c7",
- "metadata": {},
- "outputs": [],
- "source": [
- "library('ggplot2')"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 39,
- "id": "e7788ca1",
- "metadata": {},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "[1] \"length of index is 366\"\n"
- ]
- },
- {
- "data": {
- "image/png": "<base64-encoded PNG plot data omitted>"
Q/sBGgfeJo4JI6Xq9yQAzYLXiE0ZnUxySJzsDWj9Lc3vpzibJV4X0OqKvSOOHQIITU6y\nyRpw9nhAJyEk2R7qQVWa7gZosseTtMaBt4ljwkip+j0JQLPgNWJTRieTHBInGwFa5EipnDI+\n5/D7Kc5m1SdtszMOkOKmZu2AJtsQxLFDAKHJJK2qnMjZMYDGv5/B5NfEGcOVejE0VJwcQBF8\nMYPW0RiAbtIPGaz9eGdA404hv5/lrKTgLQCNT8UgMx0CCE0maVXlRM4OATT+0IzK15KdTkyG\nK/ViaChj5AAKw45fg67w0gQlJiv0KTcCmvkQYn1uR53nWEtedJCbnkgHB6BZ8Bqxs1R1xqJ8\nss1cRT55Y1RqbzqDLim4CGgRfqMC2uwrg2OHAEKTSVpVOZEzALSJMOIljBQGFXUgp6pmjMwZ\nZRQYLpGkTiljZL+kYYfv4qhNaPXPXtWhCQnMI13tdbcD0FwRayzu2sQxya9U/Z4EoFXczRKS\nrZ9UbJAcEidrAG05ZLvw+Rp0ScElQEv0DQpo+8sMSBiHAEKTSVpVOZGznQGdyjcj/tnLXeF2\n4m2gDBN1akM36X5ZB4lESaRfalBFEGwL6MlBtPMQBjyZQVEA+nqAVowqUmVskBwSJ3sDev0u\njpKCC4BW6BsU0PanGZAwDgGEJsIfUTmRs30BrX5EQb3ruMLrxNtAiSR1akM36X5ZB4lESapH\nSB3UtCGg9YI4dh7CwIhKutrL1E0ATcNIi7CJo7uNqn5PAtAq7t5HilGzVBUbZFTEye6A1mLd\nfpazkoJ1QOvJ6aiANj/NgGOHAEIT4Y+onMjZroCeP/u3XoOWSFKnFDGyX9ZBIlGS6hFSBzU9\nYAZNw0iLsImju42qfk8C0Crufo+AUUXqQDNoLdbtZzkrGeoCWqIiQTNlVqbdN1azPDfZq/2w\nZheH53uIeXNtRECXkGvcxYHO4TjTntanFDGyX9ZBIlGS6hFSBzWdvAZt7itFAeirAdqbQavY\nIKMiTi4MaDU5/QjQkIzoACluanahfdAmwoiX6BHUft+eve4EV80VtBNvAyWS1ClFjOyXdZBI\nlFREor1miE/fxWHuK0UB6GsB+jWkdA1axgYZFXGCyQxyDJhYxOaTAC0npwMD2moWxw4BhCaw\nCyoncrYroEvIOcG14AqOM+1pfUoRI/oFUxQhRI5hyvhqh3hrQBsZiRzJeDP3laIA9EUArRya\nbH0vAcGAt23mMvJJgYlFbN4S0KKTi4DOAehEzvYF9Ox0J7gWXOHgTE07tOMpYkq/cJFPCJFj\nmDK+2iHuAzTrkCMjkSMZb+a+VhSAvh6gUazVREZFnFhAG9crMLGIzacBGvW6PUflaLWWJQQY\nGOXHA9r6hI0g0eF0QkULxDi6KMmLr4k32+73FiLHkOSMGeKtAK0Nop03nfoVZe5rRQHoALR1\nvR5rFrE5AI0anMyFq4lUMh4Hu6ByImdPAjTfj/0WIseQ5IwZ4o0ArZfpeedNp35FmftaUQD6\nboDGysqAt214GV2vx5pFrOqCSCGt2eklT+85W8xd5srpOtzAuPOt1nWkIw2MMgE0BK+uTzSL\nY4cAQhPYBZUTOXsSoNVzchxBOTQkZ8wQbwNo8ZmBMkjgylrbAVoHqxbKwkj3xjgMuq3uTtcf\nCmgbLknfAbE2HcioiJObADphz4EeqBys1nWUI/Vgzc0C0HnLfdC5G9DyOTmOoBwakjN65OaQ\nMxlnTFUxTUbl/Bm0DlYtlIWR7o2wRQ2D6Yn0YAAagxfdrbhSrpFREScBaFtHOVIP1twsAD0K\noOWTYnVwvTVoe18ragI0dJF6jzUWvfk9QFcl0hPpwQA0Bi+6W3GlXCOjIk42AbT96Bcpl9QN\nUlh6z9li7vJ8T9hzDRyjHByv6yhH6sGamwWgc9JQZfKJPOYVES0Q4+iiJC9ihJkRlENDckaP\n3BxyJu
OMqSqmaYccGZgmWMvc14qOAvRvF9FVifREevBxgE5qWVUNAAk2WUX7kYyKOAlA2zrK\nkXqw5mYB6JRPBzSqxxGUQ0NyRo/cHHIm44ypKqbpMDsyME2wlrmvFQWgBwK0es7gQkMGpapS\nVJFREScBaFvH8bWCw20BjQ4m6qcWAWh3mB0ZmCZYy9zXivYHtNiEha5KpCfSgw8DdPmTB9AG\nyFOGEH/cLRvAqIiT3QBtNHNe2RslQwPQxC6ozMQdDWj8LV3VFTTYhY1l+LM+UqpfUlVumIMA\nNAkQanCpXL62I6DpMEoPPgvQaS2g7Z9Hkg1gVMTJgwGNT5fsmRaihmBgQJs/nrczoOHP9S24\ngnklCRvL8Gd9pFSrjjoWHgNo7LyyBGVgmmAtc18mz/6AltsDA9CFAHSE2RLHHJkgJhegWz+S\nUREnlwM0pKzOYw89RjlYrevMZ1qIGoJxAa15ySLM81Jxis1qezABWv3BLmInzWzsRj4S0DIi\nQdNqQJvOK0tQBqYJ1Npkm53Kuh5AT2Fk5ZieSA8+DNDy24ZqA+TRziXpgEEMJ1cHNHOMsJ8Y\nyC7qvC0Jo4QoTcMCGnjJIszzUnGKdbA9eANa/ZSP2UkzG7uRhwB0TmsBbTuvLEEZmCbQZfzL\nZ2iqNxaPEQAAIABJREFUD2jTu7m91e5bGzPoPHdLEsDJZPDQ+xXIo2IlZ+tHNiri5GKA1iYp\nc62PZdRSJZjQr2PtSD1Ys6a9AC0bYqdYZqE4RAaLMOIl+UIdbJrlG86guwHNs8LKwDTR1kpJ\n0LWXqXsDerw16H/++f7+368//xgR0Ojp9yuQRwTlPDzKj2xUxMlwgK7/uVFtkjLX+lhGLWiH\nizKy4auIHqxZ06iA7pxByxfqYNMsX3cNWkak0nTqDFrLgq69TN0d0JVdHHQYpQf3APQ/v76+\n//Pn6+tridD9ul+mW3hIAvBMRk+/X4E8Nm3Aj2xUxMmZgMY/cfO6Nn/HOgXQZjFfD9asaVhA\n961ByxfqYNMsX3cXh4xIpWk9oLdcgx5gBi0dTOQYWdKDewD6b1//+9e/f/77608Amog1wYhy\nitaSGqlU1gbalnqVfb42PaVoBLTMaIIeNFBazU4poHXk/xyMC+iuXRzyBR3Mm+UCaDSs6grm\nlYSBWkygxsxTd8/C0pNZgAwGbZkcx1N2ccyGNK1Bk6AhvZvbW+1ea50zQwD6rwn0v77+9vse\ngLZiMRgpdVSqCOgmNNCgSDyR0NfSu3UjoEWUEPSAgcpqesqWOHTk/xwMDGi8ZiKMeEm+oFV2\nAOeKPOWrrmBeSRioxQQc7TzbmMcANC+OjESOiiFtuzhI0JDeze2tdq+1zpk6oEns7wHoP1//\n+Z+vf/+sQgegiVgMRkodnSoFugkNxCyel9ySufrW2wJoWZ+iRxuorean5CGhjvyfgz0BLSQr\nOSyzmDi4ZiKMeEm+oFV2AOeKPOWrrmBeSRioCX2gW98V0I6AucbjAP2Pr68fNn99/X10QLOU\nVXeyGB41JmxUxMmmgJbQTWigQRGZQee1a9Bqxk3Ro3NZW+2cKkcmuP/WdAtAJ/qCVtkBnCvy\nlK+64pu4JeUcgPYFzDUeB+jvv3/9+ddfE+klPgegsx0jpXXS0j6DznQNehKVZDxqbTJA0y6A\nVgIS3H9rCkAHoL0h9mUkclQM8QTMNZ4H6NbSr/tlOutdIYAzzMm86pRVd7IYHjUmbFRkvlQU\nYjBS6qhUIWvQNbxgD5TUFTNocIyw3xiorPZOpYAE99+a7gNo00u0ytJorshTvuqKALRtSaIF\nuvaqcSigIRRtepUWv2cB6Awpq+5kMTxqTNioyHypKMRgpNRRqZILdC0tqmFl77SvQc/XQZns\nCYt/4jpdI6lbInYD0C9tPOWrrghA25YkWqBrrxoPBPQ//7+vr+///ncAminEYKQhq1JF64X6\n1bCyd7bc
ZheApiboQJxe0CpLo9nlPOWrrghA25YkWqBrrxr7A1oe6VC06VUM/z3bA9D/97ev\nv8r319f/BqCJWAxGGrIqVbReqF8NK3unEdAqo7Uy2RMW/8R1ukZSt2ZxPwdbAbpmGHaKORDE\n/da3Kqxx4gB7SR2s1M8u5ylfdcUOgLYWlp5AHCasrjQFoOWRDkWbXsXw37M9AP0/X3//2QP9\n/77+OwBNxGIw0pDVqaL0Qv1qWNk7AWjsFHMgiHtRBpIyAJ1lX1V1pSkALY90KNr0Kob/nu30\nQ5X53ymAVqGJJZlXnbLqjhCox4SNisyXikIMRhqyOlWUXqhfDSt7JwCNnWIOBHEvykBS3hXQ\nxjLVsaTfkuyiki3H8QmApp3SksuAKDnGfBn7Aeg8O4PcEQL1mLAckvlSUYjBSEP2ddFeTRnr\nV8PK3glAY6eYA0HcizKQlPcBtHwETSxTHUvyTe25F9WFcS/X6XPrcswJpzgynGRg0WJ7OCag\n5+s7LnH8/et/AtBErE7uGwOa7seWQnTk/xx8Jy3faU17Vquls0LXYA4EcS/KQFLeBtBqEyex\nTFlYvJfKXzr+vZJUJam4BdCyP07mujKcZGDRYnv4QED/389fsvv5NeF/rgtoSDcY6wA09AIa\nvU7ILxqFA3T1t6YAdMoHAVqNU5IdQsuUhcV781/yTNOZrKQ0BaDLkQ5Fm16lxe/ZPtvs/vG3\nr6+//f3/Fvh8CqCTqQwhhdkxO1SNCcshmS/mss7a8QE9p6FRJnvCGDK5hvxNEOkAXf1dMQCd\nsgNo0iMZcMQtMgtKWmBr+EMCxDJlYfGemkHPlEf/vlynz63LA9Ckxe/ZLX+ookITSjKVIaQw\nO2aHqjFhOSTzxVzWWdsIaGt8MreqYWXvHAVoObuyUkVNldWPB/TvPxZcrEcy4IhbZBaUtDCt\nu2fQuaxBF8qjf1+u0+dOYBOFtkdMhpMMLFpsD9sArf9+QgC6ST/tXYmRaiYnOEnkmjiAsQ5A\nQy+I1QFo3UvsjM3r2TcsuFiPZMARt8gsKGlhW3evQefCrSNn0ObZhpMMLFpsD5sADX+B7MqA\n/pKlE9ytJb3/lbP3a/r5f3Lb6Mrv6vaaONCainClgytUYtP8L813U7kNjYzANJlquu0pNHeY\n4e/r5pp2jLQfe8Gs/g1qR6qoOYtL8l2qt61pz2q1pGHYKeZAEGc6y3oPAn+ayV5CZ1LCZsU3\n3hjhKXbadAMCVQ+/UDxZrCOHWli8B6OXhNlCvB7samAThbZHs7kJL5uWLFqkBLTTz6T3ZENU\nJtpJp0ByGRAtB0PRpN2npRfQ/R8Or88W9vFTpgrVqVaCk0SuiQP4MH7IDLpYK0XLnpBpnZqZ\nMU1FiKj+1naHGXTZtKb2oOnOqGdqYPfRM+jich05dmpY4kTHobQa/PtynT53ApsotD2aNRnz\nTUsWLVLCVGNxBi2X6YW5oJN0anqTRzoUIWNUi9+zSy9xaK+K6LJfKKGWDkUVUpgdswvVmLAc\nkvliLivs5LEALTNO18Weazftsg9ae4q3pj2r1dJZoWswB4K4F2UgKR1AzwsGKqkhIvWKANi9\nJaAVPRQqLGM+A7SsLsS/XKfPncAmCm2Pcj4e0OaPrAegm/SDV1WsUH+XWjoUVUhhdswuVGPC\nckjmi7mssJMD0HD/re0AQJtOMQeCuBdlICk5oEsuq0V4HZHwTA3s3gPQ4E8Tz7LbSBxlYdKi\nVNLJSkrT1QGNf2T96oD++0FLHNqrKlaov0stHYoqpDA7ZheqMSE5JI8D0E4nlEid4j8Hlwe0\n+DY8wgxa91KhwjImAO3oKy3mdkQ76dT0Jo+SuqW7olv8nu0B6L8ftQYNQV9eBcj4CGXwN5An\no9NMQJMckscBaKcTSqRO8Z+DqwNaTI2HWIPWvVSosIwJQDv64BCG3rW6KJFHSd3SXdEtfs/2\n+Y/G/vu/v/
7zf/+9+58b1V5V3pz6mJDQyVSGkMLsmF2oxoTkkDwOQDudUCJ15P8cXBzQuB1Y\n9hI6Y/5mnLCMBBftkbgUgLYtWbRICVONxwH6r5nzP77+9f1/u/+5Ue1V5c0kEka1SqYyhBRm\nx+xCNSYkh+Sx6JpavSp0KMNKQ7YSx3CrGlb2TgAaazAHgrgXZSApDaDVzzVML7EzNq91C9Z3\nOBWXAtC2JYsWKWGq8URA/+vrnwf8NTvtVeXNly+SJXQylSGkMDtmF6oxITkkj0vX9PPfQoci\nn4ZsJY7hVjWs7J0jAU1SRNmvI//n4NKAzmpCMOuQEUksJs4lwUV7JC4FoG1LFi1SwlTjcYD+\n/77+33++/vb9v6cDeoAZNOygLHQo8mnIVuIYblXDyt4JQGMN5kAQ96KM6RgxLqmlC9lL7IzN\na92C9R21ldMAtG3JoqUMRjl7HKB/yPzfP88I9/5zo9qryptTH09eg8Yt7oUORT4N2Uocw61q\nWNk7FwF0kpUr8pRmvxbkmKpBHZjMWesuDiNER6SxmDiXBBftkbgUgLYtWbSUwShnjwP097/+\n9vNHob/+vsDnJ+ziSLHEwaTiy3R9C0Az10GOKTnUgcmcte6DNkIgIlEtcS4JLtojcSkAbVuy\naCmDUc6eB+jW0q/7bXqyg4PpQEfI+FsMJmbH7EI1JiSH5LFeg9YKIbnPBXQqFlkgiHtCtOwJ\noQbGLpWKL9N1BDSRz7MXE8Y3zHSWOjCZs/0BPWsiwUV7JC4dCej3G/hUVhfiX67T505gM5fY\nHpGMccaSRYuOq9fZ7oBWnkn2VjItiu0B6Awhhdkxu1CNCckheax2cYBCSO5TAV2WX0yAJmmt\nFC17QqiBsUul4st0fVxAp7wPoCGCUi7Lckw8WDCXgwE9t1ddhEpF0zcI44FNFJIekYxxxpJF\ni46r19mBgIb0PQvQ//zzsxD95x8BaFSV5n9FPg3Z30q7/8F+8WNkE6BJWqvMEkeEGhi7VCq+\nTNeHBrTxg+malKYjsB3Q7EfgTo/EpUcD2mnIokXH1evscYD+59fX939+/rNXS4Tu1/02nSQa\npgMdIeNvMZiYOLML1Zjw9JK2MbtmOS2AtjtQ7AiTll7HJ6kC0HLjrgnQJK2VomVPCDUwdqlU\nfJmuB6D1kPAeim5M5U6A5iMstBkZ1tJZNZPwdED/7et///r3z39//QlAg9g0/yvyacjOm7jx\nqglxGxe84+9rzTNokTQIpnJEqIGxa5Im6ZpJXg9A5+EAbVyRzwa0iQOwWJzR9H86oH9/qPK3\n03+owmMhmcoQUhAlKquLEJ5e0jZm1yxnGdD6Z2lSL9S3cZH49TyDptytrEHnALQ8OwzQ5ZsT\nEw8WzGVUQL/uDAro9+vjAP3n6z//8/Xvn1XoADSITfO/MiiOmUfMoHNlF0cOQMuz4wA9P3tg\n4sGCuTwD0DIsWOpZj9H0fzqg//H19cPm5Y3Q/brfppNEwyGkI2T8jSGFbjcBzdNL2sbsmuXs\nuQa9CtA5ZtA0sQyVDgQ0u0V7JC4RQP/WwB82JnWGpusuWDOS7Dn4VDaDbApAi6NkbyVoIXu4\n09+D/vOvvybSJ/9QhcdCMpVZSKHbTUDz9JK2MbtmOXvu4ghA21qQY8k2cLXM9Y8DNLWr7goO\naPunQfQZmvIyNeFlNDAAzcKHdOr9XmpA+p4F6NbSr/ttuvIqpgOPhWQqs5BCt8NYHwNoWwLQ\nvmdcI6wm6CzzvenyFQFN/rheYq0VLjTTtZYke47cSlip3AlAl2bSAQFoZ4QWQwrdDmMdgMZe\nCFsgdk3SJF0zyetPBzSrwa0EVzBAq+fMutu2i7ONmulaS5I9R2752RSAFs2kAwLQzggthhS6\nHcZ6bEA7STDdGxnQCW7VqYSauRFWE3SW+d50+WxAsz/6JS5tNYNGpmf0VOm59SlUmu8/A9Bu\nDgegp17NrwFo
RwIF9FTdIKBYK5XJnhBqYOyapEm6ZpLXE9yqUwk1cyOsJugs873pcgegdaih\ng2kEsRqvY/Znc8WVzdagu2fQyVSa7wegRTPpgAC06zV1z4YUjjmM9amA5tlljKMSjge0rZF0\nzSSvJ7hF5HuecY2wmsAu5nvT5XMBzbbEqwof7eJQuOhdg06m0nw/AC2aSQcEoF2vqXs2pHDM\nYawZoFUCXQjQxQADNa5M9sQ0yeAnUiPpmkleT3CLyPc84xphNYFdzPemy6sBrf8o3UeAZj9a\nAlc4gDb1kzpDU5K0m8W47Ln1qeqGTMcAdGkmHRCAdr2m7gk6mChhEZgJoHUCrQE0CELB+qqi\nGK+5OaAhGGVPTJOcASO2RrFOR77qWvJau57Rh6ShHgKWYVTLXP88QOfuGbStz+KZYYdamGTP\nrU9VN2Q6BqBLM+mAALTrNXVP0MFECYvAbAENU5znAlqDJAAtu5JsdeAMq/EW1rIGbebYtj6L\nZ4YdamFSPffFQTZdANA8soRBAAcWPl4OPxrQxlE4hI7X1L1ExwClMu6UMQhAv4UplASgZVeS\nrQ6cYTW0NG7kC9D44cjqs3hm2KEWJtVzXxxkUwBaNJMOCEC7XlP3Eh0DlKpUYSylGy5xkOuY\nk6YJfFAFoGVXkq0OnGE1uJXgit8N5PjhyOqzeGbYoRYm1XNfHGTTDQCd4AILHy+HA9BTr+bX\nk9aglW3MLpEmid02ZpqrOs9ozTMB3TuDTvBPJxJRbzumD0lDPQQsw6iWuf7hgKb1uZE/gLYf\njqw+i2eGHWphUj33xUE23RXQ8Hzfy+EA9NSr+fUEQOvvoI8FdOcadIJ/OpGIetsxfUga6iEw\ngeNpmeuPDmh8DsJpxeJZ46JiYVI998VBNt0U0G9nOw4XVhXzknZAANr1mrqX5ABBlLAIzAzQ\n6vhCgC5hhgiYxICPZU8YNWrI0RGqX1TXktfa9Yw+JA31EJjA8bTM9YcHdN54icOGZlI919yS\n4iCb7gno6eMQMplYVcxL2gEBaNdr6l6SAwRRwgI63w3QCZn1uvOuCD6WPaEZbW1RNYp1+kV1\nLXmtXc/oQ2aYGgITOJ6Wuf74gM7w7YXVZ/GscYEStEjRc80tKQ6y6ZaAtpvTvRwWqZ60AwLQ\nrtfUvSQHCKKEBXT+BNCIHpMF9FzqpdlljKMSDKDnKEN+5OIj8LH0SAUo0hZVpVinX1TX0Euo\n3ioyZrOuFBkuyZh/fw3ZGNBk4D8GtBuP4F0eQocA2hlDuVjAR1iIMzISvPuaVFy9Xz+bQZeO\neTksuq7C36ZqkhVfZ08DtHKaOIGQQu/C+N0H0JUZNGbhdEl6pAIUaYuqUqzTL6pr6CVUbxUZ\ns1lXigwTOJ6WuaOWMqz3oqGMkREAbZ2gawagqTrsaLFArXCYnerWqocD2mS2m6l4L8kBSqQi\njt8qQENiIHpsZrFzqZdmlzGOSlizBo1ZOF2SHqkARdqiqshhki+qa+glVG8VGbNZV4oMbOBq\nmTsagC6JAqGB4iCbbgpobbT5rae1KgCd1Qi4mYr3khygRCri+N0J0OXvRzvBDBmqPFIBCheq\nIlS/qK6hl1C9VWTMZl0pMrCBq2VOxSZAE7GlK8xbOoJ8f9aN7AG04UIAmqrDjoKAd3zYv5Zi\nrQpAZzUCbqbivSQHKJGKOH63ArRuRuwG6EiP+EBxhKoI1S+qa+glVG8VGbNZV4oMbOBqeSdg\nALoICUDrWHp1IGbQqB9DCjPbzVS8l6SrE6mI4/dYQJdTF1HMFtuevKiuoZeKAid9wSFs2NVQ\nJrjpank1TQHoIiQArWPp3YNYgwb9GFKY2W6m4r0kXZ1IRRy/RwCark0n7REfKJ5QPUzyRXUN\nvVQUOOkLDmHDroYywU1Xy6tpWg1o/Vc7nw7opNxBAkVcdkZY2I0yErz7mlRcvV+3AfSaXRxJ\nXiDJKK4HoHPWbk
ukIo7f9oD27MSrimK85kaA5rs7kvaIDxQuVHNCv6iuoZeKAid9wSHMnWoo\nE9x0tbyaprWALhOq0hXmLe1c3591I0cHdNLuIIEiLjsjLOxGGQnefU0qrt6vGwGa68oS3OK/\nLyflmGQMQOt7ym2JVMTxewCgnf3RSXvEBwoVCpzQL6pr6KWiwElfcAhzpxrKBDddLa+maSWg\nxZJk6QrzFjjd6dOSkYMDOoE7SKCIy84IC7tRRoJ3ZpURMJu4L6DVr++TbhOAdsGX4ARCCscc\nxu/6gLYxDb7cGdAwQAn+GS9l3Yj1TB0xdyrFCW66Wt6I+f2hStJVSO+n0RRIKl1h3lLXbg7o\nZOvObcRlZ4SF3SjDBDOxygiYTdwV0OrhYQDaDmHtySrAScWjHnMYv7MAXeLYDwsz6nivCdDe\nD1iS9ogPFGmw316+JHKVeMXzjD5i7lSKE9x0tcxR9C0zbcm6mEFLRY+dQcP2uwC0HcJ7ArqC\nFzPqeK8N0OJvdelOqpz0gSIN1lXAo0me4dUKAk3P1BFzp1IMHsQ/YoHi5mSTVWrW9axB14KA\nZb84HRvQsAbtxQlPBWPRhQCdYwYtLmBm1xiCjoGQwjGvYQpFv21jdok0sdlB7YRyIKD5nyE9\nHdBe9kKeMHcqxYAd/KM3KC6xaSBRItpBqBnbDf6qQUCyXwWclWa0gRNICFL94KlSUXNLegyy\nCXdxeHHCU8FYdDqgbRBTu991Yg16KpjZNYagYyCkcMxrmELRb9uYXSJNbHZQO6EcCej5vtvz\n2wBa/8qApN36GbTRbWw3+KsGAcl+FXBWmtEGTiAhSPXrWBM919ySHoNswn3QXpzwVDAWDQTo\npPvNW4saSWsKQLsMQcckvKbfa5hC0W/bmF0iTWx2UDuh3BnQiiHoJWoM1816D2wCD8LvdBE3\n726sWoO2143tBn/VIPCG+V0C0LYlGw8VV+/X73Kd1caOyiBOut+0tbyStFUBaJch6JiE1/R7\nDVMo+m0bs0ukic0OaieUzQGNaUVa+D3fD9AJvUSN4bpZ74FN6MGmGfTSLg4KBFWZeSvZS/TU\nG+Z3mQDtBkYyTiAhSPVrkaLnmlsYSVJRADqr+sVeOhA5AG0ck/Cafq9hCkW/bWN2iTSx2UHt\nhNIOaC7gIoCeH1B6ramd8oi5UykG7CyvQS/ug3bdPldm3kr2Ej11hnkqRwE6qTcdDLqWVBSA\nzqp+sZcORA5AG8ckvKbfa5hC0W/bmF2SQdCskpvyMsULNvVI8bo3PqDLFj+vNbVTHjF3KsXg\nwYZdHAHofC9ATw98A9ABaJRzF0BX+TIbbK0TcapeXqwsP5KxXvGyF4lK7irF6EE4R3FpJ0Dz\nP1BJTp1hnsq3FWe0gRNICFL9SqQCnIoijCSpaEhAz7+c2QnQTi+lVQFoE3TJ1NX1zVjDkMj6\nGS8q25hdIk1sdlA7M7lM8KLreKR43Rsd0KfMoPEcxaV9AK2WVqpB4AzzVALQtiUbjyJgfu4Q\ngH4GoJWeAPR0yWaI4iS8TM45eg3anKO4tAug9cPJnJKpQY/fF+SlALRtycZjFlB27vQDGgZ1\nOYVT1laNA+g/f5Xp/c+NAO0NTwB6umQzRHESXuZ39BI1husmuZbglXgQzlFcUoB2raPmzXEI\nRpntfTCfpgwSF+SlDkDrPSleh8AzSb1BMKhaovGIgP5wBs0GdTmFU9ZWDQPoP9PLH3W5X3fO\nDqAnDyofQp35XTgm4TX97mHKG552QKu/2GDtzOQywYuu45Hide8SgOYm8eQ2HTGmJHglHoRz\nIm5zQOds+eyueDjDPJX1gIZd3V6HwDNJvUEwqFqi8XmAJl4rAj5Zg2aDupzCSfwrhijf4PUn\nA5qGFDqrBLSSYkNC6WkG9JyTy6MrRbl5qDtD2gegjRxsgLGR51RJ+hoZM2YeyeX5suEz447t\nkFG/GtD4u0ivQ+CZ
pN4gGFQt0XhMQH+0i4MOKkcNXJK2DwPoidKazw8ENIS4mDXR3DAFEoPW\n3BbQxsc6J5OualvjtQC0FiwUHbnEYf6yiNch8IwCnIoimk3vimMCenoNQBdAz0vQ//VTmprV\nSvrrf/Ls+32e5mNd43s+S+JY1i8Xk3xL36Ap6Vrf36DH6JzFJfn2Lra5I20SlSo10zeR5xmO\nPqg1UWfJGEJ1Jrw2CdAOV/2axRqTUCHVrYdR3FSK0YNwzsUlfY2MGTNvjkPfqNdhSqwGHs5X\nSOXkNkmqJ++/L6e7RTsEVqjoVX6h2VR0yrxy48QoNNVQvuqA02VHgDHRqY0dLf3BQXVHT16S\ntifQ613/tDQA+gXmeanjVfo/HHK+0wzaNHdnETBzoTVjBm1MSfBKPAjnRNwuM2jtXD8IrNwb\nrkETl2A1lK864HTZETCrixn0m9HwHoDOIk383GSiaql7A0BXiOFlLyEq3lSK0YNwTsR9AuhE\nbF+kUXWU86eAHnAXx7JLhHrMfNuSaVIj934NQAegMQnkLg7S3A1SSAxaMwBtTEnwSjwI50Tc\nBQCd/CZJ9dMJdRqESmRSbxAMqpZUFIAul5Tt8O259ENGXW9ZAehpaSOWOAygpzo0N0yBxKA1\nhwK0uRaAZkbxUhvlvB2gAQsVC3cENFdIqwWg20onoMVOjn7dOV8c0JgKtR+REVG11A1AG7WY\n/sSDcK7FpU9/qJKI7Ys0qo5yHh3Q8iUAXS4p20cBtPol4Q0AnfH5Sob7KwFd32FFRNVSNwBt\n1GL6Ew/CuRL3Gp0AdA5Az6+zu2sed7QLZ+gew0j8vNztb3FMHVY+hDq6rq6vAqm8GRTZkFD3\n1wF64TcKRFQtdauATjkATT0I58ov7y1pkEgBaBVFJpLEyw6AVkFps9HTpEbu/RqADkALcUm1\nn8QFoAVDasTwshcGwmehNRwbQGyIH3Uk3YLwzHFBIrYv0qg6yjkAPV/En0RWNKmRe78GoK8P\naJSDtTKMR88a9CLrMhlhGlUeKfKGgGbMcehExWmHq34NBmgxg1ZeC0Arf5hIEi+7Atr+JLKi\nSY3c+zUAHYA2rjfcOughYToY0Mvb7JTrtXM+ADQzTKlDD9b+k1fzGrTyWgBa+cNEknjZE9By\nfZB3mQso6s4EtK6KlMgB6Az1Pd/tDWieG6aYEaZR5ZEiHw5oszslXxHQ8y4O5bUAtPKHiSTx\nEjNopd3gwxmJn5cAtK7v+S4AjWeEOdYUs3STLwno3xKAzkMA2vjyymvQuipSIj8a0AIdhwPa\njQXnVF2uRUVWnbHt05GAJrtT8jKg0xmATiCAjGsAOlcAzWrNL/sC+jK7OOZ1GAcyOBI/L88F\ntERHADqTSb22GM8Ic6wpHTNoMyxagZe9OBDEMKUOPHgWoCswqo9yDkBja6/LpqYFdD0pVUeL\nFXkloMsfFnYggyPx8/JYQKvJXQA67wLojjVoOyy+MVw3cTCmv/VgAJq7G61I6s0STN+eXs4B\nNIsWNXLv14MALfZqOpDBkfh5eRigi2cUoYVDHN9dB9DJJ0U+HNCrd3GwYXGN4bqJgzH9rQcD\n0NzdaEVSb5Zg+vb0cltAL3hcnU6RnbQzmP0BaDqD1hmW4M2RYy+9bWN2YWwwifRUXbZ4gSoe\nKfLxgPbEid99zNa+4v2ugDb1kX60ITmcr8hrowNa+ZN0GkfIKWXwaOCZAeFp1gpoHerawNmS\nmsf1ecygc+lx8aEHVrsGrTMswZsnxx7lALRf3p+d01wiC2vfEm+6xGHqI/1oQ3I4X5HX1gFa\nuwhj0zfxXoCejD8I0LEGLRKkAdB2F4fOsARvrhxzlAPQfkl5grDcvCr7VXlIyFLO6CYOxvQ5\nAMPRAAAgAElEQVS3HgxAc3ejFQC44g9Wa34ZB9DSTZPxRwE6dnHk0uPiQx+sMmiSuZjgzZeD\nRzkA7ZcfAbjSDP1Ksm
7NGK6bOBjT33owAM3djVYk9Sb8wWrNLwFofd2BDI7Ez8tdAV3OfbDK\noEnmYoI3Xw4e5YcA2qip0kY2gwcAtF9GNb9idRMHY/qzuElwbgQHoCF6FbdsLfEyNKCLNNr9\nDB0tVryu1zzu9cGBDI7Ez0sAOmflEMd3AWg8+wTQWSxBFzEmdQPQ7HC+Iq8FoEnLAHQAmiTy\n2zZqVw5Av/ugg570y6jmV6xu4mBM/8sBmiJNXuwBtMUDJZRFb3ktEmwt8RKA1tcdyOBI/LwE\noHNWDnF8F4DGs88ADUFP+mVU8ytWN3Ewpn8AOts36m4tJwAtrXhdr3nc64MDGRyJn5cAdM7K\nIY7vAtB4FoBmY8bMS0yppR9taI7EJXkxAE1aBqAD0E4OfQroKoYwHUwVjxQ5AM2iAq9gbLxL\nADoHoKUVr+s1j3t9cCCDI/HzEoDOWTokOb4LQONZANrygLsgMaWWfrShORKX5MVFQGeDTYIH\nSiiLXjaEtpZ4CUDr6w5kcCR+XgLQOQuHqF8X6jdfDh7lHkA78qEOARmJKo8UwwIaL7k2BaDF\nJXlxArTfJgAtJKiOJU+xdYwMoQD0kv7tAY3bc+WbLweP8l6AJr8VvSyg5YFJeNemALS4dDyg\n9X9DLwAdgK7q3wvQarQ2ALQckk8Azf7aSgDa6D4M0CyfA9APAzQGknemrgeg1wL67fh9ZtBy\nSD4ANP17hbZuANrYkuR/UjQAHYDWHQtADwNo/QjQAnqfNWg5JP2ATgHoTkBPn7rPArTjugB0\nAHpYQAN+CaB32cUhh2T7JQ4blS4pngnoJPyWLwhoR+wBgE7GxJsCOtmeQgVj5mSJn4mVSHUg\ngwp/Xp4EaFjAkCFFUkEHywiA5g8JbVS6pHgkoNPFAS1Mh5vy4h6AVl88pkuiRZISZDahptEB\nnUhPtVitIJV/AeiK/pWAxkeAMqQSChgS0HSbnY1KlxSPBDTmn2UlXtH9mo/OAbT8TxvAzS5A\nmyzxY1x9rk3XRIsNAY1yvbIVoNVYJdZTLVYrSOXfJoDGZdX8TEBffwZtxZG6AWi0ZRr1jQFN\n83lrQKtZBdzcF9DykYe4KFpsB2g7Qk7ZA9D4FYuJpYD+fdsA0GZjQn4ooGENOgCtqo0A6AQh\nsdkuDrhxGUCn8wB94AyafMdxyh6A7l7i2AjQ+nu9VHgzQMto4oDWjwDvC2iWBO+bPqA9nTcA\ntLq0K6CpdR8AesreMwB92Bo0e0rglP0AzRSvBXRbNsAgyY/gWwNaRbIDaO0ZcTkAHYCGcyN4\na0Bb/5GGmX/5NuL2APRhuzg+nkGTaKYdUhJkv90BBTk7APopM2j9XTAA7eI2AH0pQGf65duI\n2wXQJY3mC7sA+uM1aBLNvEMeoGlz4pg9AH3nNWjx/DMALU4D0MywawI6nzeDLmk0X9gH0J/u\n4iDRzDs0JKBvvItDLioHoMvpbQBdN4ZXDUA71g0MaGXz4wAN4u8J6EetQdfqBqBt0wB0ANpI\nsP0OQO8J6JZdHMozmwGahFUA2isBaEcPb5gD0FjhTEBLsFjvtGXDYwENEXMYoNkenQC0V4SA\nAHQroFmtEQAtwBKArp7o608FNLo6HQFouss9AO2VgwDNvnoGoEFrAFpXC0DfENDsB7EBaL8c\nA2j48d16QBPQnQZo/iPPALRpaB3cBmjkI0jNuwPaGYn7AVomfQ5A07ZPADT+PHpYQNc9NiCg\nwbByOQBdPdHXnwxoyMWDAI3DsQRoknIB6K0Abf+ARQC6AuiENcCAAHQAuk9/tiNifpK0D6Dl\n3/Sgf2hlAdDsz98EoDebQQegqXVLgGa/CwJBAWiQUUmAAHQ2w2R/1L8LoEXyp6nxGkDjN3Bt\nmpVmagagiVxxtOESR7kcgA5Al04FoJf1Z/TICkAjcee7DYDGnyyuBbSZ4IFpRhqUADSVK486\nHhKSJNRyv5kkMOEZgJ7T
JQBdSYAAdDbD1L7EkbsBrVc4ewAdM+jX0WzCPGhnbrMjSajlBqBn\n387jRXuyI6BzEv83Da2DA9BDATorbq4AtI1dH9D54xl0xgkemGb7hTXvBejytefMH6qQJNRy\nTgM0i5YTAS3Hi/bkYoCWD5SYOVVAf/ZTb/xouD2g27fZkRyFcfcBDWvQNoxjF4dfPECn/QEt\ns+tqgGbxcg6gkxwv2pNrARqSmUjNPqDpUpojTXdBq34MoKeyL6DhQ7cD0K7J7MTWvBOgy5LR\ncYDOJq3GBTR/YnEeoAeeQXtdxpoS0Ph12EqtAJo/jGYnpgta9dMAXbq9D6B1wwB0XYIj7sQZ\n9JUAPdoMOi+vQSdoPDCgzQMlK9UHtPnsXAVo2fphgBbdhqHcB9CGVAFot5y4Bn1lQLObZwF6\ncRfHhQCdPwF0zKDr+rPjBJHxAWirZThA77GL406AbtvFgdHeA2hqAc8fbQE8oil1LaArY3gG\noOXn32pA330N+vOS2LV3+b2bdJ0kjux1XTnJBomqet1PybFEVFqSAwJq0lBGwrum16xy0qe+\nTnSU7km71VZAgivWN0Zgi/tsz4qC+QU7ntA1STdlwom11Lo0qbCy6h7jauabVBwMsK0vRg3l\nW004xODVRA5n5yb5kmgzpq4llYznvAFxeyS9MAeDF36qoyDDT4BKpGIK6AGZFbbkU1O5zAza\n/RwnM2icSN1yBu1+/HvaNpxBl2lDOThrBq26MtYM2u9wzKAXGrZts6u5zMiBGbTTgErTXaAN\nR5tB9+vOuWsNOgAdgA5APx3QU6emm9/iHpEagO7Vn10nqKcYAWh9NwBdB7RuH4BuArSqFIBm\n0sAC1vAxgH6XADTT8gxAs64FoPM9AG1k9AB6vheADkBfANB0L5fRNiygiVhz7+6AlvXJlrnS\n8wD0YIA2QReADkDrts5u2x0BXfZBl1v7AFrB9hmAtqP5Hq/3T4JoN8YA9ELsnwBo/HMdAeg2\n/XlhMAPQVAsFtPN7YqMNAe2nr1PKfHnepB+Adko/oMlo/t6ef1RPuxGAJub8HgSge/TnhcEM\nQFMtDNDeH3ww2jYDtPiZawDaKb2Atr9BftVPcANdsRGgpXsC0J4FrGUAWhzykHoooL0/+GC0\nbQVoQYoAtFd6AU2H05lBy3oBaGLO70EAukd/XhjMawG6HXW7rUE3aNt4Bh2ArpV+QDevQSuS\nB6CJOb8HAege/XlhMAPQVIm7i6NF27Zr0BqWAWjbsBfQTbs4cC0kAE3M+T0IQPfozwuDGYCm\nSrx90E3atgP0vK8vAO2VTwDt0kp3NWbQAegAdACaigM+B6BJw70BvW4N2jx29G5o6wPQxALW\nMgAtDjcDtNGSA9Ct4hQgAtC24e6AXrOLw/xxzQC004BJAwtYywC0OLwuoKnt6tRN/TQaoPUS\naADa1Ngd0BLAdQNwa0gAmtrvSQMLSEuhJQB9V0BPVnNU/CTYt664VLyvtZsAWj2kCkCTluMA\n2myuDkBT+z1pYAFpGYDeCdDW52MC+pVgIwH6UTPo1vra8GEAbX5t2gDo99G9AJ082W3ZEICe\nyyMAbX4wlr0YYD8jq+qxyjYH9JPWoDcGtLpxBKDx77XcDtBYPQD9UbkjoP0dqaqm6gc+uclO\nDOQRZ9A6+APQpkZDh9/G7Q/ojKEWgLbSPWFoAWkYgB4d0GI5tlbk10z75CY7MfCuPdQaNN4K\nQGON3QHN08MxwImFADQREIDOC4N5OUAX2LYC2jy4qQNa7OIIQMvbAWjWZNGAALSV7glDC0jD\nAPTYgJZ/QMirhLabv4pTB7SwLQAtbt8P0K7rTgF04s2YumcCWns6AD0koDtm0Nn8VZwA9IUA\nveSwADSpEYBuLAHouckCoC3WPFGr16BN3t8J0OYvsgWg4aY0LgB9DUDzpgHoDIcEcqcDumMX\nB9a9EaBx9WaptT4IQHNtAe
gA9NUALRYWTga0m1665iMAbZ5/LrXWBwForu3qgKZ43BvQ6lIA\nulE/iXldGgEtH80FoLld5eQoQMPvahpa64MANNcWgB4R0HprwF0Abb8C69IG6AIClWH6aV0A\nejo5CtAxgw5AuzrvBuiZP6rG1QFNEliXT2bQsN/tqDVoJ9Sg5iMAvesaNLQMQNM2AWi8vg+g\n3wS6GaDZV2Co0QZotgYtJtVCFFOxB6CXlm6eAeg9d3FAy50BzQc1AM3U3QXQfjdMU7Oadw9A\nbzaDnk/KqM0eOwXQi0s3DwH0ytb6IADtaEu6cwHoLkDzSVQfoG86g95qDVrdnkP3xBn08gdP\nAJrdgoNkbgWgwczPAK2e41jLAtBOecwatOOkUj4A9Hlr0HT3AtQMQLNbcLAW0Bpt00kA+nNA\n4zPZ2wLa/FEcWp6zi2OpfALoc3ZxxAz6g9ZwwAFNUrkCaDkQVwe08EIAehdAq7RdB2jt6QB0\nA6B1tMQa9HTyKaArldJogFYfld/iuhWSfa8HoJ8BaHzQ55UA9FSuCuhb7+K4DqBhsSkA7Yo8\nHdC85bGAhqXJAPRAgDa3PwW0V+ddMwDNboERHwM6xww6AA3XK4COGTTqX6wRgGYlAN0MaLsG\nXQe0FyMDA7o+fgFouF4D9Edr0MqiAHQLoFXjJUDD3SMBbWgTgAZr+gFtdnEEoJnIAHQ21wLQ\nAWjRUJ88FdBquDcBtKgZgHZFBqCtgAD0DQENqKjUdE+fDGjc7Y9H2wK6GbgB6LMA7ef4JQD9\n8xqAVvXPB/RCTsyGeacPBrT5vSwebQro9r/kFIAOQOtKi4Aug/gQQCeTlzcF9PzzmucB2v7F\nGTzaEtDk10RrAb20m7IP0KyHSlkAOuueB6BPB7T4c3XzvYEB3ZK3/LT8QD0AvSug2e/xA9BK\nzuGAJjuTXQkB6JEAPefSgv/Ac9cDtPgTT08FtJdV0/tGgH7IDLoeSWMBmqw5BaAvAegy27k5\noJOA1AmAbhG29xq0m1XT+0pAi4qbr0EHoIm2bkCzT8wA9CUAHTNoZtv9AJ31GO8M6M93cdwc\n0KLtAYBma06VJO8EtIiwAPSi/sUa1TVot76qci1An7sGfTqgNdCaAQ1daQV0s3FnANq2ujmg\n2Qx6c0DL72h88GzQeRaQls8FNNvF4dZXVS4G6AN3ceSmzzwrbURAEyaPAuhanosSgCZrTlVA\nl3vNgFZPOQLQi/oXaxRA27y8KaDn3o4K6FqtcwGdFKcD0A0WDAVo5mBXQg+g9TLKnoAWHgxA\nm/qySnJuYZ1NAb0IutsCenFMdwT0/JxC9es+gGZUvhegbcuNAD1PaI+aQT8V0AvBuCOgF+VI\ncwPQlRp7AXreAKOfJwegA9Bl7eSoNegAdBOgFY0uBeiqkM0AjXcHALT+TMWjCqCTLgHoJguI\nr8XZTQAtnz7uCGh1+XGANrO9IQHNY48qdU93BrRJypsAOu8PaMKPADQxcTBAw/69AHS7/sUa\nlwT0ctrWAD2PbAB6HaCnr6/7LXE8BtBiD8KlAT0br/aH7LUGfS6g//xV5PvdAI13bwxoOLoQ\noOWvDHTTJF+XHxI6NlwR0InVaLaAArrsc7gHoGH/3i0B/ef98mc+CUATWQHoPQGtnvDopklV\nkzX2BTSXoyoMDWjWskw4+wBdD/8aoG3DbQCNj0AD0I36F2tcEtDLfA5Ae3JrgNZ7pHRTBehM\n3n7KRQBtQx3q7wtosWTrAboqkPzcRNc4AdAgwBlNuHopQE+UHgfQNApGAPRCgE5KvdNmQLt5\nvKBtZEAnxRNtKfzKQDeVNL44oPUSOquvfLQ5oBtm0FV0kR9sQw3xpk35GNCg2BnNVEud2wD6\nv35KU7MPSvrrf9+vf+n7ffS+k1Ly6tNzvFWuf4NoWmtRzlTrFaBVYUZZssc1NW9dafLNUkGf\n6KPkVWwT13prruHWSe+7aOB0
MAH6fayaJllPNTbqKl7zY8S4yhFOKvj9xTuv3r/7SKPbCgU/\nrLOAufpb5FZpm/R9X6AcJEdlEm/aFNuKyZmgQGpjWPgmencdlzALqpe96OssDYB+PRwcZQbt\nfE6fP4OGPT1eRf+0cQYNm8lWaLvODBrn0mWJA1280QzamxvuMoOGLrxn0HP8nDeDVu5cO4Ne\njP8dZ9BGM3NHMfDmM+hTAe2FwQDb7I4BdC2PF7RdF9DzrwyMjy8IaOzCtMRRwcdhgLZtr7DE\nYaFAWweg2/Uv1jhsBm2kfbYGvZS4HwP6ujPoKi0WAD1lpknF6wHadGGLNWieEscC+sSHhC0z\naFHrVoAebhcHD4NeQFvef7SLoyVvvdNGQFfyeEHb9QG9ZgYt1Y0EaNOHvl0cWXuBfqk8GNAn\nbrPD/vufjZW7C62LBdXLjwf0hrs46MLxR/ug3QpFiHvaCuir7uLYAtAr1qCHBTT2oW8ftLSY\nL64dD+i6V87fxXHPbXbj/ZKQN+gCNIntzj+W1FYwjboA3ZAM1KwrA3q+jVMlAeWk2owL6GwD\nTpJjNaDJLGPJgtsB2g1119bK1WsBmpd+3b/6F2scBWj77XBPQBtl+wLa/TS4MKAdqZcCtL7z\nMaCd5zIB6KqtlauVXtbDRDQOQGMDHPZGQJtJ7Y6AtjOdXQHtfxoEoBut2wnQqnxnMfZc9AKg\n+XOZAHTV1srV1YBWn5bz2zMAneqJVRr0Ahrv9gB6+eHgq5Z9xpXN8WaArnwajAhoNg0JQGtt\nHqCp+QHoqq2VqwHoVkDneWK1I6DrtrUAumF73Vxvqxl0I5/5p8HogJ7rJqjjSxX3rwtoz8RF\nQK+24MKAllYEoAPQy4DmS4BuTW2DOd4K0BeeQRe7RwV0037KADTUGBvQwrV+ywD0u4wI6Cqf\nmwkNNpjjzQA92Bp0nRYS0MKdgwK6YbhHAHR1vOyRvh2AZlUC0O+yJaDXJeVqQK+YQZO25mg7\nQI+1i6MZ0PIDb0xAtwz41QE9f4cJQMsqAeippDKaowO6eQ2aNTVHGwJ6QZdIwZEAnTsBDfm1\nH6CbvjJ1A5qTyd5ezogmQPNuiO8wAWhZJQA9lSsBunEXB21pjo4G9G8qDgVoiQeoY5ueAGjy\nsJe1vASgnX6Ij8gjAE3a7Qhob+gC0EX/cpVLAbq/nA7oFQvohwG6pNCYgLYPe1nLKwDaGXzx\nHWENoJc+ssYAtDt4Aeiif7nKKIDOy3I+KWcDes0TzuMALYmk6timpwDanYTJljsA2vayasEi\noN3R75tBnwlo3aDWczfgA9BF/3KVADQpT5lBE0A7flkD6Bo0VwK6BY8XALQ7+n1r0FcAdHI/\nlALQQv9ylQA0KQ9Zg74AoBfL3oBejsbmNWjKqp5dHFcAdMygm/QvV5kS43NAr0xKF9A9TFws\n5wN6vF0cWTpDXApAl1427cRuALSzWON8ENwB0LEG3aJ/uUoAmpTvuc4WgBZv1we0PDLqbgfo\ntp3YLYDmle4M6NjF0aB/ucqngG4AawB6pbSxAa0bq8p3A3TjTuwmQC+0vSGgvbsB6KJ/uUoA\nmpSdAH2LNWhA9TaAnq4PA+h5yDabQS+01VruAWjXwMXW/vMsPHoKoKdxD0D/lF0AneqpTAWQ\nW+cCWgLrAYDebg16qe3lAJ2WLXENrOgWwqttA9Be/QD0al3bATptAGj22KAV0PorvwmY+wF6\nq10ci22vBuhXGASg+3X/6l+u8img4yHhsq7NAP2TFWcCGja3PgHQLRY8ENDvMAhA9+v+1b9c\n5UNAJ1yTXG7i2HY4oCtqBgV0229ddgT0A2fQLRZ8AGhH4+iArvwMZbl8AGhLmwA01teBZNYk\nF5t4tgWgiQB9uS0pasmNibcW0Hpz63UB7fUSzGm0YBNANzZrSNK0P6BPmkET2gSgsb7MZfuN\nd6mJb1sAmgiA6x/PoD8GtNrcGo
D2RaFE5/bZgKYdHXUNmtEmAI31kz6LGfSyrq0A/eka9LyS\nbOq2Axo13QnQax4NSgtqoKkLOhvQ7g/QtXF8F8fRgKbTwQA01odIutoadFMOjgroD3dxWEBr\nC2Ul2pJouhGg5VxjjSF+5cVvPCcD2vlK1rYP+qwZtG4cgMb6OOzLYA1Av1832AfdMKYBaF6W\nAK2mZ5sAevmZwZGAJsh2DBwV0Gw6GIDG+vXYaW8SgG4QULGtp3UAujTjs8l5t8oaE9w+nQpo\n8gnELSQ6BgU0mQ4GoLF+ALpZVwC6TU65fscZdFvTUwD9yRr0ini28kEUsatVSACa1l91KwA9\nCwtAjwvondag6y1PBrQ3rgmMCEAHoG8I6KSw8xhA1777ngto+YipSquNAL0gx98CdRigHR0N\ngE6OE5vk63dSIwA9lQA0KdsAes6/+eXCgPY0WUBXf8GwA6Bb23xLIC402wrQC+3cHxEEoBuF\npAA0rb/q1kMBXfLvUYCuPxlbiJFdAa2eAp4PaLqtN7MzvDMKoLt7Du+kRiOgf/33CEBPLwHo\nn7IFoNMzAb2wdaEeIx0J3w7oNBig9TPEVkA3xKSYc4G+VsOuAuiXBwPQtv6qW02AthtoNi2n\nzaDnC/cEtL7+4Qx66ZkaF7hyBi30VaQeAWjV3wC0qtEE6IZ9jAslAP17vQHQfqxuU04AdOnT\nYICWNbYH9MIadMWyrmxbuwYt9NWkHgJof1YSgG4SEoD26q+61QDoyre9bcoZgJ5VPAvQuRPQ\nffOhNYBu36ZxEKClkMoZ3hoF0J1wdNbdwYRFKW9Rj1jimJdKG/2y7tYioJPOzdsAWl24DKCp\ngDWArlHz1Bk06KtJDUAvA7pz+iofBFRMWDbyLewBDwnfLjsN0HedQasLAeiacUesQRt9FakB\n6EVAd64Ai1YbAPrn7faAnlx2IqAPWYMuOyp8q17lXEDXa90Y0Pvu4oCAq0sdGdBL6oaeQUuu\nbwHofH9AJ+Gz0wB9wC4O6OEZgG6SthD1Q+3iAFM/BXRHCUA7ovYCtAiZ/hWO7WbQ+f6AHmIG\n3aiiu8yf98MDemle8jGglXgLaK6dAxpNDUB3lysCuq/nm65Bo3EryzUAPcAadKuK/mKWzAZd\n4lhc2fsU0OAFbONorwCa5bFrfwDaE1I5g1vjALqvtOwGCEBP5f2lN+WHAFpc8CvvBOimr4Q7\nz6DBD5Aq7pMfJtTWDUB3lwcBukVEAHoqSR3eGNBZr3CcAOjGhyo7r0FvPoNmtg0P6IWBCEBr\nIwLQAegmFf3lhabzAL3ihxg77+LYeA2a2jY6oJcG4p6AbrclAB2APh7Q+VxA921LMmWwXRzc\ntsEBvfhReQNAO1/k2mwJQI8I6KZgD0A3qbK6szc5XVk22AdtKy+a5QFaX7gIoJe/ywSgdf0A\ndAC6SUV/mZYZyoXDAF025W+RylcAtE+/IQC9/F0mAJ3VMr0F9Ed9DkA3m66C6wmAnoUfBuh5\nD+M25QKArvBvDEAvfpd5OKCnr3tziwD0SYDWgdoQ7NXvhauun7LEITeYHQTot9IHAbq2gjAI\noJf0BaD1GAagzwE0bg1eHvvazGN0QGtw7AZoLdjdWdxbxgd01wx6yQDfsC5AL0k9GtAgYw9A\nt8eg2C7/brIxoJdlBKAzYcfi2NdRcxVAv6NjJ0Cjh4aaQXtXlj+YuTgO6J416AUD/NKMnZEB\nbYKmUrUT0CtmCbvPoAPQjaavnEEvTAavAuj3hQZAr5ifzS2Mh0Zag/auNHwwU3EOoDt2cdQN\nqJQ7ANoEzfaAXvM9bvc16BsBeufyMwji7Dv5Vb+n/ypy8iqtve7qWVm/VWj6lta73dCtVhrD\nXPR7ukunuAlrbqXFNt9Tp5i41b3aKkZEyz08u0roxxbYoKmO4ZI6NqZLuQv2vJtUxH/W50UZ\ny+I3HvYxZ9Dw6X/3JQ61DrfrDBq/
Xl56Bu1+cTKXGuJt5fXlcp8ZtPgqO8AMWpWYQZ8FaFXu\n/pBQX9gH0HSt79qAdpM7AL2+sidC87maZYesQesyHqCdTdory+0AfeltdscAmj0tvzigveS+\nL6BX2bPNLg7N59r31D5Ar9vFAWU4QHsL5CvL/QBdb77m+m0BzZpst8vurF0crEYAep0FbQoX\nH8V3AnqFLcMDWnkoAN3cfM31+wKaab84oHmF9YBevbtjudwE0M0/RL8FoF/NuwGtP8MC0M3N\n11wPQPeVbQH9UbqtBrQPngC0ErjDGvSKMjygYwbdVQLQvvbjAL3yKcGRgK5MDQPQrRLvA+ia\niFiDZuVRgK6p2RrQ7Q/Ql0sDBP2b3qVDAF1bXA1AN2sLQP9WKDUC0M3N11x/EKBXbEFdLi0Q\ndG+Sa+J1banmMdUfM2ivjATo+sCOAWhRAtDNzddcfw6gFx7Lryy1MU1VVXwvs3hdWeoTrWYL\n+g1wrHBKALq5BKAD0A8C9IEz6OoqAr3Tn25EWuzi6C4B6LoFlRKAbm6+5vqDAH3gGvQSn9mP\nHHOXy5m0/nANQDdrW7JtGpVdAf1plwPQPaYHoN/lwrs4KqsIm86gKe8D0N1lM0DPozI2oG0+\n1i2olAB0c/M11w8HNJ870rI5oDcs/dvs/DXo3hWOmEGvsqBWtgJ0+dwcGdC/JgagA9BS6EMA\nXSnuLo6uTvasQfvC+ls+DdAtfF6aoC7pwAvfePezLi8/NQ9AsxKAfpebApqVD+ZD63dxLJix\nkRVOeQSgD5hBfw7ohm1NAWhWAtDvEoDuK88E9Cbu2wrQ+69BHzGDXvWZF4Bubs4vB6C3LAFo\naBmAxgq9D36LCLyAgP50V9ICoNftegpANzenVx1nB6D7SgAaWgagnYqdlrCmAOjP9/VXHxKu\n/N1AALq5ObvoOfsagB6OzwFobBmAdip2WsKaakAvLyE3qXCXRFfKD0A3NyfXXGcHoPtKABpa\nBqCdip2WsKaHAjpm0F75DNBrfmCcDwe0NS8A/VMC0Fzoqm0cK+p65XqA7hf/VhFr0IcC2nPq\nIGvQAWheAtBU5ipCPA3Qm/zpgipsYhcHLZ8A2v9UHWMXRwCalwA0E7mO0E8EdL/wIsWR5yUA\nAAiISURBVOVjGa8SgG5puvp7TwC6rwSgO1uuQGAAGgoCul/0xlJ+SgC6qe3adakAdF8JQHe2\nXDuD3sGESrkOoLeYPucAdI/pu6xBe8W1bScoBqBpCUAzmfGQUBdh3QYPCN86zo+6RwF6Lcw8\n2zYafyIXL/h1A9B95S6AXjnuTwL0Jls4XjrOj7pnAXplcWzbavyJYLzg1w1A95XbAHonEyrl\nIoCe1ufjIeFDAb3Z+BPJeMGvG4DuKwHo7nIRQMcM+uGA3i4AjODFC6UEoPtKALq7XAXQsQb9\ncEBvFgBG7uKFUgLQfSUA3V0uA+jYxfFwQG8VAEbs4oVSAtB9JQDdXa4D6I1KAHq96Yfi6PB9\n0CvUBKD7SgC6uwSgu0sAepeyX77wEoCmJQD9eQlAd+k4P+oC0JVyMKBt31oAfeSaT2MJQHe2\n3C3ghvpvEoqKnZawpnv4bjPaPAfQh+LoWECTJ48B6J8yBqA/SNbTAb3NQ+0AdHcJQO9SDgU0\n27sXgP4pQwD6E8adDeiNtoUGoLtLAHqXciCg+c+fAtA/ZQRAf8S4cwG92S/rAtDdJQC9S4kZ\ndF+5H6A/Y1zMoL2KnZawpgHoAPS+JdagvZLy2YC+8gw6xxp0t5IAdAC6FMvnAPRvSQOkypXX\noEfdxfGJVQHoAPTh+6ChVJnwLEBv99v6R+7i2KZsDeiPBjUAHYA+OV/q36ofBegN/zrVGWP6\nOEC31fpoUAPQAehz82XhudQDAb1JRwPQ3WVbQH84qAHoAHTMoDtLAFqVdsOfBOiYQf//7d1t\nU6NKGIRhFrW2ttxay/n/P3aNEYRkIE
w7wzTMfX3Iix7d9gE6SMCz8Usp6EUcg9aUOQZ91EMc\nCdGbKmiOQW/7Ugp6Ue3thbM4rj7P4qj9JqEqZUex9gq3LndB/+h9VwqagrbeXpor6Ezfa+9l\nmnR4xnmFK1DQushEy1yFmen7UNBFOG8vbRV0vp+RPWiZT0HHZkpBU9BOxmzn/3vQxy5ojkHn\nFv/jIhQ0Be1kyJbvEo58KOgZzuLIba89aIcDaxT0Muft5Stbxks48qGgRdbhjAp6p2PQBtev\nUtBrnLeXa7acZwjnk/886Hzfy3mZeodzKuhdzuLwuH6Vgl7mvL00tAed9Qd0Xqbe4Tanq7M+\nZp+dyeVRKQXdfxju+zoFfeb/J2GSdo5B530Jcl6m3uG2pqu0RrIHHfrhpp99eM/o+y585+2l\nmbM4Mh/EcV6m3uE2pqv1Ox3HoA0KeueF77y9tJItfk6Vznlu3uG2pav2rghncYwtPe/nHQt6\n74XvvL00k409aBfN7UHns3NBj4egf11s+rI8rgt/x38Q9bHMj4XFVcimgh6PcNR5k5Bj0IOG\nsnEWh4fGzuLIab896P7uAWdxVEE2jXM273Ck0+1W0H3kUbXoxZFNQzaRdTjS6fYq6P77loKu\ni2wa52ze4Uin2+tCle+7yZkc1aIXRzYN2UTW4Uin2+k86OH0jdmFhBR0FWTTOGfzDkc6XUN/\ni2NPZNOQTWQdjnQ6CroIsmnIJrIORzodBV0E2TRkE1mHI52Ogi6CbBqyiazDkU5HQRdBNg3Z\nRNbhSKejoIsgm4ZsIutwpNNR0EWQTUM2kXU40uko6CLIpiGbyDoc6XQUdBFk05BNZB2OdDoK\nugiyacgmsg5HOh0FXQTZNGQTWYcjnY6CLoJsGrKJrMORTkdBF0E2DdlE1uFIp6OgiyCbhmwi\n63Ck01HQRZBNQzaRdTjS6SjoIsimIZvIOhzpdBR0EWTTkE1kHY50Ogq6CLJpyCayDkc6HQVd\nBNk0ZBNZhyOdjoIugmwasomsw5FOR0EXQTYN2UTW4Uino6CLIJuGbCLrcKTTVS3on/lV5V89\nPuamYW4yRifLMjoK+kiYm4a5yRidjIJuDnPTMDcZo5NR0M1hbhrmJmN0sgMXNADgIQoaAExR\n0ABgioIGAFMUNACYoqABwFTRgu6XPv5h7b51zE3W3z9kao8tz4LpPZK6tS5/RUyNgu6/bpbu\nW9cvDIG5PXZf0EztoaFAph+a3jO9NYktdzPoRyhoPxS0joIWRHbsKOjNUgt6afOOK17QXzv2\nfbj5leh2IbPQR/0wie/RTT/J3FbcbhH3H44+b1o/fXDdXucHMJjemsSWC2lzK13Q49ZyzR/9\nXPR5u4ZJDSPjhS0BBZ1uWtCx4TG9VYktF9LmtsshjsgCHX+SPv68YX2Yj+N2W2FuK+Idw9TW\n3BV0mA+G6a1KbLmQNrfyBf31O9P4/PtTLPSY8UWWghZQ0Oko6B9JbLmQNrfyx6BD7LWln9/c\nPW9YfxUvaOb2QHRuTG3Vg4JmeusSWy6kza1UQc+2krvo8SXNQr9Y24Nmbiuiq9z4Caa27HbP\nOEwHw/SWKS0X0ua2R0Hf7fz38//m7nnLvmc0vkl4+znmFhVd5QJTe2xyHvTCqxvTi1JaLqTN\nrdghjsn1M7evLcPv8GFyN33etElBj6fZDR9ibqtiqxxT22Rylds4nPEp01uktJxJQeOnWP+B\n1lHQtihooHUUtC0KGmgdBQ0ApihoADBFQQOAKQoaAExR0ABgioIGAFMUNM6r6+4fAQfCeovz\noqBxcKy3aAEFjUNivcV5fdby23P3cnn00v0L4V/3XDsUsB0FjfO6FPR733Xdy8ej9+4phOdL\nSwNHQUHjvC4F/ftjn/n9+fLoT/f3tftdOxOQgILGeV1q+al7C+Ht82AHf8MYR0NB47wutXx9\ne/Dz9rXrXisnApJQ0DgvChoHR0HjvG4P
cTw9cYgDh0JB47yubw0+v4fhTcK/3Z/amYAEFDTO\n6/40u6fuvXYoYDsKGud1vVDlZXahykvtUMB2FDQAmKKgAcAUBQ0ApihoADBFQQOAKQoaAExR\n0ABgioIGAFMUNACYoqABwBQFDQCmKGgAMPUfY3iRW07XslQAAAAASUVORK5CYII=",
- "text/plain": [
- "plot without title"
- ]
- },
- "metadata": {
- "image/png": {
- "height": 360,
- "width": 720
- }
- },
- "output_type": "display_data"
- }
- ],
- "source": [
- "start_date <- mdy(\"Jan 1, 2020\")\n",
- "end_date <- mdy(\"Dec 31, 2020\")\n",
- "idx = seq(start_date,end_date,by ='day')\n",
- "print(paste(\"length of index is \",length(idx)))\n",
- "size = length(idx)\n",
- "sales = runif(366,min=25,max=50)\n",
- "sold_items <- data.frame(row.names=idx[0:size],sales)\n",
- "ggplot(sold_items,aes(x=idx,y=sales)) + geom_point(color = \"firebrick\", shape = \"diamond\", size = 2) +\n",
- " geom_line(color = \"firebrick\", size = .3)"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 36,
- "id": "30747f7c",
- "metadata": {},
- "outputs": [],
- "source": [
- "library(repr)\n"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 38,
- "id": "48f3e762",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "366"
- ],
- "text/latex": [
- "366"
- ],
- "text/markdown": [
- "366"
- ],
- "text/plain": [
- "[1] 366"
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- }
- ],
- "source": [
- "#changing plot size\n",
- "options(repr.plot.width = 12,repr.plot.height=6)\n",
- "size"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "id": "abe41544",
- "metadata": {},
- "outputs": [
- {
- "data": {
- "text/html": [
- "