{ "cells": [ { "cell_type": "markdown", "id": "9f9b980c", "metadata": {}, "source": [ "## Pandas Use Cases in R\n", "We use the dplyr library to reproduce common pandas use cases in R. We start by importing the typical data science libraries." ] }, { "cell_type": "code", "execution_count": 28, "id": "625abf4a", "metadata": {}, "outputs": [], "source": [ "# Suppress warnings to keep the notebook output clean\n", "options(warn=-1)\n", "library(dplyr)\n", "library(tidyverse)\n", "library(lubridate)\n", "library(zoo)\n", "library(xts)\n", "library('ggplot2')" ] }, { "cell_type": "markdown", "id": "d786e051", "metadata": {}, "source": [ "## Series" ] }, { "cell_type": "markdown", "id": "0f47587a", "metadata": {}, "source": [ "A pandas Series is like a list or 1D array, but with an index, and all operations on it are index-aligned. In R, we emulate the index with the row.names of a data frame.\n" ] }, { "cell_type": "code", "execution_count": 29, "id": "f659f553", "metadata": {}, "outputs": [], "source": [ "a<- 1:9" ] }, { "cell_type": "code", "execution_count": 30, "id": "9acc193d", "metadata": {}, "outputs": [], "source": [ "b = c(\"I\",\"like\",\"to\",\"use\",\"Python\",\"and\",\"Pandas\",\"very\",\"much\")" ] }, { "cell_type": "code", "execution_count": 31, "id": "f577ec14", "metadata": {}, "outputs": [], "source": [ "a1 = length(a)\n", "b1 = length(b)" ] }, { "cell_type": "code", "execution_count": 32, "id": "31e069a0", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " a\n", "1 1\n", "2 2\n", "3 3\n", "4 4\n", "5 5\n", "6 6\n", "7 7\n", "8 8\n", "9 9\n" ] } ], "source": [ "a = data.frame(a,row.names = c(1:a1))\n", "print(a)" ] }, { "cell_type": "code", "execution_count": 33, "id": "29ce166e", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ " b\n", "1 I\n", "2 like\n", "3 to\n", "4 use\n", "5 Python\n", "6 and\n", "7 Pandas\n", "8 very\n", "9 much\n" ] } ], "source": [ "b = data.frame(b,row.names = c(1:b1))\n", "print(b)" ] }, { "cell_type": "markdown", "id": "a83abe74", "metadata": {}, "source": [ " One 
of the most frequent uses of a series is the time series. In a time series, the index has a special structure, typically a range of dates or datetimes. The easiest way to create a time series is the ts function, but here we try another way: we use the lubridate library to parse the start and end dates, and the seq function to build an index of dates.\n", " \n", " Suppose we have a series that shows the amount of product bought every day, and we know that once a week we also need to take 10 additional items for ourselves. Here is how to model this using series:" ] }, { "cell_type": "code", "execution_count": 34, "id": "eeb683c7", "metadata": {}, "outputs": [], "source": [ "# We will use ggplot2 for visualizing the data\n", "# The repr library is used to change the plot size\n", "library(repr)\n", "options(repr.plot.width = 12,repr.plot.height=6)" ] }, { "cell_type": "code", "execution_count": 35, "id": "e7788ca1", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "[1] \"length of index is 366\"\n" ] }, { "data": { "image/png": "", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 360, "width": 720 } }, "output_type": "display_data" } ], "source": [ "start_date <- mdy(\"Jan 1, 2020\")\n", "end_date <- mdy(\"Dec 31, 2020\")\n", "idx = seq(start_date,end_date,by ='day')\n", "print(paste(\"length of index is \",length(idx)))\n", "size = length(idx)\n", "sales = runif(size,min=25,max=50)\n", "sold_items <- data.frame(row.names=idx[0:size],sales)\n", "ggplot(sold_items,aes(x=idx,y=sales)) + geom_point(color = \"firebrick\", shape = \"diamond\", size = 2) +\n", " geom_line(color = \"firebrick\", size = .3)" ] }, { "cell_type": "markdown", "id": "3f199e43", "metadata": {}, "source": [ "We merge additional_items and sold_items so that we can compute the total number of products.\n", "As you can see, computing the total is a problem: days that do not appear in the weekly series are missing (NA) after the merge, and adding NA to a number gives NA.\n", "To perform the addition, we need to replace NA with 0." ] }, { "cell_type": "code", "execution_count": 36, "id": "abe41544", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "A data.frame: 53 × 1\n", "\\begin{tabular}{r|l}\n", " & additional\\_product\\\\\n", " & \\\\\n", "\\hline\n", "\t2020-01-01 & 10\\\\\n", "\t2020-01-08 & 10\\\\\n", "\t2020-01-15 & 10\\\\\n", "\t2020-01-22 & 10\\\\\n", "\t2020-01-29 & 10\\\\\n", "\t2020-02-05 & 10\\\\\n", "\t2020-02-12 & 10\\\\\n", "\t2020-02-19 & 10\\\\\n", "\t2020-02-26 & 10\\\\\n", "\t2020-03-04 & 10\\\\\n", "\t2020-03-11 & 10\\\\\n", "\t2020-03-18 & 10\\\\\n", "\t2020-03-25 & 10\\\\\n", "\t2020-04-01 & 10\\\\\n", "\t2020-04-08 & 10\\\\\n", "\t2020-04-15 & 10\\\\\n", "\t2020-04-22 & 10\\\\\n", "\t2020-04-29 & 10\\\\\n", "\t2020-05-06 & 10\\\\\n", "\t2020-05-13 & 10\\\\\n", "\t2020-05-20 & 10\\\\\n", "\t2020-05-27 & 10\\\\\n", "\t2020-06-03 & 10\\\\\n", "\t2020-06-10 & 10\\\\\n", "\t2020-06-17 & 10\\\\\n", "\t2020-06-24 & 10\\\\\n", "\t2020-07-01 & 10\\\\\n", "\t2020-07-08 & 10\\\\\n", "\t2020-07-15 & 10\\\\\n", "\t2020-07-22 & 10\\\\\n", "\t2020-07-29 & 10\\\\\n", "\t2020-08-05 & 10\\\\\n", "\t2020-08-12 & 10\\\\\n", "\t2020-08-19 & 10\\\\\n", "\t2020-08-26 & 10\\\\\n", "\t2020-09-02 & 10\\\\\n", "\t2020-09-09 & 10\\\\\n", "\t2020-09-16 & 10\\\\\n", "\t2020-09-23 & 10\\\\\n", "\t2020-09-30 & 10\\\\\n", "\t2020-10-07 & 10\\\\\n", "\t2020-10-14 & 10\\\\\n", "\t2020-10-21 & 10\\\\\n", "\t2020-10-28 & 10\\\\\n", "\t2020-11-04 & 10\\\\\n", "\t2020-11-11 & 10\\\\\n", "\t2020-11-18 & 10\\\\\n", "\t2020-11-25 & 10\\\\\n", "\t2020-12-02 & 10\\\\\n", "\t2020-12-09 & 10\\\\\n", "\t2020-12-16 & 10\\\\\n", "\t2020-12-23 & 10\\\\\n", "\t2020-12-30 & 10\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 53 × 1\n", "\n", "| | additional_product <dbl> |\n", "|---|---|\n", "| 2020-01-01 | 10 |\n", "| 2020-01-08 | 10 |\n", "| 2020-01-15 | 10 |\n", "| 2020-01-22 | 10 |\n", "| 2020-01-29 | 10 |\n", "| 2020-02-05 | 10 |\n", "| 2020-02-12 | 10 |\n", "| 2020-02-19 | 10 |\n", "| 2020-02-26 | 10 |\n", "| 2020-03-04 | 10 |\n", "| 2020-03-11 | 10 |\n", "| 2020-03-18 | 10 |\n", "| 
2020-03-25 | 10 |\n", "| 2020-04-01 | 10 |\n", "| 2020-04-08 | 10 |\n", "| 2020-04-15 | 10 |\n", "| 2020-04-22 | 10 |\n", "| 2020-04-29 | 10 |\n", "| 2020-05-06 | 10 |\n", "| 2020-05-13 | 10 |\n", "| 2020-05-20 | 10 |\n", "| 2020-05-27 | 10 |\n", "| 2020-06-03 | 10 |\n", "| 2020-06-10 | 10 |\n", "| 2020-06-17 | 10 |\n", "| 2020-06-24 | 10 |\n", "| 2020-07-01 | 10 |\n", "| 2020-07-08 | 10 |\n", "| 2020-07-15 | 10 |\n", "| 2020-07-22 | 10 |\n", "| 2020-07-29 | 10 |\n", "| 2020-08-05 | 10 |\n", "| 2020-08-12 | 10 |\n", "| 2020-08-19 | 10 |\n", "| 2020-08-26 | 10 |\n", "| 2020-09-02 | 10 |\n", "| 2020-09-09 | 10 |\n", "| 2020-09-16 | 10 |\n", "| 2020-09-23 | 10 |\n", "| 2020-09-30 | 10 |\n", "| 2020-10-07 | 10 |\n", "| 2020-10-14 | 10 |\n", "| 2020-10-21 | 10 |\n", "| 2020-10-28 | 10 |\n", "| 2020-11-04 | 10 |\n", "| 2020-11-11 | 10 |\n", "| 2020-11-18 | 10 |\n", "| 2020-11-25 | 10 |\n", "| 2020-12-02 | 10 |\n", "| 2020-12-09 | 10 |\n", "| 2020-12-16 | 10 |\n", "| 2020-12-23 | 10 |\n", "| 2020-12-30 | 10 |\n", "\n" ], "text/plain": [ " additional_product\n", "2020-01-01 10 \n", "2020-01-08 10 \n", "2020-01-15 10 \n", "2020-01-22 10 \n", "2020-01-29 10 \n", "2020-02-05 10 \n", "2020-02-12 10 \n", "2020-02-19 10 \n", "2020-02-26 10 \n", "2020-03-04 10 \n", "2020-03-11 10 \n", "2020-03-18 10 \n", "2020-03-25 10 \n", "2020-04-01 10 \n", "2020-04-08 10 \n", "2020-04-15 10 \n", "2020-04-22 10 \n", "2020-04-29 10 \n", "2020-05-06 10 \n", "2020-05-13 10 \n", "2020-05-20 10 \n", "2020-05-27 10 \n", "2020-06-03 10 \n", "2020-06-10 10 \n", "2020-06-17 10 \n", "2020-06-24 10 \n", "2020-07-01 10 \n", "2020-07-08 10 \n", "2020-07-15 10 \n", "2020-07-22 10 \n", "2020-07-29 10 \n", "2020-08-05 10 \n", "2020-08-12 10 \n", "2020-08-19 10 \n", "2020-08-26 10 \n", "2020-09-02 10 \n", "2020-09-09 10 \n", "2020-09-16 10 \n", "2020-09-23 10 \n", "2020-09-30 10 \n", "2020-10-07 10 \n", "2020-10-14 10 \n", "2020-10-21 10 \n", "2020-10-28 10 \n", "2020-11-04 10 \n", "2020-11-11 10 \n", 
"2020-11-18 10 \n", "2020-11-25 10 \n", "2020-12-02 10 \n", "2020-12-09 10 \n", "2020-12-16 10 \n", "2020-12-23 10 \n", "2020-12-30 10 " ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 366 × 1\n", "\\begin{tabular}{r|l}\n", " & total\\\\\n", " & \\\\\n", "\\hline\n", "\t2020-01-01 & 53.59979\\\\\n", "\t2020-01-02 & NA\\\\\n", "\t2020-01-03 & NA\\\\\n", "\t2020-01-04 & NA\\\\\n", "\t2020-01-05 & NA\\\\\n", "\t2020-01-06 & NA\\\\\n", "\t2020-01-07 & NA\\\\\n", "\t2020-01-08 & 40.93455\\\\\n", "\t2020-01-09 & NA\\\\\n", "\t2020-01-10 & NA\\\\\n", "\t2020-01-11 & NA\\\\\n", "\t2020-01-12 & NA\\\\\n", "\t2020-01-13 & NA\\\\\n", "\t2020-01-14 & NA\\\\\n", "\t2020-01-15 & 59.24704\\\\\n", "\t2020-01-16 & NA\\\\\n", "\t2020-01-17 & NA\\\\\n", "\t2020-01-18 & NA\\\\\n", "\t2020-01-19 & NA\\\\\n", "\t2020-01-20 & NA\\\\\n", "\t2020-01-21 & NA\\\\\n", "\t2020-01-22 & 38.26416\\\\\n", "\t2020-01-23 & NA\\\\\n", "\t2020-01-24 & NA\\\\\n", "\t2020-01-25 & NA\\\\\n", "\t2020-01-26 & NA\\\\\n", "\t2020-01-27 & NA\\\\\n", "\t2020-01-28 & NA\\\\\n", "\t2020-01-29 & 44.58327\\\\\n", "\t2020-01-30 & NA\\\\\n", "\t... & ...\\\\\n", "\t2020-12-02 & 41.74811\\\\\n", "\t2020-12-03 & NA\\\\\n", "\t2020-12-04 & NA\\\\\n", "\t2020-12-05 & NA\\\\\n", "\t2020-12-06 & NA\\\\\n", "\t2020-12-07 & NA\\\\\n", "\t2020-12-08 & NA\\\\\n", "\t2020-12-09 & 37.85650\\\\\n", "\t2020-12-10 & NA\\\\\n", "\t2020-12-11 & NA\\\\\n", "\t2020-12-12 & NA\\\\\n", "\t2020-12-13 & NA\\\\\n", "\t2020-12-14 & NA\\\\\n", "\t2020-12-15 & NA\\\\\n", "\t2020-12-16 & 46.73560\\\\\n", "\t2020-12-17 & NA\\\\\n", "\t2020-12-18 & NA\\\\\n", "\t2020-12-19 & NA\\\\\n", "\t2020-12-20 & NA\\\\\n", "\t2020-12-21 & NA\\\\\n", "\t2020-12-22 & NA\\\\\n", "\t2020-12-23 & 40.42143\\\\\n", "\t2020-12-24 & NA\\\\\n", "\t2020-12-25 & NA\\\\\n", "\t2020-12-26 & NA\\\\\n", "\t2020-12-27 & NA\\\\\n", "\t2020-12-28 & NA\\\\\n", "\t2020-12-29 & NA\\\\\n", "\t2020-12-30 & 41.20298\\\\\n", "\t2020-12-31 & NA\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 366 × 1\n", "\n", "| | total <dbl> |\n", "|---|---|\n", "| 2020-01-01 | 53.59979 |\n", "| 2020-01-02 | NA |\n", "| 
2020-01-03 | NA |\n", "| 2020-01-04 | NA |\n", "| 2020-01-05 | NA |\n", "| 2020-01-06 | NA |\n", "| 2020-01-07 | NA |\n", "| 2020-01-08 | 40.93455 |\n", "| 2020-01-09 | NA |\n", "| 2020-01-10 | NA |\n", "| 2020-01-11 | NA |\n", "| 2020-01-12 | NA |\n", "| 2020-01-13 | NA |\n", "| 2020-01-14 | NA |\n", "| 2020-01-15 | 59.24704 |\n", "| 2020-01-16 | NA |\n", "| 2020-01-17 | NA |\n", "| 2020-01-18 | NA |\n", "| 2020-01-19 | NA |\n", "| 2020-01-20 | NA |\n", "| 2020-01-21 | NA |\n", "| 2020-01-22 | 38.26416 |\n", "| 2020-01-23 | NA |\n", "| 2020-01-24 | NA |\n", "| 2020-01-25 | NA |\n", "| 2020-01-26 | NA |\n", "| 2020-01-27 | NA |\n", "| 2020-01-28 | NA |\n", "| 2020-01-29 | 44.58327 |\n", "| 2020-01-30 | NA |\n", "| ... | ... |\n", "| 2020-12-02 | 41.74811 |\n", "| 2020-12-03 | NA |\n", "| 2020-12-04 | NA |\n", "| 2020-12-05 | NA |\n", "| 2020-12-06 | NA |\n", "| 2020-12-07 | NA |\n", "| 2020-12-08 | NA |\n", "| 2020-12-09 | 37.85650 |\n", "| 2020-12-10 | NA |\n", "| 2020-12-11 | NA |\n", "| 2020-12-12 | NA |\n", "| 2020-12-13 | NA |\n", "| 2020-12-14 | NA |\n", "| 2020-12-15 | NA |\n", "| 2020-12-16 | 46.73560 |\n", "| 2020-12-17 | NA |\n", "| 2020-12-18 | NA |\n", "| 2020-12-19 | NA |\n", "| 2020-12-20 | NA |\n", "| 2020-12-21 | NA |\n", "| 2020-12-22 | NA |\n", "| 2020-12-23 | 40.42143 |\n", "| 2020-12-24 | NA |\n", "| 2020-12-25 | NA |\n", "| 2020-12-26 | NA |\n", "| 2020-12-27 | NA |\n", "| 2020-12-28 | NA |\n", "| 2020-12-29 | NA |\n", "| 2020-12-30 | 41.20298 |\n", "| 2020-12-31 | NA |\n", "\n" ], "text/plain": [ " total \n", "2020-01-01 53.59979\n", "2020-01-02 NA\n", "2020-01-03 NA\n", "2020-01-04 NA\n", "2020-01-05 NA\n", "2020-01-06 NA\n", "2020-01-07 NA\n", "2020-01-08 40.93455\n", "2020-01-09 NA\n", "2020-01-10 NA\n", "2020-01-11 NA\n", "2020-01-12 NA\n", "2020-01-13 NA\n", "2020-01-14 NA\n", "2020-01-15 59.24704\n", "2020-01-16 NA\n", "2020-01-17 NA\n", "2020-01-18 NA\n", "2020-01-19 NA\n", "2020-01-20 NA\n", "2020-01-21 NA\n", "2020-01-22 38.26416\n", 
"2020-01-23 NA\n", "2020-01-24 NA\n", "2020-01-25 NA\n", "2020-01-26 NA\n", "2020-01-27 NA\n", "2020-01-28 NA\n", "2020-01-29 44.58327\n", "2020-01-30 NA\n", "... ... \n", "2020-12-02 41.74811\n", "2020-12-03 NA\n", "2020-12-04 NA\n", "2020-12-05 NA\n", "2020-12-06 NA\n", "2020-12-07 NA\n", "2020-12-08 NA\n", "2020-12-09 37.85650\n", "2020-12-10 NA\n", "2020-12-11 NA\n", "2020-12-12 NA\n", "2020-12-13 NA\n", "2020-12-14 NA\n", "2020-12-15 NA\n", "2020-12-16 46.73560\n", "2020-12-17 NA\n", "2020-12-18 NA\n", "2020-12-19 NA\n", "2020-12-20 NA\n", "2020-12-21 NA\n", "2020-12-22 NA\n", "2020-12-23 40.42143\n", "2020-12-24 NA\n", "2020-12-25 NA\n", "2020-12-26 NA\n", "2020-12-27 NA\n", "2020-12-28 NA\n", "2020-12-29 NA\n", "2020-12-30 41.20298\n", "2020-12-31 NA" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "index = seq(start_date,end_date,by = 'week')\n", "sz = length(index)\n", "additional_product <- rep(10,53)\n", "additional_items <- data.frame(row.names = index[0:sz],additional_product)\n", "additional_items\n", "# we are merging two dataframe so that we can add\n", "additional_item = merge(additional_items,sold_items, by = 0, all = TRUE)[-1] \n", "total = data.frame(row.names=idx[0:size],additional_item$additional_product + additional_item$sales)\n", "colnames(total) = c('total')\n", "total" ] }, { "cell_type": "code", "execution_count": 37, "id": "387cb4c2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", 
"\n", "
\n" ], "text/latex": [ "A data.frame: 366 × 1\n", "\\begin{tabular}{r|l}\n", " & total\\\\\n", " & \\\\\n", "\\hline\n", "\t2020-01-01 & 53.59979\\\\\n", "\t2020-01-02 & 30.41127\\\\\n", "\t2020-01-03 & 48.54839\\\\\n", "\t2020-01-04 & 39.20897\\\\\n", "\t2020-01-05 & 39.09894\\\\\n", "\t2020-01-06 & 47.53019\\\\\n", "\t2020-01-07 & 44.94766\\\\\n", "\t2020-01-08 & 40.93455\\\\\n", "\t2020-01-09 & 37.66561\\\\\n", "\t2020-01-10 & 31.68825\\\\\n", "\t2020-01-11 & 45.30576\\\\\n", "\t2020-01-12 & 26.45509\\\\\n", "\t2020-01-13 & 45.81249\\\\\n", "\t2020-01-14 & 46.84547\\\\\n", "\t2020-01-15 & 59.24704\\\\\n", "\t2020-01-16 & 29.28688\\\\\n", "\t2020-01-17 & 32.41731\\\\\n", "\t2020-01-18 & 45.23295\\\\\n", "\t2020-01-19 & 48.54330\\\\\n", "\t2020-01-20 & 36.69353\\\\\n", "\t2020-01-21 & 43.09588\\\\\n", "\t2020-01-22 & 38.26416\\\\\n", "\t2020-01-23 & 45.56863\\\\\n", "\t2020-01-24 & 25.70944\\\\\n", "\t2020-01-25 & 37.38721\\\\\n", "\t2020-01-26 & 44.53955\\\\\n", "\t2020-01-27 & 46.88427\\\\\n", "\t2020-01-28 & 48.05540\\\\\n", "\t2020-01-29 & 44.58327\\\\\n", "\t2020-01-30 & 26.19490\\\\\n", "\t... 
& ...\\\\\n", "\t2020-12-02 & 41.74811\\\\\n", "\t2020-12-03 & 35.03915\\\\\n", "\t2020-12-04 & 25.84637\\\\\n", "\t2020-12-05 & 27.73147\\\\\n", "\t2020-12-06 & 39.00993\\\\\n", "\t2020-12-07 & 41.03187\\\\\n", "\t2020-12-08 & 26.33862\\\\\n", "\t2020-12-09 & 37.85650\\\\\n", "\t2020-12-10 & 41.98943\\\\\n", "\t2020-12-11 & 36.68901\\\\\n", "\t2020-12-12 & 46.96883\\\\\n", "\t2020-12-13 & 39.70374\\\\\n", "\t2020-12-14 & 46.59464\\\\\n", "\t2020-12-15 & 41.24742\\\\\n", "\t2020-12-16 & 46.73560\\\\\n", "\t2020-12-17 & 32.68275\\\\\n", "\t2020-12-18 & 46.64238\\\\\n", "\t2020-12-19 & 25.22163\\\\\n", "\t2020-12-20 & 39.79997\\\\\n", "\t2020-12-21 & 34.45013\\\\\n", "\t2020-12-22 & 48.71183\\\\\n", "\t2020-12-23 & 40.42143\\\\\n", "\t2020-12-24 & 32.41991\\\\\n", "\t2020-12-25 & 39.12296\\\\\n", "\t2020-12-26 & 29.43616\\\\\n", "\t2020-12-27 & 39.09337\\\\\n", "\t2020-12-28 & 38.09288\\\\\n", "\t2020-12-29 & 41.00681\\\\\n", "\t2020-12-30 & 41.20298\\\\\n", "\t2020-12-31 & 43.25232\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 366 × 1\n", "\n", "| | total <dbl> |\n", "|---|---|\n", "| 2020-01-01 | 53.59979 |\n", "| 2020-01-02 | 30.41127 |\n", "| 2020-01-03 | 48.54839 |\n", "| 2020-01-04 | 39.20897 |\n", "| 2020-01-05 | 39.09894 |\n", "| 2020-01-06 | 47.53019 |\n", "| 2020-01-07 | 44.94766 |\n", "| 2020-01-08 | 40.93455 |\n", "| 2020-01-09 | 37.66561 |\n", "| 2020-01-10 | 31.68825 |\n", "| 2020-01-11 | 45.30576 |\n", "| 2020-01-12 | 26.45509 |\n", "| 2020-01-13 | 45.81249 |\n", "| 2020-01-14 | 46.84547 |\n", "| 2020-01-15 | 59.24704 |\n", "| 2020-01-16 | 29.28688 |\n", "| 2020-01-17 | 32.41731 |\n", "| 2020-01-18 | 45.23295 |\n", "| 2020-01-19 | 48.54330 |\n", "| 2020-01-20 | 36.69353 |\n", "| 2020-01-21 | 43.09588 |\n", "| 2020-01-22 | 38.26416 |\n", "| 2020-01-23 | 45.56863 |\n", "| 2020-01-24 | 25.70944 |\n", "| 2020-01-25 | 37.38721 |\n", "| 2020-01-26 | 44.53955 |\n", "| 2020-01-27 | 46.88427 |\n", "| 2020-01-28 | 48.05540 |\n", "| 
2020-01-29 | 44.58327 |\n", "| 2020-01-30 | 26.19490 |\n", "| ... | ... |\n", "| 2020-12-02 | 41.74811 |\n", "| 2020-12-03 | 35.03915 |\n", "| 2020-12-04 | 25.84637 |\n", "| 2020-12-05 | 27.73147 |\n", "| 2020-12-06 | 39.00993 |\n", "| 2020-12-07 | 41.03187 |\n", "| 2020-12-08 | 26.33862 |\n", "| 2020-12-09 | 37.85650 |\n", "| 2020-12-10 | 41.98943 |\n", "| 2020-12-11 | 36.68901 |\n", "| 2020-12-12 | 46.96883 |\n", "| 2020-12-13 | 39.70374 |\n", "| 2020-12-14 | 46.59464 |\n", "| 2020-12-15 | 41.24742 |\n", "| 2020-12-16 | 46.73560 |\n", "| 2020-12-17 | 32.68275 |\n", "| 2020-12-18 | 46.64238 |\n", "| 2020-12-19 | 25.22163 |\n", "| 2020-12-20 | 39.79997 |\n", "| 2020-12-21 | 34.45013 |\n", "| 2020-12-22 | 48.71183 |\n", "| 2020-12-23 | 40.42143 |\n", "| 2020-12-24 | 32.41991 |\n", "| 2020-12-25 | 39.12296 |\n", "| 2020-12-26 | 29.43616 |\n", "| 2020-12-27 | 39.09337 |\n", "| 2020-12-28 | 38.09288 |\n", "| 2020-12-29 | 41.00681 |\n", "| 2020-12-30 | 41.20298 |\n", "| 2020-12-31 | 43.25232 |\n", "\n" ], "text/plain": [ " total \n", "2020-01-01 53.59979\n", "2020-01-02 30.41127\n", "2020-01-03 48.54839\n", "2020-01-04 39.20897\n", "2020-01-05 39.09894\n", "2020-01-06 47.53019\n", "2020-01-07 44.94766\n", "2020-01-08 40.93455\n", "2020-01-09 37.66561\n", "2020-01-10 31.68825\n", "2020-01-11 45.30576\n", "2020-01-12 26.45509\n", "2020-01-13 45.81249\n", "2020-01-14 46.84547\n", "2020-01-15 59.24704\n", "2020-01-16 29.28688\n", "2020-01-17 32.41731\n", "2020-01-18 45.23295\n", "2020-01-19 48.54330\n", "2020-01-20 36.69353\n", "2020-01-21 43.09588\n", "2020-01-22 38.26416\n", "2020-01-23 45.56863\n", "2020-01-24 25.70944\n", "2020-01-25 37.38721\n", "2020-01-26 44.53955\n", "2020-01-27 46.88427\n", "2020-01-28 48.05540\n", "2020-01-29 44.58327\n", "2020-01-30 26.19490\n", "... ... 
\n", "2020-12-02 41.74811\n", "2020-12-03 35.03915\n", "2020-12-04 25.84637\n", "2020-12-05 27.73147\n", "2020-12-06 39.00993\n", "2020-12-07 41.03187\n", "2020-12-08 26.33862\n", "2020-12-09 37.85650\n", "2020-12-10 41.98943\n", "2020-12-11 36.68901\n", "2020-12-12 46.96883\n", "2020-12-13 39.70374\n", "2020-12-14 46.59464\n", "2020-12-15 41.24742\n", "2020-12-16 46.73560\n", "2020-12-17 32.68275\n", "2020-12-18 46.64238\n", "2020-12-19 25.22163\n", "2020-12-20 39.79997\n", "2020-12-21 34.45013\n", "2020-12-22 48.71183\n", "2020-12-23 40.42143\n", "2020-12-24 32.41991\n", "2020-12-25 39.12296\n", "2020-12-26 29.43616\n", "2020-12-27 39.09337\n", "2020-12-28 38.09288\n", "2020-12-29 41.00681\n", "2020-12-30 41.20298\n", "2020-12-31 43.25232" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Replace missing values (NA) with 0 so that the addition works\n", "additional_item[is.na(additional_item)] = 0\n", "total = data.frame(row.names=idx[0:size],additional_item$additional_product + additional_item$sales)\n", "colnames(total) = c('total')\n", "total" ] }, { "cell_type": "code", "execution_count": 38, "id": "bdb60236", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 360, "width": 720 } }, "output_type": "display_data" } ], "source": [ "ggplot(total,aes(x=idx,y=total)) + geom_point(color = \"firebrick\", shape = \"diamond\", size = 2) +\n", " geom_line(color = \"firebrick\", linetype = \"dotted\", size = .3)" ] }, { "cell_type": "markdown", "id": "38e65fd5", "metadata": {}, "source": [ "We want to analyse the total number of products on a monthly basis. Thus, we compute the mean of the daily totals for each month and draw a bar graph." ] }, { "cell_type": "code", "execution_count": 39, "id": "294dde87", "metadata": {}, "outputs": [], "source": [ "index = seq(start_date,end_date,by ='month')\n" ] }, { "cell_type": "code", "execution_count": 40, "id": "7542d95e", "metadata": {}, "outputs": [ { "data": { "text/plain": [ " total\n", "2020-01-31 
41.03847\n", "2020-02-29 40.91568\n", "2020-03-31 39.27424\n", "2020-04-30 37.63589\n", "2020-05-31 38.75129\n", "2020-06-30 38.75744\n", "2020-07-31 38.35212\n", "2020-08-31 40.43712\n", "2020-09-30 38.90043\n", "2020-10-31 37.99855\n", "2020-11-30 41.20759\n", "2020-12-31 38.46355" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "image/png": "", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 360, "width": 720 } }, "output_type": "display_data" } ], "source": [ "x<- as.xts(total, dateFormat =\"Date\")\n", "(monthly<-apply.monthly(x,mean))\n", "ggplot(monthly, aes(x=index, y=total)) + \n", " geom_bar(stat = \"identity\", width=5) " ] }, { "cell_type": "markdown", "id": "945feffd", "metadata": {}, "source": [ "## DataFrame\n", "A DataFrame is essentially a collection of series that share the same index. We can combine several series into a DataFrame. \n", "For example, we build a DataFrame from the series a and b:" ] }, { "cell_type": "code", "execution_count": 41, "id": "88a435ec", "metadata": {}, "outputs": [], "source": [ "a = data.frame(a,row.names = c(1:a1))" ] }, { "cell_type": "code", "execution_count": 42, "id": "c4e2a6c1", "metadata": {}, "outputs": [], "source": [ "b = data.frame(b,row.names = c(1:b1))" ] }, { "cell_type": "code", "execution_count": 43, "id": "2bb5177c", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "A data.frame: 9 × 2\n", "\\begin{tabular}{r|ll}\n", " & a & b\\\\\n", " & & \\\\\n", "\\hline\n", "\t1 & 1 & I \\\\\n", "\t2 & 2 & like \\\\\n", "\t3 & 3 & to \\\\\n", "\t4 & 4 & use \\\\\n", "\t5 & 5 & Python\\\\\n", "\t6 & 6 & and \\\\\n", "\t7 & 7 & Pandas\\\\\n", "\t8 & 8 & very \\\\\n", "\t9 & 9 & much \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 9 × 2\n", "\n", "| | a <int> | b <chr> |\n", "|---|---|---|\n", "| 1 | 1 | I |\n", "| 2 | 2 | like |\n", "| 3 | 3 | to |\n", "| 4 | 4 | use |\n", "| 5 | 5 | Python |\n", "| 6 | 6 | and |\n", "| 7 | 7 | Pandas |\n", "| 8 | 8 | very |\n", "| 9 | 9 | much |\n", "\n" ], "text/plain": [ " a b \n", "1 1 I \n", "2 2 like \n", "3 3 to \n", "4 4 use \n", "5 5 Python\n", "6 6 and \n", "7 7 Pandas\n", "8 8 very \n", "9 9 much " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df<- data.frame(a,b)\n", "df" ] }, { "cell_type": "markdown", "id": "6531fe0e", "metadata": {}, "source": [ "We can also rename the columns using the rename function:" ] }, { "cell_type": "code", "execution_count": 44, "id": "8f45d3a5", "metadata": {}, "outputs": [], "source": [ "df = \n", " rename(df,\n", " A = a,\n", " B = b\n", " )" ] }, { "cell_type": "code", "execution_count": 45, "id": "0efbf2d4", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "A data.frame: 9 × 2\n", "\\begin{tabular}{r|ll}\n", " & A & B\\\\\n", " & & \\\\\n", "\\hline\n", "\t1 & 1 & I \\\\\n", "\t2 & 2 & like \\\\\n", "\t3 & 3 & to \\\\\n", "\t4 & 4 & use \\\\\n", "\t5 & 5 & Python\\\\\n", "\t6 & 6 & and \\\\\n", "\t7 & 7 & Pandas\\\\\n", "\t8 & 8 & very \\\\\n", "\t9 & 9 & much \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 9 × 2\n", "\n", "| | A <int> | B <chr> |\n", "|---|---|---|\n", "| 1 | 1 | I |\n", "| 2 | 2 | like |\n", "| 3 | 3 | to |\n", "| 4 | 4 | use |\n", "| 5 | 5 | Python |\n", "| 6 | 6 | and |\n", "| 7 | 7 | Pandas |\n", "| 8 | 8 | very |\n", "| 9 | 9 | much |\n", "\n" ], "text/plain": [ " A B \n", "1 1 I \n", "2 2 like \n", "3 3 to \n", "4 4 use \n", "5 5 Python\n", "6 6 and \n", "7 7 Pandas\n", "8 8 very \n", "9 9 much " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "8ac0204f", "metadata": {}, "source": [ "We can also select a column of a dataframe using the select function:" ] }, { "cell_type": "code", "execution_count": 46, "id": "88b51fdc", "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Column A (series):\n" ] }, { "data": { "text/html": [ "
\n" ], "text/latex": [ "A data.frame: 9 × 1\n", "\\begin{tabular}{r|l}\n", " & A\\\\\n", " & \\\\\n", "\\hline\n", "\t1 & 1\\\\\n", "\t2 & 2\\\\\n", "\t3 & 3\\\\\n", "\t4 & 4\\\\\n", "\t5 & 5\\\\\n", "\t6 & 6\\\\\n", "\t7 & 7\\\\\n", "\t8 & 8\\\\\n", "\t9 & 9\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 9 × 1\n", "\n", "| | A <int> |\n", "|---|---|\n", "| 1 | 1 |\n", "| 2 | 2 |\n", "| 3 | 3 |\n", "| 4 | 4 |\n", "| 5 | 5 |\n", "| 6 | 6 |\n", "| 7 | 7 |\n", "| 8 | 8 |\n", "| 9 | 9 |\n", "\n" ], "text/plain": [ " A\n", "1 1\n", "2 2\n", "3 3\n", "4 4\n", "5 5\n", "6 6\n", "7 7\n", "8 8\n", "9 9" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "cat(\"Column A (series):\\n\")\n", "select(df,'A')" ] }, { "cell_type": "markdown", "id": "45397ec4", "metadata": {}, "source": [ "We can extract the rows that meet a logical criterion on a series:" ] }, { "cell_type": "code", "execution_count": 47, "id": "010bcba8", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n" ], "text/latex": [ "A data.frame: 4 × 2\n", "\\begin{tabular}{r|ll}\n", " & A & B\\\\\n", " & & \\\\\n", "\\hline\n", "\t1 & 1 & I \\\\\n", "\t2 & 2 & like\\\\\n", "\t3 & 3 & to \\\\\n", "\t4 & 4 & use \\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 4 × 2\n", "\n", "| | A <int> | B <chr> |\n", "|---|---|---|\n", "| 1 | 1 | I |\n", "| 2 | 2 | like |\n", "| 3 | 3 | to |\n", "| 4 | 4 | use |\n", "\n" ], "text/plain": [ " A B \n", "1 1 I \n", "2 2 like\n", "3 3 to \n", "4 4 use " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[df$A<5,]" ] }, { "cell_type": "code", "execution_count": 48, "id": "082277db", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\n", "
\n" ], "text/latex": [ "A data.frame: 1 × 2\n", "\\begin{tabular}{r|ll}\n", " & A & B\\\\\n", " & & \\\\\n", "\\hline\n", "\t6 & 6 & and\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 1 × 2\n", "\n", "| | A <int> | B <chr> |\n", "|---|---|---|\n", "| 6 | 6 | and |\n", "\n" ], "text/plain": [ " A B \n", "6 6 and" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[df$A>5 & df$A<7,]" ] }, { "cell_type": "markdown", "id": "bf537050", "metadata": {}, "source": [ "Creating new columns. \n", "\n", "The code below creates a series that holds the divergence of A from its mean value and adds it to the existing dataframe as a new column." ] }, { "cell_type": "code", "execution_count": 49, "id": "0bbd19f8", "metadata": {}, "outputs": [], "source": [ "df$DivA <- df$A - mean(df$A)" ] }, { "cell_type": "code", "execution_count": 50, "id": "f36d96af", "metadata": {}, "outputs": [ { "data": { "text/html": [ "
A data.frame: 9 × 3
ABDivA
<int><chr><dbl>
11I -4
22like -3
33to -2
44use -1
55Python 0
66and 1
77Pandas 2
88very 3
99much 4
\n" ], "text/latex": [ "A data.frame: 9 × 3\n", "\\begin{tabular}{r|lll}\n", " & A & B & DivA\\\\\n", " & & & \\\\\n", "\\hline\n", "\t1 & 1 & I & -4\\\\\n", "\t2 & 2 & like & -3\\\\\n", "\t3 & 3 & to & -2\\\\\n", "\t4 & 4 & use & -1\\\\\n", "\t5 & 5 & Python & 0\\\\\n", "\t6 & 6 & and & 1\\\\\n", "\t7 & 7 & Pandas & 2\\\\\n", "\t8 & 8 & very & 3\\\\\n", "\t9 & 9 & much & 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 9 × 3\n", "\n", "| | A <int> | B <chr> | DivA <dbl> |\n", "|---|---|---|---|\n", "| 1 | 1 | I | -4 |\n", "| 2 | 2 | like | -3 |\n", "| 3 | 3 | to | -2 |\n", "| 4 | 4 | use | -1 |\n", "| 5 | 5 | Python | 0 |\n", "| 6 | 6 | and | 1 |\n", "| 7 | 7 | Pandas | 2 |\n", "| 8 | 8 | very | 3 |\n", "| 9 | 9 | much | 4 |\n", "\n" ], "text/plain": [ " A B DivA\n", "1 1 I -4 \n", "2 2 like -3 \n", "3 3 to -2 \n", "4 4 use -1 \n", "5 5 Python 0 \n", "6 6 and 1 \n", "7 7 Pandas 2 \n", "8 8 very 3 \n", "9 9 much 4 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "2be67ef7", "metadata": {}, "source": [ "Next, we create a series holding the string length of each entry in column B, then add it to the existing dataframe as column LenB." ] }, { "cell_type": "code", "execution_count": 51, "id": "c67f2bd0", "metadata": {}, "outputs": [], "source": [ "df$LenB <- str_length(df$B)" ] }, { "cell_type": "code", "execution_count": 52, "id": "cef214b2", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 9 × 4
ABDivALenB
<int><chr><dbl><int>
11I -41
22like -34
33to -22
44use -13
55Python 06
66and 13
77Pandas 26
88very 34
99much 44
\n" ], "text/latex": [ "A data.frame: 9 × 4\n", "\\begin{tabular}{r|llll}\n", " & A & B & DivA & LenB\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t1 & 1 & I & -4 & 1\\\\\n", "\t2 & 2 & like & -3 & 4\\\\\n", "\t3 & 3 & to & -2 & 2\\\\\n", "\t4 & 4 & use & -1 & 3\\\\\n", "\t5 & 5 & Python & 0 & 6\\\\\n", "\t6 & 6 & and & 1 & 3\\\\\n", "\t7 & 7 & Pandas & 2 & 6\\\\\n", "\t8 & 8 & very & 3 & 4\\\\\n", "\t9 & 9 & much & 4 & 4\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 9 × 4\n", "\n", "| | A <int> | B <chr> | DivA <dbl> | LenB <int> |\n", "|---|---|---|---|---|\n", "| 1 | 1 | I | -4 | 1 |\n", "| 2 | 2 | like | -3 | 4 |\n", "| 3 | 3 | to | -2 | 2 |\n", "| 4 | 4 | use | -1 | 3 |\n", "| 5 | 5 | Python | 0 | 6 |\n", "| 6 | 6 | and | 1 | 3 |\n", "| 7 | 7 | Pandas | 2 | 6 |\n", "| 8 | 8 | very | 3 | 4 |\n", "| 9 | 9 | much | 4 | 4 |\n", "\n" ], "text/plain": [ " A B DivA LenB\n", "1 1 I -4 1 \n", "2 2 like -3 4 \n", "3 3 to -2 2 \n", "4 4 use -1 3 \n", "5 5 Python 0 6 \n", "6 6 and 1 3 \n", "7 7 Pandas 2 6 \n", "8 8 very 3 4 \n", "9 9 much 4 4 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df" ] }, { "cell_type": "markdown", "id": "e37d50de", "metadata": {}, "source": [ "Selecting rows by position (row number)" ] }, { "cell_type": "code", "execution_count": 53, "id": "59fe5316", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 5 × 4
ABDivALenB
<int><chr><dbl><int>
11I -41
22like -34
33to -22
44use -13
55Python 06
\n" ], "text/latex": [ "A data.frame: 5 × 4\n", "\\begin{tabular}{r|llll}\n", " & A & B & DivA & LenB\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t1 & 1 & I & -4 & 1\\\\\n", "\t2 & 2 & like & -3 & 4\\\\\n", "\t3 & 3 & to & -2 & 2\\\\\n", "\t4 & 4 & use & -1 & 3\\\\\n", "\t5 & 5 & Python & 0 & 6\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 5 × 4\n", "\n", "| | A <int> | B <chr> | DivA <dbl> | LenB <int> |\n", "|---|---|---|---|---|\n", "| 1 | 1 | I | -4 | 1 |\n", "| 2 | 2 | like | -3 | 4 |\n", "| 3 | 3 | to | -2 | 2 |\n", "| 4 | 4 | use | -1 | 3 |\n", "| 5 | 5 | Python | 0 | 6 |\n", "\n" ], "text/plain": [ " A B DivA LenB\n", "1 1 I -4 1 \n", "2 2 like -3 4 \n", "3 3 to -2 2 \n", "4 4 use -1 3 \n", "5 5 Python 0 6 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df[1:5,]" ] }, { "cell_type": "markdown", "id": "6abec1b7", "metadata": {}, "source": [ "***Grouping splits the rows into groups that share a value in one or more columns; the summarise function then computes one summary row per group.***\n", "\n", "Suppose that we want to compute the mean value of column A for each distinct value of LenB. We can group the dataframe by LenB and compute the mean of A within each group, naming the result a." ] }, { "cell_type": "code", "execution_count": 54, "id": "f944a949", "metadata": {}, "outputs": [], "source": [ "df1 = df %>% group_by(LenB) %>% summarise(a = mean(A))" ] }, { "cell_type": "code", "execution_count": 55, "id": "8ffd39cd", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 2
LenBa
<int><dbl>
11.000000
23.000000
35.000000
46.333333
66.000000
\n" ], "text/latex": [ "A tibble: 5 × 2\n", "\\begin{tabular}{ll}\n", " LenB & a\\\\\n", " & \\\\\n", "\\hline\n", "\t 1 & 1.000000\\\\\n", "\t 2 & 3.000000\\\\\n", "\t 3 & 5.000000\\\\\n", "\t 4 & 6.333333\\\\\n", "\t 6 & 6.000000\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A tibble: 5 × 2\n", "\n", "| LenB <int> | a <dbl> |\n", "|---|---|\n", "| 1 | 1.000000 |\n", "| 2 | 3.000000 |\n", "| 3 | 5.000000 |\n", "| 4 | 6.333333 |\n", "| 6 | 6.000000 |\n", "\n" ], "text/plain": [ " LenB a \n", "1 1 1.000000\n", "2 2 3.000000\n", "3 3 5.000000\n", "4 4 6.333333\n", "5 6 6.000000" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "df1" ] }, { "cell_type": "code", "execution_count": 56, "id": "3b859950", "metadata": {}, "outputs": [], "source": [ "df2 = df %>% group_by(LenB) %>%\n", "summarise(MEAN = mean(A),count = n())" ] }, { "cell_type": "markdown", "id": "5d3f0287", "metadata": {}, "source": [ "## Printing and Plotting\n", "Calling head(df) prints the first six rows of the dataframe in tabular form.\n", "\n", "The first steps of most data science projects are data cleaning and visualization, so it is important to be able to inspect the dataset and extract useful information from it." ] }, { "cell_type": "code", "execution_count": 57, "id": "69946dc7", "metadata": {}, "outputs": [], "source": [ "#dataset = read.csv(\"file name\")" ] }, { "cell_type": "code", "execution_count": 58, "id": "4976f190", "metadata": {}, "outputs": [ { "data": { "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A data.frame: 6 × 4
ABDivALenB
<int><chr><dbl><int>
11I -41
22like -34
33to -22
44use -13
55Python 06
66and 13
\n" ], "text/latex": [ "A data.frame: 6 × 4\n", "\\begin{tabular}{r|llll}\n", " & A & B & DivA & LenB\\\\\n", " & & & & \\\\\n", "\\hline\n", "\t1 & 1 & I & -4 & 1\\\\\n", "\t2 & 2 & like & -3 & 4\\\\\n", "\t3 & 3 & to & -2 & 2\\\\\n", "\t4 & 4 & use & -1 & 3\\\\\n", "\t5 & 5 & Python & 0 & 6\\\\\n", "\t6 & 6 & and & 1 & 3\\\\\n", "\\end{tabular}\n" ], "text/markdown": [ "\n", "A data.frame: 6 × 4\n", "\n", "| | A <int> | B <chr> | DivA <dbl> | LenB <int> |\n", "|---|---|---|---|---|\n", "| 1 | 1 | I | -4 | 1 |\n", "| 2 | 2 | like | -3 | 4 |\n", "| 3 | 3 | to | -2 | 2 |\n", "| 4 | 4 | use | -1 | 3 |\n", "| 5 | 5 | Python | 0 | 6 |\n", "| 6 | 6 | and | 1 | 3 |\n", "\n" ], "text/plain": [ " A B DivA LenB\n", "1 1 I -4 1 \n", "2 2 like -3 4 \n", "3 3 to -2 2 \n", "4 4 use -1 3 \n", "5 5 Python 0 6 \n", "6 6 and 1 3 " ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "head(df)" ] }, { "cell_type": "markdown", "id": "dcca35a8", "metadata": {}, "source": [ "ggplot2 is a popular library that makes it simple to create complex plots from data in a data frame.\n", "\n", "It provides a more programmatic interface for specifying which variables to plot, how they are displayed, and their general visual properties. The two cells that follow use base R's plot and barplot functions for a quick look at column A."
] }, { "cell_type": "code", "execution_count": 59, "id": "515c95b2", "metadata": {}, "outputs": [ { "data": { "image/png": "iVBORw0KGgoAAAANSUhEUgAABaAAAALQCAMAAABR+ye1AAAAMFBMVEUAAABNTU1oaGh8fHyMjIyampqnp6eysrK9vb3Hx8fQ0NDZ2dnh4eHp6enw8PD////QFLu4AAAACXBIWXMAABJ0AAASdAHeZh94AAAVU0lEQVR4nO3c61bbuhaAUYU7bIjf/203OEBoC0RObGlJmvNHNmeckchR449VY5omAEJKtQ8AgO8JNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEVCHQC4Izalgj09ksARCfQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAEEJNEBQAg0QlEADBCXQAGUs/ufpBBqghLnOyxIt0AAlpC+Pi55yxiqbEmigM+mv/y55zrZPCbgEQEnp4yLHsuecsczWBBrozMdPCAUaIJR0Tp8FGmBrxxs43MUBEMfx7mf3QQPEsbjJfz67yFMCLgGwtYvqPAk0wDYuG54PL1HkKQGXANjQ5XWeBBpgdSsMz4fXKfKUgEsAbGOlOk8CDbCmtYbnw4sVeUrAJQDWtmqdJ4EGWMnKdZ4EGmANaw/Phxct8pSASwCsZos6TwINcKFNhufDKxd5SsAlANawWZ0ngQY433bD8+Hlizwl4BIAl9m4zpNAA5xl8zpPAg2w3PbD82GZIk8JuATAmcrUeRJogEUKDc+HtYo8JeASAMsVrPMk0AC5Sg7PhwWLPCXgEgBLFK/zJNAAGSrUeRJogFNqDM+HhYs8JeASAFlq1XkSaIBfVBueD6sXeUrAJQBOqVrnSaABvld3eD4cQpGnBFwC4GcB6jwJNMA/QtR5EmiAP8UYnmcCDXAUp86TQAN8CjQ8zwQaYBaszpNAA7yJNjzPBBoYXsg6TwINDC9onSeBBsYWdXieCTQwrsh1ngQaGFbo4Xkm0MCQwtd5EmhgRPGH55lAA4NppM6TQAODaabOk0ADI2lneJ4JNDCKtuo8CTQwiMaG55lAAwNosM6TQAP9a3F4npUM9Mtt2t1P08NV2t1ttATAm2OTm63zVDTQ+93rRqWH+7fHdL3JEgDT+wWN40OzCgb6Lr3OzXe7dLuf9vPX6y8BMH0kJLU8PM8KBno3PzGl/fyf3RZLAHzpc+XjuFjBQKd0fPzm7x3pqzOXAJgjNWek+ZJUmKDfHvcmaGArH0Ne8yWpcA36bv/+9fpLAKRu+uwuDqAnqZcbOGbugwa6cWxyHz/K8puEQB/6aPIfBBroQX91ngQa6ECHw/NMoIHGdVrnSaCBtvU6PM8EGmhW13WeBBpoVud1ngQaaFPvw/NMoIH2jFDnSaCB5gwxPM8EGmjJOHWeBBpoyUh1ngQaaMZQw/NMoIEmDFfnSaCBFow3PM8EGghu0DpPAg0EN2ydJ4EGIht3eJ4JNBDV2HWeBBoIavDheSbQ
QEDq/EaggWgMz+8EGghFnY8EGghEnb8SaCAKw/NfBBqIQZ3/IdBAAIbn7wg0UJs6/0CggbrU+UcCDVRkeP6NQAPVqPPvBBqow/B8kkADNahzBoEGijM85xFooCx1zibQQEnqvIBAA8UYnpcRaKAQdV5KoIESDM9nEGhgc+p8HoEGNqbO5xJoYEuG5wsINLAddb6IQAMbMTxfSqCBLajzCgQaWJ86r0KggZUZntci0MCq1Hk9Ag2sx/C8KoEGLvKlyeq8MoEGLjA3+f3Bibs2gQYucDhZkzpvQqCB8332ufJxdEqggfOl9zo7Zzch0MD5PmZn5+wmBBo4V9LnbQk0cJ6vN3DUPpZOCTRwhuPPBf2EcDsCDSylyYUINLCMOhcj0MAChueSBBrIps5lCTSQx/BcnEADOdS5AoEGTjI81yHQwO/UuRqBBn6jzhUJNPAjw3NdAg38QJ1rE2jgO4bnAAQa+Jc6hyDQwF8Mz1EINPCVOgci0MCROoci0MA7w3M0Ag3M1DkegQYMz0EJNKDOQQk0DM7wHJdAw8jUOTSBhnGpc3ACDYMyPMcn0DAkdW6BQMN4DM+NEGgYjTo3Q6BhKIbnlgg0jEOdGyPQMAp1bo5AwxAMzy0SaBiAOrdJoKF3hudmCTT0TZ0bJtDQMcNz2wQaeqXOzRNo6JM6d0CgoUOG5z4INHRHnXsh0NAXw3NHBBp6os5dEWjohuG5NwINfVDnDgk09ECduyTQ0DzDc68EGhqnzv0SaGiZ4blrAg3tUufOCTQ0yvDcv5KB3t/tXh/vr1K6ftxoCejascnqPISCgX7ZvX6k9q8Pb643WQJ6Njf5+ED/Cgb6Nt3sXx9uX15bfZvutlgCenY4L5LheRwFA53S/v1hmvZpt8US0LHPPlc+DsopGujXh1368j/++r+/OHMJ6Fh6r7PTYxxFL3E8T9P928PbBP3rRWifQPjHx+Ti9BhHwUA/p93d83Szey3001V62mIJ6FbS5wGVvM3uaXe8hnG/zRLQpeQGjjGV/UWVx9urtzrf3L9stgR059hkP58ZjN8khNA0eWQCDXGp8+AEGqJS5+EJNIRkeEagISJ1ZibQEI06806gIRTDM0cCDYGoM18JNERheOYvAg0hqDP/EmgIQJ35jkBDbYZnfiDQUJU68zOBhorUmd8INNRieOYEgYY61JmTBBoqMDyTQ6ChNHUmk0BDWepMNoGGggzPLCHQUIo6s5BAQxnqzGICDQUYnjmHQMPW1JkzCTRsS505m0DDhgzPXEKgYTPqzGUEGrZheOZiAg0bUGfWINCwOnVmHQIN6zI8sxqBhhWpM2sSaFiNOrMugYZ1GJ5ZnUDDGtSZDQg0XMzwzDYEGi6jzmxGoOES6syGBBrOZnhmWwIN51FnNifQcA51pgCBhsUMz5Qh0LCMOlOMQMMS6kxBAg3ZDM+UJdCQSZ0pbaVAP9/tLj6UE0tATYZnKlgj0C/3VykJNP1SZ+q4OND7x9c6p+unlY7nuyWgvC9NVmdquTDQj9fpzctqx/PvElDe3OT3B59Gqrkk0E+3rx/e3d3z+p9gpwR1HT6BSZ2p64JA797q/N+0xd8AnRVU9dnnysfB6C4IdEp3H1+sdjh/LQFVpPc6+yBSlwka/vExO/sgUtcK16D/E2i6kvSZINzFAX/4egNH7WNhdCvdB33jPmh6cPy5oJ8QEoDfJIR3mkw0/i0OmKkz8fjX7MDwTFACDepMUALN4AzPxCXQDE2diUygGZfhmeAEmkGpM/EJNENSZ1og0IzH8EwjBJrRqDPNEGiGYnimJQLNQNSZtgg0ozA80xyBZgjqTIsEmgGoM20SaHpneKZZAk3f1JmGCTQdMzzTNoGmW+pM6wSaPhme6YBA0yF1pg8CTXfUmV4INH0xPNMRgaYn6kxXBJpuGJ7pjUDTCXWmPwJNDwzPdEmgaZ460yuBpnHqTL8EmpYZnumaQNMudaZzAk2jDM/0T6BpkjozAoGmPYZnBiHQ
NEadGYdA0xR1ZiQCTTsMzwxGoGmFOjMcgaYJhmdGJNA0QJ0Zk0ATneGZYQk0oakzIxNoAlNnxibQRGV4ZngCTUzqDAJNRIZneCPQhKPOcCDQxGJ4hk9VAn3yFHSKDuXYZHWGrwSayuZPw/EB+FQw0OlPWyxBgw5/2Kc+EjCigoH+byfQ/OOzz5WPAwIqeYljf5OuX+ZX+O4lsutNV9J7nf2Zwz/KXoN+TOlxcg2aLz6+Hfszh38U/iHhy3W62Qs0H5I+w8+K38Vxn3ZPAs3MDRzwq/K32T1fnb7G7GwdgLuf4ZQa90HfCvTwNBky+FVvylNnyCLQFGZ4hlwCTVHqDPkEmnIMz7CIQFOKOsNCAk0RhmdYTqDZnjrDWQSarakznEmg2ZThGc4n0GxIneESAs1WDM9wIYFmG+oMFxNoNmB4hjUINGtTZ1iJQLMudYbVCDQrMjzDmgSa1agzrEugWYfhGVYn0KxBnWEDAs3FDM+wDYHmMuoMmxFoLqHOsCGB5myGZ9iWQHMmdYatCTTnMDxDAQLNcuoMRQg0CxmeoRSBZgl1hoIEmnzqDEUJNJkMz1CaQJNFnaE8geY0wzNUIdCcos5QiUDzK8Mz1CPQ/EydoSqB5ifqDJUJNN8yPEN9As031BkiEGj+ZniGIASaP6kzhCHQfGF4hkgEmg/qDMEINAfqDOEINJPhGWISaAzPEJRAj87wDGEJ9NjUGQIT6IEZniE2gR7LscnqDOEJ9EjmJh8fgNgEeiSHfU2GZ2iDQA/ks8+VjwPII9ADSe91tr3QBoEeyMfsbHuhDQI9jKTP0BiBHkNyAwe0R6BHcGyynxBCQwS6e5oMrRLozqkztEuge2Z4hqYJdL/UGRon0J0yPEP7BLpH6gxdEOj+qDN0QqA7Y3iGfgh0V9QZeiLQ/TA8Q2cEuhPqDP0R6C6oM/RIoNtneIZOCXTr1Bm6JdBNMzxDzwS6XeoMnRPoVqkzdE+gm2R4hhEIdIPUGcYg0K0xPMMwBLot6gwDEeiGGJ5hLALdCnWG4Qh0G9QZBiTQDTA8w5gEOjx1hlEJdGyGZxiYQAemzjA2gQ5LnWF0Ah2T4RkQ6JDUGZgEOiDDM3Ag0MGoM/BBoCMxPANfCHQY6gz8SaCDUGfgbwIdgeEZ+IZA16fOwLcEujLDM/ATga5JnYFfCHQ96gz8SqArMTwDpwh0FeoMnCbQ5RmegSwCXZg6A7kEuih1BvIJdDmGZ2ARgS5FnYGFBLoIwzOwXMlA729Tun56f5FfX6WHmn1psjoD5ygY6P0uvbk5vEjngZ7f3/tD++8GqKJgoO/Sw2ulH3bX84v0HujDozoD5ysY6N3hiS+7q5fuA/3Z58rHAbSsYKA/arW/vv4u0OmrM5cII73Xufk3AlRUMNBXaf/x1XX3E/T7+2v+jQAVFQz0Q7p9/+olXXcd6KTPwApK3mZ391nlpxNXMZoO29cbOGofC9Cyor+o8nzz8dXLbaeBPn7n6eBKOlCX3yRckSYDaxLo1agzsC6BXofhGVidQK9BnYENCPTFDM/ANgT6QuoMbEWgL2F4BjYk0GdTZ2BbAn0mdQa2JtDnMDwDBQj0cuoMFCHQCxmegVIEehF1BsoR6HyGZ6Aogc6kzkBpAp1FnYHyBPo0wzNQhUCfos5AJQL9K8MzUI9A/0KdgZoE+ieGZ6Aygf7+ANQZqE6gv1tenYEABPqfteUZiEGg/1pZnYEoBPrrsvIMBCLQx0XVGQhFoN9XlGcgGoGe1BmISaBd2gCCGj3QhmcgrLEDrc5AYAMH2vAMxDZsoNUZiG7MQBuegQYMGGh1BtowXKDVGWjFWIE2PAMNGSnQ6gw0ZZhAG56B1gwSaHUG2jNCoA3PQJO6D7Q6A63qPNDqDLSr50AbnoGm9RtodQYa12mgDc9A+7oMtDoDPegv0IZnoBNdBPrYZHUG+tFBoOcmHx8AOtFDoA+PhmegM+0H+rPP2xwKQC19BPqtzgINdKaPQC99DkAD
2g/0xzWOTQ4EoJ4eAu0GDqBLHQTaTwiBPnURaIAeCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBBQ00ADED3QX7lM1WZbNV2UbdqlHf91L2KZutymarso26VaO+76XsUzZblc1WZRt1q0Z930vZp2y2KputyjbqVo36vpeyT9lsVTZblW3UrRr1fS9ln7LZqmy2KtuoWzXq+17KPmWzVdlsVbZRt2rU972Ufcpmq7LZqmyjbtWo73sp+5TNVmWzVdlG3apR3/dS9imbrcpmq7KNulWjvu+l7FM2W5XNVmUbdatGfd9L2adstiqbrco26laN+r4BwhNogKAEGiAogQYISqABghJogKAEGiAogQYISqABghJogKAEGiAogQYISqABghJogKAEGiAogQYISqAzPFyl3d2+9lG04j+fqSzPtyndvtQ+ihbs73bDnoBOptPu0pvdmB+QxfY7n6kcTz5VmV52h60a8puZk+mk53T7ehY9pNvaB9KGm+QzlWO3e572N+mu9nHEdztv0t2YJ6CT6aSbwx7pTpbHZKNyPM7V2add7QOJL418Ag75ps8y5udjqZd0baNy3Kbn2ofQivdrZmN+L3MyZdqn69qH0ILr9CLQOa7SdL+bL55xwv37JY772gdSg5Mp00N6qn0IDbhPj/6qkSWlm/knX7WPowUPbz8l3D3UPowqnEx5XnY3tQ+hAc/pxrWgPK/FeZ72t2OOhQvdz3dxjLlTTqYs+50LHBmu3u4aE+gcab4G/ZKuah9IfA9vlzhev5cNOUI7mbJcO48y3M6XgQQ6x9C3Jixzld6u1O/H/F7m85Hh5ep6yJvkl0qfah9JfG7ezDb097Ih3/RCT27gyCPQ+e7nv228+GiddrjNbtBbxp1KJzmJlpHnHC/pav92YfWx9oHEd5fe/h2OuzF/6dLJdNKtsXARG5XlcGuC7/0ZrgfeKifTSf7evoyNyvN0nXZDDoXLzf+aXe2DqMPJBBCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQAMEJdAAQQk0QFACDRCUQNOvlF5u0u5+/vrhKl09VD4eWEig6VdKu/TqrdDXb1+k69pHBIsINP16LfJ+ekhX0/SYds/T8y491j4kWEKg6VdK/82P03STnl6/ejJC0xaBpl8pfTwevvr4DzTCB5Z+CTSN84GlXwJN43xg6dcx0B/XoG8qHxEsItD06xhod3HQJIGmX8dAuw+aJgk0/foS6Olh5zcJaY5AAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUAINEJRAAwQl0ABBCTRAUP8Dr9AKPKfUMOkAAAAASUVORK5CYII=", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { "height": 360, "width": 720 } }, "output_type": "display_data" } ], "source": [ "plot(df$A,type = 'o',xlab = \"no\",ylab = \"A\")" ] }, { "cell_type": "code", "execution_count": 60, "id": "41b872c9", "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "plot without title" ] }, "metadata": { "image/png": { 
"height": 360, "width": 720 } }, "output_type": "display_data" } ], "source": [ "barplot(df$A, ylab = 'A',xlab = 'no')" ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.1.1" } }, "nbformat": 4, "nbformat_minor": 5 }