From 7c1137aea1bba22304067d9401bdecadc1f402d2 Mon Sep 17 00:00:00 2001
From: Dmitri Soshnikov
Date: Tue, 24 Aug 2021 16:08:00 +0300
Subject: [PATCH] Add boxplot by role to main text

---
 .../04-stats-and-probability/README.md |   8 +++++++-
 .../images/boxplot_byrole.png          | Bin 0 -> 8393 bytes
 2 files changed, 7 insertions(+), 1 deletion(-)
 create mode 100644 1-Introduction/04-stats-and-probability/images/boxplot_byrole.png

diff --git a/1-Introduction/04-stats-and-probability/README.md b/1-Introduction/04-stats-and-probability/README.md
index 83e39597..54d1aa5e 100644
--- a/1-Introduction/04-stats-and-probability/README.md
+++ b/1-Introduction/04-stats-and-probability/README.md
@@ -70,6 +70,12 @@ Here is the box plot showing mean, median and quartiles for our data:
 
 ![Weight Box Plot](images/weight-boxplot.png)
 
+Since our data contains information about different player **roles**, we can also do the box plot by role - it will allow us to get the idea on how parameters values differ across roles. This time we will consider height:
+
+![Box plot by role](images/boxplot_byrole.png)
+
+This diagram suggests that, on average, height of first basemen is higher that height of second basemen. Later in this lesson we will learn how we can test this hypothesis more formally, and how to demonstrate that our data is statistically significant to show that.
+
 > When working with real-world data, we assume that all data points are samples drawn from some probability distribution. This assumption allows us to apply machine learning techniques and build working predictive models.
 
 To see what is the distribution of our data, we can plot a graph called a **histogram**. X-axis would contain a number of different weight intervals (so-called **bins**), and vertical axis would show the number of times our random variable sample was inside a given interval.
@@ -78,7 +84,7 @@ To see what is the distribution of our data, we can plot a graph called a **hist
 
 From this histogram you can see that all values are centered around certain mean weight, and the further we go from that weight - the fewer weights of that value are encountered. I.e., it is very improbable that a weight of a baseball player would be very different from the mean weight. Variance of weights show the extent to which weights are likely to differ from the mean.
 
-> If we take weights of other people, not from the baseball league, the distribution is likely to be different. However, the shape of the distribution will be the same, but mean and variance would change. So, if we train our model on baseball players, it i likely to give wrong results when applied to students of a university, because the underlying distribution is different.
+> If we take weights of other people, not from the baseball league, the distribution is likely to be different. However, the shape of the distribution will be the same, but mean and variance would change. So, if we train our model on baseball players, it is likely to give wrong results when applied to students of a university, because the underlying distribution is different.
 
 ## Normal Distribution
 The distribution of weights that we have seen above is very typical, and many measurements from real world follow the same type of distribution, but with different mean and variance. This distribution is called **normal distribution**, and it plays very important role in statistics.
diff --git a/1-Introduction/04-stats-and-probability/images/boxplot_byrole.png b/1-Introduction/04-stats-and-probability/images/boxplot_byrole.png
new file mode 100644
index 0000000000000000000000000000000000000000..15226014c0e523554f849db3d9517decf0395244
GIT binary patch
literal 8393
[base85-encoded PNG data for images/boxplot_byrole.png omitted]