From eccd7268256b55d289e2bd7d06683bc08d27fd6d Mon Sep 17 00:00:00 2001
From: chester
Date: Fri, 8 Oct 2021 20:31:42 -0700
Subject: [PATCH 1/3] Commiting Rmd file of HW_2 to DACSS_601 repo. Hopefully, this works!

---
 Reading_in_data_HW2.Rmd | 124 +
 docs/Reading_in_data_HW2.html | 2942 +++++++++++++++++
 .../figure-html5/unnamed-chunk-6-1.png | Bin 0 -> 25953 bytes
 3 files changed, 3066 insertions(+)
 create mode 100644 Reading_in_data_HW2.Rmd
 create mode 100644 docs/Reading_in_data_HW2.html
 create mode 100644 docs/Reading_in_data_HW2_files/figure-html5/unnamed-chunk-6-1.png

diff --git a/Reading_in_data_HW2.Rmd b/Reading_in_data_HW2.Rmd
new file mode 100644
index 0000000..af3f77b
--- /dev/null
+++ b/Reading_in_data_HW2.Rmd
@@ -0,0 +1,124 @@
+---
+title: "Homework_2 "
+description: Reading in Data
+
+author:
+  - name: Cynthia Hester
+
+date: 09-29-2021
+output:
+  distill::distill_article:
+
+    self_contained: no
+draft: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = FALSE)
+```
+
+## Reading in the first data set
+
+Reading in, or importing, data files into R is a necessary first step before any cleaning or tidying can happen. Once imported data is cleaned, it is better suited for exploration.
+
+
+Data formats are not homogeneous and come in many different flavors. Whether data is in **CSV, SPSS, XLSX, SAS, TXT, Stata, or HTML**, or one of many other formats, there is usually an R package to read it in.
+
+
+The first data set I will read in comes from the *datasets* package that is included with R. It is the *mtcars* (Motor Trend) data set, which was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973-74 models).
+
+
+
+**This R chunk loads the *datasets* package and prints summary statistics for the *mtcars* data set.**
+
+```{r}
+library(datasets)
+summary(mtcars)
+
+```
+
+
+
+
+**This R chunk uses an alternative to the *summary* function: *skim*, from the skimr package. *skim* provides a more detailed overview of the *mtcars* data set, with one row per variable and a small histogram of each variable's distribution.**
+
+
+```{r}
+library(skimr)
+skim(mtcars)
+```
+
+
+
+
+
+**This R chunk shows that *skim* can also summarize only selected columns, here *hp* and *wt*.**
+
+
+```{r}
+skim(mtcars,hp,wt)
+```
+
+
+
+
+
+**This R chunk provides the column names of the *mtcars* dataset using the colnames() function.**
+
+```{r}
+colnames(mtcars)
+```
+
+
+
+
+
+**This R chunk introduces the *dim()* function, which reports the dimensions of the data set: this data frame has 32 rows and 11 columns.**
+
+```{r}
+dim(mtcars)
+```
+
+
+
+
+**This R chunk shows a generic visualization of the *mtcars* object using the *plot()* function.**
+```{r}
+plot(mtcars)
+```
+
+
+
+## The Second Data Set comes from the course csv file eggs_tidy.
+
+I wanted to try reading in an external data set stored in CSV format.
+
+
+**This first R chunk reads in the eggs_tidy CSV data.**
+
+```{r}
+library(readr)
+eggs_tidy <- read_csv("_data/eggs_tidy.csv")
+```
+
+
+**Summarizes the eggs_tidy data set**
+```{r}
+summary(eggs_tidy)
+```
+
+
+
+**Summarizes the data set using the skim function**
+```{r}
+skim(eggs_tidy)
+```
+
+**This chunk uses the *as_tibble* function from the tibble package, which prints the data as a compact, readable tibble. Because *read_csv* already returns a tibble, the call mainly serves to print it.**
+
+```{r}
+library(tibble)
+as_tibble(eggs_tidy)
+```
+
+
diff --git a/docs/Reading_in_data_HW2.html b/docs/Reading_in_data_HW2.html
new file mode 100644
index 0000000..2c36e44
--- /dev/null
+++ b/docs/Reading_in_data_HW2.html
@@ -0,0 +1,2942 @@
+DACSS 601 Fall 2021: Homework_2
+

Homework_2

+ + +

Reading in Data

+
+ +
+ Cynthia Hester + +
09-29-2021 +
+ +
+

Reading in the first data set

+

Reading in, or importing, data files into R is a necessary first step before any cleaning or tidying can happen. Once imported data is cleaned, it is better suited for exploration.

+

Data formats are not homogeneous and come in many different flavors. Whether data is in CSV, SPSS, XLSX, SAS, TXT, Stata, or HTML, or one of many other formats, there is usually an R package to read it in.

+

The first data set I will read in comes from the datasets package that is included with R. It is the mtcars (Motor Trend) data set, which was extracted from the 1974 Motor Trend US magazine and comprises fuel consumption and 10 aspects of automobile design and performance for 32 cars (1973-74 models).

+

This R chunk loads the datasets package and prints summary statistics for the mtcars data set.

+
+
      mpg             cyl             disp             hp       
+ Min.   :10.40   Min.   :4.000   Min.   : 71.1   Min.   : 52.0  
+ 1st Qu.:15.43   1st Qu.:4.000   1st Qu.:120.8   1st Qu.: 96.5  
+ Median :19.20   Median :6.000   Median :196.3   Median :123.0  
+ Mean   :20.09   Mean   :6.188   Mean   :230.7   Mean   :146.7  
+ 3rd Qu.:22.80   3rd Qu.:8.000   3rd Qu.:326.0   3rd Qu.:180.0  
+ Max.   :33.90   Max.   :8.000   Max.   :472.0   Max.   :335.0  
+      drat             wt             qsec             vs        
+ Min.   :2.760   Min.   :1.513   Min.   :14.50   Min.   :0.0000  
+ 1st Qu.:3.080   1st Qu.:2.581   1st Qu.:16.89   1st Qu.:0.0000  
+ Median :3.695   Median :3.325   Median :17.71   Median :0.0000  
+ Mean   :3.597   Mean   :3.217   Mean   :17.85   Mean   :0.4375  
+ 3rd Qu.:3.920   3rd Qu.:3.610   3rd Qu.:18.90   3rd Qu.:1.0000  
+ Max.   :4.930   Max.   :5.424   Max.   :22.90   Max.   :1.0000  
+       am              gear            carb      
+ Min.   :0.0000   Min.   :3.000   Min.   :1.000  
+ 1st Qu.:0.0000   1st Qu.:3.000   1st Qu.:2.000  
+ Median :0.0000   Median :4.000   Median :2.000  
+ Mean   :0.4062   Mean   :3.688   Mean   :2.812  
+ 3rd Qu.:1.0000   3rd Qu.:4.000   3rd Qu.:4.000  
+ Max.   :1.0000   Max.   :5.000   Max.   :8.000  
+
+
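As a cross-check on the summary table above, the same six statistics can be rebuilt by hand for a single column. This is a minimal base-R sketch rather than part of the original chunk; it assumes only the built-in `mtcars` data frame, and `mpg_stats` is a name chosen here for illustration.

```r
# Rebuild the six summary() statistics for one column by hand.
library(datasets)

mpg <- mtcars$mpg

# quantile() with its default probabilities returns min, quartiles, and max
q <- quantile(mpg)

mpg_stats <- c(
  Min    = unname(q["0%"]),
  Q1     = unname(q["25%"]),
  Median = unname(q["50%"]),
  Mean   = mean(mpg),
  Q3     = unname(q["75%"]),
  Max    = unname(q["100%"])
)

# Round for display only; the stored values keep full precision
print(round(mpg_stats, 2))
```

Doing this once for `mpg` makes it easier to read the remaining columns of the `summary()` output, which follow the same pattern.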

This R chunk uses an alternative to the summary function: skim, from the skimr package. skim provides a more detailed overview of the mtcars data set, with one row per variable and a small histogram of each variable's distribution.

+
Table 1: Data summary

|                        |        |
|:-----------------------|:-------|
| Name                   | mtcars |
| Number of rows         | 32     |
| Number of columns      | 11     |
| Column type frequency: |        |
| numeric                | 11     |
| Group variables        | None   |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:--------------|----------:|--------------:|-------:|-------:|------:|-------:|-------:|-------:|-------:|:------|
| mpg  | 0 | 1 | 20.09  | 6.03   | 10.40 | 15.43  | 19.20  | 22.80  | 33.90  | ▃▇▅▁▂ |
| cyl  | 0 | 1 | 6.19   | 1.79   | 4.00  | 4.00   | 6.00   | 8.00   | 8.00   | ▆▁▃▁▇ |
| disp | 0 | 1 | 230.72 | 123.94 | 71.10 | 120.83 | 196.30 | 326.00 | 472.00 | ▇▃▃▃▂ |
| hp   | 0 | 1 | 146.69 | 68.56  | 52.00 | 96.50  | 123.00 | 180.00 | 335.00 | ▇▇▆▃▁ |
| drat | 0 | 1 | 3.60   | 0.53   | 2.76  | 3.08   | 3.70   | 3.92   | 4.93   | ▇▃▇▅▁ |
| wt   | 0 | 1 | 3.22   | 0.98   | 1.51  | 2.58   | 3.33   | 3.61   | 5.42   | ▃▃▇▁▂ |
| qsec | 0 | 1 | 17.85  | 1.79   | 14.50 | 16.89  | 17.71  | 18.90  | 22.90  | ▃▇▇▂▁ |
| vs   | 0 | 1 | 0.44   | 0.50   | 0.00  | 0.00   | 0.00   | 1.00   | 1.00   | ▇▁▁▁▆ |
| am   | 0 | 1 | 0.41   | 0.50   | 0.00  | 0.00   | 0.00   | 1.00   | 1.00   | ▇▁▁▁▆ |
| gear | 0 | 1 | 3.69   | 0.74   | 3.00  | 3.00   | 4.00   | 4.00   | 5.00   | ▇▁▆▁▂ |
| carb | 0 | 1 | 2.81   | 1.62   | 1.00  | 2.00   | 2.00   | 4.00   | 8.00   | ▇▂▅▁▁ |
+
+

This R chunk shows that skim can also summarize only selected columns, here hp and wt.

+
Table 2: Data summary

|                        |        |
|:-----------------------|:-------|
| Name                   | mtcars |
| Number of rows         | 32     |
| Number of columns      | 11     |
| Column type frequency: |        |
| numeric                | 2      |
| Group variables        | None   |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:--------------|----------:|--------------:|-------:|------:|------:|------:|-------:|-------:|-------:|:------|
| hp | 0 | 1 | 146.69 | 68.56 | 52.00 | 96.50 | 123.00 | 180.00 | 335.00 | ▇▇▆▃▁ |
| wt | 0 | 1 | 3.22   | 0.98  | 1.51  | 2.58  | 3.33   | 3.61   | 5.42   | ▃▃▇▁▂ |
+
+
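The mean and sd columns reported for hp and wt can be verified without skimr. The following is a hedged base-R sketch, not the author's chunk; `cols` and `col_stats` are illustrative names, and only the built-in `mtcars` data frame is assumed.

```r
# Compute mean and standard deviation for a chosen subset of columns.
library(datasets)

cols <- c("hp", "wt")

# sapply() walks over the selected columns and returns a small matrix:
# one column per variable, one row per statistic
col_stats <- sapply(mtcars[, cols], function(x) c(mean = mean(x), sd = sd(x)))

print(round(col_stats, 2))
```

Changing `cols` is the base-R counterpart of passing different column names to `skim()`.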

This R chunk provides the column names of the mtcars dataset using the colnames() function.

+
+
 [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"  
+[10] "gear" "carb"
+
+

This R chunk introduces the dim() function, which reports the dimensions of the data set: this data frame has 32 rows and 11 columns.

+
+
[1] 32 11
+
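The two numbers printed by `dim()` can also be requested one at a time. A small sketch under the assumption that only the built-in `mtcars` object is used; `n_rows` and `n_cols` are illustrative names.

```r
# dim() returns c(rows, columns); nrow() and ncol() return each part separately.
library(datasets)

n_rows <- nrow(mtcars)
n_cols <- ncol(mtcars)

# mtcars is stored as a data frame, which is why it has both dimensions
print(class(mtcars))
cat("rows:", n_rows, "columns:", n_cols, "\n")
```

Using `nrow()` and `ncol()` is convenient when only one of the two values is needed in later code.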
+This R chunk shows a generic visualization of the mtcars object using the plot() function. +
+

+
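Calling `plot()` on a data frame with more than two columns draws a scatterplot matrix, which gets crowded with all 11 variables. A hedged sketch of a smaller version follows; the choice of the three columns and the name `subset_cols` are assumptions made here for illustration, not something the original post does.

```r
# Draw a scatterplot matrix for a handful of columns instead of all eleven.
library(datasets)

subset_cols <- c("mpg", "hp", "wt")

# pairs() is the function behind plot() for multi-column data frames
pairs(mtcars[, subset_cols], main = "mtcars: mpg, hp, wt")
```

Each panel shows one pair of variables, so three columns give a readable 3 x 3 grid.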
+

The Second Data Set comes from the course csv file eggs_tidy.

+

I wanted to try reading in an external data set stored in CSV format.

+

This first R chunk reads in the eggs_tidy CSV data.

+
+ +
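Reading a CSV file can be tried out without the course file. The sketch below is an assumption-laden stand-in: it writes three rows copied from the printed eggs_tidy tibble (four of the six columns) to a temporary file and reads them back with base R's `read.csv()` so that it runs without extra packages. With readr, the `col_types` argument of `read_csv()` plays the role that `colClasses` plays here. The names `csv_path` and `eggs_sample` are illustrative.

```r
# Write a tiny CSV to a temporary file so the example is self-contained.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "month,year,large_half_dozen,large_dozen",
  "January,2004,126,230",
  "March,2004,131,225",
  "April,2004,131,225"
), csv_path)

# Declare the column types up front instead of letting R guess them
eggs_sample <- read.csv(
  csv_path,
  colClasses = c("character", "numeric", "numeric", "numeric")
)

# str() shows each column's type and first values
str(eggs_sample)
```

Stating the column types explicitly keeps month as text and the price columns numeric, which matches the types shown in the outputs that follow.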
+Summarizes the eggs_tidy data set +
+
    month                year      large_half_dozen  large_dozen   
+ Length:120         Min.   :2004   Min.   :126.0    Min.   :225.0  
+ Class :character   1st Qu.:2006   1st Qu.:129.4    1st Qu.:233.5  
+ Mode  :character   Median :2008   Median :174.5    Median :267.5  
+                    Mean   :2008   Mean   :155.2    Mean   :254.2  
+                    3rd Qu.:2011   3rd Qu.:174.5    3rd Qu.:268.0  
+                    Max.   :2013   Max.   :178.0    Max.   :277.5  
+ extra_large_half_dozen extra_large_dozen
+ Min.   :132.0          Min.   :230.0    
+ 1st Qu.:135.8          1st Qu.:241.5    
+ Median :185.5          Median :285.5    
+ Mean   :164.2          Mean   :266.8    
+ 3rd Qu.:185.5          3rd Qu.:285.5    
+ Max.   :188.1          Max.   :290.0    
+
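In the output above, the month column is reported only as Length, Class, and Mode because it is a character vector. A minimal sketch of one way to get counts instead, assuming a small made-up vector rather than the real data; `months_chr`, `months_fct`, and `month_counts` are illustrative names.

```r
# summary() on a character vector only reports length, class, and mode.
months_chr <- c("January", "February", "March", "January")
print(summary(months_chr))

# Converting to a factor (ordered by calendar month via the built-in
# month.name constant) makes summary() return a count per level.
months_fct <- factor(months_chr, levels = month.name)
month_counts <- summary(months_fct)

# Show only the months that actually occur in the sample
print(month_counts[month_counts > 0])
```

The same conversion could be applied to the month column of the real data if per-month counts were of interest.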
+Summarizes the data set using the skim function +
Table 3: Data summary

|                        |           |
|:-----------------------|:----------|
| Name                   | eggs_tidy |
| Number of rows         | 120       |
| Number of columns      | 6         |
| Column type frequency: |           |
| character              | 1         |
| numeric                | 5         |
| Group variables        | None      |

Variable type: character

| skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
|:--------------|----------:|--------------:|----:|----:|------:|---------:|-----------:|
| month | 0 | 1 | 3 | 9 | 0 | 12 | 0 |

Variable type: numeric

| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|:-----------------------|----------:|--------------:|--------:|------:|-----:|--------:|-------:|-------:|--------:|:------|
| year                   | 0 | 1 | 2008.50 | 2.88  | 2004 | 2006.00 | 2008.5 | 2011.0 | 2013.00 | ▇▇▇▇▇ |
| large_half_dozen       | 0 | 1 | 155.17  | 22.59 | 126  | 129.44  | 174.5  | 174.5  | 178.00  | ▆▁▁▁▇ |
| large_dozen            | 0 | 1 | 254.20  | 18.55 | 225  | 233.50  | 267.5  | 268.0  | 277.50  | ▅▂▁▁▇ |
| extra_large_half_dozen | 0 | 1 | 164.22  | 24.68 | 132  | 135.78  | 185.5  | 185.5  | 188.13  | ▆▁▁▁▇ |
| extra_large_dozen      | 0 | 1 | 266.80  | 22.80 | 230  | 241.50  | 285.5  | 285.5  | 290.00  | ▅▂▁▁▇ |
+
+

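This chunk uses the as_tibble function from the tibble package, which prints the data as a compact, readable tibble. Because read_csv already returns a tibble, the call mainly serves to print it.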

+
+
# A tibble: 120 x 6
+   month      year large_half_dozen large_dozen extra_large_half_dozen
+   <chr>     <dbl>            <dbl>       <dbl>                  <dbl>
+ 1 January    2004             126         230                    132 
+ 2 February   2004             128.        226.                   134.
+ 3 March      2004             131         225                    137 
+ 4 April      2004             131         225                    137 
+ 5 May        2004             131         225                    137 
+ 6 June       2004             134.        231.                   137 
+ 7 July       2004             134.        234.                   137 
+ 8 August     2004             134.        234.                   137 
+ 9 September  2004             130.        234.                   136.
+10 October    2004             128.        234.                   136.
+# ... with 110 more rows, and 1 more variable:
+#   extra_large_dozen <dbl>
+
+
+ + +
+ +
+
+ + + + + +
+

Reuse

+

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

+
diff --git a/docs/Reading_in_data_HW2_files/figure-html5/unnamed-chunk-6-1.png b/docs/Reading_in_data_HW2_files/figure-html5/unnamed-chunk-6-1.png
new file mode 100644
index 0000000000000000000000000000000000000000..e1162590902ca801ead1a07cfd7e81906023272c
GIT binary patch
literal 25953
[base85-encoded PNG data for the plot(mtcars) figure omitted]

literal 0
HcmV?d00001

From 3ff879441edabc8c3b82645f329343fc8d3563e1 Mon Sep 17 00:00:00 2001
From: chester
Date: Mon, 18 Oct 2021 19:08:02
-0700
Subject: [PATCH 2/3] Adding HW3 to the course blog.

---
 Data_Wrangling_HW3.Rmd | 183 +++
 docs/Data_Wrangling_HW3.html | 2647 ++++++++++++++++++++++++++++++++++
 2 files changed, 2830 insertions(+)
 create mode 100644 Data_Wrangling_HW3.Rmd
 create mode 100644 docs/Data_Wrangling_HW3.html

diff --git a/Data_Wrangling_HW3.Rmd b/Data_Wrangling_HW3.Rmd
new file mode 100644
index 0000000..4569a95
--- /dev/null
+++ b/Data_Wrangling_HW3.Rmd
@@ -0,0 +1,183 @@
+---
+title: "Homework Three"
+description: Basic Data Wrangling
+author:
+  - name: Cynthia Hester
+
+date: 10-09-2021
+output:
+  distill::distill_article:
+    self_contained: no
+draft: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+library(stringr)
+library(tidyverse)
+library(readr)
+library(here)
+```
+
+
+
+
+
+## Introduction
+
+Let's start with the obvious question: what the heck is data wrangling? You can wrangle cattle, but wrangling data? Yes! After data is imported into R, it usually needs to be reshaped and cleaned before it can be used for visualization or modeling.
+
+
+## First Steps
+
+
+First, I am going to import data into R, in this case the railroad_2012_clean_county data set, which is a CSV file.
+
+
+
+```{r}
+library(readr)
+railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
+View(railroad_2012_clean_county)
+```
+
+
+
+I started inspecting the railroad data set with the **head** and **tail** functions, which display its first 5 and last 5 rows.
+
+
+
+```{r}
+head(railroad_2012_clean_county,5)
+tail(railroad_2012_clean_county,5)
+```
+
+
+
+Next, **colnames** lists the column names of the data set.
+
+
+
+```{r}
+colnames(railroad_2012_clean_county)
+```
+
+
+
+The **glimpse** function, which is available through the dplyr package, was used to get a compact synopsis of the data.
+
+
+
+```{r}
+glimpse(railroad_2012_clean_county)
+```
+
+
+
+The **pivot_wider** function was used to make the data set more manageable and readable, as well as filling in missing values from n/a to 0.
+
+
+
+```{r}
+railroad_2012_clean_county %>%
+  pivot_wider(names_from = state, values_from = total_employees,values_fill = 0)
+```
+
+
+
+## Next step - Some Data Wrangling
+
+
+I was interested in the railroad county that had the most employees so I started with the **arrange** function which ranked the number of employees in descending order.
+
+
+```{r}
+arrange(railroad_2012_clean_county,desc(total_employees))
+```
+
+
+The **arrange** function was also used to display the *total_employees* column.
+
+
+
+```{r}
+arrange(railroad_2012_clean_county,total_employees)
+```
+
+
+
+I was interested if there were any **na** values in the railroad data set. It shows there were none.
+
+
+
+```{r}
+ railroad_2012_clean_county %>% is.na() %>% sum()
+```
+
+
+The **select** function was used to determine the number of rows.
+
+
+
+```{r}
+select(railroad_2012_clean_county)
+```
+
+
+The **select** function separates the *state* column from the rest of the data set.
+
+
+
+```{r}
+select(railroad_2012_clean_county,state)
+```
+
+
+The **filter** function was used to determine which rail stations had fewer than 2 employees
+
+
+
+```{r}
+filter(railroad_2012_clean_county,total_employees < 2)
+```
+
+
+The **summarise** function was used to learn what the mean of the total employees in all states.
+
+
+
+```{r}
+railroad_2012_clean_county %>%
+  summarise(mean(total_employees))
+```
+
+
+
+I was curious about the number of stations that had less than or equal to 2 employees, so I created a new object called *subset_employees*, that utilized the pipe operator, **group_by** function,and **filter** function to determine the stations with less than or equal to 2 employees.
+
+
+
+```{r}
+subset_employees<-railroad_2012_clean_county %>%
+  group_by(total_employees) %>%
+  filter(total_employees<=2)
+```
+
+
+
+In this block the **rename function** was used to change one of the column names, *from total_rail_ employees* to *number_of_employees*. This comes in handy if a column needs to be renamed.
+
+
+
+```{r}
+data<-railroad_2012_clean_county
+data<-rename(data,number_of_rail_employees = total_employees)
+data
+```
+
+
+
+
+
+
+
diff --git a/docs/Data_Wrangling_HW3.html b/docs/Data_Wrangling_HW3.html
new file mode 100644
index 0000000..2cdf4c2
--- /dev/null
+++ b/docs/Data_Wrangling_HW3.html
@@ -0,0 +1,2647 @@
+DACSS 601 Fall 2021: Homework Three
+

Homework Three

+ + +

Basic Data Wrangling

+
+ + + +
+

Introduction

+

Let’s start with what the heck is data wrangling? You can wrangle cattle, but wrangling data? Yes! After data is imported or read into the RStudio environment it needs to be maneuvered before it can be used for visualization or modeling.

+

First Steps

+

First, I am going to import data into R, in this case the Railroad_2012 data set, which is a CSV file.

+
+
+
library(readr)
+railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
+View(railroad_2012_clean_county)
+
+
+
+

I inspected the railroad data set, starting with the head and tail functions, which displayed the first 5 and last 5 rows of the data set.

+
+
+
head(railroad_2012_clean_county,5)   
+
+
+
# A tibble: 5 x 3
+  state county               total_employees
+  <chr> <chr>                          <dbl>
+1 AE    APO                                2
+2 AK    ANCHORAGE                          7
+3 AK    FAIRBANKS NORTH STAR               2
+4 AK    JUNEAU                             3
+5 AK    MATANUSKA-SUSITNA                  2
+
+
tail(railroad_2012_clean_county,5)
+
+
+
# A tibble: 5 x 3
+  state county     total_employees
+  <chr> <chr>                <dbl>
+1 WY    SUBLETTE                 3
+2 WY    SWEETWATER             196
+3 WY    UINTA                   49
+4 WY    WASHAKIE                10
+5 WY    WESTON                  37
+
+

Then I used colnames, which lists the column names of the data set.

+
+
+
colnames(railroad_2012_clean_county)
+
+
+
[1] "state"           "county"          "total_employees"
+
+

The glimpse function, which is part of the dplyr package and provides a synopsis of the data, was used.

+
+
+
glimpse(railroad_2012_clean_county)   
+
+
+
Rows: 2,930
+Columns: 3
+$ state           <chr> "AE", "AK", "AK", "AK", "AK", "AK", "AK", "A~
+$ county          <chr> "APO", "ANCHORAGE", "FAIRBANKS NORTH STAR", ~
+$ total_employees <dbl> 2, 7, 2, 3, 2, 1, 88, 102, 143, 1, 25, 154, ~
+
+

The pivot_wider function was used to reshape the data set into a wide format with one column per state, filling the resulting missing values with 0 instead of NA.

+
+
+
railroad_2012_clean_county %>%
+  pivot_wider(names_from = state, values_from = total_employees,values_fill = 0)
+
+
+
# A tibble: 1,709 x 54
+   county     AE    AK    AL    AP    AR    AZ    CA    CO    CT    DC
+   <chr>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+ 1 APO         2     0     0     1     0     0     0     0     0     0
+ 2 ANCHOR~     0     7     0     0     0     0     0     0     0     0
+ 3 FAIRBA~     0     2     0     0     0     0     0     0     0     0
+ 4 JUNEAU      0     3     0     0     0     0     0     0     0     0
+ 5 MATANU~     0     2     0     0     0     0     0     0     0     0
+ 6 SITKA       0     1     0     0     0     0     0     0     0     0
+ 7 SKAGWA~     0    88     0     0     0     0     0     0     0     0
+ 8 AUTAUGA     0     0   102     0     0     0     0     0     0     0
+ 9 BALDWIN     0     0   143     0     0     0     0     0     0     0
+10 BARBOUR     0     0     1     0     0     0     0     0     0     0
+# ... with 1,699 more rows, and 43 more variables: DE <dbl>,
+#   FL <dbl>, GA <dbl>, HI <dbl>, IA <dbl>, ID <dbl>, IL <dbl>,
+#   IN <dbl>, KS <dbl>, KY <dbl>, LA <dbl>, MA <dbl>, MD <dbl>,
+#   ME <dbl>, MI <dbl>, MN <dbl>, MO <dbl>, MS <dbl>, MT <dbl>,
+#   NC <dbl>, ND <dbl>, NE <dbl>, NH <dbl>, NJ <dbl>, NM <dbl>,
+#   NV <dbl>, NY <dbl>, OH <dbl>, OK <dbl>, OR <dbl>, PA <dbl>,
+#   RI <dbl>, SC <dbl>, SD <dbl>, TN <dbl>, TX <dbl>, UT <dbl>, ...
+
+
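
A small sketch of the reshaping step, using a few rows copied from the outputs above (the name railroad_toy is made up for illustration and is not part of the original analysis). It shows what values_fill does, and why the wide table has fewer rows than the long one: counties that share a name across states, such as APO here, collapse into a single row.

```r
library(tibble)
library(tidyr)

# Toy rows copied from the printed output above; not the full data set.
railroad_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3,
  "AP",   "APO",       1
)

# One column per state; combinations that do not exist become 0 instead of NA.
wide <- pivot_wider(
  railroad_toy,
  names_from  = state,
  values_from = total_employees,
  values_fill = 0
)

wide
```

Four long rows become three wide rows here, because county is the only column left to identify a row and both APO entries land in the same one.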

Next step - Some Data Exploration/Wrangling

+

I was interested in which county had the most railroad employees, so I started with the arrange function, which sorted the rows by number of employees in descending order.

+
+
+
arrange(railroad_2012_clean_county,desc(total_employees))
+
+
+
# A tibble: 2,930 x 3
+   state county           total_employees
+   <chr> <chr>                      <dbl>
+ 1 IL    COOK                        8207
+ 2 TX    TARRANT                     4235
+ 3 NE    DOUGLAS                     3797
+ 4 NY    SUFFOLK                     3685
+ 5 VA    INDEPENDENT CITY            3249
+ 6 FL    DUVAL                       3073
+ 7 CA    SAN BERNARDINO              2888
+ 8 CA    LOS ANGELES                 2545
+ 9 TX    HARRIS                      2535
+10 NE    LINCOLN                     2289
+# ... with 2,920 more rows
+
+

The arrange function was also used to sort the rows by the total_employees column in ascending order.

+
+
+
arrange(railroad_2012_clean_county,total_employees)
+
+
+
# A tibble: 2,930 x 3
+   state county   total_employees
+   <chr> <chr>              <dbl>
+ 1 AK    SITKA                  1
+ 2 AL    BARBOUR                1
+ 3 AL    HENRY                  1
+ 4 AP    APO                    1
+ 5 AR    NEWTON                 1
+ 6 CA    MONO                   1
+ 7 CO    BENT                   1
+ 8 CO    CHEYENNE               1
+ 9 CO    COSTILLA               1
+10 CO    DOLORES                1
+# ... with 2,920 more rows
+
+

I was interested in whether there were any NA values in the railroad data set. The result shows there were none.

+
+
+
 railroad_2012_clean_county %>% is.na() %>% sum()
+
+
+
[1] 0
+
+
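
The total count says whether anything is missing, but not where. A minimal sketch of a per-column check follows, on a made-up tibble (na_toy, with NA values inserted purely for illustration, since the real data set has none).

```r
library(tibble)

# Hypothetical data with two deliberately missing values.
na_toy <- tibble(
  state           = c("AK", NA, "AL"),
  county          = c("SITKA", "JUNEAU", "HENRY"),
  total_employees = c(1, 3, NA)
)

# Same idea as the pipeline above: total number of missing cells.
total_na <- sum(is.na(na_toy))

# Missing values broken down by column.
na_by_column <- colSums(is.na(na_toy))

total_na
na_by_column
```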

The select function, called with no column names, returns a tibble with zero columns whose header shows the number of rows.

+
+
+
select(railroad_2012_clean_county)
+
+
+
# A tibble: 2,930 x 0
+
+
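
Reading the row count off an empty select works, but base R has direct functions for it. A short sketch on a toy tibble (count_toy is a made-up name, with values copied from the outputs above):

```r
library(tibble)

# Three rows copied from the printed output above.
count_toy <- tibble(
  state           = c("AK", "AK", "AL"),
  county          = c("ANCHORAGE", "JUNEAU", "AUTAUGA"),
  total_employees = c(7, 3, 102)
)

n_rows <- nrow(count_toy)  # number of rows
n_cols <- ncol(count_toy)  # number of columns
shape  <- dim(count_toy)   # both at once: rows, then columns

n_rows
shape
```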

The select function was then used to extract the state column from the rest of the data set.

+
+
+
select(railroad_2012_clean_county,state)
+
+
+
# A tibble: 2,930 x 1
+   state
+   <chr>
+ 1 AE   
+ 2 AK   
+ 3 AK   
+ 4 AK   
+ 5 AK   
+ 6 AK   
+ 7 AK   
+ 8 AL   
+ 9 AL   
+10 AL   
+# ... with 2,920 more rows
+
+

The filter function was used to determine which counties had fewer than 2 railroad employees.

+
+
+
filter(railroad_2012_clean_county,total_employees < 2)
+
+
+
# A tibble: 145 x 3
+   state county   total_employees
+   <chr> <chr>              <dbl>
+ 1 AK    SITKA                  1
+ 2 AL    BARBOUR                1
+ 3 AL    HENRY                  1
+ 4 AP    APO                    1
+ 5 AR    NEWTON                 1
+ 6 CA    MONO                   1
+ 7 CO    BENT                   1
+ 8 CO    CHEYENNE               1
+ 9 CO    COSTILLA               1
+10 CO    DOLORES                1
+# ... with 135 more rows
+
+

The summarise function was used to compute the mean of total_employees across all counties in all states.

+
+
+
railroad_2012_clean_county %>% 
+  summarise(mean(total_employees))
+
+
+
# A tibble: 1 x 1
+  `mean(total_employees)`
+                    <dbl>
+1                    87.2
+
+
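
Because each row is a county, the figure above is a mean per county. A sketch on toy rows (mean_toy and the column name mean_employees are assumptions for illustration) shows how naming the summary column tidies the output, and how adding group_by gives one mean per state instead.

```r
library(dplyr)
library(tibble)

# Four rows copied from the printed output above.
mean_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AK",   "ANCHORAGE", 7,
  "AK",   "JUNEAU",    3,
  "AL",   "AUTAUGA",   102,
  "AL",   "BALDWIN",   143
)

# Overall mean per county, with a readable column name.
overall <- mean_toy %>%
  summarise(mean_employees = mean(total_employees))

# One mean per state.
by_state <- mean_toy %>%
  group_by(state) %>%
  summarise(mean_employees = mean(total_employees))

overall
by_state
```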

I was curious about the counties that had 2 or fewer employees, so I created a new object called subset_employees, which used the pipe operator, the group_by function, and the filter function to keep only those rows.

+
+
+
subset_employees<-railroad_2012_clean_county %>% 
+  group_by(total_employees) %>% 
+  filter(total_employees<=2)
+
+
+
+
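
The condition here is checked row by row, so the same rows come back with or without group_by. A minimal sketch on toy data (small_toy is a made-up name) that filters directly and then counts the result:

```r
library(dplyr)
library(tibble)

# Five rows copied from the printed outputs above.
small_toy <- tribble(
  ~state, ~county,     ~total_employees,
  "AE",   "APO",       2,
  "AK",   "ANCHORAGE", 7,
  "AK",   "SITKA",     1,
  "AL",   "BARBOUR",   1,
  "AK",   "JUNEAU",    3
)

# Keep rows with 2 or fewer employees; no grouping is needed for this.
small <- small_toy %>%
  filter(total_employees <= 2)

# Number of rows that matched.
n_small <- nrow(small)

small
n_small
```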

In this block the rename function was used to change one of the column names, from total_employees to number_of_rail_employees. This comes in handy if a column needs to be renamed.

+
+
+
data<-railroad_2012_clean_county
+data<-rename(data,number_of_rail_employees = total_employees)
+data
+
+
+
# A tibble: 2,930 x 3
+   state county               number_of_rail_employees
+   <chr> <chr>                                   <dbl>
+ 1 AE    APO                                         2
+ 2 AK    ANCHORAGE                                   7
+ 3 AK    FAIRBANKS NORTH STAR                        2
+ 4 AK    JUNEAU                                      3
+ 5 AK    MATANUSKA-SUSITNA                           2
+ 6 AK    SITKA                                       1
+ 7 AK    SKAGWAY MUNICIPALITY                       88
+ 8 AL    AUTAUGA                                   102
+ 9 AL    BALDWIN                                   143
+10 AL    BARBOUR                                     1
+# ... with 2,920 more rows
+
+
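
The argument order in rename is easy to get backwards: the new name goes on the left and the existing name on the right. A one-row sketch (rename_toy is a made-up name):

```r
library(dplyr)
library(tibble)

rename_toy <- tibble(state = "AK", county = "SITKA", total_employees = 1)

# rename(data, new_name = old_name)
renamed <- rename(rename_toy, number_of_rail_employees = total_employees)

names(renamed)
```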
+ + +
+ +
+
+ + + + + +
+

Reuse

+

Text and figures are licensed under Creative Commons Attribution CC BY-NC 4.0. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

+
From d0e3eea56673e52d0d15ee8ed1c477862b552627 Mon Sep 17 00:00:00 2001
From: chester
Date: Tue, 19 Oct 2021 13:57:57 -0700
Subject: [PATCH 3/3] Edited and increased the readibility of the commit.

---
 Data_Wrangling_HW3.Rmd | 54 +++++++++++++++++++++++++-----------------
 1 file changed, 32 insertions(+), 22 deletions(-)

diff --git a/Data_Wrangling_HW3.Rmd b/Data_Wrangling_HW3.Rmd
index 4569a95..7dc50e2 100644
--- a/Data_Wrangling_HW3.Rmd
+++ b/Data_Wrangling_HW3.Rmd
@@ -31,7 +31,7 @@ Let's start with what the heck is data wrangling? You can wrangle cattle, but wr
 ## First Steps
-First, I am going to import data into R- In this case the Railroad_2012 data set which is a CSV file.
+First, I am going to import data into R- In this case the Railroad_2012 data set which is a CSV file. While this is an already cleaned data set, it can still provide insight.
@@ -53,7 +53,7 @@ tail(railroad_2012_clean_county,5)
 ```
-
+________________________________________________________________________________
 Then using **colnames** which takes a look at column names of the data set.
@@ -63,7 +63,7 @@ colnames(railroad_2012_clean_county)
 ```
-
+________________________________________________________________________________
 The **glimpse** function which is part of the dplyr package which provides a synopsis of data was used.
@@ -72,7 +72,7 @@ The **glimpse** function which is part of the dplyr package which provides a s
 glimpse(railroad_2012_clean_county)
 ```
-
+________________________________________________________________________________
 The **pivot_wider** function was used to make the data set more manageable and readable, as well as filling in missing values from n/a to 0.
@@ -82,13 +82,13 @@ The **pivot_wider** function was used to make the data set more manageable and
 railroad_2012_clean_county %>%
   pivot_wider(names_from = state, values_from = total_employees,values_fill = 0)
 ```
-
+________________________________________________________________________________
 ## Next step - Some Data Wrangling
-I was interested in the railroad county that had the most employees so I started with the **arrange** function which ranked the number of employees in descending order.
+I was interested in the railroad county that had the most employees so I started with the **arrange** function which ranked the number of employees in descending order. It shows, not surprisingly, that Cook County in Illinois had the most employees.
 ```{r}
@@ -104,18 +104,20 @@ The **arrange** function was also used to display the *total_employees* column.
 arrange(railroad_2012_clean_county,total_employees)
 ```
+________________________________________________________________________________
-
-I was interested if there were any **na** values in the railroad data set. It shows there were none.
+I was interested if there were any **na** values in the railroad data set. It shows there were none, which makes sense because this was a clean data set.
 ```{r}
- railroad_2012_clean_county %>% is.na() %>% sum()
+ railroad_2012_clean_county %>%
+  is.na() %>%
+  sum()
 ```
+________________________________________________________________________________
-
-The **select** function was used to determine the number of rows.
+The **select** function was used to determine the number of rows. In this data set there were 2930 rows.
@@ -131,9 +133,9 @@ The **select** function separates the *state* column from the rest of the data s
 ```{r}
 select(railroad_2012_clean_county,state)
 ```
+________________________________________________________________________________
-
-The **filter** function was used to determine which rail stations had fewer than 2 employees
+The **filter** function was used to determine which rail stations had fewer than 2 employees. Yeah, I was curious about this. It turns out there were 145 stations with fewer than 2 employees.
@@ -142,17 +144,8 @@ filter(railroad_2012_clean_county,total_employees < 2)
 ```
-The **summarise** function was used to learn what the mean of the total employees in all states.
-
-```{r}
-railroad_2012_clean_county %>%
-  summarise(mean(total_employees))
-```
-
-
-
 I was curious about the number of stations that had less than or equal to 2 employees, so I created a new object called *subset_employees*, that utilized the pipe operator, **group_by** function,and **filter** function to determine the stations with less than or equal to 2 employees.
@@ -161,8 +154,22 @@ I was curious about the number of stations that had less than or equal to 2 empl
 subset_employees<-railroad_2012_clean_county %>%
   group_by(total_employees) %>%
   filter(total_employees<=2)
+subset_employees
+
 ```
+________________________________________________________________________________
+
+
+The **summarise** function was used to learn what the mean of the total employees in all states. As it turns out, the mean was 87.17816.
+
+
+
+```{r}
+railroad_2012_clean_county %>%
+  summarise(mean(total_employees))
+```
+________________________________________________________________________________
 In this block the **rename function** was used to change one of the column names, *from total_rail_ employees* to *number_of_employees*. This comes in handy if a column needs to be renamed.
@@ -179,5 +186,8 @@ data
+_______________________________________________________________________________
+By doing this assignment I learned about the power of the tidyverse, which contains the dplyr package. A lot of insight can be gleaned from the railroad dataset with these tools, such as business and municipal forecasting. For instance, stations with the fewest employees, such as the examples used of 2 or fewer, could be analyzed for either expansion or closure depending on the geographic proximity to other local stations, jobs, and housing.
+