diff --git a/Data_Wrangling_HW3.Rmd b/Data_Wrangling_HW3.Rmd
new file mode 100644
index 0000000..7dc50e2
--- /dev/null
+++ b/Data_Wrangling_HW3.Rmd
@@ -0,0 +1,193 @@
+---
+title: "Homework Three"
+description: Basic Data Wrangling
+author:
+  - name: Cynthia Hester
+date: 10-09-2021
+output:
+  distill::distill_article:
+    self_contained: no
+draft: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = TRUE)
+library(stringr)
+library(tidyverse)
+library(readr)
+library(here)
+```
+
+## Introduction
+
+Let's start with a question: what the heck is data wrangling? You can wrangle cattle, but wrangle data? Yes! After data is imported or read into the RStudio environment, it usually needs to be reshaped and cleaned before it can be used for visualization or modeling.
+
+## First Steps
+
+First, I am going to import data into R, in this case the Railroad 2012 data set, which is a CSV file. While this is an already cleaned data set, it can still provide insight.
+
+```{r}
+library(readr)
+railroad_2012_clean_county <- read_csv("_data/railroad_2012_clean_county.csv")
+View(railroad_2012_clean_county)
+```
+
+The railroad data set was checked out, starting with the **head** and **tail** functions, which display the first five and last five rows of the data set.
+
+```{r}
+head(railroad_2012_clean_county,5)
+tail(railroad_2012_clean_county,5)
+```
+
+________________________________________________________________________________
+Then **colnames** was used to take a look at the column names of the data set.
+
+```{r}
+colnames(railroad_2012_clean_county)
+```
+
+________________________________________________________________________________
+The **glimpse** function, which is part of the dplyr package, was used to get a quick synopsis of the data.
+
+```{r}
+glimpse(railroad_2012_clean_county)
+```
+
+________________________________________________________________________________
+
+The **pivot_wider** function was used to make the data set wider and easier to scan, with `values_fill = 0` putting a 0 in every state and county combination that has no employees recorded.
+
+```{r}
+railroad_2012_clean_county %>%
+  pivot_wider(names_from = state, values_from = total_employees, values_fill = 0)
+```
+________________________________________________________________________________
+
+## Next step - Some Data Wrangling
+
+I was interested in which county had the most railroad employees, so I started with the **arrange** function, which sorts the number of employees in descending order. It shows, not surprisingly, that Cook County in Illinois had the most employees.
+
+```{r}
+arrange(railroad_2012_clean_county,desc(total_employees))
+```
+
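+Just as a sketch of another way to answer the same question (this chunk is an illustration and is not evaluated), dplyr's **slice_max** can return only the top few counties directly instead of sorting the whole table:
+
+```{r, eval=FALSE}
+# Illustrative only: keep the five counties with the largest employee counts
+railroad_2012_clean_county %>%
+  slice_max(total_employees, n = 5)
+```
+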
+The **arrange** function was also used to sort the *total_employees* column in ascending order, which is the default.
+
+```{r}
+arrange(railroad_2012_clean_county,total_employees)
+```
+
+________________________________________________________________________________
+
+I was also interested in whether there were any **NA** values in the railroad data set. It turns out there were none, which makes sense because this is an already cleaned data set.
+
+```{r}
+railroad_2012_clean_county %>%
+  is.na() %>%
+  sum()
+```
+________________________________________________________________________________
+
+The **select** function with no columns specified still reports the number of rows; in this data set there are 2,930 rows, the same count that `nrow()` or `dim()` would report.
+
+```{r}
+select(railroad_2012_clean_county)
+```
+
+The **select** function can also pull a single column, here *state*, out of the data set.
+
+```{r}
+select(railroad_2012_clean_county,state)
+```
+________________________________________________________________________________
+
+The **filter** function was used to determine which counties had fewer than 2 railroad employees. Yeah, I was curious about this. It turns out there were 145 counties with fewer than 2 employees.
+
+```{r}
+filter(railroad_2012_clean_county,total_employees < 2)
+```
+
+I was also curious about the counties with 2 or fewer employees, so I created a new object called *subset_employees* that uses the pipe operator, the **group_by** function, and the **filter** function to keep only those counties.
+
+```{r}
+subset_employees <- railroad_2012_clean_county %>%
+  group_by(total_employees) %>%
+  filter(total_employees <= 2)
+subset_employees
+```
+
+________________________________________________________________________________
+
+The **summarise** function was used to find the mean number of employees per county across the whole data set. As it turns out, the mean was 87.17816.
+
+```{r}
+railroad_2012_clean_county %>%
+  summarise(mean(total_employees))
+```
+________________________________________________________________________________
+
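+Since that overall mean averages over every county regardless of state, a natural follow-up is the mean within each state. As a sketch of how that could look (this chunk is an illustration and is not evaluated), **group_by** and **summarise** can be combined:
+
+```{r, eval=FALSE}
+# Illustrative only: mean county-level employee count within each state
+railroad_2012_clean_county %>%
+  group_by(state) %>%
+  summarise(mean_employees = mean(total_employees)) %>%
+  arrange(desc(mean_employees))
+```
+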
+In this block the **rename** function was used to change one of the column names, from *total_employees* to *number_of_rail_employees*. This comes in handy whenever a column needs a clearer name.
+
+```{r}
+data <- railroad_2012_clean_county
+data <- rename(data, number_of_rail_employees = total_employees)
+data
+```
+
+_______________________________________________________________________________
+By doing this assignment I learned about the power of the tidyverse, which contains the dplyr package. A lot of insight can be gleaned from the railroad data set with these tools, for example for business and municipal forecasting. For instance, the counties with the fewest employees, such as those with 2 or fewer examined above, could be analyzed for either expansion or closure depending on their geographic proximity to other rail operations, jobs, and housing.
diff --git a/Reading_in_data_HW2.Rmd b/Reading_in_data_HW2.Rmd
new file mode 100644
index 0000000..af3f77b
--- /dev/null
+++ b/Reading_in_data_HW2.Rmd
@@ -0,0 +1,124 @@
+---
+title: "Homework_2"
+description: Reading in Data
+author:
+  - name: Cynthia Hester
+date: 09-29-2021
+output:
+  distill::distill_article:
+    self_contained: no
+draft: yes
+---
+
+```{r setup, include=FALSE}
+knitr::opts_chunk$set(echo = FALSE)
+```
+
+## Reading in the first data set
+
+Reading in, or importing, data files into RStudio is a necessary first step before any cleaning or tidying can happen. Once imported data has been cleaned, it is more suitable for exploration.
+
+As we know, data formats are not homogeneous and come in many different flavors. So whether data arrives as **CSV, SPSS, XLSX, SAS, TXT, Stata, or HTML**, as well as many other formats, there is usually an R package to read it in.
+
+The first data set I will read in comes from the built-in R package **datasets**. It is the *mtcars* (Motor Trend) data set, which was extracted from the 1974 *Motor Trend* US magazine and comprises fuel consumption as well as 10 aspects of automobile design and performance for 32 cars (1973-74 models).
+
+**This R chunk loads the datasets package and provides summary statistics for the *mtcars* data set.**
+
+```{r}
+library(datasets)
+summary(mtcars)
+```
+
+**This R chunk uses an alternative to the *summary* function called *skim*, from the skimr package. *skim* provides a more comprehensive overview of the *mtcars* data set and includes a small inline histogram of each numeric variable.**
+
+```{r}
+library(skimr)
+skim(mtcars)
+```
+
+**This R chunk shows the granularity of *skim* by selecting specific columns (here *hp* and *wt*) to summarize.**
+
+```{r}
+skim(mtcars,hp,wt)
+```
+
+**This R chunk provides the column names of the *mtcars* data set using the colnames() function.**
+
+```{r}
+colnames(mtcars)
+```
+
+**This R chunk introduces the *dim()* function, which reports the dimensions of the data set and shows this data frame to have 32 rows and 11 columns.**
+
+```{r}
+dim(mtcars)
+```
+
+**This R chunk shows a generic visualization of the *mtcars* object, a pairwise scatterplot matrix, using the *plot()* function.**
+
+```{r}
+plot(mtcars)
+```
+
+## The second data set comes from the course CSV file eggs_tidy
+
+I wanted to try reading in data from an external file that uses the CSV format.
+
+**This first R chunk reads in the eggs_tidy CSV data.**
+
+```{r}
+library(readr)
+eggs_tidy <- read_csv("_data/eggs_tidy.csv")
+```
+
+**This chunk summarizes the eggs_tidy data set.**
+
+```{r}
+summary(eggs_tidy)
+```
+
+**This chunk summarizes the data set using the skim function.**
+
+```{r}
+skim(eggs_tidy)
+```
+
+**This chunk uses the *as_tibble()* function, which converts the data to a tibble, a more compact and readable kind of data frame.**
+
+```{r}
+library(tibble)
+as_tibble(eggs_tidy)
+```
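+Although eggs_tidy arrives in a wide layout, the four carton columns could also be gathered into a long format for plotting or grouped summaries. As a sketch (this chunk is an illustration and is not evaluated, and the new column names are just placeholders), tidyr's **pivot_longer** would reshape it like this:
+
+```{r, eval=FALSE}
+# Illustrative only: gather the four carton-size columns into one long value column
+library(tidyr)
+pivot_longer(eggs_tidy,
+             cols = large_half_dozen:extra_large_dozen,
+             names_to = "carton_size",
+             values_to = "value")
+```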