Irregular data

Contour plotting with irregular data

NOTE: This file was copied from an early draft of the tutorial and still needs slight editing to be fully stand-alone.

We first need to load the tidyverse library:

library(tidyverse)

## -- Attaching packages ------------------------------------------------------------------------------------------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.2     v purrr   0.3.4
## v tibble  3.0.3     v dplyr   1.0.2
## v tidyr   1.1.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.5.0

## -- Conflicts ---------------------------------------------------------------------------------------------------------------------------- tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

and the data:

fieldData <- read_csv('DamariscottaRiverData.csv')

## Parsed with column specification:
## cols(
##   date = col_double(),
##   station = col_double(),
##   depthbin = col_double(),
##   year = col_double(),
##   month = col_double(),
##   day = col_double(),
##   depth_m = col_double(),
##   temperature_degC = col_double(),
##   salinity_psu = col_double(),
##   density_kg_m3 = col_double(),
##   PAR = col_double(),
##   fluorescence_mg_m3 = col_double(),
##   oxygenConc_umol_kg = col_double(),
##   oxygenSaturation_percent = col_double(),
##   latitude = col_double()
## )

Interpolating and visualizing data

The approach in the previous section works well when your data are consistent in terms of the variables you want to compare. For the data we plotted above, there were five different cruises, and on each cruise data were collected at the same four locations, so we had data for every cruise and station. If we’d been missing data at one of the stations on one of the cruises, we’d just have had a blank area on the plots we made. But what if we were missing a lot of data? Would the above approach still be a good way to visualize our data?

To dig into this a bit further, let’s consider how chlorophyll fluorescence varies by depth at each station for one cruise.

We need to do a bit of data manipulation again to create a data frame we’ll use for plotting. In this case, we’re going to have station on the x-axis and depth on the y-axis, and we’ll consider the cruise that took place on September 8th 2016.

cruiseData <- filter(fieldData, date==20160908)

We now have a data frame that includes columns for depth, station and chlorophyll fluorescence for just one cruise - so let’s use the same approach as before to create a contour plot:

ggplot(cruiseData,aes(x=station,y=depth_m)) +
  geom_contour_filled(aes(z=fluorescence_mg_m3)) +
  geom_point() +
  labs(fill='chlorophyll fluorescence (mg m^-3)') +
  scale_y_reverse() +
  theme(panel.background = element_rect(fill = "white", colour = "white"))

## Warning: stat_contour(): Zero contours were generated

## Warning in min(x): no non-missing arguments to min; returning Inf

## Warning in max(x): no non-missing arguments to max; returning -Inf

We end up with no contours! What’s going on here? To draw the contours, geom_contour_filled needs the data on a regular grid: the y values must be at the same intervals at every x value (and similarly for the x values). For our data, the depths at each station are irregular and differ from station to station:

ggplot(cruiseData, aes(x=fluorescence_mg_m3, y=depth_m, color = factor(station))) +
  geom_point() +
  labs(color='Station') +
  ylim(5,1) + xlim(3,9)

## Warning: Removed 174 rows containing missing values (geom_point).
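One way to confirm that the sampling depths do not form a regular grid is to count how many stations share each exact depth value. The sketch below is only illustrative: it uses a small made-up data frame (`toy`, not the Damariscotta River data) that borrows the same column names, and the helper names `depth_coverage` and `is_regular` are assumptions introduced here.

```r
library(dplyr)

# Hypothetical toy data: two stations sampled at slightly different depths
toy <- data.frame(
  station = c(1, 1, 1, 2, 2, 2),
  depth_m = c(1.02, 2.10, 3.05, 0.97, 1.95, 3.05),
  fluorescence_mg_m3 = c(4.3, 4.6, 4.5, 5.1, 5.0, 4.8)
)

# Count how many distinct stations were sampled at each exact depth
depth_coverage <- toy %>%
  group_by(depth_m) %>%
  summarize(n_stations = n_distinct(station), .groups = "drop")

# TRUE only if every depth was sampled at every station (a regular grid)
is_regular <- all(depth_coverage$n_stations == n_distinct(toy$station))
is_regular
```

Here only one of the five distinct depths appears at both stations, so `is_regular` is FALSE. The same check could be run on cruiseData, where we would expect the same answer given the warning above.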

So we need to sort the depth data onto a regular grid. To do this, we will group the data into depth bins (or depth ranges) and then calculate the mean for each depth bin. Let’s bin our data into 1 m intervals. Again, we’re making a decision here based on our particular data set and situation; the right bin size could be different for you.

We are going to use a very similar process to earlier (when we considered surface chlorophyll fluorescence on all cruises in 2016).

Round the depths to the nearest meter and add the result as a column in the data frame (use mutate)
Separate the data into depth bin and station groups (use group_by)
Take the average for each group of data (use summarize)

binned <- cruiseData %>% mutate(depthBin = round(depth_m)) %>% group_by(station,depthBin) %>% summarize(av_fluor = mean(fluorescence_mg_m3))

## `summarise()` regrouping output by 'station' (override with `.groups` argument)

head(binned)

## # A tibble: 6 x 3
## # Groups:   station [1]
##   station depthBin av_fluor
##     <dbl>    <dbl>    <dbl>
## 1       1        1     4.31
## 2       1        2     4.60
## 3       1        3     4.54
## 4       1        4     4.52
## 5       1        5     4.92
## 6       1        6     4.73
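The 1 m bin size is a choice, and the same three steps work for other bin sizes. As a rough sketch, dividing by a bin width before rounding and multiplying back afterwards gives bins centered on multiples of that width. The data frame `toy` and the variable `bin_width` below are made up for illustration and are not part of the tutorial data.

```r
library(dplyr)

# Hypothetical toy profile for a single station (not the real data)
toy <- data.frame(
  station = 1,
  depth_m = c(0.6, 1.4, 2.2, 3.1, 3.9, 4.4),
  fluorescence_mg_m3 = c(4.0, 4.2, 4.6, 5.0, 5.2, 4.8)
)

bin_width <- 2  # bin size in meters; an illustrative choice

binned_toy <- toy %>%
  # Round each depth to the nearest multiple of bin_width
  mutate(depthBin = round(depth_m / bin_width) * bin_width) %>%
  group_by(station, depthBin) %>%
  # Average the fluorescence within each station/depth bin
  summarize(av_fluor = mean(fluorescence_mg_m3), .groups = "drop")

binned_toy
```

With `bin_width <- 1` this reduces to the `round(depth_m)` call used above. Wider bins fill more gaps but smooth out more vertical detail.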

We’ve now got a data frame like we had before - let’s try geom_contour_filled again:

ggplot(binned,aes(x=station,y=depthBin)) +
  geom_contour_filled(aes(z=av_fluor)) +
  geom_point() +
  labs(fill='chlorophyll fluorescence (mg m^-3)') +
  scale_y_reverse()

This looks better, but this plotting function doesn’t interpolate across missing data points. We know we have data at station 4 below 50 m that aren’t represented in this plot. Can we use a different function to show those data too?

ggplot(binned,aes(x=station,y=depthBin)) +
  geom_tile(aes(fill=av_fluor)) +
  scale_fill_continuous() +
  labs(fill='chlorophyll fluorescence (mg m^-3)') +
  scale_y_reverse()

All the data are visualized when we plot the data this way.
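If gaps within a profile need to be filled rather than just displayed, one option (not used in this tutorial) is to interpolate each station’s binned profile linearly onto a common depth grid with base R’s approx(). The sketch below assumes a made-up data frame `toy_binned` with the same columns as binned; `depth_grid` and `interpolated` are names introduced here for illustration.

```r
# Hypothetical binned profiles: station 1 is missing the 3 m bin,
# station 2 only reaches 3 m
toy_binned <- data.frame(
  station  = c(1, 1, 1, 2, 2, 2),
  depthBin = c(1, 2, 4, 1, 2, 3),
  av_fluor = c(4.0, 4.4, 5.0, 5.0, 4.8, 4.6)
)

# Common 1 m depth grid spanning all stations
depth_grid <- seq(min(toy_binned$depthBin), max(toy_binned$depthBin), by = 1)

# Interpolate each station's profile onto the common grid.
# approx() returns NA outside the sampled depth range (no extrapolation).
profiles <- split(toy_binned, toy_binned$station)
interpolated <- do.call(rbind, lapply(profiles, function(profile) {
  data.frame(
    station  = profile$station[1],
    depthBin = depth_grid,
    av_fluor = approx(profile$depthBin, profile$av_fluor, xout = depth_grid)$y
  )
}))

interpolated
```

The result has one row for every station and depth combination. The missing 3 m bin at station 1 is filled by linear interpolation, while depths below the deepest sample at station 2 stay NA, so interpolated values should be read as estimates rather than measurements.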
