13 Check Species Record Criteria

13.1 Problem Description

This script checks the species record criteria. Its main goal is to validate the independence criterion, that is, the minimum time that must separate sequential records of the same species at the same camera.

13.2 Problem Solving

We follow a pipeline: (1) load helper functions and read the species records sheet; (2) perform sanity checks on the Record_criteria values (e.g., values less than 5 minutes); (3) check for consistency, looking for cameras that have more than one independence criterion applied; (4) calculate the time difference between sequential records and flag those that violate the established independence criterion.

13.2.1 Common steps

To begin, we source our external FUNCTIONS.R script. This makes our custom helper functions, like read_sheet and dttm_update, available for use throughout the analysis.

Code
source("R/FUNCTIONS.R")

13.2.2 Specific steps

First, we read the “Species_records_camera” sheet from the “Example/11” folder. Our read_sheet function loads the data into the sps list, with one element per dataset; we then print the first element of that list to preview the first dataset.

Code
# Read sheets
sps <- read_sheet(
  path = "Example/11",
  sheet = "Species_records_camera",
  recurse = FALSE
)

head(sps)[1]
$Example11
# A tibble: 250 × 7
   Structure_id Camera_id Species        Record_date         Record_time        
   <chr>        <chr>     <chr>          <dttm>              <dttm>             
 1 Paso 9       camP9     Lycalopex gri… 2023-11-23 00:00:00 1899-12-31 00:24:00
 2 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 03:03:00
 3 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 18:10:00
 4 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 18:12:00
 5 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 23:19:00
 6 Paso 9       camP9     Lycalopex gri… 2023-11-27 00:00:00 1899-12-31 11:47:00
 7 Paso 9       camP9     Lycalopex gri… 2023-11-30 00:00:00 1899-12-31 00:42:00
 8 Paso 9       camP9     Lycalopex gri… 2023-11-30 00:00:00 1899-12-31 02:51:00
 9 Paso 9       camP9     Lycalopex gri… 2023-12-03 00:00:00 1899-12-31 00:53:00
10 Paso 9       camP9     Conepatus chi… 2023-12-03 00:00:00 1899-12-31 03:19:00
# ℹ 240 more rows
# ℹ 2 more variables: Record_criteria <dbl>, Behavior <chr>
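
In the preview, Record_date holds the day at midnight and Record_time holds the time of day on the placeholder date 1899-12-31. The two must be merged into one timestamp before time differences can be computed. The sketch below shows one way to do that in base R. The function combine_date_time is hypothetical and is not the dttm_update helper defined in R/FUNCTIONS.R, whose implementation is not shown here; the UTC time zone is also an assumption.

```r
# Hypothetical helper for illustration only; NOT the dttm_update()
# function from R/FUNCTIONS.R.
combine_date_time <- function(date, time, tz = "UTC") {
  # Take the calendar day from `date` and the clock time from `time`,
  # then parse the pasted string as a single date-time.
  as.POSIXct(
    paste(format(date, "%Y-%m-%d"), format(time, "%H:%M:%S")),
    format = "%Y-%m-%d %H:%M:%S",
    tz = tz
  )
}

# Values copied from the first row of the preview (time zone assumed)
record_date <- as.POSIXct("2023-11-23 00:00:00", tz = "UTC")
record_time <- as.POSIXct("1899-12-31 00:24:00", tz = "UTC")

combined <- combine_date_time(record_date, record_time)
format(combined, "%Y-%m-%d %H:%M:%S")
```

For the first preview row this yields 2023-11-23 00:24:00, with the placeholder date discarded.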

Next, we validate the criteria themselves. We bind all datasets into a single table and keep the records whose Record_criteria (the independence time, in minutes) is below 5, since such a short interval may indicate a typo. We then count these records by criterion value and dataset.

Code
# Check records with less than five minutes
sps |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(!is.na(Record_criteria), Record_criteria < 5) |>
  dplyr::count(Record_criteria, Dataset)
# A tibble: 1 × 3
  Record_criteria Dataset       n
            <dbl> <chr>     <int>
1               1 Example13  3838
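
A natural follow-up is to see which cameras carry the suspicious value, so it can be traced back to the input sheet. The sketch below does this in base R on invented toy data; only the column names come from the sheet, and the 5-minute threshold mirrors the check above.

```r
# Toy data: column names follow the sheet, values are invented
toy <- data.frame(
  Structure_id = c("S1", "S1", "S2", "S2", "S3"),
  Camera_id = c("c1", "c1", "c2", "c2", "c3"),
  Record_criteria = c(1, 1, 5, 5, NA)
)

# Keep rows with a non-missing criterion below 5 minutes
suspicious <- toy[!is.na(toy$Record_criteria) & toy$Record_criteria < 5, ]

# Collapse to one row per camera and criterion value
affected <- unique(
  suspicious[, c("Structure_id", "Camera_id", "Record_criteria")]
)
affected
```

With these toy values, only camera c1 at structure S1 is listed. The row with a missing criterion is excluded rather than flagged.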

Continuing the validation, we check that each camera uses a single criterion. For each dataset, we keep the distinct combinations of Structure_id, Camera_id, and Record_criteria and count them per camera. A camera with a count above 1 has conflicting criteria, which indicates an inconsistency in the input data, so we list it.

Code
# Checks if there is more than one criterion per camera
sps |>
  purrr::map(
    ~ .x |>
      dplyr::filter(!is.na(Record_criteria)) |>
      dplyr::distinct(Structure_id, Camera_id, Record_criteria) |>
      dplyr::count(Structure_id, Camera_id, sort = TRUE)
  ) |>
  purrr::discard(~ nrow(.x) == 0) |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(n > 1)
# A tibble: 1 × 4
  Dataset   Structure_id Camera_id     n
  <chr>     <chr>        <chr>     <int>
1 Example12 8            8             2
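
The count shows how many distinct criteria a camera has, but not which ones. The base R sketch below, on invented toy data, lists the conflicting values next to each camera. The column names follow the sheet; everything else is an assumption made for illustration.

```r
# Toy data: column names follow the sheet, values are invented
toy <- data.frame(
  Structure_id = c("A", "A", "A", "B", "B"),
  Camera_id = c("camA", "camA", "camA", "camB", "camB"),
  Record_criteria = c(5, 60, 5, 5, NA)
)

# Ignore records without a criterion, as in the check above
with_criteria <- toy[!is.na(toy$Record_criteria), ]

# Collapse the distinct criteria of each camera into one string
criteria_per_camera <- aggregate(
  Record_criteria ~ Structure_id + Camera_id,
  data = with_criteria,
  FUN = function(x) paste(sort(unique(x)), collapse = ", ")
)

# A comma in the string means the camera has more than one criterion
conflicts <- criteria_per_camera[
  grepl(",", criteria_per_camera$Record_criteria),
]
conflicts
```

Here camera camA is reported with the values 5 and 60, while camB, which has a single criterion, is not listed.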

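This is the main validation check. For each dataset we: (1) standardize the date and time columns with dttm_update, which leaves the full timestamp in Record_date; (2) group by structure, camera, and species; (3) calculate the time in minutes, rounded up, since the previous record in the same group (diff_time_prev); and (4) compare that difference with the camera’s Record_criteria. Records with exactly the same timestamp as the previous record are flagged as duplicates (“DUP”), records closer together than the criterion are flagged “UH OH!”, and all others, including the first record of each group, are “OK”.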

Finally, we bind all the processed datasets, drop the records without a criterion, and use glimpse() to inspect the records that failed the check (i.e., are not “OK”). In this example, 24 records fail.

Code
# Checks if records are within the record_criteria
check <- sps |>
  purrr::map(function(dataset) {
    dataset |>
      dttm_update(date_col = "Record_date", time_col = "Record_time") |>
      dplyr::select(-Record_time, -Behavior) |>
      dplyr::group_by(Structure_id, Camera_id, Species) |>
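      # Sort chronologically so lag() returns the previous record in each group
      dplyr::arrange(Record_date) |>
      dplyr::mutate(
        # Minutes since the previous record, rounded up (NA for the first one)
        diff_time_prev = ceiling(
          as.numeric(lubridate::interval(
            dplyr::lag(Record_date),
            Record_date
          )) /
            60
        ),
        # Conditions are evaluated in order; NA comparisons fall through to "OK"
        check_diff_time_prev = dplyr::case_when(
          dplyr::lag(Record_date) == Record_date ~ "DUP",
          diff_time_prev < Record_criteria ~ "UH OH!",
          TRUE ~ "OK"
        )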
      ) |>
      dplyr::ungroup() |>
      dplyr::arrange(Species, Camera_id, Record_date) |>
      tibble::rowid_to_column("id")
  })

check |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(!is.na(Record_criteria)) |>
  dplyr::filter(check_diff_time_prev != "OK") |>
  dplyr::glimpse()
Rows: 24
Columns: 9
$ Dataset              <chr> "Example11", "Example11", "Example11", "Example11…
$ id                   <int> 41, 85, 132, 133, 134, 137, 155, 177, 185, 189, 2…
$ Structure_id         <chr> "Paso 10", "Paso 10", "Paso 11", "Paso 11", "Paso…
$ Camera_id            <chr> "camP10", "camP10", "camP11", "camP11", "camP11",…
$ Species              <chr> "Chaetophractus villosus", "Chaetophractus villos…
$ Record_date          <dttm> 2023-11-24 11:51:00, 2023-11-28 21:28:00, 2023-1…
$ Record_criteria      <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 60, 60,…
$ diff_time_prev       <dbl> 3, 4, 3, 3, 2, 2, 3, 2, 4, 3, 3, 3, 4, 3, 2, 13, …
$ check_diff_time_prev <chr> "UH OH!", "UH OH!", "UH OH!", "UH OH!", "UH OH!",…
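
To make the flagging rules concrete, the sketch below reproduces them in base R for a single camera and species. The records are invented and the 5-minute criterion is an assumption. The real pipeline applies the same logic within each group using dplyr and lubridate.

```r
# Toy records for one camera and one species (invented values)
toy <- data.frame(
  Camera_id = "camA",
  Species = "sp1",
  Record_date = as.POSIXct(
    c("2023-11-26 03:03:00", "2023-11-26 18:10:00", "2023-11-26 18:10:00",
      "2023-11-26 18:12:00", "2023-11-26 23:19:00"),
    tz = "UTC"
  ),
  Record_criteria = 5
)
toy <- toy[order(toy$Record_date), ]

# Seconds since the previous record; the first record has no predecessor
diff_secs <- c(NA, diff(as.numeric(toy$Record_date)))

# Minutes since the previous record, rounded up as in the pipeline
toy$diff_time_prev <- ceiling(diff_secs / 60)

# Same order of rules as the case_when() above
toy$check_diff_time_prev <- ifelse(
  is.na(diff_secs), "OK",
  ifelse(
    diff_secs == 0, "DUP",
    ifelse(toy$diff_time_prev < toy$Record_criteria, "UH OH!", "OK")
  )
)
toy[, c("Record_date", "diff_time_prev", "check_diff_time_prev")]
```

The second record at 18:10 is a duplicate, and the record at 18:12 violates the criterion because it comes only 2 minutes later; the remaining records are “OK”. Because the difference is rounded up, a gap of 4 minutes 30 seconds becomes 5 and passes a 5-minute criterion.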