This script checks the species record criteria, focusing on validating the minimum time (independence) between sequential records of the same species at the same camera.
We follow a pipeline: (1) load helper functions and read the species records sheet; (2) perform sanity checks on the Record_criteria values (e.g., values less than 5 minutes); (3) check for consistency, looking for cameras that have more than one independence criterion applied; (4) calculate the time difference between sequential records and flag those that violate the established independence criterion.
To begin, we source our external FUNCTIONS.R script. This makes our custom helper functions, like read_sheet and dttm_update, available for use throughout the analysis.
source("R/FUNCTIONS.R")
First, we read the data from the “Species_records_camera” sheet located in the “Example” folder. Using our read_sheet function, we load the data into the sps list and then preview the first few rows of the first dataset.
# Read sheets
sps <- read_sheet(
  path = "Example/11",
  sheet = "Species_records_camera",
  recurse = FALSE
)
head(sps)[1]$Example11
# A tibble: 250 × 7
   Structure_id Camera_id Species        Record_date         Record_time
   <chr>        <chr>     <chr>          <dttm>              <dttm>
 1 Paso 9       camP9     Lycalopex gri… 2023-11-23 00:00:00 1899-12-31 00:24:00
 2 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 03:03:00
 3 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 18:10:00
 4 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 18:12:00
 5 Paso 9       camP9     Lycalopex gri… 2023-11-26 00:00:00 1899-12-31 23:19:00
 6 Paso 9       camP9     Lycalopex gri… 2023-11-27 00:00:00 1899-12-31 11:47:00
 7 Paso 9       camP9     Lycalopex gri… 2023-11-30 00:00:00 1899-12-31 00:42:00
 8 Paso 9       camP9     Lycalopex gri… 2023-11-30 00:00:00 1899-12-31 02:51:00
 9 Paso 9       camP9     Lycalopex gri… 2023-12-03 00:00:00 1899-12-31 00:53:00
10 Paso 9       camP9     Conepatus chi… 2023-12-03 00:00:00 1899-12-31 03:19:00
# ℹ 240 more rows
# ℹ 2 more variables: Record_criteria <dbl>, Behavior <chr>
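The preview shows that Record_date holds the calendar day at midnight, while Record_time holds the clock time attached to a placeholder date (1899-12-31). The two need to be merged into a single timestamp before any time differences can be computed. The sketch below illustrates one way to do that in base R. The function combine_date_time is a hypothetical stand-in written for this illustration; the actual dttm_update helper lives in R/FUNCTIONS.R and may work differently.

```r
# Hypothetical stand-in for dttm_update(); the real helper in R/FUNCTIONS.R
# is not shown in this document and may behave differently.
combine_date_time <- function(date, time, tz = "UTC") {
  # Keep the calendar day from `date` and the clock time from `time`,
  # discarding the midnight time and the 1899-12-31 placeholder date.
  as.POSIXct(
    paste(format(date, "%Y-%m-%d"), format(time, "%H:%M:%S")),
    format = "%Y-%m-%d %H:%M:%S",
    tz = tz
  )
}

# Values taken from row 3 of the preview above
record_date <- as.POSIXct("2023-11-26 00:00:00", tz = "UTC")
record_time <- as.POSIXct("1899-12-31 18:10:00", tz = "UTC")

combined <- combine_date_time(record_date, record_time)
format(combined, "%Y-%m-%d %H:%M:%S")
# "2023-11-26 18:10:00"
```

The time zone is assumed to be UTC here purely to keep the example deterministic; camera data would normally use the deployment's local time zone.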
Next, we perform a validation check on the criteria themselves. We bind all datasets and filter for records where the Record_criteria (the independence time) is less than 5 minutes, as this may indicate a typo. We count the occurrences by criterion and dataset.
# Check records with less than five minutes
sps |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(!is.na(Record_criteria), Record_criteria < 5) |>
  dplyr::count(Record_criteria, Dataset)
# A tibble: 1 × 3
  Record_criteria Dataset       n
            <dbl> <chr>     <int>
1               1 Example13  3838
Continuing the validation, we check the consistency of criteria per camera. We iterate through each dataset to identify if any camera (Structure_id, Camera_id) has more than one distinct Record_criteria value associated with it. Any cameras with multiple, conflicting criteria are listed, as this indicates an inconsistency in the input data.
# Checks if there is more than one criterion per camera
sps |>
  purrr::map(
    ~ .x |>
      dplyr::filter(!is.na(Record_criteria)) |>
      dplyr::distinct(Structure_id, Camera_id, Record_criteria) |>
      dplyr::count(Structure_id, Camera_id, sort = TRUE)
  ) |>
  purrr::discard(~ nrow(.x) == 0) |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(n > 1)
# A tibble: 1 × 4
  Dataset   Structure_id Camera_id     n
  <chr>     <chr>        <chr>     <int>
1 Example12 8            8             2
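The count above reports how many distinct criteria a camera has, but not which values conflict. As a possible follow-up, the sketch below uses dplyr::n_distinct() inside summarise() and also collects the conflicting values into one string. The toy data and the column names n_criteria and criteria are invented for illustration and are not part of the original script.

```r
# Toy data (invented): camera "A" mixes 5- and 60-minute criteria
toy <- dplyr::tibble(
  Structure_id    = c("S1", "S1", "S1", "S2"),
  Camera_id       = c("A", "A", "A", "B"),
  Record_criteria = c(5, 5, 60, 5)
)

conflicts <- toy |>
  dplyr::filter(!is.na(Record_criteria)) |>
  dplyr::group_by(Structure_id, Camera_id) |>
  dplyr::summarise(
    # Number of distinct criteria applied to this camera
    n_criteria = dplyr::n_distinct(Record_criteria),
    # The distinct values themselves, for easier correction of the input sheet
    criteria   = paste(sort(unique(Record_criteria)), collapse = ", "),
    .groups    = "drop"
  ) |>
  dplyr::filter(n_criteria > 1)

conflicts
```

Only camera "A" is returned, with n_criteria equal to 2 and criteria equal to "5, 60".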
This is the main validation check. We map over each dataset to: (1) standardize the date and time columns using dttm_update; (2) group by camera and species; (3) calculate the time since the previous record, rounded up to whole minutes (diff_time_prev). We compare this difference with the camera’s Record_criteria, flagging records whose timestamp is identical to the previous one as duplicates (“DUP”) and records that fall within the independence window as violations (“UH OH!”).
Finally, we bind all the processed datasets, drop rows without a criterion, and use glimpse() to show the records that failed the check (i.e., are not “OK”); 24 records are flagged.
# Checks if records are within the record_criteria
check <- sps |>
  purrr::map(function(dataset) {
    dataset |>
      dttm_update(date_col = "Record_date", time_col = "Record_time") |>
      dplyr::select(-Record_time, -Behavior) |>
      dplyr::group_by(Structure_id, Camera_id, Species) |>
      dplyr::arrange(Record_date) |>
      dplyr::mutate(
        diff_time_prev = ceiling(
          as.numeric(lubridate::interval(
            dplyr::lag(Record_date),
            Record_date
          )) /
            60
        ),
        check_diff_time_prev = dplyr::case_when(
          dplyr::lag(Record_date) == Record_date ~ "DUP",
          diff_time_prev < Record_criteria ~ "UH OH!",
          TRUE ~ "OK"
        )
      ) |>
      dplyr::ungroup() |>
      dplyr::arrange(Species, Camera_id, Record_date) |>
      tibble::rowid_to_column("id")
  })
check |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::filter(!is.na(Record_criteria)) |>
  dplyr::filter(check_diff_time_prev != "OK") |>
  dplyr::glimpse()
Rows: 24
Columns: 9
$ Dataset              <chr> "Example11", "Example11", "Example11", "Example11…
$ id                   <int> 41, 85, 132, 133, 134, 137, 155, 177, 185, 189, 2…
$ Structure_id         <chr> "Paso 10", "Paso 10", "Paso 11", "Paso 11", "Paso…
$ Camera_id            <chr> "camP10", "camP10", "camP11", "camP11", "camP11",…
$ Species              <chr> "Chaetophractus villosus", "Chaetophractus villos…
$ Record_date          <dttm> 2023-11-24 11:51:00, 2023-11-28 21:28:00, 2023-1…
$ Record_criteria      <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 60, 60,…
$ diff_time_prev       <dbl> 3, 4, 3, 3, 2, 2, 3, 2, 4, 3, 3, 3, 4, 3, 2, 13, …
$ check_diff_time_prev <chr> "UH OH!", "UH OH!", "UH OH!", "UH OH!", "UH OH!",…
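To make the flagging rules concrete, here is a small worked example on four invented records from a single camera and species, with a 5-minute criterion. It is a sketch rather than the script's exact code: it swaps lubridate::interval() for base R's difftime() to compute the gap in minutes, and the species label and timestamps are made up for illustration.

```r
# Toy records (invented): one camera, one species, 5-minute criterion
toy <- dplyr::tibble(
  Species = "Species A",
  Record_date = as.POSIXct(
    c("2023-11-26 18:10:00", "2023-11-26 18:10:00",
      "2023-11-26 18:12:00", "2023-11-26 23:19:00"),
    tz = "UTC"
  ),
  Record_criteria = 5
)

flagged <- toy |>
  dplyr::arrange(Record_date) |>
  dplyr::mutate(
    # Minutes since the previous record, rounded up (NA for the first record)
    diff_time_prev = ceiling(as.numeric(
      difftime(Record_date, dplyr::lag(Record_date), units = "mins")
    )),
    check_diff_time_prev = dplyr::case_when(
      dplyr::lag(Record_date) == Record_date ~ "DUP",    # identical timestamp
      diff_time_prev < Record_criteria       ~ "UH OH!", # inside the window
      TRUE                                   ~ "OK"      # independent record
    )
  )

flagged$diff_time_prev
# NA 0 2 307
flagged$check_diff_time_prev
# "OK" "DUP" "UH OH!" "OK"
```

The first record of each group has no predecessor, so both conditions evaluate to NA and case_when() falls through to "OK". A gap exactly equal to the criterion is also "OK", because the comparison is strictly less than.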