This document describes the process of identifying duplicated camera IDs within the same structure in the camera trap setup data. The goal is to ensure that each camera within a structure is uniquely identified, avoiding overlaps that could compromise data integrity.
We use our custom read_sheet function to load the camera trap setup data from all available spreadsheets, so we first source FUNCTIONS.R. This also loads unique_id, the function we use later to generate unique camera IDs. Finally, we attach two operators from the lubridate package, %within% and %--%.
source("R/FUNCTIONS.R")
use("lubridate", c("%within%", "%--%"))
We load the camera trap setup data (ct) and look for duplicated camera IDs within each structure.
# Read camera trap setup data
ct <- read_sheet(
  path = "Example",
  sheet = "Camera_trap",
  na = c("NA", "na")
)
ct |>
  head(2)
$Example1
# A tibble: 435 × 36
Structure_id Camera_id Camera_position Camera_view Camera_model Camera_setup
<chr> <chr> <chr> <chr> <chr> <chr>
1 BC1 (galeria) cam1 Externa <NA> Bushnell <NA>
2 BC1 (galeria) - <NA> <NA> <NA> <NA>
3 BC1 (galeria) VITA_04 Externa Abertura Bushnell <NA>
4 BC1 (galeria) - Externa Abertura <NA> <NA>
5 BC1 (galeria) - Externa Abertura <NA> <NA>
6 BC1 (galeria) - Externa Abertura <NA> <NA>
7 BC1 (galeria) - Externa Abertura <NA> <NA>
8 BC1 (galeria) - Externa Abertura <NA> <NA>
9 BC1 (galeria) VITA_07 Externa Abertura Bushnell <NA>
10 BC1 (galeria) VITA_15 Externa Abertura Bushnell <NA>
# ℹ 425 more rows
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
# Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>, …
$Example2
# A tibble: 7 × 36
Structure_id Camera_id Camera_position Camera_view Camera_model Camera_setup
<chr> <chr> <chr> <chr> <chr> <chr>
1 CE2 cam_CE2 Externa Abertura Trail Camera … Secuencia d…
2 CE3 cam_CE3 Externa Abertura Trail Camera … Secuencia d…
3 CE4 cam_CE4 Externa Abertura Trail Camera … Secuencia d…
4 CE5 cam_CE5 Externa Abertura Trail Camera … Secuencia d…
5 CE6 cam_CE6 Externa Abertura <NA> Secuencia d…
6 CE7 cam_CE7 Externa Abertura Trail Camera … Secuencia d…
7 CE9 cam_CE9 Externa Abertura Trail Camera … Secuencia d…
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
# Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>,
# Problem7_from <dttm>, Problem7_to <dttm>, Problem8_from <dttm>, …
We identify cases where the same camera ID appears more than once within the same structure.
duplicated_cameras <- ct |>
  purrr::map(
    ~ .x |>
      dplyr::count(Structure_id, Camera_id) |>
      dplyr::filter(n > 1)
  ) |>
  purrr::discard(~ nrow(.x) == 0) |>
  dplyr::bind_rows(.id = "Dataset")
duplicated_cameras |>
  head()
# A tibble: 6 × 4
Dataset Structure_id Camera_id n
<chr> <chr> <chr> <int>
1 Example1 BC1 (galeria) - 6
2 Example1 BC1 (galeria) VITA_07 2
3 Example1 BC2 (drenagem) - 6
4 Example1 BC2 (drenagem) VITA_07_1 2
5 Example1 P10 (amola faca) - 14
6 Example1 P2 (mauricio) - 3
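
To make the counting step concrete, here is a minimal sketch on made-up data. The list `toy_ct` and its values are invented for illustration; they only mimic the shape of `ct` (a named list of tables with `Structure_id` and `Camera_id` columns).

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for `ct`: a named list of setup tables
toy_ct <- list(
  DatasetA = tibble(
    Structure_id = c("S1", "S1", "S1", "S2"),
    Camera_id    = c("cam1", "cam1", "cam2", "cam1")
  ),
  DatasetB = tibble(
    Structure_id = c("S3", "S4"),
    Camera_id    = c("cam1", "cam1")
  )
)

toy_dups <- toy_ct |>
  # Count rows per structure/camera pair and keep pairs seen more than once
  map(\(x) x |> count(Structure_id, Camera_id) |> filter(n > 1)) |>
  # Drop datasets without any duplicated pair
  discard(\(x) nrow(x) == 0) |>
  # Stack what is left, recording the dataset name in a new column
  bind_rows(.id = "Dataset")

toy_dups
```

Only DatasetA contributes a row, because cam1 appears twice in S1. The same camera ID used in different structures (S1 and S2, or S3 and S4) is not flagged.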
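We extract the datasets that contain duplicated cameras and apply the unique_id function, which rewrites Camera_id with new unique camera IDs and keeps the original ID in the Camera_id_orig column. Judging from the output, the added double column holds the number of times the original ID occurs within the structure.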
dataset_dup_cameras <- duplicated_cameras |>
  dplyr::distinct(Dataset) |>
  dplyr::pull(Dataset)
ct_with_dupes <- ct[names(ct) %in% dataset_dup_cameras]
ct_uniq <- ct_with_dupes |>
  purrr::map(~ unique_id(.x))
ct_uniq |>
  head(2)
$Example1
# A tibble: 435 × 38
double Structure_id Camera_id Camera_id_orig Camera_position Camera_view
<int> <chr> <chr> <chr> <chr> <chr>
1 1 BC1 (galeria) cam1 cam1 Externa <NA>
2 6 BC1 (galeria) -_A - <NA> <NA>
3 1 BC1 (galeria) VITA_04 VITA_04 Externa Abertura
4 6 BC1 (galeria) -_B - Externa Abertura
5 6 BC1 (galeria) -_C - Externa Abertura
6 6 BC1 (galeria) -_D - Externa Abertura
7 6 BC1 (galeria) -_E - Externa Abertura
8 6 BC1 (galeria) -_F - Externa Abertura
9 2 BC1 (galeria) VITA_07_A VITA_07 Externa Abertura
10 1 BC1 (galeria) VITA_15 VITA_15 Externa Abertura
# ℹ 425 more rows
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
# Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
# End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>, …
$Example1
# A tibble: 202 × 38
double Structure_id Camera_id Camera_id_orig Camera_position Camera_view
<int> <chr> <chr> <chr> <chr> <chr>
1 1 PFO01 cam01_1 cam01_1 Borda interna Interior
2 1 PFO01 cam02_2 cam02_2 Borda interna Interior
3 1 PFO01 cam06_3 cam06_3 Borda interna Interior
4 1 PFO01 cam02_4 cam02_4 Borda interna Interior
5 1 PFO01 cam02_5 cam02_5 Borda interna Interior
6 1 PFO01 cam04_6 cam04_6 Borda interna Interior
7 1 PFO01 cam02_8 cam02_8 Borda interna Interior
8 1 PFO01 cam02_9 cam02_9 Borda interna Interior
9 1 PFO01 cam06_7 cam06_7 Borda interna Interior
10 1 PFO02 cam04_15 cam04_15 Borda interna Interior
# ℹ 192 more rows
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
# Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
# End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>, …
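
The source of unique_id is not shown in this document. As a rough illustration of the idea, the sketch below appends a letter suffix to every camera ID that occurs more than once within a structure. The helper `make_unique_ids` and the toy data are hypothetical, and the project's unique_id may well work differently.

```r
library(dplyr)

# Hypothetical helper, NOT the project's unique_id()
make_unique_ids <- function(df) {
  df |>
    mutate(
      double     = n(),                   # rows sharing this ID in the structure
      suffix_tmp = LETTERS[row_number()], # A, B, C, ... (NA beyond 26 repeats)
      .by = c(Structure_id, Camera_id)
    ) |>
    mutate(
      Camera_id_orig = Camera_id,
      # Only duplicated IDs receive a suffix; unique IDs stay unchanged
      Camera_id = if_else(
        double > 1L,
        paste0(Camera_id, "_", suffix_tmp),
        Camera_id
      )
    ) |>
    select(-suffix_tmp)
}

toy_setup <- tibble(
  Structure_id = c("S1", "S1", "S1", "S2"),
  Camera_id    = c("-", "-", "cam1", "-")
)
toy_unique <- make_unique_ids(toy_setup)
toy_unique
```

Grouping with `.by` keeps the original row order, so the suffixes follow the order in which the deployments appear in the sheet.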
We check whether any camera ID still appears more than once within a structure after applying the unique_id function. Rows returned by this check are IDs that are still duplicated.
ct_uniq |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::count(Dataset, Structure_id, Camera_id) |>
  dplyr::filter(n > 1) |>
  head(2)
# A tibble: 2 × 4
Dataset Structure_id Camera_id n
<chr> <chr> <chr> <int>
1 Example1 BC1 (galeria) -_A 2
2 Example1 BC1 (galeria) -_B 2
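
The same check can also be written as a yes/no test per dataset, which is handy as a guard before later steps. This is a base R sketch; the helper `has_unique_cameras` and the toy list are hypothetical and not part of FUNCTIONS.R.

```r
# Hypothetical check: TRUE when no (Structure_id, Camera_id) pair repeats
has_unique_cameras <- function(df) {
  anyDuplicated(df[, c("Structure_id", "Camera_id")]) == 0L
}

# Invented data: one clean table and one that still has a repeated pair
toy_list <- list(
  clean = data.frame(
    Structure_id = c("S1", "S1"),
    Camera_id    = c("cam1_A", "cam1_B")
  ),
  dirty = data.frame(
    Structure_id = c("S1", "S1"),
    Camera_id    = c("-_A", "-_A")
  )
)

still_ok <- vapply(toy_list, has_unique_cameras, logical(1))
still_ok
```

Applied to a list such as `ct_uniq`, any element that comes back `FALSE` would point to a dataset that still needs attention.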
We load the species records and cross-check them with the corrected camera trap data to ensure records are properly matched.
rec <- read_sheet(
  path = "Example",
  sheet = "Species_records_camera",
  na = c("NA", "na")
) |>
  purrr::map(\(x) {
    x |>
      dttm_update(
        date_col = "Record_date",
        time_col = "Record_time"
      ) |>
      dplyr::select(-Record_time)
  })
rec_with_dupes <- rec[names(rec) %in% dataset_dup_cameras]
rec_with_dupes |>
  head(2)
$Example1
# A tibble: 3,590 × 6
Structure_id Camera_id Species Record_date Record_criteria Behavior
<chr> <chr> <chr> <dttm> <dbl> <chr>
1 P1 (iguaçu) cam1 Cavia … 2017-05-09 03:59:00 NA Dentro
2 P3 (varzea) cam1 Aramid… 2017-05-01 08:37:00 NA Dentro
3 P3 (varzea) cam1 Leopar… 2017-05-05 21:09:00 NA Dentro
4 BC2 (drenagem) cam2 Didelp… 2018-07-30 04:18:00 NA <NA>
5 BC2 (drenagem) cam2 Didelp… 2018-07-30 20:02:00 NA <NA>
6 BC2 (drenagem) cam2 Didelp… 2018-07-31 00:10:00 NA <NA>
7 BC2 (drenagem) cam2 Didelp… 2018-08-01 01:30:00 NA <NA>
8 BC2 (drenagem) cam2 Didelp… 2018-08-01 03:07:00 NA <NA>
9 BC2 (drenagem) cam2 Didelp… 2018-08-02 00:48:00 NA <NA>
10 BC2 (drenagem) cam2 Didelp… 2018-08-02 01:09:00 NA <NA>
# ℹ 3,580 more rows
$Example1
# A tibble: 1,450 × 6
Structure_id Camera_id Species Record_date Record_criteria Behavior
<chr> <chr> <chr> <dttm> <dbl> <chr>
1 PFU02 cam05_109 Aramides… 2014-08-06 12:42:21 15 Dentro
2 PFU06 cam05_167 Aramides… 2014-10-06 13:41:37 15 Dentro
3 PFU06 cam05_167 Aramides… 2014-10-07 08:47:34 15 Dentro
4 PFU06 cam03_169 Aramides… 2014-11-28 07:54:43 15 Dentro
5 PFU06 cam03_169 Aramides… 2014-11-28 08:17:42 15 Dentro
6 PFU06 cam03_169 Aramides… 2014-12-03 12:37:33 15 Dentro
7 PFU06 cam03_169 Aramides… 2014-12-05 10:10:09 15 Dentro
8 PFU06 cam03_169 Aramides… 2014-12-05 12:38:26 15 Dentro
9 PFU06 cam03_169 Aramides… 2014-12-05 14:26:42 15 Dentro
10 PFU06 cam03_169 Aramides… 2014-12-06 10:22:53 15 Dentro
# ℹ 1,440 more rows
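
dttm_update comes from FUNCTIONS.R and its source is not shown here. Going by the output above, it appears to fold the time of day from Record_time into Record_date. The base R sketch below shows one way such a merge could be done; the helper `combine_date_time`, its arguments, and the placeholder date on the time value are assumptions made for illustration.

```r
# Hypothetical helper, NOT the project's dttm_update():
# take the calendar date from `date` and the clock time from `time`
combine_date_time <- function(date, time, tz = "UTC") {
  as.POSIXct(
    paste(format(date, "%Y-%m-%d"), format(time, "%H:%M:%S")),
    format = "%Y-%m-%d %H:%M:%S",
    tz = tz
  )
}

# Invented values: a date at midnight and a time stored on a placeholder date
record_date <- as.POSIXct("2017-05-09 00:00:00", tz = "UTC")
record_time <- as.POSIXct("1899-12-31 03:59:00", tz = "UTC")

combined <- combine_date_time(record_date, record_time)
combined
```

If either input is missing, the pasted string cannot be parsed and the result is `NA`, so missing times would need a default before this step.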
For each dataset with duplicated cameras, we generate Excel files with the corrected camera trap data and the matched records. We also identify records that do not fall within any sampling interval.
rows_with_errors <- list()
for (dataset in dataset_dup_cameras) {
  # Key each deployment by structure and original camera ID
  cam <- ct_uniq[[dataset]] |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id_orig}"))
  # Key each record the same way, with a row id to track it through the join
  reg <- rec_with_dupes[[dataset]] |>
    tibble::rowid_to_column("id") |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id}"))
  # Pair every record with every deployment that shares its key
  intermediate_result <- reg |>
    dplyr::full_join(
      cam,
      by = "code",
      suffix = c("_rec", ""),
      relationship = "many-to-many"
    ) |>
    dplyr::mutate(
      # Keep the HH:MM part of the time columns (assumes values end in HH:MM:SS)
      dplyr::across(
        dplyr::ends_with("_time"),
        ~ stringr::str_sub(., start = -8, end = -4)
      ),
      datetime_record = Record_date,
      datetime_start = lubridate::ymd_hm(paste(
        as.character(Start_date),
        tidyr::replace_na(Start_time, "00:00")
      )),
      datetime_end = lubridate::ymd_hm(paste(
        as.character(End_date),
        tidyr::replace_na(End_time, "00:00")
      )),
      # Assign the record to the deployment whose interval contains it
      belongs_to = dplyr::if_else(
        condition = datetime_record %within%
          c(datetime_start %--% datetime_end),
        Camera_id,
        "nope"
      )
    )
  # Matched records: drop "nope" rows when the record has another candidate row.
  # Note that this value is not auto-printed inside a for loop.
  intermediate_result |>
    dplyr::distinct(id, belongs_to, .keep_all = TRUE) |>
    dplyr::filter(!(dplyr::n() > 1 & belongs_to == "nope"), .by = "id") |>
    dplyr::filter(!is.na(id)) |>
    head()
  # Records that fall outside every sampling interval
  rows_with_errors[[dataset]] <- intermediate_result |>
    dplyr::filter(all(belongs_to == "nope"), .by = "id")
}
rows_with_errors |>
  dplyr::bind_rows(.id = "dataset") |>
  head()
# A tibble: 6 × 51
dataset id Structure_id_rec Camera_id_rec Species Record_date
<chr> <int> <chr> <chr> <chr> <dttm>
1 Example1 1 P1 (iguaçu) cam1 Cavia sp. 2017-05-09 03:59:00
2 Example1 4 BC2 (drenagem) cam2 Didelphis a… 2018-07-30 04:18:00
3 Example1 5 BC2 (drenagem) cam2 Didelphis a… 2018-07-30 20:02:00
4 Example1 6 BC2 (drenagem) cam2 Didelphis a… 2018-07-31 00:10:00
5 Example1 7 BC2 (drenagem) cam2 Didelphis a… 2018-08-01 01:30:00
6 Example1 8 BC2 (drenagem) cam2 Didelphis a… 2018-08-01 03:07:00
# ℹ 45 more variables: Record_criteria <dbl>, Behavior <chr>, code <glue>,
# double <int>, Structure_id <chr>, Camera_id <chr>, Camera_id_orig <chr>,
# Camera_position <chr>, Camera_view <chr>, Camera_model <chr>,
# Camera_setup <chr>, Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <chr>, End_date <dttm>, End_time <chr>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>, …
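
The interval matching can be hard to follow on the full tables, so here is a minimal sketch of the same idea on invented data. The tables `toy_cam` and `toy_rec`, their column names, and all dates are made up; only the `%--%` and `%within%` operators and the "nope" marker follow the code above.

```r
library(dplyr)
library(lubridate)

# Invented deployments: the same original ID was used twice at one structure
toy_cam <- tibble(
  code      = c("SS1-Ccam1", "SS1-Ccam1"),
  Camera_id = c("cam1_A", "cam1_B"),
  start     = ymd_hm(c("2017-05-01 00:00", "2017-06-01 00:00")),
  end       = ymd_hm(c("2017-05-31 00:00", "2017-06-30 00:00"))
)

# Invented records that all carry the original (ambiguous) camera ID
toy_rec <- tibble(
  id          = 1:3,
  code        = "SS1-Ccam1",
  recorded_at = ymd_hm(c("2017-05-09 03:59", "2017-06-15 12:00", "2017-08-01 10:00"))
)

toy_match <- toy_rec |>
  # Every record is paired with every deployment that shares its key
  inner_join(toy_cam, by = "code", relationship = "many-to-many") |>
  mutate(
    # start %--% end builds an interval; %within% tests membership (ends included)
    belongs_to = if_else(recorded_at %within% (start %--% end), Camera_id, "nope")
  )

# Records assigned to exactly the deployment that was active at the time
matched <- toy_match |> filter(belongs_to != "nope")

# Records that fall outside every deployment interval
unmatched <- toy_match |>
  filter(all(belongs_to == "nope"), .by = id) |>
  distinct(id)
```

In this toy case, records 1 and 2 are assigned to cam1_A and cam1_B respectively, while record 3 falls after both deployments and ends up in `unmatched`, which plays the role of `rows_with_errors`.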
The output files contain:
This process ensures data consistency and helps identify potential issues with camera deployment or data entry.