This document describes how to identify duplicated camera IDs within the same structure in the camera trap setup data. The goal is to ensure that each camera within a structure is uniquely identified, avoiding overlaps that could compromise data integrity.
We use datapaperchecks::read_sheet to load the camera trap setup data from all available spreadsheets. We also import two operators from lubridate, %within% and %--%, for interval checks.
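As a brief illustration of what these two operators do, the sketch below builds an interval with %--% and tests timestamps against it with %within%. The dates are invented for the example and are not taken from the data.

```r
library(lubridate)

# Hypothetical sampling period for one camera (illustrative values only)
sampling <- ymd_hm("2017-05-01 00:00") %--% ymd_hm("2017-05-31 23:59")

# Two made-up record timestamps: one inside the period, one after it
records <- ymd_hm(c("2017-05-09 03:59", "2017-06-02 10:00"))

# %within% returns TRUE for each timestamp that falls inside the interval
inside <- records %within% sampling
inside
```

The first timestamp falls inside the period and the second does not, so the result is one TRUE followed by one FALSE.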
use("lubridate", c("%within%", "%--%"))
We load the camera trap setup data (ct) and look for duplicated camera IDs within each structure.
# Read camera trap setup data
ct <- datapaperchecks::read_sheet(
  path = "Example",
  sheet = "Camera_trap",
  na = c("NA", "na")
)
ct |>
  head(2)
$Example1
# A tibble: 435 × 36
Structure_id Camera_id Camera_position Camera_view Camera_model Camera_setup
<chr> <chr> <chr> <chr> <chr> <chr>
1 BC1 (galeria) cam1 Externa <NA> Bushnell <NA>
2 BC1 (galeria) - <NA> <NA> <NA> <NA>
3 BC1 (galeria) VITA_04 Externa Abertura Bushnell <NA>
4 BC1 (galeria) - Externa Abertura <NA> <NA>
5 BC1 (galeria) - Externa Abertura <NA> <NA>
6 BC1 (galeria) - Externa Abertura <NA> <NA>
7 BC1 (galeria) - Externa Abertura <NA> <NA>
8 BC1 (galeria) - Externa Abertura <NA> <NA>
9 BC1 (galeria) VITA_07 Externa Abertura Bushnell <NA>
10 BC1 (galeria) VITA_15 Externa Abertura Bushnell <NA>
# ℹ 425 more rows
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
# Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>, …
$Example2
# A tibble: 7 × 36
Structure_id Camera_id Camera_position Camera_view Camera_model Camera_setup
<chr> <chr> <chr> <chr> <chr> <chr>
1 CE2 cam_CE2 Externa Abertura Trail Camera … Secuencia d…
2 CE3 cam_CE3 Externa Abertura Trail Camera … Secuencia d…
3 CE4 cam_CE4 Externa Abertura Trail Camera … Secuencia d…
4 CE5 cam_CE5 Externa Abertura Trail Camera … Secuencia d…
5 CE6 cam_CE6 Externa Abertura <NA> Secuencia d…
6 CE7 cam_CE7 Externa Abertura Trail Camera … Secuencia d…
7 CE9 cam_CE9 Externa Abertura Trail Camera … Secuencia d…
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
# Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>,
# Problem7_from <dttm>, Problem7_to <dttm>, Problem8_from <dttm>, …
We identify cases where the same camera ID appears more than once within the same structure.
duplicated_cameras <- ct |>
  purrr::map(
    ~ .x |>
      dplyr::count(Structure_id, Camera_id) |>
      dplyr::filter(n > 1)
  ) |>
  purrr::discard(~ nrow(.x) == 0) |>
  dplyr::bind_rows(.id = "Dataset")
duplicated_cameras |>
  head()
# A tibble: 6 × 4
Dataset Structure_id Camera_id n
<chr> <chr> <chr> <int>
1 Example1 BC1 (galeria) - 6
2 Example1 BC1 (galeria) VITA_07 2
3 Example1 BC2 (drenagem) - 6
4 Example1 BC2 (drenagem) VITA_07_1 2
5 Example1 P10 (amola faca) - 14
6 Example1 P2 (mauricio) - 3
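To make the map, discard, and bind_rows steps easier to follow, here is the same pattern applied to a small invented list of data frames. The object toy_ct and its values are assumptions for illustration only.

```r
library(dplyr)
library(purrr)

# Toy stand-in for `ct`: a named list with one data frame per dataset
toy_ct <- list(
  A = tibble::tibble(
    Structure_id = c("S1", "S1", "S1", "S2"),
    Camera_id    = c("cam1", "cam1", "cam2", "cam1")
  ),
  B = tibble::tibble(
    Structure_id = c("S1", "S2"),
    Camera_id    = c("cam1", "cam1")
  )
)

toy_dupes <- toy_ct |>
  # Count each structure/camera pair and keep those seen more than once
  map(\(x) x |> count(Structure_id, Camera_id) |> filter(n > 1)) |>
  # Dataset B has no duplicates, so its empty result is dropped
  discard(\(x) nrow(x) == 0) |>
  # Stack what remains, recording the dataset name in a new column
  bind_rows(.id = "Dataset")

toy_dupes
```

Only dataset A survives, with a single row for cam1 in structure S1. The same camera ID in a different structure (S2) is not counted as a duplicate.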
We extract the datasets with duplicated cameras and apply the unique_id function, which writes new unique camera IDs to the Camera_id column. The original ID is kept in the Camera_id_orig column.
dataset_dup_cameras <- duplicated_cameras |>
  dplyr::distinct(Dataset) |>
  dplyr::pull(Dataset)
ct_with_dupes <- ct[names(ct) %in% dataset_dup_cameras]
ct_uniq <- ct_with_dupes |>
  purrr::map(~ datapaperchecks::unique_id(.x))
ct_uniq |>
  head(2)
$Example1
# A tibble: 435 × 38
double Structure_id Camera_id Camera_id_orig Camera_position Camera_view
<int> <chr> <chr> <chr> <chr> <chr>
1 1 BC1 (galeria) cam1 cam1 Externa <NA>
2 6 BC1 (galeria) -_A - <NA> <NA>
3 1 BC1 (galeria) VITA_04 VITA_04 Externa Abertura
4 6 BC1 (galeria) -_B - Externa Abertura
5 6 BC1 (galeria) -_C - Externa Abertura
6 6 BC1 (galeria) -_D - Externa Abertura
7 6 BC1 (galeria) -_E - Externa Abertura
8 6 BC1 (galeria) -_F - Externa Abertura
9 2 BC1 (galeria) VITA_07_A VITA_07 Externa Abertura
10 1 BC1 (galeria) VITA_15 VITA_15 Externa Abertura
# ℹ 425 more rows
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
# Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
# End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>, …
$Example1
# A tibble: 14 × 38
double Structure_id Camera_id Camera_id_orig Camera_position Camera_view
<int> <chr> <chr> <chr> <chr> <chr>
1 1 Ponte Bárbara GB GB Externa Interior
2 1 Ponte Índios MJ2 MJ2 Externa Interior
3 1 Ponte Samir MJ1 MJ1 Externa Interior
4 1 Ponte Beco 18 Rosa Rosa Externa Interior
5 1 Ponte Pituca MJ1 MJ1 Externa Interior
6 1 Ponte Reserva PD PD Externa Interior
7 1 Ponte 9 Irmãos GB GB Externa Interior
8 1 Ponte Fupala Rosa Rosa Externa Interior
9 1 Ponte Samir - Co… MJ1 MJ1 Externa Interior
10 1 Ponte Pituca - C… PD PD Externa Interior
11 1 Ponte Samir - Ma… MJ1 MJ1 Externa Interior
12 1 Ponte Pituca - M… PD PD Externa Interior
13 1 Ponte São Paulo MJ3 MJ3 Externa Interior
14 1 Ponte Manecão PD PD Externa Interior
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
# Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
# End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
# Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
# Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>, …
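The internals of datapaperchecks::unique_id are not shown in this document. The sketch below is only one way such a step could work, inferred from the output above: repeated IDs within a structure receive a letter suffix, unique IDs are left unchanged, and the original value is kept. The helper name make_unique_id and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper, NOT the package's implementation
make_unique_id <- function(data) {
  data |>
    mutate(Camera_id_orig = Camera_id) |>
    mutate(
      double = n(),                   # times the ID occurs in this structure
      suffix = LETTERS[row_number()], # A, B, C, ... (only covers 26 repeats)
      .by = c(Structure_id, Camera_id_orig)
    ) |>
    mutate(
      Camera_id = if_else(
        double > 1,
        paste0(Camera_id_orig, "_", suffix),
        Camera_id_orig
      )
    ) |>
    select(-suffix)
}

# Invented data: "-" is repeated in S1 but appears only once in S2
toy <- tibble::tibble(
  Structure_id = c("S1", "S1", "S1", "S2"),
  Camera_id    = c("-", "-", "cam1", "-")
)

toy_uniq <- make_unique_id(toy)
toy_uniq
```

Keeping the original ID in its own column matters later, because the species records still refer to cameras by their uncorrected IDs.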
We check whether any camera ID still appears more than once within a structure after applying the unique_id function. Any rows returned indicate IDs that are still duplicated.
ct_uniq |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::count(Dataset, Structure_id, Camera_id) |>
  dplyr::filter(n > 1) |>
  head(2)
# A tibble: 2 × 4
Dataset Structure_id Camera_id n
<chr> <chr> <chr> <int>
1 Example1 BC1 (galeria) -_A 2
2 Example1 BC1 (galeria) -_B 2
We load the species records and cross-check them with the corrected camera trap data to ensure records are properly matched.
rec <- datapaperchecks::read_sheet(
  path = "Example",
  sheet = "Species_records_camera",
  na = c("NA", "na")
) |>
  purrr::map(\(x) {
    x |>
      datapaperchecks::dttm_update(
        date_col = "Record_date",
        time_col = "Record_time"
      ) |>
      dplyr::select(-Record_time)
  })
rec_with_dupes <- rec[names(rec) %in% dataset_dup_cameras]
rec_with_dupes |>
  head(2)
$Example1
# A tibble: 3,590 × 6
Structure_id Camera_id Species Record_date Record_criteria Behavior
<chr> <chr> <chr> <dttm> <dbl> <chr>
1 P1 (iguaçu) cam1 Cavia … 2017-05-09 03:59:00 NA Dentro
2 P3 (varzea) cam1 Aramid… 2017-05-01 08:37:00 NA Dentro
3 P3 (varzea) cam1 Leopar… 2017-05-05 21:09:00 NA Dentro
4 BC2 (drenagem) cam2 Didelp… 2018-07-30 04:18:00 NA <NA>
5 BC2 (drenagem) cam2 Didelp… 2018-07-30 20:02:00 NA <NA>
6 BC2 (drenagem) cam2 Didelp… 2018-07-31 00:10:00 NA <NA>
7 BC2 (drenagem) cam2 Didelp… 2018-08-01 01:30:00 NA <NA>
8 BC2 (drenagem) cam2 Didelp… 2018-08-01 03:07:00 NA <NA>
9 BC2 (drenagem) cam2 Didelp… 2018-08-02 00:48:00 NA <NA>
10 BC2 (drenagem) cam2 Didelp… 2018-08-02 01:09:00 NA <NA>
# ℹ 3,580 more rows
$Example1
# A tibble: 3,132 × 6
Structure_id Camera_id Species Record_date Record_criteria Behavior
<chr> <chr> <chr> <dttm> <dbl> <chr>
1 Ponte 9 Irmãos GB Alouat… 2022-03-26 08:57:58 NA <NA>
2 Ponte 9 Irmãos GB Alouat… 2022-03-26 09:02:04 NA <NA>
3 Ponte 9 Irmãos GB Alouat… 2022-03-26 13:49:06 NA <NA>
4 Ponte 9 Irmãos GB Alouat… 2022-03-26 13:50:58 NA <NA>
5 Ponte 9 Irmãos GB Alouat… 2022-03-26 13:52:12 NA <NA>
6 Ponte 9 Irmãos GB Coendo… 2022-03-27 00:26:52 NA <NA>
7 Ponte 9 Irmãos GB Coendo… 2022-03-27 06:33:14 NA <NA>
8 Ponte 9 Irmãos GB Alouat… 2022-03-27 14:51:22 NA <NA>
9 Ponte 9 Irmãos GB Alouat… 2022-03-27 14:52:04 NA <NA>
10 Ponte 9 Irmãos GB Alouat… 2022-03-27 14:52:48 NA <NA>
# ℹ 3,122 more rows
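In the records above, Record_date carries both the date and the time once dttm_update has run and Record_time has been dropped. How dttm_update does this is not shown here. As a rough sketch of the general idea, assuming the date and the time arrive as two separate date-time columns, they could be merged as follows. The helper combine_date_time is hypothetical and is not the package's function.

```r
library(lubridate)

# Hypothetical helper: calendar date from `date`, clock time from `time`
combine_date_time <- function(date, time) {
  make_datetime(
    year = year(date), month = month(date), day = day(date),
    hour = hour(time), min = minute(time), sec = second(time),
    tz = "UTC"
  )
}

# Spreadsheet readers often return time-only cells as date-times on a
# placeholder date; the values below are invented for illustration
record_date <- ymd("2017-05-09", tz = "UTC")
record_time <- ymd_hms("1899-12-31 03:59:00", tz = "UTC")

combined <- combine_date_time(record_date, record_time)
combined
```

The placeholder date attached to the time is discarded, and only its hour, minute, and second are carried over.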
For each dataset with duplicated cameras, we generate Excel files with the corrected camera trap data and the matched records. We also identify records that do not fall within any sampling interval.
rows_with_errors <- list()
for (dataset in dataset_dup_cameras) {
  # Build a structure-camera key from the original (pre-correction) camera ID
  cam <- ct_uniq[[dataset]] |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id_orig}"))
  reg <- rec_with_dupes[[dataset]] |>
    tibble::rowid_to_column("id") |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id}"))
  # Pair every record with every camera row that shares its key
  intermediate_result <- reg |>
    dplyr::full_join(
      cam,
      by = "code",
      suffix = c("_rec", ""),
      relationship = "many-to-many"
    ) |>
    dplyr::mutate(
      # Keep only the HH:MM part of the time columns
      dplyr::across(
        dplyr::ends_with("_time"),
        ~ stringr::str_sub(., start = -8, end = -4)
      ),
      datetime_record = Record_date,
      datetime_start = lubridate::ymd_hm(paste(
        as.character(Start_date),
        tidyr::replace_na(Start_time, "00:00")
      )),
      datetime_end = lubridate::ymd_hm(paste(
        as.character(End_date),
        tidyr::replace_na(End_time, "00:00")
      )),
      # Assign the record to the corrected camera ID whose interval contains it
      belongs_to = dplyr::if_else(
        condition = datetime_record %within%
          c(datetime_start %--% datetime_end),
        Camera_id,
        "nope"
      )
    )
  # Matched records; inside a for loop this preview is not printed
  # unless it is wrapped in print()
  intermediate_result |>
    dplyr::distinct(id, belongs_to, .keep_all = TRUE) |>
    dplyr::filter(!(dplyr::n() > 1 & belongs_to == "nope"), .by = "id") |>
    dplyr::filter(!is.na(id)) |>
    head()
  # Records that fall outside every sampling interval
  rows_with_errors[[dataset]] <- intermediate_result |>
    dplyr::filter(all(belongs_to == "nope"), .by = "id")
}
rows_with_errors |>
  dplyr::bind_rows(.id = "dataset") |>
  head()
# A tibble: 6 × 51
dataset id Structure_id_rec Camera_id_rec Species Record_date
<chr> <int> <chr> <chr> <chr> <dttm>
1 Example1 1 P1 (iguaçu) cam1 Cavia sp. 2017-05-09 03:59:00
2 Example1 4 BC2 (drenagem) cam2 Didelphis a… 2018-07-30 04:18:00
3 Example1 5 BC2 (drenagem) cam2 Didelphis a… 2018-07-30 20:02:00
4 Example1 6 BC2 (drenagem) cam2 Didelphis a… 2018-07-31 00:10:00
5 Example1 7 BC2 (drenagem) cam2 Didelphis a… 2018-08-01 01:30:00
6 Example1 8 BC2 (drenagem) cam2 Didelphis a… 2018-08-01 03:07:00
# ℹ 45 more variables: Record_criteria <dbl>, Behavior <chr>, code <glue>,
# double <int>, Structure_id <chr>, Camera_id <chr>, Camera_id_orig <chr>,
# Camera_position <chr>, Camera_view <chr>, Camera_model <chr>,
# Camera_setup <chr>, Camera_vision_photo <chr>, Start_date <dttm>,
# Start_time <chr>, End_date <dttm>, End_time <chr>, Camera_problem <chr>,
# Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
# Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>, …
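The matching logic in the loop is easier to follow on a small invented example: one original camera ID that was deployed twice in the same structure, and three records that must each be assigned to the deployment whose interval contains them. All names and values below are made up, and the join is simplified to an inner join.

```r
library(dplyr)
library(lubridate)

# Made-up deployments sharing one original camera ID ("-") in structure S1
cams <- tibble::tibble(
  code      = "SS1-C-",
  Camera_id = c("-_A", "-_B"),
  start_dt  = ymd_hm(c("2018-07-01 00:00", "2018-08-01 00:00")),
  end_dt    = ymd_hm(c("2018-07-31 23:59", "2018-08-31 23:59"))
)

# Made-up records; the last one falls outside both deployments
recs <- tibble::tibble(
  id        = 1:3,
  code      = "SS1-C-",
  record_dt = ymd_hm(c("2018-07-30 04:18", "2018-08-02 00:48",
                       "2018-09-15 12:00"))
)

# Pair each record with each deployment, then test interval membership
matched <- recs |>
  inner_join(cams, by = "code", relationship = "many-to-many") |>
  mutate(
    belongs_to = if_else(
      record_dt %within% (start_dt %--% end_dt),
      Camera_id,
      "nope"
    )
  )

# Records for which every candidate deployment was rejected
unmatched_ids <- matched |>
  filter(all(belongs_to == "nope"), .by = id) |>
  distinct(id) |>
  pull(id)

# Records that found a deployment, with the corrected camera ID
assigned <- matched |>
  filter(belongs_to != "nope") |>
  select(id, belongs_to)
```

Record 1 is assigned to -_A, record 2 to -_B, and record 3 ends up in unmatched_ids because it lies outside both intervals. This mirrors how rows_with_errors is built above.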
The output files contain the corrected camera trap data and the matched records.
This process ensures data consistency and helps identify potential issues with camera deployment or data entry.