9  Check Duplicated Camera on Structure

9.1 Problem Description

This document describes the process of identifying duplicated camera IDs within the same structure in the camera trap setup data. The goal is to ensure that each camera within a structure is uniquely identified, avoiding overlaps that could compromise data integrity.

9.2 Problem Solving

9.2.1 Common steps

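We use datapaperchecks::read_sheet to load the camera trap setup data from all available spreadsheets. We also import two lubridate operators for interval checks: %--%, which builds an interval from two date-times, and %within%, which tests whether a date-time falls inside an interval.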

Code
# Attach only the two interval operators from lubridate (base R use())
use("lubridate", c("%within%", "%--%"))
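
As a small illustration of the two operators, the sketch below builds an interval and tests two date-times against it. The dates are made up for the example and are not taken from the datasets. It loads lubridate with library() rather than use(), on the assumption that this is more portable to older R versions.

```r
library(lubridate)

# Hypothetical sampling period (illustrative dates only)
sampling <- ymd_hm("2017-05-01 00:00") %--% ymd_hm("2017-05-10 00:00")

# %within% returns TRUE when the date-time lies inside the interval
inside <- ymd_hm("2017-05-09 03:59") %within% sampling
outside <- ymd_hm("2018-07-30 04:18") %within% sampling

print(c(inside = inside, outside = outside))
```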

9.2.2 Data Loading

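We load the camera trap setup data into ct, a named list with one tibble per dataset, and preview the first two datasets with head(2). The search for duplicated camera IDs follows in the next step.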

Code
# Read camera trap setup data
ct <- datapaperchecks::read_sheet(
  path = "Example",
  sheet = "Camera_trap",
  na = c("NA", "na")
)

ct |>
  head(2)
$Example1
# A tibble: 435 × 36
   Structure_id  Camera_id Camera_position Camera_view Camera_model Camera_setup
   <chr>         <chr>     <chr>           <chr>       <chr>        <chr>       
 1 BC1 (galeria) cam1      Externa         <NA>        Bushnell     <NA>        
 2 BC1 (galeria) -         <NA>            <NA>        <NA>         <NA>        
 3 BC1 (galeria) VITA_04   Externa         Abertura    Bushnell     <NA>        
 4 BC1 (galeria) -         Externa         Abertura    <NA>         <NA>        
 5 BC1 (galeria) -         Externa         Abertura    <NA>         <NA>        
 6 BC1 (galeria) -         Externa         Abertura    <NA>         <NA>        
 7 BC1 (galeria) -         Externa         Abertura    <NA>         <NA>        
 8 BC1 (galeria) -         Externa         Abertura    <NA>         <NA>        
 9 BC1 (galeria) VITA_07   Externa         Abertura    Bushnell     <NA>        
10 BC1 (galeria) VITA_15   Externa         Abertura    Bushnell     <NA>        
# ℹ 425 more rows
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
#   Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
#   Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
#   Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
#   Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
#   Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>, …

$Example2
# A tibble: 7 × 36
  Structure_id Camera_id Camera_position Camera_view Camera_model   Camera_setup
  <chr>        <chr>     <chr>           <chr>       <chr>          <chr>       
1 CE2          cam_CE2   Externa         Abertura    Trail Camera … Secuencia d…
2 CE3          cam_CE3   Externa         Abertura    Trail Camera … Secuencia d…
3 CE4          cam_CE4   Externa         Abertura    Trail Camera … Secuencia d…
4 CE5          cam_CE5   Externa         Abertura    Trail Camera … Secuencia d…
5 CE6          cam_CE6   Externa         Abertura    <NA>           Secuencia d…
6 CE7          cam_CE7   Externa         Abertura    Trail Camera … Secuencia d…
7 CE9          cam_CE9   Externa         Abertura    Trail Camera … Secuencia d…
# ℹ 30 more variables: Camera_vision_photo <chr>, Start_date <dttm>,
#   Start_time <dttm>, End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
#   Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
#   Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
#   Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
#   Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>,
#   Problem7_from <dttm>, Problem7_to <dttm>, Problem8_from <dttm>, …

9.2.3 Identifying Duplicated Cameras

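We identify cases where the same camera ID appears more than once within the same structure. Each dataset is counted separately, datasets without duplicates are discarded, and the remaining results are stacked into one table with a Dataset column.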

Code
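duplicated_cameras <- ct |>
  purrr::map(
    # Count rows per structure/camera pair and keep pairs seen more than once
    ~ .x |>
      dplyr::count(Structure_id, Camera_id) |>
      dplyr::filter(n > 1)
  ) |>
  # Drop datasets without duplicates, then stack the rest into one tibble
  purrr::discard(~ nrow(.x) == 0) |>
  dplyr::bind_rows(.id = "Dataset")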

duplicated_cameras |>
  head()
# A tibble: 6 × 4
  Dataset  Structure_id     Camera_id     n
  <chr>    <chr>            <chr>     <int>
1 Example1 BC1 (galeria)    -             6
2 Example1 BC1 (galeria)    VITA_07       2
3 Example1 BC2 (drenagem)   -             6
4 Example1 BC2 (drenagem)   VITA_07_1     2
5 Example1 P10 (amola faca) -            14
6 Example1 P2 (mauricio)    -             3
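
The counting step above can be tried on a toy named list. This is a minimal sketch: the list toy, its dataset names A and B, and the structure and camera IDs are all invented for illustration.

```r
library(dplyr)
library(purrr)

# Toy stand-in for ct: a named list with one tibble per dataset (made-up data)
toy <- list(
  A = tibble::tibble(
    Structure_id = c("S1", "S1", "S2"),
    Camera_id    = c("cam1", "cam1", "cam1")
  ),
  B = tibble::tibble(
    Structure_id = c("S3", "S4"),
    Camera_id    = c("cam2", "cam3")
  )
)

toy_dupes <- toy |>
  # Count each structure/camera pair and keep pairs that occur more than once
  map(\(x) x |> count(Structure_id, Camera_id) |> filter(n > 1)) |>
  # Dataset B has no duplicates, so it is dropped here
  discard(\(x) nrow(x) == 0) |>
  bind_rows(.id = "Dataset")

print(toy_dupes)
```

Only the pair S1/cam1 in dataset A is reported. The same camera ID in a different structure (S2) does not count as a duplicate.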

9.2.4 Handling Datasets with Duplicates

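We extract the datasets with duplicated cameras and apply the unique_id function, which rewrites the Camera_id column with new unique camera IDs. The old ID is kept in the Camera_id_orig field. The result also carries a double column which, in the output below, matches the number of rows that share the same structure and original camera ID.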

Code
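# Names of the datasets that contain at least one duplicated camera ID
dataset_dup_cameras <- duplicated_cameras |>
  dplyr::distinct(Dataset) |>
  dplyr::pull(Dataset)

# Keep only those datasets
ct_with_dupes <- ct[names(ct) %in% dataset_dup_cameras]

# Generate unique camera IDs; the original ID is kept in Camera_id_orig
ct_uniq <- ct_with_dupes |>
  purrr::map(datapaperchecks::unique_id)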

ct_uniq |>
  head(2)
$Example1
# A tibble: 435 × 38
   double Structure_id  Camera_id Camera_id_orig Camera_position Camera_view
    <int> <chr>         <chr>     <chr>          <chr>           <chr>      
 1      1 BC1 (galeria) cam1      cam1           Externa         <NA>       
 2      6 BC1 (galeria) -_A       -              <NA>            <NA>       
 3      1 BC1 (galeria) VITA_04   VITA_04        Externa         Abertura   
 4      6 BC1 (galeria) -_B       -              Externa         Abertura   
 5      6 BC1 (galeria) -_C       -              Externa         Abertura   
 6      6 BC1 (galeria) -_D       -              Externa         Abertura   
 7      6 BC1 (galeria) -_E       -              Externa         Abertura   
 8      6 BC1 (galeria) -_F       -              Externa         Abertura   
 9      2 BC1 (galeria) VITA_07_A VITA_07        Externa         Abertura   
10      1 BC1 (galeria) VITA_15   VITA_15        Externa         Abertura   
# ℹ 425 more rows
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
#   Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
#   End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
#   Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
#   Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
#   Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>, …

$Example1
# A tibble: 14 × 38
   double Structure_id      Camera_id Camera_id_orig Camera_position Camera_view
    <int> <chr>             <chr>     <chr>          <chr>           <chr>      
 1      1 Ponte Bárbara     GB        GB             Externa         Interior   
 2      1 Ponte Índios      MJ2       MJ2            Externa         Interior   
 3      1 Ponte Samir       MJ1       MJ1            Externa         Interior   
 4      1 Ponte Beco 18     Rosa      Rosa           Externa         Interior   
 5      1 Ponte Pituca      MJ1       MJ1            Externa         Interior   
 6      1 Ponte Reserva     PD        PD             Externa         Interior   
 7      1 Ponte 9 Irmãos    GB        GB             Externa         Interior   
 8      1 Ponte Fupala      Rosa      Rosa           Externa         Interior   
 9      1 Ponte Samir - Co… MJ1       MJ1            Externa         Interior   
10      1 Ponte Pituca - C… PD        PD             Externa         Interior   
11      1 Ponte Samir - Ma… MJ1       MJ1            Externa         Interior   
12      1 Ponte Pituca - M… PD        PD             Externa         Interior   
13      1 Ponte São Paulo   MJ3       MJ3            Externa         Interior   
14      1 Ponte Manecão     PD        PD             Externa         Interior   
# ℹ 32 more variables: Camera_model <chr>, Camera_setup <chr>,
#   Camera_vision_photo <chr>, Start_date <dttm>, Start_time <dttm>,
#   End_date <dttm>, End_time <dttm>, Camera_problem <chr>,
#   Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
#   Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>,
#   Problem4_from <dttm>, Problem4_to <dttm>, Problem5_from <dttm>,
#   Problem5_to <dttm>, Problem6_from <dttm>, Problem6_to <dttm>, …
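
The internals of datapaperchecks::unique_id are not shown here. As a rough idea of how letter suffixes like those in the output could be produced, the sketch below uses a hypothetical helper, make_unique_id, which is an assumption and not the package's implementation. It also assumes at most 26 repeats per structure/camera pair, since it indexes into LETTERS.

```r
library(dplyr)

# Hypothetical helper (NOT datapaperchecks::unique_id): appends _A, _B, ...
# to camera IDs that repeat within the same structure.
make_unique_id <- function(df) {
  df |>
    mutate(Camera_id_orig = Camera_id) |>
    mutate(
      # Number of rows sharing this structure/original ID pair
      double = n(),
      # Suffix only the repeated IDs; LETTERS covers at most 26 repeats
      Camera_id = if_else(
        double > 1,
        paste0(Camera_id_orig, "_", LETTERS[row_number()]),
        Camera_id_orig
      ),
      .by = c(Structure_id, Camera_id_orig)
    )
}

# Made-up example data
toy_ct <- tibble::tibble(
  Structure_id = c("S1", "S1", "S1"),
  Camera_id    = c("camA", "camA", "camB")
)

toy_uniq <- make_unique_id(toy_ct)
print(toy_uniq)
```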

9.2.5 Checking for Remaining Duplicates

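We check whether any camera ID still appears more than once within the same structure after applying the unique_id function. In this example the result is not empty: IDs such as -_A and -_B still occur twice in BC1 (galeria), so some duplicates remain.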

Code
ct_uniq |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::count(Dataset, Structure_id, Camera_id) |>
  dplyr::filter(n > 1) |>
  head(2)
# A tibble: 2 × 4
  Dataset  Structure_id  Camera_id     n
  <chr>    <chr>         <chr>     <int>
1 Example1 BC1 (galeria) -_A           2
2 Example1 BC1 (galeria) -_B           2

9.2.6 Cross-checking with Species Records

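We load the species records so that they can be cross-checked against the corrected camera trap data. For each dataset, dttm_update merges Record_date and Record_time into a single date-time stored in Record_date, after which Record_time is dropped. We then keep only the datasets that have duplicated cameras.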

Code
rec <- datapaperchecks::read_sheet(
  path = "Example",
  sheet = "Species_records_camera",
  na = c("NA", "na")
) |>
  purrr::map(\(x) {
    x |>
      datapaperchecks::dttm_update(
        date_col = "Record_date",
        time_col = "Record_time"
      ) |>
      dplyr::select(-Record_time)
  })

rec_with_dupes <- rec[names(rec) %in% dataset_dup_cameras]

rec_with_dupes |>
  head(2)
$Example1
# A tibble: 3,590 × 6
   Structure_id   Camera_id Species Record_date         Record_criteria Behavior
   <chr>          <chr>     <chr>   <dttm>                        <dbl> <chr>   
 1 P1 (iguaçu)    cam1      Cavia … 2017-05-09 03:59:00              NA Dentro  
 2 P3 (varzea)    cam1      Aramid… 2017-05-01 08:37:00              NA Dentro  
 3 P3 (varzea)    cam1      Leopar… 2017-05-05 21:09:00              NA Dentro  
 4 BC2 (drenagem) cam2      Didelp… 2018-07-30 04:18:00              NA <NA>    
 5 BC2 (drenagem) cam2      Didelp… 2018-07-30 20:02:00              NA <NA>    
 6 BC2 (drenagem) cam2      Didelp… 2018-07-31 00:10:00              NA <NA>    
 7 BC2 (drenagem) cam2      Didelp… 2018-08-01 01:30:00              NA <NA>    
 8 BC2 (drenagem) cam2      Didelp… 2018-08-01 03:07:00              NA <NA>    
 9 BC2 (drenagem) cam2      Didelp… 2018-08-02 00:48:00              NA <NA>    
10 BC2 (drenagem) cam2      Didelp… 2018-08-02 01:09:00              NA <NA>    
# ℹ 3,580 more rows

$Example1
# A tibble: 3,132 × 6
   Structure_id   Camera_id Species Record_date         Record_criteria Behavior
   <chr>          <chr>     <chr>   <dttm>                        <dbl> <chr>   
 1 Ponte 9 Irmãos GB        Alouat… 2022-03-26 08:57:58              NA <NA>    
 2 Ponte 9 Irmãos GB        Alouat… 2022-03-26 09:02:04              NA <NA>    
 3 Ponte 9 Irmãos GB        Alouat… 2022-03-26 13:49:06              NA <NA>    
 4 Ponte 9 Irmãos GB        Alouat… 2022-03-26 13:50:58              NA <NA>    
 5 Ponte 9 Irmãos GB        Alouat… 2022-03-26 13:52:12              NA <NA>    
 6 Ponte 9 Irmãos GB        Coendo… 2022-03-27 00:26:52              NA <NA>    
 7 Ponte 9 Irmãos GB        Coendo… 2022-03-27 06:33:14              NA <NA>    
 8 Ponte 9 Irmãos GB        Alouat… 2022-03-27 14:51:22              NA <NA>    
 9 Ponte 9 Irmãos GB        Alouat… 2022-03-27 14:52:04              NA <NA>    
10 Ponte 9 Irmãos GB        Alouat… 2022-03-27 14:52:48              NA <NA>    
# ℹ 3,122 more rows
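
How datapaperchecks::dttm_update combines the two columns is not shown here. The sketch below is only an assumption about the general idea: it supposes that the time cell was read as a date-time on a dummy date, takes the date part from one value and the clock time from the other, and parses them back into one date-time. The values are made up.

```r
library(lubridate)

# Assumed inputs: a date at midnight and a time stored on a dummy date
record_date <- ymd_hms("2017-05-09 00:00:00")
record_time <- ymd_hms("1899-12-31 03:59:00")

# Paste the date part and the clock-time part, then parse the result
combined <- ymd_hms(paste(
  format(record_date, "%Y-%m-%d"),
  format(record_time, "%H:%M:%S")
))

print(combined)
```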

9.2.7 Output Generation

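For each dataset with duplicated cameras, we generate Excel files with the corrected camera trap data and the matched records. To match them, we join records to cameras on a key built from the structure ID and the original camera ID, then test whether each record's date-time falls within the camera's sampling interval (start to end). Records that do not fall within any sampling interval are collected in rows_with_errors.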

Code
# Records that match no sampling interval, one list element per dataset
rows_with_errors <- list()

for (dataset in dataset_dup_cameras) {
  # Join key for cameras: structure ID plus the original camera ID
  cam <- ct_uniq[[dataset]] |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id_orig}"))

  # Same key for records, plus a row id to track each record through the join
  reg <- rec_with_dupes[[dataset]] |>
    tibble::rowid_to_column("id") |>
    dplyr::mutate(code = stringr::str_glue("S{Structure_id}-C{Camera_id}"))

  intermediate_result <- reg |>
    dplyr::full_join(
      cam,
      by = "code",
      suffix = c("_rec", ""),
      relationship = "many-to-many"
    ) |>
    dplyr::mutate(
      # Keep only HH:MM from the time columns
      dplyr::across(
        dplyr::ends_with("_time"),
        ~ stringr::str_sub(., start = -8, end = -4)
      ),
      datetime_record = Record_date,
      # Sampling start and end; a missing time defaults to 00:00
      datetime_start = lubridate::ymd_hm(paste(
        as.character(Start_date),
        tidyr::replace_na(Start_time, "00:00")
      )),
      datetime_end = lubridate::ymd_hm(paste(
        as.character(End_date),
        tidyr::replace_na(End_time, "00:00")
      )),
      # Corrected camera ID if the record falls within the interval, else "nope"
      belongs_to = dplyr::if_else(
        condition = datetime_record %within%
          c(datetime_start %--% datetime_end),
        Camera_id,
        "nope"
      )
    )

  # Preview of the matched records; inside a for loop this value is
  # neither printed nor stored
  intermediate_result |>
    dplyr::distinct(id, belongs_to, .keep_all = TRUE) |>
    dplyr::filter(!(dplyr::n() > 1 & belongs_to == "nope"), .by = "id") |>
    dplyr::filter(!is.na(id)) |>
    head()

  # Records for which every candidate camera interval failed
  rows_with_errors[[dataset]] <- intermediate_result |>
    dplyr::filter(all(belongs_to == "nope"), .by = "id")
}

rows_with_errors |>
  dplyr::bind_rows(.id = "dataset") |>
  head()
# A tibble: 6 × 51
  dataset     id Structure_id_rec Camera_id_rec Species      Record_date        
  <chr>    <int> <chr>            <chr>         <chr>        <dttm>             
1 Example1     1 P1 (iguaçu)      cam1          Cavia sp.    2017-05-09 03:59:00
2 Example1     4 BC2 (drenagem)   cam2          Didelphis a… 2018-07-30 04:18:00
3 Example1     5 BC2 (drenagem)   cam2          Didelphis a… 2018-07-30 20:02:00
4 Example1     6 BC2 (drenagem)   cam2          Didelphis a… 2018-07-31 00:10:00
5 Example1     7 BC2 (drenagem)   cam2          Didelphis a… 2018-08-01 01:30:00
6 Example1     8 BC2 (drenagem)   cam2          Didelphis a… 2018-08-01 03:07:00
# ℹ 45 more variables: Record_criteria <dbl>, Behavior <chr>, code <glue>,
#   double <int>, Structure_id <chr>, Camera_id <chr>, Camera_id_orig <chr>,
#   Camera_position <chr>, Camera_view <chr>, Camera_model <chr>,
#   Camera_setup <chr>, Camera_vision_photo <chr>, Start_date <dttm>,
#   Start_time <chr>, End_date <dttm>, End_time <chr>, Camera_problem <chr>,
#   Problem1_from <dttm>, Problem1_to <dttm>, Problem2_from <dttm>,
#   Problem2_to <dttm>, Problem3_from <dttm>, Problem3_to <dttm>, …
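
The matching logic above can be followed on a toy case. This is a minimal sketch with made-up data: one original camera that unique_id would have split into cam1_A and cam1_B with different sampling periods, and two records carrying the original ID. It uses an inner join for brevity, whereas the loop above uses a full join.

```r
library(dplyr)
library(lubridate)

# Two deployments sharing the same structure and original camera ID (made up)
cam <- tibble::tibble(
  code           = c("SS1-Ccam1", "SS1-Ccam1"),
  Camera_id      = c("cam1_A", "cam1_B"),
  datetime_start = ymd_hm(c("2017-05-01 00:00", "2017-06-01 00:00")),
  datetime_end   = ymd_hm(c("2017-05-10 00:00", "2017-06-10 00:00"))
)

# Two records: the first falls in the first deployment, the second in none
reg <- tibble::tibble(
  id              = 1:2,
  code            = "SS1-Ccam1",
  datetime_record = ymd_hm(c("2017-05-05 21:09", "2018-07-30 04:18"))
)

matched <- reg |>
  # Each record is paired with every deployment that shares its key
  inner_join(cam, by = "code", relationship = "many-to-many") |>
  mutate(
    belongs_to = if_else(
      datetime_record %within% (datetime_start %--% datetime_end),
      Camera_id,
      "nope"
    )
  )

# Records assigned to a corrected camera ID
assigned <- matched |>
  filter(belongs_to != "nope")

# Records that fall within no sampling interval
unmatched_ids <- matched |>
  filter(all(belongs_to == "nope"), .by = id) |>
  distinct(id) |>
  pull(id)

print(assigned)
print(unmatched_ids)
```

Record 1 is assigned to cam1_A, while record 2 is flagged because it lies outside both sampling periods.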

9.2.8 Results Interpretation

The output files contain:

  • Corrected camera trap data with unique camera IDs per structure.
  • Matched species records with the corrected camera IDs.
  • Records that do not fall within any camera trap sampling interval.

This process ensures data consistency and helps identify potential issues with camera deployment or data entry.