11  Check Coordinates Parameters

11.1 Problem Description

This script ensures coordinates parameters are present, such as UTM zones and datum.

11.2 Problem Solving

We follow a pipeline: (1) load helper functions and read the target worksheet; (2) run basic completeness and format checks for coordinates (decimal degrees vs. UTM); (3) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (4) extract final latitude/longitude and write an Excel output for review.

11.2.1 Common steps

To begin, we source our external FUNCTIONS.R script. This makes our custom helper functions, like read_sheet, available for use throughout the analysis, ensuring our workflow is reproducible.

Code
source("R/FUNCTIONS.R")

11.2.2 Specific steps

First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using our custom read_sheet function, we load them into the under and over lists, respectively, and then preview the first few rows of each.

Code
# Read sheets
under <- read_sheet(path = "Example", sheet = "Underpasses", na = c("NA", "na"))

over <- read_sheet(path = "Example", sheet = "Overpasses", na = c("NA", "na"))

head(under)[1]
$Example1
# A tibble: 14 × 17
   Infrastructure_type Structure_id     Structure_type            Structure_cell
   <chr>               <chr>            <chr>                     <chr>         
 1 Rodovia             P1 (iguaçu)      Ponte                     <NA>          
 2 Rodovia             <NA>             Bueiro de concreto        <NA>          
 3 Rodovia             P2 (mauricio)    Ponte                     <NA>          
 4 Rodovia             BC2 (drenagem)   Bueiro de concreto        <NA>          
 5 Rodovia             BCS1             Bueiro com plataforma se… <NA>          
 6 Rodovia             P3 (varzea)      Ponte                     <NA>          
 7 Rodovia             P4 (fazenda)     Ponte                     <NA>          
 8 Rodovia             P5 (sapezal)     Ponte                     <NA>          
 9 Rodovia             P6 (passa três)  Ponte                     <NA>          
10 Rodovia             P7 (lourenço)    Ponte                     <NA>          
11 Rodovia             P8 (das pedras)  Ponte                     <NA>          
12 Rodovia             BCS2             Bueiro com plataforma se… <NA>          
13 Rodovia             P9 (cachorros)   Ponte                     <NA>          
14 Rodovia             P10 (amola faca) Ponte                     <NA>          
# ℹ 13 more variables: Structure_shape <chr>, Structure_photo <chr>,
#   Structure_age <dbl>, Structure_height <dbl>, Structure_length <dbl>,
#   Structure_width <dbl>, Waterbody_width <dbl>, Latitude <dbl>,
#   Longitude <dbl>, Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>,
#   Datum <chr>
Code
head(over)[1]
$Example1
# A tibble: 1 × 19
  Infrastructure_type Structure_id Structure_type           Structure_material
  <chr>               <chr>        <chr>                    <chr>             
1 Rodovia             PAerea       Ponte de dossel flexível Corda             
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
#   Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
#   Structure_height <dbl>, Structure_length <dbl>, Structure_width <dbl>,
#   Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
#   Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>

Next, we perform a data validation check to find potential data entry errors. We iterate through the lists to identify any records that incorrectly contain both decimal degree coordinates (Latitude) and UTM projection information (Utm_zone). The names of any datasets with these conflicting entries are then printed.

Code
# Checking if there are datasets filled with decimal degrees AND UTM Zone
under |>
  purrr::map(~ .x |> dplyr::filter(!is.na(Latitude), !is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
[1] "Example4" "Example4"
Code
over |>
  purrr::map(~ .x |> dplyr::filter(!is.na(Latitude), !is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
[1] "Example5" "Example5"

Continuing our validation, we look for the inverse problem: records that provide UTM coordinates (X_easting) but are missing the essential Utm_zone information. We list any datasets containing these incomplete records, as they cannot be projected correctly without the zone.

Code
# Checking if there are datasets filled UTM and did not filled UTM Zone
under |>
  purrr::map(~ .x |> dplyr::filter(!is.na(X_easting), is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
character(0)
Code
over |>
  purrr::map(~ .x |> dplyr::filter(!is.na(X_easting), is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
[1] "Example6" "Example6"

To make the data easier to work with, we combine the elements from the under and over lists into two single data frames. During this step, we add a Dataset column to preserve the data’s origin and a Position column ("Under" or "Over") to classify each structure.

Code
under_df <- under |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::mutate(Position = "Under") |>
  dplyr::select(1:3, Position, Utm_zone, Datum)

over_df <- over |>
  dplyr::bind_rows(.id = "Dataset") |>
  dplyr::mutate(Position = "Over") |>
  dplyr::select(1:3, Position, Utm_zone, Datum)

Here, we validate the format of the Utm_zone column itself. We filter for any entries that don’t follow the standard pattern of two digits and a letter (e.g., “22S”), then count and display these improperly formatted values to be corrected.

Code
# Finding problems in Utm_zone
under_df |>
  dplyr::bind_rows(over_df) |>
  dplyr::filter(!is.na(Utm_zone)) |>
  dplyr::filter(!stringr::str_detect(Utm_zone, "^\\d{2}[A-Za-z]$")) |>
  dplyr::count(Dataset, Position, Utm_zone) |>
  print(n = Inf)
# A tibble: 1 × 4
  Dataset  Position Utm_zone     n
  <chr>    <chr>    <chr>    <int>
1 Example3 Under    21          12

To understand the coordinate systems we’re dealing with, we inventory all the datums present in the data. By counting the occurrences of each Datum and Utm_zone combination, we can spot any inconsistencies that need to be harmonized before reprojection.

Code
# Finding problems in Datum
under_df |>
  dplyr::bind_rows(over_df) |>
  dplyr::count(Dataset, Position, Datum, Utm_zone) |>
  dplyr::arrange(Datum, Dataset) |>
  print(n = Inf)
# A tibble: 20 × 5
   Dataset   Position Datum      Utm_zone     n
   <chr>     <chr>    <chr>      <chr>    <int>
 1 Example3  Under    SIRGAS2000 20S        144
 2 Example3  Under    SIRGAS2000 21          12
 3 Example5  Over     SIRGAS2000 21J          4
 4 Example7  Under    SIRGAS2000 22K         32
 5 Example2  Over     WGS 84     <NA>        21
 6 Example1  Over     WGS84      <NA>         7
 7 Example1  Under    WGS84      <NA>        32
 8 Example11 Under    WGS84      <NA>         8
 9 Example12 Under    WGS84      <NA>        22
10 Example13 Over     WGS84      <NA>         8
11 Example2  Over     WGS84      <NA>        14
12 Example4  Under    WGS84      16J          4
13 Example4  Under    WGS84      <NA>        30
14 Example6  Over     WGS84      24L          9
15 Example6  Over     WGS84      <NA>         6
16 Example8  Over     WGS84      <NA>         9
17 Example8  Under    WGS84      <NA>         6
18 Example9  Over     WGS84      18S         25
19 Example1  Over     <NA>       <NA>         2
20 Example1  Under    <NA>       <NA>         4