Code
source("R/FUNCTIONS.R")This script ensures coordinates parameters are present, such as UTM zones and datum.
We follow a pipeline: (1) load helper functions and read the target worksheet; (2) run basic completeness and format checks for coordinates (decimal degrees vs. UTM); (3) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (4) extract final latitude/longitude and write an Excel output for review.
To begin, we source our external FUNCTIONS.R script. This makes our custom helper functions, like read_sheet, available for use throughout the analysis, ensuring our workflow is reproducible.
source("R/FUNCTIONS.R")First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using our custom read_sheet function, we load them into the under and over lists, respectively, and then preview the first few rows of each.
# Read sheets
under <- read_sheet(path = "Example", sheet = "Underpasses", na = c("NA", "na"))
over <- read_sheet(path = "Example", sheet = "Overpasses", na = c("NA", "na"))
head(under)[1]$Example1
# A tibble: 14 × 17
Infrastructure_type Structure_id Structure_type Structure_cell
<chr> <chr> <chr> <chr>
1 Rodovia P1 (iguaçu) Ponte <NA>
2 Rodovia <NA> Bueiro de concreto <NA>
3 Rodovia P2 (mauricio) Ponte <NA>
4 Rodovia BC2 (drenagem) Bueiro de concreto <NA>
5 Rodovia BCS1 Bueiro com plataforma se… <NA>
6 Rodovia P3 (varzea) Ponte <NA>
7 Rodovia P4 (fazenda) Ponte <NA>
8 Rodovia P5 (sapezal) Ponte <NA>
9 Rodovia P6 (passa três) Ponte <NA>
10 Rodovia P7 (lourenço) Ponte <NA>
11 Rodovia P8 (das pedras) Ponte <NA>
12 Rodovia BCS2 Bueiro com plataforma se… <NA>
13 Rodovia P9 (cachorros) Ponte <NA>
14 Rodovia P10 (amola faca) Ponte <NA>
# ℹ 13 more variables: Structure_shape <chr>, Structure_photo <chr>,
# Structure_age <dbl>, Structure_height <dbl>, Structure_length <dbl>,
# Structure_width <dbl>, Waterbody_width <dbl>, Latitude <dbl>,
# Longitude <dbl>, Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>,
# Datum <chr>
head(over)[1]$Example1
# A tibble: 1 × 19
Infrastructure_type Structure_id Structure_type Structure_material
<chr> <chr> <chr> <chr>
1 Rodovia PAerea Ponte de dossel flexível Corda
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
# Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
# Structure_height <dbl>, Structure_length <dbl>, Structure_width <dbl>,
# Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
# Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>
Next, we perform a data validation check to find potential data entry errors. We iterate through the lists to identify any records that incorrectly contain both decimal degree coordinates (Latitude) and UTM projection information (Utm_zone). The names of any datasets with these conflicting entries are then printed.
# Checking if there are datasets filled with decimal degrees AND UTM Zone
under |>
purrr::map(~ .x |> dplyr::filter(!is.na(Latitude), !is.na(Utm_zone))) |>
purrr::discard(~ nrow(.x) == 0) |>
names()[1] "Example4" "Example4"
over |>
purrr::map(~ .x |> dplyr::filter(!is.na(Latitude), !is.na(Utm_zone))) |>
purrr::discard(~ nrow(.x) == 0) |>
names()[1] "Example5" "Example5"
Continuing our validation, we look for the inverse problem: records that provide UTM coordinates (X_easting) but are missing the essential Utm_zone information. We list any datasets containing these incomplete records, as they cannot be projected correctly without the zone.
# Checking if there are datasets filled UTM and did not filled UTM Zone
under |>
purrr::map(~ .x |> dplyr::filter(!is.na(X_easting), is.na(Utm_zone))) |>
purrr::discard(~ nrow(.x) == 0) |>
names()character(0)
over |>
purrr::map(~ .x |> dplyr::filter(!is.na(X_easting), is.na(Utm_zone))) |>
purrr::discard(~ nrow(.x) == 0) |>
names()[1] "Example6" "Example6"
To make the data easier to work with, we combine the elements from the under and over lists into two single data frames. During this step, we add a Dataset column to preserve the data’s origin and a Position column ("Under" or "Over") to classify each structure.
under_df <- under |>
dplyr::bind_rows(.id = "Dataset") |>
dplyr::mutate(Position = "Under") |>
dplyr::select(1:3, Position, Utm_zone, Datum)
over_df <- over |>
dplyr::bind_rows(.id = "Dataset") |>
dplyr::mutate(Position = "Over") |>
dplyr::select(1:3, Position, Utm_zone, Datum)Here, we validate the format of the Utm_zone column itself. We filter for any entries that don’t follow the standard pattern of two digits and a letter (e.g., “22S”), then count and display these improperly formatted values to be corrected.
# Finding problems in Utm_zone
under_df |>
dplyr::bind_rows(over_df) |>
dplyr::filter(!is.na(Utm_zone)) |>
dplyr::filter(!stringr::str_detect(Utm_zone, "^\\d{2}[A-Za-z]$")) |>
dplyr::count(Dataset, Position, Utm_zone) |>
print(n = Inf)# A tibble: 1 × 4
Dataset Position Utm_zone n
<chr> <chr> <chr> <int>
1 Example3 Under 21 12
To understand the coordinate systems we’re dealing with, we inventory all the datums present in the data. By counting the occurrences of each Datum and Utm_zone combination, we can spot any inconsistencies that need to be harmonized before reprojection.
# Finding problems in Datum
under_df |>
dplyr::bind_rows(over_df) |>
dplyr::count(Dataset, Position, Datum, Utm_zone) |>
dplyr::arrange(Datum, Dataset) |>
print(n = Inf)# A tibble: 20 × 5
Dataset Position Datum Utm_zone n
<chr> <chr> <chr> <chr> <int>
1 Example3 Under SIRGAS2000 20S 144
2 Example3 Under SIRGAS2000 21 12
3 Example5 Over SIRGAS2000 21J 4
4 Example7 Under SIRGAS2000 22K 32
5 Example2 Over WGS 84 <NA> 21
6 Example1 Over WGS84 <NA> 7
7 Example1 Under WGS84 <NA> 32
8 Example11 Under WGS84 <NA> 8
9 Example12 Under WGS84 <NA> 22
10 Example13 Over WGS84 <NA> 8
11 Example2 Over WGS84 <NA> 14
12 Example4 Under WGS84 16J 4
13 Example4 Under WGS84 <NA> 30
14 Example6 Over WGS84 24L 9
15 Example6 Over WGS84 <NA> 6
16 Example8 Over WGS84 <NA> 9
17 Example8 Under WGS84 <NA> 6
18 Example9 Over WGS84 18S 25
19 Example1 Over <NA> <NA> 2
20 Example1 Under <NA> <NA> 4