This script checks that the coordinate parameters required for georeferencing, such as UTM zones and datums, are present and consistent in the data.
11.2 Problem Solving
We follow a pipeline: (1) read the target worksheet; (2) run basic completeness and format checks for coordinates (decimal degrees vs. UTM); (3) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (4) extract final latitude/longitude and write an Excel output for review.
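Step (3) of the pipeline can be sketched with sf. This is a minimal illustration on toy coordinates, assuming decimal-degree columns named Longitude and Latitude in WGS84 (EPSG:4326); the target UTM CRS (EPSG:32722, zone 22S) is chosen only as an example.

```r
# Toy points in decimal degrees (WGS84)
pts <- data.frame(Longitude = c(-51.2, -50.9), Latitude = c(-30.0, -29.8))

# Convert to an sf object and harmonize the CRS by reprojecting to UTM
pts_sf  <- sf::st_as_sf(pts, coords = c("Longitude", "Latitude"), crs = 4326)
pts_utm <- sf::st_transform(pts_sf, crs = 32722)  # example: UTM zone 22S, WGS84
```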
11.2.1 Common steps
We use explicit namespace calls such as datapaperchecks::read_sheet, keeping the workflow reproducible without a setup chunk.
11.2.2 Specific steps
First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using datapaperchecks::read_sheet, we load them into the under and over lists, respectively, and then preview the first few rows of each.
Code
# Read sheets
under <- datapaperchecks::read_sheet(path = "Example", sheet = "Underpasses", na = c("NA", "na"))
over <- datapaperchecks::read_sheet(path = "Example", sheet = "Overpasses", na = c("NA", "na"))
head(under)[1]
Next, we perform a data validation check to find potential data entry errors. We iterate through the lists to identify any records that incorrectly contain both decimal degree coordinates (Latitude) and UTM projection information (Utm_zone). The names of any datasets with these conflicting entries are then printed.
Code
# Checking if there are datasets filled with decimal degrees AND UTM zone
under |>
  purrr::map(~ .x |> dplyr::filter(!is.na(Latitude), !is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
Continuing our validation, we look for the inverse problem: records that provide UTM coordinates (X_easting) but are missing the essential Utm_zone information. We list any datasets containing these incomplete records, as they cannot be projected correctly without the zone.
Code
# Checking if there are datasets with UTM coordinates but no UTM zone
under |>
  purrr::map(~ .x |> dplyr::filter(!is.na(X_easting), is.na(Utm_zone))) |>
  purrr::discard(~ nrow(.x) == 0) |>
  names()
To make the data easier to work with, we combine the elements from the under and over lists into two single data frames. During this step, we add a Dataset column to preserve the data’s origin and a Position column ("Under" or "Over") to classify each structure.
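The combining step described above can be sketched with dplyr::bind_rows, which turns the list names into a Dataset column via its .id argument. The toy lists below are illustrative; the real lists come from read_sheet.

```r
# Toy stand-ins for the lists returned by read_sheet (names are illustrative)
under <- list(Example1 = tibble::tibble(Id = 1:2),
              Example2 = tibble::tibble(Id = 3))
over  <- list(Example1 = tibble::tibble(Id = 4))

# Collapse each list into one data frame, keeping the origin and position
under_df <- dplyr::bind_rows(under, .id = "Dataset") |>
  dplyr::mutate(Position = "Under")
over_df <- dplyr::bind_rows(over, .id = "Dataset") |>
  dplyr::mutate(Position = "Over")
all_data <- dplyr::bind_rows(under_df, over_df)
```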
Here, we validate the format of the Utm_zone column itself. We filter for any entries that don’t follow the standard pattern of two digits and a letter (e.g., “22S”), then count and display these improperly formatted values to be corrected.
# A tibble: 1 × 4
  Dataset  Position Utm_zone     n
  <chr>    <chr>    <chr>    <int>
1 Example3 Under    21          12
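A format check like the one producing the table above can be sketched with a regular expression: two digits followed by a letter. The combined data frame name (all_data) and the toy rows below are assumptions for illustration.

```r
# Toy combined data frame; "21" is missing its latitude band letter
all_data <- tibble::tibble(
  Dataset  = c("Example3", "Example1"),
  Position = c("Under", "Over"),
  Utm_zone = c("21", "22S")
)

# Flag non-missing Utm_zone values that do not match "two digits + letter"
bad_zones <- all_data |>
  dplyr::filter(!is.na(Utm_zone),
                !stringr::str_detect(Utm_zone, "^\\d{2}[A-Za-z]$")) |>
  dplyr::count(Dataset, Position, Utm_zone)
```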
To understand the coordinate systems we’re dealing with, we inventory all the datums present in the data. By counting the occurrences of each Datum and Utm_zone combination, we can spot any inconsistencies that need to be harmonized before reprojection.
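The datum inventory can be sketched as a grouped count. The combined data frame name (all_data) and its toy contents are assumptions; the real data come from the combined sheets.

```r
# Toy combined data frame with a Datum and a Utm_zone column
all_data <- tibble::tibble(
  Datum    = c("WGS84", "WGS84", "SIRGAS2000"),
  Utm_zone = c("22S", "22S", "21J")
)

# Count each Datum / Utm_zone combination to spot inconsistencies
datum_summary <- all_data |>
  dplyr::count(Datum, Utm_zone, sort = TRUE)
```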