These checks ensure that coordinates are present, in valid formats, and spatially coherent with the expected projection; the workflow converts everything to WGS84, computes diagnostics (e.g., distance to reference features, if provided), and prepares a clean table for export.
12.2 Problem Solving
We follow a pipeline: (1) read the target worksheet; (2) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (3) extract the final data frame with values beyond the threshold and write an Excel output for review.
12.2.1 Common steps
We use explicit namespace calls such as datapaperchecks::read_sheet, keeping the workflow reproducible without a setup chunk.
12.2.2 Specific steps
First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using datapaperchecks::read_sheet, we load them into the under and over lists, respectively, and then preview the first few rows of each.
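A minimal sketch of this reading step follows. The exact arguments of datapaperchecks::read_sheet and the file pattern in the "Example" folder are assumptions, not taken from the package documentation:

```r
# Sketch only: the signature of datapaperchecks::read_sheet is assumed
files <- list.files("Example", pattern = "\\.xlsx$", full.names = TRUE)

# Read the "Underpasses" and "Overpasses" sheets of every file into
# named lists, one element per file
under <- lapply(files, \(f) datapaperchecks::read_sheet(f, sheet = "Underpasses"))
over  <- lapply(files, \(f) datapaperchecks::read_sheet(f, sheet = "Overpasses"))
names(under) <- names(over) <- tools::file_path_sans_ext(basename(files))

# Preview the first rows of each dataset
lapply(under, head)
```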
$Example2
# A tibble: 7 × 19
Infrastructure_type Structure_id Structure_type Structure_material
<chr> <chr> <chr> <chr>
1 Carretera CE2 Puente flexible Cable con sogas
2 Carretera CE3 Puente flexible Cable con sogas
3 Carretera CE4 Puente flexible Cable con sogas
4 Carretera CE5 Puente flexible Cable con sogas
5 Carretera CE6 Puente flixible Cable con sogas
6 Carretera CE7 Puente flexible Cable con sogas
7 Carretera CE9 Puente flexible Cable con sogas
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
# Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
# Structure_height <dbl>, Structure_lenght <dbl>, Structure_width <dbl>,
# Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
# Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>
This is the core of our spatial processing. We first determine the correct EPSG code for each record using our add_epsg function (see function 2.2.5), based on its datum and UTM zone. Then, processing each EPSG group separately, we convert the tabular data into sf spatial objects and reproject everything to a single, standardized system: WGS84 (EPSG:4326).
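The conversion and reprojection step can be sketched as follows. The input object name (uo_datum_zone), the coordinate column names (taken from the tibble shown above), and the behavior of add_epsg (assumed to append an `epsg` column) are assumptions:

```r
# Sketch only: add_epsg() is assumed to append an `epsg` column
epsg_uo_datum_zone <- uo_datum_zone |>
  datapaperchecks::add_epsg() |>
  split(~epsg)

# Convert each EPSG group to an sf object in its native CRS,
# then reproject everything to WGS84 (EPSG:4326)
epsg_uo_datum_zone <- lapply(names(epsg_uo_datum_zone), \(code) {
  epsg_uo_datum_zone[[code]] |>
    sf::st_as_sf(
      coords = c("X_easting", "Y_northing"),
      crs    = as.integer(code)
    ) |>
    sf::st_transform(4326)
}) |>
  setNames(names(epsg_uo_datum_zone))
```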
With the data reprojected, we bind it back into a single sf data frame. We then use our set_feature_from_infrastructure function to automatically classify each structure (e.g., as 'highway', 'railway', or 'man_made'; see function 2.2.6), following the OpenStreetMap (OSM) convention. Finally, we check for any structures that could not be classified (feature column is NA), flagging them for review.
Code
epsg_uo_datum_zone_bind <- epsg_uo_datum_zone |>
  dplyr::bind_rows(.id = "epsg") |>
  dplyr::arrange(Dataset) |>
  datapaperchecks::set_feature_from_infrastructure()

# Checking datasets that have NA values on `Infrastructure_type`
epsg_uo_datum_zone_bind |>
  sf::st_drop_geometry() |>
  dplyr::count(Dataset, Infrastructure_type, feature) |>
  print(n = Inf)
To validate the structure locations against real-world data, we use the function calc_nearest_osm_dist (see function 2.2.7) to create a bounding box of the structures and a buffer around this bounding box. The function then calculates the distance from each point to its nearest corresponding feature from OSM. We loop through each feature type and dataset, using a try-catch block to gracefully handle any errors from the API or spatial computation. The function creates columns that help check whether the distance between each structure and the closest OSM infrastructure is within the threshold we provided.
Code
nested_epsg_uo_datum_zone_bind <- epsg_uo_datum_zone_bind |>
  split(~feature)

osm_result <- list()

for (feature in names(nested_epsg_uo_datum_zone_bind)) {
  cli::cli_h1("Starting feature {feature}")
  df <- nested_epsg_uo_datum_zone_bind[[feature]]
  datasets <- df |>
    dplyr::distinct(Dataset) |>
    dplyr::pull(Dataset)
  for (dataset in datasets) {
    cli::cli_h3("Starting dataset {dataset}")
    result <- try(
      {
        nested_epsg_uo_datum_zone_bind[[feature]] |>
          dplyr::filter(Dataset == dataset) |>
          datapaperchecks::calc_nearest_osm_dist(feature = feature)
      },
      silent = TRUE
    )
    if (inherits(result, "try-error")) {
      clean_message <- base::conditionMessage(attr(result, "condition"))
      # Now, the cli alert will work safely
      cli::cli_alert_danger("Error in dataset {dataset}: {clean_message}")
      # Save the clean message to the results
      osm_result[[feature]][[dataset]] <- clean_message
      next
    } else {
      osm_result[[feature]][[dataset]] <- result
    }
  }
  cli::cli_alert_success("Finishing feature {feature}")
}
For Example9, we receive the error "→ There are no features within the buffer", which means that no OSM feature was found within the buffer around the bounding box. This issue must be referred back to the researcher.
In the final step, we prepare the output. We gather all the results, filter for structures flagged as potential outliers (out_thresh == TRUE), extract the final latitude and longitude into clean columns, and export the resulting table.
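This final step can be sketched as below. The shape of osm_result follows the loop above (data frames for successful runs, character error messages otherwise); the geometry column name and the output file name are assumptions:

```r
# Sketch only: output file name and geometry column name are assumed
osm_result |>
  purrr::list_flatten() |>
  purrr::keep(is.data.frame) |>   # drop the stored error messages
  dplyr::bind_rows() |>
  dplyr::filter(out_thresh) |>    # keep only flagged potential outliers
  dplyr::mutate(
    Longitude_wgs84 = sf::st_coordinates(geometry)[, 1],
    Latitude_wgs84  = sf::st_coordinates(geometry)[, 2]
  ) |>
  sf::st_drop_geometry() |>
  writexl::write_xlsx("coordinates_to_review.xlsx")
```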