12  Check Structure Location

12.1 Problem Description

This ensures that coordinates are present, in valid formats, and spatially coherent with the expected projection; it converts everything to WGS84, computes diagnostics (e.g., distance to reference features if provided), and prepares a clean table for export.

12.2 Problem Solving

We follow a pipeline: (1) read the target worksheet; (2) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (3) extract final data frame with values beyond the threshold and write an Excel output for review.

12.2.1 Common steps

We use explicit namespace calls such as datapaperchecks::read_sheet, keeping the workflow reproducible without a setup chunk.

12.2.2 Specific steps

First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using datapaperchecks::read_sheet, we load them into the under and over lists, respectively, and then preview the first few rows of each.

Code
# Read sheets
under <- datapaperchecks::read_sheet(
  path = "Example/10",
  sheet = "Underpasses",
  na = c("NA", "na")
)

over <- datapaperchecks::read_sheet(
  path = "Example/10",
  sheet = "Overpasses",
  na = c("NA", "na")
)

head(under)[1]
$Example2
# A tibble: 0 × 17
# ℹ 17 variables: Infrastructure_type <chr>, Structure_id <chr>,
#   Structure_type <chr>, Structure_cell <chr>, Structure_shape <chr>,
#   Structure_photo <chr>, Structure_age <dbl>, Structure_height <dbl>,
#   Structure_length <dbl>, Structure_width <dbl>, Waterbody_width <dbl>,
#   Latitude <dbl>, Longitude <dbl>, Utm_zone <chr>, X_easting <dbl>,
#   Y_northing <dbl>, Datum <chr>
Code
head(over)[1]
$Example2
# A tibble: 7 × 19
  Infrastructure_type Structure_id Structure_type  Structure_material
  <chr>               <chr>        <chr>           <chr>             
1 Carretera           CE2          Puente flexible Cable con sogas   
2 Carretera           CE3          Puente flexible Cable con sogas   
3 Carretera           CE4          Puente flexible Cable con sogas   
4 Carretera           CE5          Puente flexible Cable con sogas   
5 Carretera           CE6          Puente flixible Cable con sogas   
6 Carretera           CE7          Puente flexible Cable con sogas   
7 Carretera           CE9          Puente flexible Cable con sogas   
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
#   Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
#   Structure_height <dbl>, Structure_lenght <dbl>, Structure_width <dbl>,
#   Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
#   Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>

This is the core of our spatial processing. We first determine the correct EPSG code using our add_epsg (see function 2.2.5) function for each record based on its datum and UTM zone. Then, processing each EPSG group separately, we convert the tabular data into sf spatial objects and reproject everything to a single, standardized system: WGS84 (EPSG: 4326).

Code
# Distinct of different Datum + Utm_zone
uo_datum_zone <- dplyr::bind_rows(under, .id = "Dataset") |>
  dplyr::bind_rows(dplyr::bind_rows(over, .id = "Dataset")) |>
  dplyr::filter(!is.na(Datum)) |>
  dplyr::mutate(
    type = dplyr::if_else(!is.na(Utm_zone), "Projected", "Geodetic"),
    zone = stringr::str_sub(Utm_zone, 1, 2),
    hemis = dplyr::case_when(
      stringr::str_sub(Utm_zone, 3, 3) == "S" ~ "S",
      stringr::str_sub(Utm_zone, 3, 3) == "N" ~ "N",
      stringr::str_sub(Utm_zone, 3, 3) >= "N" ~ "N",
      stringr::str_sub(Utm_zone, 3, 3) < "N" ~ "S"
    )
  ) |>
  datapaperchecks::add_epsg()

nested_uo_datum_zone <- uo_datum_zone |>
  split(~epsg)

epsg_uo_datum_zone <- list()

for (epsg in names(nested_uo_datum_zone)) {
  cli::cli_h1("Starting epsg {epsg}")

  epsg_uo_datum_zone[[epsg]] <- nested_uo_datum_zone[[epsg]] |>
    dplyr::mutate(
      X = dplyr::case_when(
        !is.na(Longitude) ~ Longitude,
        TRUE ~ as.numeric(X_easting)
      ),
      Y = dplyr::case_when(
        !is.na(Latitude) ~ Latitude,
        TRUE ~ as.numeric(Y_northing)
      )
    ) |>
    sf::st_as_sf(
      coords = c("X", "Y"),
      remove = FALSE,
      crs = as.numeric(epsg)
    ) |>
    sf::st_transform(4326) |>
    dplyr::mutate(
      Lat = sf::st_coordinates(geometry)[, 2],
      Long = sf::st_coordinates(geometry)[, 1],
      New_Datum = "WGS 84",
      New_EPSG = 4326L
    )
}
── Starting epsg 4326 ──────────────────────────────────────────────────────────
── Starting epsg 31980 ─────────────────────────────────────────────────────────
── Starting epsg 31981 ─────────────────────────────────────────────────────────
── Starting epsg 32718 ─────────────────────────────────────────────────────────
── Starting epsg 32724 ─────────────────────────────────────────────────────────

With the data reprojected, we bind it back into a single sf data frame. We then use our set_feature_from_infrastructure function to automatically classify each structure (e.g., as ‘highway’, ‘railway’ or ‘man_made’ - see function 2.2.6), following OpenStreetMap (OSM) convention. Finally, we check for any structures that couldn’t be classified (feature column as NA), flagging them for review.

Code
epsg_uo_datum_zone_bind <- epsg_uo_datum_zone |>
  dplyr::bind_rows(.id = "epsg") |>
  dplyr::arrange(Dataset) |>
  datapaperchecks::set_feature_from_infrastructure()

# Checking datasets that have na values on `Infrastructure_type`
epsg_uo_datum_zone_bind |>
  sf::st_drop_geometry() |>
  dplyr::count(Dataset, Infrastructure_type, feature) |>
  print(n = Inf)
# A tibble: 5 × 4
  Dataset  Infrastructure_type feature      n
  <chr>    <chr>               <chr>    <int>
1 Example3 Rodovia             highway     52
2 Example6 Gasoduto            man_made     5
3 Example8 Carretera           highway      4
4 Example8 Ferrovía            railway      1
5 Example9 Ducto               man_made    25

To validate the structure locations against real-world data, we use the function sec-calc-nearest-osm-dist (see function 2.2.7) to create a bounding box of the structures and a buffer around this bounding box. The function then calculates the distance from each point to its nearest corresponding feature from OSM. We loop through each feature type and dataset, using a try-catch block to gracefully handle any errors from the API or spatial computation. The function creates columns that helps to check if the distance between the structure and the closest OSM infrastructure are within the threshold we provided.

Code
nested_epsg_uo_datum_zone_bind <- epsg_uo_datum_zone_bind |>
  split(~feature)

osm_result <- list()

for (feature in names(nested_epsg_uo_datum_zone_bind)) {
  cli::cli_h1("Starting feature {feature}")

  df <- nested_epsg_uo_datum_zone_bind[[feature]]

  datasets <- df |>
    dplyr::distinct(Dataset) |>
    dplyr::pull(Dataset)

  for (dataset in datasets) {
    cli::cli_h3("Starting dataset {dataset}")

    result <- try(
      {
        nested_epsg_uo_datum_zone_bind[[feature]] |>
          dplyr::filter(Dataset == dataset) |>
          datapaperchecks::calc_nearest_osm_dist(feature = feature)
      },
      silent = TRUE
    )

    if (inherits(result, "try-error")) {
      clean_message <- base::conditionMessage(attr(result, "condition"))

      # Now, the cli alert will work safely
      cli::cli_alert_danger("Error in dataset {dataset}: {clean_message}")

      # Save the clean message to the results
      osm_result[[feature]][[dataset]] <- clean_message
      next
    } else {
      osm_result[[feature]][[dataset]] <- result
    }
  }
  cli::cli_alert_success("Finishing feature {feature}")
}
── Starting feature highway ────────────────────────────────────────────────────
── Starting dataset Example3 
✖ Error in dataset Example3: object 'doc' not found
── Starting dataset Example8 
✖ Error in dataset Example8: object 'doc' not found
✔ Finishing feature highway
── Starting feature man_made ───────────────────────────────────────────────────
── Starting dataset Example6 
── Starting dataset Example9 
✖ Error in dataset Example9: object 'doc' not found
✔ Finishing feature man_made
── Starting feature railway ────────────────────────────────────────────────────
── Starting dataset Example8 
✔ Finishing feature railway

For the Example9, we receive the error “→ There are no features within the buffer”, which means that no OSM feature was found within the buffer of the bounding box created. This is an issue that has to be asked to the researcher.

In the final step, we prepare the output. We gather all the results, filter for structures flagged as potential outliers (out_thresh == TRUE), extract the final latitude and longitude into clean columns, and export the resulting table.

Code
tab_final <- osm_result |>
  purrr::list_flatten(name_spec = "{inner}") |>
  purrr::map_dfr("data") |>
  dplyr::relocate(feature_type, .after = feature) |>
  dplyr::filter(out_thresh == TRUE) |>
  dplyr::arrange(desc(distance_to)) |>
  dplyr::mutate(
    latitude = sf::st_coordinates(geometry)[, 2],
    longitude = sf::st_coordinates(geometry)[, 1]
  ) |>
  sf::st_drop_geometry()

tab_final
# A tibble: 0 × 11
# ℹ 11 variables: Dataset <chr>, Infrastructure_type <chr>, Structure_id <chr>,
#   id_osm <chr>, name <chr>, feature <chr>, feature_type <chr>,
#   distance_to <dbl>, out_thresh <lgl>, latitude <lgl>, longitude <lgl>