These checks ensure that coordinates are present, in valid formats, and spatially coherent with the expected projection; the workflow converts everything to WGS84, computes diagnostics (e.g., distance to reference features, if provided), and prepares a clean table for export.
12.2 Problem Solving
We follow a pipeline: (1) read the target worksheet; (2) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (3) extract the final data frame with values beyond the threshold and write an Excel output for review.
12.2.1 Common steps
We use explicit namespace calls such as datapaperchecks::read_sheet, keeping the workflow reproducible without a setup chunk.
12.2.2 Specific steps
First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using datapaperchecks::read_sheet, we load them into the under and over lists, respectively, and then preview the first few rows of each.
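A minimal sketch of this reading step follows. The exact arguments of datapaperchecks::read_sheet and the file pattern in the "Example" folder are assumptions, not taken from the package documentation:

```r
# Sketch only: the signature of datapaperchecks::read_sheet is assumed
files <- list.files("Example", pattern = "\\.xlsx$", full.names = TRUE)

# Read the "Underpasses" and "Overpasses" sheets of every file into
# named lists, one element per file
under <- lapply(files, \(f) datapaperchecks::read_sheet(f, sheet = "Underpasses"))
over  <- lapply(files, \(f) datapaperchecks::read_sheet(f, sheet = "Overpasses"))
names(under) <- names(over) <- tools::file_path_sans_ext(basename(files))

# Preview the first rows of each dataset
lapply(under, head)
```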
$Example2
# A tibble: 7 × 19
Infrastructure_type Structure_id Structure_type Structure_material
<chr> <chr> <chr> <chr>
1 Carretera CE2 Puente flexible Cable con sogas
2 Carretera CE3 Puente flexible Cable con sogas
3 Carretera CE4 Puente flexible Cable con sogas
4 Carretera CE5 Puente flexible Cable con sogas
5 Carretera CE6 Puente flixible Cable con sogas
6 Carretera CE7 Puente flexible Cable con sogas
7 Carretera CE9 Puente flexible Cable con sogas
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
# Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
# Structure_height <dbl>, Structure_lenght <dbl>, Structure_width <dbl>,
# Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
# Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>
This is the core of our spatial processing. We first determine the correct EPSG code for each record using our add_epsg function (see function 2.2.5), based on its datum and UTM zone. Then, processing each EPSG group separately, we convert the tabular data into sf spatial objects and reproject everything to a single, standardized system: WGS84 (EPSG:4326).
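The conversion and reprojection step can be sketched as follows. The input object name (uo_datum_zone), the coordinate column names (taken from the tibble shown above), and the behavior of add_epsg (assumed to append an `epsg` column) are assumptions:

```r
# Sketch only: add_epsg() is assumed to append an `epsg` column
epsg_uo_datum_zone <- uo_datum_zone |>
  datapaperchecks::add_epsg() |>
  split(~epsg)

# Convert each EPSG group to an sf object in its native CRS,
# then reproject everything to WGS84 (EPSG:4326)
epsg_uo_datum_zone <- lapply(names(epsg_uo_datum_zone), \(code) {
  epsg_uo_datum_zone[[code]] |>
    sf::st_as_sf(
      coords = c("X_easting", "Y_northing"),
      crs    = as.integer(code)
    ) |>
    sf::st_transform(4326)
}) |>
  setNames(names(epsg_uo_datum_zone))
```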
With the data reprojected, we bind it back into a single sf data frame. We then use our set_feature_from_infrastructure function to automatically classify each structure (e.g., as 'highway', 'railway', or 'man_made'; see function 2.2.6), following the OpenStreetMap (OSM) convention. Finally, we check for any structures that could not be classified (feature column is NA), flagging them for review.
Code
epsg_uo_datum_zone_bind <- epsg_uo_datum_zone |>
  dplyr::bind_rows(.id = "epsg") |>
  dplyr::arrange(Dataset) |>
  datapaperchecks::set_feature_from_infrastructure()

# Checking datasets that have NA values on `Infrastructure_type`
epsg_uo_datum_zone_bind |>
  sf::st_drop_geometry() |>
  dplyr::count(Dataset, Infrastructure_type, feature) |>
  print(n = Inf)
To validate the structure locations against real-world data, we use the function calc_nearest_osm_dist (see function 2.2.7) to create a bounding box of the structures and a buffer around this bounding box. The function then calculates the distance from each point to its nearest corresponding feature from OSM. We loop through each feature type and dataset, using a try-catch block to gracefully handle any errors from the API or spatial computation. The function creates columns that help check whether the distance between each structure and the closest OSM infrastructure is within the threshold we provided.
Code
nested_epsg_uo_datum_zone_bind <- epsg_uo_datum_zone_bind |>
  split(~feature)

osm_result <- list()

for (feature in names(nested_epsg_uo_datum_zone_bind)) {
  cli::cli_h1("Starting feature {feature}")
  df <- nested_epsg_uo_datum_zone_bind[[feature]]
  datasets <- df |>
    dplyr::distinct(Dataset) |>
    dplyr::pull(Dataset)
  for (dataset in datasets) {
    cli::cli_h3("Starting dataset {dataset}")
    result <- try(
      {
        nested_epsg_uo_datum_zone_bind[[feature]] |>
          dplyr::filter(Dataset == dataset) |>
          datapaperchecks::calc_nearest_osm_dist(feature = feature)
      },
      silent = TRUE
    )
    if (inherits(result, "try-error")) {
      clean_message <- base::conditionMessage(attr(result, "condition"))
      # Now, the cli alert will work safely
      cli::cli_alert_danger("Error in dataset {dataset}: {clean_message}")
      # Save the clean message to the results
      osm_result[[feature]][[dataset]] <- clean_message
      next
    } else {
      osm_result[[feature]][[dataset]] <- result
    }
  }
  cli::cli_alert_success("Finishing feature {feature}")
}
For Example9, we receive the error "→ There are no features within the buffer", which means that no OSM feature was found within the buffer around the bounding box. This issue must be referred back to the researcher.
In the final step, we prepare the output. We gather all the results, filter for structures flagged as potential outliers (out_thresh == TRUE), extract the final latitude and longitude into clean columns, and export the resulting table.
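This final step can be sketched as below. The shape of osm_result follows the loop above (data frames for successful runs, character error messages otherwise); the geometry column name and the output file name are assumptions:

```r
# Sketch only: output file name and geometry column name are assumed
osm_result |>
  purrr::list_flatten() |>
  purrr::keep(is.data.frame) |>   # drop the stored error messages
  dplyr::bind_rows() |>
  dplyr::filter(out_thresh) |>    # keep only flagged potential outliers
  dplyr::mutate(
    Longitude_wgs84 = sf::st_coordinates(geometry)[, 1],
    Latitude_wgs84  = sf::st_coordinates(geometry)[, 2]
  ) |>
  sf::st_drop_geometry() |>
  writexl::write_xlsx("coordinates_to_review.xlsx")
```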