Code
source("R/FUNCTIONS.R")This ensures that coordinates are present, in valid formats, and spatially coherent with the expected projection; it converts everything to WGS84, computes diagnostics (e.g., distance to reference features if provided), and prepares a clean table for export.
We follow a pipeline: (1) load helper functions and read the target worksheet; (2) convert to sf objects, harmonize CRS, and compute spatial diagnostics; (3) extract final data frame with values beyond the threshold and write an Excel output for review.
To begin, we source our external FUNCTIONS.R script. This makes our custom helper functions, like read_sheet, available for use throughout the analysis, ensuring our workflow is reproducible.
source("R/FUNCTIONS.R")First, we read the data from the “Underpasses” and “Overpasses” sheets within the files in the “Example” folder. Using our custom read_sheet function, we load them into the under and over lists, respectively, and then preview the first few rows of each.
# Read sheets
under <- read_sheet(
path = "Example/10",
sheet = "Underpasses",
na = c("NA", "na")
)
over <- read_sheet(
path = "Example/10",
sheet = "Overpasses",
na = c("NA", "na")
)
head(under)[1]$Example2
# A tibble: 0 × 17
# ℹ 17 variables: Infrastructure_type <chr>, Structure_id <chr>,
# Structure_type <chr>, Structure_cell <chr>, Structure_shape <chr>,
# Structure_photo <chr>, Structure_age <dbl>, Structure_height <dbl>,
# Structure_length <dbl>, Structure_width <dbl>, Waterbody_width <dbl>,
# Latitude <dbl>, Longitude <dbl>, Utm_zone <chr>, X_easting <dbl>,
# Y_northing <dbl>, Datum <chr>
head(over)[1]$Example2
# A tibble: 7 × 19
Infrastructure_type Structure_id Structure_type Structure_material
<chr> <chr> <chr> <chr>
1 Carretera CE2 Puente flexible Cable con sogas
2 Carretera CE3 Puente flexible Cable con sogas
3 Carretera CE4 Puente flexible Cable con sogas
4 Carretera CE5 Puente flexible Cable con sogas
5 Carretera CE6 Puente flixible Cable con sogas
6 Carretera CE7 Puente flexible Cable con sogas
7 Carretera CE9 Puente flexible Cable con sogas
# ℹ 15 more variables: Structure_anchor_1 <chr>, Structure_anchor_2 <chr>,
# Structure_branch_access <chr>, Structure_photo <chr>, Structure_age <dbl>,
# Structure_height <dbl>, Structure_lenght <dbl>, Structure_width <dbl>,
# Structure_internal_height <dbl>, Latitude <dbl>, Longitude <dbl>,
# Utm_zone <chr>, X_easting <dbl>, Y_northing <dbl>, Datum <chr>
This is the core of our spatial processing. We first determine the correct EPSG code using our add_epsg (see function 2.2.4) function for each record based on its datum and UTM zone. Then, processing each EPSG group separately, we convert the tabular data into sf spatial objects and reproject everything to a single, standardized system: WGS84 (EPSG: 4326).
# Distinct of different Datum + Utm_zone
uo_datum_zone <- dplyr::bind_rows(under, .id = "Dataset") |>
dplyr::bind_rows(dplyr::bind_rows(over, .id = "Dataset")) |>
dplyr::filter(!is.na(Datum)) |>
dplyr::mutate(
type = dplyr::if_else(!is.na(Utm_zone), "Projected", "Geodetic"),
zone = stringr::str_sub(Utm_zone, 1, 2),
hemis = dplyr::case_when(
stringr::str_sub(Utm_zone, 3, 3) == "S" ~ "S",
stringr::str_sub(Utm_zone, 3, 3) == "N" ~ "N",
stringr::str_sub(Utm_zone, 3, 3) >= "N" ~ "N",
stringr::str_sub(Utm_zone, 3, 3) < "N" ~ "S"
)
) |>
add_epsg()
nested_uo_datum_zone <- uo_datum_zone |>
split(~epsg)
epsg_uo_datum_zone <- list()
for (epsg in names(nested_uo_datum_zone)) {
cli::cli_h1("Starting epsg {epsg}")
epsg_uo_datum_zone[[epsg]] <- nested_uo_datum_zone[[epsg]] |>
dplyr::mutate(
X = dplyr::case_when(
!is.na(Longitude) ~ Longitude,
TRUE ~ as.numeric(X_easting)
),
Y = dplyr::case_when(
!is.na(Latitude) ~ Latitude,
TRUE ~ as.numeric(Y_northing)
)
) |>
sf::st_as_sf(
coords = c("X", "Y"),
remove = FALSE,
crs = as.numeric(epsg)
) |>
sf::st_transform(4326) |>
dplyr::mutate(
Lat = sf::st_coordinates(geometry)[, 2],
Long = sf::st_coordinates(geometry)[, 1],
New_Datum = "WGS 84",
New_EPSG = 4326L
)
}
── Starting epsg 4326 ──────────────────────────────────────────────────────────
── Starting epsg 31980 ─────────────────────────────────────────────────────────
── Starting epsg 31981 ─────────────────────────────────────────────────────────
── Starting epsg 32718 ─────────────────────────────────────────────────────────
── Starting epsg 32724 ─────────────────────────────────────────────────────────
With the data reprojected, we bind it back into a single sf data frame. We then use our set_feature_from_infrastructure function to automatically classify each structure (e.g., as ‘highway’, ‘railway’ or ‘man_made’ - see function 2.2.5), following OpenStreetMap (OSM) convention. Finally, we check for any structures that couldn’t be classified (feature column as NA), flagging them for review.
epsg_uo_datum_zone_bind <- epsg_uo_datum_zone |>
dplyr::bind_rows(.id = "epsg") |>
dplyr::arrange(Dataset) |>
set_feature_from_infrastructure()
# Checking datasets that have na values on `Infrastructure_type`
epsg_uo_datum_zone_bind |>
sf::st_drop_geometry() |>
dplyr::count(Dataset, Infrastructure_type, feature) |>
print(n = Inf)# A tibble: 5 × 4
Dataset Infrastructure_type feature n
<chr> <chr> <chr> <int>
1 Example3 Rodovia highway 52
2 Example6 Gasoduto man_made 5
3 Example8 Carretera highway 4
4 Example8 Ferrovía railway 1
5 Example9 Ducto man_made 25
To validate the structure locations against real-world data, we use the function sec-calc-nearest-osm-dist (see function 2.2.6) to create a bounding box of the structures and a buffer around this bounding box. The function then calculates the distance from each point to its nearest corresponding feature from OSM. We loop through each feature type and dataset, using a try-catch block to gracefully handle any errors from the API or spatial computation. The function creates columns that helps to check if the distance between the structure and the closest OSM infrastructure are within the threshold we provided.
nested_epsg_uo_datum_zone_bind <- epsg_uo_datum_zone_bind |>
split(~feature)
osm_result <- list()
for (feature in names(nested_epsg_uo_datum_zone_bind)) {
cli::cli_h1("Starting feature {feature}")
df <- nested_epsg_uo_datum_zone_bind[[feature]]
datasets <- df |>
dplyr::distinct(Dataset) |>
dplyr::pull(Dataset)
for (dataset in datasets) {
cli::cli_h3("Starting dataset {dataset}")
result <- try(
{
nested_epsg_uo_datum_zone_bind[[feature]] |>
dplyr::filter(Dataset == dataset) |>
calc_nearest_osm_dist(feature = feature)
},
silent = TRUE
)
if (inherits(result, "try-error")) {
clean_message <- base::conditionMessage(attr(result, "condition"))
# Now, the cli alert will work safely
cli::cli_alert_danger("Error in dataset {dataset}: {clean_message}")
# Save the clean message to the results
osm_result[[feature]][[dataset]] <- clean_message
next
} else {
osm_result[[feature]][[dataset]] <- result
}
}
cli::cli_alert_success("Finishing feature {feature}")
}
── Starting feature highway ────────────────────────────────────────────────────
── Starting dataset Example3
── Starting dataset Example8
✔ Finishing feature highway
── Starting feature man_made ───────────────────────────────────────────────────
── Starting dataset Example6
── Starting dataset Example9
→ There are no features within the buffer
✖ Error in dataset Example9: is.data.frame(df) is not TRUE
✔ Finishing feature man_made
── Starting feature railway ────────────────────────────────────────────────────
── Starting dataset Example8
✔ Finishing feature railway
For the Example9, we receive the error “→ There are no features within the buffer”, which means that no OSM feature was found within the buffer of the bounding box created. This is an issue that has to be asked to the researcher.
In the final step, we prepare the output. We gather all the results, filter for structures flagged as potential outliers (out_thresh == TRUE), extract the final latitude and longitude into clean columns, and export the resulting table.
tab_final <- osm_result |>
purrr::list_flatten(name_spec = "{inner}") |>
purrr::map_dfr("data") |>
dplyr::relocate(feature_type, .after = feature) |>
dplyr::filter(out_thresh == TRUE) |>
dplyr::arrange(desc(distance_to)) |>
dplyr::mutate(
latitude = sf::st_coordinates(geometry)[, 2],
longitude = sf::st_coordinates(geometry)[, 1]
) |>
sf::st_drop_geometry()
tab_final# A tibble: 7 × 13
Dataset Infrastructure_type Structure_id id_osm name source feature
* <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 Example3 Rodovia OAC55A 787212494 Estrad… IBGE highway
2 Example3 Rodovia OAC72 397785888 Estrad… IBGE highway
3 Example8 Carretera PaAC01 26427103 Acceso… <NA> highway
4 Example8 Carretera PaAC03 26427103 Acceso… <NA> highway
5 Example8 Carretera PaAC02 26427103 Acceso… <NA> highway
6 Example3 Rodovia PONTE PERNAMBUCO 397785888 Estrad… IBGE highway
7 Example3 Rodovia OAC78 397785885 Estrad… IBGE highway
# ℹ 6 more variables: feature_type <chr>, surface <chr>, distance_to <dbl>,
# out_thresh <lgl>, latitude <dbl>, longitude <dbl>