We verify whether the delivered photos and videos match the expected records in the Excel sheets. We standardize the media structure, copy found files into specific folders (ct, under, over), generate a partial-match report, and remove media that remain in the root without a correct destination.
14.2 Problem Solving
Pipeline: (1) use helper functions and check folder names versus Excel sheets; (2) stage raw media into a standard structure; (3) inventory media under Media while separating already-processed files (ct, under, over); (4) apply matching functions between sheets and files; (5) export partial matches and (optionally) clean stray files.
14.2.1 Common steps
These steps get the ground ready: we load shared utilities, list the incoming folders, and immediately compare their names with the Excel sheets. If something is off, we find out before spending time on matching. That early sanity check keeps the rest of the workflow focused on pairing files, not fixing misconfigured inputs.
14.2.1.1 Check folder names vs. Excel sheets
Next, we list the folders under Example/12 and compare those names to the Excel sheet names. Using waldo::compare(), any mismatch is surfaced right away, so we can align folder and sheet names before matching files. Keeping the two sets of names identical avoids silent skips caused by small naming differences, such as a change in letter case or a stray space.
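The name check can be sketched as follows. This is a minimal illustration and not the chapter's original chunk: the temporary directory, the dataset names (site_a, site_b, Site_C), and the literal sheet_names vector are all hypothetical. In a real run the sheet names would presumably come from the workbook, for example through readxl::excel_sheets().

```r
# Hypothetical stand-in for Example/12: one folder per dataset in a temp directory.
root <- file.path(tempfile("example_"), "12")
for (d in c("site_a", "site_b", "Site_C")) {
  dir.create(file.path(root, d), recursive = TRUE, showWarnings = FALSE)
}

# Folder names found on disk (immediate children only).
folder_names <- basename(list.dirs(root, full.names = TRUE, recursive = FALSE))

# Assumption: a literal vector replaces the sheet names of the real workbook,
# which could be read with readxl::excel_sheets("<path to workbook>").
sheet_names <- c("site_a", "site_b", "site_c")

# Base-R view of the mismatch, in both directions.
only_on_disk   <- setdiff(folder_names, sheet_names)
only_in_sheets <- setdiff(sheet_names, folder_names)

# waldo::compare() prints a readable diff when the package is installed.
# Both vectors are sorted first so that ordering alone is not reported.
if (requireNamespace("waldo", quietly = TRUE)) {
  print(waldo::compare(sort(folder_names), sort(sheet_names)))
}
```

Here the only discrepancy is letter case (Site_C on disk, site_c in the sheets), which is exactly the kind of small naming difference that would otherwise cause a dataset to be skipped without warning.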
Here we scan every file under the incoming folders, keep only images or videos, and infer dataset and media type from the path and MIME type. Then we create the standardized destination folders and copy the files there. Executing this chunk once sets up a clean media tree for the later matching steps.
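The staging step might look like the sketch below. Several things are assumptions made for illustration: the incoming layout (one folder per dataset), the destination layout Media/<dataset>/<media type>/<file>, the folder names image and video, and the use of a file-extension lookup in place of real MIME detection (a package such as mime could take that role).

```r
# Hypothetical incoming tree: <incoming>/<dataset>/<file>, built in a temp directory.
base_dir   <- tempfile("example_")
incoming   <- file.path(base_dir, "12")
media_root <- file.path(base_dir, "Media")
dir.create(file.path(incoming, "site_a"), recursive = TRUE)
file.create(file.path(incoming, "site_a",
                      c("IMG_0001.JPG", "clip_01.mp4", "notes.txt")))

# Assumption: an extension lookup stands in for MIME-type detection.
image_ext <- c("jpg", "jpeg", "png", "tif", "tiff")
video_ext <- c("mp4", "mov", "avi")

files <- list.files(incoming, recursive = TRUE, full.names = TRUE)
ext   <- tolower(tools::file_ext(files))

staged <- data.frame(
  source     = files,
  dataset    = basename(dirname(files)),  # parent folder is taken as the dataset
  media_type = ifelse(ext %in% image_ext, "image",
                      ifelse(ext %in% video_ext, "video", NA_character_)),
  stringsAsFactors = FALSE
)

# Keep only images and videos, then build the standardized destination path.
staged <- staged[!is.na(staged$media_type), ]
staged$destination <- file.path(media_root, staged$dataset,
                                staged$media_type, basename(staged$source))

# Create the destination folders, then copy without overwriting existing files.
for (d in unique(dirname(staged$destination))) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}
copied <- file.copy(staged$source, staged$destination, overwrite = FALSE)
```

Copying with overwrite = FALSE means a second run leaves already-staged files untouched, which fits the idea of executing this step only once.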
Now we read everything under Example/Media, capturing the dataset, media type, and filename from the path. We then split files already placed under ct, under, or over into media_files_types_list and remove them from the main working list. The result, media_list, is the on-disk ground truth for files still needing a destination, which we compare against the Excel expectations.
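As a rough sketch of this inventory, the code below parses the dataset, media type, and filename from each relative path and separates files that already sit under ct, under, or over. The tree layout and file names are hypothetical, and the objects pending and processed_by_type are illustrative stand-ins that play the roles of media_list and media_files_types_list; the shapes the package actually expects may differ.

```r
# Hypothetical media tree: Media/<dataset>/<media type>/[ct|under|over/]<file>
media_root <- file.path(tempfile("example_"), "Media")
paths <- file.path(media_root, "site_a", "image",
                   c("IMG_0001.JPG",
                     file.path("ct", "IMG_0002.JPG"),
                     file.path("over", "IMG_0003.JPG")))
for (p in paths) {
  dir.create(dirname(p), recursive = TRUE, showWarnings = FALSE)
  file.create(p)
}

# Relative paths use "/" as separator on every platform.
rel   <- list.files(media_root, recursive = TRUE)
parts <- strsplit(rel, "/", fixed = TRUE)

inventory <- data.frame(
  path       = rel,
  dataset    = vapply(parts, `[`, character(1), 1),  # first path component
  media_type = vapply(parts, `[`, character(1), 2),  # second path component
  filename   = basename(rel),
  stringsAsFactors = FALSE
)

# A file counts as already processed when any parent folder is ct, under, or over.
processed_dirs <- c("ct", "under", "over")
is_processed <- vapply(
  parts,
  function(p) any(p[-length(p)] %in% processed_dirs),
  logical(1)
)

# Processed files grouped by their folder; everything else still needs a destination.
processed <- inventory[is_processed, ]
processed_by_type <- split(processed, basename(dirname(processed$path)))
pending <- inventory[!is_processed, ]
```

After this split, pending holds only IMG_0001.JPG, the one file not yet placed in a ct, under, or over folder.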
The matching workflow now uses functions from datapaperchecks directly (validation, loading, candidate creation, exact-copy handling, and report export). This keeps chapter code focused on data preparation, while package code centralizes matching logic and maintenance.
14.2.5 Run the full matching
We run datapaperchecks::run_check_match_media() with the media objects created in this chapter. The function processes ct, under, and over, writes the partial-match report to Example/Output/12, and returns the partial candidates by dataset.
result <- datapaperchecks::run_check_match_media(
  media_list = media_list,
  media_files = media_files,
  media_files_types_list = media_files_types_list,
  first_take = TRUE
)
result