Beach Dataset: Scientific Documentation

“We cannot instrumentally measure every beach in the world, but we can consistently extract observable facts from public videos. By standardizing this process, tracking uncertainty, and verifying each entry twice, we turn scattered footage into structured, confidence-rated data — a method applicable well beyond beaches.”

This document describes the structure and methodology of the Beach Dataset. The dataset is part of the Beach Review Guide (BRG) Project and is used in geographical, environmental, and tourism research. It follows principles of transparency, verifiability, tiered confidence classification, and human-in-the-loop quality control.

Data Quality Assurance: Two-Stage Moderation

Each record is checked twice for consistency and accuracy:

  1. The Observer (Data Contributor) annotates fields strictly according to source evidence and project guidelines.
  2. The Moderator (Administrator) independently verifies every field against the source material and checks geographic consistency using satellite imagery.

Only after successful moderation is a record assigned a Moderation Date and made available for public use or scientific analysis. Unmoderated or rejected records are excluded from all exports and visualisations.
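The gating rule above can be illustrated with a minimal Python sketch. The `Record` class, the `status` values, and the function names are hypothetical and chosen only for illustration; they are not the BRG database schema. The sketch assumes that only an approval sets a Moderation Date and that exports keep approved records only.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Record:
    """Hypothetical record wrapper (illustrative names, not the BRG schema)."""
    beach_id: str
    status: str = "pending"  # assumed states: "pending", "approved", "rejected"
    moderation_date: Optional[date] = None


def moderate(record: Record, approved: bool, on: date) -> Record:
    """Apply the moderator's decision; only an approval sets a Moderation Date."""
    if approved:
        record.status = "approved"
        record.moderation_date = on
    else:
        record.status = "rejected"
        record.moderation_date = None
    return record


def exportable(records: List[Record]) -> List[Record]:
    """Keep only records that passed moderation; pending and rejected ones are dropped."""
    return [
        r for r in records
        if r.status == "approved" and r.moderation_date is not None
    ]
```

Under these assumptions, a record that was never moderated is treated the same way as a rejected one at export time, which matches the exclusion rule stated above.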

Dataset Structure

The dataset contains two record types, Beach Profile and Visit Report, each collected using its own, related data collection procedure.

These record types are stored separately in the database and serve different analytical purposes. Each field within either record type is assigned a confidence level (L1–L3) as defined below.

Data Collection Methodology

The methodology can be applied to any visual or narrative source that provides verifiable evidence of beach conditions. In the current implementation, the project relies mainly on publicly available user-generated videos from online platforms (primarily YouTube) due to their global coverage and accessibility.

Beach Profile

Beach Profiles are constructed using the following protocol:

Visit Report

Visit Reports are derived from individual source videos that satisfy the following inclusion criteria:

Each Visit Report is linked to one Beach Profile via beach_id.
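The link between the two record types can be sketched as a simple grouping step. This is an illustration only: records are represented as plain dictionaries, the function name `group_reports` is hypothetical, and the handling of reports without a matching profile is an assumption rather than documented BRG behaviour.

```python
from collections import defaultdict


def group_reports(profiles, reports):
    """Group Visit Reports under their Beach Profile via beach_id.

    Returns (grouped, orphans): a dict mapping beach_id to its list of
    reports, and a list of reports whose beach_id matches no profile.
    """
    known_ids = {p["beach_id"] for p in profiles}
    grouped = defaultdict(list)
    orphans = []
    for report in reports:
        if report["beach_id"] in known_ids:
            grouped[report["beach_id"]].append(report)
        else:
            orphans.append(report)
    return dict(grouped), orphans


# Made-up sample rows for illustration; they are not taken from the dataset.
sample_profiles = [{"beach_id": "example-beach", "beach_type": "Wild"}]
sample_reports = [
    {"beach_id": "example-beach", "visit_year": 2024, "crowd_level": "Few people"},
    {"beach_id": "example-beach", "visit_year": 2025, "crowd_level": "Crowded"},
    {"beach_id": "missing-beach", "visit_year": 2024, "crowd_level": "Empty"},
]
grouped, orphans = group_reports(sample_profiles, sample_reports)
```

A single profile can collect several reports over time, which is why the sketch returns a list per `beach_id`.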

Annotation Protocol

The detailed visual criteria used to distinguish categorical values are defined in an internal annotation guide available only to project contributors (observers). External users can treat categorical fields as discrete observational classes without needing access to the internal guidelines.

Limitations

The Beach Dataset, while methodologically rigorous within its design constraints, is subject to several limitations that affect its scope, representativeness, and interpretability:

The dataset is most suitable for descriptive comparisons, cross-site characterization, and preliminary hypothesis generation within the spatio-temporal limits of the source material.

Data Quality Framework

Within each record type, every field is assigned one of three confidence levels, informed by common practices in geographic data quality assessment:

Table 1
| Level | Name | Definition | Scientific usability |
|---|---|---|---|
| L1 | Verified Observation | Directly observable in source material or explicitly stated. Binary, categorical, or measured. No interpretation. | High – suitable for statistical and spatial analysis. |
| L2 | Inferred / Contextual | Derived from verified observation plus external authoritative source (e.g., climatological average) or logical inference from complete coverage. | Moderate – usable with documented uncertainty. |
| L3 | Descriptive / Editorial | Interpretive phrasing based on L1 evidence, formatted as recommendation or narrative. Contains no new factual claims. | Low – excluded from scientific analysis; intended for interface only. |
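As an illustration of how the levels might be applied in practice, the following Python sketch drops L3 fields before analysis. The mapping lists only a handful of fields, with levels as given in the field listing of this document; the function name `analysis_view` and the choice to drop fields with no known level are assumptions.

```python
# Partial field-to-level mapping (a small subset, for illustration only).
FIELD_LEVELS = {
    "latitude": "L1",
    "wave_height": "L1",
    "air_temp_c": "L2",
    "location_description": "L2",
    "short_description": "L3",
    "best_for": "L3",
}


def analysis_view(record, allowed=("L1", "L2"), field_levels=FIELD_LEVELS):
    """Return only the fields whose confidence level is in `allowed`.

    Fields without a known level are dropped as well (an assumption made
    here so that nothing unclassified reaches an analysis).
    """
    return {
        name: value
        for name, value in record.items()
        if field_levels.get(name) in allowed
    }
```

Passing `allowed=("L1",)` would restrict the view to verified observations only, for analyses that should not carry the documented uncertainty of L2 fields.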

Field Definitions and Classification

Table 2 lists each field with its record type, definition, permissible values, source, and confidence level.

Table 2
| Field | Record Type | Definition | Permissible Values | Source | Level |
|---|---|---|---|---|---|
| beach_id | Beach Profile | Unique beach identifier (slug) | Alphanumeric, lowercase, hyphenated | User input, validated | L1 |
| location | Beach Profile | Administrative region (country, province, island) | Controlled vocabulary | User selection | L1 |
| latitude | Beach Profile | Geographic latitude of beach centroid | Decimal degrees (WGS84) | Satellite map (Google Maps, OSM) | L1 |
| longitude | Beach Profile | Geographic longitude of beach centroid | Decimal degrees (WGS84) | Satellite map (Google Maps, OSM) | L1 |
| length_m | Beach Profile | Beach length along shoreline | Integer (meters); missing values encoded as NULL | Satellite ruler tool | L1 |
| width_m | Beach Profile | Average beach width (mean of min/max) | Integer (meters); missing values encoded as NULL | Satellite ruler tool | L1 |
| beach_type | Beach Profile | Infrastructure presence level | Wild, Semi-organized, Organized | Source video (ground-level view, consistent across videos) | L1 |
| sand_type | Beach Profile | Surface substrate composition | White sand, Golden sand, Dark sand, Gravel, Pebbles, Shells, Rock | Source video (ground-level view, consistent across videos) | L1 |
| water_entry | Beach Profile | Slope gradient at shoreline | Gentle slope, Moderate slope, Steep drop, Reef edge, Rocky shelf | Source video (nearshore view, stable feature) | L1 |
| water_bottom | Beach Profile | Substrate composition at seabed | Sandy, Silty, Pebbles, Coral, Rocky, Seagrass | Source video (only if water clear and bottom visible, stable) | L1 |
| natural_shade | Beach Profile | Canopy cover over beach surface | None, Sparse, Moderate, Full | Source video (daytime, full view, consistent) | L1 |
| toilets_present | Beach Profile | Presence of functional toilets | true, NULL | Source video | L1 |
| showers_present | Beach Profile | Presence of functional showers | true, NULL | Source video | L1 |
| sunbeds_available | Beach Profile | Presence of rentable sunbeds | true, NULL | Source video | L1 |
| food_drink_available | Beach Profile | Presence of on-beach vendors or cafes | true, NULL | Source video | L1 |
| safety_infrastructure | Beach Profile | Presence of lifeguards, flags, or rescue towers | true, NULL | Source video (only if beach fully scanned across videos) | L1 |
| access_difficulty | Beach Profile | Walking effort from nearest road | Roadside, Very Short walk (1–2 min), Short walk (<5 min), Trail (5–15 min), Long hike (>15 min), Boat required | Source video (path shown or described, stable) | L1 |
| recommended_transport | Beach Profile | Transport mode demonstrated or advised | Scooter, Car/Taxi, Boat, Walkable | Source video narration or footage (consistent) | L1 |
| parking | Beach Profile | Proximity and type of parking | None, Street only, Free lot, Paid lot, Guests only | Source video (parking visible, stable) | L1 |
| location_description | Beach Profile | Geographic position relative to landmarks | Free text (≤70 chars) | Map analysis (Google Maps, OSM) | L2 |
| visit_year | Visit Report | Year of on-site visit | Integer (e.g., 2024) | Video title/description/comments or upload date | L1 |
| visit_month | Visit Report | Month of on-site visit | Integer (1–12); NULL if unconfirmed | Explicit mention only | L1 |
| sky_condition | Visit Report | Cloud cover at time of recording | Sunny, Partly cloudy, Cloudy, Rainy, After rain | Source video (sky visible) | L1 |
| wind | Visit Report | Observed wind intensity | Calm, Light, Moderate, Strong | Source video (vegetation, flags, water surface) | L1 |
| wave_height | Visit Report | Visual wave amplitude | Calm (<0.1 m), Light (0.1–0.2 m), Moderate (0.2–0.5 m), Rough (0.5–1.0 m), Very Rough (>1.0 m) | Source video (shoreline view) | L1 |
| water_clarity | Visit Report | Underwater visibility | Crystal clear, Clear, Slightly cloudy, Murky | Source video (only if water shown clearly) | L1 |
| water_cleanliness | Visit Report | Presence of debris or pollutants in water | Clean, Some debris, Algae/seaweed, Polluted/muddy | Source video (water column) | L1 |
| sand_cleanliness | Visit Report | Surface litter or organic residue | Excellent, Good, Some trash, Poor | Source video (beach surface) | L1 |
| crowd_level | Visit Report | Observed density of beach users | Empty, Few people, Moderate, Crowded, Very busy | Source video (beach in frame) | L1 |
| visitor_type | Visit Report | Dominant demographic group | International tourists, Families, Couples, Solo travelers, Local tourists | Source video (language, behavior, attire) | L1 |
| noise_level | Visit Report | Audible disturbance sources | Quiet, Music from bars, Construction, Boat engines | Source video audio track | L1 |
| time_of_day | Visit Report | Approximate recording time | Morning, Day, Evening | Explicit mention or contextual cues | L1 |
| air_temp_c | Visit Report | Mean monthly air temperature | Integer (°C); NULL if month unknown | weatherspark.com (nearest station) | L2 |
| water_temp_c | Visit Report | Mean monthly sea surface temperature | Integer (°C); NULL if month unknown | seatemperature.org (nearest location) | L2 |
| short_description | Beach Profile | Concise factual summary for interface | Free text (≤70 chars, no adjectives) | Editorial synthesis of L1 fields | L3 |
| short_sand_water_char | Beach Profile | Phenomenological description of sand/water | Free text (≤70 chars) | Editorial phrasing based on L1 | L3 |
| best_for | Beach Profile | User group suitability (conditional) | Free text (e.g., “Snorkelers: coral offshore”) | Editorial interpretation of L1 | L3 |
| avoid_if | Beach Profile | User incompatibility statement | Free text (e.g., “You seek peace (boats passing)”) | Editorial contrast based on L1 | L3 |
| special_notes | Visit Report | Contextual metadata about video | Free text (e.g., “Copter video”, “Jan 2025”) | Curator annotation | L3 |
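The permissible values in Table 2 lend themselves to simple automated checks. The Python sketch below shows one possible validator for a few fields. The slug pattern is an assumed reading of "alphanumeric, lowercase, hyphenated", the vocabularies are copied from Table 2, and treating a missing categorical value as "not recorded" rather than as an error is an assumption. The function name `validate` is hypothetical.

```python
import re

# Assumed slug rule: lowercase alphanumeric groups joined by single hyphens.
SLUG_RE = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")

# Vocabularies copied from Table 2 for three categorical fields.
ALLOWED = {
    "beach_type": {"Wild", "Semi-organized", "Organized"},
    "wind": {"Calm", "Light", "Moderate", "Strong"},
    "sky_condition": {"Sunny", "Partly cloudy", "Cloudy", "Rainy", "After rain"},
}


def validate(record):
    """Return a list of human-readable problems; an empty list means no issue found."""
    errors = []
    beach_id = record.get("beach_id")
    if not isinstance(beach_id, str) or not SLUG_RE.fullmatch(beach_id):
        errors.append("beach_id: not a lowercase hyphenated slug")
    for field, allowed in ALLOWED.items():
        value = record.get(field)
        # A missing value (None) is treated as "not recorded", not as an error.
        if value is not None and value not in allowed:
            errors.append(f"{field}: {value!r} is not a permissible value")
    month = record.get("visit_month")
    if month is not None and not (isinstance(month, int) and 1 <= month <= 12):
        errors.append("visit_month: expected an integer 1-12 or None")
    return errors
```

A check of this kind covers only the form of a value; whether the value matches the source video remains the moderator's task.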

Note: NULL values indicate unobserved or unverifiable conditions, not confirmed absence. Absence assertions (e.g., “no toilets”) are only recorded when ≥90% of the beach is visibly scanned and no such feature is present.
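The practical consequence of this note is that a NULL must not be counted as "no". A minimal Python sketch, assuming profiles are held as dictionaries in which NULL is represented by `None` (or by a missing key), could summarise a presence field as follows. The function name `presence_summary` is hypothetical.

```python
def presence_summary(profiles, field):
    """Count confirmed presence separately from unknown (NULL) values.

    NULL is represented as None or a missing key; it means "unobserved or
    unverifiable", so it is reported as unknown rather than as absence.
    """
    confirmed = sum(1 for p in profiles if p.get(field) is True)
    unknown = sum(1 for p in profiles if p.get(field) is None)
    return {"confirmed": confirmed, "unknown": unknown}
```

Reporting the two counts side by side avoids the misleading statistic "share of beaches without toilets", which the data as described cannot support.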

Data Availability, Use, and Attribution

The L1 and L2 portions of the Beach Dataset are not publicly downloadable at this time. They may be made available under a CC BY 4.0 license for non-commercial academic research upon request. The L3 layer (editorial content) is excluded from redistribution and is intended solely for user-facing presentation on the Beach Review Guide (BRG) platform.

Commercial use of any part of the dataset (including licensing, embedding in websites or mobile applications, use in commercial products, or resale) requires a separate written agreement with the project author.

To request access to the dataset or inquire about commercial licensing and collaboration, please contact: coconut@beachreviewguide.com.

For academic citation, please use the following format:

Vityasev, Y. M. (2026). Beach Dataset. Beach Review Guide. https://beachreviewguide.com/dataset-description/

Methodology Attribution and Intellectual Property

The data collection procedures, field definitions, confidence system (L1–L3), and moderation workflow constitute the methodological basis of the BRG project. The custom categorical scales (e.g., wave height, wind intensity, infrastructure, cleanliness) were designed specifically for visual assessment from user-generated videos. The methodology is an original work; using it outside the BRG project requires permission from the author.

Proper attribution is required in any derivative academic work that references this methodology:

“Data collection followed the Beach Review Guide (BRG) methodology (Vityasev, 2026).”