North Australia Sentinel 2 Satellite Composite Imagery - 15th percentile true colour (NESP MaC 3.17, AIMS)

Created 13/03/2025

Updated 13/03/2025

This dataset is true colour cloud-free composite satellite imagery optimised for mapping shallow marine habitats in northern Australia, based on 10-meter resolution Sentinel 2 data collected from 2015 to 2024. It contains composite imagery for 333 Sentinel 2 tiles of northern Australia and the Great Barrier Reef. This dataset offers improved visual clarity of shallow water features compared to existing satellite imagery, allowing deeper marine features to be observed. These composites were specifically designed to address challenges such as sun glint, clouds and turbidity that typically hinder marine environment analyses. No tides were considered in the selection of the imagery and so this imagery corresponds to an 'All tide' image, approximating mean sea level.

This dataset is an updated version (Version 2), published in July 2024, which succeeds the initial draft version (Version 1, published in March 2024). The current version spans imagery from 2015–2024, an extension of the earlier timeframe that covered 2018–2022. This longer temporal range produced cleaner imagery with lower image noise, making deeper marine features visible. The deprecated draft version was removed from online download to save on storage space and is now only available on request.

While the final imagery corresponds to true colour based primarily on Sentinel 2 bands B2 (blue), B3 (green), and B4 (red), the near infrared (B8) band was used as part of sun glint correction and automated selection of low noise imagery.

Contrast enhancement was applied to the imagery to compress the original 12-bit per channel Sentinel 2 imagery into the final 8-bit per channel GeoTiffs. Black and white point correction was used to enhance the contrast as much as possible without too much clipping of the darkest and lightest marine features.
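As a rough illustration of this kind of contrast enhancement, the sketch below maps one raw channel value to the 8-bit range using a black point, a white point and an optional gamma curve. The function name, the black and white point values in the usage note, and the choice to apply gamma as an exponent of 1/gamma are assumptions made for illustration. The corrections actually applied are in the project's s2processor.py.

```python
def stretch_to_8bit(value, black_point, white_point, gamma=1.0):
    """Map a raw channel value to 1..255, keeping 0 free for NoData.

    black_point, white_point and gamma are illustrative parameters,
    not the values used to produce this dataset.
    """
    # Normalise between the black and white points and clip outside them.
    scaled = (value - black_point) / (white_point - black_point)
    scaled = min(max(scaled, 0.0), 1.0)
    # Assumed convention: gamma > 1 brightens dark values, giving them
    # a larger share of the available 8-bit levels.
    scaled = scaled ** (1.0 / gamma)
    # Valid imagery occupies 1..255 so that 0 can mark NoData.
    return 1 + round(scaled * 254)
```

With a hypothetical black point of 100 and white point of 1500, a raw value of 450 maps to a brighter output with gamma=2 than with the default linear stretch. Values outside the two points are clipped to 1 or 255.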
Gamma correction of 2 (red), 2 (green) and 2.3 (blue) was applied to allow a wider dynamic range to be represented in the 8-bit data, helping to ensure that little precision was lost in representing darker marine features. As a result, the image brightness is not linearly scaled. Further details of the corrections applied are available from https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp/blob/main/src/processors/s2processor.py.

Methods:

The satellite image composites were created by combining multiple Sentinel 2 images using the Google Earth Engine. The core algorithm was:

1. For each Sentinel 2 tile, filter the "COPERNICUS/S2_HARMONIZED" image collection by:
   - tile ID
   - maximum cloud cover 20%
   - date between '2015-06-27' and '2024-05-31'
   - asset_size > 100000000 (remove small fragments of tiles)
   Note: A maximum cloud cover of 20% was used to improve the processing times. In most cases this filtering does not have an effect on the final composite, as images with higher cloud coverage mostly result in higher noise levels and are not used in the final composite.
2. Split images by "SENSING_ORBIT_NUMBER" (see "Using SENSING_ORBIT_NUMBER for a more balanced composite" for more information).
3. For each SENSING_ORBIT_NUMBER collection, filter out all noise-adding images:
   3.1 Calculate the image noise level for each image in the collection (see "Image noise level calculation" for more information) and sort the collection by noise level.
   3.2 Remove all images with a very high noise index (>15).
   3.3 Calculate a baseline noise level using a minimum number of images (min_images_in_collection=30). This minimum number of images is needed to ensure a smooth composite where cloud "holes" in one image are covered by other images.
   3.4 Iterate over the remaining images (images not used in the baseline noise level calculation) and check whether adding each image to the composite adds to or reduces the noise. If it reduces the noise, add it to the composite. If it increases the noise, stop iterating over images.
4. Combine the SENSING_ORBIT_NUMBER collections into one image collection.
5. Remove sun glint (true colour only) and apply atmospheric correction on each image (see "Sun glint removal and atmospheric correction" for more information).
6. Duplicate the image collection to first create a composite image without cloud masking, using the 30th percentile of the images in the collection (i.e. for each pixel the 30th percentile value of all images is used).
7. Apply cloud masking to all images in the original image collection (see "Cloud Masking" for more information) and create a composite using the 30th percentile of the images in the collection (i.e. for each pixel the 30th percentile value of all images is used).
8. Combine the two composite images (no cloud mask composite and cloud mask composite). This solves the problem of some coral cays and islands being misinterpreted as clouds and therefore creating holes in the composite image. These holes are "plugged" with the underlying composite without cloud masking (Lawrey et al. 2022).
9. The final composite was exported as a cloud optimized 8-bit GeoTIFF.

Note: The following tiles were generated with no "maximum cloud cover" as they did not have enough images to create a composite with the standard settings: 46LGM, 46LGN, 46LHM, 50KKD, 50KPG, 53LMH, 53LMJ, 53LNH, 53LPH, 53LPJ, 54LVP, 57JVH, 59JKJ.

Compositing Process:

The dataset was created using a multi-step compositing process. A percentile-based image compositing technique was employed, with the 15th percentile chosen as the optimal value for most regions. This percentile was identified as the most effective in minimizing noise and enhancing key features such as coral reefs, islands, and other shallow water habitats. The 15th percentile was chosen as a trade-off between the desire to select darker pixels that typically correspond to clearer water, and very dark values (often occurring at the 10th percentile) corresponding to cloud shadows.
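The percentile compositing and hole plugging can be illustrated for a single pixel with the sketch below. It is a simplified stand-in written in plain Python, not the Google Earth Engine code used for the dataset. The function names are hypothetical, and the nearest-rank percentile definition is an assumption that may differ from the interpolation Earth Engine applies.

```python
def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (pct is an integer 1..100)."""
    ordered = sorted(values)
    # Integer ceiling of pct/100 * n, avoiding floating point rounding.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def composite_pixel(samples, pct=15):
    """Composite one pixel from (value, is_cloud) observations through time."""
    clear = [value for value, is_cloud in samples if not is_cloud]
    if clear:
        # Cloud-masked composite: only observations not flagged as cloud.
        return percentile(clear, pct)
    # Every observation was flagged as cloud (for example a bright cay),
    # so fall back to the composite built without cloud masking.
    return percentile([value for value, _ in samples], pct)
```

A low percentile favours the darker observations of each pixel. The fallback branch is what keeps a persistently "cloudy" pixel, such as a pale beach, from becoming a hole in the output.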
The cloud masking predictor would often misinterpret very pale areas, such as cays and beaches, as clouds. To overcome this limitation a dual-image compositing method was used. A primary composite was generated with cloud masks applied, and a secondary composite with no cloud masking was layered beneath to fill in potential gaps (or “holes”) caused by the cloud masking mistakes.

Image noise level calculation:

The noise level for each image in this dataset is calculated to ensure high-quality composites by minimizing the inclusion of noisy images. This process begins by creating a water mask using the Normalized Difference Water Index (NDWI) derived from the NIR and Green bands. High reflectance areas in the NIR and SWIR bands, indicative of sun glint, are identified and masked by the water mask to focus on water areas affected by sun glint. The proportion of high sun glint pixels within these water areas is calculated and amplified to compute a noise index. If no water pixels are detected, a high noise index value is assigned.

In any set of satellite images, some will be taken under favourable conditions (low wind, low sun glint, and minimal cloud cover), while others will be affected by high sun glint or cloud. Combining multiple images into a composite reduces noise by averaging out these fluctuations. When all images have the same noise level, increasing the number of images in the composite reduces the overall noise. However, in practice, there is a mix of high and low noise images. The optimal composite is created by including as many low-noise images as possible while excluding high-noise ones. The challenge lies in determining the acceptable noise threshold for a given scene, as some areas are more affected by cloud and sun glint than others. To address this, we rank the available Sentinel 2 images for each scene by their noise index, from lowest to highest.
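A minimal sketch of the noise index and the ranking step is given below, assuming each image is reduced to a list of (green, nir, swir) reflectances. The glint threshold, the amplification factor, the value returned when no water is found, and the rule that either NIR or SWIR exceeding the threshold counts as glint are all hypothetical choices for illustration. The project's noise_predictor.py holds the actual implementation.

```python
def noise_index(pixels, glint_threshold=0.05, gain=100.0, no_water_value=100.0):
    """Estimate a noise index from (green, nir, swir) reflectances in 0..1."""
    water = 0
    glint = 0
    for green, nir, swir in pixels:
        total = green + nir
        # NDWI from the Green and NIR bands; positive values are treated as water.
        ndwi = (green - nir) / total if total else 0.0
        if ndwi <= 0:
            continue
        water += 1
        # Assumed rule: bright NIR or SWIR over water indicates sun glint.
        if max(nir, swir) > glint_threshold:
            glint += 1
    if water == 0:
        # No water detected: assign a high noise index.
        return no_water_value
    # Amplified proportion of glint-affected water pixels.
    return gain * glint / water


def rank_by_noise(images):
    """Return image ids ordered from lowest to highest noise index."""
    return sorted(images, key=lambda image_id: noise_index(images[image_id]))
```

Given a dictionary of image ids mapped to pixel lists, rank_by_noise returns the ids with the calmest, least glint-affected images first.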
The goal is to determine the ideal number of images (N) to include in the composite to minimize overall noise. For each N, we use the lowest noise images and estimate the final composite noise based on the noise index. This is repeated for all values of N up to a maximum of 200 images, and we select the N that results in the lowest noise.

This approach has some limitations. It estimates noise based on sun glint and residual clouds (after cloud masking) using NIR bands, without accounting for image turbidity. The final composite noise is not directly measured as this would be computationally expensive. It is instead estimated by dividing the average noise of the selected images by the square root of the number of images. We found this method tends to underestimate the ideal image count, so we adjusted the noise estimates, scaling them by the inverse of their ranking, to favor larger sets of images. The algorithm is not fully optimized, and further refinement is needed to improve accuracy. Full details of the algorithm can be found in https://github.com/eatlas/AU_NESP-MaC-3-17_AIMS_S2-comp/blob/main/src/utilities/noise_predictor.py

Sun glint removal and atmospheric correction:

Sun glint was removed from the images using the infrared B8 band to estimate the reflection off the water from the sun glint. B8 penetrates water less than 0.5 m and so in water areas it only detects reflections off the surface of the water. The sun glint detected by B8 correlates very highly with the sun glint experienced by the visible channels (B2, B3 and B4) and so the sun glint in these channels can be removed by subtracting B8 from these channels.

Eric Lawrey developed this algorithm by fine tuning the value of the scaling between the B8 channel and each individual visible channel (B2, B3 and B4) so that the maximum level of sun glint would be removed.
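The subtraction described above can be sketched for a single pixel as follows. The per-band scale factors are placeholders, not the tuned values used for this dataset, and clamping negative results to zero is an assumption added for the example.

```python
def remove_sun_glint(b2, b3, b4, b8, scales=(0.85, 0.9, 0.95)):
    """Subtract a scaled B8 glint estimate from each visible band.

    scales holds hypothetical factors for (B2, B3, B4); the values tuned
    for the dataset are in the project source code.
    """
    corrected = []
    for band, scale in zip((b2, b3, b4), scales):
        # B8 over water is treated as pure surface reflection (glint).
        # Clamp at zero so over-correction never yields negative reflectance.
        corrected.append(max(band - scale * b8, 0.0))
    return tuple(corrected)
```

Because B8 sees almost nothing below the water surface, a pixel with no glint has a B8 value near zero and passes through the correction largely unchanged.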
The tuning was based on a representative set of images, aiming for a set of values that represents a good compromise across different water surface conditions. This algorithm is an adjustment of the algorithm already used in Lawrey et al. 2022.

Cloud Masking:

Each image was processed to mask out clouds and their shadows before creating the composite image. The cloud masking uses the COPERNICUS/S2_CLOUD_PROBABILITY dataset developed by SentinelHub (Google, n.d.; Zupanc, 2017). The mask includes the cloud areas, plus a mask to remove cloud shadows. The cloud shadows were estimated by projecting the cloud mask in the direction opposite the angle to the sun. The shadow distance was estimated in two parts.

A low cloud mask was created based on the assumption that small clouds have a small shadow distance. These were detected using a 35% cloud probability threshold. These were projected over 400 m, followed by a 150 m buffer to expand the final mask.

A high cloud mask was created to cover longer shadows created by taller, larger clouds. These clouds were detected based on an 80% cloud probability threshold, followed by an erosion and dilation of 300 m to remove small clouds. These were then projected over a 1.5 km distance followed by a 300 m buffer.

The parameters for the cloud masking (probability threshold, projection distance and buffer radius) were determined through trial and error on a small number of scenes. As such, there are probably significant potential improvements that could be made to this algorithm.

Erosion, dilation and buffer operations were performed at a lower image resolution than the native satellite image resolution to improve the computational speed. The resolution was adjusted so that these operations were performed at approximately a 4-pixel resolution. This made the cloud mask significantly more spatially coarse than the 10 m Sentinel imagery.
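The geometry of the shadow projection can be sketched as below. It tests whether a point falls inside the mask created by sweeping each cloud pixel away from the sun and then buffering the swept path. The thresholds and distances are the ones quoted above. The point-based representation, the function names and the use of a "greater than or equal" probability test are assumptions for illustration. The 300 m erosion and dilation of the high cloud mask is not modelled.

```python
import math

LOW_CLOUD = {"probability": 35, "projection_m": 400.0, "buffer_m": 150.0}
HIGH_CLOUD = {"probability": 80, "projection_m": 1500.0, "buffer_m": 300.0}


def shadow_offset(sun_azimuth_deg, projection_m):
    """(east, north) displacement in metres, pointing away from the sun."""
    away = math.radians(sun_azimuth_deg + 180.0)
    return projection_m * math.sin(away), projection_m * math.cos(away)


def distance_to_segment(point, start, end):
    """Shortest distance from point to the line segment start-end."""
    px, py = point
    sx, sy = start
    dx, dy = end[0] - sx, end[1] - sy
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - sx, py - sy)
    t = ((px - sx) * dx + (py - sy) * dy) / length_sq
    t = min(max(t, 0.0), 1.0)
    return math.hypot(px - (sx + t * dx), py - (sy + t * dy))


def is_masked(point, cloud_pixels, sun_azimuth_deg, settings):
    """cloud_pixels is a list of (east_m, north_m, cloud_probability_percent)."""
    dx, dy = shadow_offset(sun_azimuth_deg, settings["projection_m"])
    for east, north, probability in cloud_pixels:
        if probability < settings["probability"]:
            continue
        # The cloud and its shadow path form a segment; the buffer widens it.
        shadow_end = (east + dx, north + dy)
        if distance_to_segment(point, (east, north), shadow_end) <= settings["buffer_m"]:
            return True
    return False
```

A pixel would be excluded from the composite if is_masked returns True for either the LOW_CLOUD or the HIGH_CLOUD settings.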
The 4-pixel filter resolution was chosen as a trade-off between the coarseness of the mask and the processing time for these operations. With 4-pixel filter resolutions these operations were still using over 90% of the total processing, resulting in each image taking approximately 10 min to compute on the Google Earth Engine (Lawrey et al. 2022).

Format:

GeoTiff - LZW compressed, 8-bit channels, 0 as NoData, imagery as values 1 - 255. Internal tiling and overviews. Average size: 12500 x 11300 pixels and 300 MB per image.

The images in this dataset are all named using a naming convention. An example file name is AU_AIMS_MARB-S2-comp_p15_TrueColour_51KTV_v2_2015-2024.tif. The name is made up from:
- Dataset name (AU_AIMS_MARB-S2-comp)
- An algorithm descriptor (p15 for 15th percentile)
- Colour and contrast enhancement applied (TrueColour)
- Sentinel 2 tile (example: 54LZP)
- Version (v2)
- Date range (2015 to 2024 for version 2)

References:

Google (n.d.) Sentinel-2: Cloud Probability. Earth Engine Data Catalog. Accessed 10 April 2021 from https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_CLOUD_PROBABILITY

Zupanc, A. (2017) Improving Cloud Detection with Machine Learning. Medium. Accessed 10 April 2021 from https://medium.com/sentinel-hub/improving-cloud-detection-with-machine-learning-c09dc5d7cf13

Lawrey, E., & Hammerton, M. (2022). Coral Sea features satellite imagery and raw depth contours (Sentinel 2 and Landsat 8) 2015 – 2021 (AIMS) [Data set]. eAtlas. https://doi.org/10.26274/NH77-ZW79

Data Location:

This dataset is filed in the eAtlas enduring data repository at: data\custodian\2023-2026-NESP-MaC-3\3.17_Northern-Aus-reef-mapping

The source code is available on GitHub.

Change log:

This dataset will be progressively improved and made available for download. These additions will be noted in this change log.

2025-02-04 - Provided additional details in the citation section.
2024-10-19 - Additional details were added to the metadata record.
2024-07-22 - Version 2 composites using an improved contrast enhancement and a noise prediction algorithm to only include low noise images in composite (Git tag: "composites_v2")
2024-03-07 - Initial release draft composites using 15th percentile (Git tag: "composites_v1")
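For anyone handling many of the downloaded files, the naming convention given under Format can be split into its parts with a regular expression. The sketch below is one possible way to do this. The pattern and the function name are assumptions based only on the single example file name in this record, so other files may need a looser pattern.

```python
import re

# Pattern inferred from the example file name in this record (an assumption).
FILE_NAME_PATTERN = re.compile(
    r"^(?P<dataset>AU_AIMS_MARB-S2-comp)"
    r"_(?P<algorithm>p\d+)"
    r"_(?P<style>[A-Za-z]+)"
    r"_(?P<tile>\d{2}[A-Z]{3})"
    r"_(?P<version>v\d+)"
    r"_(?P<start_year>\d{4})-(?P<end_year>\d{4})\.tif$"
)


def parse_file_name(file_name):
    """Return the naming convention parts as a dict, or None if no match."""
    match = FILE_NAME_PATTERN.match(file_name)
    return match.groupdict() if match else None
```

For example, parse_file_name("AU_AIMS_MARB-S2-comp_p15_TrueColour_51KTV_v2_2015-2024.tif") gives the tile as "51KTV" and the version as "v2".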

Additional Info

Field: Value
Title: North Australia Sentinel 2 Satellite Composite Imagery - 15th percentile true colour (NESP MaC 3.17, AIMS)
Language: eng
Licence: notspecified
Landing Page: https://devweb.dga.links.com.au/data/dataset/99566baf-1268-4fa3-9983-43238b8244bf
Contact Point: CSIRO Oceans & Atmosphere, e-atlas@aims.gov.au
Reference Period: 27/06/2015 - 31/05/2024
Data Portal: data.gov.au