File Structure at Three Processing Levels for the Ocean Color Instrument (OCI)

Authors: Anna Windle (NASA, SSAI), Ian Carroll (NASA, UMBC), Carina Poulin (NASA, SSAI)

PREREQUISITES

This notebook has the following prerequisites: OCI Data Access

Summary

In this example we will use the earthaccess package to access OCI Level-1B, Level-2, and Level-3 NetCDF files and open them using xarray.

NetCDF (Network Common Data Form) is a binary file format for storing multidimensional scientific data (variables). It is optimized for array-oriented data access and represents scientific data in a machine-independent way. Files ending in .nc are NetCDF files.

Xarray is a package that supports the use of labeled, multi-dimensional arrays in Python. It is widely used to handle Earth observation data, which often involve multiple dimensions: for instance, longitude, latitude, time, and channels/bands.
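
As a minimal sketch of what "multiple dimensions" means in practice, the snippet below builds a tiny in-memory dataset and selects one value by its coordinate labels. The variable name "reflectance" and all values are made up for illustration and are not taken from any OCI file.

```python
import numpy as np
import xarray as xr

# Synthetic values only: 2 times x 3 latitudes x 4 longitudes.
data = np.arange(2 * 3 * 4, dtype="float32").reshape(2, 3, 4)
toy = xr.Dataset(
    {"reflectance": (("time", "lat", "lon"), data)},
    coords={
        "time": np.array(["2024-05-01", "2024-05-02"], dtype="datetime64[ns]"),
        "lat": [10.0, 0.0, -10.0],
        "lon": [0.0, 10.0, 20.0, 30.0],
    },
)

# Select by coordinate label (lat, lon) and by integer position (time).
point = toy["reflectance"].sel(lat=0.0, lon=20.0).isel(time=1)
print(dict(toy.sizes))
print(float(point))
```

Selecting with sel uses the coordinate values, while isel uses integer positions along a dimension.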

Learning Objectives

At the end of this notebook you will know:

How to find groups in a NetCDF file
How to use xarray to open OCI data
What key variables are present in the groups within OCI L1B, L2, and L3 files

1. Setup

We begin by importing all of the packages used in this notebook. If you have created an environment following the guidance provided with this tutorial, then the imports will be successful.

import cartopy.crs as ccrs
import earthaccess
import h5netcdf
import matplotlib.pyplot as plt
import numpy as np
import xarray as xr
import pandas as pd

Set (and persist to your user profile on the host, if needed) your Earthdata Login credentials.

auth = earthaccess.login(persist=True)

4. Inspecting OCI L3 File Structure

At Level-3 there are binned (B) and mapped (M) products available for OCI. The L3M remote sensing reflectance (Rrs) files contain global maps of Rrs. We’ll use the same earthaccess method to find the data.

tspan = ("2024-05-01", "2024-05-16")
bbox = (-76.75, 36.97, -75.74, 39.01)

results = earthaccess.search_data(
    short_name="PACE_OCI_L3M_RRS_NRT",
    temporal=tspan,
    bounding_box=bbox,
)

Granules found: 120

paths = earthaccess.open(results)

Opening 120 granules, approx size: 36.62 GB
using endpoint: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials

OCI L3 data do not have any groups, so we can open the dataset without the group argument. Let’s take a look at the first file.

dataset = xr.open_dataset(paths[0])
dataset

Notice that OCI L3M data have lat and lon coordinates, so it’s easy to slice out a bounding box and map the “Rrs_442” variable.

fig = plt.figure()
ax = plt.axes(projection=ccrs.PlateCarree())
ax.coastlines()
ax.gridlines(draw_labels={"left": "y", "bottom": "x"})

rrs_442 = dataset["Rrs_442"].sel({"lat": slice(-25, -45), "lon": slice(10, 30)})
plot = rrs_442.plot(cmap="viridis", vmin=0, ax=ax)

../../_images/7b9601d6d56b648853dd3600dcb5eced27f5fd1906033ccb5a402e1102329be0.png
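
The latitude slice in the cell above runs from -25 to -45, presumably because the lat coordinate in these files is stored from north to south. The sketch below reproduces that behavior on a synthetic grid (the 10 degree spacing and zero values are invented): with a descending coordinate, a label-based slice has to follow the same order, and the reversed slice selects nothing.

```python
import numpy as np
import xarray as xr

# Synthetic grid: latitude descends (north to south), longitude ascends.
lat = np.arange(85.0, -90.0, -10.0)  # 85, 75, ..., -85
lon = np.arange(-175.0, 180.0, 10.0)  # -175, -165, ..., 175
toy = xr.DataArray(
    np.zeros((lat.size, lon.size)),
    coords={"lat": lat, "lon": lon},
    dims=("lat", "lon"),
)

# Slice bounds follow the stored order of each coordinate.
north_to_south = toy.sel(lat=slice(-25, -45), lon=slice(10, 30))
# Reversing the latitude bounds yields an empty selection.
south_to_north = toy.sel(lat=slice(-45, -25), lon=slice(10, 30))

print(dict(north_to_south.sizes))
print(dict(south_to_north.sizes))
```

If a selection unexpectedly comes back empty, checking the order of the coordinate values is a reasonable first step.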

Also, because the L3M variables have lat and lon coordinates, it’s possible to stack multiple granules along a new dimension that corresponds to time. Instead of xr.open_dataset, we use xr.open_mfdataset to create a single xarray.Dataset (the “mf” in open_mfdataset stands for multiple files) from an array of paths.

We also use a new search filter available in earthaccess.search_data: the granule_name argument accepts strings with the “*” wildcard. We need this to distinguish daily (“DAY”) from eight-day (“8D”) composites, as well as to get the 0.1 degree resolution projections.

tspan = ("2024-05-01", "2024-05-08")

results = earthaccess.search_data(
    short_name="PACE_OCI_L3M_CHL_NRT",
    temporal=tspan,
    granule_name="*.DAY.*.0p1deg.*",
)

Granules found: 8
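
To get a feel for what the pattern "*.DAY.*.0p1deg.*" keeps and rejects, it can be tried locally on a few names. This sketch uses Python's fnmatch module as a stand-in; the actual matching is done by the search service and may differ in details, and the file names below are hypothetical, not real granule names.

```python
from fnmatch import fnmatchcase

pattern = "*.DAY.*.0p1deg.*"

# Hypothetical names, invented for this illustration.
names = [
    "EXAMPLE.20240501.L3m.DAY.CHL.chlor_a.0p1deg.nc",
    "EXAMPLE.20240501.L3m.DAY.CHL.chlor_a.4km.nc",
    "EXAMPLE.20240501_20240508.L3m.8D.CHL.chlor_a.0p1deg.nc",
]

# Keep only names that contain both ".DAY." and ".0p1deg." in that order.
matches = [name for name in names if fnmatchcase(name, pattern)]
print(matches)
```

Only the first name passes: the second has a different resolution string and the third is an eight-day composite.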

paths = earthaccess.open(results)

Opening 8 granules, approx size: 0.03 GB
using endpoint: https://obdaac-tea.earthdatacloud.nasa.gov/s3credentials

The paths list is sorted temporally by default, so its order already matches the order in which the files should be stacked. We specify combine="nested" to combine the files according to the shape of the array of files (or file-like objects); paths is a flat list rather than a “nested” one, so the files are combined along a single dimension. The concat_dim="date" argument generates a new dimension in the combined dataset, because “date” is not an existing dimension in the individual files.

dataset = xr.open_mfdataset(
    paths,
    combine="nested",
    concat_dim="date",
)
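
The effect of concatenating along a dimension that does not yet exist can be sketched without any files. The example below uses xr.concat on small in-memory datasets as a stand-in for what open_mfdataset does with combine="nested"; the helper make_day and all values are hypothetical.

```python
import numpy as np
import xarray as xr


def make_day(value):
    """Build a tiny synthetic 'daily' dataset filled with one value."""
    return xr.Dataset(
        {"chlor_a": (("lat", "lon"), np.full((2, 3), value))},
        coords={"lat": [1.0, 0.0], "lon": [0.0, 1.0, 2.0]},
    )


# Three synthetic days, already in the order they should be stacked.
days = [make_day(value) for value in (0.1, 0.2, 0.3)]

# "date" is not a dimension of the inputs, so it is created here.
stacked = xr.concat(days, dim="date")

print(stacked["chlor_a"].dims)
print("date" in stacked.coords)
```

The new dimension comes first and has no coordinate values of its own, which is why labels for it need to be assigned in a separate step.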

Add a date coordinate to the new dimension using the time_coverage_end attribute of each NetCDF file.

dates = [xr.open_dataset(a).attrs["time_coverage_end"] for a in paths]
dt = pd.to_datetime(dates)
dataset = dataset.assign_coords(date=dt.values)
dataset

<xarray.Dataset> Size: 207MB
Dimensions:  (date: 8, lat: 1800, lon: 3600, rgb: 3, eightbitcolor: 256)
Coordinates:
  * lat      (lat) float32 7kB 89.95 89.85 89.75 89.65 ... -89.75 -89.85 -89.95
  * lon      (lon) float32 14kB -179.9 -179.9 -179.8 ... 179.8 179.9 180.0
  * date     (date) datetime64[ns] 64B 2024-05-02T02:28:09 ... 2024-05-09T01:...
Dimensions without coordinates: rgb, eightbitcolor
Data variables:
    chlor_a  (date, lat, lon) float32 207MB dask.array<chunksize=(1, 512, 1024), meta=np.ndarray>
    palette  (date, rgb, eightbitcolor) uint8 6kB dask.array<chunksize=(1, 3, 256), meta=np.ndarray>
Attributes: (12/64)
    product_name:                      PACE_OCI.20240501.L3m.DAY.CHL.V1_0_0.c...
    instrument:                        OCI
    title:                             OCI Level-3 Standard Mapped Image
    project:                           Ocean Biology Processing Group (NASA/G...
    platform:                          PACE
    source:                            satellite observations from OCI-PACE
    ...                                ...
    identifier_product_doi:            10.5067/PACE/OCI/L3M/CHL/v1
    keywords:                          Earth Science > Oceans > Ocean Chemist...
    keywords_vocabulary:               NASA Global Change Master Directory (G...
    data_bins:                         1029726
    data_minimum:                      0.0041330624
    data_maximum:                      95.82805
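
The conversion from attribute strings to a date coordinate can be sketched on its own. The timestamps below are invented and assume an ISO 8601 format with a trailing "Z"; the real attribute values may be formatted differently. Under that assumption, pd.to_datetime returns timezone-aware values, and taking .values yields plain NumPy datetimes expressed in UTC.

```python
import numpy as np
import pandas as pd

# Invented timestamps in an assumed ISO 8601 format with a UTC marker.
stamps = ["2024-05-02T02:28:09Z", "2024-05-03T02:30:00Z"]

parsed = pd.to_datetime(stamps)  # timezone-aware DatetimeIndex
naive = parsed.values  # NumPy datetime64 array, UTC without tz info

print(parsed.tz)
print(naive.dtype)
```

An array like naive is the kind of value that can be passed to assign_coords to label a dimension.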

A common reason to generate a single dataset from multiple daily images is to create a composite. Compare the map from a single day …

chla = np.log10(dataset["chlor_a"])
chla.attrs.update(
    {
        "units": f'lg({dataset["chlor_a"].attrs["units"]})',
    }
)
plot = chla.sel(date="2024-05-02").plot(aspect=2, size=4, cmap="GnBu_r")

../../_images/0caf7161928cb21998b5e8523d5a1c783321c43bd9477eec1c0a972b57c48567.png

… to a map of average values, skipping “NaN” values that result from clouds and the OCI’s tilt maneuver.

chla_avg = chla.mean("date")
chla_avg.attrs.update(
    {
        "long_name": chla.attrs["long_name"],
        "units": chla.attrs["units"],
    }
)
plot = chla_avg.plot(aspect=2, size=4, cmap="GnBu_r")

../../_images/94e8bb76c013730cd433c52c94c66fec90c2f246d73af5db98324a57d4a0eb00.png
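
How the average treats missing pixels can be checked on a toy stack. In this sketch (synthetic values, one row of two pixels over three dates), the mean along "date" ignores NaN values where at least one date has data, and stays NaN where every date is missing.

```python
import numpy as np
import xarray as xr

# Three dates, one latitude, two longitudes; NaN marks missing pixels.
stack = xr.DataArray(
    np.array(
        [
            [[1.0, np.nan]],
            [[3.0, np.nan]],
            [[np.nan, np.nan]],
        ]
    ),
    dims=("date", "lat", "lon"),
)

# NaN values are skipped by default for floating point data.
composite = stack.mean("date")
print(composite.values)
```

The first pixel averages the two valid dates, while the second pixel has no valid data on any date and remains missing in the composite.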

We can also create a time series of mean values over all latitudes and longitudes.

chla_avg = chla.mean(dim=["lon", "lat"], keep_attrs=True)
plot = chla_avg.plot(linestyle="-", marker="o", color="b")

../../_images/cc1cb4e6167377a50872b8ebb2922dde60ae90ed9dabdd258853a4e326b685b3.png
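
The same reduction can be sketched on a small synthetic cube to show what is left afterwards: averaging over "lon" and "lat" leaves one value per date, and keep_attrs=True carries attributes such as the units through the reduction. The units string below is a placeholder, not read from a file.

```python
import numpy as np
import xarray as xr

# Synthetic cube: 2 dates x 2 latitudes x 3 longitudes.
cube = xr.DataArray(
    np.arange(2 * 2 * 3, dtype="float64").reshape(2, 2, 3),
    dims=("date", "lat", "lon"),
    attrs={"units": "placeholder units"},
)

# Collapse both spatial dimensions, keeping the attributes.
series = cube.mean(dim=["lon", "lat"], keep_attrs=True)

print(series.dims)
print(series.values)
print(series.attrs)
```

Note that this is an unweighted mean over grid cells; on a latitude-longitude grid, cells near the poles cover less area than cells near the equator.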

You have completed the notebook on OCI file structure.