06 - Spatial Regridding and masking data cubes¶
A DeepESDL example notebook¶
This notebook demonstrates how to access two different data sets, transform one of them so that both share the same spatial grid, and mask one dataset by applying a condition from the other dataset.
Please, also refer to the DeepESDL documentation and visit the platform's website for further information!
Brockmann Consult, 2023
This notebook runs with the python environment deepesdl-xcube-1.1.2
, please checkout the documentation for help on changing the environment.
# mandatory imports
from xcube.core.gridmapping import GridMapping
from xcube.core.resampling import resample_in_space
from xcube.core.store import new_data_store
from IPython.display import JSON
import numpy as np
import xarray as xr
import datetime
Open CCI store
cci_store = new_data_store('cciodp')
Open s3 store and list all datasets
root = "deep-esdl-public"
s3_store = new_data_store("s3", root=root)
list(s3_store.get_data_ids())
['LC-1x2160x2160-1.0.0.levels', 'SMOS-freezethaw-1x720x720-1.0.1.zarr', 'SMOS-freezethaw-4267x10x10-1.0.1.zarr', 'black-sea-1x1024x1024.levels', 'black-sea-256x128x128.zarr', 'esa-cci-permafrost-1x1151x1641-0.0.2.levels', 'esdc-8d-0.25deg-1x720x1440-3.0.1.zarr', 'esdc-8d-0.25deg-256x128x128-3.0.1.zarr', 'ocean-1M-9km-1x1080x1080-1.4.0.levels', 'ocean-1M-9km-64x256x256-1.4.0.zarr', 'polar-100m-1x2048x2048-1.0.1.zarr']
Open Land Cover (LC) dataset from s3 bucket, which is saved as a multilevel dataset:
ml_LC = s3_store.open_data('LC-1x2160x2160-1.0.0.levels', decode_cf = True)
Lets open level 0, which is the base level and therefor has the highest resolution:
LC = ml_LC.get_dataset(0)
LC
<xarray.Dataset> Dimensions: (time: 11, lat: 64800, lon: 129600, bounds: 2) Coordinates: * lat (lat) float64 90.0 90.0 89.99 ... -89.99 -90.0 -90.0 * lon (lon) float64 -180.0 -180.0 -180.0 ... 180.0 180.0 * time (time) datetime64[ns] 2010-01-01 ... 2020-01-01 Dimensions without coordinates: bounds Data variables: change_count (time, lat, lon) uint8 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> crs (time) int32 dask.array<chunksize=(11,), meta=np.ndarray> current_pixel_state (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> lat_bounds (time, lat, bounds) float64 dask.array<chunksize=(1, 2160, 2), meta=np.ndarray> lccs_class (time, lat, lon) uint8 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> lon_bounds (time, lon, bounds) float64 dask.array<chunksize=(1, 2160, 2), meta=np.ndarray> observation_count (time, lat, lon) uint16 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> processed_flag (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> time_bounds (time, bounds) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray> Attributes: (12/38) Conventions: CF-1.6 TileSize: 2025:2025 cdm_data_type: grid comment: contact: https://www.ecmwf.int/en/about/contact-us/get... creation_date: 20181130T095431Z ... ... time_coverage_end: 20101231 time_coverage_resolution: P1Y time_coverage_start: 20100101 title: Land Cover Map of ESA CCI brokered by CDS tracking_id: 96ac9aca-1ca7-45c6-b4a5-ab448c692646 type: ESACCI-LC-L4-LCCS-Map-300m-P1Y
Now let's search for fire datasets provided via the xcube cci store:
iterator = cci_store.search_data(cci_attrs=dict(ecv='FIRE'))
JSON([item.to_dict() for item in iterator])
/home/conda/deepesdl/4f055d4cb9f5a7bb59c11d4f960211d16d612f0f49fe5f6f4a110ccac09352f7-20230630-130445-100057-279-xcube-1.1.2/lib/python3.11/site-packages/xcube_cci/cciodp.py:1260: CciOdpWarning: Variable "vegetation_class_name" has no fill value, cannot set one. For parts where no data is available you will see random values. This is usually the case when data is missing for a time step. warnings.warn(f'Variable "{fixed_key}" has no fill value, '
<IPython.core.display.JSON object>
Open Fire dataset from xcube cci store:
fire = cci_store.open_data('esacci.FIRE.mon.L4.BA.MODIS.Terra.MODIS_TERRA.v5-1.grid')
fire
/home/conda/deepesdl/4f055d4cb9f5a7bb59c11d4f960211d16d612f0f49fe5f6f4a110ccac09352f7-20230630-130445-100057-279-xcube-1.1.2/lib/python3.11/site-packages/xcube_cci/chunkstore.py:299: UserWarning: Variable 'vegetation_class_name' is encoded as string. Will omit it from the dataset. warnings.warn(f"Variable '{variable_name}' is encoded as "
<xarray.Dataset> Dimensions: (time: 240, lat: 720, lon: 1440, vegetation_class: 18, nv: 2, bnds: 2) Coordinates: * lat (lat) float32 89.88 89.62 ... -89.62 -89.88 lat_bnds (lat, nv) float32 dask.array<chunksize=(720, 2), meta=np.ndarray> * lon (lon) float32 -179.9 -179.6 ... 179.6 179.9 lon_bnds (lon, nv) float32 dask.array<chunksize=(1440, 2), meta=np.ndarray> * time (time) datetime64[ns] 2001-01-16T12:00:0... time_bnds (time, bnds) datetime64[ns] dask.array<chunksize=(240, 2), meta=np.ndarray> * vegetation_class (vegetation_class) int32 10 20 ... 170 180 Dimensions without coordinates: nv, bnds Data variables: burned_area (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray> burned_area_in_vegetation_class (time, vegetation_class, lat, lon) float32 dask.array<chunksize=(1, 18, 360, 360), meta=np.ndarray> fraction_of_burnable_area (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray> fraction_of_observed_area (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray> number_of_patches (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray> standard_error (time, lat, lon) float32 dask.array<chunksize=(1, 720, 1440), meta=np.ndarray> Attributes: Conventions: CF-1.7 title: esacci.FIRE.mon.L4.BA.MODIS.Terra.MODIS_TERRA.v5... date_created: 2023-08-30T14:33:39.046347 processing_level: L4 time_coverage_start: 2001-01-01T00:00:00 time_coverage_end: 2021-01-01T00:00:00 time_coverage_duration: P7305DT0H0M0S history: [{'program': 'xcube_cci.chunkstore.CciChunkStore...
Subsetting time and space for the sake of an efficient example
Temporal subset, entire 2020:
start_date = datetime.datetime(2020,1,1)
stop_date = datetime.datetime(2020,12,31)
Spatial subset, southern/central Europe:
min_lat = 52.5
max_lat = 36.9
min_lon = -10.
max_lon = 22.9
Subsetting the Land Cover dataset, results in only one time slice per year for LC:
LC = LC.sel(lat=slice(min_lat, max_lat),
lon=slice(min_lon, max_lon),
time=slice(start_date, stop_date))
LC.lccs_class.isel(time=0).plot.imshow()
<matplotlib.image.AxesImage at 0x7f97b5d1dc50>
LC
<xarray.Dataset> Dimensions: (time: 1, lat: 5616, lon: 11844, bounds: 2) Coordinates: * lat (lat) float64 52.5 52.5 52.49 52.49 ... 36.91 36.9 36.9 * lon (lon) float64 -9.999 -9.996 -9.993 ... 22.89 22.9 22.9 * time (time) datetime64[ns] 2020-01-01 Dimensions without coordinates: bounds Data variables: change_count (time, lat, lon) uint8 dask.array<chunksize=(1, 1620, 1440), meta=np.ndarray> crs (time) int32 dask.array<chunksize=(1,), meta=np.ndarray> current_pixel_state (time, lat, lon) float32 dask.array<chunksize=(1, 1620, 1440), meta=np.ndarray> lat_bounds (time, lat, bounds) float64 dask.array<chunksize=(1, 1620, 2), meta=np.ndarray> lccs_class (time, lat, lon) uint8 dask.array<chunksize=(1, 1620, 1440), meta=np.ndarray> lon_bounds (time, lon, bounds) float64 dask.array<chunksize=(1, 1440, 2), meta=np.ndarray> observation_count (time, lat, lon) uint16 dask.array<chunksize=(1, 1620, 1440), meta=np.ndarray> processed_flag (time, lat, lon) float32 dask.array<chunksize=(1, 1620, 1440), meta=np.ndarray> time_bounds (time, bounds) datetime64[ns] dask.array<chunksize=(1, 2), meta=np.ndarray> Attributes: (12/38) Conventions: CF-1.6 TileSize: 2025:2025 cdm_data_type: grid comment: contact: https://www.ecmwf.int/en/about/contact-us/get... creation_date: 20181130T095431Z ... ... time_coverage_end: 20101231 time_coverage_resolution: P1Y time_coverage_start: 20100101 title: Land Cover Map of ESA CCI brokered by CDS tracking_id: 96ac9aca-1ca7-45c6-b4a5-ab448c692646 type: ESACCI-LC-L4-LCCS-Map-300m-P1Y
Subsetting the Fire dataset:
fire = fire.sel(lat=slice(min_lat, max_lat),
lon=slice(min_lon, max_lon),
time=slice(start_date, stop_date))
fire.burned_area.isel(time=0).plot.imshow(vmin=-1)
<matplotlib.image.AxesImage at 0x7f38542adb90>
Resample to the same Grid
Here we use xcube's GridMapping method to extract the specification of both grids. Fire is the grid to be transformed.
source_gm = GridMapping.from_dataset(fire)
source_gm
class: RegularGridMapping
- is_regular: True
- is_j_axis_up: False
- is_lon_360: False
- crs: EPSG:4326
- xy_res: (0.25, 0.25)
- xy_bbox: (-10, 37, 23, 52.5)
- ij_bbox: (0, 0, 132, 62)
- xy_dim_names: ('lon', 'lat')
- xy_var_names: ('lon', 'lat')
- size: (132, 62)
- tile_size: (132, 62)
The target grid mapping is the one from the Land Cover dataset:
target_gm = GridMapping.from_dataset(LC)
target_gm
class: RegularGridMapping
- is_regular: True
- is_j_axis_up: False
- is_lon_360: False
- crs: EPSG:4326
- xy_res: (0.002777775, 0.002777775)
- xy_bbox: (-10, 36.90000000138889, 22.899999998611108, 52.4999999986111)
- ij_bbox: (0, 0, 11844, 5616)
- xy_dim_names: ('lon', 'lat')
- xy_var_names: ('lon', 'lat')
- size: (11844, 5616)
- tile_size: (2160, 2160)
Now we resample Fire to the grid provided by Land Cover:
resampled_fire = resample_in_space(fire,
source_gm=source_gm,
target_gm=target_gm)
Lets compare the different grid mappings, to see whether the resampeled fire datasets has the desired grid mapping now:
Our source gridmapping, the original fire grid mapping:
source_gm.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
The grid mapping of Land Cover which was the target grid mapping:
target_gm.crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
The grid mapping of our resampeled Fire, which is as expeced the same as the target grid mapping.
GridMapping.from_dataset(resampled_fire).crs
<Geographic 2D CRS: EPSG:4326> Name: WGS 84 Axis Info [ellipsoidal]: - Lat[north]: Geodetic latitude (degree) - Lon[east]: Geodetic longitude (degree) Area of Use: - name: World. - bounds: (-180.0, -90.0, 180.0, 90.0) Datum: World Geodetic System 1984 ensemble - Ellipsoid: WGS 84 - Prime Meridian: Greenwich
Create a mask from LC classes
Now that we have our two datasets with the same gridmapping we can use the land cover classes to mask the Fire dataset.
LC.lccs_class.attrs['flag_values'] = np.array(LC.lccs_class.attrs['flag_values'])
Extracting only categories, which concern trees:
tree_categories = [[LC.lccs_class.attrs['flag_values'][i],j] for i,j in enumerate(LC.lccs_class.attrs["flag_meanings"].split(" ")) if "tree" in j]
tree_categories
[[12, 'cropland_rainfed_tree_or_shrub_cover'], [50, 'tree_broadleaved_evergreen_closed_to_open'], [60, 'tree_broadleaved_deciduous_closed_to_open'], [61, 'tree_broadleaved_deciduous_closed'], [62, 'tree_broadleaved_deciduous_open'], [70, 'tree_needleleaved_evergreen_closed_to_open'], [71, 'tree_needleleaved_evergreen_closed'], [72, 'tree_needleleaved_evergreen_open'], [80, 'tree_needleleaved_deciduous_closed_to_open'], [81, 'tree_needleleaved_deciduous_closed'], [82, 'tree_needleleaved_deciduous_open'], [90, 'tree_mixed'], [100, 'mosaic_tree_and_shrub'], [151, 'sparse_tree'], [160, 'tree_cover_flooded_fresh_or_brakish_water'], [170, 'tree_cover_flooded_saline_water']]
LC_tree_mask = LC.lccs_class.isin([e[0] for e in tree_categories]).compute()
Although the coordiantes of the LC dataset and the resampled_fire dataset are almost identical, tiny numerical differences would prevent the masking from working. Thus, the coordinates from the resampled_fire dataset are assigned as the coordinates to the LC_tree_mask
LC_tree_mask = LC_tree_mask.assign_coords(lat = resampled_fire.lat,
lon = resampled_fire.lon).compute()
LC_tree_mask.plot()
<matplotlib.collections.QuadMesh at 0x7fd8c2c6aed0>
For the masking to work, both data arrays must have identical coordinates, we thus enforce this condition by assigning coordinates and merging into one dataset
Because both dasets share the same spatial grid, we can mask the burned_area variable of the fire dataset with the LC_tree_mask and create a new data variable into the resampled_fire datacube. By selecting the only time slice, we remove the time information here.
resampled_fire['burned_tree_area'] = resampled_fire.burned_area.where(LC_tree_mask.isel(time=0))
resampled_fire
<xarray.Dataset> Dimensions: (time: 12, lat: 5616, lon: 11844, vegetation_class: 18, bnds: 2) Coordinates: * time (time) datetime64[ns] 2020-01-16T12:00:0... * vegetation_class (vegetation_class) int32 10 20 ... 170 180 time_bnds (time, bnds) datetime64[ns] dask.array<chunksize=(12, 2), meta=np.ndarray> * lon (lon) float64 -9.999 -9.996 ... 22.9 22.9 * lat (lat) float64 52.5 52.5 52.49 ... 36.9 36.9 lon_bnds (lon, bnds) float64 -10.0 -9.997 ... 22.9 lat_bnds (lat, bnds) float64 52.5 52.5 ... 36.9 36.9 Dimensions without coordinates: bnds Data variables: burned_area (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> burned_area_in_vegetation_class (time, vegetation_class, lat, lon) float32 dask.array<chunksize=(1, 1, 2160, 2160), meta=np.ndarray> fraction_of_burnable_area (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> fraction_of_observed_area (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> number_of_patches (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> standard_error (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> burned_tree_area (time, lat, lon) float32 dask.array<chunksize=(1, 2160, 2160), meta=np.ndarray> Attributes: Conventions: CF-1.7 title: esacci.FIRE.mon.L4.BA.MODIS.Terra.MODIS_TERRA.v5... date_created: 2023-08-30T14:33:39.046347 processing_level: L4 time_coverage_start: 2001-01-01T00:00:00 time_coverage_end: 2021-01-01T00:00:00 time_coverage_duration: P7305DT0H0M0S history: [{'program': 'xcube_cci.chunkstore.CciChunkStore...
Plot the burned area of the fire dataset for the first time stamp.
resampled_fire.burned_area.isel(time=0).plot.imshow(vmin=-1)
<matplotlib.image.AxesImage at 0x7fe6a846a190>
Now lets plot the burned area with the Landcover information. Note, that burned area which is 0 but within the tree category will show up with the value 0 in the plot:
resampled_fire.burned_tree_area.isel(time=0).plot.imshow(vmin=-1)
<matplotlib.image.AxesImage at 0x7fe6b039a990>
Lets save the resampled fire datacube locally, so we can visualize it faster.
resampled_fire.to_zarr("resampled_fire.zarr")
<xarray.backends.zarr.ZarrStore at 0x7fe679a06260>
from xcube.webapi.viewer import Viewer
resampled_fire = xr.open_zarr("resampled_fire.zarr")
viewer = Viewer()
viewer.add_dataset(resampled_fire)
'1d292bff-4857-455d-bb10-059ebea6843e'
viewer.info()
Server: https://deep.earthsystemdatalab.net/user/alicebalfanz/proxy/8003 Viewer: https://deep.earthsystemdatalab.net/user/alicebalfanz/proxy/8003/viewer/?serverUrl=https://deep.earthsystemdatalab.net/user/alicebalfanz/proxy/8003
viewer.show()
WARNING:tornado.general:404 GET /viewer/config/config.json (127.0.0.1): xcube viewer has not been been configured WARNING:tornado.general:404 GET /viewer/config/config.json (127.0.0.1): xcube viewer has not been been configured WARNING:tornado.access:404 GET /viewer/config/config.json (127.0.0.1) 6.21ms WARNING:tornado.access:404 GET /viewer/config/config.json (127.0.0.1) 6.21ms WARNING:tornado.general:404 GET /viewer/config/config.json (127.0.0.1): xcube viewer has not been been configured WARNING:tornado.general:404 GET /viewer/config/config.json (127.0.0.1): xcube viewer has not been been configured WARNING:tornado.access:404 GET /viewer/config/config.json (127.0.0.1) 3.07ms WARNING:tornado.access:404 GET /viewer/config/config.json (127.0.0.1) 3.07ms