Page on storage formats #325
🎊 PR Preview 1868ad2 has been successfully built and deployed to https://xarray-contrib-xarray-tutorial-preview-pr-325.surge.sh 🕐 Build time: 0.011s 🤖 By surge-preview
@@ -0,0 +1,191 @@
{
I read through the new HDF5 / NetCDF-4 section and drafted some updated text for HDF5, in case it is helpful (see the suggestion block below):
```suggestion
## HDF5

HDF5 (Hierarchical Data Format, version 5) is a **general-purpose container** for large, heterogeneous, hierarchical data. It includes these core components:

* **Groups**
  *Nodes* in a directed graph that starts at the root `/`. They behave like folders in a UNIX filesystem (absolute paths, `/sub/group/dataset`), and *may* form cycles or self-links, although most scientific tools avoid that complexity.

* **Datasets**
  Rectangular N-dimensional arrays stored inside groups. Each dimension can optionally carry a **dimension scale**, an auxiliary dataset that describes the coordinate values along that axis.

* **Attributes**
  Small pieces of metadata (strings, scalars, short arrays) attached to the file, any group, or any dataset.

* **Storage features**
  Chunking, compression, checksums, parallel I/O via MPI-IO, and more. These are orthogonal to the logical data model.

## NetCDF4
```
...
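The HDF5 data model in the suggestion maps fairly directly onto h5py. A minimal sketch, assuming h5py and NumPy are installed; the file name `example.h5` and all group, dataset, and attribute names are made up for illustration:

```python
import h5py
import numpy as np

# Write: groups, a chunked + compressed dataset, attributes, and a dimension scale.
with h5py.File("example.h5", "w") as f:
    f.attrs["title"] = "toy HDF5 file"  # attribute on the root group, i.e. the file

    # The intermediate group "/sub" is created automatically by h5py.
    grp = f.create_group("/sub/group")

    temp = grp.create_dataset(
        "temperature",
        data=np.arange(12, dtype="f4").reshape(4, 3),  # placeholder values
        chunks=(2, 3),        # stored on disk as 2x3 chunks
        compression="gzip",   # DEFLATE filter, applied chunk by chunk
        compression_opts=4,
    )
    temp.attrs["units"] = "K"

    # A 1-D dataset turned into a dimension scale and attached to axis 0.
    time = grp.create_dataset("time", data=np.arange(4, dtype="f8"))
    time.make_scale("time")
    temp.dims[0].attach_scale(time)

# Read back using an absolute, filesystem-like path.
with h5py.File("example.h5", "r") as f:
    dset = f["/sub/group/temperature"]
    print(dset.shape, dset.chunks, dset.compression)
    print(dset.attrs["units"])
    print(dset.dims[0][0][:])  # values of the scale attached to axis 0
```

Chunking and compression are fixed per dataset at creation time, yet the array is indexed exactly the same way when read back, which is what "orthogonal to the logical data model" means in practice.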
@@ -0,0 +1,191 @@
{
Here are at least some parts of a TIFF/GeoTIFF section you could use:
I think we could also add something on rasterio/GDAL here, though I did not add that myself:
https://docs.xarray.dev/en/stable/user-guide/io.html#rasterio
---
## TIFF & GeoTIFF
TIFF (Tag Image File Format) is a *flexible* raster container widely used in remote sensing and GIS.
A **GeoTIFF** is simply a TIFF that stores additional georeferencing tags (CRS, affine transform, etc.) so software knows where each pixel sits on Earth.
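To make the "affine transform" concrete: for a north-up raster without rotation, GeoTIFF's `ModelTiepointTag` anchors a raster point (I, J) to a map point (X, Y), and `ModelPixelScaleTag` gives the pixel size (ScaleX, ScaleY). Raster coordinates follow the default `PixelIsArea` convention, where (0, 0) is the upper-left corner of the upper-left pixel, so a pixel centre sits at (col + 0.5, row + 0.5). A stdlib-only sketch; the grid values are hypothetical, and rotated rasters (stored via `ModelTransformationTag`) are out of scope:

```python
from typing import Tuple


def pixel_to_model(
    col: float,
    row: float,
    tiepoint: Tuple[float, float, float, float, float, float],
    scale: Tuple[float, float, float],
) -> Tuple[float, float]:
    """Map raster coordinates (col, row) to model/map coordinates (X, Y).

    tiepoint: ModelTiepointTag values (I, J, K, X, Y, Z)
    scale:    ModelPixelScaleTag values (ScaleX, ScaleY, ScaleZ)
    Assumes a north-up raster with no rotation or shear.
    """
    i, j, _k, x, y, _z = tiepoint
    sx, sy, _sz = scale
    # Image rows grow downward while map Y grows northward, hence the minus sign.
    return x + (col - i) * sx, y - (row - j) * sy


# Hypothetical 30 m grid whose upper-left corner is at (500000, 4200000)
tie = (0.0, 0.0, 0.0, 500000.0, 4200000.0, 0.0)
scale = (30.0, 30.0, 0.0)

print(pixel_to_model(0, 0, tie, scale))        # (500000.0, 4200000.0): upper-left corner
print(pixel_to_model(10.5, 20.5, tie, scale))  # (500315.0, 4199385.0): centre of pixel col 10, row 20
```

rasterio and rioxarray carry the same mapping as an affine transform; for this grid it would be `Affine(30.0, 0.0, 500000.0, 0.0, -30.0, 4200000.0)`.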
### Core ideas
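* **Images (“IFDs”)** – each “page” in a TIFF is described by an Image File Directory (IFD) and holds a 2-D array of pixels.
Multi-band rasters (e.g. RGB, multispectral) are usually stored as extra samples within one IFD, either pixel-interleaved or band-by-band (set by the `PlanarConfiguration` tag); separate IFDs per band are possible but uncommon, and additional IFDs more often hold further pages or reduced-resolution overviews.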
* **Tags** – key–value metadata pairs (datatype, compression, nodata value, CRS, resolution, etc.).
GeoTIFF adds standardised tags such as `ModelPixelScaleTag`, `ModelTiepointTag`, and `GeoKeyDirectoryTag`.
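* **Compression / tiling** – codecs such as DEFLATE and LZW shrink the pixel data; tiling stores the image in fixed-size rectangular blocks (often 256 × 256 or 512 × 512) instead of full-width strips, so software can read a small window without decoding entire rows.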
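To make “IFDs” and “tags” concrete, here is a stdlib-only toy: it builds a little-endian classic TIFF in memory containing a single IFD of a few baseline tags (with no pixel data, so it is not a viewable image) and then decodes that IFD. It is a sketch for illustration only; it ignores big-endian files, BigTIFF, chained IFDs, multi-valued tags, and values stored at out-of-line offsets, all of which real readers such as libtiff (used by GDAL) handle. The helper names `build_toy_tiff` and `read_first_ifd` are hypothetical.

```python
import struct
from typing import Dict, List, Tuple

SHORT, LONG = 3, 4  # TIFF field types: 16-bit and 32-bit unsigned integers

# A few baseline tag IDs from the TIFF 6.0 specification
TAG_NAMES = {
    256: "ImageWidth",
    257: "ImageLength",
    258: "BitsPerSample",
    259: "Compression",
    277: "SamplesPerPixel",
}


def build_toy_tiff(entries: List[Tuple[int, int, int]]) -> bytes:
    """Little-endian TIFF header plus one IFD of single-valued tags (no pixel data)."""
    # Byte order "II" (little-endian), magic number 42, first IFD at byte offset 8
    data = struct.pack("<2sHI", b"II", 42, 8)
    data += struct.pack("<H", len(entries))  # number of directory entries
    for tag, ftype, value in sorted(entries):  # entries must be sorted by tag
        # Each entry: tag, type, count, then a 4-byte value field. A SHORT is
        # left-justified in that field and padded with zeros.
        field = struct.pack("<HH", value, 0) if ftype == SHORT else struct.pack("<I", value)
        data += struct.pack("<HHI", tag, ftype, 1) + field
    data += struct.pack("<I", 0)  # next-IFD offset; 0 means there are no more pages
    return data


def read_first_ifd(data: bytes) -> Dict[str, int]:
    """Decode single-valued SHORT/LONG tags from the first IFD."""
    order, magic, ifd_offset = struct.unpack_from("<2sHI", data, 0)
    if order != b"II" or magic != 42:
        raise ValueError("this sketch only handles little-endian classic TIFF")
    (n_entries,) = struct.unpack_from("<H", data, ifd_offset)
    tags: Dict[str, int] = {}
    for k in range(n_entries):
        pos = ifd_offset + 2 + 12 * k  # each IFD entry is 12 bytes
        tag, ftype, count = struct.unpack_from("<HHI", data, pos)
        if count != 1 or ftype not in (SHORT, LONG):
            continue  # multi-valued entries and other types are skipped here
        fmt = "<H" if ftype == SHORT else "<I"
        (value,) = struct.unpack_from(fmt, data, pos + 8)
        tags[TAG_NAMES.get(tag, str(tag))] = value
    return tags


toy = build_toy_tiff([
    (256, LONG, 512),    # ImageWidth
    (257, LONG, 256),    # ImageLength
    (258, SHORT, 16),    # BitsPerSample
    (259, SHORT, 8),     # Compression: 8 = Deflate
    (277, SHORT, 1),     # SamplesPerPixel
])
print(read_first_ifd(toy))
```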
### Practical notes for xarray users
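* **Read** – use `rioxarray.open_rasterio()` (which wraps rasterio) to open the file lazily as a DataArray; pass `chunks=` (e.g. `chunks=True`) to get a Dask-backed array.
* **Write** – `DataArray.rio.to_raster("out.tif")`; pass creation options such as `compress="deflate"` and `tiled=True` as extra keyword arguments.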
* **Dimensionality** – TIFF is inherently 2-D per band; no native time or vertical axis. If you need 4-D data, NetCDF or Zarr is usually a better fit.
* **Metadata depth** – single-level tags only (no nested groups). For rich hierarchies, stick to HDF5 / NetCDF-4.
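* **Cloud-optimized GeoTIFF (COG)** – an ordinary GeoTIFF laid out with internal tiles, overviews, and its metadata at the start of the file, so HTTP range requests can fetch individual windows; `rioxarray.open_rasterio()` can open one directly from a URL, provided the GDAL library underneath rasterio was built with libcurl support.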
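Why tiling and the COG layout make windowed reads cheap can be sketched with a little arithmetic: a reader works out which internal tiles overlap the requested window, then fetches only those tiles' bytes, for example with HTTP `Range` requests. In a real file the byte positions come from the `TileOffsets` and `TileByteCounts` tags and GDAL does this work for you; the image size, tile size, and byte values below are made up, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def tiles_for_window(
    col_off: int, row_off: int, width: int, height: int,
    tile_w: int, tile_h: int, image_w: int,
) -> List[int]:
    """Row-major indices of the internal tiles that overlap a pixel window."""
    tiles_across = math.ceil(image_w / tile_w)
    first_col, last_col = col_off // tile_w, (col_off + width - 1) // tile_w
    first_row, last_row = row_off // tile_h, (row_off + height - 1) // tile_h
    return [
        r * tiles_across + c
        for r in range(first_row, last_row + 1)
        for c in range(first_col, last_col + 1)
    ]


def range_headers(
    tile_ids: Sequence[int], offsets: Sequence[int], byte_counts: Sequence[int],
) -> List[str]:
    """One HTTP Range header value per tile; the end byte is inclusive."""
    return [f"bytes={offsets[t]}-{offsets[t] + byte_counts[t] - 1}" for t in tile_ids]


# Hypothetical 1024 x 1024 image stored as 256 x 256 tiles (4 x 4 = 16 tiles).
# These lists stand in for the TileOffsets / TileByteCounts tag values.
offsets = [1000 + 5000 * i for i in range(16)]
byte_counts = [4000] * 16

ids = tiles_for_window(300, 200, 100, 100, tile_w=256, tile_h=256, image_w=1024)
print(ids)                                       # [1, 5]
print(range_headers(ids, offsets, byte_counts))  # ['bytes=6000-9999', 'bytes=26000-29999']
```

Here only 2 of the 16 tiles (8,000 of the 64,000 hypothetical bytes) are requested; a strip-organised file could require reading the full width of every row the window touches.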
@negin513 thanks for your comments; I think this is ready for review / merge.

Closes part of #321