Extending the Open Data Cube#

Beyond the configuration available in ODC, there are three extension points provided for implementing different types of data storage and indexing.

  • Drivers for Reading Data

  • Drivers for Writing Data

  • Alternative types of Index

Support for Plug-in drivers#

A light weight implementation of a driver loading system has been implemented in datacube/drivers/driver_cache.py which uses setuptools dynamic service and plugin discovery mechanism to name and define available drivers. This code caches the available drivers in the current environment, and allows them to be loaded on demand, as well as handling any failures due to missing dependencies or other environment issues.

Data Read Plug-ins#

Entry point group:

datacube.plugins.io.read.

Read plug-ins are specified as supporting particular uri protocols and formats, both of which are fields available on existing Datasets

A ReadDriver returns a DataSource implementation, which is chosen based on:

  • Dataset URI protocol, eg. s3://

  • Dataset format. As stored in the Data Cube Dataset.

  • Current system settings

  • Available IO plugins

If no specific datasource.storage.DataSource can be found, a default datacube.storage._rio.RasterDatasetDataSource is returned, which uses rasterio to read from the local file system or a network resource.

The DataSource maintains the same interface as before, which works at the individual dataset+time+band level for loading data. This is something to be addressed in the future.

See also odc-loader for a full-scale implementation.

Example code to implement a reader driver#

def init_reader_driver():
    return AbstractReaderDriver()

class AbstractReaderDriver(object):
    def supports(self, protocol: str, fmt: str) -> bool:
        pass
    def new_datasource(self, band: BandInfo) -> DataSource:
        return AbstractDataSource(band)

class AbstractDataSource(object):  # Same interface as before
    ...

Driver specific metadata will be present in BandInfo.driver_data if saved during write_dataset_to_storage

Example Pickle Based Driver#

Available in /examples/io_plugin. Includes an example setup.py as well as example Read and Write Drivers.

Data Write Plug-ins#

Entry point group:

datacube.plugins.io.write

Are selected based on their name. The storage.driver field has been added to the ingestion configuration file which specifies the name of the write driver to use. Drivers can specify a list of names that they can be known by, as well as publicly defining their output format, however this information isn’t used by the ingester to decide which driver to use. Not specifying a driver counts as an error, there is no default.

Data write support is not currently a priority and may be removed or relocated into another package in future.

See also odc-geo’s COG-writing methods.

Example code to implement a writer driver#

def init_writer_driver():
    return AbstractWriterDriver()

class AbstractWriterDriver(object):
    @property
    def aliases(self):
        return []  # List of names this writer answers to

    @property
    def format(self):
        return ''  # Format that this writer supports

    def mk_uri(self, file_path, storage_config):
        """
        Constructs a URI from the file_path and storage config.

        A typical implementation should return f'{scheme}://{file_path}'

        Example:
            file_path = '/path/to/my_file.nc'
            storage_config = {'driver': 'NetCDF CF'}

            mk_uri(file_path, storage_config) should return 'file:///path/to/my_file.nc'

        :param Path file_path: The file path of the file to be converted into a URI during the ingest process.
        :param dict storage_config: The dict holding the storage config found in the ingest definition.
        :return: file_path as a URI that the Driver understands.
        :rtype: str
        """
        return f'file://{file_path}'  # URI that this writer supports

    def write_dataset_to_storage(self, dataset, file_uri,
                                 global_attributes=None,
                                 variable_params=None,
                                 storage_config=None,
                                 **kwargs):
        ...
        return {}  # Can return extra metadata to be saved in the index with the dataset

Extra metadata will be saved into the database and loaded into BandInfo during a load operation.

NetCDF Writer Driver#

Name:

netcdf, NetCDF CF

Format:

NetCDF

Implementation:

datacube.drivers.netcdf.driver.NetcdfWriterDriver

Index Plug-ins#

Entry point group:

datacube.plugins.index

A connection to an Index is required to find data in the Data Cube.

ODC Configuration environments have an index_driver parameter, which specifies the name of the Index Driver to use. See ODC Configuration (Basics).

A set of abstract base classes are defined in datacube.index.abstract. An index plugin is expected to supply implementations of all these abstract base classes. If any abstract methods is not relevant to or implementable by a particular Index Driver, that method should defined to raise a NotImplementedError.

Legacy Implementation#

The legacy postgres index driver uses a PostgreSQL database for all storage and retrieval, with search supported by json indexes.

PostGIS Implementation#

The new postgis index driver uses a PostgreSQL database for all storage and retrieval, with spatial search using postgis spatial indexes.

Default Implementation#

Not specifying an index driver (or specifying default as the index driver) results in the postgres index driver. In a future release, this will switch to the postgis index driver.

Null Implementation#

datacube-core includes a minimal “null” index driver, that implements an index that is always empty. The code for this driver is located at datacube.index.null and can be used by setting the index_driver to null in the configuration file.

The null index driver may be useful:

  1. for ODC use cases where no database access is required;

  2. for testing scenarios where no database access is required; or

  3. as an example/template for developing other index drivers.

Memory Implementation#

datacube-core includes a non-persistent, local, in-memory index driver. The index is maintained in local memory and is not backed by a database. The code for this driver is located at datacube.index.memory and can be used by setting the index_driver to memory in the configuration file.

The memory index driver may be useful:

  1. for ODC use cases where there is no need for the index to be re-used beyond the current session;

  2. for testing scenarios where no index persistence is required; or

  3. as an example/template for developing other index drivers.

Drivers Plugin Management Module#

Drivers are registered in setup.py -> entry_points:

entry_points={
    'datacube.plugins.io.read': [
        'netcdf = datacube.drivers.netcdf.driver:reader_driver_init',
    ],
    'datacube.plugins.io.write': [
        'netcdf = datacube.drivers.netcdf.driver:writer_driver_init',
    ],
    'datacube.plugins.index': [
        'default = datacube.index.postgres.index:index_driver_init',
        'null = datacube.index.null.index:index_driver_init',
        *extra_plugins['index'],
    ],
}

These are drivers datacube-core ships with. When developing a custom driver one does not need to add them to datacube-core/setup.py, rather you have to define these in the setup.py of your driver package.

Data Cube Drivers API#

This module implements a simple plugin manager for storage and index drivers.

members:

imported-members:

datacube.drivers.index_driver_by_name(name)[source]#

Lookup index driver by name

Return type:

AbstractIndexDriver | None

Returns:

Initialised index driver instance

Returns:

None if driver with this name doesn’t exist

datacube.drivers.index_drivers()[source]#

Returns a set of driver names

Return type:

set[str]

datacube.drivers.new_datasource(band)[source]#

Returns a newly constructed data source to read dataset band data.

An appropriate DataSource implementation is chosen based on:

  • Dataset URI (protocol part)

  • Dataset format

  • Current system settings

  • Available IO plugins

This function will return the default RasterDatasetDataSource if no more specific DataSource can be found.

Parameters:

band (BandInfo) – The band to choose data source from.

Return type:

DataSource | None

datacube.drivers.reader_drivers()[source]#

Returns list driver names

Return type:

list[str]

datacube.drivers.storage_writer_by_name(name)[source]#

Lookup writer driver by name

Returns:

Initialised writer driver instance or None if driver with this name doesn’t exist

datacube.drivers.writer_drivers()[source]#

Returns list driver names

Return type:

list[str]

References and History#