# xarray-multiscale

Simple tools for creating multiscale representations of large images.

## Installation

`pip install xarray-multiscale`

## Motivation

Many image processing applications benefit from representing images at multiple scales (also known as [image pyramids](https://en.wikipedia.org/wiki/Pyramid_(image_processing))). This package provides tools for generating lazy multiscale representations of N-dimensional data using [`xarray`](http://xarray.pydata.org/en/stable/) to ensure that the downsampled images have the correct axis coordinates.

Why are coordinates important for this application? Because a downsampled image is typically scaled and *translated* relative to the source image. Without a coordinate-aware representation of the data, the scaling and translation information is easily lost.


## Usage

Generate a multiscale representation of a numpy array:

```python
from xarray_multiscale import multiscale, windowed_mean
import numpy as np

data = np.arange(4)
multiscale(data, windowed_mean, 2)
"""
[<xarray.DataArray (dim_0: 4)>
array([0, 1, 2, 3])
Coordinates:
  * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0,
<xarray.DataArray (dim_0: 2)>
array([0, 2])
Coordinates:
  * dim_0    (dim_0) float64 0.5 2.5]
"""
```
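Note the coordinates of the downsampled array above: each new coordinate is the mean of the source coordinates covered by the corresponding window, which is how the scale and translation of the downsampled image are preserved. A numpy-only sketch of that idea (illustrative, not the library's implementation):

```python
import numpy as np

coords = np.arange(4, dtype="float64")  # source coordinates: 0, 1, 2, 3
window = 2
# average the coordinates within each non-overlapping window
down_coords = coords.reshape(-1, window).mean(axis=1)
print(down_coords)  # [0.5 2.5]
```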

By default, the values of the downsampled arrays are cast to the same data type as the input. This behavior can be changed with the ``preserve_dtype`` keyword argument to ``multiscale``.
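This cast is why the first example shows `array([0, 2])` rather than the floating-point window means. A numpy-only sketch of the behavior (purely illustrative, not the library's implementation):

```python
import numpy as np

data = np.arange(4)  # integer input
window = 2
# windowed means are floats...
means = data.reshape(-1, window).mean(axis=1)  # [0.5, 2.5]
# ...but are cast back to the input dtype by default
cast_back = means.astype(data.dtype)           # [0, 2]
print(means, cast_back)
```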
Generate a multiscale representation of an ``xarray.DataArray``:

```python
from xarray_multiscale import multiscale, windowed_mean
from xarray import DataArray
import numpy as np

data = np.arange(16).reshape((4, 4))
coords = (DataArray(np.arange(data.shape[0]), dims=('y',), attrs={'units': 'm'}),
          DataArray(np.arange(data.shape[1]), dims=('x',), attrs={'units': 'm'}))

arr = DataArray(data, coords)
multiscale(arr, windowed_mean, (2, 2))
"""
[<xarray.DataArray (y: 4, x: 4)>
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])
Coordinates:
  * y        (y) int64 0 1 2 3
  * x        (x) int64 0 1 2 3,
<xarray.DataArray (y: 2, x: 2)>
array([[ 2,  4],
       [10, 12]])
Coordinates:
  * y        (y) float64 0.5 2.5
  * x        (x) float64 0.5 2.5]
"""
```

Dask arrays work too. Note that the ``chunks`` keyword argument controls the chunking of the output arrays.

```python
from xarray_multiscale import multiscale, windowed_mean
import dask.array as da

arr = da.random.randint(0, 255, (10, 10, 10))
multiscale(arr, windowed_mean, 2, chunks=2)
"""
[<xarray.DataArray 'randint-f83260ed51a44f24aeccd95bc23e73ae' (dim_0: 10, dim_1: 10, dim_2: 10)>
dask.array<rechunk-merge, shape=(10, 10, 10), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
Coordinates:
  * dim_0    (dim_0) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
  * dim_1    (dim_1) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0
  * dim_2    (dim_2) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0,
<xarray.DataArray 'astype-0c3c3e397345ddeedff07ecf2d9fad17' (dim_0: 5, dim_1: 5, dim_2: 5)>
dask.array<rechunk-merge, shape=(5, 5, 5), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
Coordinates:
  * dim_0    (dim_0) float64 0.5 2.5 4.5 6.5 8.5
  * dim_1    (dim_1) float64 0.5 2.5 4.5 6.5 8.5
  * dim_2    (dim_2) float64 0.5 2.5 4.5 6.5 8.5,
<xarray.DataArray 'astype-675175a39bec4fea06b8668053458285' (dim_0: 2, dim_1: 2, dim_2: 2)>
dask.array<astype, shape=(2, 2, 2), dtype=int64, chunksize=(2, 2, 2), chunktype=numpy.ndarray>
Coordinates:
  * dim_0    (dim_0) float64 1.5 5.5
  * dim_1    (dim_1) float64 1.5 5.5
  * dim_2    (dim_2) float64 1.5 5.5]
"""
```

### Caveats

* Arrays that are not evenly divisible by the downsampling factors will be trimmed as needed. If this behavior is undesirable, consider padding your array appropriately prior to downsampling.
* For chunked arrays (e.g., dask arrays), the current implementation divides the input data into *contiguous* chunks. This means that downsampling schemes based on sliding-window smoothing will produce edge artifacts.
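One way to avoid the trimming described in the first caveat is to pad each axis up to the next multiple of its downsampling factor before calling `multiscale`. A minimal numpy sketch (the choice of `mode="edge"` is one option among several, not a library requirement):

```python
import numpy as np

data = np.arange(5)  # length 5 is not evenly divisible by a factor of 2
factors = (2,)
# pad the end of each axis up to the next multiple of its factor
pad_width = [(0, (-size) % factor) for size, factor in zip(data.shape, factors)]
padded = np.pad(data, pad_width, mode="edge")  # repeats the edge value
print(padded)  # [0 1 2 3 4 4]
```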

### Development

This project is developed using [`hatch`](https://hatch.pypa.io/latest/).
Run tests with `hatch run test:pytest`.
Serve docs with `hatch run docs:serve`.