Skip to content

zonalstats includes numpy masked data areas in stats calculations #246

@sebastianclarke

Description

@sebastianclarke

Describe the bug
Issue occurs when calling the zonal_stats function and passing a numpy array and affine, rather than a geo-referenced raster. If you have previously used numpy.ma to mask a portion of the data, that portion of the data is still included in stats calculation. I was expecting that pixels masked in this manner would be ignored during stats calculation, in the same way that nodata pixels are.

My current workaround is to set these pixels explicitly to the nodata value, which then results in them being discounted accordingly, but this is a little in-elegant, and more to the point I think the expectation is that, if zonal_stats supports numpy data, it should respect any masks already applied. This was certainly my expectation, and I've seen one or two other reports online of people being caught out by this.

To Reproduce

The below example shows roughly how to reproduce. Naturally it's trivial, and you would never do this in anger, you would just pass the raster directly to the zonal_stats function. Assume in any actual example, we are doing something else to the raster data in numpy-land other than simply reading it and masking it. Note that it does require you provide your own test data (a raster and intersecting geometry).

import rasterio
from rasterstats import zonal_stats
from numpy import ma

with rasterio.open("somerasterdataset.tif") as my_raster:
    my_data = my_raster.read(1)    # 8-bit, unsigned int, values 0-255
    nodata = my_raster.nodata
    affine = my_raster.transform

masked_data = ma.masked_greater(my_data, 100)

stats = zonal_stats(
    geometry,
    masked_data,
    affine=affine,
    nodata=nodata,
    stats=['mean', 'count']
    all_touched=True,
 )    # Still includes pixels > 100, so mean is too high

my_data[my_data>100] = nodata

stats = zonal_stats(
    geometry,
    my_data,
    affine=affine,
    nodata=nodata,
    stats=['mean', 'count']
    all_touched=True,
 )    # Works as I expect

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions