-
Notifications
You must be signed in to change notification settings - Fork 115
Description
Describe the bug
Issue occurs when calling the zonal_stats function and passing a numpy array and affine, rather than a geo-referenced raster. If you have previously used numpy.ma to mask a portion of the data, that portion of the data is still included in stats calculation. I was expecting that pixels masked in this manner would be ignored during stats calculation, in the same way that nodata pixels are.
My current workaround is to set these pixels explicitly to the nodata value, which then results in them being discounted accordingly, but this is a little in-elegant, and more to the point I think the expectation is that, if zonal_stats supports numpy data, it should respect any masks already applied. This was certainly my expectation, and I've seen one or two other reports online of people being caught out by this.
To Reproduce
The below example shows roughly how to reproduce. Naturally it's trivial, and you would never do this in anger, you would just pass the raster directly to the zonal_stats function. Assume in any actual example, we are doing something else to the raster data in numpy-land other than simply reading it and masking it. Note that it does require you provide your own test data (a raster and intersecting geometry).
import rasterio
from rasterstats import zonal_stats
from numpy import ma
with rasterio.open("somerasterdataset.tif") as my_raster:
my_data = my_raster.read(1) # 8-bit, unsigned int, values 0-255
nodata = my_raster.nodata
affine = my_raster.transform
masked_data = ma.masked_greater(my_data, 100)
stats = zonal_stats(
geometry,
masked_data,
affine=affine,
nodata=nodata,
stats=['mean', 'count']
all_touched=True,
) # Still includes pixels > 100, so mean is too high
my_data[my_data>100] = nodata
stats = zonal_stats(
geometry,
my_data,
affine=affine,
nodata=nodata,
stats=['mean', 'count']
all_touched=True,
) # Works as I expect