I have a relatively large RGBA image (converted to a NumPy array) in which I need to replace every color that does not appear in a given list. How can I do this in a fast, Pythonic way?
I have a working solution that uses simple iteration, but because the images are quite large (2500 x 2500) it is very slow.
# Keep only these colors in the image, otherwise replace with (0,255,0,255)
palette = [[0,0,0,255],[0, 255, 0,255], [255, 0, 0,255], [128, 128, 128,255], [0, 0, 255,255], [255, 0, 255,255], [0, 255, 255,255], [255, 255, 255,255], [128, 128, 0,255], [0, 128, 128,255], [128, 0, 128,255]]
# Current slow solution with a 2500 x 2500 x 4 array (mask)
for z in range(mask.shape[0]):
    for y in range(mask.shape[1]):
        if mask[z, y, :].tolist() not in palette:
            mask[z, y] = (0, 255, 0, 255)
Expected operating time per image: less than half a minute
Current time: two minutes
Neither of those timings is something you should have to settle for. Here's an approach with broadcasting:
import numpy as np
# after the transpose, palette.shape == (4,11)
palette = np.array(palette).transpose()
# sample a.shape == (2,2,4)
a = np.array([[[ 28, 231, 203, 235],
               [255,   0,   0, 255]],
              [[ 50, 152,  36, 151],
               [252,  43,  63,  25]]])
# mask
# all(2) force all channels to be equal
# any(-1) matches any color
mask = (a[:,:,:, None] == palette).all(2).any(-1)
# replace color
rep_color = np.array([0,255,0,255])
# np.where to the rescue:
ret = np.where(mask[:,:,None], a, rep_color[None,None,:])
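In the sample, only the pixel [255, 0, 0, 255] is in the palette, so it is kept and the other three pixels become [0, 255, 0, 255].
For a = np.random.randint(0,256, (2500,2500,4)), it takes: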
5.26 s ± 179 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
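To make the broadcasting step easier to follow, here is a minimal sketch that prints nothing but keeps every intermediate array so its shape can be inspected. The 3-color palette and the 2 x 2 image are hypothetical values chosen only for illustration, and the names eq and per_color are not from the answer above.

```python
import numpy as np

# Hypothetical 3-color palette, transposed to shape (4, 3): channels x colors.
palette = np.array([[0, 0, 0, 255], [255, 0, 0, 255], [0, 255, 0, 255]]).T

# Hypothetical 2 x 2 RGBA image, shape (2, 2, 4).
a = np.array([[[28, 231, 203, 235], [255, 0, 0, 255]],
              [[50, 152, 36, 151], [0, 0, 0, 255]]])

# (2, 2, 4, 1) == (4, 3) broadcasts to (2, 2, 4, 3):
# one boolean per pixel, per channel, per palette color.
eq = a[:, :, :, None] == palette

# Collapse the channel axis: True where all four channels match one color.
per_color = eq.all(2)          # shape (2, 2, 3)

# Collapse the palette axis: True where the pixel matches any color.
mask = per_color.any(-1)       # shape (2, 2)

rep_color = np.array([0, 255, 0, 255])
ret = np.where(mask[:, :, None], a, rep_color)
```

Note that eq holds one boolean per pixel, channel, and palette color. For a 2500 x 2500 image with 11 colors that is 2500 * 2500 * 4 * 11 = 275 million booleans, which is where most of the memory goes.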
Update: if you force everything to be np.uint8, you can pack the four channels into a single 32-bit integer and compare that instead, which is even faster:
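a = np.random.randint(0,256, (2500,2500,4), dtype=np.uint8)
# palette here is the original list of colors, not the transposed array from above
p = np.array(palette, dtype=np.uint8).transpose()
# zip the data into 32 bits
# cast to uint32 first: newer NumPy versions refuse to multiply uint8 by 2**24
# could be even faster if we handle the memory directly
a32, p32 = a.astype(np.uint32), p.astype(np.uint32)
aa = a32[:,:,0]*(2**24) + a32[:,:,1]*(2**16) + a32[:,:,2]*(2**8) + a32[:,:,3]
pp = p32[0]*(2**24) + p32[1]*(2**16) + p32[2]*(2**8) + p32[3]
mask = (aa[:,:,None]==pp).any(-1)
ret = np.where(mask[:,:,None], a, rep_color[None,None,:])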
which takes:
1.34 s ± 29.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
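As for handling the memory directly, one possible reading is to reinterpret each group of four uint8 channels as a single uint32 with view(), so no arithmetic is needed. The following is only a sketch under that assumption, not a benchmarked replacement: the shortened palette and the small random image are hypothetical, and np.isin is swapped in for the broadcast comparison.

```python
import numpy as np

# Shortened, hypothetical palette; the replacement color is part of it.
palette = [[0, 0, 0, 255], [0, 255, 0, 255], [255, 0, 0, 255]]
rep_color = np.array([0, 255, 0, 255], dtype=np.uint8)

# Small random RGBA image for illustration, with one planted palette pixel.
rng = np.random.default_rng(0)
a = rng.integers(0, 256, size=(50, 50, 4), dtype=np.uint8)
a[0, 0] = [255, 0, 0, 255]

# Reinterpret 4 consecutive uint8 values as one uint32 without copying.
# This needs a C-contiguous array whose last axis has length 4.
aa = np.ascontiguousarray(a).view(np.uint32)[..., 0]                          # (50, 50)
pp = np.ascontiguousarray(palette, dtype=np.uint8).view(np.uint32).ravel()    # (3,)

# Membership test on the packed values.
mask = np.isin(aa, pp)
ret = np.where(mask[:, :, None], a, rep_color)
```

The packed integer values depend on the machine's byte order, but the image and the palette are packed the same way, so the comparison is unaffected. Keeping rep_color as uint8 also keeps the result as uint8.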
I had a go with pyvips. It is a threaded, streaming image processing library, so it's fast and doesn't need much memory.
import sys
import pyvips
from functools import reduce
# Keep only these colors in the image, otherwise replace with (0,255,0,255)
palette = [[0,0,0,255], [0, 255, 0,255], [255, 0, 0,255], [128, 128, 128,255], [0, 0, 255,255], [255, 0, 255,255], [0, 255, 255,255], [255, 255, 255,255], [128, 128, 0,255], [0, 128, 128,255], [128, 0, 128,255]]
im = pyvips.Image.new_from_file(sys.argv[1], access="sequential")
# test our image against each sample ... bandand() will AND all image bands
# together, ie. we want pixels where they all match
masks = [(im == colour).bandand() for colour in palette]
# OR all the masks together to find pixels which are in the palette
mask = reduce((lambda x, y: x | y), masks)
# pixels not in the mask become [0, 255, 0, 255]
im = mask.ifthenelse(im, [0, 255, 0, 255])
im.write_to_file(sys.argv[2])
With a 2500 x 2500 pixel PNG on this 2015 i5 laptop I see:
$ /usr/bin/time -f %M:%e ./replace-pyvips.py ~/pics/x.png y.png
55184:0.92
So a peak of 55 MB of memory, and 0.92 s of elapsed time.
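For readers without pyvips, the same per-colour strategy (one mask per palette entry, then OR them together) can be sketched in NumPy. This is only an illustration of the idea and says nothing about pyvips internals; the shortened palette and the 2 x 2 image are hypothetical. Unlike the broadcast version, it never builds a four-dimensional intermediate, only one 2-D mask per colour.

```python
from functools import reduce

import numpy as np

# Shortened, hypothetical palette and a tiny RGBA image for illustration.
palette = [[0, 0, 0, 255], [255, 0, 0, 255], [0, 255, 0, 255]]
a = np.array([[[28, 231, 203, 235], [255, 0, 0, 255]],
              [[50, 152, 36, 151], [0, 0, 0, 255]]], dtype=np.uint8)

# One 2-D boolean mask per colour; all(-1) plays the role of bandand():
# a pixel matches only if every band matches.
masks = [(a == np.array(colour, dtype=np.uint8)).all(-1) for colour in palette]

# OR all the masks together to find pixels that are in the palette.
mask = reduce(np.logical_or, masks)

# Pixels not in the mask become [0, 255, 0, 255].
out = np.where(mask[..., None], a, np.array([0, 255, 0, 255], dtype=np.uint8))
```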
I tried Quang Hoang's excellent numpy version for comparison:
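import sys
import numpy as np
from PIL import Image
# loading step is an assumption; the snippet as posted starts from an existing RGBA array a
# palette is the same list as in the pyvips script
a = np.array(Image.open(sys.argv[1]))
p = np.array(palette).transpose()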
# mask
# all(2) force all channels to be equal
# any(-1) matches any color
mask = (a[:,:,:, None] == p).all(2).any(-1)
# replace color
rep_color = np.array([0,255,0,255])
# np.where to the rescue:
a = np.where(mask[:,:,None], a, rep_color[None,None,:])
im = Image.fromarray(a.astype('uint8'))
im.save(sys.argv[2])
Run on the same 2500 x 2500 pixel image:
$ /usr/bin/time -f %M:%e ./replace-broadcast.py ~/pics/x.png y.png
413504:3.08
A peak of 410 MB of memory, and 3.1 s elapsed.
Both versions could be sped up further by comparing uint32, as Hoang says.