
Using Apache Spark and OpenCV for image analysis

I want to do some image analysis on a large number of images (thousands), and I want to try using Spark to speed this up. For testing purposes I am using Docker Compose to set up a standalone cluster locally.

I want to do some basic analysis, such as computing gradients and edge detection. I can successfully load my images into a DataFrame using:

images = spark.read.format("image").option("dropInvalid", True).load("/opt/spark-data/")
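For reference, Spark's image data source gives each row an `image` struct with the fields `origin`, `height`, `width`, `nChannels`, `mode` (an OpenCV type code) and `data`, where `data` holds the raw pixel bytes row by row with channels in BGR(A) order, the same layout OpenCV uses. A minimal sketch of the conversion, assuming 8-bit images (`image_struct_to_array` and the `FakeImage` test stand-in are hypothetical names, not part of Spark):

```python
from collections import namedtuple

import numpy as np


def image_struct_to_array(image):
    """Return a Spark image struct's pixels as a uint8 array of shape
    (height, width, nChannels), the layout OpenCV functions expect."""
    # `data` is a flat byte string, row by row, with channels in BGR(A) order,
    # so a reshape is all that is needed (no channel reordering).
    flat = np.frombuffer(image.data, dtype=np.uint8)
    return flat.reshape((image.height, image.width, image.nChannels))


# Hypothetical stand-in for a pyspark Row, so the sketch runs without Spark.
FakeImage = namedtuple(
    "FakeImage", ["origin", "height", "width", "nChannels", "mode", "data"]
)
```

Inside a UDF that receives the `image` column, the struct arrives as a `Row` with the same field names, so the returned array could be passed to, for example, `cv2.cvtColor(arr, cv2.COLOR_BGR2GRAY)` for 3-channel images. `np.frombuffer` does not copy, so when `data` is `bytes` the array is read-only; call `.copy()` before modifying it in place.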

I tried calling OpenCV functions such as Sobel from a UDF, but I am unable to convert the image data into a format that OpenCV can work with.

Is there a way to convert the image data so that I can use OpenCV functions on it? Or is there a better approach than OpenCV?
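For simple filters, OpenCV is not strictly required: a Sobel gradient is just a pair of 3x3 correlations, which NumPy can compute on a 2-D grayscale array. A minimal sketch, assuming the image has already been converted to grayscale (`filter3x3` and `sobel_magnitude` are hypothetical names). Borders here replicate the edge pixels, whereas OpenCV's default border mode is reflect-101, so values along the image edges may differ from `cv2.Sobel`:

```python
import numpy as np

# 3x3 Sobel kernels for the horizontal (x) and vertical (y) derivatives
SOBEL_X = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float64)
SOBEL_Y = SOBEL_X.T


def filter3x3(gray, kernel):
    """Correlate a 2-D image with a 3x3 kernel, replicating edge pixels."""
    padded = np.pad(gray.astype(np.float64), 1, mode="edge")
    h, w = gray.shape
    out = np.zeros((h, w), dtype=np.float64)
    # Accumulate each kernel tap as a shifted, weighted copy of the image
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out


def sobel_magnitude(gray):
    """Gradient magnitude sqrt(Gx^2 + Gy^2) of a 2-D grayscale image."""
    gx = filter3x3(gray, SOBEL_X)
    gy = filter3x3(gray, SOBEL_Y)
    return np.hypot(gx, gy)
```

Because the loops run over the 9 kernel taps rather than over pixels, each tap is one vectorized array operation, which keeps this reasonably fast for per-image work inside a Spark task.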

Asked by SilverTear, Jan 27 '26 21:01


1 Answer

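I was able to make this work with help from this post. The key step is reshaping the struct's flat `data` bytes into a `(height, width, nChannels)` NumPy array, which OpenCV can consume directly.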

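def convertImageGeneric(image, fa=None, down_width=500, down_height=500):
    import numpy as np
    import cv2
    from pyspark.ml.linalg import Vectors
    if fa is None:
        fa = cv2.SIFT_create(400)  # SIFT detector keeping at most 400 keypoints
    # Spark stores pixels as flat bytes in BGR(A) order; reshape to (H, W, C)
    arr = np.frombuffer(image.data, dtype=np.uint8).reshape(
        (image.height, image.width, image.nChannels))
    if image.nChannels == 1:
        gray = arr.reshape((image.height, image.width))
    else:
        code = cv2.COLOR_BGRA2GRAY if image.nChannels == 4 else cv2.COLOR_BGR2GRAY
        gray = cv2.cvtColor(arr, code)  # this handles the image conversion
    gray = cv2.resize(gray, (down_width, down_height))  # assumed intent of down_*
    keypoints = fa.detect(gray, None)  # detect on the converted image, not the Row
    # `no_more_numpy` was not shown; as a stand-in, flatten keypoint (x, y) coords
    coords = [float(c) for kp in keypoints for c in kp.pt]
    return (image.origin, Vectors.dense(coords))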
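The function above is not shown being called from Spark. For comparison, when a single number per image is enough, a plain scalar function over the struct's fields can be wrapped as an ordinary UDF without needing OpenCV on the workers. A minimal sketch, assuming 8-bit images with at least 2 rows and 2 columns (`mean_gradient` is a hypothetical name, and NumPy's central differences in `np.gradient` stand in for a Sobel filter):

```python
import numpy as np


def mean_gradient(height, width, n_channels, data):
    """Mean gradient magnitude of one image, from raw Spark image-struct fields.

    Takes plain Python values (int, int, int, bytes/bytearray) and returns a
    float, so it can be wrapped as a scalar Spark UDF.
    """
    arr = np.frombuffer(data, dtype=np.uint8).reshape((height, width, n_channels))
    if n_channels >= 3:
        # Channels are in BGR(A) order; use the BT.601 luma weights that
        # cv2.COLOR_BGR2GRAY also uses (any alpha channel is ignored).
        b, g, r = arr[..., 0], arr[..., 1], arr[..., 2]
        gray = 0.114 * b + 0.587 * g + 0.299 * r
    else:
        gray = arr[..., 0].astype(np.float64)
    # np.gradient returns derivatives along axis 0 (rows) then axis 1 (columns)
    gy, gx = np.gradient(gray)
    return float(np.mean(np.hypot(gx, gy)))
```

It could then be registered with `mean_gradient_udf = udf(mean_gradient, DoubleType())` (imports from `pyspark.sql.functions` and `pyspark.sql.types`) and applied as `images.select("image.origin", mean_gradient_udf("image.height", "image.width", "image.nChannels", "image.data"))`. Passing the individual fields rather than the whole struct keeps the function testable without a Spark session.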
Answered by Matt Andruff, Jan 30 '26 23:01