I'm building a Windows Service application that takes as input a directory containing scanned images. My application iterates through all the images and, for every image, performs some OCR operations to grab the barcode, invoice number and customer number.
Some background info:
My question:
Since it's working with images on the file system, I'm unsure whether changing my application to use .NET Parallel Tasks will really make a difference.
Can anybody give me advice about this?
Many thanks!
If processing an image takes longer than reading N images from the disk, then processing multiple images concurrently is a win. As a rough estimate, you can read a 2 MB file from disk in under 100 ms (including seek time), so figure about one second to read 8 images into memory.
So if your image processing takes more than a second per image, I/O isn't a problem. Do it concurrently. You can scale that down if you need to (for example, if processing takes half a second per image, then you're probably best off with only 4 concurrent images).
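The scaling rule above can be written down as a small calculation. This is only a sketch in Python (the arithmetic is the same in any language): the function name `suggested_concurrency` is made up for illustration, the default of 0.125 s per read comes from the "8 images per second" estimate, and the cap of 8 workers is an assumption that should be replaced with whatever your machine can actually sustain.

```python
def suggested_concurrency(process_seconds: float,
                          read_seconds: float = 0.125,
                          max_workers: int = 8) -> int:
    """Rough number of images to process at once.

    process_seconds: measured average OCR time per image.
    read_seconds:    measured average time to read one image from disk
                     (0.125 s is the estimate used above, not a measurement).
    max_workers:     assumed upper bound, e.g. the number of CPU cores.
    """
    if process_seconds <= 0 or read_seconds <= 0:
        raise ValueError("timings must be positive")
    # While one image is being processed, the disk can deliver this many more.
    ratio = int(process_seconds / read_seconds)
    # Never go below 1 worker or above the assumed cap.
    return max(1, min(max_workers, ratio))


if __name__ == "__main__":
    print(suggested_concurrency(1.0))   # one second of processing per image
    print(suggested_concurrency(0.5))   # half a second of processing per image
```

With the estimates from this answer, one second of processing gives 8 concurrent images and half a second gives 4. In .NET, a number like this would typically be passed as `ParallelOptions.MaxDegreeOfParallelism` to `Parallel.ForEach`.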
You should be able to test this fairly quickly: write a program that randomly reads images off the disk, and calculate the average time to open, read, and close the file. Also write a program that processes a sample of the images and compute the average processing time. Those numbers should tell you whether or not concurrent processing will be helpful.
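A minimal sketch of those two measurements, written in Python for brevity (in .NET the same thing can be done with `System.Diagnostics.Stopwatch`). The function names are hypothetical, and `process` stands in for whatever OCR routine you actually call; it is not a real library function.

```python
import random
import time
from pathlib import Path
from typing import Callable, Sequence


def average_read_seconds(paths: Sequence[Path], samples: int = 20) -> float:
    """Average time to open, read, and close randomly chosen files."""
    if not paths:
        raise ValueError("no files to sample")
    chosen = random.sample(list(paths), min(samples, len(paths)))
    start = time.perf_counter()
    for path in chosen:
        with open(path, "rb") as handle:
            handle.read()
    return (time.perf_counter() - start) / len(chosen)


def average_process_seconds(paths: Sequence[Path],
                            process: Callable[[bytes], object]) -> float:
    """Average time spent in `process`, excluding the disk read."""
    if not paths:
        raise ValueError("no files to process")
    total = 0.0
    for path in paths:
        data = path.read_bytes()          # read outside the timed region
        start = time.perf_counter()
        process(data)                     # placeholder for the real OCR call
        total += time.perf_counter() - start
    return total / len(paths)
```

One caveat: files that were read recently are usually served from the operating system's file cache, so run the read test against images that haven't been touched since boot (or a much larger set than fits in memory), otherwise the read time will look better than it really is.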
I think the answer is, 'It Depends'.
I'd try running the application with some type of Performance Monitoring (even the one in Task Manager) and see how high the CPU gets.
If the CPU is maxing out, it would improve performance to run it in parallel. If not, the disk is the bottleneck and, without some other changes, you probably wouldn't get much (if any) gain.
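If you'd rather get a number than watch a graph, the same check can be approximated in code by comparing CPU time with wall-clock time for a run. This is a rough sketch in Python, not a replacement for a real profiler; `cpu_fraction` is a made-up helper name and `workload` stands in for processing one batch of images.

```python
import time
from typing import Callable


def cpu_fraction(workload: Callable[[], None]) -> float:
    """Fraction of elapsed time this process spent on the CPU.

    Close to 1.0 for single-threaded code means one core is saturated
    (CPU-bound); close to 0.0 means the code is mostly waiting, for
    example on the disk.
    """
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    workload()
    cpu_used = time.process_time() - cpu_start
    wall_used = time.perf_counter() - wall_start
    return cpu_used / wall_used if wall_used > 0 else 0.0
```

A value near 1.0 for the existing single-threaded loop suggests that spreading the work across cores should help; a low value suggests the time is going to I/O instead.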