I am trying to automate the instrumentation I am doing on an application, but the applications I am dealing with do not exit by themselves after processing their input. Take any PDF viewer/reader, for example: if I open a file, the file is displayed, and I can see that the application has processed it.
By "processed", I mean that the file has been displayed successfully by the application.
The application can be any GUI PDF viewer (e.g. Adobe Reader, xpdf, Foxit Reader) or any image viewer (e.g. gpicview). The input can be of any type, not one specific file format.
Also, I don't have the source code of the application; I am working with its binary.
While automating the process, I want to know when the application has finished processing the file. My initial idea is that there is some basic block whose execution means the file has been processed, so I could stop my instrumentation once that basic block has executed.
But the problem is: how do I identify that basic block?
Probably the easiest and most reliable thing that you can do automatically for black-box executables is to look at their CPU usage. When they're done loading, all their threads should be (mostly) idle, maybe waking up occasionally if they wait for events with a non-infinite timeout, or in response to miscellaneous GUI events like mouse movement.
Make sure you wait long enough to detect the difference between blocked on disk I/O vs. blocked waiting for user input. (On Unix-like OSes, this is the difference between Disk-sleep and Sleep, as shown by D vs. S in stuff like top's process list.)
If you don't want to rely on the OS to detect disk-sleep vs. regular sleep, just wait a few times longer than the maximum disk I/O request service time (~= a few times disk latency, lower if the process under test is the only process doing I/O). If the black-box process hasn't used any CPU time in that interval, you can assume it's done loading and is displaying the file on screen.
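A rough sketch of that polling loop on Linux, assuming the counters in /proc/<PID>/stat are good enough for your purposes. The helper names (parse_stat, read_stat, wait_until_idle) and the default intervals are made up for illustration, not tuned values. It treats the process as done once utime + stime has stopped increasing for quiet_seconds while the sampled state is not disk-sleep.

```python
import time

def parse_stat(stat_line):
    """Return (state, cpu_ticks) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) may contain spaces or ')', so split after the LAST ')'.
    rest = stat_line.rsplit(")", 1)[1].split()
    state = rest[0]                               # field 3: R, S, D, ...
    utime, stime = int(rest[11]), int(rest[12])   # fields 14 and 15, in clock ticks
    return state, utime + stime

def read_stat(pid):
    # Raises FileNotFoundError if the process has already exited.
    with open(f"/proc/{pid}/stat") as f:
        return parse_stat(f.read())

def wait_until_idle(pid, quiet_seconds=2.0, poll=0.1, timeout=60.0, sample=read_stat):
    """True once the process used no CPU for quiet_seconds, False on timeout.

    Assumption: the state letter in /proc/<pid>/stat describes the main
    thread only, so a worker thread stuck in disk-sleep is not seen here.
    """
    start = quiet_since = time.monotonic()
    _, last_ticks = sample(pid)
    while True:
        time.sleep(poll)
        now = time.monotonic()
        state, ticks = sample(pid)
        if ticks != last_ticks or state == "D":
            # Still burning CPU, or blocked on disk I/O: restart the quiet window.
            last_ticks, quiet_since = ticks, now
        elif now - quiet_since >= quiet_seconds:
            return True
        if now - start >= timeout:
            return False
```

Pick quiet_seconds as a few times your worst-case disk latency, as described above. A GUI program that wakes up on timers may never be perfectly quiet, so in practice you might compare against a small tick budget instead of requiring exactly zero.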
Of course, as @Ped7g points out, it may not have parsed the entire file. It may load it lazily, on-demand, as the user scrolls through a large PDF for example. Watching CPU time should be a reasonable way to detect when a process has finished updating after programmatically sending it a page-down command.
I think you should be able to get good reliable results from this. You might need a heuristic that considers multiple inputs, like system I/O performance or outstanding disk-IO requests, if you want to reliably decide that a process is done loading without waiting as long as the worst-possible case.
As discussed in comments, looking for the process to reach EOF on a file descriptor is not reliable for this purpose (it might mmap it). I'll leave this here in case it's interesting or useful for anyone, but for your use you might want to ignore this entirely. At best, you might use this as an input to your heuristic for deciding when a process is done loading.
On most OSes, there are some facilities for processes to trace other processes. On Linux, the main one is the ptrace API. Commands like strace use it to trace system calls. I believe Windows has something similar, and I assume OS X does, too.
So you can look for the open() system call on the PDF to find the right fd, then look for mmap, read(), and close() system calls on it. If read() returns 0, it's at EOF. If it's closed without mmap, the process is done with it (unless it opens it again, or used dup() or dup2() for some reason).
You could parse strace's text output, or use the ptrace API yourself.
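If you go the text-parsing route, something like the following might be a starting point. It assumes lines in strace's default output format, e.g. from `strace -f -e trace=open,openat,read,mmap,close`, and the function name track_file and its event labels are invented for this sketch. It does not follow dup()/dup2(), does not handle calls that strace splits into "unfinished"/"resumed" lines, and ignores pread64() and friends.

```python
import re

# Patterns for strace's default text format; assumptions, not a full parser.
OPEN_RE = re.compile(r'\bopen(?:at)?\((?:\w+,\s*)?"((?:[^"\\]|\\.)*)".*\)\s*=\s*(\d+)')
READ_RE = re.compile(r'\bread\((\d+),.*\)\s*=\s*(\d+)')
MMAP_RE = re.compile(r'\bmmap\([^)]*,\s*(\d+),\s*\w+\)\s*=\s*0x')
CLOSE_RE = re.compile(r'\bclose\((\d+)\)\s*=\s*0')

def track_file(lines, path):
    """Return a list of (event, fd) for fds opened on `path`.

    Events: "open", "eof" (read() returned 0), "mmap", "close".
    """
    fds = set()      # fds currently open on the file we care about
    events = []
    for line in lines:
        m = OPEN_RE.search(line)
        if m:
            if m.group(1) == path:
                fd = int(m.group(2))
                fds.add(fd)
                events.append(("open", fd))
            continue
        m = READ_RE.search(line)
        if m:
            fd = int(m.group(1))
            if fd in fds and int(m.group(2)) == 0:
                events.append(("eof", fd))
            continue
        m = MMAP_RE.search(line)
        if m:
            fd = int(m.group(1))
            if fd in fds:
                events.append(("mmap", fd))
            continue
        m = CLOSE_RE.search(line)
        if m:
            fd = int(m.group(1))
            if fd in fds:
                fds.discard(fd)
                events.append(("close", fd))
    return events
```

A "close" with no earlier "mmap" for the same fd is the "done with it" case described above. If an "mmap" event shows up, the file contents can still be accessed after the close, so EOF and close tell you nothing.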
Alternatively, on Linux you can look at the file position in /proc/<PID>/fdinfo/<FD>. Other OSes probably have similar facilities for seeing the file position of open file descriptors / file handles.
For example, I happen to have evince open displaying a PDF. In `/proc/
$ ll /proc/4241/fd
...
lr-x------ 1 peter peter 64 Oct 21 06:43 14 -> /f/p/docs/agner_fog.microarchitecture.pdf # is anyone really surprised this is the PDF I had open? :P
...
$ ls -lL /proc/4241/fd/14 # follow the symlink to see the file size
-rw-rw-r-- 1 peter peter 2078709 Feb 4 2016 /proc/4241/fd/14
$ m /proc/4241/fdinfo/14 # alias for less
pos: 2078709
flags: 0100000
mnt_id: 49
This confirms my guess that evince will have the file position at EOF when it's done reading the file. You should probably wait several milliseconds and check again, in case the software under test loops over the file again.
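The same check can be scripted. This is a minimal Linux-only sketch, with hypothetical helper names (parse_fdinfo_pos, find_fds, fd_at_eof), that compares the pos field from fdinfo against the file size the way the ls -lL step above does by hand. It needs permission to read the target's /proc entries (same user, or root).

```python
import os

def parse_fdinfo_pos(text):
    """Extract the file position from the contents of /proc/<pid>/fdinfo/<fd>."""
    for line in text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "pos":
            return int(value)
    raise ValueError("no pos field in fdinfo")

def find_fds(pid, path):
    """Return the fds of `pid` whose /proc symlink points at `path`."""
    target = os.path.realpath(path)
    fds = []
    for name in os.listdir(f"/proc/{pid}/fd"):
        try:
            if os.readlink(f"/proc/{pid}/fd/{name}") == target:
                fds.append(int(name))
        except OSError:
            pass  # fd was closed between listdir() and readlink()
    return fds

def fd_at_eof(pid, fd):
    """True if the file position of `fd` is at (or past) the end of the file."""
    with open(f"/proc/{pid}/fdinfo/{fd}") as f:
        pos = parse_fdinfo_pos(f.read())
    # os.stat() follows the /proc symlink, like `ls -lL`.
    size = os.stat(f"/proc/{pid}/fd/{fd}").st_size
    return pos >= size
```

For example, `all(fd_at_eof(4241, fd) for fd in find_fds(4241, "/f/p/docs/agner_fog.microarchitecture.pdf"))` would reproduce the manual check above. Call it twice a few milliseconds apart, as suggested, and remember that an empty find_fds() result can mean the file was already closed or was only ever mmapped.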