
Out-of-memory error in parfor: kill the slave, not the master

When an out-of-memory error is raised in a parfor, is there any way to kill only one Matlab slave to free some memory instead of having the entire script terminate?

Here is what happens by default when an out-of-memory error occurs in a parfor: the entire script terminates.

I wish there were a way to kill just one slave (i.e. remove a worker from the parpool), or to stop using it, so that as much memory as possible is released from it.

Franck Dernoncourt asked Jan 30 '26 15:01



1 Answer

If you get an out-of-memory error in the master process, there is no chance to fix this. For an out-of-memory error on a slave, this should do it:

The simple idea of the code: restart the parfor again and again with the missing data until you have all results. If one iteration fails, a flag (a file) is written, which makes all remaining iterations throw an error as soon as the first error has occurred. This way we get out of the loop without wasting time producing further out-of-memory errors.

%Your intended iterator
iterator=1:10;
%flags which indicate what succeeded
succeeded=false(size(iterator));
%result array
result=nan(size(iterator));
FLAG='ANY_WORKER_CRASHED';
while ~all(succeeded)
    fprintf('Another try\n')
    %determine which iterations should be done
    todo=iterator(~succeeded);
    %initialize array for the remaining results
    partresult=nan(size(todo));
    %initialize flags which indicate which iterations succeeded (we cannot
    %throw errors, that would throw away the results)
    partsucceeded=false(size(todo));
    %flag indicates that any worker crashed. Have to use a file-based
    %solution, don't know a better one.
    if exist(FLAG,'file'), delete(FLAG); end
    try
    parfor falseindex=1:sum(~succeeded)
        realindex=todo(falseindex);
        try
            % The flag is used to let all other workers jump out of the
            % loop as soon as one calculation has crashed.
            if exist(FLAG,'file')
                error('some other worker crashed');
            end
            % insert your code here
            %dummy code which randomly throws an exception
            if rand<.5
                error('hit out of memory')
            end
            partresult(falseindex)=realindex*2;
            % End of user code
            partsucceeded(falseindex)=true;
            fprintf('trying to run %d and succeeded\n',realindex)
        catch ME
            % catch errors within workers to preserve work
            partresult(falseindex)=nan;
            partsucceeded(falseindex)=false;
            fprintf('trying to run %d but it failed\n',realindex)
            fclose(fopen(FLAG,'w'));
        end
    end
    catch
        %reduce poolsize by 1
        newsize = matlabpool('size')-1;
        matlabpool close
        matlabpool(newsize) 
    end
    %put the result of the current iteration into the full result
    result(~succeeded)=partresult;
    succeeded(~succeeded)=partsucceeded;
end
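
The `matlabpool` calls in the `catch` block above belong to older MATLAB releases. In newer releases, where `matlabpool` is no longer available, the same "reduce the pool size by 1" step could probably be written with `gcp` and `parpool`. This is a minimal sketch under that assumption, not part of the original answer; the helper name `shrinkPool` is hypothetical.

```matlab
function newsize = shrinkPool()
%SHRINKPOOL Restart the current parallel pool with one worker fewer.
%   Hypothetical helper for illustration; assumes the Parallel Computing
%   Toolbox with parpool/gcp is available.
    pool = gcp('nocreate');      % current pool, or empty if none is open
    if isempty(pool)
        newsize = 0;             % nothing to shrink
        return
    end
    newsize = pool.NumWorkers - 1;
    delete(pool);                % shuts down all workers of the pool
    if newsize >= 1
        parpool(newsize);        % reopen with the reduced worker count
    end
end
```

Calling `shrinkPool();` inside the `catch` block would take the place of the three `matlabpool` lines. If the count reaches zero, no pool is reopened; depending on your parallel preferences, the next parfor may then run serially or start a default-sized pool on its own.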
Daniel answered Feb 02 '26 07:02



