The OpenCL 1.1 specification says:
cl_int clEnqueueBarrier(cl_command_queue command_queue)
clEnqueueBarrier is a synchronization point that ensures that all queued commands in command_queue have finished execution before the next batch of commands can begin execution.
cl_int clFinish(cl_command_queue command_queue)
Blocks until all previously queued OpenCL commands in command_queue are issued to the associated device and have completed. clFinish does not return until all queued commands in command_queue have been processed and completed. clFinish is also a synchronization point.
This presumably has something to do with in-order versus out-of-order execution, but I can't see the difference between the two calls. Is either of them ever needed if I have in-order execution? At the moment I do something like:
...
for(...){
    clEnqueueNDRangeKernel(...);
    clFlush(command_queue);
    clFinish(command_queue);
}
...
on an Nvidia GPU. Any relevant comment is appreciated.
If you are using an out-of-order queue, enqueueing a barrier is one way to enforce a dependency between commands. You could also use cl_event objects to ensure correct ordering of commands on the command queue.
If you are writing your code such that you call a clFinish after every kernel invocation, then using clEnqueueBarrier will not have any impact on your code, since you are already ensuring ordering.
The point of using the clEnqueueBarrier would be a case such as:
clEnqueueNDRangeKernel(queue, kernel1);
clEnqueueBarrier(queue);
clEnqueueNDRangeKernel(queue, kernel2);
In this case, kernel2 depends on the results of kernel1. If this queue is out-of-order, then without the barrier kernel2 could execute before kernel1, causing incorrect behavior. You could achieve the same ordering with:
clEnqueueNDRangeKernel(queue, kernel1);
clFinish(queue);
clEnqueueNDRangeKernel(queue, kernel2);
because clFinish will wait until the queue is empty (all kernels and data transfers have finished). However, clFinish blocks the host until kernel1 finishes, whereas clEnqueueBarrier should return control to the application immediately, allowing you to enqueue more kernels or perform other useful work.
As a side note, I think that clFinish implicitly flushes the queue (the specification text quoted above says it blocks until the commands are issued to the device and have completed), so you should not need to call clFlush before every clFinish.
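To make the ordering argument above concrete, here is a toy model in plain C. It is not the OpenCL API and does not talk to any device: the names `cmd`, `CMD_KERNEL`, `CMD_BARRIER` and `run_out_of_order` are all hypothetical, invented for this sketch. It assumes the worst case for an out-of-order queue, namely that commands between two barriers run in reverse submission order, and shows that a barrier still keeps everything before it ahead of everything after it.

```c
#include <stddef.h>

/* Toy model only: this is NOT the OpenCL API. All names are hypothetical. */
enum cmd_kind { CMD_KERNEL, CMD_BARRIER };

struct cmd {
    enum cmd_kind kind;
    int id; /* identifies a kernel; ignored for barriers */
};

/*
 * Simulate one legal schedule of an out-of-order queue.
 * Commands between two barriers form a batch. As a worst case, each
 * batch is run in reverse submission order, but no command is ever
 * moved across a barrier.
 * Writes the ids of the executed kernels to out (which must have room
 * for n entries) and returns how many kernels were executed.
 */
size_t run_out_of_order(const struct cmd *cmds, size_t n, int *out)
{
    size_t written = 0;
    size_t batch_start = 0;

    for (size_t i = 0; i <= n; i++) {
        /* A batch ends at a barrier or at the end of the queue. */
        if (i == n || cmds[i].kind == CMD_BARRIER) {
            /* Run the batch [batch_start, i) back to front. */
            for (size_t j = i; j > batch_start; j--) {
                out[written++] = cmds[j - 1].id;
            }
            batch_start = i + 1; /* next batch starts after the barrier */
        }
    }
    return written;
}
```

With the sequence kernel 1, barrier, kernel 2, this model executes 1 and then 2. With the barrier removed, it executes 2 and then 1, which is the reordering the barrier is there to prevent. A real out-of-order queue is free to pick any order inside a batch; reversing is just the simplest way to show that the order is not guaranteed.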