I've been having some issues with transferring a GPU buffer to the CPU in order to perform sorting operations. The buffer is a GL_SHADER_STORAGE_BUFFER holding 300,000 float values. The transfer takes around 10 ms with glGetBufferSubData, and more than 100 ms with glMapBufferRange.
The code I'm using is the following:
std::vector<GLfloat> viewRow;
unsigned int viewRowBuffer = -1;
int length = -1;
void bindRowBuffer(unsigned int buffer){
glBindBuffer(GL_SHADER_STORAGE_BUFFER, buffer);
glBindBufferBase(GL_SHADER_STORAGE_BUFFER, 3, buffer);
}
void initRowBuffer(unsigned int &buffer, std::vector<GLfloat> &row, int lengthIn){
// Generate and initialize buffer
length = lengthIn;
row.resize(length);
memset(&row[0], 0, length*sizeof(float));
glGenBuffers(1, &buffer);
bindRowBuffer(buffer);
glBufferStorage(GL_SHADER_STORAGE_BUFFER, row.size() * sizeof(float), &row[0], GL_DYNAMIC_STORAGE_BIT | GL_MAP_READ_BIT | GL_MAP_WRITE_BIT);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}
void cleanRowBuffer(unsigned int buffer) {
float zero = 0.0;
glClearNamedBufferData(buffer, GL_R32F, GL_RED, GL_FLOAT, &zero);
}
void readGPUbuffer(unsigned int buffer, std::vector<GLfloat> &row) {
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0,length *sizeof(float),&row[0]);
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}
void readGPUMapBuffer(unsigned int buffer, std::vector<GLfloat> &row) {
float* data = (float*)glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, length*sizeof(float), GL_MAP_READ_BIT);
memcpy(&row[0], data, length *sizeof(float));
glUnmapBuffer(GL_SHADER_STORAGE_BUFFER);
// Unbind only after unmapping: glUnmapBuffer acts on the buffer currently bound to the target
glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
}
The main loop does the following:
bindRowBuffer(viewRowBuffer);
cleanRowBuffer(viewRowBuffer);
countPixs.bind();
glActiveTexture(GL_TEXTURE0);
glBindTexture(GL_TEXTURE_2D, gPatch);
countPixs.setInt("gPatch", 0);
countPixs.run(SCR_WIDTH/8, SCR_HEIGHT/8, 1);
countPixs.unbind();
readGPUbuffer(viewRowBuffer, viewRow);
Here countPixs is a compute shader, but I'm positive the problem is not there, because if I comment out the run command, the read takes exactly the same amount of time.
The weird thing is that if I execute a glGetBufferSubData of only 1 float:
glGetBufferSubData(GL_SHADER_STORAGE_BUFFER,0, 1 *sizeof(float),&row[0]);
it takes exactly the same time... so I'm guessing there is something wrong all the way through... maybe related to GL_SHADER_STORAGE_BUFFER?
This is likely a delay caused by GPU-CPU synchronization (a round trip). That is, once you map or read the buffer, all previous GL commands that touched the buffer must complete immediately, which stalls the pipeline. Note that drivers are lazy: it is very probable that those GL commands have not even started executing yet.
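One way to check this is to time an explicit glFinish() separately from the read itself. The following is only a minimal sketch: elapsedMs is a hypothetical helper that is not part of the question's code, and the GL calls appear only in comments because they need a current OpenGL context.

```cpp
#include <chrono>
#include <utility>

// Runs `work` once and returns the wall-clock time it took, in milliseconds.
template <typename Work>
double elapsedMs(Work&& work) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<Work>(work)();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Intended use with the question's functions (requires a current GL context):
//   double waitMs = elapsedMs([] { glFinish(); });  // drain all pending GPU work
//   double copyMs = elapsedMs([&] { readGPUbuffer(viewRowBuffer, viewRow); });
```

If waitMs accounts for most of the original 10 ms and copyMs becomes small, the cost is the synchronization rather than the transfer, which would also explain why reading 1 float takes as long as reading 300,000.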
If you can, create the buffer with glBufferStorage(..., GL_MAP_PERSISTENT_BIT) and map it persistently. This completely avoids re-mapping and allocating GPU memory, and you can keep the mapped pointer across draw calls, with some caveats:
After reading the GL 4.5 docs a bit more, I found out that glFenceSync is mandatory in order to guarantee that the data has arrived from the GPU, even with GL_MAP_COHERENT_BIT:
If GL_MAP_COHERENT_BIT is set and the server does a write, the app must call glFenceSync with GL_SYNC_GPU_COMMANDS_COMPLETE (or glFinish). Then the CPU will see the writes after the sync is complete.
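Putting the two points together, a persistent, coherent read-back could look roughly like the sketch below. Treat it as an illustration under assumptions, not a drop-in replacement: PersistentRowBuffer, createPersistentRowBuffer and readPersistentRowBuffer are hypothetical names, GLEW is assumed as the loader, a GL 4.4+ context is assumed to be current, and the glMemoryBarrier call is included because the data is written by a compute shader.

```cpp
#include <GL/glew.h>  // assumption: GLEW as the loader; any GL 4.4+ loader should do
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical holder for a persistently mapped read-back buffer.
struct PersistentRowBuffer {
    GLuint id = 0;
    const float* mapped = nullptr;  // valid until the buffer is unmapped or deleted
    std::size_t count = 0;
};

// Create the buffer once and map it once. Requires a current GL 4.4+ context.
PersistentRowBuffer createPersistentRowBuffer(std::size_t count) {
    PersistentRowBuffer rb;
    rb.count = count;
    const GLbitfield flags =
        GL_MAP_READ_BIT | GL_MAP_PERSISTENT_BIT | GL_MAP_COHERENT_BIT;
    const std::vector<float> zeros(count, 0.0f);
    glGenBuffers(1, &rb.id);
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, rb.id);
    glBufferStorage(GL_SHADER_STORAGE_BUFFER, count * sizeof(float), zeros.data(), flags);
    rb.mapped = static_cast<const float*>(
        glMapBufferRange(GL_SHADER_STORAGE_BUFFER, 0, count * sizeof(float), flags));
    glBindBuffer(GL_SHADER_STORAGE_BUFFER, 0);
    return rb;
}

// Call after the compute dispatch. Returns false if mapping failed or the wait
// timed out or failed; otherwise copies the buffer contents into `row`.
bool readPersistentRowBuffer(const PersistentRowBuffer& rb, std::vector<float>& row) {
    if (rb.mapped == nullptr) return false;
    // Make shader writes visible through the client mapping.
    glMemoryBarrier(GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT);
    GLsync fence = glFenceSync(GL_SYNC_GPU_COMMANDS_COMPLETE, 0);
    // Wait up to 1 s (timeout is in nanoseconds); the flush bit submits the fence.
    const GLenum status =
        glClientWaitSync(fence, GL_SYNC_FLUSH_COMMANDS_BIT, 1000000000ull);
    glDeleteSync(fence);
    if (status != GL_ALREADY_SIGNALED && status != GL_CONDITION_SATISFIED) return false;
    row.resize(rb.count);
    std::memcpy(row.data(), rb.mapped, rb.count * sizeof(float));
    return true;
}
```

Note that waiting on the fence right after the dispatch still blocks the CPU until the GPU is done. To actually hide the latency, you would place the fence in one frame and wait on it (or poll it with a zero timeout) in a later one.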