I have a small piece of code which runs perfectly on Nvidia old architecture (Tesla T10 processor) but not on Fermi (Tesla M2090)
I learned that Fermi behaves slightly differently. Due to which unsafe code might work correctly on old architectures, while on Fermi it catches the bug.
But I don't know how to resolve it.
Here is my code:
__global__void exec (int *arr_ptr, int size, int *result) {int tx = threadIdx.x; int ty = threadIdx.y; *result = arr_ptr[-2];}
void run(int *arr_dev, int size, int *result) {
cudaStream_t stream = 0; int *arr_ptr = arr_dev + 5; dim3 threads(1,1,1); dim3 grid (1,1); exec<<<grid, threads, 0, stream>>>(arr_ptr, size, result);}
since I am accessing arr_ptr[-2], the fermi throws CUDA_EXCEPTION_10, Device Illegal Address. But it is not. The address is legal.
Can anyone help me on this.
My driver code is
int main(){
int *arr;
int *arr_dev = NULL;
int result = 1;
arr = (int*)malloc(10*sizeof(int));
for(int i = 0; i < 10; i++)
arr[i] = i;
if(arr_dev == NULL)
{
cudaMalloc((void**)&arr_dev, 10);
cudaMemcpy(arr_dev, arr, 10*sizeof(int), cudaMemcpyHostToDevice);
}
run(arr_dev, 10, &result);
printf("%d \n", result);
return 0;
}
Fermi cards have much better memory protection on the device and will detect out of bounds conditions which will appear to "work" on older cards. Use cuda-memchk (or the cuda-memchk mode in cuda-gdb) to get a better handle on what is going wrong.
EDIT:
This is the culprit:
cudaMalloc((void**)&arr_dev, 10);
which should be
cudaMalloc((void**)&arr_dev, 10*sizeof(int));
This will result in this code
int *arr_ptr = arr_dev + 5;
passing a pointer to the device which is out of bounds.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With