Why are non-positive strides disallowed in the blas gemm family of functions?

Question

The netlib documentation of sgemm states that the array strides LDA and LDB must be >= 1, and large enough so that columns don't overlap. Indeed, the implementation in Apple's Accelerate/veclib framework checks these conditions and exists if they are violated.

I really don't understand why this limitation exists. Can't BLAS simply believe me that I really want a stride of zero, or a negative stride? As far as I understand, Fortran integers are signed by default, so parameter types don't seem to be the reaons (disclaimer: I don't know Fortan very well).

Indeed, there exist very reasonable use cases of non-positive array strides:

zero stride: in a multi-dimensional array class, a stride of zero enables numpy-style broadcasting.
negative stride: negating a stride allows to view an array in reverse order along any axis without copying. This can be useful e.g. when flipping convolution kernels, and convolutions can be implemented efficiently using gemm. Alternatively, the vertical axis of an image can be flipped, which is handy since there exist different conventions: axis pointing upwards in postscript/pdf, and downwards in the png format (and many others).

I am interested in two aspects:

I'd like to understand why the restriction exists. Is it really just because the designers of BLAS did not think about such use cases? Am I the victim of someone trying to catch a bug that's indeed a feature? Or does the restriction result in better performance? I have a hard time imagining the latter.
Is there a way to work around the restriction without sacrificing (too much) performance? For now I need something that works on a Mac in C++, but long term it should still be based on BLAS, so it works across platforms.

alfC · Accepted Answer

EDIT 2024:

It turns out that axpy was a bad example. The answer is that more general BLAS functions do not generally accept stride == 0 arrays. See bellow for details.

ANSWER from 2023 (outdated):

Regarding the case of zero strides: I am using a C++ multidimensional array library (disclaimer: my own) that supports broadcasting and BLAS interfaces. I can try many cases quickly to see what BLAS versions support zero strides.

Here is the example of axpy; in Linux at least, all the versions of BLAS I have available support zero strides (broadcasted arrays).

#include<multi/array.hpp>  // from https://gitlab.com/correaa/boost-multi
#include<multi/adaptors/blas.hpp>

namespace multi = boost::multi;

int main() {
    multi::array<double, 1> y = {1.0, 2.0, 3.0};

    // regular arrays, stride != 0
    multi::array<double, 1> const x = {1.0, 1.0, 1.0};

    multi::blas::axpy(1.0, x.begin(), y);

    assert(y[0] == 2.0);
    assert(y[1] == 3.0);
    assert(y[2] == 4.0);

    // broadcasted array of 0D into 1D, stride == 0
    multi::array<double, 0> const x0(1.0);

    // x0.broadcasted() is {..., 1.0, 1.0, ...}

    multi::blas::axpy(1.0, x0.broadcasted().begin(), y);

    assert(y[0] == 3.0);
    assert(y[1] == 4.0);
    assert(y[2] == 5.0);
}

The program compiles and runs correctly with BLAS (netlib), openBLAS, MKL, NVHPC (PGI), ATLAS and BLIS. (Unfortunately, I don't have a Mac device to Apple libs.)

The good news is that none of these libraries check for zero strides and give the correct answer, at least for axpy.

c++ axpy.cpp -Iinclude/ `pkg-config --libs blas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs openblas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs mkl-static-ilp64-seq` && ./a.out
c++ axpy.cpp -Iinclude/ /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/lib/libblas.so && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs blas-atlas` && ./a.out
c++ axpy.cpp -Iinclude/ /usr/lib/x86_64-linux-gnu/blis64-openmp/libblas64.so && ./a.out

EDIT 2024:

It turns out that axpy was a bad example. When I try to do the same passing a (constant) array with stride 0 to GEMV, all the implementations of BLAS that I have access to give an error message and return 0 values in the destination.

$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas` && ./a.out

 ** On entry to DGEMV  parameter number  8 had an illegal value

$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs openblas` && ./a.out

 ** On entry to DGEMV  parameter number  8 had an illegal value

$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas-atlas` && ./a.out 

 ** On entry to DGEMV  parameter number  8 had an illegal value

$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /usr/lib/x86_64-linux-gnu/blis-openmp/libblas.so && ./a.out 

 ** On entry to DGEMV  parameter number  8 had an illegal value

$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs mkl-static-ilp64-seq` && ./a.out

Intel oneMKL ERROR: Parameter 8 was incorrect on entry to DGEMV .

/opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/bin/nvc++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/lib/libblas_lp64.so.0 && ./a.out 

 ** On entry to DGEMV  parameter number  8 had an illegal value

Of course, parameter number 8 is incx, the stride of the input vector. https://www.netlib.org/lapack/explore-html/d7/dda/group__gemv.html

Even in Apple Mac the test fail (hey! at least I learned that in Mac, BLAS uses cblas). (It also terminates immediately in Mac, which I like)

 BLAS error: Parameter number 9 passed to cblas_dgemv had an invalid value

This is the code to test. Note that for the version with stride 1 there is no problem.

    multi::array<double, 2> const a = {
        {1.0, 2.0, 3.0},
        {4.0, 5.0, 6.0}
    };

    {
        multi::array<double, 1> const ones({3}, 1.0);

        assert( ones.stride() == 1 );

        assert( ones[0] == 1.0 );
        assert( ones[1] == 1.0 );
        assert( ones[2] == 1.0 );

        multi::array<double, 1> sum_by_rows({2}, 0.0);
        blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());

        std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "
";
        BOOST_TEST( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
        BOOST_TEST( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
    }
    // the code below simulates the same operation with an array of stride zero
    {
        multi::array<double, 0> const one(1.0);
        auto const& ones = one.broadcasted();  // BLAS doesn't work with stride zero

        assert( ones.stride() == 0 );

        assert( ones[0] == 1.0 );
        assert( ones[1] == 1.0 );
        assert( ones[2] == 1.0 );

        multi::array<double, 1> sum_by_rows({2}, 0.0);
        blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());

        std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "
";
        assert( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
        assert( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
    }

I am going to archive the test here: https://gitlab.com/correaa/boost-multi/-/blob/master/include/boost/multi/adaptors/blas/test/gemv.cpp

Why are non-positive strides disallowed in the blas gemm family of functions?

Tags:

c++

arrays

c

blas

stride

tglas

1 Answers

alfC

Recent Activity

Donate For Us

Why are non-positive strides disallowed in the blas gemm family of functions?

Tags:

c++

arrays

c

blas

stride

tglas

1 Answers

alfC

Related questions

Recent Activity

Donate For Us