The netlib documentation of sgemm states that the array strides LDA and LDB must be >= 1, and large enough so that columns don't overlap. Indeed, the implementation in Apple's Accelerate/veclib framework checks these conditions and exists if they are violated.
I really don't understand why this limitation exists. Can't BLAS simply believe me that I really want a stride of zero, or a negative stride? As far as I understand, Fortran integers are signed by default, so parameter types don't seem to be the reaons (disclaimer: I don't know Fortan very well).
Indeed, there exist very reasonable use cases of non-positive array strides:
I am interested in two aspects:
EDIT 2024:
It turns out that axpy was a bad example.
The answer is that more general BLAS functions do not generally accept stride == 0 arrays.
See bellow for details.
ANSWER from 2023 (outdated):
Regarding the case of zero strides: I am using a C++ multidimensional array library (disclaimer: my own) that supports broadcasting and BLAS interfaces. I can try many cases quickly to see what BLAS versions support zero strides.
Here is the example of axpy; in Linux at least, all the versions of BLAS I have available support zero strides (broadcasted arrays).
#include<multi/array.hpp> // from https://gitlab.com/correaa/boost-multi
#include<multi/adaptors/blas.hpp>
namespace multi = boost::multi;
int main() {
multi::array<double, 1> y = {1.0, 2.0, 3.0};
// regular arrays, stride != 0
multi::array<double, 1> const x = {1.0, 1.0, 1.0};
multi::blas::axpy(1.0, x.begin(), y);
assert(y[0] == 2.0);
assert(y[1] == 3.0);
assert(y[2] == 4.0);
// broadcasted array of 0D into 1D, stride == 0
multi::array<double, 0> const x0(1.0);
// x0.broadcasted() is {..., 1.0, 1.0, ...}
multi::blas::axpy(1.0, x0.broadcasted().begin(), y);
assert(y[0] == 3.0);
assert(y[1] == 4.0);
assert(y[2] == 5.0);
}
The program compiles and runs correctly with BLAS (netlib), openBLAS, MKL, NVHPC (PGI), ATLAS and BLIS. (Unfortunately, I don't have a Mac device to Apple libs.)
The good news is that none of these libraries check for zero strides and give the correct answer, at least for axpy.
c++ axpy.cpp -Iinclude/ `pkg-config --libs blas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs openblas` && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs mkl-static-ilp64-seq` && ./a.out
c++ axpy.cpp -Iinclude/ /opt/nvidia/hpc_sdk/Linux_x86_64/23.9/compilers/lib/libblas.so && ./a.out
c++ axpy.cpp -Iinclude/ `pkg-config --libs blas-atlas` && ./a.out
c++ axpy.cpp -Iinclude/ /usr/lib/x86_64-linux-gnu/blis64-openmp/libblas64.so && ./a.out
EDIT 2024:
It turns out that axpy was a bad example.
When I try to do the same passing a (constant) array with stride 0 to GEMV, all the implementations of BLAS that I have access to give an error message and return 0 values in the destination.
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs openblas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs blas-atlas` && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /usr/lib/x86_64-linux-gnu/blis-openmp/libblas.so && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
$ c++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp `pkg-config --libs mkl-static-ilp64-seq` && ./a.out
Intel oneMKL ERROR: Parameter 8 was incorrect on entry to DGEMV .
/opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/bin/nvc++ -std=c++17 -I./include/ include/boost/multi/adaptors/blas/test/gemv.cpp /opt/nvidia/hpc_sdk/Linux_x86_64/24.7/compilers/lib/libblas_lp64.so.0 && ./a.out
** On entry to DGEMV parameter number 8 had an illegal value
Of course, parameter number 8 is incx, the stride of the input vector. https://www.netlib.org/lapack/explore-html/d7/dda/group__gemv.html
Even in Apple Mac the test fail (hey! at least I learned that in Mac, BLAS uses cblas). (It also terminates immediately in Mac, which I like)
BLAS error: Parameter number 9 passed to cblas_dgemv had an invalid value
This is the code to test. Note that for the version with stride 1 there is no problem.
multi::array<double, 2> const a = {
{1.0, 2.0, 3.0},
{4.0, 5.0, 6.0}
};
{
multi::array<double, 1> const ones({3}, 1.0);
assert( ones.stride() == 1 );
assert( ones[0] == 1.0 );
assert( ones[1] == 1.0 );
assert( ones[2] == 1.0 );
multi::array<double, 1> sum_by_rows({2}, 0.0);
blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());
std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "\n";
BOOST_TEST( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
BOOST_TEST( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
}
// the code below simulates the same operation with an array of stride zero
{
multi::array<double, 0> const one(1.0);
auto const& ones = one.broadcasted(); // BLAS doesn't work with stride zero
assert( ones.stride() == 0 );
assert( ones[0] == 1.0 );
assert( ones[1] == 1.0 );
assert( ones[2] == 1.0 );
multi::array<double, 1> sum_by_rows({2}, 0.0);
blas::gemv_n(1.0, a.begin(), 2, ones.begin(), 0.0, sum_by_rows.begin());
std::cout << sum_by_rows[0] << " " << sum_by_rows[1] << "\n";
assert( std::abs( sum_by_rows[0] - (1.0 + 2.0 + 3.0)) < 1.0e-8 );
assert( std::abs( sum_by_rows[1] - (4.0 + 5.0 + 6.0)) < 1.0e-8 );
}
I am going to archive the test here: https://gitlab.com/correaa/boost-multi/-/blob/master/include/boost/multi/adaptors/blas/test/gemv.cpp
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With