 

FLOPS what really is a FLOP

Tags:

c

flops

I came from this thread: FLOPS Intel core and testing it with C (innerproduct)

As I began writing simple test scripts, a few questions came into my mind.

  1. Why floating point? What is so significant about floating point that we have to consider it? Why not a simple int?

  2. If I want to measure FLOPS, let's say I am doing the inner product of two vectors. Must the two vectors be float[]? How will the measurement differ if I use int[]?

  3. I am not familiar with Intel architectures. Let's say I have the following operations:

    float a = 3.14159; float b = 3.14158;
    for(int i = 0; i < 100; ++i) {
        a + b;
    }
    

    How many "floating point operations" is this?

  4. I am a bit confused because I studied a simplified 32-bit MIPS architecture, where every instruction is 32 bits, with e.g. 5 bits for operand 1 and 5 bits for operand 2. For Intel architectures (specifically the one from the previous thread), I was told that a register can hold 128 bits. With SINGLE PRECISION floating point, at 32 bits per floating point number, does that mean each instruction fed to the processor can take 4 floating point numbers? Don't we also have to account for the bits used by the opcode and other parts of the instruction? How can we just feed 4 floating point numbers to a CPU without any specific meaning attached to them?

I don't know whether my approach of thinking about everything in bits and pieces makes sense. If not, from what "height" of perspective should I be looking at this?


2 Answers

1.) Floating point operations simply represent a much wider range of values than fixed-width integers can. Additionally, heavily numerical or scientific applications (which are typically the ones that actually test a CPU's pure computational power) probably rely on floating point ops more than anything else.

2.) They would both have to be float. The CPU won't add an integer and a float directly; one or the other would be implicitly converted (most likely the integer would be converted to a float), so it would still just be floating point operations.

3.) That would be 100 floating point operations, as well as 100 integer operations (the loop counter increments), plus some (100?) control-flow/branch/comparison operations. There would generally also be loads and stores, but you don't seem to be storing the value :)

4.) I'm not sure where to begin with this one; you seem to have a general perspective on the material, but you have confused some of the details. Yes, an individual instruction may be partitioned into sections similar to:

|OP CODE | Operand 1 | Operand 2 | (among many, many others)

However, operand 1 and operand 2 don't have to contain the actual values to be added. They could just contain the registers to be added. For example take this SSE instruction:

mulps      %xmm3, %xmm1

It's telling the execution unit to multiply the contents of register xmm3 by the contents of xmm1 and store the result in xmm1 (in AT&T syntax, the second operand is the destination). Since the registers hold 128-bit values, the operation works on 128-bit values; this is independent of the size of the instruction. Unfortunately, x86 does not have a fixed instruction breakdown like MIPS, because it is a CISC architecture: an x86 instruction can be anywhere between 1 and 15(!) bytes long.

As for your question, I think this is all very fun stuff to know, and it helps you build intuition about the speed of math-intensive programs, as well as giving you a sense of the upper limits to be reached when optimizing. I'd never try to directly correlate this with the actual run time of a program, though, as too many other factors contribute to the final performance.

Answered by Falaina


  1. Floating point and integer operations use different pipelines on the chip, so they run at different speeds (on simple or old enough architectures there may be no native floating point support at all, making floating point operations very slow). So if you are trying to estimate real-world performance for problems that use floating point math, you need to know how fast these operations are.

  2. Yes, you must use floating point data. See #1.

  3. A FLOP is typically defined as an average over a particular mixture of operations intended to be representative of the real-world problem you want to model. For your loop, you would just count each addition as 1 operation, making a total of 100 operations. BUT: this is not representative of most real-world jobs, and you may have to take steps to prevent the compiler from optimizing all the work out.

  4. Vectorized or SIMD (Single Instruction Multiple Data) units can do exactly that. Examples of SIMD systems in use right now include AltiVec (on PowerPC series chips) and MMX/SSE/... on Intel x86 and compatibles. Such improvements in chips should get credit for doing more work, so your trivial loop above would still be counted as 100 operations even if there are only 25 fetch-and-work cycles. Compilers either need to be very smart or receive hints from the programmer to make use of the SIMD units (but most front-line compilers are very smart these days).

Answered by dmckee --- ex-moderator kitten


