Fused Multiply Add (FMA) – One flop or two?
I am having a friendly argument with a colleague over how to calculate the peak number of floating-point operations per second (flops) for devices that support Fused Multiply Add (FMA). An FMA computes d=a+b*c as a single instruction, and devices that support it can typically issue one per cycle on each FMA unit.
I say that an FMA operation is two flops; he says it’s one. So when I calculate the theoretical peak of a device, I get twice the value he does. What do you think: is FMA one flop or two?
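
To make the disagreement concrete, here is a minimal sketch of the two ways of counting. The device numbers (8 cores, 2.0 GHz, 2 FMA units per core, 4 SIMD lanes) are made up for illustration and do not describe any real hardware, and `peak_flops` is just a helper name I picked for this example.

```python
def peak_flops(cores, clock_hz, fma_units_per_core, simd_lanes, flops_per_fma):
    """Theoretical peak: FMAs issued per second times flops credited per FMA.

    Assumes every FMA unit issues one FMA per lane per cycle.
    """
    fmas_per_second = cores * clock_hz * fma_units_per_core * simd_lanes
    return fmas_per_second * flops_per_fma


# Hypothetical device, chosen only to give round numbers.
device = dict(cores=8, clock_hz=2.0e9, fma_units_per_core=2, simd_lanes=4)

peak_one = peak_flops(**device, flops_per_fma=1)  # my colleague's count
peak_two = peak_flops(**device, flops_per_fma=2)  # my count: one multiply + one add

print(f"FMA counted as 1 flop:  {peak_one / 1e9:.0f} GFLOPS")  # 128 GFLOPS
print(f"FMA counted as 2 flops: {peak_two / 1e9:.0f} GFLOPS")  # 256 GFLOPS
```

The hardware is the same in both cases; the only thing that changes is `flops_per_fma`, which is exactly the factor of two we are arguing about.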