I ran into a basic type stability issue where dividing two Integers will produce some concrete type of AbstractFloat.
typeof(60 * 5 / 60)
> Float64
Now this is the safe thing to do, but it incurs runtime overhead converting to a float.
What if we know that division will always result in a number with remainder 0, ie. an Integer?  
We can use either:
div(60 * 5 , 60) 
fld(60 * 5 , 60)  
Which gives us some concrete type of Integer, however this approach still has overhead which we can see from the LLVM IR:
@code_llvm  div(60 * 5 , 60)
So is there any magic we can do to remove the runtime overhead when we know that the result will not have a remainder?
Possible Solution Paths:
I would prefer this be solved using a Julia construct, even if we need to create it, rather than injecting LLVM IR... But then again, we could just wrap that injection into a Julia function...
Or maybe we need a macro like @inbounds for safe integer division resulting in an integer.
Or maybe there is some purely mathematical way to perform this that applies to any language?
Integer division is one of the slowest cache-independent operations on a CPU; indeed, floating-point division is faster on most CPUs (test it yourself and see). If you know what you'll be dividing by in advance (and want to divide by it many times), it can be worth precomputing factors that allow you to replace integer-division with multiplication/shift/add. There are many sites that describe this basic idea, here's one.
For an implementation in julia, see https://gist.github.com/simonster/a3b691e71cc2b3826e39
You're right — there is a little bit of overhead in the div function, but it's not because there may be a remainder.  It's because div(typemin(Int),-1) is an error, as is div(x, 0).  So the overhead you're seeing in @code_llvm is just the checks for those cases.  The LLVM instruction that you want is just sdiv i64 %0, %1… and the processor will even throw a SIGFPE in those error conditions.  We can use llvmcall to create our own "overhead-free" version:
julia> unsafe_div(x::Int64,y::Int64) = Base.llvmcall("""
           %3 = sdiv i64 %0, %1
           ret i64 %3""", Int64, Tuple{Int64, Int64}, x, y)
unsafe_div (generic function with 1 method)
julia> unsafe_div(8,3)
2
julia> @code_llvm unsafe_div(8,3)
define i64 @julia_unsafe_div_21585(i64, i64) {
top:
  %2 = sdiv i64 %0, %1
  ret i64 %2
}
julia> unsafe_div(8,0)
ERROR: DivideError: integer division error
 in unsafe_div at none:1
So if that works, why does Julia insist on inserting those checks into the LLVM IR itself? It's because LLVM considers those error cases to be undefined behavior within its optimization passes. So if LLVM can ever prove that it would error through static analysis, it changes its output to skip the division (and subsequent exception) entirely! This custom div function is indeed unsafe:
julia> f() = unsafe_div(8,0)
f (generic function with 2 methods)
julia> f()
13315560704
julia> @code_llvm f()
define i64 @julia_f_21626() {
top:
  ret i64 undef
}
On my machine (an old Nehalem i5), this unsafe version can speed div up by about 5-10%, so the overhead here isn't really all that terrible relative to the inherent cost of integer division.  As @tholy notes, it's still really slow compared to almost all other CPU operations, so if you're frequently dividing by the same number, you may want to investigate the alternatives in his answer.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With