According to Wikipedia, the uniform distribution is the "maximum entropy probability distribution". Thus, if I have two sequences of length k, one uniformly distributed and one consisting of a single repeated value, I would expect the entropy of the uniformly distributed sequence to be higher than that of the repeated-value sequence. However, this is not what I observe when running the following code in R:
require(entropy)
entropy(runif(1024), method="ML", unit="log2")
entropy(rep(1,1024), method="ML", unit="log2")
The first output produces around 9.7 bits of entropy, while the second produces exactly 10 bits of entropy (log base 2 of 1024 = 10). Why does the uniform distribution not have more than 10 bits of entropy?
I think you are misunderstanding what the first argument, y, in entropy() represents. As mentioned in ?entropy, it is a vector of counts. Together, those counts give the relative frequencies of the symbols from which messages on this "discrete source of information" are composed.
To see how that plays out, have a look at a simpler example, that of a binary information source with just two symbols (1/0, on/off, A/B, what have you). In this case, all of the following will give the entropy for a source in which the relative frequencies of the two symbols are the same (i.e. half the symbols are As and half are Bs):
entropy(c(0.5, 0.5))
# [1] 0.6931472
entropy(c(1,1))
# [1] 0.6931472
entropy(c(1000,1000))
# [1] 0.6931472
entropy(c(0.0004, 0.0004))  
# [1] 0.6931472
entropy(rep(1,2))
# [1] 0.6931472
Because those all refer to the same underlying distribution, in which probability is maximally spread out among the available symbols, they each give the highest possible entropy for a two-state information source (log(2) = 0.6931472).
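To make the normalization step explicit, here is a minimal base-R sketch of the plug-in (maximum-likelihood) calculation. The helper name plugin_entropy is hypothetical and not part of the entropy package; I'm assuming it mirrors what method="ML" does, namely dividing the counts by their sum and then applying -sum(p * log(p)).

```r
# Hypothetical helper, not from the entropy package: plug-in entropy in nats.
plugin_entropy <- function(y) {
  p <- y / sum(y)      # normalize counts to relative frequencies
  p <- p[p > 0]        # drop zero cells so that 0 * log(0) counts as 0
  -sum(p * log(p))
}

# Any two equal counts normalize to c(0.5, 0.5), so the result is the same.
h_small <- plugin_entropy(c(1, 1))
h_large <- plugin_entropy(c(1000, 1000))

all.equal(h_small, log(2))
all.equal(h_large, log(2))
```

Only the ratios between the entries matter, which is why scaling the input vector up or down leaves the entropy unchanged.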
When you instead call entropy(runif(2)), you are supplying relative frequencies for the two symbols that are drawn at random from the uniform distribution. Unless those two randomly selected numbers are exactly equal, you are telling entropy() that you've got an information source with two symbols that are used with different frequencies. As a result, you'll get a computed entropy that's lower than log(2). Here's a quick example to illustrate what I mean:
set.seed(4)
(x <- runif(2))
# [1] 0.585800305 0.008945796
freqs.empirical(x)  ## Helper function called by `entropy()` via `entropy.empirical()`
# [1] 0.98495863 0.01504137
## Low entropy, as you should expect 
entropy(x)
# [1] 0.07805556
## Essentially the same thing; you can interpret this as the expected entropy
## of a source from which a message with 984 '0's and 15 '1's has been observed
entropy(c(984, 15))
In summary, by passing the y= argument a long string of 1s, as in entropy(rep(1, 1024)), you are describing an information source that is a discrete analogue of the uniform distribution. Over the long run or in a very long message, each of its 1024 letters is expected to occur with equal frequency, and you can't get any more uniform than that! 
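If what you actually want is to treat the two vectors as observed samples, one possible approach is to bin the values first and pass the resulting counts. The sketch below does that in base R; the helper names bin_counts and entropy_bits, the choice of 16 bins on [0, 1], and the seed are all my own assumptions for illustration, not something taken from the entropy package.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Hypothetical helper: plug-in entropy of a vector of counts, in bits.
entropy_bits <- function(counts) {
  p <- counts / sum(counts)
  p <- p[p > 0]              # empty bins contribute nothing
  -sum(p * log2(p))
}

# Hypothetical helper: count how many values fall into equal-width bins on [0, 1].
bin_counts <- function(x, n_bins = 16) {
  breaks <- seq(0, 1, length.out = n_bins + 1)
  as.vector(table(cut(x, breaks = breaks, include.lowest = TRUE)))
}

h_uniform  <- entropy_bits(bin_counts(runif(1024)))   # close to log2(16) = 4
h_constant <- entropy_bits(bin_counts(rep(1, 1024)))  # every value in one bin

h_uniform
h_constant
```

Treated this way, the constant sample puts all of its mass in a single bin and gets zero entropy, while the uniform sample comes out just under the 4-bit maximum for 16 bins, which matches the intuition in the question.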