Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Probability space and expected running time: what are they?

I've come across an issue where I'm to calculate or state the expected running time of an algorithm, and to state the probability space of such a running time. I will be frank and say I really have no idea what I'm supposed to do. What's the difference between expected running time and average running time, and what is a probability space and why is this needed?

like image 671
user2023360 Avatar asked Dec 04 '25 15:12

user2023360


1 Answers

They are asking you to model the running time of an algorithm as a random variable defined over a sample space consisting of the set of inputs of that algorithm, and to compute the expectation of that random variable with respect to some probability defined over the sample space, as the average of its values with respect to that probability.

Here's an example. Consider Quicksort. Quicksort is a sorting algorithm, and so it makes sense to model its set of inputs as the set of permutations of size N. So, the sample space is the set of permutations of size N. Endow that sample space with the uniform probability, and then define the running time T_N as the random variable which maps elements s of the sample space to the corresponding running time of the algorithm on input s. Then, the expectation of T_N is the average of its value over the elements of the sample space, i.e., over the set of permutations of size N. As maybe you know, E[T_N] is O(N*log(N)) (E[X] is the expectation of random variable X).

The only caveat in the previous paragraph is that you have to define mathematically what you mean by "running time". In practice, you have to take some basic operations made by the algorithm, like the number of swaps in a sorting algorithm, or the number of comparisons, and study them as the precisely-defined random variables that they are. After that, if you want a predictive model for your computing environment, you need to measure the time taken by your system to execute each such basic operations, and compute the expected running time as a weighted sum of the average number of basic operations times the execution time of that operation in your system, for every operation you include in your model (an example follows).

In the case of Quicksort, the number of comparisons C_N has been studied in detail, and its expectation E[C_N] is approximately 2*N*ln(N)-0.846*N. The same is true for the number of swaps S_N, whose expectation is about 0.333*N*ln(N)-1.346*N. These results are proven in chapter 1 of the book cited at the end of this post.

If you measure the running time of a comparison to be T_c, and that of a swap to be T_s in your system, your prediction of running time will be E[T_N] = T_c * E[C_N] + T_s * E[S_N]. You can then replace the previous expressions for E[C_N] and E[S_N] in order to have a precise predictive formula.

Maybe you can profit from reviewing the basics of discrete probability theory, and/or looking at an algorithm analysis textbook like Sedgewick's and Flajolet "Introduction to the Analysis of Algorithms".

like image 55
naitoon Avatar answered Dec 07 '25 16:12

naitoon