Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Finding Median in Large Integer File of Integers

I was asked in an interview the following. I didn't get it but trying to solve it at home. I believe we have to use the Median of Median algorithm...

Q: Finding Median in Large Integer File of Integers

Find the median from a large file of integers. You can not access the numbers by index, can only access it sequentially. And the numbers cannot fit in memory.

I found a solution online (rewrote in Python) but there are a few things I do not understand.. I kind of get the algorithm but not 100% sure.

a) Why do we check left >= right?

b) When count < k, we call self.findMedianInLargeFile(numbers,k,max(result+1,guess),right). Why do we call max(result+1, guess) as left?

c) when count > k, why do we use result as right?

class Solution:

def findMedianInLargeFile(self, numbers,k,left,right):

    if left >= right:
        return left

    result = left
    guess = (left + right ) // 2
    count = 0

    # count the number that is less than guess

    for i in numbers:
        if i <= guess:
            count+=1
            result = max(result,i)

    if count == k:
        return result
    elif count < k: # if the number of items < guess is < K
        return self.findMedianInLargeFile(numbers,k,max(result+1,guess),right)
    else: 
        return self.findMedianInLargeFile(numbers,k,left,result)



def findMedian(self, numbers):
    length = len(numbers)

    if length % 2 == 1: # odd
        return self.findMedianInLargeFile(numbers,length//2 + 1,-999999999,999999999)
    else:
        return (self.findMedianInLargeFile(numbers,length//2,-999999999,999999999) + self.findMedianInLargeFile(numbers,length//2 +1 ,-999999999,999999999)) / 2
like image 435
user82383 Avatar asked Oct 16 '25 11:10

user82383


1 Answers

This is just binary search by median value

Compare with example code

function binary_search(A, n, T):
    L := 0
    R := n − 1
    while L <= R:
        m := floor((L + R) / 2)
        if A[m] < T:
            L := m + 1
        else if A[m] > T:
            R := m - 1
        else:
            return m
    return unsuccessful
  • if left >= right: stops iterations when borders collide

  • when count < k, we call self.findMedianInLargeFile(numbers,k,max(result+1,guess),right) because our guess was too small, and median value is bigger than quessed value.

  • similar but reversed situation for else case
like image 150
MBo Avatar answered Oct 19 '25 00:10

MBo



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!