Logo Questions Linux Laravel Mysql Ubuntu Git Menu
 

Fast Monte-Carlo simulation with numpy?

I'm following the exercises from "Doing Bayesian Data Analysis" in both R and Python.

I would like to find a fast method of doing Monte-Carlo simulation that uses constant space.

The problem below is trivial, but serves as a good test for different methods:

ex 4.3

Determine the exact probability of drawing a 10 from a shuffled pinochle deck. (In a pinochle deck, there are 48 cards. There are six values: 9, 10, Jack, Queen, King, Ace. There are two copies of each value in each of the standard four suits: hearts, diamonds, clubs, spades.)

(A) What is the probability of getting a 10?

Of course, the answer is 1/6.

The fastest solution I could find (comparable to the speed of R) is generating a large array of card draws using np.random.choice, then applying a Counter. I don't like the idea of creating arrays unnecessarily, so I tried using a dictionary and a for loop, drawing one card at a time and incrementing the count for that type of card. To my surprise, it was much slower!

The full code is below for the 3 methods I tested. _Is there a way of doing this that will be as performant as method1(), but using constant space?

Python code: (Google Colab link)

deck = [c for c in ['9','10','Jack','Queen','King','Ace'] for _ in range(8)]
num_draws = 1000000

def method1():
  draws = np.random.choice(deck, size=num_draws, replace=True)
  df = pd.DataFrame([Counter(draws)])/num_draws
  print(df)
  
def method2():
  card_counts = defaultdict(int)
  for _ in range(num_draws):
    card_counts[np.random.choice(deck, replace=True)] += 1
  df = pd.DataFrame([card_counts])/num_draws
  print(df)
  
def method3():
  card_counts = defaultdict(int)
  for _ in range(num_draws):
    card_counts[deck[random.randint(0, len(deck)-1)]] += 1
  df = pd.DataFrame([card_counts])/num_draws
  print(df)

Python timeit() results:

method1: 1.2997

method2: 23.0626

method3: 5.5859

R code:

card = sample(deck, numDraws, replace=TRUE)
print(as.data.frame(table(card)/numDraws))
like image 853
Jules Kuehn Avatar asked Oct 30 '25 22:10

Jules Kuehn


1 Answers

Here's one with np.unique+np.bincount -

def unique():    
    unq,ids = np.unique(deck, return_inverse=True)
    all_ids = np.random.choice(ids, size=num_draws, replace=True)
    ar = np.bincount(all_ids)/num_draws
    return pd.DataFrame(ar[None], columns=unq)

How does NumPy help here?

There are two major improvements that's helping us here :

  1. We convert the string data to numeric. NumPy works well with such data. To achieve this, we are using np.unique.

  2. We use np.bincount to replace the counting step. Again, it works well with numeric data and we do have that from the numeric conversion done at the start of this method.

  3. NumPy in general works well with large data, which is the case here.


Timings with given sample dataset comparing against fastest method1 -

In [177]: %timeit method1()
328 ms ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [178]: %timeit unique()
12.4 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
like image 96
Divakar Avatar answered Nov 01 '25 13:11

Divakar



Donate For Us

If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!