Fast Monte-Carlo simulation with numpy?

Question

I'm following the exercises from "Doing Bayesian Data Analysis" in both R and Python.

I would like to find a fast method of doing Monte-Carlo simulation that uses constant space.

The problem below is trivial, but serves as a good test for different methods:

ex 4.3

Determine the exact probability of drawing a 10 from a shuffled pinochle deck. (In a pinochle deck, there are 48 cards. There are six values: 9, 10, Jack, Queen, King, Ace. There are two copies of each value in each of the standard four suits: hearts, diamonds, clubs, spades.)

(A) What is the probability of getting a 10?

Of course, the answer is 1/6.

The fastest solution I could find (comparable to the speed of R) is generating a large array of card draws using np.random.choice, then applying a Counter. I don't like the idea of creating arrays unnecessarily, so I tried using a dictionary and a for loop, drawing one card at a time and incrementing the count for that type of card. To my surprise, it was much slower!

The full code is below for the 3 methods I tested. _Is there a way of doing this that will be as performant as method1(), but using constant space?

Python code: (Google Colab link)

deck = [c for c in ['9','10','Jack','Queen','King','Ace'] for _ in range(8)]
num_draws = 1000000

def method1():
  draws = np.random.choice(deck, size=num_draws, replace=True)
  df = pd.DataFrame([Counter(draws)])/num_draws
  print(df)
  
def method2():
  card_counts = defaultdict(int)
  for _ in range(num_draws):
    card_counts[np.random.choice(deck, replace=True)] += 1
  df = pd.DataFrame([card_counts])/num_draws
  print(df)
  
def method3():
  card_counts = defaultdict(int)
  for _ in range(num_draws):
    card_counts[deck[random.randint(0, len(deck)-1)]] += 1
  df = pd.DataFrame([card_counts])/num_draws
  print(df)

Python timeit() results:

method1: 1.2997

method2: 23.0626

method3: 5.5859

R code:

card = sample(deck, numDraws, replace=TRUE)
print(as.data.frame(table(card)/numDraws))

Divakar · Accepted Answer

Here's one with np.unique+np.bincount -

def unique():    
    unq,ids = np.unique(deck, return_inverse=True)
    all_ids = np.random.choice(ids, size=num_draws, replace=True)
    ar = np.bincount(all_ids)/num_draws
    return pd.DataFrame(ar[None], columns=unq)

How does NumPy help here?

There are two major improvements that's helping us here :

We convert the string data to numeric. NumPy works well with such data. To achieve this, we are using np.unique.
We use np.bincount to replace the counting step. Again, it works well with numeric data and we do have that from the numeric conversion done at the start of this method.
NumPy in general works well with large data, which is the case here.

Timings with given sample dataset comparing against fastest method1 -

In [177]: %timeit method1()
328 ms ± 16.3 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [178]: %timeit unique()
12.4 ms ± 265 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Fast Monte-Carlo simulation with numpy?

Tags:

python

pandas

numpy

Jules Kuehn

1 Answers

How does NumPy help here?

Divakar

Recent Activity

Donate For Us

Fast Monte-Carlo simulation with numpy?

Tags:

python

pandas

numpy

Jules Kuehn

1 Answers

How does NumPy help here?

Divakar

Related questions

Recent Activity

Donate For Us