Conditional "at-least" in Python, with pseudo-randomization

Question

I would like to do pseudo-randomization, meaning randomizing something while respecting certain rules.

Imagine the following DataFrame:

ColX 
 N
 N
 N
 N
 N
 N
 N
 N
 D
 D
 D

N stands for neutral and D for deviant. Before each deviant, I would like to have at least two neutrals (there can be more neutrals between deviants), and otherwise the order must be random.

As a result, ColX should look like:

ColX
 N
 N
 D
 N
 N
 N
 D
 N
 N
 N
 D

I was wondering what kind of function I could use in Python (in pandas or another package) or in R (is there a library function that permits this?).

Thank you in advance.

Brad Solomon · Accepted Answer

Here's one way you can do this with NumPy, with a tiny speedup for looping provided by itertools:

from itertools import repeat

import numpy as np
import pandas as pd


def gen_chunk(high=5):
    """Example: gen_chunk(high=6) --> array(['n', 'n', 'n', 'd'])"""
    # randint's `high` is exclusive, so each chunk gets 2 to high - 1 n's
    return np.append(np.repeat('n', np.random.randint(low=2, high=high)), 'd')


def gen_series(chunks=3, high=5):
    # Build `chunks` independent chunks and join them into one 1d array
    return np.concatenate([gen_chunk(high=high) for _ in repeat(None, chunks)])


df = pd.DataFrame(gen_series())

Walkthrough:

You can independently generate each "chunk" of 2 or more N's followed by 1 D. That is what gen_chunk() does above. In this case, it generates a NumPy array of N's followed by 1 D, where the number of N's is a random integer from 2 up to, but not including, your high parameter (np.random.randint treats high as exclusive).

Then in gen_series(), you can build individual chunks (3 of them is the default here) and concatenate them into a single 1d array.
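
To confirm that a generated sequence respects the rule, a small checker can help. This is only a sketch: follows_rule and its min_gap parameter are hypothetical names, not part of the answer's code, and it assumes labels are single 'n'/'d' characters in either case.

```python
def follows_rule(labels, min_gap=2):
    """Return True if every 'd' is directly preceded by at least `min_gap` 'n's."""
    run = 0  # length of the current run of consecutive n's
    for label in labels:
        label = str(label).lower()  # accept 'N'/'D' as well as 'n'/'d'
        if label == 'n':
            run += 1
        elif label == 'd':
            if run < min_gap:
                return False
            run = 0  # a d ends the current run of n's
        else:
            raise ValueError(f"unexpected label: {label!r}")
    return True


print(follows_rule(list("nndnnnd")))  # True
print(follows_rule(list("nndnd")))    # False: the second d follows only one n
```

Because it only iterates over the labels, the same function should also accept the 1d array returned by gen_series().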

Update

The above uses a constant high parameter in each chunk's generation. Perhaps this doesn't meet the definition of pseudorandom that you are looking for. To use a different high with each chunk generation, you could do:

def gen_series(chunks, max_high):
    """Use a randomly selected `high` value for each chunk."""
    highs = np.random.randint(low=3, high=max_high, size=chunks)
    return np.concatenate([gen_chunk(high=high) for high in highs])

Either construction should be fairly quick:

%timeit gen_series(chunks=1000, high=10)
# 36.9 µs ± 1.93 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
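
The functions above draw the number of N's at random, whereas the question's DataFrame has fixed counts (8 N's and 3 D's). One possible way to keep the counts fixed, sketched here with only the standard library, is to give every D its two mandatory N's and then hand out the spare N's to randomly chosen chunks. The name shuffle_fixed_counts and its parameters are hypothetical, and the sketch assumes the sequence should end with a D, as in the question's expected output.

```python
import random


def shuffle_fixed_counts(n_neutral, n_deviant, min_gap=2, seed=None):
    """Arrange fixed numbers of N's and D's so each D follows >= min_gap N's."""
    if n_deviant < 1:
        raise ValueError("need at least one D")
    spare = n_neutral - min_gap * n_deviant
    if spare < 0:
        raise ValueError("not enough N's to put min_gap of them before each D")
    rng = random.Random(seed)
    # Every chunk starts with the mandatory N's ...
    sizes = [min_gap] * n_deviant
    # ... and each spare N goes to a chunk picked uniformly at random.
    for _ in range(spare):
        sizes[rng.randrange(n_deviant)] += 1
    labels = []
    for size in sizes:
        labels.extend(['N'] * size + ['D'])
    return labels


col = shuffle_fixed_counts(n_neutral=8, n_deviant=3, seed=0)
print(col)
```

The returned list can then be placed in a DataFrame column, for example with pd.DataFrame({'ColX': col}).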

Tags:

python

pandas

numpy

Pierre Jardinet
