
Same seed in two different editors gives me different results (PyCharm and Jupyter Notebook)

I have the following code:

import json
import pandas as pd
import numpy as np
import random

pd.set_option('expand_frame_repr', False)  # To view all the variables in the console

# read data
records = []
with open('./data/data_file.txt', 'r') as file:
    for line in file:
        record = json.loads(line)
        records.append(record)

# construct list of ids
ids = set()
for record in records:
    for w in record['A']:
        ids.add(w['NAME'])

random.seed(1234); sampled_ids = random.sample(ids,50)

When I run this code once in PyCharm and then immediately afterwards in a Jupyter Notebook, I get different ids sampled in each one. What's going on?

P.S.
I put the seed and the sample on the last line with a semicolon because I found that if I set the seed on one line and sample on the next, I get different results on each run, even in the same IDE. This is truly mysterious to me. I use Python 3.7.

Corel asked Dec 22 '25 03:12


1 Answer

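The cause of this behaviour lies in the set. A set arranges its elements according to their hash values (the elements of a set must be hashable, i.e. must have a __hash__ method), and the hash values of strings differ when you start another console, because Python randomizes string hashing per interpreter process unless PYTHONHASHSEED is set. (Not every type is affected, e.g. small integers hash the same in every session, but that's another topic.) So the same set can be iterated in a different order in each console, and random.sample with the same seed then picks the same positions from a differently ordered collection. For example, here are the results from two consoles in the same IDE: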

1/A:

arr1 = set('skevboa;gj[pvemoeprnjpdbr ]p')
random.seed(1234)
random.sample(arr1, 3)
Out[47]: ['p', 'k', ']']
random.seed(1234)
random.sample(arr1, 3)
Out[48]: ['p', 'k', ']']
hash('s')
Out[49]: 1861403979552045688

2/A:

arr1 = set('skevboa;gj[pvemoeprnjpdbr ]p')
random.seed(1234)
random.sample(arr1, 3)
Out[29]: [';', 'a', 'b']
random.seed(1234)
random.sample(arr1, 3)
Out[30]: [';', 'a', 'b']
hash('s')
Out[31]: -2409441490032867064
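To check that it is the set's iteration order, not random, that changes between sessions, you can reproduce the two consoles from one script by starting fresh interpreters with different PYTHONHASHSEED values. This is only a sketch; the helper name sample_in_fresh_process is made up for illustration. It converts the set with list(arr1) (which keeps the set's iteration order) because Python 3.11+ no longer accepts a set in random.sample. With the same hash seed the unsorted sample repeats; with different hash seeds it may change, while the sorted sample stays the same.

```python
import os
import subprocess
import sys


def sample_in_fresh_process(hash_seed: str, sort_first: bool) -> str:
    """Run the answer's sampling code in a new interpreter with a fixed PYTHONHASHSEED.

    Hypothetical helper for illustration; returns the 3 sampled characters joined.
    """
    code = (
        "import random\n"
        "arr1 = set('skevboa;gj[pvemoeprnjpdbr ]p')\n"
        # list(arr1) keeps the set's (hash-dependent) iteration order
        f"pool = sorted(arr1) if {sort_first} else list(arr1)\n"
        "random.seed(1234)\n"
        "print(''.join(random.sample(pool, 3)))\n"
    )
    # Copy the current environment and pin the string-hash seed for the child.
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    # Only drop the newline: the sample itself may contain a space character.
    return result.stdout.rstrip("\n")


if __name__ == "__main__":
    for seed in ("1", "2", "3"):
        unsorted_sample = sample_in_fresh_process(seed, sort_first=False)
        sorted_sample = sample_in_fresh_process(seed, sort_first=True)
        print(seed, repr(unsorted_sample), repr(sorted_sample))
```

Setting PYTHONHASHSEED in the environment before the interpreter starts (in PyCharm's run configuration and for the Jupyter kernel) is another way to get matching results; changing os.environ inside an already running notebook does not affect that kernel's hashing. Sorting keeps the code independent of the environment.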

Once you know the source of the problem, you can choose a way to fix it, for example by sorting the set before sampling:

1/A:

random.seed(1234)
random.sample(sorted(arr1), 3)
Out[50]: ['p', ']', ' ']

2/A:

random.seed(1234)
random.sample(sorted(arr1), 3)
Out[32]: ['p', ']', ' ']
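Applied to the code from the question, the same idea might look like the sketch below. The function name sample_ids and the in-memory demo lines are hypothetical stand-ins for ./data/data_file.txt; the record layout ('A' list with 'NAME' keys) is taken from the question. It also uses a separate random.Random(seed) instance, an optional choice so that other random.* calls in the session do not share state with this sample.

```python
import json
import random


def sample_ids(lines, k, seed=1234):
    """Collect unique NAME values from JSON-lines records and sample k of them.

    Hypothetical helper: `lines` is any iterable of JSON strings, e.g. an open file.
    """
    ids = set()
    for line in lines:
        record = json.loads(line)
        for w in record['A']:
            ids.add(w['NAME'])
    # sorted() gives an order that does not depend on string hash randomization
    rng = random.Random(seed)  # private generator, independent of the module-level one
    return rng.sample(sorted(ids), k)


if __name__ == "__main__":
    # Hypothetical stand-in for the lines of ./data/data_file.txt
    demo_lines = [
        '{"A": [{"NAME": "alice"}, {"NAME": "bob"}]}',
        '{"A": [{"NAME": "carol"}, {"NAME": "alice"}]}',
        '{"A": [{"NAME": "dave"}]}',
    ]
    print(sample_ids(demo_lines, 2))
```

With a real file this would be called as `with open('./data/data_file.txt') as file: sampled_ids = sample_ids(file, 50)`.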
Vadim Shkaberda answered Dec 23 '25 21:12