I have the following code:
import random
import numpy as np
import pandas as pd

num_seq = 100
len_seq = 20
nts = 4

sequences = np.random.choice(nts, size=(num_seq, len_seq), replace=True)
sequences = np.unique(sequences, axis=0)  # sorts the sequences
d = {}
pr = 5
for i in range(num_seq):
    globals()['seq_' + str(i)] = np.tile(sequences[i, :], (pr, 1))
    d['seq_' + str(i)] = np.tile(sequences[i, :], (pr, 1))

pool = np.empty((0, len_seq), dtype=int)
for i in range(num_seq):
    pool = np.concatenate((pool, eval('seq_' + str(i))))
I want to convert the dictionary d into a NumPy array (or a dictionary with a single entry). My code works and produces pool, but for larger values of num_seq, len_seq and pr it takes a very long time.
Execution time is critical, hence my question: is there a more efficient way of doing this?
Here is a list of important points:
- np.concatenate runs in O(n), so your second loop runs in O(n^2) overall: each iteration copies everything accumulated so far. Instead, collect the arrays in a Python list and stack them once at the end with np.vstack, which brings the total back to O(n).
- globals() is slow and widely considered bad practice, because it can easily break your code in nasty ways.
- eval(...) is slow too, and also unsafe, so avoid it; the dictionary d already gives you the same lookup safely.

Here is an example of faster code (as a replacement for the second loop):
pool = np.vstack([d[f'seq_{i}'] for i in range(num_seq)])
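Going one step further: since each entry of d is just one row of sequences tiled pr times, you can (under that assumption) skip the dictionary entirely and build pool with a single vectorized call to np.repeat, which repeats each row pr consecutive times. A minimal sketch — note it iterates over sequences.shape[0] rather than num_seq, since np.unique may have dropped duplicate rows:

```python
import numpy as np

num_seq, len_seq, nts, pr = 100, 20, 4, 5
rng = np.random.default_rng(0)  # seeded here only for reproducibility
sequences = np.unique(rng.integers(nts, size=(num_seq, len_seq)), axis=0)

# Each row repeated pr times consecutively -- the same row layout the
# original tile-then-concatenate loop produces, in one O(n) call.
pool = np.repeat(sequences, pr, axis=0)
```

This produces the same array as stacking np.tile(sequences[i, :], (pr, 1)) for every i, without ever materializing the per-row blocks.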