I need to create a range that skips every 4th number, starting from 5. For example, if the range a runs from 1 to 20, then the numbers 5, 9, 13 and 17 are excluded:
a = [1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20]
What I tried is creating a regular range, then a second range consisting of the numbers I want to skip, and then removing the elements of the second range from the first one.
a = list(range(1,21))
b = list(range(5,21,4))
for x in b:
    if x in a:
        a.remove(x)
This works, but it becomes too slow for a very large range. Is there a more efficient way to do it?
Solution:
For efficiency, I would recommend using a generator expression like this:
r = (x for x in range(1,21) if x not in range(5,21,4))
or, equivalently, and without needing to write the upper bound twice:
r = (x for x in range(1,21) if x == 1 or x % 4 != 1)
You can use this generator as you would use a normal sequence (list/tuple)*, and convert the generator to a list with list() if you absolutely need to.
Justification:
The advantage of this approach is that it does not store all of the items in memory, so you can make the upper bound very large without running out of memory. Iterating over the result still takes time proportional to the size of the range.
*(Sort of; there are caveats, as mentioned by the commenters below. For example, if you want fast membership checks, you would be better off just using the two ranges separately.)
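To make the caveat concrete, here is a minimal sketch of two ways a generator differs from a list. The helper names skip_every_fourth and contains are hypothetical, introduced only for this illustration; the membership check assumes Python 3, where testing an int against a range does not iterate over it.

```python
def skip_every_fourth(stop):
    """Yield 1..stop-1, skipping 5, 9, 13, ... (hypothetical helper)."""
    return (x for x in range(1, stop) if x == 1 or x % 4 != 1)


r = skip_every_fourth(21)
first_pass = list(r)    # consumes the generator
second_pass = list(r)   # the generator is exhausted, so this is empty


def contains(x, stop):
    """Membership test using the two ranges directly (hypothetical helper).

    Nothing is materialised, so this stays cheap even for a huge stop.
    """
    return x in range(1, stop) and x not in range(5, stop, 4)


print(first_pass)
print(second_pass)
print(contains(6, 21), contains(9, 21))
```

If you need to iterate more than once, build a fresh generator each time or convert it to a list up front.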
Use a set and another list comprehension:
a = range(1, 21)
b = set(range(5, 21, 4))
[i for i in a if i not in b]
# [1, 2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16, 18, 19, 20]
You could drop the set() call around the second range, but I am finding this is slower than set inclusion checking:
Functions
def chris(m):
    a = range(1, m)
    b = set(range(5, m, 4))
    return [i for i in a if i not in b]

def chris2(m):
    a = range(1, m)
    b = range(5, m, 4)
    return [i for i in a if i not in b]

def ollin(m):
    return list(x for x in range(1,m) if x not in range(5,m,4))

def ollin2(m):
    return list(x for x in range(1,m) if x == 1 or x % 4 != 1)

def smac(m):
    return [v for i, v in enumerate(range(1,m)) if i == 0 or i % 4 != 0]
Setup
from timeit import timeit
import pandas as pd
import matplotlib.pyplot as plt
res = pd.DataFrame(
index=['chris', 'chris2', 'ollin', 'ollin2', 'smac'],
columns=[10, 50, 100, 500, 1000, 5000, 10000],
dtype=float
)
for f in res.index:
    for c in res.columns:
        stmt = '{}(c)'.format(f)
        setp = 'from __main__ import c, {}'.format(f)
        res.at[f, c] = timeit(stmt, setp, number=50)
ax = res.div(res.min()).T.plot(loglog=True)
ax.set_xlabel("N");
ax.set_ylabel("time (relative)");
plt.show()
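If you do not have pandas or matplotlib available, a rough comparison can be sketched with the standard library alone. This is only an illustration, not the benchmark used above: it re-declares three of the functions so the snippet is self-contained, first checks that they agree on a few sizes, and then times them at one arbitrarily chosen size (10,000, with 20 repetitions). Absolute numbers will vary by machine.

```python
from timeit import timeit


def chris(m):
    b = set(range(5, m, 4))
    return [i for i in range(1, m) if i not in b]


def ollin2(m):
    return [x for x in range(1, m) if x == 1 or x % 4 != 1]


def smac(m):
    return [v for i, v in enumerate(range(1, m)) if i == 0 or i % 4 != 0]


candidates = [chris, ollin2, smac]

# Sanity check: every candidate should match chris() on small and larger inputs.
all_agree = all(f(n) == chris(n) for f in candidates for n in (1, 2, 5, 6, 21, 1000))
expected = chris(21)

# Time each candidate; the size and repetition count are arbitrary choices.
timings = {f.__name__: timeit(lambda: f(10_000), number=20) for f in candidates}
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name:8s} {seconds:.4f}s")
```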