I'm trying to write a Python script that finds all integers (N) where a certain power of the sum of the digits of N is equal to N. For example, N=81 qualifies, because 8 + 1 = 9, and a certain power of 9 (namely 2) = 81.
The ranges I chose are arbitrary. My script works but it is very, very slow. Ideally, I'd want to find the first 30 such integers in about 6000ms.
My first solution:
def powerOfSum1():
listOfN = []
arange = [a for a in range(11, 1000000)] #range of potential Ns
prange = [a for a in range(2, 6)] # a range for the powers to calculate
for num in arange:
sumOfDigits = sum(map(int, str(num)))
powersOfSum = [sumOfDigits**p for p in prange]
if num in powersOfSum:
listOfN.append(num)
return listOfN
In my second solution, I tried storing all the powers for each sumOfDigits, but that did not improve the performance much.
def powerOfSum2():
listOfN = []
powers= {}
for num in range(11, 1000000):
sumOfDigits = sum(map(int, str(num)))
summ = str(sumOfDigits)
if summ in powers:
if num in powers[summ]:
listOfN.append(num)
else:
powersOfSum = [sumOfDigits**p for p in range(2,6)]
powers[summ] = powersOfSum
if num in powers[summ]:
listOfN.append(num)
return listOfN
I haven't studied data structures and algorithms yet, so I'd appreciate any pointers on making this script more efficient.
This is a great time to break out a profiler and see where your code is spending all its time. To do that, I put a little cProfiler wrapper around your code:
#!/usr/bin/env python
import cProfile
import math
def powerOfSum1():
listOfN = []
arange = [a for a in range(11, 1000000)] #range of potential Ns
prange = [a for a in range(2, 6)] # a range for the powers to calculate
for num in arange:
sumOfDigits = sum(map(int, str(num)))
powersOfSum = [sumOfDigits**p for p in prange]
if num in powersOfSum:
listOfN.append(num)
return listOfN
def main():
cProfile.run('powerOfSum1()')
if __name__ == "__main__":
main()
Running this, here's what I got:
⌁ [alex:/tmp] 44s $ python powers.py
1999993 function calls in 4.089 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.005 0.005 4.089 4.089 <string>:1(<module>)
1 0.934 0.934 4.084 4.084 powers.py:7(powerOfSum1)
999989 2.954 0.000 2.954 0.000 {map}
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.017 0.009 0.017 0.009 {range}
999989 0.178 0.000 0.178 0.000 {sum}
If you look, it seems like most of the time is being spend on that map call, where you turn the num to a string, then make each digit an int, which is then summed.
It makes a lot of sense that this would be the slow part of the program. Not only do you do it a lot, but it's a slow operation: On that one line, you do a string-parsing operation, and then map an int conversion function over each char in the string, and then you sum them up.
I bet that if you can compute the sum of digits without doing string conversion first, it'll be a lot faster.
Let's try that. I made some other changes, like removing the redundant list comprehensions at the start. Here's what I got:
#!/usr/bin/env python
#started at 47.56
import cProfile
import math
MAXNUM = 10000000
powersOf10 = [10 ** n for n in range(0, int(math.log10(MAXNUM)))]
def powerOfSum1():
listOfN = []
arange = range(11, MAXNUM) #range of potential Ns
prange = range(2, 6) # a range for the powers to calculate
for num in arange:
sumOfDigits = 0
for p in powersOf10:
sumOfDigits += num / p % 10
powersOfSum = []
curr = sumOfDigits
for p in prange:
curr = curr * sumOfDigits
if num < curr:
break
if num == curr:
listOfN.append(num)
return listOfN
def main():
cProfile.run('powerOfSum1()')
if __name__ == "__main__":
main()
What does cProfile have to say?
⌁ [alex:/tmp] 3m42s $ python powers.py
15 function calls in 0.959 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.006 0.006 0.959 0.959 <string>:1(<module>)
1 0.936 0.936 0.953 0.953 powers.py:13(powerOfSum1)
10 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects}
1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects}
2 0.017 0.009 0.017 0.009 {range}
4 seconds to 0.9 seconds? Much better.
If you want to really see the effect, add an extra zero to the upper bound of arange. On my box, the original code takes ~47 seconds. The modified code takes ~10.
The profiler is your friend, and string conversion isn't free when you do it hundreds of thousands of times :)
Your solution examines every possible integer to see it could be a solution. It's more efficient to only examine the integers that are actually powers to see if they are valid answers -- because there are fewer of those. Here is something that does this. But it will probably take longer than 6s to find 30 -- they get scarce fast.
EDIT -- updated to do the faster digit-summing suggested in the comments by Padraic Cunningham and fjarri, and then updated to add a couple of more tweaks, make it a generator, and make it Python-3 friendly.
It still gets slow fast, but the algorithm may be parallelizable -- maybe you could put the digit-summing in a separate process.
EDIT 2 -- Shaved off some time by doing a quick check of whether the base and the result value are equal modulo 9. There may be further number theory tricks afoot...
import heapq
import functools
def get_powers():
heap = []
push = functools.partial(heapq.heappush, heap)
pop = functools.partial(heapq.heappop, heap)
nextbase = 3
nextbasesquared = nextbase ** 2
push((2**2, 2, 2))
while 1:
value, base, power = pop()
if base % 9 == value % 9:
r = 0
n = value
while n:
r, n = r + n % 10, n // 10
if r == base:
yield value, base, power
power += 1
value *= base
push((value, base, power))
if value > nextbasesquared:
push((nextbasesquared, nextbase, 2))
nextbase += 1
nextbasesquared = nextbase ** 2
for i, info in enumerate(get_powers(), 1):
print(i, info)
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With