How can I make my Python program run faster?

I am reading in a .csv file of stocks and creating a pandas dataframe. I am only interested in the date, the company, and the closing price. I want my program to find the maximum profit along with the starting date, the ending date, and the company, and it needs to use a divide-and-conquer algorithm. I only know how to use for loops, but they take forever to run. The .csv file has 200,000 rows. How can I get this to run fast?

import pandas as pd
import numpy as np
import math

def cleanData(file):
    df = pd.read_csv(file)
    del df['open']
    del df['low']
    del df['high']
    del df['volume']
    return np.array(df)
    
df = cleanData('prices-split-adjusted.csv')

bestStock = [None, None, None, float(-math.inf)]

def DAC(data):
    global bestStock

    if len(data) > 1:
        mid = len(data)//2
        left = data[:mid]
        right = data[mid:]
    
        DAC(left)
        DAC(right)
    
        for i in range(len(data)):
            for j in range(i+1,len(data)):
                if data[i,1] == data[j,1]:
                    profit = data[j,2] - data[i,2]
                    if profit > bestStock[3]:
                        bestStock[0] = data[i,0]
                        bestStock[1] = data[j,0]
                        bestStock[2] = data[i,1]
                        bestStock[3] = profit
                    
                    print(bestStock)
    print('\n')
    return bestStock
    
print(DAC(df))
asked Dec 06 '25 by Anonomous


1 Answer

I've got two things for your consideration (my answer keeps your algorithmic approach, i.e. the nested loops and recursive functions, and tackles the low-hanging fruit first):

  1. Unless you are debugging, avoid print() inside a loop (in your case, print(bestStock)). The I/O overhead adds up, especially when you are looping over a large dataset and printing to the screen often. Once you are happy with your code, comment the print out for runs on the full dataset and uncomment it only during debugging sessions. You can expect a noticeable speedup just from not printing inside the loop.
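To illustrate point 1, here is a minimal sketch (using a synthetic list as a stand-in for your 200,000-row dataset): the print inside the loop is commented out, and the result is reported once at the end.

```python
import time

data = list(range(200_000))  # stand-in for your 200,000-row dataset

start = time.perf_counter()
best = float('-inf')
for x in data:
    if x > best:
        best = x
        # print(best)  # uncomment only while debugging; I/O in a hot loop adds up
elapsed = time.perf_counter() - start

print(best)  # report once, after the loop
```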

  2. If you are after even more speed, I have found in my case (similar to yours, which I often encounter in search/sort problems) that simply switching the expensive part (the Python for loops) to Cython, and statically declaring variable types (this is KEY to the speedup), gives several orders of magnitude of improvement even before optimizing the implementation. Check Cython out: https://cython.readthedocs.io/en/latest/index.html. If that's not enough, then parallelism is your next best friend, which would require rethinking your code's implementation.
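And if rethinking the implementation is an option, the whole search can be vectorized in pandas with no explicit loops at all. A minimal sketch, assuming columns named date, symbol, and close (adjust to whatever your cleanData keeps): the best achievable profit selling on a given row's date is its close minus the running minimum close for that company, so a groupby plus cummin replaces the O(n²) pair comparison.

```python
import pandas as pd

# Tiny synthetic frame standing in for your CSV; column names are assumptions.
df = pd.DataFrame({
    'date':   ['d1', 'd2', 'd3', 'd1', 'd2', 'd3'],
    'symbol': ['A', 'A', 'A', 'B', 'B', 'B'],
    'close':  [10.0, 7.0, 12.0, 5.0, 9.0, 6.0],
})

# Within each company, track the cheapest close seen so far;
# profit on each row = close - running minimum close.
df = df.sort_values(['symbol', 'date'])
running_min = df.groupby('symbol')['close'].cummin()
df['profit'] = df['close'] - running_min

best = df.loc[df['profit'].idxmax()]
print(best['symbol'], best['profit'])
```

This scans the data once per column operation, so it handles 200,000 rows in well under a second instead of the hours an O(n²) loop can take.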

answered Dec 08 '25 by aaronlhe


