 

Tensorflow, Flask, and TFLearn Memory Leak

I'm running the following program, and each time I hit the 'build' API call I see roughly another 1 GB of memory being taken up after the request completes. I'm trying to release everything from memory, but I'm not sure what remains.

import tensorflow as tf
import tflearn
from flask import Flask, jsonify
from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.normalization import local_response_normalization
from tflearn.layers.estimator import regression

app = Flask(__name__)

keep_prob = .8
num_labels = 3
batch_size = 64

class AlexNet():

    def __init__(self):

        @app.route('/build')
        def build():
            g = tf.Graph()
            with g.as_default():
                sess = tf.Session()

                # Building 'AlexNet'
                network = input_data(shape=[None, 227, 227, 3])
                network = conv_2d(network, 96, 11, strides=4, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = conv_2d(network, 256, 5, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = conv_2d(network, 384, 3, activation='relu')
                network = conv_2d(network, 384, 3, activation='relu')
                network = conv_2d(network, 256, 3, activation='relu')
                network = max_pool_2d(network, 3, strides=2)
                network = local_response_normalization(network)
                network = fully_connected(network, 4096, activation='tanh')
                network = dropout(network, keep_prob)
                network = fully_connected(network, 4096, activation='tanh')
                network = dropout(network, keep_prob)
                network = fully_connected(network, num_labels, activation='softmax')
                network = regression(network, optimizer="adam",
                                     loss='categorical_crossentropy',
                                     learning_rate=0.001, batch_size=batch_size)

                model = tflearn.DNN(network, tensorboard_dir="./tflearn_logs/",
                                    checkpoint_path=None, tensorboard_verbose=0, session=sess)

                sess.run(tf.initialize_all_variables())
                sess.close()

            tf.reset_default_graph()

            del g
            del sess
            del model
            del network
            return jsonify(status=200)


if __name__ == "__main__":
    AlexNet()
    app.run(host='0.0.0.0', port=5000, threaded=True)
asked Oct 16 '25 by bradden_gross
1 Answer

I'm not sure if you have found the answer, but in my opinion you are not supposed to put long-running tasks in an HTTP request handler. HTTP is stateless and is supposed to respond to a call almost immediately; that's why we have the concepts of task queues, asynchronous tasks, etc. The rule of thumb in server-side development is to respond to the request as quickly as possible. Trying to build a convolutional deep neural network within an HTTP request is simply not feasible: an ideal HTTP request should respond within a couple of seconds, while constructing and running your DNN classifier's session can take far longer (you would need to measure it).
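One concrete way to follow that advice, and one that also sidesteps the leak itself, is to run the build in a separate worker process: when the child exits, the OS reclaims all of its memory, which `del` and `tf.reset_default_graph()` inside the same process cannot guarantee. A minimal, framework-free sketch (the function names here are illustrative, not from the asker's code):

```python
import multiprocessing

def build_model(result_queue):
    # Stand-in for the expensive TF graph construction / training;
    # every allocation it makes lives only in this child process.
    result_queue.put('done')

def run_build():
    q = multiprocessing.Queue()
    p = multiprocessing.Process(target=build_model, args=(q,))
    p.start()
    result = q.get()   # block until the worker reports completion
    p.join()           # child exits; the OS reclaims its memory
    return result
```

The same pattern scales up naturally to a real task queue (Celery, RQ, etc.), where the workers are long-lived processes managed for you.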

The hackiest solution would be to create a Python thread within the request handler and let the request respond to the HTTP call without blocking. Meanwhile, your thread can go ahead and build your model. Then you can write the model somewhere, send a mail notification, etc.

Here you go:

How can I add a background thread to flask?
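A minimal sketch of that thread-based approach, stripped down to the Flask plumbing (the `202` status in the body and the event flag are my additions for illustration; the heavy TFLearn build would go where the comment indicates):

```python
import threading
from flask import Flask, jsonify

app = Flask(__name__)
build_done = threading.Event()

def build_model():
    # Stand-in for the expensive graph construction / training.
    build_done.set()

@app.route('/build')
def build():
    # Spawn the heavy work on a background thread and return right away,
    # so the HTTP call is not blocked for the duration of the build.
    threading.Thread(target=build_model, daemon=True).start()
    return jsonify(status=202)
```

Note that a plain thread will not fix the memory growth by itself, since it shares the process's heap; it only keeps the request fast.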

answered Oct 18 '25 by Hakan

