Reinforcement Learning on Tensorflow without Gym

Question

I am currently trying to create a simple ANN learning environment for reinforcement learning. I have already used a neural network to fit and replace a physical model. Now, out of curiosity, I would like to create a simple reinforcement learning model.

To build this model, I thought a good option would be to modify the loss function. Instead of computing the difference between the expected and predicted outputs, it runs a short simulation for a number of steps and scores how well the model reaches a specific target. In the example code below, the environment is a simple mass-spring-damper system that starts with a random initial displacement and velocity, and the model can apply a force to it. Points are awarded based on the distance from the equilibrium position, and I invert them (one divided by the points earned) to obtain a loss. I am not sure this is the right approach, but I wanted to try it anyway for the sake of learning. Now I get the error message "No gradients provided for any variable: ." and I am not sure how to solve it.
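Written out, the simulation step in getnewstate and the per-step score in lossfunction amount to the following (symbols as in the code: u is the displacement, v the velocity, f_n the force predicted by the model from (u_n, v_n), and N = 100 steps; note the code inverts and sums the points at every step rather than once at the end):

```latex
a_n = \frac{f_n - c\,v_n - k\,u_n}{m}, \quad
v_{n+1} = v_n + a_n\,\Delta t, \quad
u_{n+1} = u_n + v_{n+1}\,\Delta t
\\
\text{points}_n = \frac{1000}{|u_n| + 1}, \quad
\mathcal{L} = \sum_{n=1}^{N} \frac{1}{\text{points}_n}
            = \frac{1}{1000}\sum_{n=1}^{N} \left(|u_n| + 1\right)
```

So minimizing this loss is equivalent to minimizing the summed absolute displacement over the rollout; the constant N/1000 term does not affect the gradients.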

Here is my code:

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Input, Dense, Conv2D, Reshape,concatenate, Flatten, UpSampling2D, AveragePooling2D,LayerNormalization
import random


#Physical Parameters
m = 1 #kg
k = 1 #N/m
c = 0.01
dt = 0.01 
opt = keras.optimizers.Adam(learning_rate=0.01)


def getnewstate(u,v,f):
    #Calculate new state of mass spring damper system
    a = (f-v*c-k*u)/m
    v = v+a*dt
    u = u+v*dt
    return (u,v)


def generatemodel():
    # Generate a simple Keras model: one linear unit mapping (u, v) to a force
    kernel_initializer = tf.keras.initializers.RandomNormal(stddev=0.01)
    bias_initializer = tf.keras.initializers.Zeros()
    InputLayer = Input(shape=(2,))
    Outputlayer = Dense(1, activation='linear',
                        kernel_initializer=kernel_initializer,
                        bias_initializer=bias_initializer)(InputLayer)
    model = Model(inputs=InputLayer, outputs=Outputlayer)

    return model
    
def lossfunction(u,v,model):
    # Custom loss function: simulate the system and sum a per-step penalty
    loss = 0
    t = 0
    # Run for 100 time steps (to see whether it runs at all)
    for j in range(100):

        x = [];
        x.append(np.array([u,v]))
        x = np.array(x)        
        f=model(x) 

        f=f.numpy()[0][0]

        (u,v) = getnewstate(u,v,f)

        points = 1000/(abs(u)+1)
        loss=loss+1/points
        t += dt;
    
    return(loss)
    
def dotraining(model):  
    # Training loop
    for epoch in range(100):
        print("\nStart of epoch %d" % (epoch,))
        start_time = time.time()
        loss_value = 0;
        # Each step draws a batch of 10 random starting conditions.
        for step in range(100):
            with tf.GradientTape() as tape:
                loss_value=[]
                for i in range(10):
                    #Randomize Starting Condition
                    u = random.random()-0.5;
                    v = random.random()-0.5;
                    x = [];
                    x.append(np.array([u,v]))
                    x = np.array(x)
                    #feed model
                    logits = model(x, training=True)
                    #calculate loss
                    loss_value.append(lossfunction(u,v,model))
                    
                    
                print(step)
            print(loss_value)
            loss = loss_value
            loss = tf.convert_to_tensor(loss)
            grads = tape.gradient(loss, model.trainable_weights)
            opt.apply_gradients(zip(grads, model.trainable_weights))

    
            # Log every 20 steps.
            if step % 20 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (step, float(tf.reduce_mean(loss)))
                )
                print("Seen so far: %d samples" % ((step + 1) * 10))

        print("Time taken: %.2fs" % (time.time() - start_time))



model=generatemodel()
x = []
x.append(np.array([1.0,2.0]))
print(np.shape(x))
f=model(np.array(x))
dotraining(model)

Viktoriya Malyasova · Accepted Answer

The problem is that when you convert f to a NumPy value here:

f=f.numpy()[0][0]

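it stops being a tensor, and TensorFlow no longer tracks gradients through it. The force passed to getnewstate is then a plain number, so the loss has no connection to the model's weights, tape.gradient returns None for every variable, and apply_gradients raises "No gradients provided".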
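A minimal, self-contained illustration of that effect, assuming a one-unit Dense model like the question's: the same squared output is differentiated twice, once after a round trip through NumPy and once kept as a tensor.

```python
import tensorflow as tf

# Stand-in for the question's model: one Dense unit on a 2-feature input (u, v).
model = tf.keras.Sequential([tf.keras.Input(shape=(2,)), tf.keras.layers.Dense(1)])
x = tf.constant([[0.3, -0.2]])

with tf.GradientTape() as tape:
    f = model(x)
    # Round trip through NumPy: the new constant has no recorded link to the weights.
    f_np = f.numpy()[0][0]
    loss_detached = tf.constant(f_np) ** 2

with tf.GradientTape() as tape2:
    f = model(x)
    # Indexing the tensor directly keeps the path from weights to loss intact.
    loss_tracked = f[0, 0] ** 2

grads_detached = tape.gradient(loss_detached, model.trainable_weights)
grads_tracked = tape2.gradient(loss_tracked, model.trainable_weights)
print(grads_detached)  # [None, None]
print([g.shape for g in grads_tracked])
```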

For TensorFlow to compute gradients, every step from the model's inputs to the loss must be a TensorFlow (tensor) operation.
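A minimal sketch of that fix, keeping the question's physics and scoring: the rollout never leaves TensorFlow, so gradients flow from the loss back to the Dense weights. This trains the controller by differentiating through the simulation rather than with a standard reinforcement learning algorithm. The function name rollout_loss, processing the 10 start states as one (10, 1) tensor, drawing them with tf.random.uniform, and averaging the loss over the batch are illustrative assumptions, not part of the original code.

```python
import tensorflow as tf

# Physical parameters copied from the question.
m, k, c, dt = 1.0, 1.0, 0.01, 0.01

def getnewstate(u, v, f):
    # Same semi-implicit Euler step as the question; it only uses +, -, *, /,
    # so it works unchanged on tensors and stays differentiable.
    a = (f - v * c - k * u) / m
    v = v + a * dt
    u = u + v * dt
    return u, v

def rollout_loss(model, u0, v0, steps=100):
    # Hypothetical helper. u0, v0: float32 tensors of shape (batch, 1); the batch
    # replaces the question's inner Python loop over 10 starting conditions.
    u, v = u0, v0
    loss = tf.zeros_like(u0)
    for _ in range(steps):
        f = model(tf.concat([u, v], axis=1))  # shape (batch, 1), still a tensor
        u, v = getnewstate(u, v, f)
        points = 1000.0 / (tf.abs(u) + 1.0)
        loss = loss + 1.0 / points
    return tf.reduce_mean(loss)  # scalar loss averaged over the batch

model = tf.keras.Sequential([tf.keras.Input(shape=(2,)), tf.keras.layers.Dense(1)])
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

for step in range(20):
    # Random starting conditions in [-0.5, 0.5), as in the question.
    u0 = tf.random.uniform((10, 1), -0.5, 0.5)
    v0 = tf.random.uniform((10, 1), -0.5, 0.5)
    with tf.GradientTape() as tape:
        loss = rollout_loss(model, u0, v0)
    grads = tape.gradient(loss, model.trainable_weights)
    opt.apply_gradients(zip(grads, model.trainable_weights))
    if step % 5 == 0:
        print("step %d: loss %.5f" % (step, float(loss)))
```

Because the loss is a scalar tensor built only from tensor operations, tape.gradient returns actual gradients and apply_gradients no longer fails. With only 100 steps of dt = 0.01 s the applied force has a small effect on u, so a longer horizon may be needed before the loss visibly improves.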


Tags: python, machine-learning, tensorflow, keras, reinforcement-learning

Christian Pommer
