
Reinforcement Learning on Tensorflow without Gym

I am currently trying to create a simple ANN learning environment for reinforcement learning. I have already fitted a neural network as a substitute for a physical model. Now I would like to build a simple reinforcement learning model out of curiosity.

To create this model, I thought it would be a good option to change the loss function so that, instead of calculating the difference between the expected value and the model output, it runs a short simulation for a few steps and scores how many points the model earns toward a specific target. In the example code below, the system is a simple mass-spring-damper that starts with a random displacement and velocity. The model can exert a force on it. The points are based on the distance from equilibrium. At the end I invert the points by dividing one by the number of points earned. I am not sure if this is the right approach, but I wanted to try it anyway for the sake of learning. Now I get the error message "No gradients provided for any variable", and I am not sure how to solve it.

Here is my code:

import time
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.layers import Input, Dense, Conv2D, Reshape,concatenate, Flatten, UpSampling2D, AveragePooling2D,LayerNormalization
import random


#Physical Parameters
m = 1 #kg
k = 1 #N/m
c = 0.01
dt = 0.01 
opt = keras.optimizers.Adam(learning_rate=0.01)


def getnewstate(u,v,f):
    #Calculate new state of mass spring damper system
    a = (f-v*c-k*u)/m
    v = v+a*dt
    u = u+v*dt
    return (u,v)


def generatemodel():
    #Generate simple keras model
    kernel_initializer=tf.keras.initializers.RandomNormal(stddev=0.01)
    bias_initializer=tf.keras.initializers.Zeros()
    InputLayer = Input(shape=(2))
    Outputlayer = Dense(1,activation='linear')(InputLayer)
    model = Model(inputs=InputLayer, outputs=Outputlayer)      
    
    return model
    
def lossfunction(u,v,model):
    #Custom loss function
    loss = 0;
    t    = 0;
    t_last = 0;
    #do for 100 timesteps (to see if it runs at all)
    for j in range(100):

        x = [];
        x.append(np.array([u,v]))
        x = np.array(x)        
        f=model(x) 

        f=f.numpy()[0][0]

        (u,v) = getnewstate(u,v,f)

        points = 1000/(abs(u)+1)
        loss=loss+1/points
        t += dt;
    
    return(loss)
    
def dotraining(model):  
    #training loop
    for epoch in range(100):
        print("\nStart of epoch %d" % (epoch,))
        start_time = time.time()
        loss_value = 0;
        # Iterate over the batches of the dataset.
        for step in range(100):
            with tf.GradientTape() as tape:
                loss_value=[]
                for i in range(10):
                    #Randomize Starting Condition
                    u = random.random()-0.5;
                    v = random.random()-0.5;
                    x = [];
                    x.append(np.array([u,v]))
                    x = np.array(x)
                    #feed model
                    logits = model(x, training=True)
                    #calculate loss
                    loss_value.append(lossfunction(u,v,model))
                    
                    
                print(step)
            print(loss_value)
            loss = loss_value
            loss = tf.convert_to_tensor(loss)
            grads = tape.gradient(loss, model.trainable_weights)
            opt.apply_gradients(zip(grads, model.trainable_weights))

    
            # Log every 200 batches.
            if step % 200 == 0:
                print(
                    "Training loss (for one batch) at step %d: %.4f"
                    % (step, float(loss_value))
                )
                print("Seen so far: %d samples" % ((step + 1) * 64))   

        print("Time taken: %.2fs" % (time.time() - start_time))



model=generatemodel()
x = []
x.append(np.array([1.0,2.0]))
print(np.shape(x))
f=model(np.array(x))
dotraining(model)
Christian Pommer asked Dec 27 '25 19:12



1 Answer

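The problem is that when you convert f to a NumPy value here:

f=f.numpy()[0][0]

it stops being a tensor, and TensorFlow no longer tracks its gradient. Everything computed from f afterwards (the new state, the points, the loss) is an ordinary Python number with no connection to the model weights.

For TensorFlow to compute a gradient, you must get from the inputs to the loss using only tensor operations inside the GradientTape context.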
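As an illustration, here is a minimal sketch of how the simulation could be kept in tensor operations from start to finish. It is not the asker's exact code: the function name rollout_loss, the batched (10, 1) state tensors, and the use of tf.keras.Sequential are assumptions made for this example. The physical parameters and the per-step penalty (abs(u)+1)/1000, which equals 1/points, follow the question.

```python
import tensorflow as tf

# Physical parameters, same values as in the question
m, k, c, dt = 1.0, 1.0, 0.01, 0.01

def rollout_loss(model, u, v, steps=100):
    """Hypothetical helper: simulate the system using tensor ops only,
    so the tape can trace the loss back to the model weights."""
    loss = tf.zeros_like(u)
    for _ in range(steps):
        state = tf.concat([u, v], axis=1)   # shape (batch, 2)
        f = model(state)                    # shape (batch, 1), still a tensor
        a = (f - c * v - k * u) / m         # acceleration
        v = v + a * dt
        u = u + v * dt
        # Same as 1/points with points = 1000/(abs(u)+1)
        loss = loss + (tf.abs(u) + 1.0) / 1000.0
    return tf.reduce_mean(loss)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(2,)),
    tf.keras.layers.Dense(1, activation="linear"),
])
opt = tf.keras.optimizers.Adam(learning_rate=0.01)

# A batch of 10 random starting conditions in [-0.5, 0.5)
u0 = tf.random.uniform((10, 1), -0.5, 0.5)
v0 = tf.random.uniform((10, 1), -0.5, 0.5)

with tf.GradientTape() as tape:
    loss = rollout_loss(model, u0, v0)

grads = tape.gradient(loss, model.trainable_weights)
opt.apply_gradients(zip(grads, model.trainable_weights))
```

Note that wrapping a list of Python floats in tf.convert_to_tensor after the fact does not restore the gradient: it creates a new constant tensor that has no recorded dependence on the weights. The loss has to be built from tensors throughout, as in the sketch.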

Viktoriya Malyasova answered Dec 30 '25 17:12
