 

How to add L1 Regularization to PyTorch NN Model?

When searching for ways to implement L1 regularization in PyTorch models, I came across this question, which is now two years old, so I was wondering whether there is anything new on this topic.

I also found this recent approach for dealing with the missing L1 function. However, I don't understand how to use it for a basic NN like the one shown below.

import torch
import torch.nn as nn
import torch.nn.functional as F


class FFNNModel(nn.Module):
    def __init__(self, input_dim, output_dim, hidden_dim, dropout_rate):
        super(FFNNModel, self).__init__()
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.hidden_dim = hidden_dim
        self.dropout_rate = dropout_rate
        self.drop_layer = nn.Dropout(p=self.dropout_rate)
        # stack of fully connected layers: input -> hidden dims -> output
        self.fully = nn.ModuleList()
        current_dim = input_dim
        for h_dim in hidden_dim:
            self.fully.append(nn.Linear(current_dim, h_dim))
            current_dim = h_dim
        self.fully.append(nn.Linear(current_dim, output_dim))

    def forward(self, x):
        for layer in self.fully[:-1]:
            x = self.drop_layer(F.relu(layer(x)))
        x = F.softmax(self.fully[-1](x), dim=0)
        return x

I was hoping simply putting this before training would work:

model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = _Regularizer(model)
regularizer = L1Regularizer(regularizer, lambda_reg=0.1)

with

out = model(inputs)
loss = criterion(out, target) + regularizer.__add_l1()

Does anyone understand how to apply these 'ready to use' classes?

Asked by Quastiat

2 Answers

I haven't run the code in question, so please reach back if something doesn't work exactly. Generally, I would say the code you linked is needlessly complicated (perhaps because it tries to be generic and support several different kinds of regularization). The way it is meant to be used is, I suppose:

model = FFNNModel(30,5,[100,200,300,100],0.2)
regularizer = L1Regularizer(model, lambda_reg=0.1)

and then

out = model(inputs)
loss = criterion(out, target) + regularizer.regularized_all_param(0.)

You can check that regularized_all_param will just iterate over the parameters of your model and, if their name ends with weight, accumulate the sum of their absolute values. For some reason the accumulator has to be manually initialized, which is why we pass in the 0. value.
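
For intuition, my guess at what that class boils down to is the following sketch; the class name, constructor arguments and method name are taken from your snippets above, but the body is my reconstruction, not the repo's actual code:

class L1Regularizer:
    # Hypothetical reconstruction of the linked helper, based on the
    # description above -- not the repo's exact implementation.
    def __init__(self, model, lambda_reg=0.1):
        self.model = model
        self.lambda_reg = lambda_reg

    def regularized_all_param(self, reg_loss):
        # reg_loss is the manually initialized accumulator (the 0. passed in above)
        for name, param in self.model.named_parameters():
            if name.endswith('weight'):
                reg_loss = reg_loss + self.lambda_reg * param.abs().sum()
        return reg_loss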

Really though, if you just want L1 regularization and don't need any bells and whistles, the more manual approach, akin to your first link, will be more readable. It would go like this:

# accumulate the L1 norm of every parameter and add it to the loss
l1_regularization = 0.
for param in model.parameters():
    l1_regularization += param.abs().sum()
loss = criterion(out, target) + l1_regularization

This is really what is at the heart of both approaches. You use the Module.parameters method to iterate over all model parameters and sum up their L1 norms, which then becomes a term in your loss function. That's it. The repo you linked comes up with some fancy machinery to abstract this away but, judging by your question, fails to make it any clearer :)
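
Spelled out as a full training step, the manual version might look like the sketch below; the lambda_l1 factor, the optimizer and the data names are assumptions on my part, not part of the snippet above:

lambda_l1 = 1e-4  # assumed regularization strength, tune for your problem

optimizer.zero_grad()
out = model(inputs)
l1_regularization = 0.
for param in model.parameters():
    l1_regularization += param.abs().sum()
loss = criterion(out, target) + lambda_l1 * l1_regularization
loss.backward()  # gradients of the L1 term reach the parameters as well
optimizer.step()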

Answered by Jatentaki

SIMPLE SOLUTION for anyone stumbling over this:

There were always some issues with the _Regularizer classes from the link above, so I solved the problem with plain functions instead, adding an orthogonal regularizer as well:

import torch


def l1_regularizer(model, lambda_l1=0.01):
    lossl1 = 0
    for model_param_name, model_param_value in model.named_parameters():
        if model_param_name.endswith('weight'):
            lossl1 += lambda_l1 * model_param_value.abs().sum()
    return lossl1


def orth_regularizer(model, lambda_orth=0.01):
    lossorth = 0
    for model_param_name, model_param_value in model.named_parameters():
        if model_param_name.endswith('weight'):
            param_flat = model_param_value.view(model_param_value.shape[0], -1)
            sym = torch.mm(param_flat, torch.t(param_flat))
            sym -= torch.eye(param_flat.shape[0])
            # penalize the deviation of W @ W.T from the identity
            # (abs() so that positive and negative entries cannot cancel out)
            lossorth += lambda_orth * sym.abs().sum()
    return lossorth

and during training do:

loss = criterion(outputs, y_data)\
      +l1_regularizer(model, lambda_l1=lambda_l1)\
      +orth_regularizer(model, lambda_orth=lambda_orth)   
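
For completeness, a full training step using these functions could look like the sketch below; the optimizer, train_loader and the lambda values are assumed names, not part of the answer above:

lambda_l1, lambda_orth = 0.01, 0.01  # assumed values, tune for your problem

for x_data, y_data in train_loader:
    optimizer.zero_grad()
    outputs = model(x_data)
    loss = (criterion(outputs, y_data)
            + l1_regularizer(model, lambda_l1=lambda_l1)
            + orth_regularizer(model, lambda_orth=lambda_orth))
    loss.backward()
    optimizer.step()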
Answered by Quastiat


