I have trained a neural network with a multi-layer GRU in it. I did not use L1/L2 regularization, only gradient descent. I used the PyTorch implementation nn.GRU(1024, 1024, 4, 0.1). After training I checked the weight matrices and found some strange effects in hh_l0_in.


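For reference, this is roughly how I look at the per-gate statistics. The snippet below is only a minimal sketch: an untrained GRU of the same shape stands in for the trained model, and the r/z/n gate ordering of weight_hh_l0 is taken from the PyTorch docs.

```python
import torch.nn as nn

# Sketch only: a freshly initialized GRU with the same shape stands in
# for the trained model loaded elsewhere.
gru = nn.GRU(1024, 1024, num_layers=4, dropout=0.1)

hidden = gru.hidden_size
w_hh = gru.weight_hh_l0  # shape (3 * hidden, hidden), gate blocks stacked as r, z, n
blocks = {
    "hh_l0_ir": w_hh[0 * hidden:1 * hidden],  # reset gate
    "hh_l0_iz": w_hh[1 * hidden:2 * hidden],  # update gate
    "hh_l0_in": w_hh[2 * hidden:3 * hidden],  # new/candidate gate
}
for name, w in blocks.items():
    print(f"{name} avg: {w.mean().item():.6g} std: {w.std().item():.6g}")
```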
For the learnable hidden-hidden weights, the standard deviation of the secondary diagonal is a lot higher than in the rest of the matrix. The standard deviations of the weight matrices for the first layer:
- ih_l0_ir
- avg: 0.0005793942, std: 0.27631217
- avg: 0.008843938, std: 0.28335693
- avg: 2.674491e-05, std: 0.08714065
- avg: -0.00070823624, std: 0.087942585
- avg: 4.8848284e-05, std: 0.12259285, ske: 0.024917802381963283, kur: 7.325522493002966, hsk: 27.256551562628268, hta: 1860.4869170125573
- avg: 0.0023218004, std: 0.12732618, ske: 2.5480260834968385, kur: 16.072225736307036, hsk: 103.89389710637231, hta: 785.1604778707374
- avg: 0.0003275321, std: 0.30212235
- avg: 0.07846515, std: 2.5184803
- avg: -4.97445e-05, std: 0.2345184
- avg: -0.03676583, std: 2.0746834
- avg: -0.01897307, std: 0.707518
- avg: -19.582924, std: 1.5654982
The other layers show similar effects; however, the standard deviation of the output layer is different:
- hh_l3_inDiag
- avg: -18.329617, std: 8.657197
The hh_lX_in matrices also have an average of ~-20. Is this an effect of the input data, or is it normal for a GRU to have this larger standard deviation on the secondary diagonal? What causes this effect?
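For completeness, the Diag numbers are the statistics of the diagonal of the candidate-gate hidden-hidden block compared to its off-diagonal entries. A rough sketch of that computation (again only illustrative: an untrained GRU stands in for the trained model, the r/z/n ordering is assumed, and I take the main diagonal here):

```python
import torch
import torch.nn as nn

# Sketch only: replace this with the actual trained model.
gru = nn.GRU(1024, 1024, num_layers=4, dropout=0.1)

hidden = gru.hidden_size
w_n = gru.weight_hh_l0[2 * hidden:3 * hidden]  # candidate-gate block, shape (hidden, hidden)

# Main diagonal; for the anti-diagonal, flip first with torch.flip(w_n, dims=[1]).
diag = torch.diagonal(w_n)
off_diag = w_n[~torch.eye(hidden, dtype=torch.bool)]

print(f"hh_l0_inDiag avg: {diag.mean().item():.6g} std: {diag.std().item():.6g}")
print(f"off-diagonal avg: {off_diag.mean().item():.6g} std: {off_diag.std().item():.6g}")
```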
