

Transfer learning by importing a Slim model with 10 lines of TensorLayer deep-learning code


In my previous post (cat-vs-dog classification with TensorFlow transfer learning) I showed how to do image classification by transfer learning with TensorLayer's VGG16 model. That raises a question: what about models that TensorLayer does not provide? Don't worry, TensorLayer can import TensorFlow's Slim models; the example code is in tutorial_inceptionV3_tfslim.

So what is Slim, and what is it actually good for?

TF-Slim is a library that makes building, training, and evaluating neural networks simple. It removes much of the repetitive boilerplate of native TensorFlow, making code more compact and readable. Slim also ships many well-known computer-vision models (VGG, AlexNet, etc.) that can be used directly or extended in various ways. (Author's note: in short, it serves much the same purpose as TensorLayer.) For more background, see the article 【Tensorflow】辅助工具篇——tensorflow slim(TF-Slim)介绍.
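
To give a feel for that boilerplate reduction, here is a minimal sketch (not from the original post; the layer sizes are hypothetical) of defining a tiny network with slim.conv2d and slim.arg_scope instead of raw tf.nn ops:

import tensorflow as tf

slim = tf.contrib.slim  # TF-Slim lives in tf.contrib in TF 1.x

def tiny_net(images):
    # arg_scope sets defaults once instead of repeating them for every layer
    with slim.arg_scope([slim.conv2d], activation_fn=tf.nn.relu,
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.conv2d(images, 64, [3, 3], scope='conv1')   # conv + bias + ReLU in one call
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.fully_connected(slim.flatten(net), 2, activation_fn=None, scope='logits')
    return net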

To do transfer learning we first need the Slim model code and the pretrained weights. Google provides both for download: the TF-Slim project page lists each model together with the address of its parameters trained on ImageNet.

The list also gives each model's top-1 and top-5 accuracy, and there are plenty of models to choose from.

We download the Inception-ResNet-v2 model code and inception_resnet_v2_2016_08_30.tar.gz, then put the .py file and the extracted .ckpt file in the project root. Why not use the Inception V3 from the TensorLayer example? Because Inception-ResNet-v2 has higher accuracy. (The real reason comes at the end.)
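
For reference, a hedged sketch of fetching those two files from Python. The URLs follow the TF-Slim model zoo's naming scheme at the time and may have moved since; the extracted checkpoint may carry a dated filename such as inception_resnet_v2_2016_08_30.ckpt, so rename it to inception_resnet_v2.ckpt to match the restore call used below:

import os
import tarfile
import urllib.request

CKPT_URL = 'http://download.tensorflow.org/models/inception_resnet_v2_2016_08_30.tar.gz'
CODE_URL = ('https://raw.githubusercontent.com/tensorflow/models/master/'
            'research/slim/nets/inception_resnet_v2.py')

if not os.path.exists('inception_resnet_v2_2016_08_30.tar.gz'):
    urllib.request.urlretrieve(CKPT_URL, 'inception_resnet_v2_2016_08_30.tar.gz')
with tarfile.open('inception_resnet_v2_2016_08_30.tar.gz') as tar:
    tar.extractall('.')  # extracts the .ckpt into the project root

if not os.path.exists('inception_resnet_v2.py'):
    urllib.request.urlretrieve(CODE_URL, 'inception_resnet_v2.py')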

We are still doing cat-vs-dog classification. If you follow the tutorial, import the model, change num_classes, load the training data, and train directly, you will get an error, because the dimensions of the last few Logits-layer parameters no longer match when the checkpoint is restored.

So the last few parameters cannot be restored, and I did not find a TensorFlow way to selectively restore parameters from a .ckpt. What to do? Fortunately a friend in the chat group suggested an approach; see 【Tensorflow 迁移学习】:

The main idea: first restore all the .ckpt parameters and save them in npz format, then selectively restore parameters from the npz, which works exactly the same way as in the previous post.

So the whole process takes two steps:

1. Restore the parameters and save them in npz format:

Here is the code:

import os
import time
from recordutil import *
import numpy as np
# from tensorflow.contrib.slim.python.slim.nets.resnet_v2 import resnet_v2_152
# from tensorflow.contrib.slim.python.slim.nets.vgg import vgg_16
import skimage
import skimage.io
import skimage.transform
import tensorflow as tf
from tensorlayer.layers import *
# from scipy.misc import imread, imresize
# from tensorflow.contrib.slim.python.slim.nets.alexnet import alexnet_v2
from inception_resnet_v2 import (inception_resnet_v2_arg_scope, inception_resnet_v2)
from scipy.misc import imread, imresize
from tensorflow.python.ops import variables
import tensorlayer as tl

slim = tf.contrib.slim

try:
    from data.imagenet_classes import *
except Exception as e:
    raise Exception(
        "{} / download the file from: https://github.com/zsdonghao/tensorlayer/tree/master/example/data".format(e))

n_epoch = 200
learning_rate = 0.0001
print_freq = 2
batch_size = 32

## InceptionV3 / All TF-Slim nets can be merged into TensorLayer
x = tf.placeholder(tf.float32, shape=[None, 299, 299, 3])
# labels
y_ = tf.placeholder(tf.int32, shape=[None, ], name='y_')

net_in = tl.layers.InputLayer(x, name='input_layer')
with slim.arg_scope(inception_resnet_v2_arg_scope()):
    network = tl.layers.SlimNetsLayer(
        prev_layer=net_in,
        slim_layer=inception_resnet_v2,
        slim_args={
            'num_classes': 1001,
            'is_training': True,
        },
        name='InceptionResnetV2'  # <-- the name should be the same as the ckpt model
    )

# network = fc_layers(net_cnn)
sess = tf.InteractiveSession()
network.print_params(False)
# network.print_layers()

saver = tf.train.Saver()
# load the pretrained parameters
# tl.files.assign_params(sess, npz, network)
tl.layers.initialize_global_variables(sess)
saver.restore(sess, "inception_resnet_v2.ckpt")
print("Model Restored")

all_params = sess.run(network.all_params)
np.savez('inception_resnet_v2.npz', params=all_params)
sess.close()
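
Before moving on, it can be worth a quick sanity check that the export worked. This small snippet is not in the original post; allow_pickle is only needed on newer NumPy versions:

import numpy as np

data = np.load('inception_resnet_v2.npz', allow_pickle=True)  # the list of arrays is stored as an object array
params = data['params']
print('number of exported tensors:', len(params))
# the last few entries should be the 1001-class Logits/AuxLogits weights and biases
for offset, p in enumerate(params[-4:]):
    print('param %d shape:' % (len(params) - 4 + offset), p.shape)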

2. Partially restore the npz parameters and train the model:

First we modify the final layer of the model. Since this is a 2-class problem, we make the following change:

with slim.arg_scope(inception_resnet_v2_arg_scope()):
    network = tl.layers.SlimNetsLayer(
        prev_layer=net_in,
        slim_layer=inception_resnet_v2,
        slim_args={
            'num_classes': 2,
            'is_training': True,
        },
        name='InceptionResnetV2'  # <-- the name should be the same as the ckpt model
    )

num_classes is changed to 2 and is_training is set to True.

Next, define the outputs, the loss function, and the accuracy metric:

sess = tf.InteractiveSession()
# saver = tf.train.Saver()
y = network.outputs
y_op = tf.argmax(tf.nn.softmax(y), 1)
cost = tl.cost.cross_entropy(y, y_, name='cost')
correct_prediction = tf.equal(tf.cast(tf.argmax(y, 1), tf.float32), tf.cast(y_, tf.float32))
acc = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

Next we choose which parameters to train. We only train the parameters of the final layers; printing the parameters shows:

[TL] param 900: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/weights:0 (5, 5, 128, 768) float32_ref
[TL] param 901: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0 (768,) float32_ref
[TL] param 902: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_mean:0 (768,) float32_ref
[TL] param 903: InceptionResnetV2/AuxLogits/Conv2d_2a_5x5/BatchNorm/moving_variance:0 (768,) float32_ref
[TL] param 904: InceptionResnetV2/AuxLogits/Logits/weights:0 (768, 2) float32_ref
[TL] param 905: InceptionResnetV2/AuxLogits/Logits/biases:0 (2,) float32_ref
[TL] param 906: InceptionResnetV2/Logits/Logits/weights:0 (1536, 2) float32_ref
[TL] param 907: InceptionResnetV2/Logits/Logits/biases:0 (2,) float32_ref
[TL] num of params: 56940900

We only need to train from param 904 onward; parameters are restored up to param 903.
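
Rather than hard-coding 904, you can also locate the boundary programmatically. This helper is not in the original post; it simply finds the first variable whose name contains '/Logits/', which according to the listing above are exactly the new two-class layers:

# index of the first class-dependent variable (AuxLogits/Logits and Logits/Logits)
first_new = next(i for i, p in enumerate(network.all_params) if '/Logits/' in p.name)
print('restore params[0:%d], train params[%d:]' % (first_new, first_new))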

Below we define the training op, restore part of the parameters, and load the sample data:

# define the optimizer
train_params = network.all_params[904:]
print('train params:', train_params)
# # load the pretrained parameters
# tl.files.assign_params(sess, params, network)
train_op = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost, var_list=train_params)

img, label = read_and_decode("D:\\001-Python\\train299.tfrecords")
# shuffle_batch randomly shuffles the input
X_train, y_train = tf.train.shuffle_batch([img, label],
                                          batch_size=batch_size, capacity=200,
                                          min_after_dequeue=100)

tl.layers.initialize_global_variables(sess)
params = tl.files.load_npz('', 'inception_resnet_v2.npz')
params = params[0:904]
print('number of params to restore:', len(params))
tl.files.assign_params(sess, params=params, network=network)
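
As an optional sanity check (not in the original post, and assuming numpy is imported as np as in part 1), you can verify that an early layer really picked up the npz weights after assign_params:

restored = sess.run(network.all_params[0])
assert np.allclose(restored, params[0]), 'parameter 0 was not restored from the npz'
print('parameter 0 restored, shape:', restored.shape)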

Below is the code to train the model, the same as in the previous post:

# # train the model
coord = tf.train.Coordinator()
threads = tf.train.start_queue_runners(sess=sess, coord=coord)
step = 0
filelist = getfilelist()
for epoch in range(n_epoch):
    start_time = time.time()
    val, l = sess.run([X_train, y_train])  # next_data(filelist, batch_size)
    for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
        sess.run(train_op, feed_dict={x: X_train_a, y_: y_train_a})
    if epoch + 1 == 1 or (epoch + 1) % print_freq == 0:
        print("Epoch %d of %d took %fs" % (epoch + 1, n_epoch, time.time() - start_time))
        train_loss, train_acc, n_batch = 0, 0, 0
        for X_train_a, y_train_a in tl.iterate.minibatches(val, l, batch_size, shuffle=True):
            err, ac = sess.run([cost, acc], feed_dict={x: X_train_a, y_: y_train_a})
            train_loss += err
            train_acc += ac
            n_batch += 1
        print("   train loss: %f" % (train_loss / n_batch))
        print("   train acc: %f" % (train_acc / n_batch))
# tl.files.save_npz(network.all_params, name='model_vgg_16_2.npz', sess=sess)
coord.request_stop()
coord.join(threads)

Training for 200 epochs with a batch size of 20, part of the output is shown below:

Epoch 156 of 200 took 12.568609s
   train loss: 0.382517
   train acc: 0.950000
Epoch 158 of 200 took 12.457161s
   train loss: 0.382509
   train acc: 0.850000
Epoch 160 of 200 took 12.385407s
   train loss: 0.320393
   train acc: 1.000000
Epoch 162 of 200 took 12.489218s
   train loss: 0.480686
   train acc: 0.700000
Epoch 164 of 200 took 12.388841s
   train loss: 0.329189
   train acc: 0.850000
Epoch 166 of 200 took 12.446472s
   train loss: 0.379127
   train acc: 0.900000
Epoch 168 of 200 took 12.888571s
   train loss: 0.365938
   train acc: 0.900000
Epoch 170 of 200 took 12.850605s
   train loss: 0.353434
   train acc: 0.850000
Epoch 172 of 200 took 12.855129s
   train loss: 0.315443
   train acc: 0.950000
Epoch 174 of 200 took 12.906666s
   train loss: 0.460817
   train acc: 0.750000
Epoch 176 of 200 took 12.830738s
   train loss: 0.421025
   train acc: 0.900000
Epoch 178 of 200 took 12.852572s
   train loss: 0.418784
   train acc: 0.800000
Epoch 180 of 200 took 12.951322s
   train loss: 0.316057
   train acc: 0.950000
Epoch 182 of 200 took 12.866213s
   train loss: 0.363328
   train acc: 0.900000
Epoch 184 of 200 took 13.012520s
   train loss: 0.379462
   train acc: 0.850000
Epoch 186 of 200 took 12.934583s
   train loss: 0.472857
   train acc: 0.750000
Epoch 188 of 200 took 13.038168s
   train loss: 0.236005
   train acc: 1.000000
Epoch 190 of 200 took 13.056378s
   train loss: 0.266042
   train acc: 0.950000
Epoch 192 of 200 took 13.016137s
   train loss: 0.255430
   train acc: 0.950000
Epoch 194 of 200 took 13.013147s
   train loss: 0.422342
   train acc: 0.900000
Epoch 196 of 200 took 12.980659s
   train loss: 0.353984
   train acc: 0.900000
Epoch 198 of 200 took 13.033676s
   train loss: 0.320018
   train acc: 0.950000
Epoch 200 of 200 took 12.945982s
   train loss: 0.288049
   train acc: 0.950000

And that wraps up transfer learning with Inception-ResNet-v2.
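
For completeness, here is a hedged sketch of using the trained graph for a single prediction through the y_op node defined earlier. The file name is hypothetical, and the scaling assumes the 299x299 float images produced by read_and_decode are in the [0, 1] range, which the post does not show, so adjust the preprocessing to match your TFRecord pipeline. Also note the graph was built with is_training=True; for proper evaluation you would normally rebuild it with is_training=False:

import numpy as np
import skimage.io
import skimage.transform

img = skimage.io.imread('test_cat.jpg') / 255.0    # hypothetical test image
img = skimage.transform.resize(img, (299, 299))    # match the 299x299 input placeholder
pred = sess.run(y_op, feed_dict={x: np.asarray([img], dtype=np.float32)})
print('predicted class index:', pred[0])           # 0 or 1 for the two classes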

The TensorLayer author says SlimNetsLayer can import any Slim model. I have verified that importing Inception-ResNet-v2 and VGG16 works. After importing Inception V3, however, I trained for two or three days and the accuracy kept oscillating between 10% and 70% (as unstable as my mood). I never found the cause, which was exhausting, so I hope someone else will give Inception V3 a try.

