01.srt
1
00:00:00,000 --> 00:00:04,560
Subtitles generated by: BLACK  Subtitles proofread by: 方鸿渐
2
00:00:06,200 --> 00:00:10,640
Well, looking at the clock, it's already 00:35 in the morning
3
00:00:10,640 --> 00:00:12,160
so basically half past midnight
4
00:00:13,520 --> 00:00:17,040
I finally understand why up-loaders who post videos this late at night
5
00:00:17,040 --> 00:00:19,200
get called "gan di" — the liver emperors
6
00:00:19,200 --> 00:00:21,080
it turns out it really is hard on the liver
7
00:00:21,080 --> 00:00:22,960
and no, I don't mean "gan" as in being very capable
8
00:00:23,720 --> 00:00:25,920
Hey! That's pretty good!
9
00:00:27,040 --> 00:00:29,440
Once I finish recording this video, I'm going to get some rest
10
00:00:29,440 --> 00:00:31,680
So the topic I'm going to present to you today
11
00:00:31,680 --> 00:00:34,720
is model miniaturization for inference engines
12
00:00:34,720 --> 00:00:35,560
a new topic
13
00:00:35,560 --> 00:00:36,960
model miniaturization
14
00:00:36,960 --> 00:00:37,480
And
15
00:00:37,480 --> 00:00:40,360
before we get into the actual algorithms
16
00:00:40,360 --> 00:00:44,800
I'd like to walk you through the specific parameters of inference
17
00:00:44,800 --> 00:00:46,520
or the related metrics
18
00:00:46,520 --> 00:00:47,560
Model miniaturization
19
00:00:47,560 --> 00:00:49,600
will be covered in three parts
20
00:00:49,600 --> 00:00:52,280
The first is the basic parameter concepts
21
00:00:52,280 --> 00:00:53,800
Only once we understand the basic parameters
22
00:00:53,800 --> 00:00:58,680
can we tell what actually makes miniaturization effective
23
00:00:58,680 --> 00:01:04,080
and which parameters or metrics should be used to measure it
24
00:01:04,080 --> 00:01:04,680
After that
25
00:01:04,680 --> 00:01:06,920
we'll look at CNN miniaturization
26
00:01:06,920 --> 00:01:11,120
and finally at miniaturization for Transformers
27
00:01:14,480 --> 00:01:14,960
So actually
28
00:01:14,960 --> 00:01:16,760
as AI applications have developed
29
00:01:16,760 --> 00:01:17,680
models
30
00:01:17,680 --> 00:01:20,360
have, relatively speaking, been getting bigger and bigger
31
00:01:20,360 --> 00:01:21,320
Each circle here
32
00:01:21,320 --> 00:01:23,360
represents a model's parameter count
33
00:01:23,360 --> 00:01:24,800
The larger the parameter count
34
00:01:24,800 --> 00:01:27,000
the higher the accuracy
35
00:01:27,000 --> 00:01:28,440
Whichever approach you take
36
00:01:28,440 --> 00:01:30,080
including the upper curve here
37
00:01:30,080 --> 00:01:31,840
it's true that the bigger the model
38
00:01:31,840 --> 00:01:33,920
the higher the accuracy
39
00:01:33,920 --> 00:01:34,600
That point
40
00:01:34,600 --> 00:01:35,440
is beyond dispute
41
00:01:35,440 --> 00:01:37,640
The question is just how big
42
00:01:37,640 --> 00:01:38,920
and how high
43
00:01:38,920 --> 00:01:39,640
somewhere in between
44
00:01:39,640 --> 00:01:42,160
there is probably a trade-off
45
00:01:42,160 --> 00:01:43,520
Now look over to the right
46
00:01:43,520 --> 00:01:46,120
there are more and more large models today
47
00:01:46,120 --> 00:01:47,880
they really have become large models
48
00:01:47,880 --> 00:01:49,240
Since they're called large models
49
00:01:49,240 --> 00:01:50,520
this red line here
50
00:01:50,520 --> 00:01:51,560
is the Large Scale regime
51
00:01:51,560 --> 00:01:57,400
going large scale has pushed the parameter count up even further
52
00:01:57,400 --> 00:01:58,040
and at that point
53
00:01:58,040 --> 00:02:01,440
the compute requirements become very high as well
54
00:02:01,440 --> 00:02:02,720
So with that in mind
55
00:02:02,720 --> 00:02:06,760
let's look at how these quantities are actually evaluated
56
00:02:09,760 --> 00:02:10,320
So now
57
00:02:10,320 --> 00:02:11,320
there are several metrics
58
00:02:11,320 --> 00:02:11,840
The first
59
00:02:11,840 --> 00:02:13,960
is FLOPs
60
00:02:13,960 --> 00:02:17,080
the number of floating-point operations
61
00:02:17,080 --> 00:02:19,160
Floating Point Operations
62
00:02:19,160 --> 00:02:19,880
In this case
63
00:02:19,880 --> 00:02:22,640
FLOPs is generally taken as the amount of computation
64
00:02:22,640 --> 00:02:25,560
used to measure the time complexity of an algorithm or model
65
00:02:25,560 --> 00:02:26,360
Next
66
00:02:26,360 --> 00:02:28,280
let's look at another concept
67
00:02:28,280 --> 00:02:29,720
it's also written FLOPS
68
00:02:29,720 --> 00:02:31,000
but here the S
69
00:02:31,000 --> 00:02:32,760
is capitalized
70
00:02:32,760 --> 00:02:33,760
In that case
71
00:02:33,760 --> 00:02:37,440
it means the number of floating-point operations performed per second
72
00:02:37,440 --> 00:02:40,600
Floating Point Operations Per Second
73
00:02:40,600 --> 00:02:41,800
the S of Per Second
74
00:02:41,800 --> 00:02:43,720
becomes part of the abbreviation
75
00:02:43,720 --> 00:02:45,880
The number of floating-point operations per second
76
00:02:45,880 --> 00:02:47,760
is what we call the throughput
77
00:02:47,760 --> 00:02:49,600
or, understood more loosely
78
00:02:49,600 --> 00:02:50,720
the rate of computation
79
00:02:50,720 --> 00:02:52,800
It's a metric for measuring hardware
80
00:02:52,800 --> 00:02:54,640
and also a metric for model speed
81
00:02:54,640 --> 00:02:57,560
serving as a compute-capability metric for a chip
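To make the hardware-side FLOPS figure concrete, here is a minimal sketch of the usual back-of-the-envelope formula (peak FLOPS ≈ cores × clock × FLOPs per core per cycle); the numbers below are illustrative placeholders, not the specs of any particular chip from the lecture.

```python
# Hedged sketch: theoretical peak FLOPS = cores * clock (Hz) * FLOPs per core per cycle.
def peak_flops(num_cores: int, clock_hz: float, flops_per_core_per_cycle: int) -> float:
    return num_cores * clock_hz * flops_per_core_per_cycle

# Example: 2560 cores at 1.5 GHz, 2 FLOPs per core per cycle (one fused multiply-add)
print(peak_flops(2560, 1.5e9, 2) / 1e12, "TFLOPS")  # ~7.68 TFLOPS, best case
```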
82
00:02:57,560 --> 00:02:58,120
Next
83
00:02:58,120 --> 00:02:59,400
let's look at the third concept
84
00:02:59,400 --> 00:03:00,960
which is MACCs
85
00:03:00,960 --> 00:03:03,000
the number of multiply-accumulate operations
86
00:03:03,000 --> 00:03:06,200
Multiply-Accumulate Operations
87
00:03:06,200 --> 00:03:07,440
Generally speaking
88
00:03:07,440 --> 00:03:09,600
the number of multiply-accumulate operations, MACCs
89
00:03:09,600 --> 00:03:11,920
is roughly half of the first metric, the floating-point operation count
90
00:03:11,920 --> 00:03:13,720
half of FLOPs
91
00:03:13,720 --> 00:03:14,280
For example
92
00:03:14,280 --> 00:03:15,280
right now
93
00:03:15,280 --> 00:03:17,040
there are lots of matrix multiplications
94
00:03:17,040 --> 00:03:19,120
W0 times X0
95
00:03:19,120 --> 00:03:23,240
can be treated as one simple multiply operation
96
00:03:23,240 --> 00:03:24,360
and most of the time
97
00:03:24,360 --> 00:03:27,320
a very large number of multiplications
98
00:03:27,320 --> 00:03:28,560
or multiply-accumulate operations are performed
99
00:03:28,560 --> 00:03:29,720
inside inference chips
100
00:03:29,720 --> 00:03:31,320
or AI accelerator chips
101
00:03:31,320 --> 00:03:33,320
so this is also one of the metrics
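As a small sketch of my own (not from the lecture) showing why MACCs come out to roughly half of FLOPs: each multiply-accumulate is one multiply plus one add, i.e. about two floating-point operations. The layer sizes below are arbitrary examples.

```python
# Hedged sketch: count MACs and FLOPs for a fully-connected layer y = W @ x.
def fc_counts(c_in: int, c_out: int):
    macs = c_in * c_out          # one multiply-accumulate per weight
    flops = 2 * c_in * c_out     # each MAC = 1 multiply + 1 add, so ~2 FLOPs
    return macs, flops

macs, flops = fc_counts(c_in=1024, c_out=1000)
print(macs, flops, flops / macs)  # 1024000 2048000 2.0 -> FLOPs is about 2x MACs
```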
102
00:03:33,320 --> 00:03:34,640
The fourth metric
103
00:03:34,640 --> 00:03:36,360
is Params
104
00:03:36,360 --> 00:03:40,200
which, in the figures we just looked at
105
00:03:40,200 --> 00:03:43,040
is a very useful metric
106
00:03:43,040 --> 00:03:44,440
it's the model size
107
00:03:44,440 --> 00:03:46,880
put plainly, it's simply how big the model is
108
00:03:46,880 --> 00:03:47,640
and this
109
00:03:47,640 --> 00:03:50,760
directly affects how much memory the model occupies
110
00:03:50,760 --> 00:03:51,240
The unit
111
00:03:51,240 --> 00:03:52,120
is usually M
112
00:03:52,120 --> 00:03:54,040
as in KB, MB, that M
113
00:03:54,040 --> 00:03:54,760
and this M
114
00:03:54,760 --> 00:03:56,280
mainly means MB
115
00:03:56,280 --> 00:03:57,680
The parameters themselves
116
00:03:57,680 --> 00:04:00,760
are usually represented in Float32
117
00:04:00,760 --> 00:04:02,800
because for storage and training
118
00:04:02,800 --> 00:04:05,160
models are typically trained in FP32
119
00:04:05,160 --> 00:04:05,920
so in that case
120
00:04:05,920 --> 00:04:06,960
the model size in bytes
121
00:04:06,960 --> 00:04:08,400
is 4 times the parameter count
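A quick worked example of the "4× the parameter count" rule, assuming FP32 storage (4 bytes per parameter); the 25.5-million-parameter figure is just an illustrative round number at roughly ResNet-50 scale.

```python
# Hedged sketch: model size in MB from parameter count, assuming FP32 (4 bytes/param).
def model_size_mb(num_params: int, bytes_per_param: int = 4) -> float:
    return num_params * bytes_per_param / (1024 ** 2)

print(model_size_mb(25_500_000))     # ~97 MB when stored in FP32
print(model_size_mb(25_500_000, 2))  # ~49 MB if stored in FP16 instead
```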
122
00:04:09,880 --> 00:04:12,360
Next, let's look at two more metrics
123
00:04:12,360 --> 00:04:13,720
The first of these two metrics
124
00:04:13,720 --> 00:04:14,640
is called MAC
125
00:04:14,640 --> 00:04:16,000
which is not the same as MACC
126
00:04:16,000 --> 00:04:17,440
so please watch out for that
127
00:04:17,440 --> 00:04:18,320
MAC
128
00:04:18,320 --> 00:04:20,920
is a metric that gets used very often
129
00:04:20,920 --> 00:04:22,640
it's the memory access cost
130
00:04:22,680 --> 00:04:24,200
Memory Access Cost
131
00:04:25,480 --> 00:04:26,120
MAC
132
00:04:26,120 --> 00:04:28,160
mainly means: feed in a single sample
133
00:04:28,160 --> 00:04:29,600
taking images as the example
134
00:04:29,600 --> 00:04:31,040
I give the system
135
00:04:31,040 --> 00:04:32,400
one input image
136
00:04:32,400 --> 00:04:34,800
and after completing one full forward pass
137
00:04:34,800 --> 00:04:36,760
or one single convolution
138
00:04:36,760 --> 00:04:39,600
it is the total amount of memory traffic exchanged
139
00:04:39,600 --> 00:04:40,960
in other words, the model's space complexity
140
00:04:40,960 --> 00:04:41,560
The unit
141
00:04:41,560 --> 00:04:45,120
is counted in bytes
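Here is a simplified sketch of counting memory access in bytes for a single convolution, under assumptions of my own (stride 1, "same" padding, no caching or framework overhead, made-up shapes); it only illustrates the idea of summing the input, weight, and output traffic.

```python
# Hedged sketch: rough memory traffic (bytes) for one conv layer forward pass:
# read the input feature map and the weights, write the output feature map.
def conv_memory_bytes(h, w, c_in, c_out, k, bytes_per_elem=4):
    input_bytes  = h * w * c_in * bytes_per_elem
    weight_bytes = k * k * c_in * c_out * bytes_per_elem
    output_bytes = h * w * c_out * bytes_per_elem   # assumes stride 1, same padding
    return input_bytes + weight_bytes + output_bytes

# Example: 224x224 input, 3 -> 64 channels, 3x3 kernel, FP32
print(conv_memory_bytes(224, 224, 3, 64, 3) / (1024 ** 2), "MB")  # ~12.8 MB
```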
142
00:04:45,120 --> 00:04:46,240
And the last one
143
00:04:46,240 --> 00:04:47,880
is memory bandwidth
144
00:04:47,880 --> 00:04:48,680
Memory bandwidth
145
00:04:48,680 --> 00:04:50,480
this quantity is very important
146
00:04:50,480 --> 00:04:52,520
In fact, the ones I consider most important here
147
00:04:52,520 --> 00:04:54,440
the few key quantities
148
00:04:54,440 --> 00:04:55,440
or key metrics
149
00:04:55,440 --> 00:04:56,840
are memory bandwidth
150
00:04:56,840 --> 00:04:57,840
MAC
151
00:04:57,840 --> 00:04:58,440
FLOPs
152
00:04:58,440 --> 00:04:59,600
and the model parameters
153
00:04:59,600 --> 00:05:01,760
those four are really the important ones
154
00:05:01,760 --> 00:05:02,560
Memory bandwidth
155
00:05:02,560 --> 00:05:04,320
mainly determines how fast data
156
00:05:04,320 --> 00:05:06,280
is moved from memory to the ALUs
157
00:05:06,280 --> 00:05:07,800
or to the compute cores
158
00:05:07,800 --> 00:05:11,200
or to the Tensor Cores to be computed on
159
00:05:11,200 --> 00:05:13,840
in other words, the rate at which memory is shuttled around
160
00:05:15,040 --> 00:05:16,600
The value of the memory bandwidth
161
00:05:16,600 --> 00:05:17,200
this value
162
00:05:17,200 --> 00:05:19,600
is determined by how fast data moves between memory and the compute cores
163
00:05:19,600 --> 00:05:21,040
the transfer rate
164
00:05:21,040 --> 00:05:22,520
and the higher this value, the better
165
00:05:22,520 --> 00:05:23,480
But of course
166
00:05:23,480 --> 00:05:25,320
because of hardware design constraints
167
00:05:25,320 --> 00:05:26,640
power consumption
168
00:05:26,640 --> 00:05:28,560
and also price
169
00:05:28,560 --> 00:05:29,520
the memory bandwidth
170
00:05:29,520 --> 00:05:31,040
always has a certain peak value
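As a tiny hedged sketch of what that peak number implies (my own illustration, not from the lecture): the best-case time to move a tensor is its size in bytes divided by the peak bandwidth; real transfers are always slower than this lower bound.

```python
# Hedged sketch: lower bound on transfer time = bytes moved / peak bandwidth.
def min_transfer_time_ms(num_bytes: float, peak_bandwidth_gb_s: float) -> float:
    return num_bytes / (peak_bandwidth_gb_s * 1e9) * 1e3

# Example: moving a 100 MB activation tensor over a 300 GB/s memory bus
print(min_transfer_time_ms(100e6, 300))  # ~0.33 ms, best case
```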
171
00:05:33,680 --> 00:05:35,560
Now let's look at a few
172
00:05:35,560 --> 00:05:37,680
fairly typical calculations
173
00:05:37,680 --> 00:05:39,880
For a standard convolution layer
174
00:05:39,880 --> 00:05:40,600
in that case
175
00:05:40,600 --> 00:05:41,640
the parameter count
176
00:05:41,640 --> 00:05:43,160
equals the kernel height k_h
177
00:05:43,160 --> 00:05:43,920
times the kernel width k_w
178
00:05:43,920 --> 00:05:45,920
times the kernel's input channels and output channels
179
00:05:45,920 --> 00:05:47,040
The parameter count
180
00:05:47,040 --> 00:05:48,600
is mostly calculated this way
181
00:05:48,600 --> 00:05:49,840
Why is it all about the kernel?
182
00:05:51,040 --> 00:05:53,080
Why is it all about the kernel and the input data?
183
00:05:53,080 --> 00:05:55,440
It's because most of the parameters
184
00:05:56,040 --> 00:05:57,040
come from the kernel
185
00:05:57,040 --> 00:05:58,480
or the kernel channels
186
00:05:58,480 --> 00:06:00,400
the c_in and c_out parameters
187
00:06:01,240 --> 00:06:02,480
Then when computing FLOPs
188
00:06:02,480 --> 00:06:04,120
the floating-point operation count
189
00:06:04,120 --> 00:06:05,880
you additionally multiply by h and w
190
00:06:05,880 --> 00:06:07,840
that is, the height and width of the image
191
00:06:08,280 --> 00:06:08,880
Below that
192
00:06:08,880 --> 00:06:10,720
there are several more; next is the fully-connected layer
193
00:06:10,720 --> 00:06:11,880
Fully-connected is fairly simple
194
00:06:11,920 --> 00:06:13,920
it's basically computed from c_in and c_out
195
00:06:13,960 --> 00:06:15,440
and that's about it
196
00:06:15,560 --> 00:06:16,320
Beyond that
197
00:06:16,320 --> 00:06:17,400
for convolution
198
00:06:17,400 --> 00:06:18,800
there is also group convolution
199
00:06:18,800 --> 00:06:20,040
With group convolution
200
00:06:20,040 --> 00:06:21,240
the channels inside
201
00:06:21,240 --> 00:06:22,880
get split into groups
202
00:06:22,920 --> 00:06:24,320
and when computing FLOPs
203
00:06:24,320 --> 00:06:26,280
you divide by the number of groups
204
00:06:27,720 --> 00:06:29,240
And of course there's also depthwise convolution
205
00:06:29,240 --> 00:06:30,160
For depthwise convolution
206
00:06:30,160 --> 00:06:31,720
you divide by c_in
207
00:06:31,720 --> 00:06:32,360
and then
208
00:06:32,360 --> 00:06:33,080
the FLOPs
209
00:06:33,080 --> 00:06:34,400
follow the same pattern
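Putting the counting rules above into a small hedged sketch (bias terms ignored, stride and padding ignored, h and w taken as the output spatial size; the variable names and example shapes are mine, not from the slides):

```python
# Hedged sketch of the counting rules described above.
def conv_params(k_h, k_w, c_in, c_out, groups=1):
    # standard conv: k_h * k_w * c_in * c_out; group conv divides by the group count
    return k_h * k_w * (c_in // groups) * c_out

def conv_flops(k_h, k_w, c_in, c_out, h, w, groups=1):
    # the lecture's rule: FLOPs ~ params * output height * output width
    return conv_params(k_h, k_w, c_in, c_out, groups) * h * w

def depthwise_conv_params(k_h, k_w, c_in):
    # depthwise conv: one filter per input channel, i.e. the standard count divided by c_in
    return k_h * k_w * c_in

def fc_params(c_in, c_out):
    # fully-connected layer: basically just c_in * c_out
    return c_in * c_out

# Example: a 3x3 conv from 64 to 128 channels on a 56x56 feature map
print(conv_params(3, 3, 64, 128))            # 73,728 parameters
print(conv_flops(3, 3, 64, 128, 56, 56))     # ~231 million operations
print(conv_params(3, 3, 64, 128, groups=4))  # group conv: 1/4 of the standard count
print(depthwise_conv_params(3, 3, 64))       # 576 parameters
```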
210
00:06:34,440 --> 00:06:37,200
So that's why you need to understand the algorithm
211
00:06:37,200 --> 00:06:38,520
Why understand the algorithm
212
00:06:38,520 --> 00:06:39,840
and understand the kernel?
213
00:06:39,880 --> 00:06:41,600
Because there are many different
214
00:06:41,600 --> 00:06:42,560
ways of computing
215
00:06:42,600 --> 00:06:45,480
and they all affect the efficiency of the overall system
216
00:06:45,480 --> 00:06:45,880
Mm
217
00:06:50,000 --> 00:06:50,720
Next
218
00:06:50,720 --> 00:06:52,200
let's take NVIDIA's T4
219
00:06:52,200 --> 00:06:53,160
and use it to get a feel for
220
00:06:53,160 --> 00:06:55,480
these specific parameter metrics
221
00:06:55,520 --> 00:06:57,640
This is the NVIDIA T4
222
00:06:57,640 --> 00:07:00,440
GPU's concrete inference performance, measured against a dual-socket CPU server
223
00:07:00,480 --> 00:07:01,880
along with its training performance
224
00:07:01,920 --> 00:07:02,520
You can see
225
00:07:02,520 --> 00:07:04,200
the T4 is mostly used for inference
226
00:07:04,200 --> 00:07:05,720
so we can skip the training numbers
227
00:07:05,760 --> 00:07:07,320
Looking at the inference performance
228
00:07:07,320 --> 00:07:09,080
it really does, compared with
229
00:07:09,200 --> 00:07:09,880
a CPU
230
00:07:09,880 --> 00:07:11,560
deliver a very large
231
00:07:11,560 --> 00:07:14,480
or rather, a very significant performance improvement
232
00:07:14,480 --> 00:07:15,880
And as for where these performance gains come from
233
00:07:15,880 --> 00:07:17,520
let's look at its concrete specifications
234
00:07:17,800 --> 00:07:20,440
the number of Tensor Cores inside
235
00:07:20,440 --> 00:07:21,760
and the number of CUDA cores
236
00:07:21,960 --> 00:07:22,920
The CUDA core count
237
00:07:22,920 --> 00:07:24,080
means that for vector
238
00:07:24,080 --> 00:07:25,040
computation, the number of threads
239
00:07:25,040 --> 00:07:26,640
can be very large
240
00:07:26,680 --> 00:07:28,520
while the Tensor Core count
241
00:07:28,520 --> 00:07:31,320
is aimed at dense matrix operations
242
00:07:31,520 --> 00:07:31,920
Next
243
00:07:31,920 --> 00:07:32,800
you can see
244
00:07:32,800 --> 00:07:33,760
that actually for many of these
245
00:07:33,760 --> 00:07:34,720
whether it's single-precision
246
00:07:34,720 --> 00:07:35,440
floating point
247
00:07:35,440 --> 00:07:37,320
or int4
248
00:07:37,320 --> 00:07:39,240
there is a TFLOPS figure
249
00:07:39,280 --> 00:07:40,360
and that S
250
00:07:40,360 --> 00:07:42,815
indicates the amount processed per second, i.e. a rate
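To tie the metrics back to the T4 example, here are the approximate published figures as I recall them from NVIDIA's public datasheet; treat them as rough values and verify against the official spec sheet before relying on them.

```python
# Approximate NVIDIA T4 figures (from memory of the public datasheet; please verify).
t4_specs = {
    "cuda_cores": 2560,             # general-purpose vector/thread compute
    "tensor_cores": 320,            # dense matrix (GEMM) compute
    "fp32_tflops": 8.1,
    "fp16_tflops": 65,              # with Tensor Cores
    "int8_tops": 130,
    "int4_tops": 260,
    "memory_gb": 16,
    "memory_bandwidth_gb_s": 300,
}

# The capital-S point from the lecture: these are all "per second" (rate) figures,
# and the peak rate grows as the numeric precision shrinks.
for key in ("fp32_tflops", "fp16_tflops", "int8_tops", "int4_tops"):
    print(key, t4_specs[key])
```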