Knowledge_Distillation

2025-07-08

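The first step is to train the teacher network, Net-T.
The second step is to distill the knowledge of Net-T into the student network, Net-S, at a high temperature T.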

https://intellabs.github.io/distiller/knowledge_distillation.html#hinton-et-al-2015

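The objective function for high-temperature distillation is a weighted sum of the distillation loss (computed against the soft targets) and the student loss (computed against the hard targets).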

L = \alpha L_{soft} + \beta L_{hard}

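where v_i and z_i are the logits of Net-T and Net-S for class i, N is the number of classes, and T is the temperature: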

L_{soft} = -\sum_{j=1}^{N} p_{j}^{T} \log(q_{j}^{T})

p_{i}^{T} = \frac{\exp(v_{i}/T)}{\sum_{k=1}^{N} \exp(v_{k}/T)}, \quad q_{i}^{T} = \frac{\exp(z_{i}/T)}{\sum_{k=1}^{N} \exp(z_{k}/T)}
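The effect of the temperature on the softened probabilities can be seen in a small sketch. This is only an illustration in plain Python: the function name `softmax_with_temperature` and the logit values are hypothetical, not taken from the linked page.

```python
import math

def softmax_with_temperature(logits, T=1.0):
    """Compute exp(x_i / T) / sum_k exp(x_k / T) for a list of logits."""
    scaled = [x / T for x in logits]
    m = max(scaled)  # subtract the max so exp() cannot overflow
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical teacher logits for a 3-class problem.
teacher_logits = [5.0, 2.0, 0.5]

p_T1 = softmax_with_temperature(teacher_logits, T=1.0)  # sharp distribution
p_T4 = softmax_with_temperature(teacher_logits, T=4.0)  # softer distribution

print(p_T1)
print(p_T4)
```

With a higher T the largest probability shrinks and the small ones grow, so the relative similarity between classes that Net-T has learned becomes more visible to Net-S.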

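The hard-target term is the cross-entropy between the ground-truth label c_j and the output of Net-S at T = 1: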

L_{hard} = -\sum_{j=1}^{N} c_{j} \log(q_{j}^{1})

q_{i}^{1} = \frac{\exp(z_{i})}{\sum_{k=1}^{N} \exp(z_{k})}
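Putting the two terms together, the combined objective for a single example might be sketched as follows. This is a minimal plain-Python illustration under stated assumptions: the names `log_softmax_t` and `distillation_loss`, the default values of `T`, `alpha`, and `beta`, and the sample logits are all hypothetical, and the hard target is assumed to be a one-hot label given as a class index.

```python
import math

def log_softmax_t(logits, T=1.0):
    """Log of the temperature-scaled softmax, computed stably."""
    scaled = [x / T for x in logits]
    m = max(scaled)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scaled))
    return [s - log_sum for s in scaled]

def distillation_loss(student_logits, teacher_logits, label,
                      T=4.0, alpha=0.5, beta=0.5):
    """Return (L, L_soft, L_hard) for one example.

    label is the index of the true class, i.e. c is assumed one-hot.
    T, alpha, and beta defaults are illustrative assumptions.
    """
    # Soft targets p^T from the teacher logits v.
    p_T = [math.exp(lp) for lp in log_softmax_t(teacher_logits, T)]
    # Student log-probabilities log(q^T) at temperature T.
    log_q_T = log_softmax_t(student_logits, T)
    # Student log-probabilities log(q^1) at T = 1.
    log_q_1 = log_softmax_t(student_logits, 1.0)

    l_soft = -sum(p * lq for p, lq in zip(p_T, log_q_T))
    l_hard = -log_q_1[label]  # one-hot c keeps only the true-class term
    return alpha * l_soft + beta * l_hard, l_soft, l_hard

# Hypothetical logits for a 3-class problem; the true class is index 0.
student = [2.0, 1.0, 0.1]
teacher = [5.0, 2.0, 0.5]

total, l_soft, l_hard = distillation_loss(student, teacher, label=0)
print(total, l_soft, l_hard)
```

Computing the log-probabilities directly, rather than taking the log of a softmax output, avoids log(0) when a probability underflows.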