Hinge loss

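In machine learning, the hinge loss is a loss function used for training classifiers. It is used for "maximum-margin" classification, which makes it a natural fit for support vector machines (SVMs).^[1] For an intended output $t=\pm 1$ and a classifier score $y$, the hinge loss of the prediction $y$ is defined as

\ell (y)=\max(0,1-t\cdot y)

Figure: the hinge loss (blue, vertical axis) and the 0/1 loss (vertical axis; green where $y<0$, i.e. misclassification) plotted against $y$ (horizontal axis) for $t=1$. Note that the hinge loss also penalizes predictions with $|y|<1$, which corresponds to the notion of a margin in a support vector machine.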

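Note that $y$ in the formula above should be the "raw" output of the classifier's decision function, not the predicted class label. For instance, in a linear SVM, $y=\mathbf {w} \cdot \mathbf {x} +b$ , where $(\mathbf {w} ,b)$ are the parameters of the hyperplane and $\mathbf {x}$ is the input data point.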

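When $t$ and $y$ have the same sign (meaning that $y$ predicts the correct class) and $|y|\geq 1$ , the hinge loss is $\ell (y)=0$ . When they have opposite signs (meaning that $y$ predicts the wrong class), $\ell (y)$ increases linearly with $|y|$ . Similarly, if $|y|<1$ , there is still a loss even when $t$ and $y$ have the same sign (the prediction is correct, but the margin is insufficient).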
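The three cases described above can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function name `hinge_loss` and the sample scores are illustrative and do not come from any particular library.

```python
import numpy as np

def hinge_loss(t, y):
    """Hinge loss max(0, 1 - t*y) for labels t in {-1, +1} and raw scores y."""
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.maximum(0.0, 1.0 - t * y)

# Raw scores (not predicted labels) for four points whose true label is t = +1.
scores = np.array([2.0, 0.5, 0.0, -1.5])
losses = hinge_loss(1, scores)
# 2.0  -> 0.0  (correct side, margin satisfied)
# 0.5  -> 0.5  (correct side, but inside the margin)
# 0.0  -> 1.0  (on the decision boundary)
# -1.5 -> 2.5  (wrong side, loss grows linearly)
print(losses)
```

Passing predicted labels instead of raw scores would hide the margin: a score of 0.5 and a score of 2.0 both map to the label +1, yet only the former is penalized.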

Extensions

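Binary SVMs are commonly extended to multiclass classification with a one-vs-all (winner-takes-all, WTA SVM) or one-vs-one (max-wins voting, MWV SVM) strategy.^[2] The hinge loss itself can be extended in a similar way, and several variants of multiclass hinge loss have been proposed.^[3] For example, Crammer and Singer^[4] defined the hinge loss for a multiclass linear classifier as^[5]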

\ell (y)=\max(0,1+\max _{y\neq t}\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )

where $t$ is the target label, and $\mathbf {w} _{t}$ and $\mathbf {w} _{y}$ are the parameters of the model.

Weston and Watkins proposed a similar definition, but with a sum in place of the maximum:^[6]^[3]

\ell (y)=\sum _{y\neq t}\max(0,1+\mathbf {w} _{y}\mathbf {x} -\mathbf {w} _{t}\mathbf {x} )
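The two multiclass definitions above differ only in how the per-class margin violations are combined. The sketch below illustrates this, assuming the class weight vectors are stacked as rows of a matrix `W` (so that `W @ x` gives every $\mathbf {w} _{y}\mathbf {x}$ at once); the function names and the toy data are hypothetical.

```python
import numpy as np

def crammer_singer_loss(W, x, t):
    """max(0, 1 + max_{y != t} w_y.x - w_t.x), with W of shape (classes, features)."""
    scores = W @ x
    others = np.delete(scores, t)  # scores of every class except the target
    return float(max(0.0, 1.0 + others.max() - scores[t]))

def weston_watkins_loss(W, x, t):
    """sum_{y != t} max(0, 1 + w_y.x - w_t.x)."""
    scores = W @ x
    margins = np.maximum(0.0, 1.0 + scores - scores[t])
    margins[t] = 0.0  # the y = t term is excluded from the sum
    return float(margins.sum())

W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])
x = np.array([2.0, 1.0])  # class scores W @ x are [2.0, 1.0, 1.5]

print(crammer_singer_loss(W, x, t=1))  # only the strongest competitor counts
print(weston_watkins_loss(W, x, t=1))  # every violating competitor counts
```

With target class 1, both competitors violate the margin, so the summed variant is larger than the max variant; with target class 0 only one competitor violates it and the two losses coincide.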

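In structured prediction, the hinge loss can be further extended to structured output spaces. Structured SVMs with margin rescaling use the following variant, where $\mathbf {w}$ denotes the SVM's parameters, $\mathbf {y}$ the SVM's prediction, $\phi$ the joint feature function, and $\Delta$ the Hamming loss: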

{\begin{aligned}\ell (\mathbf {y} )&=\max(0,\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle -\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\\&=\max(0,\max _{\mathbf {y} \in {\mathcal {Y}}}\left(\Delta (\mathbf {y} ,\mathbf {t} )+\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {y} )\rangle \right)-\langle \mathbf {w} ,\phi (\mathbf {x} ,\mathbf {t} )\rangle )\end{aligned}}

Optimization

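The hinge loss is a convex function, so many of the convex optimizers commonly used in machine learning can be applied to it. It is not differentiable, but it has a subgradient with respect to the parameters $\mathbf {w}$ of a linear SVM model, given by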

{\frac {\partial \ell }{\partial w_{i}}}={\begin{cases}-t\cdot x_{i}&{\text{if }}t\cdot y<1\\0&{\text{otherwise}}\end{cases}}

where the score function is $y=\mathbf {w} \cdot \mathbf {x}$ .
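The subgradient above is enough to run a simple descent procedure. The following is a minimal sketch of plain subgradient descent on the summed hinge loss, assuming a bias-free score $y=\mathbf {w} \cdot \mathbf {x}$ , no regularization term, and a fixed step size; the helper names, the step size, and the toy data are illustrative choices, not a reference SVM solver.

```python
import numpy as np

def hinge_subgradient(w, x, t):
    """Subgradient of max(0, 1 - t * w.x) w.r.t. w: -t*x if t*y < 1, else 0."""
    y = np.dot(w, x)
    if t * y < 1:
        return -t * x
    return np.zeros_like(w)

def train(X, T, lr=0.1, epochs=100):
    """Per-sample subgradient descent (no bias term, no regularizer)."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, t in zip(X, T):
            w -= lr * hinge_subgradient(w, x, t)
    return w

# Toy data that is linearly separable through the origin.
X = np.array([[2.0, 1.0], [1.0, 3.0], [-1.0, -2.0], [-3.0, -1.0]])
T = np.array([1, 1, -1, -1])

w = train(X, T)
print(w, T * (X @ w))  # learned weights and the resulting values of t*y
```

Updates stop once every training point satisfies $t\cdot y\geq 1$ , because the subgradient is then zero everywhere. A practical SVM would add a regularization term on $\mathbf {w}$ and usually a bias, both omitted here to keep the sketch close to the formula.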

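Figure: three variants of the hinge loss as a function of $z=ty$ : the "ordinary" variant (blue), the squared variant (green), and the piecewise smooth variant proposed by Rennie and Srebro (red).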

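However, since the hinge loss is not differentiable at $ty=1$ , Zhang suggests using a smoothed variant for optimization,^[7] such as the piecewise smooth version proposed by Rennie and Srebro^[8]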

\ell (y)={\begin{cases}{\frac {1}{2}}-ty&{\text{if}}~~ty\leq 0,\\{\frac {1}{2}}(1-ty)^{2}&{\text{if}}~~0<ty\leq 1,\\0&{\text{if}}~~1\leq ty\end{cases}}

or the quadratically smoothed version

\ell _{\gamma }(y)={\begin{cases}{\frac {1}{2\gamma }}\max(0,1-ty)^{2}&{\text{if}}~~ty\geq 1-\gamma \\1-{\frac {\gamma }{2}}-ty&{\text{otherwise}}\end{cases}}

The modified Huber loss $L$ is the special case of this loss function with $\gamma =2$ , for which $L(t,y)=4\ell _{2}(y)$ .
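The smoothed variants and the relation to the modified Huber loss can be verified numerically. This is a minimal sketch written in terms of $z=ty$ ; the function names are hypothetical, and `modified_huber` uses the commonly stated piecewise form ( $\max(0,1-z)^{2}$ for $z\geq -1$ , and $-4z$ otherwise), which is an assumption not spelled out in the text above.

```python
import numpy as np

def smooth_hinge(z):
    """Rennie-Srebro piecewise smooth hinge as a function of z = t*y."""
    z = np.asarray(z, dtype=float)
    return np.where(z <= 0, 0.5 - z,
                    np.where(z < 1, 0.5 * (1.0 - z) ** 2, 0.0))

def quadratic_hinge(z, gamma=2.0):
    """Quadratically smoothed hinge l_gamma as a function of z = t*y."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= 1.0 - gamma,
                    np.maximum(0.0, 1.0 - z) ** 2 / (2.0 * gamma),
                    1.0 - gamma / 2.0 - z)

def modified_huber(z):
    """Modified Huber loss in its usual piecewise form (assumed here)."""
    z = np.asarray(z, dtype=float)
    return np.where(z >= -1.0, np.maximum(0.0, 1.0 - z) ** 2, -4.0 * z)

z = np.array([-2.0, -1.0, 0.0, 0.5, 1.0, 2.0])
print(smooth_hinge(z))
print(4.0 * quadratic_hinge(z, gamma=2.0))
print(modified_huber(z))  # agrees with the line above
```

Both branches of `quadratic_hinge` take the value $\gamma /2$ at $z=1-\gamma$ , so the quadratic and linear pieces join continuously.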

References

[1] Rosasco, L.; De Vito, E. D.; Caponnetto, A.; Piana, M.; Verri, A. Are Loss Functions All the Same? Neural Computation. 2004, 16 (5): 1063–1076. PMID 15070510. doi:10.1162/089976604773135104.
[2] Duan, K. B.; Keerthi, S. S. Which Is the Best Multiclass SVM Method? An Empirical Study. Multiple Classifier Systems. LNCS 3541. 2005: 278–285. ISBN 978-3-540-26306-7. doi:10.1007/11494683_28.
[3] Doğan, Ürün; Glasmachers, Tobias; Igel, Christian. A Unified View on Multi-class Support Vector Classification. Journal of Machine Learning Research. 2016, 17: 1–32.
[4] Crammer, Koby; Singer, Yoram. On the algorithmic implementation of multiclass kernel-based vector machines. Journal of Machine Learning Research. 2001, 2: 265–292.
[5] Moore, Robert C.; DeNero, John. L₁ and L₂ regularization for multiclass hinge loss models. Proc. Symp. on Machine Learning in Speech and Language Processing. 2011.
[6] Weston, Jason; Watkins, Chris. Support Vector Machines for Multi-Class Pattern Recognition. European Symposium on Artificial Neural Networks. 1999.
[7] Zhang, Tong. Solving large scale linear prediction problems using stochastic gradient descent algorithms. ICML. 2004.
[8] Rennie, Jason D. M.; Srebro, Nathan. Loss Functions for Preference Levels: Regression with Discrete Ordered Labels. Proc. IJCAI Multidisciplinary Workshop on Advances in Preference Handling. 2005.
