TY - CONF
T1 - Socially Responsible Hate Speech Detection
T2 - Can Classifiers Reflect Social Stereotypes?
AU - Vargas, Francielle
AU - Carvalho, Isabelle
AU - Hürriyetoğlu, Ali
AU - Pardo, Thiago A.S.
AU - Benevenuto, Fabrício
N1 - Funding Information:
This project was partially funded by SINCH, FAPESP, FAPEMIG, and CNPq, as well as the Ministry of Science, Technology and Innovation, with resources of Law N. 8.248, of October 23, 1991, within the scope of PPI-SOFTEX, coordinated by Softex and published as Residence in TIC 13, DOU 01245.010222/2022-44.
Publisher Copyright:
© 2023 Incoma Ltd. All rights reserved.
PY - 2023
Y1 - 2023
AB - Recent studies have shown that hate speech technologies may propagate social stereotypes against marginalized groups. Nevertheless, there has been a lack of realistic approaches to assess and mitigate biased technologies. In this paper, we introduce a new approach to analyze the potential of hate speech classifiers to reflect social stereotypes by investigating stereotypical beliefs and contrasting them with counter-stereotypes. We empirically measure the distribution of stereotypical beliefs by analyzing how machine learning models and datasets classify tuples containing stereotypes versus counter-stereotypes. Experimental results show that hate speech classifiers attribute unreal or negligent offensiveness to social identity groups by reflecting and reinforcing stereotypical beliefs regarding minorities. Furthermore, we also found that models embedding expert and context information from offensiveness markers present promising results for mitigating social stereotype bias, towards socially responsible hate speech detection.
UR - http://www.scopus.com/inward/record.url?scp=85179180014&partnerID=8YFLogxK
M3 - Paper
AN - SCOPUS:85179180014
SP - 1187
EP - 1196
ER -