Methods for Text Style Transfer: Text Detoxification Case

PyCon DE & PyData Berlin 2023

Global access to the Internet has enabled the spread of information throughout the world and has opened up many new possibilities. Alongside these advantages, however, the exponential and uncontrolled growth of user-generated content on the Internet has also facilitated the spread of toxicity and hate speech. Much work has been done on offensive speech detection. There is, however, another, more proactive way to fight toxic speech: suggesting a detoxified version of the message to the user. In this presentation, we will provide an overview of how the text detoxification task can be solved. The proposed approaches can be reused for any text style transfer task, in both monolingual and multilingual use cases.

First, we will briefly introduce the research direction of NLP for Social Good. Then, we will present the main research directions in the field of text style transfer. This field suffers from a lack of parallel data. We will describe our approach to collecting such parallel datasets and show that it can be applied to any language. Then, we will show how monolingual, multilingual, and cross-lingual models can be trained for text detoxification. Finally, we will discuss ethical issues connected with this task and with tackling toxic and hate speech in general. All of the presented work is based on peer-reviewed papers from the ACL and EMNLP conferences.

Speakers: Daryna Dementieva