
To Break a Hate-Speech Detection Algorithm, Try ‘Love’

Author: Louise Matsakis / Source: WIRED

[Image credit: Casey Chin]

For all the advances being made in the field, artificial intelligence still struggles when it comes to identifying hate speech. When he testified before Congress in April, Facebook CEO Mark Zuckerberg said it was “one of the hardest” problems. But, he went on, he was optimistic that “over a five- to 10-year period, we will have AI tools that can get into some of the linguistic nuances of different types of content to be more accurate in flagging things for our systems.”

For that to happen, however, humans will first need to define for themselves what hate speech means, and that can be hard because it is constantly evolving and often dependent on context.

“Hate speech can be tricky to detect since it is context and domain dependent. Trolls try to evade or even poison such [machine learning] classifiers,” says Aylin Caliskan, a computer science researcher at George Washington University who studies how to fool artificial intelligence.

In fact, today’s state-of-the-art hate-speech-detecting AIs are susceptible to trivial workarounds, according to a new study to be presented at the ACM Workshop on Artificial Intelligence and Security in October. A team of machine-learning researchers from Aalto University in Finland, with help from the University of Padua in Italy, was able to evade seven different hate-speech-classifying algorithms using simple attacks, like inserting typos. The researchers found that all of the algorithms were vulnerable, and argue that humanity’s trouble defining hate speech contributes to the problem. Their work is part of an ongoing project called Deception Detection via Text Analysis.
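The attacks are easy to state in code. Below is a minimal sketch of the kinds of transformations described: typos, removed word boundaries, and an appended innocuous word like “love” (the one behind the headline). The function names and details are illustrative, not the researchers’ actual code:

```python
import random

random.seed(0)  # fixed seed so the demo is reproducible

def add_typo(word):
    """Swap two adjacent characters, mimicking a casual typo."""
    if len(word) < 2:
        return word
    i = random.randrange(len(word) - 1)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def typo_attack(text):
    """Introduce a typo into every word of the input."""
    return " ".join(add_typo(w) for w in text.split())

def boundary_attack(text):
    """Remove word boundaries so token-based features miss the words."""
    return text.replace(" ", "")

def love_attack(text):
    """Append an innocuous word, nudging the classifier toward 'not hate'."""
    return text + " love"

sample = "offensive example text"
for attack in (typo_attack, boundary_attack, love_attack):
    print(attack.__name__, "->", attack(sample))
```

None of these changes much affect what a human reads, but each can shift a string-based model’s features enough to flip its prediction.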

If you want to create an algorithm that classifies hate speech, you need to teach it what hate speech is, using data sets of examples labeled hateful or not. That requires humans to decide when something is hate speech, and their labeling will be subjective on some level, although researchers can mitigate the effect of any single opinion by using groups of people and majority votes. Still, the data sets for hate-speech algorithms are always going to be made up of a series of human judgment calls. That doesn’t mean AI researchers shouldn’t use them, but they have to be upfront about what the data really represent.
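As a concrete illustration of that pipeline, here is a minimal, self-contained sketch with made-up toy data and scikit-learn; it is not the paper’s models or data sets. Crowd votes are collapsed into a majority label, and an ordinary text classifier is trained on the result:

```python
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def majority_label(votes):
    """Collapse several annotators' judgment calls into one label."""
    return Counter(votes).most_common(1)[0][0]

# Hypothetical toy data: each text carries three crowd votes.
raw = [
    ("you people are vermin", ["hate", "hate", "not"]),
    ("have a wonderful day", ["not", "not", "not"]),
    ("go back where you came from", ["hate", "not", "hate"]),
    ("i love this community", ["not", "not", "not"]),
]
texts = [text for text, _ in raw]
labels = [majority_label(votes) for _, votes in raw]

# A standard bag-of-words classifier trained on the majority labels.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["you people are wonderful"]))
```

Every label the model ever sees is the output of majority_label, which is exactly the “series of human judgment calls” described above.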

“In my view, hate-speech data sets are fine as long as we are clear what they are: they reflect the majority view of the people who collected or labeled the data,” says Tommi Gröndahl, a doctoral candidate at Aalto University and the lead author of the paper. “They do not provide us with a definition of hate speech, and they cannot be used to solve disputes concerning whether something ‘really’ constitutes hate speech.”

In this case, the data sets came from Twitter and Wikipedia comments, and were labeled by crowdsourced micro-laborers as hateful or not (one model also had a third label for “offensive speech”). The researchers discovered that the algorithms stopped working when the data sets were swapped between them, meaning the machines can’t identify hate speech in new…
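A sketch of that swap, again with hypothetical toy stand-ins rather than the actual Twitter and Wikipedia data sets: train on one source, then score on the other.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the Twitter and Wikipedia-comment data sets.
twitter = [("you people are vermin", "hate"),
           ("nice weather today", "not"),
           ("crawl back under your rock", "hate"),
           ("great game last night", "not")]
wiki = [("this edit is garbage, just like its author", "hate"),
        ("thanks for fixing the citation", "not")]

tweet_texts, tweet_labels = zip(*twitter)
wiki_texts, wiki_labels = zip(*wiki)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(tweet_texts, tweet_labels)  # train on one platform...
print("in-domain accuracy:", model.score(tweet_texts, tweet_labels))
print("cross-domain accuracy:", model.score(wiki_texts, wiki_labels))
```

With real data, it is the cross-domain score that collapses, which is the study’s point about hate speech being domain dependent.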
