Point by point:

  1. How do algorithms find toxic comments among millions of others?
  2. How do these algorithms work?
  3. Is it possible to bypass them by avoiding direct threats or inappropriate language?
  4. There are 7,000+ languages in the world. Does that mean each language needs its own algorithm?
  5. Ethics of discourse are changing constantly. Do algorithms take that into account?
  6. How do robots tell apart actual toxicity and simple ribbing?
  7. Isn’t this censorship? After all, this means machines decide what I can and cannot say.
  8. Do robots make the decision to delete toxic content?
  9. Are algorithms universal or does each website or network need a custom one?

How do algorithms find toxic comments among millions of others?

The basic idea is fairly simple: developers spend a long time teaching an algorithm to tell ordinary comments from toxic ones by showing it pre-labelled examples of both. Then the algorithm starts scanning new comments for toxicity. At that point, the number of texts no longer matters: a well-trained algorithm can process millions of comments in mere seconds.
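Here is a minimal sketch of that supervised setup, assuming a scikit-learn-style pipeline and a tiny, made-up set of labelled comments (real systems train on far larger corpora):

```python
# Minimal sketch: teach a classifier to tell ordinary comments from toxic ones
# using pre-labelled examples, then score unseen text. The toy corpus and
# labels below are purely illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

comments = [
    "Thanks, that was really helpful!",      # ordinary
    "What a thoughtful post, well done.",    # ordinary
    "You are an idiot and should shut up.",  # toxic
    "Nobody wants trash like you here.",     # toxic
]
labels = [0, 0, 1, 1]  # 0 = ordinary, 1 = toxic (tags assigned by humans)

# The vectorizer turns text into numbers; the classifier learns the boundary.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(comments, labels)

# Once trained, scoring new comments is cheap and scales to millions of rows.
print(model.predict_proba(["you people are worthless"])[:, 1])
```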

How do these algorithms work?

They usually consist of two parts: an encoding step, which converts text into numbers, and a predictive model, which gauges toxicity. Why split it up that way? First, computers don’t perceive words as we do. They work with numbers, so instead of the words “Moscow” or “Russia” they see, say, the numbers 37 and 42. Second, instead of simply numbering words in the order they appear in a dictionary, words are encoded so that contextual relationships are easy to pick out. For instance, for the expression Moscow – Russia + France, the resulting representation will be closest to the word “Paris”. The predictive model (which decides whether a comment is toxic or not) then takes these numerical representations of the text and makes its decision. These days, for most languages, both stages are carried out using deep neural networks.
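The well-known “Moscow – Russia + France ≈ Paris” arithmetic can be reproduced with pretrained word vectors. A hedged sketch using gensim’s downloadable GloVe embeddings (the first call fetches the vectors over the network):

```python
# Word-embedding arithmetic over numerical representations of words.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-100")  # each word -> a 100-dimensional vector

# "moscow" - "russia" + "france" should land near "paris" in vector space.
print(vectors.most_similar(positive=["moscow", "france"], negative=["russia"], topn=1))
```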

Is it possible to bypass them by avoiding direct threats or inappropriate language?

Yes, it’s quite possible and not even very difficult. For now, algorithms still struggle to understand implied meaning and euphemisms. There are a number of tasks they can’t handle well: for instance, they can’t reliably tell sarcasm from sincerity, or work out what a pronoun refers to, as in “There’s a table in the kitchen. It’s very beautiful,” where “it” refers to the table. That said, current research shows that people, for the most part, aren’t very good at these things either, and it is often impossible to reach a definite conclusion from the available snippet of text alone. Still, at this point these systems are developed enough to handle such tasks better than the average human, even if not with 100% accuracy.
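To see why euphemisms slip through, one can probe a classifier with an explicit insult and an implied one. The snippet below reuses the toy `model` from the earlier sketch, so the exact scores are illustrative only:

```python
# The toy classifier has never seen the words in the implied insult, so its
# toxicity score stays low even though a human reader gets the message.
explicit = "You are an idiot and should shut up."
implied = "People like you usually need things explained very, very slowly."

for text in (explicit, implied):
    toxicity = model.predict_proba([text])[0, 1]
    print(f"{toxicity:.2f}  {text}")
```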

There are 7,000+ languages in the world. Does that mean each language needs its own algorithm?

Technically, yes, every language needs its own algorithm – if by that we mean a specially trained machine learning model. Until not long ago, companies would weigh the “mechanics” of a given language, the amount of labelled data, and the task at hand, then pick the best-fitting existing algorithm or create their own. Since about 2016, transfer learning has been gaining popularity: specialists train a massive model on a language that already has plenty of data, then use a much smaller sample to teach it the specifics of the target language. It works well for certain language pairings and tasks, but others still require the classic approach.
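A rough sketch of that transfer-learning recipe with the Hugging Face transformers library: start from a model pretrained on many high-resource languages and fine-tune it on a small labelled sample in the target language. The checkpoint name and the two-comment dataset are illustrative assumptions, not a production setup:

```python
# Transfer learning: fine-tune a multilingual pretrained model on a small
# labelled sample in the target language (here, a couple of Russian comments).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification, Trainer, TrainingArguments

checkpoint = "bert-base-multilingual-cased"  # pretrained on 100+ languages
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)

texts = ["Спасибо, очень полезно!", "Ты полное ничтожество."]  # target-language sample
labels = [0, 1]  # 0 = ordinary, 1 = toxic

encodings = tokenizer(texts, truncation=True, padding=True)

class ToyDataset(torch.utils.data.Dataset):
    def __init__(self, enc, labels):
        self.enc, self.labels = enc, labels
    def __len__(self):
        return len(self.labels)
    def __getitem__(self, i):
        item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
        item["labels"] = torch.tensor(self.labels[i])
        return item

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="toxicity-target-lang", num_train_epochs=1),
    train_dataset=ToyDataset(encodings, labels),
)
trainer.train()  # only the small target-language sample is needed at this stage
```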

Ethics of discourse are changing constantly. Do algorithms take that into account?

All such algorithms are constantly updated and retrained. The practical experience of major companies suggests that models need additional training every couple of weeks and a complete rebuild every two or three years, using recent text corpora, of course. Keep in mind that if you ask an algorithm trained on recent data to analyze older texts, there will be more errors, and the older the text, the more errors there will be. To avoid that, algorithms should be trained on data from the same timeframe as the texts you plan to analyze.
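One way to keep a model in step with shifting language is to periodically refit it on freshly labelled, recent comments. A sketch under the assumption that labelled data lives in a pandas DataFrame with `text`, `is_toxic`, and `labelled_at` columns (all names hypothetical):

```python
# Periodic retraining on a recent window of labelled comments, so the model is
# trained on data from the same timeframe as the texts it will judge.
from datetime import datetime, timedelta

import pandas as pd

def retrain_on_recent(model, labelled: pd.DataFrame, window_days: int = 14):
    cutoff = datetime.utcnow() - timedelta(days=window_days)
    recent = labelled[labelled["labelled_at"] >= cutoff]
    model.fit(recent["text"], recent["is_toxic"])  # refit on fresh examples only
    return model
```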

How do robots tell apart actual toxicity and simple ribbing?

Often enough, even humans can’t discern these and other fine points of language. Algorithms are usually trained on annotated text corpora: collections of texts that people have already tagged to show what is a joke and what isn’t. These algorithms usually expose specific parameters that make it possible to tune their sensitivity to such things.
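That “sensitivity” knob often boils down to a decision threshold on the model’s toxicity score. A sketch assuming a scikit-learn-style classifier like the one above:

```python
# Raise the threshold to let more banter through; lower it to be stricter.
def is_flagged(model, text: str, threshold: float = 0.8) -> bool:
    toxicity = model.predict_proba([text])[0, 1]  # probability of the "toxic" class
    return toxicity >= threshold

# e.g. a stricter setting for a children's forum, a more lenient one for a gaming chat:
# is_flagged(model, "git gud, scrub", threshold=0.95)
```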

Isn’t this censorship? After all, this means machines decide what I can and cannot say.

It’s not really the robot that decides what you can and cannot write, but the people who created it. It’s like asking whether your lack of sleep is the fault of your neighbor or their drill. Either way, the machine is just a tool people use to accomplish a task.

Do robots make the decision to delete toxic content?

That depends on the task given to the algorithm. Usually, algorithms not only report whether a comment is toxic, but also indicate how confident they are in the result. Comments the algorithm is confident are toxic can be deleted automatically; the rest are sent to human moderators for further consideration. How many texts are removed automatically (or whether they are removed at all) is up to each company’s policy and its user agreements.
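That policy can be expressed as two thresholds on the model’s confidence: one above which a comment is removed automatically and one above which it goes to a human moderator. A sketch, with the threshold values as arbitrary policy choices rather than properties of the algorithm:

```python
# Route a comment based on how confident the model is that it is toxic.
def route_comment(model, text: str, delete_above: float = 0.98, review_above: float = 0.6) -> str:
    toxicity = model.predict_proba([text])[0, 1]
    if toxicity >= delete_above:
        return "delete"        # model is confident: remove automatically
    if toxicity >= review_above:
        return "human_review"  # uncertain: escalate to a moderator
    return "publish"
```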

Are algorithms universal or does each website or network need a custom one?

Generalized algorithms and the way they’re trained are more or less the same for everyone. Each specific algorithm used by a particular company is usually developed, trained, and tuned based on the circumstances and available resources. At the foundation are the latest developments employed by tech giants such as Google and Facebook, which regularly share entirely new algorithms, models, and architectures developed by their research departments. Regular researchers and specialists at small companies can’t contend with the big names, as almost all modern solutions (unless we’re talking about low-resource languages) require massive computing power. But even the most advanced algorithms don’t take local characteristics into account, so there’s always room for improvement when you’re dealing with a specific task.