Although generative models are trained on human-written texts, their output still carries detectable patterns: monotonous, overused phrases, predictable sentence structures, and frequent repetition. Moreover, such models tend to insert fake links, make logical mistakes, and produce word-for-word translations.
One promising way to detect AI-like texts efficiently is to use AI algorithms themselves. However, most existing detectors handle the Russian language poorly and can only distinguish human-written from AI-generated content. The harder challenge is to spot AI traces in texts that were originally written by humans but then proofread and polished by neural networks.
To that end, researchers from ITMO’s Computer Technologies Lab proposed an AI detector that identifies human-written, AI-generated, and AI-modified texts by their content and writing style. The algorithm proved effective: on 5,500 Russian texts, it reached 94% accuracy when distinguishing human-written from AI texts and 80% when classifying texts as human-written, AI-generated, or AI-modified.
The detector relies on two large language models (LLMs), measuring to what extent each of them finds a given text “surprising” or “unusual.” If their assessments diverge significantly, the text is likely to be flagged as AI-generated. To make the approach work for Russian, the researchers added an analysis of linguistic features: word and sentence length, the use of different parts of speech, vocabulary diversity, readability, and so on. All of these criteria are taken into account by the classifier.
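The article doesn’t spell out the exact mechanics, but the core idea can be illustrated with a minimal Python sketch: two causal LMs each score how “surprising” a text is (via perplexity), and a large disagreement between their scores is treated as an AI signal. The model names and threshold below are illustrative assumptions, not the team’s actual configuration.

```python
# Minimal sketch of the two-model "surprise" comparison described above.
# Model names and the threshold are illustrative assumptions, not the
# team's actual configuration.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

def perplexity(text: str, model_name: str) -> float:
    """How 'surprising' the text is to a given causal LM (lower = less surprising)."""
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean per-token cross-entropy
    return torch.exp(loss).item()

def looks_ai_generated(text: str, threshold: float = 1.5) -> bool:
    # Two different LMs score the same text; a large disagreement between
    # their "surprise" scores is treated as an AI signal. This is a crude
    # proxy: perplexities under different tokenizers are not directly
    # comparable, which is part of why real detectors add more features.
    p_ru = perplexity(text, "ai-forever/rugpt3small_based_on_gpt2")  # Russian LM
    p_en = perplexity(text, "gpt2")
    ratio = max(p_ru, p_en) / min(p_ru, p_en)
    return ratio > threshold
```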
The team trained the algorithm on a dataset of their own. For that, they collected a corpus of over 4,000 same-topic Russian texts that were written by humans (scientific papers, essays, and news articles), modified by AI, or generated by AI (ChatGPT, Gemini, or DeepSeek).
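As a rough illustration of how the linguistic features mentioned above could feed a three-way classifier, here is a hedged sketch. The feature set and the gradient-boosting model are assumptions for demonstration, not the lab’s published configuration; part-of-speech features would additionally require a morphological analyzer (e.g., pymorphy2 for Russian).

```python
# Hedged sketch of a linguistic-feature classifier over the three classes;
# the feature set and model choice are assumptions, not the lab's setup.
import re
from sklearn.ensemble import GradientBoostingClassifier

def linguistic_features(text: str) -> list[float]:
    words = re.findall(r"\w+", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_word_len = sum(len(w) for w in words) / max(len(words), 1)
    avg_sent_len = len(words) / max(len(sentences), 1)
    vocab_diversity = len({w.lower() for w in words}) / max(len(words), 1)
    return [avg_word_len, avg_sent_len, vocab_diversity]

# X = [linguistic_features(t) for t in corpus]
# y = labels  # 0 = human-written, 1 = AI-generated, 2 = AI-modified
# clf = GradientBoostingClassifier().fit(X, y)
```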
One of the technology’s features is an “obfuscator” that works as a case-by-case editor built on the detector’s database. The algorithm removes unnecessary hyphens, turns lists into paragraphs, and flags or rewrites “suspicious” sentences while preserving the original’s meaning and readability. On the one hand, the tool can be used to test whether detectors still spot generated content after it has been run through the obfuscator; on the other, it can strip AI style markers from texts before publication. At the same time, the developers oppose using the tool to disguise a text’s origin and stress the importance of proper AI labeling.
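A toy version of such a rule-based editing pass might look like the following; the team’s actual rules are not public, so these regexes are illustrative only.

```python
# Toy sketch of the rule-based "obfuscator" pass described above; the
# actual rules are not public, so these regexes are illustrative only.
import re

def obfuscate(text: str) -> str:
    # Drop stray dashes that generated texts tend to overuse.
    text = re.sub(r"\s[-–—]\s", " ", text)
    # Flatten bulleted lists into running prose.
    text = re.sub(r"^\s*[-•*]\s+", "", text, flags=re.MULTILINE)
    text = re.sub(r"\n{2,}", "\n", text)
    return text.strip()
```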
The service can be applied in various fields: checking papers in education, labeling AI-generated content in media, or tracking the use of AI in key reports and correspondence within a company via automated text checks. A demo of the algorithm is publicly available on Hugging Face Spaces; any registered user can upload a text to see its metrics.
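For readers who want to script against the demo rather than use the web page, a Space hosted on Hugging Face can typically be called with the gradio_client library. The Space id and endpoint name below are placeholders, since the article does not give the demo’s actual address.

```python
# Hypothetical call to the public demo via the gradio_client library.
# The Space id and endpoint name are placeholders: the article does not
# name the actual Space.
from gradio_client import Client

client = Client("itmo/ai-detector")  # placeholder Space id
result = client.predict("Образец текста на русском языке", api_name="/predict")  # sample Russian text; placeholder endpoint
print(result)
```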
“We’re working on a user-friendly interface for our service, as well as a feature that will let users analyze several texts simultaneously and thus speed up the process. In the fall, we’ll be hiring a team of young researchers to help us develop the project, and by spring we hope to launch our technology in a test mode at ITMO to detect and correct machine-generated text in students’ graduation papers,” says Viacheslav Shalamov, the team’s supervisor and a staff member at ITMO’s Information Technologies and Programming Faculty.
Viacheslav Shalamov. Photo courtesy of the subject
