"In English, Please": De-Jargonizer Helps Make Science Accessible
Israeli scientists have developed software that analyzes how comprehensible a scientific article is. The algorithm, named De-Jargonizer, sorts words into three categories according to their frequency of use and, using a simple formula, determines how accessible the text will be to a wider audience. Having tested their software on articles from PLOS ONE, the scientists found that rare scientific terms can make up as much as 27 percent of some abstracts. The developers hope that their program will help authors adapt their texts so that most people can understand them.
Fans of science often cannot understand all of the highly specialized words used by researchers. Scientists, in turn, often do not account for this when telling people about their research. The academic and popular science styles of writing differ noticeably in vocabulary and sentence structure. The former, mostly seen in scientific journals, books or conference materials, is intended for readers who are already familiar with the subject matter and related vocabulary. Popular science writing, however, uses words that are familiar to most people regardless of their knowledge of the subject, as well as various metaphors and jokes.
“Since scientists are used to applying professional jargon in their work, it is hard for them to avoid these special terms in their speech. Intuitively, they understand that they need to use fewer scientific words when speaking to regular people rather than to their colleagues. Yet they still use too many words that are off-putting to those whose interest they are trying to spark. Besides, there is no single standard that scientists could use to adapt their writing,” say researchers from Israel’s Holon and Technion Institutes of Technology.
Researchers often fall victim to the so-called curse of knowledge, they add. This is a form of cognitive bias in which individuals with knowledge of a subject have difficulty explaining it to someone with no experience in the area, because they cannot put themselves in the other person’s place. This may be familiar to scientists who publish articles in peer-reviewed journals and give lectures: specialized vocabulary can make the subject matter difficult for readers and listeners to understand, the researchers explain.
To help scientists determine which words will be challenging to the public and should be replaced or explained, the researchers have developed special software that can detect such words in a text. An article about the project has been published in the scientific journal PLOS ONE. According to the authors, the program can help scientists relay information more efficiently not only to other specialists in their own field, but also to those working in other areas, as well as to politicians and the public.
De-Jargonizer is software that processes scientific texts and reports the percentage of specialized and rare words. It also displays an index, represented by a pair of glasses, that indicates whether the text is accessible to a wider audience. To use the service, one only needs to upload a text file to its website or paste the text into a special box. Once the text has been processed, the software color-codes words by category (frequent words, rare or obscure terms). The tool has a user-friendly interface and is freely available on its website.
To determine how frequently each word is used, assign it to one of the three groups and give authors the corresponding percentages, the researchers created a massive article database of 500 thousand unique entries. All words in the database are split into three categories: frequent (the 2000 most commonly used words in the English language and their derivatives), rare (less frequent) and jargon (scientific terms). Using these resources, the algorithm determines the text’s accessibility to the public and rates it on a scale from 0 to 100.
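The three-way split and the 0 to 100 rating can be sketched in a few lines of Python. This is a minimal illustration and not the developers' code: the word lists `FREQUENT` and `RARE` are tiny hypothetical stand-ins for the real database, the names `classify` and `rate_text` are invented here, and the scoring formula (a rare word costs half as much as a jargon word) is an assumption, since the article only says the formula is simple. The sketch also ignores derivatives of frequent words.

```python
import re

# Hypothetical miniature word lists for illustration only; the real tool
# draws its lists from a large corpus.
FREQUENT = {"the", "cells", "in", "and", "then", "die"}
RARE = {"membrane", "undergo"}


def classify(word, frequent, rare):
    """Assign a word to one of three groups; anything unlisted counts as jargon."""
    word = word.lower()
    if word in frequent:
        return "frequent"
    if word in rare:
        return "rare"
    return "jargon"


def rate_text(text, frequent, rare, rare_weight=0.5):
    """Return per-category counts and a 0-100 accessibility score.

    Assumed formula: each jargon word costs a full point and each rare word
    costs rare_weight of a point, expressed as a percentage of all words.
    """
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if not words:
        raise ValueError("text contains no words")
    counts = {"frequent": 0, "rare": 0, "jargon": 0}
    for word in words:
        counts[classify(word, frequent, rare)] += 1
    penalty = rare_weight * counts["rare"] + counts["jargon"]
    score = 100.0 - 100.0 * penalty / len(words)
    return {"counts": counts, "score": score}


result = rate_text(
    "The cells in the membrane undergo apoptosis and then die", FREQUENT, RARE
)
print(result["counts"])
print(round(result["score"], 1))
```

In this toy sentence seven of the ten words are frequent, two are rare and one is jargon, so under the assumed weighting the text scores 80 out of 100.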
The De-Jargonizer. Credit: scienceandpublic.com
The authors tested De-Jargonizer on 500 articles from various PLOS journals. For their research, the scientists used abstracts and summaries, which are usually intended for a wider audience. The results show that abstracts of biology articles contain up to 10 percent rare words, while the summaries contain about 8 percent. Earlier research showed that, to understand a text properly, readers must be familiar with at least 98 percent of its words, meaning that even the short summaries can often be incomprehensible to lay readers.
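The gap between those percentages and the 98 percent threshold is easy to check with a little arithmetic. The sketch below assumes a 250-word abstract (a hypothetical length, not a figure from the study) and treats every rare word as unfamiliar to the reader, which is a simplification; the function names are invented for illustration.

```python
def allowed_unfamiliar(total_words, required_pct=98):
    """Largest number of unfamiliar words that still leaves required_pct percent known."""
    return total_words * (100 - required_pct) // 100


def is_comprehensible(total_words, unfamiliar_words, required_pct=98):
    """True if the share of familiar words reaches the required percentage."""
    return unfamiliar_words <= allowed_unfamiliar(total_words, required_pct)


abstract_length = 250                        # hypothetical abstract length
rare_in_abstract = abstract_length * 10 // 100   # 10 percent rare words
rare_in_summary = abstract_length * 8 // 100     # 8 percent rare words

print(allowed_unfamiliar(abstract_length))
print(is_comprehensible(abstract_length, rare_in_abstract))
print(is_comprehensible(abstract_length, rare_in_summary))
```

Under these assumptions a reader can afford only 5 unfamiliar words in 250, while a text with 10 percent rare words contains 25 and one with 8 percent contains 20, so neither clears the threshold.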
According to Dmitry Muromtsev, head of ITMO University’s Department of Informatics and Applied Mathematics and the International Laboratory of Information Science and Semantic Technologies, linguistic services like this one are always made through a similar process: developers assemble a large database of documents and analyze it using a set of linguistic criteria – morphology, use, tense and others.
One of the best-known services that analyze the frequency of words and phrases is the Google Ngram search engine. It lets users track the popularity of words or phrases in a massive collection of publications dating back to the 16th century and available on Google Books. Since 2016, users have been able to search for terms in American and British English, French, German, Spanish, Italian, Russian, Hebrew and Simplified Chinese. In addition, the program can search through specialized text databases, such as the British National Corpus.
Still, most linguistic services are made for English, while similar programs in other languages are scarce and tend to be of lesser quality, explains Mr. Muromtsev. The reasons are obvious: almost everyone, at some point, works in English, while other languages are mostly used by their native speakers. However, a number of software products and services, such as grammar checkers, perform well enough in Russian and other languages, he adds.
“The basic ideas and algorithms are more or less similar for all such services. They use a set of standardized approaches to text processing. The distinguishing feature is that all these algorithms have to be finely tuned for each specific language. At our laboratory, for example, we, too, are working on such projects. It takes a great amount of experimenting to create them, and a wide range of tools is used to tune them. After all, when we talk, we use the rules we’ve been learning practically since birth, at school, at home and so on. The same thing needs to be done with a machine: it needs to be taught from scratch, and taught well,” comments Dmitry Muromtsev. “Speaking of the project by the Israeli scientists, it’s really great that the developers have succeeded in finding a great case that lets them focus on a specific audience, researchers who write scientific articles, and cater to its specific needs. With these two factors combined, they managed to create something genuinely interesting, even if there isn’t anything particularly innovative in their approach.”
De-Jargonizer’s vocabulary, note the authors, is based on that of news sites, which tend to use words that are understood by the majority of people. The project’s current database contains approximately 90 million words. As of now, the service only supports texts in English, but the developers are planning to update the corpus on a regular basis and include support for other languages.
Publications in scientific journals have long served as a primary way of showcasing the results of research and discoveries. It is this data that science communicators and journalists use to inform the public of new inventions. The project’s creators hope that it will be a useful tool for scientists and will help them relay information to readers in a more accessible form. They also note that this software will be useful to science communicators and those who teach SciComm courses.
“Scientists and science communicators can use our service to adapt their messages for the general public. Educators can use it to track students’ progress as they learn to write releases, memos and messages for non-specialists,” – note the authors, citing examples of the program’s use in writing such texts.
Back in 2008, the Three Minute Thesis competition was launched at the University of Queensland in Australia. This annual competition now brings together young researchers from more than 200 of the world’s universities. Participants’ goal is to explain their thesis in three minutes in a way that anyone would understand.
Every year, events involving science communicators and young scientists are held all over the world: lectures, Science Slams, talk shows and more. Yet science communicators and enthusiasts agree that a lot more work will have to be done to make science truly accessible to the general public.
Twitter flash-mob helping scientists connect with the public. Credit: twitter.com/iamscicomm
Tips and goals
According to Zoe Doubleday, a researcher from the University of Adelaide (Australia), today’s scientists still don’t always pay enough attention to language. In her column on The Conversation, an independent scientific media outlet, she lists a few tips that can help make an article clearer to its readers and draw their attention to the key results of the research.
“Be concise, unique, inspiring. But let’s be clear. We are not advocating sensationalism. Scientists are wary of sensationalism, and for a good reason. Science is about facts and objectivity, not hyperbole to sell a story. However, we maintain that objectivity is not at odds with adding a creative element to our writing, or making it clearer, more accessible and interesting to read,” – writes Ms. Doubleday.
It can often be difficult to create exciting scientific content for regular readers, says Jamie Vernon, editor-in-chief of the journal American Scientist and Sigma Xi’s Executive Director and CEO. For that reason, it is important that editors have a system for working with authors on such material.
“All the articles in the journal are written by scientists, but we work with them very closely to create the kind of content the journal needs. First of all, we ask them to describe the most important results of their research at the beginning, whereas most scientific articles tend to have the results at the end. Secondly, our editors help the scientists with the wording and with finding replacements for the more complex terms. Thirdly, we ask scientists to write a separate article for American Scientist while working on their scientific article, so as to have the materials published nearly simultaneously. We motivate them with the fact that their work will be on Twitter and Facebook and will result in more citations and responses. Our benefit is that the work on the material doesn’t get stretched out over years,” he notes.
Thorough work on the terminology is very important when preparing any popular science material, says Dmitry Malkov, head of ITMO University’s Science Communication and Outreach Office. It is often exhausting and takes a lot of time, especially when writing press releases, whose format is much stricter than that of articles and columns and tends to forbid any use of metaphors and analogies. Still, even press releases call for modification of the source material: scientists who insist on keeping all of the obscure terminology in such releases take a risk, as the material might simply fail to reach its audience, he notes.
“There is no system that fits each case. This is complex work that requires a lot of thinking and sorting through the options while focusing on a specific target audience. No application can replace the human mind, at least not yet. However, the importance of the Israeli scientists’ project, for me, is in its usefulness in evaluating the final product. It’s not that simple. Having figured out the vocabulary of an article today, tomorrow you might forget that there was a time when you weren’t familiar with the topic. We try to avoid that, but the issue of this ‘curse of knowledge’ is a real one,” comments Mr. Malkov. “I believe that the authors of that article in PLOS have done a great thing, and a system like this can become a sort of crutch for scientists who have to evaluate their own writing. They also suggest that science communication instructors use it to track their students’ progress. I think the De-Jargonizer can easily become a part of the teaching process at our Master’s program in Science Communication and at the SciComm course that we offer to ITMO University’s postgraduate students.”