The system identifies targeted content with 91% accuracy and is useful for censoring online publications, the researchers say.
Developers from Shenyang Ligong University and the Chinese Academy of Sciences have created an AI-based technology, built on Google’s BERT algorithm, that can filter “harmful information” on the Internet with high accuracy, the South China Morning Post reports.
It finds texts subject to censorship with an accuracy of 91%, the researchers say. By comparison, keyword search gives an accuracy of 70%, and a human-trained neural network 80%.
Google’s open-source BERT algorithm can’t analyze texts longer than 512 tokens, so the developers created an algorithm that breaks a long text into segments short enough to analyze and then puts the text back together.
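The segmenting step can be illustrated with a minimal Python sketch. It is not the researchers' implementation: the function names, the optional `overlap` parameter, and the rule of taking the highest segment score as the verdict for the whole text are all assumptions made for illustration, and `classify_segment` stands in for any model that scores one segment.

```python
def split_into_segments(tokens, max_len=512, overlap=0):
    """Split a token list into consecutive segments of at most max_len tokens.

    `overlap` (an assumption, not from the article) repeats that many tokens
    at the start of each following segment so context is not cut abruptly.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    if not 0 <= overlap < max_len:
        raise ValueError("overlap must satisfy 0 <= overlap < max_len")

    step = max_len - overlap
    segments = []
    for start in range(0, len(tokens), step):
        segments.append(tokens[start:start + max_len])
        # Stop once a segment reaches the end of the text.
        if start + max_len >= len(tokens):
            break
    return segments


def classify_long_text(tokens, classify_segment, max_len=512):
    """Score each segment separately and combine the scores.

    Combining by maximum is a hypothetical choice: the text is flagged
    if any single segment is flagged.
    """
    segments = split_into_segments(tokens, max_len)
    scores = [classify_segment(segment) for segment in segments]
    return max(scores) if scores else 0.0
```

With no overlap, joining the segments in order reproduces the original token list, which matches the idea of splitting a text and putting it back together.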
According to the researchers, the solution contains a dictionary with keywords and their forms. The algorithm can also look for hidden subtext between the lines, as users in China use homonyms or add hyphens between characters to avoid censorship.
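A rough Python sketch of how a keyword dictionary with word forms might be matched against text that has been padded with hyphens. The separator set, the function names, and the sample dictionary are hypothetical, and the sketch does not attempt homonym detection, which requires a language model rather than string matching.

```python
import re

# Characters assumed to be inserted between letters to dodge filters
# (a hypothetical set chosen for illustration).
SEPARATORS = re.compile(r"[\s\-_.*·]+")


def normalize(text):
    """Remove separator characters and lowercase the text."""
    return SEPARATORS.sub("", text).lower()


def find_keywords(text, dictionary):
    """Return the canonical keywords whose listed forms occur in the text.

    `dictionary` maps a canonical keyword to a list of its forms.
    """
    normalized = normalize(text)
    return sorted(
        {
            canonical
            for canonical, forms in dictionary.items()
            for form in forms
            if normalize(form) in normalized
        }
    )


# Sample dictionary, invented for this example.
sample_dictionary = {"forbidden": ["forbidden", "forbiden"]}
matches = find_keywords("this is f-o-r-b-i-d-d-e-n text", sample_dictionary)
```

Because whitespace is stripped as well, a match can span what were originally two separate words, so a filter built this way would trade some false positives for resistance to inserted separators.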
The Internet in China is tightly controlled: many sites, including Google, Facebook, and Twitter, are blocked, and some content on the sites that remain available is prohibited. At the same time, the Chinese language is too complex for conventional solutions to search for “prohibited” information, the newspaper notes.
The solution would be useful to “find and filter information from online publications,” said lead researcher Li Shu and her colleagues. Currently, the Chinese government and companies rely on an “army of censors” who manually check content on the Internet, but this is too expensive and inefficient, the newspaper writes.