Text comparison using data compression


Title: Text comparison using data compression
Authors: Platos, J.
Prilepok, M.
Snasel, V.
Publication date: 2013
Keywords: Kolmogorov complexity
text compression
comparison algorithms
data compression
numerical processing
approximation
Language: English
Content provider: BazTech
Type: Article


Similarity detection is important in spam detection, plagiarism detection, and topic detection. The main algorithm for comparing text documents is based on Kolmogorov complexity, which is in theory an ideal measure of the similarity of two strings over a defined alphabet. Unfortunately, this measure is uncomputable, so several approximations must be defined. These approximations are not metrics in the strict sense, but under some circumstances they behave closely enough to one that they can be used in practice.

The article discusses methods for detecting text similarity. The main algorithm used is Kolmogorov complexity. Its main limitation is that the algorithm's output is difficult to process numerically, so a number of approximations are proposed.
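
To illustrate what a compression-based approximation of Kolmogorov complexity can look like, here is a minimal sketch of the Normalized Compression Distance (NCD), a widely used measure of this kind. The abstract does not say which approximations the authors define, so the choice of NCD, the zlib compressor, and the function names `compressed_size` and `ncd` are all assumptions made for illustration, not the article's method.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Size in bytes of the zlib-compressed input.

    Assumption: the compressed length stands in for the (uncomputable)
    Kolmogorov complexity K(x). Any real compressor only gives an upper bound.
    """
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized Compression Distance between two texts.

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C is the compressed size. Values near 0 suggest similar texts,
    values near 1 suggest unrelated texts. With a real compressor the
    result is only approximately a metric.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    doc_a = "Similarity detection is important for spam and plagiarism detection. " * 5
    doc_b = "Zebras graze quietly while young foxes jump over big wooden boxes. " * 5
    print("same text:      ", ncd(doc_a, doc_a))
    print("different texts:", ncd(doc_a, doc_b))
```

Comparing a text with itself should give a noticeably smaller value than comparing it with an unrelated text. The exact numbers depend on the compressor and on text length, which is one reason such approximations only behave like a metric under some circumstances.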
