Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions


Title: Gene prediction by the noise-assisted MEMD and wavelet transform for identifying the protein coding regions
Authors: Zheng, Qian; Chen, Tao; Zhou, Wenxiang; Xie, Lei; Su, Hongye
Publication date: 2021
Keywords: noise-assisted multivariate empirical mode decomposition; NA-MEMD; protein coding region; wavelet transform; signal processing; gene prediction; empirical signal decomposition; coding region
Language: English
Content provider: BazTech
Document type: Article


The analysis of protein-coding regions in DNA sequences is one of the most fundamental applications in bioinformatics. A number of model-independent approaches have been developed to distinguish protein-coding from non-coding regions of DNA. However, these methods are often based on univariate analysis algorithms, which discard the joint information shared among the four nucleotides. In this article, we introduce a method based on noise-assisted multivariate empirical mode decomposition (NA-MEMD) and the modified Gabor-wavelet transform (MGWT). NA-MEMD, as a multivariate analysis tool, is used to reconstruct the numerical sequence under analysis, since it enables a matched-scale decomposition across all variables and eliminates mode mixing. By virtue of NA-MEMD, the MGWT method achieves a consistent improvement in overall identification performance. We compare our method with other digital signal processing (DSP) methods on two representative DNA sequences and three benchmark datasets. The results show that our method enhances the spectra of the analyzed sequences and improves the robustness of MGWT across different DNA sequences, thus achieving higher identification accuracy for protein-coding regions than the other methods applied. In addition, a further comparative experiment against the model-dependent method AUGUSTUS on the recently proposed benchmark dataset G3PO confirms the advantage of model-independent methods (especially NA-MEMD-MGWT) for identifying coding regions in poor-quality DNA sequences.
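
To make the idea of a multichannel spectral analysis of DNA more concrete, the sketch below shows one common DSP-style pipeline. It is not the authors' implementation: the abstract does not specify the numerical mapping or the MGWT formula, so the code assumes a binary indicator (Voss-style) mapping into four channels and a plain Gaussian-windowed measure at frequency 1/3, relying on the well-known period-3 property of coding regions. It does not implement NA-MEMD. The names `voss_mapping` and `gabor_period3_spectrum` and the `scale` parameter are hypothetical.

```python
import cmath
import math


def voss_mapping(seq):
    """Map a DNA string to four binary indicator sequences (assumed mapping).

    Each channel holds 1.0 where its nucleotide occurs and 0.0 elsewhere,
    so the four channels together keep the joint nucleotide information.
    """
    seq = seq.upper()
    return {base: [1.0 if ch == base else 0.0 for ch in seq] for base in "ACGT"}


def gabor_period3_spectrum(indicators, scale=40.0):
    """Position-wise period-3 energy, summed over the four channels.

    Simplified stand-in for a Gabor-wavelet analysis: each channel is
    correlated with a Gaussian-windowed complex exponential at frequency 1/3,
    and the squared magnitudes are added up across channels.
    """
    n = len(next(iter(indicators.values())))
    half = int(3 * scale)  # truncate the Gaussian window at three sigma
    kernel = [
        math.exp(-(k * k) / (2.0 * scale * scale)) * cmath.exp(-2j * math.pi * k / 3.0)
        for k in range(-half, half + 1)
    ]
    spectrum = [0.0] * n
    for channel in indicators.values():
        for pos in range(n):
            acc = 0j
            for j, weight in enumerate(kernel):
                idx = pos + j - half
                if 0 <= idx < n:  # ignore window samples outside the sequence
                    acc += channel[idx] * weight
            spectrum[pos] += abs(acc) ** 2
    return spectrum


if __name__ == "__main__":
    # Toy input: a strictly period-3 stretch between two short flanks.
    toy = "ACGTTGCAAGTC" * 5 + "GCA" * 40 + "ACGTTGCAAGTC" * 5
    energy = gabor_period3_spectrum(voss_mapping(toy), scale=10.0)
    peak = max(range(len(energy)), key=energy.__getitem__)
    print("position of maximum period-3 energy:", peak)
```

In a pipeline of the kind the abstract describes, a multivariate decomposition step such as NA-MEMD would sit between the mapping and the spectral measure, and a threshold on the resulting curve would mark candidate coding regions. Both steps are left out of this sketch.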

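Record prepared with funds from MNiSW (the Polish Ministry of Science and Higher Education), agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of science and promotion of sport (2021).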
