Explorations into Deep Learning Text Architectures for Dense Image Captioning

Details

Title: Explorations into Deep Learning Text Architectures for Dense Image Captioning
Authors: Toshevska, Martina; Stojanovska, Frosina; Zdravevski, Eftim; Lameski, Petre; Gievska, Sonja
Publication date: 2020
Keywords: computer vision, decoding, encoding, feature extraction, artificial intelligence, natural language processing, neural nets, text analysis
Language: English
Content provider: BazTech
Type: Article


Image captioning is the task of generating a textual description that best fits the scene in an image. It is one of the most important tasks in computer vision and natural language processing, with the potential to improve applications in robotics, assistive technologies, storytelling, medical imaging and more. This paper analyses different encoder-decoder architectures for dense image caption generation, focusing on the text generation component. Pretrained models are used for image feature extraction through transfer learning, and the extracted features are then used to describe image regions. We propose three deep learning architectures for generating one-sentence captions of Regions of Interest (RoIs), each reflecting a different way of integrating features from images and text. The proposed models were evaluated and compared using several metrics for natural language generation.
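The abstract does not name the evaluation metrics used. As a rough illustration of the kind of n-gram overlap measure commonly applied to generated captions, the sketch below computes clipped n-gram precision, the building block of BLEU. The choice of this measure, the function names and the whitespace tokenization are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def clipped_precision(candidate, references, n=1):
    """Clipped n-gram precision of a candidate caption against references.

    Hypothetical helper: each candidate n-gram is credited at most as many
    times as it occurs in the reference where it is most frequent.
    Tokenization is a plain lowercase whitespace split (an assumption).
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    if not cand_counts:
        return 0.0

    # For every n-gram, keep its maximum count over all reference captions.
    max_ref_counts = Counter()
    for ref in references:
        ref_counts = Counter(ngrams(ref.lower().split(), n))
        for gram, count in ref_counts.items():
            max_ref_counts[gram] = max(max_ref_counts[gram], count)

    # Clip each candidate count by the reference maximum, then normalise.
    matched = sum(min(count, max_ref_counts[gram])
                  for gram, count in cand_counts.items())
    return matched / sum(cand_counts.values())


if __name__ == "__main__":
    refs = ["a dog sitting on the grass"]
    print(clipped_precision("a dog on the grass", refs, n=1))
    print(clipped_precision("a dog on the grass", refs, n=2))
```

For a region caption such as "a dog on the grass" scored against the reference "a dog sitting on the grass", every unigram matches, while only three of the four bigrams do. The clipping step stops a caption from being rewarded for repeating a frequent word.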

1. Track 1: Artificial Intelligence

2. Technical Session: 15th International Symposium Advances in Artificial Intelligence and Applications

3. Record prepared with funding from MNiSW, agreement No. 461252, under the programme "Społeczna odpowiedzialność nauki" (Social Responsibility of Science), module: Popularisation of Science and Promotion of Sport (2021).
