Author: Oruganti, Ram Manohar
ProQuest Information and Learning Co
Rochester Institute of Technology. Computer Engineering
Title: Image Description using Deep Neural Networks [electronic resource]
Publication: 2016
Description: 1 online resource (96 pages)
Notes: Source: Masters Abstracts International, Volume: 55-05
Adviser: Raymond W. Ptucha
Thesis (M.S.)--Rochester Institute of Technology, 2016
Includes bibliographical references
Current research in computer vision and machine learning has demonstrated strong abilities in detecting and recognizing objects in natural images. State-of-the-art results in object detection, classification and localization in the ImageNet challenges put the top-5 classification error on the validation set at 3.08%, while similar classification experiments run with trained humans report an error of 5.1%. While some might argue that human accuracy is a function of training time, it can be said with confidence that automated classification models are at least as good as trained humans on such classification problems. The ability of these models to analyze and describe complex images, however, is still an active area of research.
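The top-5 figures quoted above are easiest to read as error rates: an example counts as a miss when its true label is not among the five highest-scoring classes. The following is a minimal sketch of that calculation, not code from the thesis; the function name `top_k_error` and the toy scores are hypothetical.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of examples whose true label is NOT among the k top-scoring classes."""
    misses = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this example.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


# Toy data (made up for illustration): 2 examples, 6 classes.
scores = [
    [0.05, 0.10, 0.40, 0.20, 0.15, 0.10],
    [0.30, 0.25, 0.20, 0.15, 0.09, 0.01],
]
labels = [2, 5]  # the second example's true class has the lowest score

error = top_k_error(scores, labels, k=5)
accuracy = 1.0 - error  # top-5 accuracy is the complement of top-5 error
```

Under this definition a 3.08% top-5 error corresponds to a 96.92% top-5 accuracy.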
Image description is a good starting point for imparting artificial intelligence to machines by allowing them to analyze and describe complex visual scenes. This thesis introduces a generic, end-to-end trainable Fusion-based Recurrent Multi-Modal (FRMM) architecture to address multi-modal applications. FRMM allows each input modality to be independent in terms of architecture, parameters and length of input sequences. FRMM image description models blend convolutional neural network feature descriptors with sequential language data in a recurrent framework. In addition to introducing FRMMs, this work analyzes the impact of varying activation functions and vocabulary size. The models are trained and tested on the Flickr8k, Flickr30k and MSCOCO datasets, demonstrating state-of-the-art description results.
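The abstract does not give the FRMM equations, so the following is only a generic sketch of the idea it describes: an image branch and a recurrent language branch, each with its own parameters, merged before predicting the next word. It is not the thesis's architecture. The plain tanh RNN, the additive fusion, all dimensions and all names (`W_img`, `W_rec`, `next_word_probs`, and so on) are assumptions made for illustration, and the weights are random rather than trained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only to keep the example small.
IMG_DIM, EMB_DIM, HID_DIM, VOCAB = 8, 4, 6, 10

# Randomly initialised parameters; a real model would learn these end to end.
W_img = rng.normal(scale=0.1, size=(HID_DIM, IMG_DIM))  # projects CNN features
W_emb = rng.normal(scale=0.1, size=(VOCAB, EMB_DIM))    # word embedding table
W_in = rng.normal(scale=0.1, size=(HID_DIM, EMB_DIM))   # word -> hidden
W_rec = rng.normal(scale=0.1, size=(HID_DIM, HID_DIM))  # hidden -> hidden
W_out = rng.normal(scale=0.1, size=(VOCAB, HID_DIM))    # fused state -> word scores


def softmax(x):
    e = np.exp(x - x.max())  # subtract the max for numerical stability
    return e / e.sum()


def next_word_probs(cnn_features, word_ids):
    """Return one next-word distribution per input word."""
    img = np.tanh(W_img @ cnn_features)  # image branch, computed once per image
    h = np.zeros(HID_DIM)
    outputs = []
    for w in word_ids:
        h = np.tanh(W_in @ W_emb[w] + W_rec @ h)  # language branch (plain RNN step)
        fused = h + img                           # additive fusion (an assumption)
        outputs.append(softmax(W_out @ fused))
    return np.stack(outputs)


# Stand-in for a CNN feature vector and a three-word caption prefix.
probs = next_word_probs(rng.normal(size=IMG_DIM), [1, 4, 7])
```

Because the image branch is computed separately from the word loop, the caption length and the image feature size can vary independently, which is the kind of modality independence the abstract attributes to FRMM.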
Electronic reproduction. Ann Arbor, Mich. : ProQuest, 2017
Mode of access: World Wide Web
School code: 0465
Subject: Computer engineering
Computer science
Electronic books.
0464
0984
ISBN/ISSN: 9781339829302