書單推薦 新書推薦 |
視覺問答 : 理論與實踐 讀者對象:本書適用于計算機相關(guān)專業(yè)人員
本書共5部分,內(nèi)容包括:基礎(chǔ)理論、圖像視覺問答、視頻視覺問答、視覺問答高級任務、總結(jié)與展望。
目錄
第1 章簡介..................................................................1 1.1 視覺問答的動機........................................................1 1.2 人工智能任務中的視覺問答...........................................4 1.3 視覺問答類別..........................................................5 1.3.1 數(shù)據(jù)分類驅(qū)動......................................................6 1.3.2 任務分類驅(qū)動......................................................7 1.3.3 其他..............................................................7 參考文獻.....................................................................8 第1 部分基礎(chǔ)理論 第2 章深度學習基礎(chǔ)......................................................15 2.1 神經(jīng)網(wǎng)絡...............................................................15 2.2 卷積神經(jīng)網(wǎng)絡..........................................................17 2.3 循環(huán)神經(jīng)網(wǎng)絡及變體...................................................18 2.4 編碼器-解碼器結(jié)構(gòu)....................................................20 2.5 注意力機制.............................................................20 2.6 記憶網(wǎng)絡...............................................................21 2.7 Transformer 網(wǎng)絡和BERT............................................22 2.8 圖神經(jīng)網(wǎng)絡.............................................................24 參考文獻.....................................................................25 第3 章問答基礎(chǔ)知識......................................................27 3.1 基于規(guī)則的方法........................................................27 3.2 基于信息檢索的方法...................................................28 3.3 問答的神經(jīng)語義解析...................................................29 3.4 問答知識庫.............................................................29 參考文獻.....................................................................30 第2 部分圖像視覺問答 第4 章經(jīng)典視覺問答......................................................35 4.1 簡介....................................................................35 4.2 數(shù)據(jù)集..................................................................36 4.3 生成與分類:兩種回答策略...........................................40 4.4 聯(lián)合嵌入...............................................................40 4.4.1 序列到序列編碼器-解碼器模型......................................40 4.4.2 雙線性編碼模型....................................................43 4.5 注意力機制.............................................................45 4.5.1 堆疊注意力網(wǎng)絡....................................................45 4.5.2 分層問題-圖像協(xié)同注意力..........................................47 4.5.3 自底向上和自頂向下的注意力.......................................49 4.6 記憶網(wǎng)絡...............................................................51 4.6.1 改進的動態(tài)記憶網(wǎng)絡...............................................51 4.6.2 記憶增強網(wǎng)絡......................................................52 4.7 組合推理...............................................................54 4.7.1 神經(jīng)模塊網(wǎng)絡......................................................55 4.7.2 動態(tài)神經(jīng)模塊網(wǎng)絡..................................................56 4.8 圖神經(jīng)網(wǎng)絡.............................................................58 4.8.1 圖卷積網(wǎng)絡........................................................58 4.8.2 圖注意力網(wǎng)絡......................................................60 4.8.3 視覺問答圖卷積網(wǎng)絡...............................................62 4.8.4 視覺問答圖注意力網(wǎng)絡.............................................63 參考文獻.....................................................................66 第5 章基于知識的視覺問答..............................................71 5.1 簡介....................................................................71 5.2 數(shù)據(jù)集..................................................................72 5.3 知識庫..................................................................74 5.3.1 數(shù)據(jù)庫百科........................................................74 5.3.2 ConceptNet........................................................74 5.4 知識嵌入...............................................................75 5.4.1 文字對矢量表示法..................................................75 5.4.2 基于BERT 的表征.................................................78 5.5 問題-查詢轉(zhuǎn)換.........................................................79 5.5.1 基于查詢映射的方法...............................................79 5.5.2 基于學習的方法....................................................81 5.6 查詢知識庫的方法.....................................................82 5.6.1 RDF ..............................................................82 5.6.2 記憶網(wǎng)查詢........................................................83 參考文獻.....................................................................84 第6 章視覺問答的視覺和語言預訓練..................................88 6.1 簡介....................................................................88 6.2 常規(guī)預訓練模型........................................................89 6.2.1 ELMo .............................................................89 6.2.2 GPT ..............................................................89 6.2.3 MLM .............................................................90 6.3 視覺和語言預訓練的常用方法.........................................93 6.3.1 單流方法..........................................................94 6.3.2 雙流方法..........................................................96 6.4 視覺問答及其下游任務微調(diào)...........................................98 參考文獻.....................................................................101 第3 部分視頻視覺問答 第7 章視頻表征學習.....................................................·105 7.1 人工標注的局部視頻描述符...........................................105 7.2 數(shù)據(jù)驅(qū)動的深度學習的視頻特征表示.................................107 7.3 視頻表征的自監(jiān)督學習................................................109 參考文獻.....................................................................110 第8 章視頻問答...........................................................·112 8.1 簡介....................................................................112 8.2 數(shù)據(jù)集..................................................................112 8.2.1 多步推理數(shù)據(jù)集....................................................113 8.2.2 單步推理數(shù)據(jù)集....................................................116 8.3 使用編碼器-解碼器結(jié)構(gòu)的傳統(tǒng)視頻時空推理.........................118 參考文獻.....................................................................123 第9 章視頻問答的高級模型.............................................·126 9.1 時空特征注意力........................................................126 9.2 記憶網(wǎng)絡...............................................................129 9.3 時空圖神經(jīng)網(wǎng)絡........................................................130 參考文獻.....................................................................132 第4 部分視覺問答高級任務 第10 章具身視覺問答...................................................·137 10.1 簡介...................................................................137 10.2 模擬器、數(shù)據(jù)集和評估標準..........................................138 10.2.1 模擬器...........................................................138 10.2.2 數(shù)據(jù)集...........................................................140 10.2.3 評估指標.........................................................141 10.3 語言引導的視覺導航.................................................142 10.3.1 視覺和語言導航...................................................142 10.3.2 遠程對象定位.....................................................147 10.4 具身問答..............................................................148 10.5 交互式問答............................................................150 參考文獻.....................................................................151 第11 章醫(yī)學視覺問答...................................................·153 11.1 簡介...................................................................153 11.2 數(shù)據(jù)集.................................................................154 11.3 醫(yī)學視覺問答的經(jīng)典方法............................................156 11.4 醫(yī)學視覺問答的元學習方法..........................................159 11.5 基于BERT 的醫(yī)學視覺問答方法....................................160 參考文獻.....................................................................162 第12 章基于文本的視覺問答...........................................·165 12.1 簡介...................................................................165 12.2 數(shù)據(jù)集.................................................................166 12.2.1 TextVQA.........................................................166 12.2.2 ST-VQA .........................................................167 12.2.3 OCR-VQA .......................................................168 12.3 OCR 標記表示........................................................168 12.4 簡單融合模型.........................................................169 12.5 基于Transformer 的模型............................................170 12.6 圖模型.................................................................172 參考文獻.....................................................................173 第13 章視覺問題生成...................................................·176 13.1 簡介...................................................................176 13.2 數(shù)據(jù)融合中的視覺問題生成..........................................176 13.2.1 從答案生成問題...................................................177 13.2.2 從圖像生成問題...................................................178 13.2.3 對抗學習.........................................................179 13.3 作為視覺理解問題的視覺問題生成..................................180 參考文獻.....................................................................182 第14 章視覺對話.........................................................·185 14.1 簡介...................................................................185 14.2 數(shù)據(jù)集.................................................................186 14.3 注意力機制............................................................187 14.3.1 具有注意力的分層循環(huán)編碼器和記憶網(wǎng)絡...........................187 14.3.2 歷史條件圖像注意力編碼器........................................188 14.3.3 序列協(xié)同注意力生成模型..........................................190 14.3.4 協(xié)同網(wǎng)絡.........................................................192 14.4 視覺指代表達理解....................................................194 14.5 基于圖的方法.........................................................195 14.5.1 視覺表示的場景圖................................................196 14.5.2 用于視覺和對話表示的圖卷積網(wǎng)絡.................................197 14.6 預訓練模型............................................................199 14.6.1 VD-BERT ........................................................200 14.6.2 Visual-Dialog BERT ..............................................201 參考文獻.....................................................................202 第15 章指代表達理解...................................................·204 15.1 簡介...................................................................204 15.2 數(shù)據(jù)集.................................................................205 15.3 二階段模型............................................................206 15.3.1 聯(lián)合嵌入.........................................................206 15.3.2 協(xié)同注意力模型...................................................208 15.3.3 圖模型...........................................................209 15.4 一階段模型............................................................211 15.5 推理過程理解.........................................................212 參考文獻.....................................................................213 第5 部分總結(jié)與展望 第16 章總結(jié)與展望......................................................·219 16.1 總結(jié)...................................................................219 16.2 展望...................................................................219 16.2.1 視覺問答的可解釋性..............................................219 16.2.2 消除偏見.........................................................220 16.2.3 附加設(shè)置及應用...................................................221 參考文獻.....................................................................221
你還可能感興趣
我要評論
|






