Graduate student: 黃英旗
Ying Chi Huang
Thesis title: 以語音呈現模式導讀網頁文件之研究
Research on Web Accessing with Aural Rendering Model
Advisor: 葉耀明
Yeh, Yao-Ming
Degree: Master
Department: Graduate Institute of Information and Computer Education
Year of publication: 2002
Academic year of graduation: 90 (ROC calendar)
Language: Chinese
Number of pages: 107
Keywords (Chinese): 全球資訊網應用, 可擴展標籤語言應用, 語音瀏覽器, 網路可及性, 語音呈現模式
Keywords (English): WWW Application, XML Application, Voice Browser, Web Accessibility, Aural Rendering Model
Thesis type: Academic thesis
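  • Since the rise of the Internet, it has clearly become a far-reaching repository of knowledge, and the multimedia World Wide Web has been its most prominent development. However, conventional browser software can present web page information only visually. Even when paired with existing commercial screen-reading software, it cannot present web page information correctly in aural form and may even mislead the listener. Recent advances in wireless communication, speech recognition, and speech synthesis now make it possible for people to obtain web page information anytime and anywhere using only a mobile phone. Establishing a new kind of voice browsing service would therefore help people obtain the web information they need through voice communication services.
    For these reasons, this thesis proposes an Aural Rendering Model (ARM) and implements a system, the Aural Rendering Model Designer (AURMOD), to address these problems. The design idea of ARM is to take web page information presented in visual form, automatically add appropriate semantic information, and convert it into a speech document presented in aural form. Combined with existing mature speech synthesizers, the information in a web document is then read aloud to sighted or visually impaired listeners. With this system, even visually impaired users can obtain World Wide Web information as promptly and conveniently as anyone else.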

    With the growth of the Internet, it has become a vast knowledge base without geographic limits. The World Wide Web (WWW), which delivers multimedia information, is its most popular framework. Traditional browsers can present web information only visually. Even a browser integrated with commercial screen-reading software can cause confusion when the user relies on its speech synthesizer to read web content. Recent advances in wireless communication, speech recognition, and speech synthesis have made it possible for people to obtain Internet information from any place at any time using only a cellular phone. Hence, building a new model architecture for voice browsers can give people access to Internet information through voice communication services.
    For these reasons, this study proposes an Aural Rendering Model (ARM) and implements a software system, the Aural Rendering Model Designer (AURMOD), to address the problems above. ARM is designed to transform visually rendered web pages into aural documents, automatically adding the semantic information needed to avoid losing relevant content. Combined with a mature speech synthesizer that reads out the information on the page, this lets people with and without visual disabilities "read" web pages by listening. With this system, people with visual disabilities can access web pages as promptly and conveniently as other users.
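
    The conversion idea in the abstract, adding spoken cues so that structure conveyed visually is not lost when a page is read aloud, can be illustrated with a minimal Python sketch. It is not the thesis's implementation: AURMOD is described as converting XHTML through SIML to JSML, and neither tag set is given in this record, so the cue wording (`CUES`), the function name `to_aural_lines`, the sample page, and the plain-text output format are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical spoken cues per XHTML element. The actual SIML/JSML
# mappings used by the thesis are not shown in this record.
CUES = {
    "title": "Page title",
    "h1": "Section heading",
    "p": "Paragraph",
    "a": "Link",
    "img": "Image",
    "li": "List item",
}

# Hypothetical well-formed XHTML page used only for demonstration.
SAMPLE = (
    "<html><head><title>Weather</title></head><body>"
    "<h1>Today</h1>"
    '<p>See <a href="http://example.org/north">northern forecast</a>.</p>'
    '<img src="map.png" alt="Rain map"/>'
    "<ul><li>Taipei</li><li>Tainan</li></ul>"
    "</body></html>"
)


def local_name(tag):
    """Strip an XML namespace such as '{http://www.w3.org/1999/xhtml}'."""
    return tag.rsplit("}", 1)[-1]


def to_aural_lines(xhtml):
    """Return one cue-prefixed line of text per recognised element."""
    root = ET.fromstring(xhtml)
    lines = []
    for elem in root.iter():  # document order
        name = local_name(elem.tag)
        if name not in CUES:
            continue
        if name == "img":
            # Images carry no text; fall back to the alt attribute.
            text = elem.get("alt", "no description")
        else:
            text = "".join(elem.itertext()).strip()
        line = f"{CUES[name]}: {text}"
        if name == "a" and elem.get("href"):
            # Announce the link target, which a listener cannot see.
            line += f" (address {elem.get('href')})"
        lines.append(line)
    return lines


if __name__ == "__main__":
    for line in to_aural_lines(SAMPLE):
        print(line)
```

    Each returned line could then be handed to any text-to-speech engine. Keeping the cues in a single table mirrors, in a very loose way, the idea of a configurable reading mode, but that correspondence is an interpretation rather than something this record states.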

    List of Tables
    List of Figures
    Chapter 1 Introduction
        Section 1 Research Background
        Section 2 Research Motivation
        Section 3 Research Objectives
        Section 4 Thesis Organization
    Chapter 2 Literature Review
        Section 1 The XML Markup Language and Java Technology
        Section 2 The XHTML Markup Language
        Section 3 Speech Synthesis Technology and the JSML Markup Language
        Section 4 Analysis and Comparison of Three Speech Markup Languages
        Section 5 Accessible Web Services
    Chapter 3 Design of the Aural Rendering Model
        Section 1 Concept of the Aural Rendering Model
        Section 2 Analysis of XHTML Document Structure
        Section 3 Establishing the SIML Markup Language
        Section 4 Mechanism for Converting the Visual Rendering Model to the Aural Rendering Model
        Section 5 Reading Mode Design
    Chapter 4 AURMOD System Architecture and Software Environment
        Section 1 AURMOD Main Module Architecture
        Section 2 AURMOD Software Environment
    Chapter 5 System Evaluation
        Section 1 Experimental Design
        Section 2 Discussion of Experimental Results
    Chapter 6 Conclusions and Future Directions
        Section 1 Conclusions
        Section 2 Future Directions
    References
    Appendix 1 Document Type Definition of SIML
    Appendix 2 Document Type Definition of JSML
    Appendix 3 Keyword Template Document
    Appendix 4 Reading Mode Record Document
    Appendix 5 Sample Questionnaire and Example Web Pages
    Appendix 6 Statistical Tables of Experimental Data
    Appendix 7 AURMOD Installation Procedure
    Appendix 8 UML Diagrams of AURMOD

    List of Tables
        Table 2-3-1 Classification of JSML tags
        Table 2-4-1 Tag correspondence and classification across the three speech markup languages
        Table 3-1-1 Classification of XHTML tags to be filtered
        Table 3-3-1 Conversion of filtered XHTML tags to SIML tags
        Table 3-3-2 Classification of tags requiring reconstruction in SIML
        Table 3-3-3 Category 1 tag conversion to SIML, example (document title, document body)
        Table 3-3-4 Category 2 tag conversion to SIML, example (sections)
        Table 3-3-5 Category 3 tag conversion to SIML, example (hyperlinks)
        Table 3-3-6 Category 4 tag conversion to SIML, example (images)
        Table 3-3-7 Category 5 tag conversion to SIML, example (lists)
        Table 3-3-7 Category 6 tag conversion to SIML, example (tables)
        Table 3-4-1 Mapping and classification of semantically similar SIML and JSML tags
        Table 3-4-2 Category 1 tag conversion to JSML, example (document header, document body)
        Table 3-4-3 Category 2 tag conversion to JSML, example (sections)
        Table 3-4-4 Category 3 tag conversion to JSML, example (hyperlinks)
        Table 3-4-5 Category 4 tag conversion to JSML, example (images)
        Table 3-4-6 Category 5 tag conversion to JSML, example (lists)
        Table 3-4-7 Category 6 tag conversion to JSML, example (tables)
        Table 3-5-1 Classification of reading modes
        Table 3-5-2 Attribute settings for the numeric reading mode
        Table 5-2-1 New hyperlink-to-SIML conversion example
        Table 5-2-2 New hyperlink-to-JSML conversion example

    List of Figures
        Figure 2-3-1 Speech synthesis procedure for JSML documents
        Figure 2-5-1 Three aural rendering strategies for web documents
        Figure 3-1-1 Deployment diagram of the client-server voice communication service environment
        Figure 3-1-2 Schematic of eight XHTML document structures
        Figure 3-1-3 Concept relationship diagram of reading modes
        Figure 3-1-4 Example table
        Figure 3-1-5 AURMOD operating concept
        Figure 3-4-1 Flowchart for converting SIML document structure tags into a JSML document
        Figure 3-5-1 Deployment diagram of the reading mode environment
        Figure 4-1-1 Conceptual diagram of the AURMOD main architecture
        Figure 4-1-2 Legend and meaning of the information flow diagrams
        Figure 4-1-3 Internal information flow of the SIML Document Producer
        Figure 4-1-4 Internal information flow of the JSML Document Producer
        Figure 4-1-5 Internal information flow of the Text-to-Speech Executor
        Figure 4-1-6 Flowchart of the AURMOD main architecture
        Figure 4-2-1 AURMOD main screen
        Figure 4-2-2 XHTML document browsing screen
        Figure 4-2-3 "XHTML document source" screen
        Figure 4-2-4 Dialog for invoking an external document browser
        Figure 4-2-5 SIML document browsing screen
        Figure 4-2-6 "Modify SIML document" screen
        Figure 4-2-7 "Find node" screen
        Figure 4-2-8 "Modify numeric reading mode" screen
        Figure 4-2-9 Reading mode selector screen
        Figure 4-2-10 JSML document browsing screen
        Figure 4-2-11 "Add prosody node" main screen
        Figure 4-2-12 "Add prosody node" advanced screen
        Figure 4-2-13 Keyword editor screen
        Figure 5-2-1 Distribution of participants
        Figure 5-2-2 Example 1, "page title": answer statistics
        Figure 5-2-3 Example 1, "image subject": answer statistics
        Figure 5-2-4 Example 1, "image address": answer statistics
        Figure 5-2-5 Example 1, "hyperlink subject": answer statistics
        Figure 5-2-6 Example 1, "hyperlink URL": answer statistics
        Figure 5-2-7 Example 1, "document title" structure: satisfaction statistics
        Figure 5-2-8 Example 1, "image" structure: satisfaction statistics
        Figure 5-2-9 Example 1, "hyperlink" structure: satisfaction statistics
        Figure 5-2-10 Example 2, "page title": answer statistics
        Figure 5-2-11 Example 2, "table subject": answer statistics
        Figure 5-2-12 Example 2, "table: northern temperature": answer statistics
        Figure 5-2-13 Example 2, "table: central climate": answer statistics
        Figure 5-2-14 Example 2, "table: southern rainfall probability": answer statistics
        Figure 5-2-15 Example 2, "table: southeastern ultraviolet": answer statistics
        Figure 5-2-16 Example 2, "table" structure: satisfaction statistics
        Figure 5-2-17 Structure of the large table before splitting
        Figure 5-2-18 Structure of small table 1, split on the Region axis
        Figure 5-2-19 Structure of small table 2, split on the Region axis
        Figure 5-2-20 Structure of small table 1-1, split on the table header row axis
        Figure 5-2-21 Structure of small table 1-2, split on the table header row axis
        Figure 5-2-22 Structure of small table 2-1, split on the table header row axis
        Figure 5-2-23 Structure of small table 2-2, split on the table header row axis
        Figure 5-2-24 Example 3, "page title": answer statistics
        Figure 5-2-25 Example 3, "section subject": answer statistics
        Figure 5-2-26 Example 3, "section content": answer statistics
        Figure 5-2-27 Example 3, "list word items": answer statistics
        Figure 5-2-28 Example 3, "section" structure: satisfaction statistics
        Figure 5-2-29 Example 3, "list" structure: satisfaction statistics
        Figure 5-2-30 Satisfaction with keyword practicality
        Appendix Figure 5-1 Browser view of example web page 1
        Appendix Figure 5-2 Browser view of example web page 2
        Appendix Figure 5-3 Browser view of example web page 3
        Appendix Figure 7-1 Sun Java2 SDK installation screen 1
        Appendix Figure 7-2 Sun Java2 SDK installation screen 2
        Appendix Figure 7-3 IBM Speech for Java2 v1.0 installation screen
        Appendix Figure 7-4 IBM ViaVoice Identified Runtime v1.0 installation screen 1
        Appendix Figure 7-5 IBM ViaVoice Identified Runtime v1.0 installation screen 2
        Appendix Figure 7-6 IBM ViaVoice Dictation Runtime v8.0 installation screen 1
        Appendix Figure 7-7 IBM ViaVoice Dictation Runtime v8.0 installation screen 2
        Appendix Figure 7-8 IBM ViaVoice Command & Control Runtime v7.0 installation screen 1
        Appendix Figure 7-9 IBM ViaVoice Command & Control Runtime v7.0 installation screen 2
        Appendix Figure 7-10 IBM ViaVoice TTS Runtime v6.21 installation screen 1
        Appendix Figure 7-11 IBM ViaVoice TTS Runtime v6.21 installation screen 2
        Appendix Figure 7-12 IBM Speech Engines registration screen
        Appendix Figure 7-13 ...
        Appendix Figure 7-14 System environment variable modification screen 2
        Appendix Figure 7-15 AURMOD installation screen
        Appendix Figure 7-16 AURMOD shortcut modification screen
        Appendix Figure 7-17 AURMOD running screen
        Appendix Figure 8-1 AURMOD package diagram
        Appendix Figure 8-2 XHTML Filter class diagram
        Appendix Figure 8-3 XHTML To SIML class diagram
        Appendix Figure 8-4 SIML To JSML class diagram
        Appendix Figure 8-5 JSML Markup Editor class diagram
        Appendix Figure 8-6 Text To Speech class diagram

