National Digital Library of Theses and Dissertations in Taiwan

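Graduate student: 高霆耀
Graduate student (English name): Ting-yao Kao
Thesis title: 基於Web之商家景點擷取與資料庫建置
Thesis title (English): Points of Interest Extraction from Unstructured Web
Advisor: 張嘉惠
Degree: Master's
Institution: National Central University
Department: Department of Computer Science and Information Engineering
Discipline: Engineering
Field: Electrical and Information Engineering
Thesis type: Academic thesis
Year of publication: 2015
Academic year of graduation: 103 (ROC calendar)
Language: English
Number of pages: 44
Keywords (Chinese): 電子地圖; 網路爬蟲; 資訊擷取; POI資料庫
Keywords (English): electronic map; web crawler; information extraction; POI database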
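With the spread of mobile devices, local search has become an emerging and popular service. To be complete, however, a local search service must let users accurately find nearby restaurants, hotels, bus stops, karaoke venues, libraries, pharmacies, and all kinds of other places for everyday needs and leisure (Points of Interest, POIs), so we need to build a complete POI database for users to query. In recent years, as the Internet has flourished, users have begun sharing travel experiences and POI information on their blogs and social networks, and more and more businesses and organizations have official web pages that describe themselves in detail. As such pages multiply, the Internet as a whole has become the largest source of POI information.
In this thesis we propose a system for building a POI database from Web information. The system has two main parts. The first part crawls address-bearing pages (ABPs): its goal is to find ABPs on the Web, which contain many POIs along with POI-related descriptions that can be used for retrieval. The second part is the POI extraction system: using a Chinese organization name recognition model and a Chinese address recognition model, both trained with Conditional Random Fields (CRF) as the learning algorithm, it finds all addresses and organization names that appear in a page, pairs addresses with organization names to form POI records, and finally extracts the associated information for each POI.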

With the growing popularity of mobile devices, local search has become a popular new service. A complete local search service, however, has to provide users with nearby POIs (Points of Interest) such as stores, gas stations, parking lots, bus stops, and drugstores, so a comprehensive POI database is needed to support it. In recent years, the Web has become the largest source of POI data. With the prevalence of the Internet, people share their travel experiences and information about the POIs they have visited on social networks, in their blogs, and even in check-in posts. In addition, many companies and organizations publish information about their businesses on their own websites. These web pages contain a large number of POIs.
In this thesis, we propose a POI database construction system based on the immense data of the Web. The system consists of two parts: a query-based crawler and a POI extraction system. The goal of the query-based crawler is to collect address-bearing pages (ABPs) from the Web, since an address is a good indicator of a POI. The POI extraction system uses Conditional Random Fields (CRF) to train a Chinese postal address recognition model and a Chinese organization name recognition model. It then uses these two CRF models to extract addresses and POI names from the ABPs and pairs each address with a POI name to form a POI. Finally, it extracts the associated information for each POI to build a complete POI record.
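The extraction-and-pairing step described above can be illustrated with a small sketch. It assumes that each CRF model emits one BIO tag per character, and it pairs each address with the nearest organization name by token distance. The label names `ORG` and `ADDR`, the functions `decode_bio` and `pair_nearest`, the sample text, and the nearest-distance heuristic are all hypothetical stand-ins; the abstract does not state how the thesis actually performs the pairing.

```python
from typing import List, Tuple

# (text, start token index, end token index exclusive)
Span = Tuple[str, int, int]


def decode_bio(tokens: List[str], tags: List[str], label: str) -> List[Span]:
    """Collect the spans tagged B-<label> / I-<label> in a BIO tag sequence."""
    spans: List[Span] = []
    start = None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing span
        inside = tag == f"I-{label}" and start is not None
        if start is not None and not inside:
            spans.append(("".join(tokens[start:i]), start, i))
            start = None
        if tag == f"B-{label}":
            start = i
    return spans


def pair_nearest(names: List[Span], addresses: List[Span]) -> List[Tuple[str, str]]:
    """Pair each address with the closest name (assumed heuristic, not the thesis method)."""
    pairs: List[Tuple[str, str]] = []
    for addr_text, addr_start, addr_end in addresses:
        if not names:
            break

        def gap(name: Span) -> int:
            _, n_start, n_end = name
            # Token distance between the two spans; 0 if they touch or overlap.
            return max(addr_start - n_end, n_start - addr_end, 0)

        best = min(names, key=gap)
        pairs.append((best[0], addr_text))
    return pairs


# Hand-written tags standing in for the output of the two CRF models.
tokens = list("好茶館台北市中山路1號")
name_tags = ["B-ORG", "I-ORG", "I-ORG"] + ["O"] * 8
addr_tags = ["O"] * 3 + ["B-ADDR"] + ["I-ADDR"] * 7

names = decode_bio(tokens, name_tags, "ORG")
addresses = decode_bio(tokens, addr_tags, "ADDR")
pairs = pair_nearest(names, addresses)
```

In practice the tag sequences would come from trained sequence labelers rather than hand-written lists, and a page with many names and addresses would likely need a stronger pairing criterion than raw distance.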

Chinese Abstract
ABSTRACT
Table of Contents
List of Figures
List of Tables
I. INTRODUCTION
1.1. General Background Information
1.2. Chapter Summary
II. RELATED WORK
2.1. Crawling
2.2. Information Extraction
2.3. Geographic Information Retrieval & POI Map Search
III. POI DATA CONSTRUCTION SYSTEM
3.1. Query-based Crawler
3.1.1. Query String
3.1.2. Improvement of Crawling Efficiency
3.2. POI Extraction Module
3.2.1. Address and POI Name Recognition
3.2.2. Address and POI Name Pairing
3.3. POI Associated Information Extraction
IV. EXPERIMENT
4.1. Efficiency of Query-based Crawler
4.1.1. Comparison of Address Patterns
4.1.2. Improvement by Proxy Server
4.1.3. Comparison of Different Crawlers
4.2. POI Pairing Accuracy
4.3. Performance Evaluation of POI Associated Information
V. CONCLUSION & FUTURE WORK
REFERENCE

