Graduate student: Yun-Hui Chen
Thesis title: Applying Topic Model for Cross-lingual Patent Retrieval and Trend Analysis
Advisor: San-Yih Hwang
Keywords: Cross-Lingual Mapping; Cross-Lingual Topic Model; Patent Analysis; Topic Model; Patent Retrieval
Detecting and forecasting technology trends helps companies make decisions about future R&D activities. Patents are a proxy measure of technological capability and provide reliable information about technologies and their development. Because patent rights are territorial, a patent granted in one country is valid only in that country and offers no protection elsewhere. For multinational enterprises (MNEs), filing patents in many countries is therefore important for protecting their inventions globally. With the rapid growth of cross-national patent filings, cross-lingual patent retrieval and trend detection have become important for inventors and patent searchers. Cross-lingual topic modeling enables MNEs to forecast and compare topic trends in different countries. We collect patent data from the United States Patent and Trademark Office (USPTO) and the Taiwan Intellectual Property Office (TIPO) and apply Post-matching LDA (PMLDA), a method that incorporates cross-lingual word embeddings into Latent Dirichlet Allocation (LDA). We use the output of the model to examine and forecast the cross-lingual technology topic trends of MNEs. We further compare the performance of cross-lingual patent retrieval based on cross-lingual topics, cross-lingual word embeddings, and patent office classifications.
Thesis Approval Certificate
Acknowledgments
Abstract
CHAPTER 1 – Introduction
CHAPTER 2 – Related Work
2.1 Patent Retrieval
2.2 Topic modeling on patent data
2.3 Cross-lingual topic model
CHAPTER 3 – Patent Data Description
CHAPTER 4 – Methodology
4.1 Latent Dirichlet Allocation (LDA)
4.2 Post-matching LDA (PMLDA)
CHAPTER 5 – Experiments and Discussion
5.1 Monolingual word representation
5.2 Cross-lingual word representation
5.3 Topic number setting
5.4 Representative words of cross-lingual topics
5.5 Heatmap reflecting relationships between cross-lingual topics and CPC
5.6 Similar patent retrieval across languages
5.7 Discussion of technology trend
CHAPTER 6 – Conclusion
References
Appendix
