Graduate student: Yun-Hui Chen
Thesis title: Applying Topic Model for Cross-lingual Patent Retrieval and Trend Analysis
Advisor: San-Yih Hwang
Keywords: Cross-Lingual Mapping; Cross-Lingual Topic Model; Patent Analysis; Topic Model; Patent Retrieval
Detecting and forecasting technology trends helps companies make decisions about further R&D activities. Patents are a proxy measure of technological capability and provide reliable information about technologies and their development. Because patents are territorial rights, a patent granted in one country is valid only in that country and has no protection elsewhere. For multinational enterprises (MNEs), filing patents in many countries is therefore important for protecting their inventions globally. With the rapid growth in patents filed across countries, cross-lingual patent retrieval and trend detection are important for inventors and patent searchers. Cross-lingual topic modeling enables MNEs to forecast and compare topic trends in different countries. We apply Post-matching LDA (PMLDA), a method that incorporates cross-lingual word embeddings into Latent Dirichlet Allocation (LDA), to patent data collected from the United States Patent and Trademark Office (USPTO) and the Taiwan Intellectual Property Office (TIPO), and use the model's output to forecast cross-lingual topic trends of MNEs. We further compare the performance of cross-lingual patent retrieval based on the cross-lingual topic model, cross-lingual word embeddings, and patent classification.
Thesis Approval Form
Acknowledgements
Abstract
CHAPTER 1 – Introduction
CHAPTER 2 – Related Work
2.1 Patent Retrieval
2.2 Topic modeling on patent data
2.3 Cross-lingual topic model
CHAPTER 3 – Patent Data Description
CHAPTER 4 – Methodology
4.1 Latent Dirichlet Allocation (LDA)
4.2 Post-matching LDA (PMLDA)
CHAPTER 5 – Experiments and discussion
5.1 Monolingual word representation
5.2 Cross-lingual word representation
5.3 Topic number setting
5.4 Representative words of cross-lingual topics
5.5 Heatmap reflecting relationships between cross-lingual topics and CPC
5.6 Similar patent retrieval across languages
5.7 Discussion of technology trend
CHAPTER 6 – Conclusion
References
Appendix