National Digital Library of Theses and Dissertations in Taiwan

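Graduate student: 楊韻蓉
Graduate student (romanized): YANG, YUN-RONG
Thesis title (Chinese): 美國證券詐欺中的特徵提取與熱點主題探勘:LDA主題模型的應用
Thesis title (English): Feature Extraction and Hot Topic Mining in US Securities Fraud: An Application of LDA Topic Modeling
Advisor: 朱珊瑩
Advisor (romanized): CHU, SHAN-YING
Oral defense committee: 葉錦徽, 高克爾
Oral defense committee (romanized): YEH, JIN-HUEI; ASKAR KOSHOEV
Oral defense date: 2022-07-25
Degree: Master's
Institution: Chung Yuan Christian University (中原大學)
Department: International Business Master's Degree Program (國際商學碩士學位學程)
Discipline: Business and Management
Field: Trade
Thesis type: Academic thesis
Year of publication: 2022
Academic year of graduation: 110 (ROC calendar)
Language: English
Number of pages: 64
Keywords (translated from Chinese): securities crime; text mining; Latent Dirichlet Allocation; topic model; feature extraction
Keywords (English): securities fraud; text mining; Latent Dirichlet Allocation; topic modeling; feature extraction
DOI: 10.6840/cycu202201587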
Disclosure of market information supports the stable development of capital markets, and it has long been one of the measures regulators use to maintain market order. Although existing regulations require listed companies to disclose information, violations of the disclosure rules continue to emerge, which indicates that the disclosure regime remains inadequate. To protect ordinary investors from losses caused by securities fraud, this thesis aims to enrich the information available in the market and to give investors on the weaker side of the information gap a more complete picture.

This study applies Latent Dirichlet Allocation (LDA), a text mining method, to build topic models from 26 years of Litigation Releases published by the U.S. Securities and Exchange Commission (SEC) from 1995 to 2021. Using this automated text classification approach, it groups securities fraud cases into topics in order to extract the key features of securities fraud in the United States and to identify the hot topics in securities fraud litigation over the years. The results indicate that investment-based schemes are currently the leading subject of prosecution, followed by insider trading and then unregistered securities offerings.
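
As an illustration of the kind of topic modeling the abstract describes, the sketch below fits LDA to a handful of short texts using collapsed Gibbs sampling in plain Python. It is not the thesis's implementation: the mini-corpus, the stopword set, the function names (`tokenize`, `fit_lda`, `top_terms`), and the parameter values (`n_topics`, `alpha`, `beta`, `n_iter`) are all hypothetical choices made for this example, and a real study would more likely rely on an established library.

```python
import random

# Hypothetical mini-corpus for illustration only; NOT SEC Litigation Release text.
DOCS = [
    "insider trading tip merger stock",
    "ponzi scheme investors returns promised",
    "unregistered offering securities sold to investors",
    "insider tip stock trading profit",
    "ponzi investors scheme funds returns",
    "unregistered securities offering registration",
]
STOPWORDS = {"the", "of", "and", "a", "to", "in"}  # assumed, tiny stopword set


def tokenize(text):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [w for w in text.lower().split() if w not in STOPWORDS]


def fit_lda(docs, n_topics=3, alpha=0.1, beta=0.01, n_iter=200, seed=0):
    """Fit LDA by collapsed Gibbs sampling; return (theta, phi, vocab)."""
    rng = random.Random(seed)
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for doc in tokens for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_vocab = len(vocab)

    doc_topic = [[0] * n_topics for _ in tokens]            # counts per (doc, topic)
    topic_word = [[0] * n_vocab for _ in range(n_topics)]   # counts per (topic, word)
    topic_total = [0] * n_topics                            # tokens per topic
    assign = []                                             # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(tokens):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][word_id[w]] += 1
            topic_total[k] += 1
        assign.append(z_doc)

    for _ in range(n_iter):
        for d, doc in enumerate(tokens):
            for i, w in enumerate(doc):
                v, k = word_id[w], assign[d][i]
                # Remove the token's current assignment from the counts.
                doc_topic[d][k] -= 1
                topic_word[k][v] -= 1
                topic_total[k] -= 1
                # Resample its topic from the conditional distribution.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][v] + beta)
                    / (topic_total[t] + n_vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][v] += 1
                topic_total[k] += 1

    # Smoothed estimates: document-topic (theta) and topic-word (phi) distributions.
    theta = [[(c + alpha) / (sum(row) + n_topics * alpha) for c in row]
             for row in doc_topic]
    phi = [[(c + beta) / (topic_total[k] + n_vocab * beta) for c in topic_word[k]]
           for k in range(n_topics)]
    return theta, phi, vocab


def top_terms(phi, vocab, n=3):
    """Return the n highest-probability words for each topic."""
    return [[vocab[i] for i in sorted(range(len(vocab)), key=lambda i: -row[i])[:n]]
            for row in phi]


theta, phi, vocab = fit_lda(DOCS)
for k, terms in enumerate(top_terms(phi, vocab)):
    print(f"topic {k}: {', '.join(terms)}")
```

Each row of `theta` gives one document's mixture over topics, and each row of `phi` gives one topic's distribution over words; reading the top terms per topic is the step on which manual topic labeling would build. With a corpus this small the topics are unstable, so the printed terms should be read as a demonstration of the mechanics only.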

Contents
Abstract (Chinese)
Abstract
Acknowledgement
Contents
List of Figures
List of Tables
Chapter I Introduction
Chapter II Literature Review
2.1 Overview of Market Manipulation
2.1.1 Potential Problems and Causes of Market Manipulation
2.1.2 Difficulties in Identifying Market Manipulation
2.1.3 The Cost of Market Manipulation
2.1.4 Market Manipulation under Asymmetric Information
2.2 The Overview of Text Mining
2.2.1 Value of Text Mining
2.2.2 Framework for Text Mining
2.2.3 Text Mining Applications
2.3 Topic Modeling
2.3.1 Evolution of Topic Modeling
2.3.2 Topic Modeling Applications
Chapter III LDA Topic Modeling
Chapter IV Data Collection and Data Processing
4.1 Data Collection
4.1.1 Data Resource
4.1.2 Data Selection
4.1.3 Worming
4.1.4 Construction of Text Database
4.2 Data Processing
4.2.1 Text Preprocessing
4.2.2 Document-Term Matrix (DTM) Creation
4.2.3 Setting up LDA Parameters
Chapter V Results and Analysis
5.1 Topic Labeling
5.2 Topic Labels Validation
5.3 Feature Extraction of Securities Fraud
5.4 Hot Topic Exploration
Chapter VI Concluding Remarks and Suggestions
References
Appendix A - List of Compound Words


List of Figures
Figure 1. The Framework of Text Mining
Figure 2. Mechanism of LDA Topic Modeling
Figure 3. Process of Web Crawling
Figure 4. Annual Number of SEC Litigation Releases
Figure 5. Document-Term Matrix Composition
Figure 6. Document-Term Matrix Composition after Removing 1% of Sparse Terms
Figure 7. Perplexity Results in a 5-Fold Cross-validation Test
Figure 8. Test Result of Ldatuning
Figure 9. Topic Labels
Figure 10. Topic Similarity Dendrogram
Figure 11. The Distribution of Papers for the Top 10 Topics


List of Tables
Table 1. Description of LDA Parameters
Table 2. SEC Enforcement Actions
Table 3. NLTK English Stopword List
Table 4. Custom Stopword List
Table 5. Document-Term Matrix
Table 6. Perplexity Results in a 5-Fold Cross-validation Test
Table 7. LDA Parameters in Our Research
Table 8. Suspected Securities Fraud or Wrongdoing Reported by the SEC
Table 9. Features of Each Securities Fraud

Electronic full text (public release date on the Internet: 2027-07-25)