Graduate Student (English): Pei-Hsin Weng
Thesis Title (English): Recommend Data-Mart Index Dimensions Based on Causality Odds Ratio Measures
Advisor (English): Meng-Feng Tsai
Keywords (English): Causal Odds Ratio Mining; Data Mart; Institutional Research
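This study adopts Correlation-based Feature Selection (CFS) to compute and evaluate the merit of feature sets, paired with Forward Selection (FS) as the concrete selection method, to filter out the feature sets that fit a specific topic. Causal odds ratio mining is then used to analyze the topic in depth and, at the same time, to assess whether the topic is suitable for in-depth exploration within the given data range. With the institutional research data warehouse as the data source, the study examines three topics, "good adaptability in school," "poor adaptability in school," and "diversified learning," and recommends the index dimensions a data mart needs to highlight causal relevance for each topic. The aim is to help institutional researchers make educational policy more precise and better founded, and so to provide decision support.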
In recent years, the rise of big data has made institutional research a growing concern for many schools. To respond to this trend and improve teaching quality, our school has established an institutional research unit that integrates data across multiple dimensions, including student academic performance, course selection, and club participation, into a rich and complex data warehouse. However, institutional researchers working on different topics still face the challenge of organizing suitable data marts from that warehouse. Relying solely on experience or correlation may yield information that appears meaningful but is not. This study argues that the index dimensions of a data mart should reflect potentially causal analysis perspectives. An imprecisely designed data mart can produce inaccurate analysis results that are difficult to interpret, which in turn weakens decision support.
This study uses Correlation-based Feature Selection (CFS) to compute and evaluate the merit of feature sets, with Forward Selection (FS) as the search strategy, to select the feature sets that match a specific topic. Causal odds ratio mining is then applied to analyze each topic in depth and to assess whether the topic is suitable for in-depth exploration within the given data range. Using the institutional research data warehouse as the data source, the study examines three topics: "good adaptability in school," "poor adaptability in school," and "diversified learning." For each topic, it recommends the data mart index dimensions that highlight causal relevance. This helps institutional researchers set educational policy more precisely and effectively, and thereby supports decision-making.
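The two building blocks named in the abstract can be illustrated with a minimal sketch. It is not the thesis's implementation. It assumes the standard CFS merit, k * r_cf / sqrt(k + k(k-1) * r_ff), uses absolute Pearson correlation as the correlation measure (an assumption; the abstract does not say which measure is used), and computes a plain 2x2 odds ratio rather than the full causal odds ratio mining procedure, which the abstract does not specify. The column names `gpa_rank` and `seat_pattern` and all values are hypothetical toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a constant column carries no correlation signal
    return abs(cov / sqrt(vx * vy))


def cfs_merit(features, target, subset):
    """CFS merit of a non-empty subset: k * r_cf / sqrt(k + k * (k - 1) * r_ff)."""
    k = len(subset)
    # Mean feature-to-target correlation.
    r_cf = sum(pearson(features[f], target) for f in subset) / k
    # Mean feature-to-feature correlation (0 when the subset has one feature).
    pairs = list(combinations(subset, 2))
    r_ff = sum(pearson(features[a], features[b]) for a, b in pairs) / len(pairs) if pairs else 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)


def forward_select(features, target):
    """Greedily add the feature that raises the merit most; stop when none does."""
    selected, best = [], 0.0
    remaining = set(features)
    while remaining:
        merit, name = max(
            (cfs_merit(features, target, selected + [f]), f) for f in sorted(remaining)
        )
        if merit <= best:
            break
        selected.append(name)
        remaining.remove(name)
        best = merit
    return selected, best


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table.

    a, b: exposed group with / without the outcome.
    c, d: unexposed group with / without the outcome.
    Zero cells are not handled in this sketch (b * c == 0 raises ZeroDivisionError).
    """
    return (a * d) / (b * c)


# Hypothetical toy data: one informative column, one uninformative column.
features = {
    "gpa_rank": [1, 2, 3, 4, 5, 6],
    "seat_pattern": [1, 2, 3, 3, 2, 1],
}
target = [0, 0, 0, 1, 1, 1]  # e.g. 1 = belongs to the topic of interest

selected, merit = forward_select(features, target)
ratio = odds_ratio(30, 10, 15, 45)
print(selected, round(merit, 4), ratio)
```

In this toy case the uninformative column lowers the merit when added, so forward selection stops after the first feature. An odds ratio well above 1 for a selected dimension would mark it as a candidate for closer causal analysis, under whatever threshold the analyst chooses.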
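Abstract
Acknowledgements
Table of Contents
List of Figures
List of Tables
1. Introduction
1-1. Research Background and Motivation
1-2. Research Objectives
2. Literature Review
2-1. Data Warehouse
2-2. Data Mart
2-3. Correlation-based Feature Selection (CFS)
2-3-1. Forward Selection (FS)
2-4. Association Rule Mining
2-4-1. Support
2-4-2. Confidence
2-5. Causal Odds Ratio Mining
2-6. Cohort Study
2-7. Odds Ratio
3. Research Method
3-1. System Architecture
3-2. Data Sources and Preprocessing
3-3. Correlation-based Feature Selection Process
3-4. System Workflow
3-5. Causal Odds Ratio Rule Candidates and Validation
3-6. Analyzing and Recommending the Dimensions Required for Data Marts
4. Experiments
4-1. Experimental Environment and Specifications
4-2. Experimental Datasets and Design
4-3. Experimental Analysis
4-3-1. Topic: Good Adaptability in School
4-3-1-1. Causal Odds Ratio Rule Analysis and Discussion: Good Adaptability in School
4-3-2. Topic: Poor Adaptability in School
4-3-2-1. Causal Odds Ratio Rule Analysis and Discussion: Poor Adaptability in School
4-3-3. Topic: Diversified Learning
4-3-3-1. Causal Odds Ratio Rule Analysis and Discussion: Diversified Learning
4-4. Summary of Experimental Results
5. Conclusion and Discussion
6. References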
