Author (English name): Tsung Ying Sun
Thesis title (English): On the Self-Construction of Fuzzy System Modeling: A Fuzzy Reasoning System Based on the Integration of Radial Basis Function Neural Network with Reinforcement Learning Mechanism
Advisor (English name): Ying Kue Yang
Keywords (English): Self-constructing of fuzzy system modeling; reinforcement learning; Radial Basis Function Neural Network; Neuro-Fuzzy System; fuzzy inference system
A rule-based fuzzy system with learning capability has features well suited to many difficult real-world problems. However, obtaining a reliable fuzzy knowledge base remains the bottleneck in constructing such a system. This thesis therefore integrates the learning capability of a neural network (NN) with a fuzzy inference system (FIS) to form a neuro-fuzzy system (NFS) that can construct the fuzzy knowledge base of a rule-based fuzzy system automatically. The goal is to self-construct a rule-based fuzzy system by reinforcement learning with the temporal difference method, TD(λ). Reinforcement learning with TD(λ) is a powerful learning method, especially for ill-defined systems and for situations in which supervised learning is not applicable.
In this thesis, the rule-based fuzzy system is represented by a radial basis function neural network (RBFNN), which provides the facilities for learning and tuning. The architecture, called the Fuzzy-Reasoning Radial Basis Function Neural Network (F2RBF2N), is composed of two sub-networks, the Action Critic Network (ACN) and the Action Selection Network (ASN), each with only three neural layers. The ASN is functionally equivalent to a fuzzy reasoning system: it serves as the rule-based fuzzy system, mapping the input space of the controlled target system to its output space so as to produce the outputs that control that system. Because an RBFNN and a rule-based fuzzy system are functionally equivalent, the layers that perform fuzzification and defuzzification can be eliminated, which is why the ASN needs only three layers. The ACN generates a credit critic prediction signal at each time step of the reinforcement learning procedure, and this signal is used to modify the actual outputs of the ASN. The ACN also takes the external reinforcement signal generated from the current status of the target system and transforms it into a heuristic internal reinforcement signal that drives learning and tuning in the ASN. Because the ASN and the ACN share two RBFNN layers, the F2RBF2N has a simpler network architecture than other NFSs; the simpler topology also lowers the computational cost and thereby improves the system's real-time processing capability.
The proposed F2RBF2N is shown to learn well from experience without supervision. It can not only generate rules on-line but also tune them as its environment or the controlled plant changes.
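The two-head arrangement described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: normalized Gaussian basis functions, singleton rule consequents, and an internal reinforcement of the classical adaptive-heuristic-critic form r + γ·v(t) − v(t−1). All function names, parameter values, and the two-rule example are hypothetical and are not taken from the thesis.

```python
import math

def rbf_layer(x, centers, widths):
    """Normalized Gaussian activations; one hidden node per fuzzy rule (assumed form)."""
    raw = [
        math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / (2.0 * s ** 2))
        for c, s in zip(centers, widths)
    ]
    total = sum(raw)
    return [r / total for r in raw]

def head(phi, weights):
    """Linear output layer: weighted sum of the rule firing strengths."""
    return sum(p * w for p, w in zip(phi, weights))

def internal_reinforcement(r, v_now, v_prev, gamma=0.95):
    """Assumed TD-style heuristic signal: r + gamma * v(t) - v(t-1)."""
    return r + gamma * v_now - v_prev

# Hypothetical two-rule system over a one-dimensional input.
centers = [[-1.0], [1.0]]
widths = [1.0, 1.0]
asn_weights = [-10.0, 10.0]  # ASN head: singleton rule consequents (actions)
acn_weights = [0.2, 0.4]     # ACN head: critic's prediction per rule

phi = rbf_layer([0.0], centers, widths)  # hidden activations shared by both heads
action = head(phi, asn_weights)          # ASN output
value = head(phi, acn_weights)           # ACN prediction
r_hat = internal_reinforcement(r=0.0, v_now=value, v_prev=0.25)
```

Because the activations are normalized, the ASN output is a weighted average of the rule consequents, which is the sense in which such a network can stand in for a singleton fuzzy system without separate fuzzification and defuzzification layers. Computing `phi` once and feeding it to both heads mirrors the layer sharing between ASN and ACN.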
List of Figures
List of Tables
List of Theorems
List of Definitions
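Chapter 1 Introduction
1.1 The System Modeling Problem
1.2 Rule-Based Fuzzy Systems and the Gray-Box Local Modeling Strategy
1.3 Evolution of Automatic Modeling Methods for Fuzzy Systems
1.4 Limitations of Automatic Modeling with NFS
1.5 Objectives of This Research
1.6 Organization of the Thesis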
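2.1 Fuzzy Set Theory and Fuzzy Logic
2.1.1 Basic Definitions
2.1.2 Basic Operations on Fuzzy Sets
2.1.3 Fuzzy Logic
2.2 Rule-Based Fuzzy Systems
2.2.1 Types of Rule-Based Fuzzy Systems
2.2.2 Inference Mechanisms of Rule-Based Fuzzy Systems
2.2.3 Fuzzy Basis Function (FBF)
2.2.4 Model Identification Methods for Rule-Based Fuzzy Systems
2.3 Artificial Neural Networks
2.3.1 Learning Mechanisms
2.3.1-1 Learning with a Teacher
2.3.1-2 Learning without a Teacher
2.3.2 Basic Concepts of Neural Networks
2.3.3 Multilayer Perceptron Networks
2.3.4 Radial Basis Function Neural Networks
2.4.1 Optimization-Based Modeling Methods
2.4.2 Clustering-Based Modeling Methods
2.4.3 Heuristic-Based Modeling Methods
2.4.4 Modeling Methods Based on Hybrid Intelligent Techniques
2.4.5 Neuro-Fuzzy Hybrid Modeling Methods
2.5 Integrated Fuzzy-Neural Systems
2.5.1 Fuzzy Neural Networks
2.5.2 Neuro-Fuzzy Systems Based on Supervised Learning
2.4.2 Neuro-Fuzzy Systems
2.6 Motivation for Improving Neuro-Fuzzy Systems
2.6.1 Drawbacks of Supervised Neuro-Fuzzy Models
2.6.2 Motivation for Improving the NFSL Model and the Approach Developed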
3.1 NFSs That Adopt the NFRL Model
3.1.1 ARIC
3.1.2 GARIC
3.1.3 FALCON-R
3.2 Basic Architecture of F2RBF2N
3.2.1 Neuron-like Critic
3.2.2 Logical Structure of F2RBF2N
3.2.3 Functions of the ASN
3.2.4 Functions of the ACN
3.2.5 Implementation of F2RBF2N
3.3 RBFNN-Based Singleton Rule-Based Fuzzy Inference System
3.4 Reinforcement Learning Mechanism of F2RBF2N
3.4.1 Temporal Difference Method, TD(λ)
3.4.2 Eligibility Trace
3.4.3 ASN Learning Mechanism of F2RBF2N
3.4.4 ACN Learning Mechanism of F2RBF2N
3.4.5 Creation of Hidden-Layer Nodes
3.5 Conclusion
Chapter 4 Experiments and Results
4.1 Cart-Pole Balancing System
4.1.1 Program Design of the Experiment
4.2 Experimental Results
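Chapter 5 Conclusion
5.1 Comparison of F2RBF2N with Various AHC Models
5.1.1 Comparison of Main Features
5.1.2 Comparison of Network Neuron Characteristics
5.1.3 Comparison of Reinforcement Learning Methods
5.2 Research Results of This Thesis
5.2.1 Review of the Development of F2RBF2N
5.2.2 Features of F2RBF2N
5.2.3 Research Results
5.3 Future Prospects
References
Appendix: List of Acronyms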
