Graduate student: Zheng-Kang Liu
Thesis title: Keystroke Dynamics Feature Designed for Zhuyin Input Method
Advisor: Ho-Lin Chen
Keywords: Biometrics, Keystroke Dynamics, Pattern Recognition, Computer Security
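Biometrics is the study of identifying people by their physiological or behavioral characteristics. Biometric traits are usually divided into physiological and behavioral ones. Among behavioral biometrics, keystroke dynamics has become the most popular research topic because it is low-cost, easy to deploy, and unobtrusive.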


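This thesis proposes a new feature called tonal separation, which records the hold time and release-press time of every key from the first Zhuyin symbol to the first tone mark. The idea comes from how most Zhuyin input methods work: a typical Zhuyin IME converts the typed Zhuyin symbols into the predicted Chinese character when the user enters a tone mark. To compare tonal separation with the conventional digraph feature, we conducted an experiment in which models were trained separately on each feature. We use AUC (area under the ROC curve) to compare the performance of tonal separation and digraph under the same model.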

From the experimental data, we chose ㄉㄜ˙ and ㄋㄨㄥˋ as the target patterns. For ㄉㄜ˙, the AUC of tonal separation exceeds that of every digraph of this pattern. For ㄋㄨㄥˋ, the AUC of tonal separation is about the same as that of the pattern's second digraph, while the first and third digraphs have slightly lower AUC than tonal separation. In summary, tonal separation performs better than digraph overall.
Biometrics is the measurement of distinctive physiological or behavioral characteristics in order to identify people. Biometrics is commonly divided into physiological and behavioral categories. Keystroke dynamics is the most popular behavioral biometric because it is low-cost, easy to deploy, and unobtrusive.

In the field of keystroke dynamics, the digraph has traditionally been regarded as a standard feature. However, prior research has focused mostly on patterns typed in English. Whether the digraph is suitable for analyzing Chinese patterns typed with a Zhuyin IME has yet to be established.

In this thesis, we propose a new feature called tonal separation, derived from the mechanics of the Zhuyin IME, an input method based on a phonetic system for Chinese. Tonal separation records the hold time and the release-press time of every key from the first Zhuyin symbol to the first tone mark. The idea follows from the fact that a Zhuyin IME converts the typed symbols into a Chinese character each time a tone mark is entered. To compare tonal separation with digraph, we conducted an experiment in which we trained models on each of the two features, and we compare the resulting classifiers by AUC, the area under the ROC curve.
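A minimal sketch of how the two features might be extracted from the keystrokes of one syllable. Everything named here is an assumption for illustration, not the thesis's code: the `KeyEvent` layout with millisecond press/release timestamps, the tone-key set (space bar for the first tone, as in common Zhuyin IMEs), the helper names, and the contents of a digraph sample (the two hold times plus the release-press time between them). Hold time is taken as a key's release minus its press, and release-press time as the next key's press minus the previous key's release.

```python
from dataclasses import dataclass

# Assumption: space enters the first tone and ˊ ˇ ˋ ˙ enter the others, as in
# common Zhuyin IMEs; each key is labelled by the Zhuyin symbol it produces.
TONE_KEYS = {" ", "ˊ", "ˇ", "ˋ", "˙"}


@dataclass
class KeyEvent:
    key: str      # Zhuyin symbol produced by the key (hypothetical layout)
    press: int    # key-down timestamp in milliseconds
    release: int  # key-up timestamp in milliseconds


def hold_time(e: KeyEvent) -> int:
    return e.release - e.press


def release_press(a: KeyEvent, b: KeyEvent) -> int:
    # Time from releasing key a to pressing the next key b (negative if they overlap).
    return b.press - a.release


def tonal_separation(events: list[KeyEvent]) -> list[int]:
    """[H1, RP12, H2, ..., Hn] from the first event up to and including the first tone key."""
    for end, e in enumerate(events):
        if e.key in TONE_KEYS:
            break
    else:
        raise ValueError("no tone mark in the event sequence")
    segment = events[: end + 1]
    features = [hold_time(segment[0])]
    for prev, cur in zip(segment, segment[1:]):
        features += [release_press(prev, cur), hold_time(cur)]
    return features


def digraphs(events: list[KeyEvent]) -> list[tuple[tuple[str, str], list[int]]]:
    """One (key pair, [H_a, RP_ab, H_b]) sample per pair of consecutive keys."""
    return [
        ((a.key, b.key), [hold_time(a), release_press(a, b), hold_time(b)])
        for a, b in zip(events, events[1:])
    ]


# ㄋㄨㄥˋ followed by an unrelated key; timestamps are made up for illustration.
events = [
    KeyEvent("ㄋ", 0, 90), KeyEvent("ㄨ", 150, 230), KeyEvent("ㄥ", 300, 380),
    KeyEvent("ˋ", 450, 520), KeyEvent("ㄉ", 600, 660),
]
print(tonal_separation(events))  # [90, 60, 80, 70, 80, 70, 70]
print(len(digraphs(events[:4])))  # 3
```

Under these assumptions, a four-key syllable such as ㄋㄨㄥˋ yields one tonal-separation sample spanning the whole syllable, whereas digraph extraction yields three separate samples, one per consecutive key pair.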

In our experiment, we chose ㄉㄜ˙ and ㄋㄨㄥˋ as the target patterns and extracted samples from them using both tonal separation and digraph. For ㄉㄜ˙, the AUC of the tonal separation samples is higher than that of every corresponding set of digraph samples. For ㄋㄨㄥˋ, the AUC of the tonal separation samples is almost the same as that of the second corresponding digraph sample set, but still higher than those of the other two digraph sample sets. In summary, tonal separation outperforms digraph overall.
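The comparison above rests on AUC. As a hedged sketch (not the thesis's evaluation code), AUC can be computed directly as the fraction of (genuine, impostor) score pairs in which the genuine sample scores higher, counting ties as one half; this equals the area under the empirical ROC curve, that is, the Mann-Whitney U statistic divided by the number of pairs. The function name, the assumption that a higher score means "more likely genuine", and the example scores are all hypothetical.

```python
def auc(genuine_scores: list[float], impostor_scores: list[float]) -> float:
    """Probability that a random genuine sample outscores a random impostor (ties count 1/2)."""
    if not genuine_scores or not impostor_scores:
        raise ValueError("need at least one genuine and one impostor score")
    wins = 0.0
    for g in genuine_scores:
        for i in impostor_scores:
            if g > i:
                wins += 1.0
            elif g == i:
                wins += 0.5
    return wins / (len(genuine_scores) * len(impostor_scores))


# Hypothetical classifier scores, not results from the thesis:
# 5 of the 6 (genuine, impostor) pairs are ranked correctly, so AUC = 5/6.
print(auc([0.9, 0.8, 0.4], [0.5, 0.3]))
```

The pairwise loop is O(nm) and is meant only to make the definition concrete; a rank-based computation gives the same value more efficiently on large sample sets.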
Oral Defense Committee Certification
Acknowledgements
Abstract

1 Introduction
1.1 Motivation
1.2 Related Work
1.3 Problem Description
1.4 Outline of Thesis

2 Machine Learning
2.1 Basic Procedure of Learning Tasks
2.2 Types of Learning
2.3 Random Forests

3 Biometrics and Keystroke Dynamics
3.1 Biometrics: Human Characteristics
3.1.1 Performance Metrics for a Biometrics System
3.1.2 Confusion Matrix and Derivative Performance Metrics
3.1.3 Receiver Operating Characteristic (ROC) and Area Under an ROC Curve (AUC)
3.2 Keystroke Dynamics
3.2.1 Types of Keystroke Dynamics
3.2.2 Common Features

4 Standard Chinese Phonetic System
4.1 Typical Non-romanization System for Chinese: Zhuyin
4.2 Typical Romanization System for Chinese: Pinyin
4.3 Overview of Chinese Input Method (IME) and Keyboard Layout
4.3.1 Chinese Input Method Editor (IME)
4.3.2 Keyboard Layout for Chinese-speaking Countries

5 Tonal Separation and Assessment
5.1 Feature: Digraph and Tonal Separation
5.2 Experiment Settings
5.2.1 Data Collection
5.2.2 Data Preprocessing
5.2.3 Model Training and Model Assessment
5.3 Results and Discussion

6 Conclusion and Future Work

Bibliography
