Graduate student (English name): Yi-Ying Lu
Thesis title (English): K-Grouping: A Machine-Learning-based Data Classifier to Reduce the Write Amplification in SSDs
Advisor (English name): Chin-Hsien Wu
Oral defense committee (English names): Jen-Wei Hsieh, Ya-Shu Chen, Yuan-Hsiang Lin
Keywords (English): Flash memory, Hot-Cold data, Data clustering, Garbage collection, Write amplification, Decision tree
Solid-state drives (SSDs) built from flash memory offer non-volatility, high speed, shock resistance, low power consumption, and small size. In recent years, SSDs have been widely used as data storage in a variety of devices. Flash memory has two critical characteristics: it does not support in-place updates, and it writes data in units of a page but erases data in units of a block. Because of these two characteristics, when a block is selected as a victim block for erasure, the valid pages remaining in it must first be moved to another free block. Reducing the amount of valid-page movement is therefore a crucial issue for SSDs. Data classification can concentrate the invalid pages within the flash memory and thereby reduce the data movement cost. This thesis proposes a method for designing a data classifier, based on a machine learning algorithm, that adapts to different workloads. The classifier writes requests with the same characteristics to the same group of data blocks. This design improves SSD performance by reducing live-page copying and, in turn, decreasing the write amplification.
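To make the link between data placement and write amplification concrete, the following is a minimal sketch, not the thesis's simulator or classifier. It assumes the common definition of write amplification as (host page writes + pages copied by garbage collection) / host page writes, and a greedy victim policy that picks the block with the fewest valid pages. The function names, the four-page blocks, and the count of 8 host writes are all hypothetical values chosen for illustration.

```python
def write_amplification(host_pages: int, gc_copied_pages: int) -> float:
    """Assumed definition: (host writes + GC copies) / host writes."""
    if host_pages <= 0:
        raise ValueError("host_pages must be positive")
    return (host_pages + gc_copied_pages) / host_pages


def greedy_victim_copy_cost(blocks: list[list[bool]]) -> int:
    """Valid pages that must be copied when the block with the
    fewest valid pages is chosen as the victim (greedy policy)."""
    return min(sum(block) for block in blocks)


# Each inner list is one block; True = page still valid, False = invalidated.
# Hypothetical state after the frequently updated (hot) pages were overwritten.

# Hot and cold pages interleaved in every block: each block keeps 2 valid pages.
mixed = [[True, False, True, False],
         [False, True, False, True]]

# Hot pages grouped in one block, cold pages in the other:
# the hot block is fully invalid and can be erased without any copying.
grouped = [[False, False, False, False],
           [True, True, True, True]]

HOST_PAGES = 8  # hypothetical number of pages written by the host

wa_mixed = write_amplification(HOST_PAGES, greedy_victim_copy_cost(mixed))
wa_grouped = write_amplification(HOST_PAGES, greedy_victim_copy_cost(grouped))

print(f"mixed layout:   copy {greedy_victim_copy_cost(mixed)} pages, WA = {wa_mixed}")
print(f"grouped layout: copy {greedy_victim_copy_cost(grouped)} pages, WA = {wa_grouped}")
```

In this toy state, reclaiming one block costs two page copies under the mixed layout and none under the grouped layout, which is the effect the classifier aims for by writing requests with similar characteristics to the same group of blocks.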
1. Introduction
2. Background
2.1. Flash Translation Layer
2.1.1. Address Translation
2.1.2. Garbage-Collection
2.1.3. Wear-Leveling
2.2. Write Amplification
2.3. Hot-Cold Classification
2.3.1. Two-Level-LRU
2.3.2. WDAC
2.3.3. Multiple Bloom Filters
2.3.4. DAC
3. Motivation
4. K-Grouping
4.1. Framework
4.2. An ML-based Data Classifier
4.2.1. Feature Retrieving
4.2.2. Data Preprocessing
4.2.3. Data Clustering
4.2.4. Classifier Training
4.3. Online Classifying
4.3.1. Write Operation
4.3.2. GC Operation
5. Experiment
5.1. Experiment Setting
5.1.1. Workloads Setting
5.1.2. Simulator Setting
5.1.3. Methods for Comparison
5.1.4. Memory Overhead
5.2. Experiment Overview
5.3. Experiment Result
5.3.1. Features of Each Group
5.3.2. The WA of Different K
5.3.3. The WA of Comparison Method
5.3.4. The WA of Different Size of Training Data
5.3.5. Fixed Classifier for Future Several Days
5.4. Combine MLDC with DAC
6. Conclusion
