|
This thesis proposes a multi-threaded superscalar processor architecture for the OR-parallel execution of Prolog. The multi-threading of this architecture exploits the coarse-grained OR parallelism of a Prolog program: each OR task is executed by a different thread. Within each thread, superscalar processing exploits the fine-grained parallelism of the Prolog code. In our OR execution model, Prolog procedures are classified according to their properties, and a program is partitioned into several subprograms according to their execution behavior. These subprograms are classified as non-parallelizable subprograms, parallelizable recursive subprograms, balanced subprograms, and irregular unbalanced subprograms. The behavior of a program changes depending on the subprogram currently being executed. The goal of the processor design is to speed up program execution by properly utilizing the OR parallelism inherent in the subprograms. Furthermore, to reduce the performance limitation caused by irregular unbalanced subprograms, a dynamic load balancing method is proposed. Finally, the multi-threaded superscalar processor architecture that implements this model is described; in particular, the designs of the memory and the register file are presented. Benchmarks from the PLM and BAM systems are used in the performance simulation. For the benchmarks with OR parallelism, the performance of this architecture is 164% better than that of a superscalar architecture.
|