This paper presents a pipelined CPU design project with a field programmable gate array (FPGA) system in a computer architecture course. The class project is a five-stage pipelined 32-bit MIPS design with experiments on the Altera DE2 board. For proper scheduling, milestones were set every one or two weeks to help students complete the project on time. The goal of the project is to educate students effectively via hands-on learning, rather than having them achieve a complete and flawless CPU design. This study reveals that 21 MIPS instructions are enough to achieve the purpose. With the addition in 2010 of the properly enforced scheduling and the FPGA system, many more students successfully completed the class project than was the case in 2009. A student survey and the independent samples t-test reveal the effectiveness of the methodology with the FPGA system. This work differs from previous work in that the devised project requires the implementation of a real CPU instead of utilizing simulators or just experimenting with ready-made complete CPU models.Index Terms-Computer architecture, education, field programmable gate array (FPGA), hands-on learning, incremental learning, pipeline, problem-based learning (PBL).
Software simulation has been the predominant method for architects to evaluate microprocessor research proposals. There are three tenets in modeling new designs with software models: simulation speed, model accuracy and model completeness. The increasing complexity of the processor and accelerated trend to have multiple processors on a chip are putting burden on simulators to achieve all tenets mentioned, including accurately capturing OS effects. In this work we perform preliminary experimentation/prototyping with an emulation system which overcomes the tension to satisfy all three requirements. The system is an original Socket-7 based desktop processor system with typical hardware peripherals running modern operating systems such as Fedora Core 4 and Windows XP; however we have inserted a Xilinx Virtex-4 in place of the processor that should sit in the motherboard and have used the Virtex-4 to host a complete version of the Pentium r 1 microprocessor (which consumes less than half its resources). We can therefore apply architectural changes to the processor and evaluate their effects on the complete desktop system. We use this FPGA-based emulation system to conduct preliminary architectural experiments including growing the branch target buffer and the level 1 caches. In addition, we experimented with interfacing hardware accelerators such as DES and AES engines which resulted in 27x speedups.
This article presents our cost-effective curriculum on embedded systems. Education on embedded systems requires coverage of both hardware and software aspects of the systems. Our curriculum uses one monolithic environment, virtual platform, to introduce all the layers of the system components (i.e., from hardware to operating systems to user applications). It is cost-effective since a hardware system is replaced by a virtual platform. Correspondingly, hardware boards and a lab space are not required. Yet, students are able to make modifications easily on hardware and software components of interest, fully exercising the system. Students responded to the course survey that they are knowledgeable on how embedded systems work after taking the course. Especially, students responded that the virtual platform is effective to use, in place of a hardware platform to learn embedded systems. The course materials are available to the public from a website at Korea University.
The parallel learning in neural networks can greatly shorten the training time. Its prior efforts were mostly limited to distributing inputs to multiple computing engines. It is because the gradient descent algorithm in the neural network training is inherently sequential. This paper proposes a novel CNN parallel training method for image recognition. It overcomes the sequential property of the gradient descent and enables the parallel training with the speculative backpropagation. We found that the Softmax and ReLU outcomes in the forward propagation for the same labels are likely to be very similar. This characteristic makes it possible to perform the forward and backward propagation simultaneously. We implemented the proposed parallel model with CNNs in both software and hardware, and evaluated its performance. The parallel training reduces the training time by 34% in CIFAR-100 without the loss of the prediction accuracy compared to the sequential training. In many cases, it even improves the accuracy.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.