Microprocessors are today among the most complex electronic systems ever
designed. A single small silicon chip can contain a complete processor, a
large memory, and the logic needed to connect it to input-output devices.
The performance of today's processors implemented on a single chip
surpasses the performance of a room-sized supercomputer from just 50 years
ago, which cost over $10 million [1]. Even the embedded processors found in
everyday devices such as mobile phones are far more powerful than computer
developers once imagined. The main components of a modern microprocessor are
a number of general-purpose cores, a graphics processing unit, a shared
cache, memory and input-output interfaces, and a network-on-chip that
interconnects all these components [2]. The speed of the microprocessor is
determined by its clock frequency, which cannot be raised beyond a certain
limit: as the frequency increases, power dissipation increases as well, and
the resulting heat becomes critical. Chip manufacturers therefore turned to
a new processor architecture, the multicore processor [3]. To increase
performance and efficiency, the multiple cores execute multiple instructions
simultaneously, which raises the degree of parallelism [4]. Despite these
advantages, numerous challenges must be addressed carefully when more cores
and more parallelism are used. This paper presents a review of
microprocessor microarchitectures, discussing their generations over the
past 50 years. It then describes the microarchitecture implementations
currently used in modern microprocessors, pointing out the specifics of
parallel computing in heterogeneous microprocessor systems. To exploit the
potential of multicore technology efficiently, software applications must be
multithreaded: program execution must be distributed among the cores so that
they can operate simultaneously. To use multithreading, the programmer must
understand the basic principles of parallel computing and parallel hardware.
Finally, the paper provides details on how to implement hardware parallelism
in multicore systems.
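
As a minimal sketch of what distributing program execution among cores can
look like in software, the following Python example splits a computation
into independent chunks and hands each chunk to a worker thread. It is an
illustration under stated assumptions, not the method of this paper: the
function names (split_into_chunks, sum_of_squares, parallel_sum_of_squares)
and the sum-of-squares workload are hypothetical and chosen only for
brevity.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data, n_chunks):
    """Split data into at most n_chunks contiguous slices of near-equal size."""
    size, extra = divmod(len(data), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks receive one additional element.
        end = start + size + (1 if i < extra else 0)
        if end > start:  # skip empty slices when data is shorter than n_chunks
            chunks.append(data[start:end])
        start = end
    return chunks


def sum_of_squares(chunk):
    """Work performed independently by each thread on its own slice."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(data, n_workers=None):
    """Distribute the computation over a pool of worker threads.

    Assumption: one worker per logical CPU reported by the operating system
    is a reasonable default; real applications may tune this.
    """
    n_workers = n_workers or os.cpu_count() or 1
    chunks = split_into_chunks(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum_of_squares, chunks))
    return sum(partial_sums)  # combine the per-thread partial results
```

For example, parallel_sum_of_squares(list(range(1, 101)), n_workers=4)
returns the same value as the sequential sum of squares. Note that in the
standard CPython build, threads running pure-Python CPU-bound code are
serialized by the global interpreter lock, so this sketch shows the
decomposition pattern rather than a guaranteed speedup; ProcessPoolExecutor
from the same module offers the same map interface with separate processes.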