A parallel program needs to manage the trade-off between the time spent in synchronisation and in computation. This trade-off is significantly affected by the degree of parallelism: a high degree of parallelism may decrease computing time while increasing synchronisation cost. Furthermore, thread placement on processor cores may impact program performance, as data access time can vary from one core to another due to intricacies of the underlying memory architecture. Unfortunately, there is no universal rule for deciding the degree of thread parallelism and its mapping to cores offline, especially for a program whose behaviour varies at runtime; moreover, offline tuning is inherently less precise. We present our work on dynamic control of thread parallelism and mapping. We address concurrency issues via Software Transactional Memory (STM), which replaces locks and handles synchronisation through transactions. Autonomic computing offers designers a framework of methods and techniques for building autonomic systems with well-mastered behaviours. Its key idea is to implement feedback control loops to design safe, efficient, and predictable controllers, which monitor and adjust the controlled system dynamically while keeping overhead low. We implement feedback control loops to automate the management of threads and reduce program execution time.
KEYWORDS
autonomic computing, feedback control, parallelism, synchronisation, thread mapping, transactional memory
INTRODUCTION
Multi-core processors accelerate computation through high thread parallelism (the number of simultaneously active threads). A program for multi-cores executes in parallel and needs to scale as the number of cores increases. However, writing a parallel application is difficult: parallel programming encompasses all the difficulties of sequential programming and introduces the additional problem of coordinating interactions among concurrently executing tasks [1]. High thread parallelism may shorten execution time, but it may also increase synchronisation time.

Multi-core processors incorporate complex memory hierarchies, consisting of several levels of cache, to alleviate the penalty of accessing main memory. Consequently, parallel applications need to evolve to exploit the potential of their underlying architecture efficiently. Depending on the cache level where data are placed, access latency differs from one core to another. To reduce this latency, threads can be pinned to specific cores to improve their usage of resources such as caches, main memory, and interconnections.

The conventional way to address synchronisation is via locks. However, locks are notorious for issues such as deadlocks and vulnerability to thread failures [2,3]. Furthermore, it is not straightforward to analyse interactions among concurrent operations. Transactional memory (TM) has emerged as an alternative parallel programming technique that handles synchronisation through transactions rather than locks [4]. Access to shared data is enclosed in transactions that are speculatively executed without b...