The use and location of memory in integrated circuits plays a key factor in their performance. Memory requires large physical area, access times limit overall system performance and connectivity can result in large fan-out. Modern FPGA systems and ASICs contain an area of memory used to set the operation of the device from a series of commands set by a host. Implementing these settings registers requires a level of care otherwise the resulting implementation can result in a number of large fan-out nets that consume valuable resources complicating the placement of timing critical pathways. This paper presents an architecture for implementing and programming these settings registers in a distributed method across an FPGA and how the presented architecture works in both clock-domain crossing and dynamic partial re-configuration applications. The design is compared to that of a 'global' settings register architecture. We implement the architectures using Intel FPGAs Quartus Prime software targeting an Intel FPGA Cyclone V. It is shown that the distributed memory architecture has a smaller resource cost (as small as 25% of the ALMs and 20% of the registers) compared to the global memory architectures.