System bottlenecks, i.e., resources subject to high contention, constrain system performance. Effective resource management should therefore focus on the bottleneck resources and allocate them to the most deserving clients. It has been shown that for any combination of entitlements and requests, a fair allocation of bottleneck resources can be found using an off-line algorithm that is given full information in advance regarding each client's needs. We extend this result to the on-line case with no prior information. To this end we introduce a simple greedy algorithm. In essence, when a scheduling decision needs to be made, this algorithm selects the client that has the largest minimal gap between its entitlement and its current allocation among all the bottleneck resources. Importantly, this algorithm takes a global view of the system, and assigns each client a single priority based on its usage of all the resources; this single priority is then used to make coordinated scheduling decisions on all the resources. Extensive simulations show that this algorithm achieves fair allocations according to the desired entitlements under a wide range of conditions, without using any prior information regarding resource requirements. It also tracks shifting usage patterns, including situations where the bottlenecks change over time.
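The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the data layout, function names, and the normalization of allocations to fractional shares are all assumptions made for the example.

```python
# Hedged sketch of the greedy rule: pick the client whose smallest
# (entitlement - current allocation) gap, taken over all bottleneck
# resources, is largest. Shares are assumed normalized to sum to 1.0
# per resource; all names here are illustrative.

def pick_client(entitlements, usage, bottlenecks):
    """Return the client with the largest minimal entitlement-usage gap."""
    def min_gap(client):
        # The gap on the resource where this client is worst off.
        return min(entitlements[client][r] - usage[client][r]
                   for r in bottlenecks)
    # The client furthest below its entitlement on its worst resource wins.
    return max(entitlements, key=min_gap)

# Two clients, two bottleneck resources.
entitlements = {"A": {"cpu": 0.6, "net": 0.6},
                "B": {"cpu": 0.4, "net": 0.4}}
usage        = {"A": {"cpu": 0.5, "net": 0.7},
                "B": {"cpu": 0.1, "net": 0.2}}

print(pick_client(entitlements, usage, ["cpu", "net"]))  # prints "B"
```

Client A's minimal gap is min(0.1, -0.1) = -0.1 (it is already over-served on the network), while B's is min(0.3, 0.2) = 0.2, so B is scheduled next; the same single priority governs all resources at once, which is the coordinated, global view the abstract emphasizes.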
Huge pretrained language models (LMs) have demonstrated surprisingly good zero-shot capabilities on a wide variety of tasks. This gives rise to the appealing vision of a single, versatile model with a wide range of functionalities across disparate applications. However, current leading techniques for leveraging a "frozen" LM, i.e., leaving its weights untouched, still often underperform fine-tuning approaches, which modify these weights in a task-dependent way. The latter, in turn, suffer from forgetting and compromise versatility, suggesting a tradeoff between performance and versatility. The main message of this paper is that current frozen-model techniques such as prompt tuning are only the tip of the iceberg, and that more powerful methods for leveraging frozen LMs can do just as well as fine-tuning in challenging domains without sacrificing the underlying model's versatility. To demonstrate this, we introduce three novel methods for leveraging frozen models: input-dependent prompt tuning, frozen readers, and recursive LMs, each of which vastly improves on current frozen-model approaches. Indeed, some of our methods even outperform fine-tuning approaches in domains currently dominated by the latter. The computational cost of each method is higher than that of existing frozen-model methods, but still negligible relative to a single pass through a huge frozen LM. Each of these methods constitutes a meaningful contribution in its own right, but by presenting these contributions together we aim to convince the reader of a broader message that goes beyond the details of any given method: that frozen models have untapped potential and that fine-tuning is often unnecessary.