In this paper, we describe the design and implementation of an integrated architecture for cache systems that scale to hundreds or thousands of caches with thousands to millions of users. Rather than simply try to maximize hit rates, we take an end-to-end approach to improving response time by also considering hit times and miss times. We begin by studying several Internet caches and workloads, and we derive three core design principles for large scale distributed caches: (1) minimize the number of hops to locate and access data on both hits and misses, (2) share data among many users and scale to many caches, and (3) cache data close to clients. Our strategies for addressing these issues are built around a scalable, high-performance data-location service that tracks where objects are replicated. We describe how to construct such a service and how to use this service to provide direct access to remote data and push-based data replication. We evaluate our system through trace-driven simulation and find that these strategies together provide response time speedups of 1.27 to 2.43 compared to a traditional three-level cache hierarchy for a range of trace workloads and simulated environments.
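The abstract's central mechanism is a data-location service that tracks where objects are replicated, so that a miss at one cache can be redirected to a nearby replica in one hop rather than climbing a hierarchy. A minimal sketch of such a directory is below; the names (`LocationService`, `register`, `lookup`) and the caller-supplied distance hint are illustrative assumptions, not the paper's actual interface.

```python
# Sketch of a directory-style data-location service (illustrative only):
# it maps each object to the set of caches holding a replica, so a local
# miss can be forwarded directly to the closest replica.

class LocationService:
    def __init__(self):
        self.replicas = {}  # object id -> set of cache ids holding it

    def register(self, obj, cache):
        """Record that `cache` now holds a replica of `obj` (e.g. after a fetch)."""
        self.replicas.setdefault(obj, set()).add(cache)

    def evict(self, obj, cache):
        """Record that `cache` dropped its replica of `obj`."""
        self.replicas.get(obj, set()).discard(cache)

    def lookup(self, obj, distance):
        """Return the closest cache holding `obj`, or None on a global miss.
        `distance(cache)` is a caller-supplied cost hint, e.g. network hops."""
        holders = self.replicas.get(obj)
        if not holders:
            return None
        return min(holders, key=distance)
```

A hit in the directory yields a single forwarding step to a replica close to the client; a miss (`None`) falls through to the origin server, which keeps the hop count low on both paths.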
This paper presents detailed measurements of processing overheads for the Ultrix 4.2a implementation of TCP/IP network software running on a DECstation 5000/200. The performance results were used to uncover throughput and latency bottlenecks. We present a scheme for improving throughput when sending large messages by avoiding most checksum computations in a relatively safe manner. We also show that for the implementation we studied, reducing latency (when sending small messages) is a more difficult problem because processing overheads are spread over many operations; gaining significant savings would require the optimization of many different mechanisms. This is especially important because, when processing a realistic workload, we have found that non-data-touching operations consume more time in aggregate than data-touching operations.

Introduction

We analyze TCP/IP [30] and UDP/IP [29] processing overheads given a real workload on a DECstation 5000/200 running Ultrix 4.2a, and we use this information to guide our development of new optimizations. The cost of various processing overheads depends on message size; consequently, our optimizations take into account the message size distributions derived from the network traffic in our environment (which is not atypical of many academic and office environments). In our analysis of the TCP/IP and UDP/IP LAN and WAN traffic we were able to collect, we find that message sizes are far from uniformly distributed; rather, most messages are either very small or very large. Small messages usually carry control information, whereas large messages typically carry bulk data. Different kinds of optimizations can improve processing speed for each type of traffic; in this paper we discuss both. Typical processing time breakdowns for short (i.e., 64-128 byte) control messages fundamentally differ from those of long, multiple-kilobyte data messages.
The processing time of large messages is dominated by data-touching operations such as copying and computing checksums [4, 8-11, 15, 24, 36] because these operations must be applied to each byte. Small messages, however, have few bytes of data, so their processing time is dominated by non-data-touching operations. To optimize processing of large messages, we describe a checksum redundancy avoidance algorithm that eliminates most checksum processing without sacrificing reliability. Since checksum processing alone consumes nearly half the total processing time of large messages, this optimization improves throughput considerably. On both the LAN and WAN we studied, both of which are typical Unix-networking environments, small messages far outnumber large messages. In fact, even though processing a large message requires more time, the large proportion of small messages causes the cumulative non-data-touching processing time to exceed the cumulative data-touching processing time. We show that it would be difficult to significantly reduce the average processing time of non-data-touching overheads, at lea...
We present detailed measurements of various processing overheads of the TCP/IP and UDP/IP protocol stacks on a DECstation 5000/200 running the Ultrix 4.2a operating system. These overheads include data-touching operations, such as the checksum computation and data movement, which are well known to be major time consumers. In this study, we also considered overheads due to non-data-touching operations, such as network buffer manipulation, protocol-specific processing, operating system functions, data structure manipulations (other than network buffers), and error checking. We show that when one considers realistic message size distributions, where the majority of messages are small, the cumulative time consumed by the non-data-touching overheads represents the majority of processing time. We assert that it will be difficult to significantly reduce the cumulative processing time due to non-data-touching overheads.
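The data-touching cost discussed above is easy to see in the standard 16-bit ones'-complement Internet checksum (RFC 1071), which must visit every byte of a message: its cost grows linearly with message size, which is why it dominates large-message processing and why avoiding redundant recomputation pays off. A straightforward reference implementation:

```python
# Reference implementation of the 16-bit ones'-complement Internet
# checksum (RFC 1071). Every byte is read once, so the cost is linear
# in message size -- the per-byte "data-touching" overhead measured
# in the paper.

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]      # add 16-bit word
        total = (total & 0xFFFF) + (total >> 16)   # fold carry back in
    return ~total & 0xFFFF                         # ones'-complement of the sum
```

For example, on the sample data from RFC 1071 (bytes `00 01 f2 03 f4 f5 f6 f7`), the folded sum is `0xddf2`, so the transmitted checksum is its complement, `0x220d`.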
Source-based dithering is a set of techniques designed to maximize the performance of real-time networked digital video systems that encode and decode video entirely in software. Frame grabber hardware usually presents frames in a 24 bit per pixel (bpp) format. However, most hosts are equipped only with single or eight bit deep displays, so the color depth of the video must be reduced at some point. If the encoder reduces the color depth, the bandwidth required to carry the video on the network is lowered by a factor of 24 or 3, respectively, and the computational load on the receiving hosts is lightened. The color depth reduction algorithm must be efficient, since the resulting frame rate, and thus the degree to which the illusion of motion is preserved, depends on how quickly a pixel can be processed. We use dithering algorithms chosen for efficiency and a contrast enhancement algorithm to improve image quality.

INTRODUCTION

Most modern real-time networked digital video systems either are completely unable to keep up with real-time video or require expensive special-purpose hardware in each video participant; thus, performance is very important to such systems. We discuss a series of techniques, collectively called source-based dithering, designed to improve the performance of, and minimize the loss of video quality in, real-time networked digital video systems that encode and decode video in software. Frame grabber hardware usually presents individual images to a CPU in a relatively "deep" format such as "true color." The true color format typically requires at least 24 bits to represent each pixel: 8 bits for each of the red, green, and blue components. True color display hardware is expensive, so most hosts are equipped only to display images in an eight bit deep color format or a single bit deep monochrome format.
The depth of a 24 bpp image must be reduced to either 8 bpp or 1 bpp for it to be displayed on such machines. The pixel depth could in theory be reduced at either the source or destination hosts. Depth reduction at the source carries two advantages: the bandwidth required to carry the video on the network is reduced, and the processing burden on receiving hosts is reduced. The source host is somewhat compensated for the work of dithering by a reduction in the volume of network output processing. Depending on whether the video is reduced to 8 bpp or to 1 bpp, this technique reduces bandwidth by a factor of either 3 or 24, respectively.

It is important that the color depth reduction algorithm be efficient because the resulting frame rate, and thus the degree to which the illusion of motion is preserved, depends on how quickly a pixel can be processed. We therefore use dithering algorithms chosen for efficiency. Dithering is a technique for reducing color depth by placing a combination of pixels with different colors within a small neighborhood so that, from a distance, the combination looks like the original color. We use a contrast enhancement algorithm to enhance the resulting picture.
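One classic efficiency-oriented dithering choice is ordered (Bayer) dithering, where each pixel needs only a single comparison against a precomputed threshold and no error is propagated between pixels. The sketch below reduces an 8-bit grayscale image to 1 bpp with a 4x4 Bayer matrix; it is an illustrative assumption of the kind of fast algorithm the paper describes, not the paper's exact algorithm.

```python
# Ordered-dither sketch (illustrative): reduce 8-bit grayscale to 1 bpp
# using a 4x4 Bayer matrix. Per-pixel work is one table lookup and one
# comparison, which keeps the cost of depth reduction low.

BAYER4 = [
    [ 0,  8,  2, 10],
    [12,  4, 14,  6],
    [ 3, 11,  1,  9],
    [15,  7, 13,  5],
]
# Scale the 16 matrix entries to thresholds spread over 0..255.
THRESH = [[(v + 0.5) * 256 / 16 for v in row] for row in BAYER4]

def dither_1bpp(gray):
    """gray: 2-D list of 8-bit intensities; returns a 2-D list of 0/1 pixels."""
    return [
        [1 if gray[y][x] > THRESH[y % 4][x % 4] else 0
         for x in range(len(gray[y]))]
        for y in range(len(gray))
    ]
```

Because the thresholds are spread evenly, a uniform mid-gray region comes out roughly half white and half black, preserving the apparent intensity from a distance while touching each pixel only once.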