Jeffrey Helt scite author profile

Jeffrey Helt

5 Publications

18 Citation Statements Received

264 Citation Statements Given

How they've been cited: 148

How they cite others: 264

Affiliations

Princeton University

Publications

Learning to Associate Words and Images Using a Large-Scale Graph

Ya, Sun, Helt et al. 2017

We develop an approach for unsupervised learning of associations between co-occurring perceptual events using a large graph. We applied this approach to successfully solve the image captcha of China's railroad system. The approach is based on the principle of suspicious coincidence, originally proposed by Barlow [1], who argued that the brain builds a statistical model of the world by learning associations between events that repeatedly co-occur. In this particular problem, a user is presented with a deformed picture of a Chinese phrase and eight low-resolution images. They must quickly select the relevant images in order to purchase their train tickets. This problem presents several challenges: (1) the teaching labels for both the Chinese phrases and the images were not available for supervised learning, (2) no pre-trained deep convolutional neural networks are available for recognizing these Chinese phrases or the presented images, and (3) each captcha must be solved within a few seconds. We collected 2.6 million captchas, with 2.6 million deformed Chinese phrases and over 21 million images. From these data, we constructed an association graph, composed of over 6 million vertices, and linked these vertices based on co-occurrence information and feature similarity between pairs of images. We then trained a deep convolutional neural network to learn a projection of the Chinese phrases onto a 230-dimensional latent space. Using label propagation, we computed the likelihood of each of the eight images conditioned on the latent space projection of the deformed phrase for each captcha. The resulting system solved captchas with 77% accuracy in 2 seconds on average. Our work, in answering this practical challenge, illustrates the power of this class of unsupervised association learning techniques, which may be related to the brain's general strategy for associating language stimuli with visual objects on the principle of suspicious coincidence.
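The abstract mentions label propagation over an association graph without giving details. As a rough illustration only, the sketch below shows a generic form of label propagation on a tiny weighted graph: seed vertices keep fixed labels, and every other vertex repeatedly takes the weighted average of its neighbors' label scores. The function name, the toy graph, and the edge weights are all hypothetical and are not taken from the paper.

```python
def propagate_labels(edges, seeds, num_labels, iterations=50):
    """Generic label propagation sketch (not the paper's implementation).

    edges: dict mapping each vertex to a list of (neighbor, weight) pairs.
    seeds: dict mapping some vertices to a fixed label index.
    Returns a dict mapping each vertex to a list of per-label scores.
    """
    nodes = set(edges)
    for neighbors in edges.values():
        nodes.update(n for n, _ in neighbors)

    # Seeds start one-hot; all other vertices start with uniform scores.
    scores = {}
    for n in nodes:
        if n in seeds:
            scores[n] = [1.0 if k == seeds[n] else 0.0 for k in range(num_labels)]
        else:
            scores[n] = [1.0 / num_labels] * num_labels

    for _ in range(iterations):
        updated = {}
        for n in nodes:
            neighbors = edges.get(n, [])
            total = sum(w for _, w in neighbors)
            if n in seeds or total == 0:
                # Seed labels stay clamped; isolated vertices keep their scores.
                updated[n] = scores[n]
                continue
            updated[n] = [
                sum(w * scores[m][k] for m, w in neighbors) / total
                for k in range(num_labels)
            ]
        scores = updated
    return scores


# Hypothetical toy graph: two labeled phrases and two unlabeled images.
# Weights stand in for co-occurrence counts or feature similarity.
edges = {
    "phrase_a": [("img1", 3.0)],
    "phrase_b": [("img2", 3.0)],
    "img1": [("phrase_a", 3.0), ("img2", 1.0)],
    "img2": [("phrase_b", 3.0), ("img1", 1.0)],
}
seeds = {"phrase_a": 0, "phrase_b": 1}
scores = propagate_labels(edges, seeds, num_labels=2)
```

In this toy graph, `img1` ends up scored mostly toward the label of `phrase_a` because its strongest edge leads there, while the weaker image-to-image edge pulls it slightly toward the other label.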

Regular Sequential Serializability and Regular Sequential Consistency

Helt, Burke, Levy et al. 2021

Strictly serializable (linearizable) services appear to execute transactions (operations) sequentially, in an order consistent with real time. This restricts a transaction's (operation's) possible return values and in turn, simplifies application programming. In exchange, strictly serializable (linearizable) services perform worse than those with weaker consistency. Switching to such services, however, can break applications. This work introduces two new consistency models to ease this trade-off: regular sequential serializability (RSS) and regular sequential consistency (RSC). They are just as "strong" for applications; we prove any application invariant that holds when using a strictly serializable (linearizable) service also holds when using an RSS (RSC) service. Yet they are "weaker" for services; they allow new, better-performing designs. To demonstrate this, we design, implement, and evaluate variants of two systems, Spanner and Gryff, weakening their consistency to RSS and RSC, respectively. The new variants achieve better read-only transaction and read tail latency than their counterparts.

CCS Concepts: • Information systems → Parallel and distributed DBMSs; Distributed database transactions.

Sandpaper

Helt, Feng, Seshan et al. 2019

Modern content delivery networks (CDNs) allow their customers (i.e., operators of web services) to customize the processing of requests by uploading and executing code at the edges of the CDN's network. To achieve scale, CDNs have forgone heavyweight virtualization techniques. Instead, all requests often execute within the same OS or even process. However, performance interference may arise when these requests have differing demands on multiple system resources. In this paper, we study the sources of performance interference based on workloads from real customers, identify the lack of multi-resource fairness as the culprit, and show that existing schedulers available in commodity OSs are insufficient to enforce fairness between customers. We then design Sandpaper, a new and practical multi-resource request scheduler for mitigating performance interference in CDN edge environments. Sandpaper enforces fairness despite constraints, such as sitting within the application runtime and running atop the OS's underlying resource schedulers. By leveraging key insights about the differences between theoretical system models and real systems, Sandpaper bridges the trade-off between resource utilization and multi-resource fairness that plagues existing schedulers. We implement Sandpaper atop Varnish, an open-source CDN edge proxy, and show that it mitigates performance interference while maintaining high resource utilization and with little performance overhead.

Morty: Scaling Concurrency Control with Re-Execution

Burke, Suri-Payer, Helt et al. 2023

C5: Cloned Concurrency Control that Always Keeps Up

Helt, Sharma, Abadi et al. 2022 (preprint)

Asynchronously replicated primary-backup databases are commonly deployed to improve availability and offload read-only transactions. To both apply replicated writes from the primary and serve read-only transactions, the backups implement a cloned concurrency control protocol. The protocol ensures read-only transactions always return a snapshot of state that previously existed on the primary. This compels the backup to exactly copy the commit order resulting from the primary's concurrency control. Existing cloned concurrency control protocols guarantee this by limiting the backup's parallelism. As a result, the primary's concurrency control executes some workloads with more parallelism than these protocols. In this paper, we prove that this parallelism gap leads to unbounded replication lag, where writes can take arbitrarily long to replicate to the backup and which has led to catastrophic failures in production systems. We then design C5, the first cloned concurrency protocol to provide bounded replication lag. We implement two versions of C5: Our evaluation in MyRocks, a widely deployed database, demonstrates C5 provides bounded replication lag. Our evaluation in Cicada, a recent in-memory database, demonstrates C5 keeps up with even the fastest of primaries.
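To make the parallelism-gap argument concrete, here is a back-of-the-envelope sketch, not the paper's proof or model. It assumes a primary that commits writes at a constant rate and a backup that applies them at a lower constant rate; the function name and the rates are hypothetical. Under that assumption the backlog, and therefore the lag, grows with every second the workload continues.

```python
def replication_lag(primary_rate, backup_rate, seconds):
    """Toy model of replication lag (illustrative assumptions only).

    primary_rate: writes committed per second on the primary.
    backup_rate: writes the backup can apply per second.
    Returns the estimated lag in seconds at the end of each second,
    computed as the time the backup would need to drain its backlog.
    """
    backlog = 0.0
    lags = []
    for _ in range(seconds):
        backlog += primary_rate                      # new writes arrive
        backlog = max(0.0, backlog - backup_rate)    # backup applies what it can
        lags.append(backlog / backup_rate)
    return lags


# Hypothetical rates: the backup applies writes 20% slower than the primary.
slow_backup = replication_lag(primary_rate=1000, backup_rate=800, seconds=60)

# A backup that matches the primary's rate never falls behind in this model.
matched_backup = replication_lag(primary_rate=1000, backup_rate=1000, seconds=60)
```

With these made-up rates the slower backup falls a further quarter second behind for every second of load, so there is no bound on the lag as long as the workload persists, whereas the matched backup stays at zero.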
