Over the past decade, emoji have emerged as a new and widespread form of digital communication, spanning diverse social networks and spoken languages. We propose to treat these ideograms as a new modality in their own right, distinct in their semantic structure from both the text in which they are often embedded and the images which they resemble. As a new modality, emoji present rich novel possibilities for representation and interaction. In this paper, we explore the challenges that arise naturally from considering the emoji modality through the lens of multimedia research; specifically, the ways in which emoji can be related to other common modalities such as text and images. To do so, we first present a large-scale dataset of real-world emoji usage collected from Twitter. This dataset contains examples of both text-emoji and image-emoji relationships. We present baseline results on the challenge of predicting emoji from both text and images, using state-of-the-art neural networks. Further, we offer a first consideration of the problem of how to account for new, unseen emoji, a relevant issue as the emoji vocabulary continues to expand on a yearly basis. Finally, we present results for multimedia retrieval using emoji as queries.
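As a minimal illustration of the text-to-emoji prediction task, the toy sketch below scores candidate emoji by word/emoji co-occurrence counts. The corpus, words, and emoji are invented for illustration and are unrelated to the Twitter dataset; the paper instead trains neural network models on real usage data.

```python
from collections import Counter, defaultdict

# Toy corpus of (text, emoji) pairs; hypothetical examples,
# not drawn from the paper's Twitter dataset.
corpus = [
    ("i love this so much", "❤️"),
    ("love you all", "❤️"),
    ("that joke was hilarious", "😂"),
    ("laughing so hard right now", "😂"),
    ("great win for the team tonight", "🔥"),
    ("what a fire performance", "🔥"),
]

# Count word/emoji co-occurrences over the corpus.
cooc = defaultdict(Counter)
for text, emoji in corpus:
    for word in text.split():
        cooc[word][emoji] += 1

def predict_emoji(text):
    """Score each emoji by summed word co-occurrence counts."""
    scores = Counter()
    for word in text.split():
        scores.update(cooc.get(word, Counter()))
    return scores.most_common(1)[0][0] if scores else None

print(predict_emoji("love this team"))
```

Even this crude count-based baseline illustrates why emoji prediction is framed as a multi-class classification problem over the text modality.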
The sparse coding hypothesis has generated significant interest in the computational and theoretical neuroscience communities, but there remain open questions about the exact quantitative form of the sparsity penalty and the implementation of such a coding rule in neurally plausible architectures. The main contribution of this work is to show that a wide variety of sparsity-based probabilistic inference problems proposed in the signal processing and statistics literatures can be implemented exactly in the common network architecture known as the locally competitive algorithm (LCA). Among the cost functions we examine are approximate ℓp norms (0 ≤ p ≤ 2), modified ℓp norms, block-ℓ1 norms, and reweighted algorithms. Of particular interest is that we show significantly increased performance in reweighted ℓ1 algorithms by inferring all parameters jointly in a dynamical system rather than using an iterative approach native to digital computational architectures.
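The ℓ1 special case of the LCA can be sketched in a few lines: each unit integrates a feedforward drive Φᵀs, is inhibited by other units through ΦᵀΦ − I, and passes its internal state through a soft-threshold nonlinearity. The problem sizes, step size, and threshold below are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Random dictionary Phi and a sparse ground-truth code (illustrative sizes).
n, m = 20, 50
Phi = rng.normal(size=(n, m)) / np.sqrt(n)
x_true = np.zeros(m)
x_true[[3, 17, 31]] = [1.0, -0.8, 0.5]
s = Phi @ x_true  # observed signal

lam, tau, dt, steps = 0.05, 1.0, 0.01, 5000

def soft_threshold(u, lam):
    # Thresholding nonlinearity corresponding to the l1 penalty.
    return np.sign(u) * np.maximum(np.abs(u) - lam, 0.0)

b = Phi.T @ s                # feedforward drive
G = Phi.T @ Phi - np.eye(m)  # lateral inhibition weights

# Euler integration of the LCA membrane dynamics:
#   tau * du/dt = b - u - G @ a,  with  a = T_lambda(u)
u = np.zeros(m)
for _ in range(steps):
    a = soft_threshold(u, lam)
    u += (dt / tau) * (b - u - G @ a)

a = soft_threshold(u, lam)
print("active units:", np.count_nonzero(np.abs(a) > 1e-3))
```

Swapping `soft_threshold` for a different thresholding function is exactly the mechanism by which the other penalties discussed (modified ℓp, block-ℓ1, reweighted schemes) map onto the same network architecture.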
Computer vision and human-powered services can provide blind people access to visual information in the world around them, but their efficacy is dependent on high-quality photo inputs. Blind people often have difficulty capturing the information necessary for these applications to work because they cannot see what they are taking a picture of. In this paper, we present Scan Search, a mobile application that offers a new way for blind people to take high-quality photos to support recognition tasks. To support real-time scanning of objects, we developed a key frame extraction algorithm that automatically retrieves high-quality frames from the continuous camera video stream of a mobile phone. Those key frames are streamed to a cloud-based recognition engine that identifies the most significant object in the picture. This way, blind users can scan for objects of interest and hear potential results in real time. We also present a study exploring the trade-offs in how many photos are sent, and conduct a user study with 8 blind participants that compares Scan Search with a standard photo-snapping interface. Our results show that Scan Search allows users to capture objects of interest more efficiently and is preferred by users to the standard interface.
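A common way to score frame quality is the variance of a Laplacian filter response, since sharper frames carry more high-frequency energy. The sketch below uses that heuristic to keep one key frame per window of a synthetic "video"; it is an assumption-laden stand-in for the general idea, not the paper's actual key frame extraction algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)

def sharpness(frame):
    """Variance of a discrete Laplacian response: a standard focus measure."""
    lap = (-4 * frame
           + np.roll(frame, 1, 0) + np.roll(frame, -1, 0)
           + np.roll(frame, 1, 1) + np.roll(frame, -1, 1))
    return lap.var()

def key_frames(frames, window=5):
    """Keep the index of the sharpest frame in each window of frames."""
    keys = []
    for i in range(0, len(frames), window):
        chunk = frames[i:i + window]
        keys.append(i + max(range(len(chunk)),
                            key=lambda j: sharpness(chunk[j])))
    return keys

def blur(img, k):
    # Repeated 5-point averaging to simulate motion/defocus blur.
    out = img
    for _ in range(k):
        out = (out + np.roll(out, 1, 0) + np.roll(out, -1, 0)
               + np.roll(out, 1, 1) + np.roll(out, -1, 1)) / 5
    return out

# Synthetic stream: mostly blurred frames, one sharp frame per window.
base = rng.normal(size=(32, 32))
frames = [blur(base, 5) for _ in range(10)]
frames[2] = base
frames[7] = base
print(key_frames(frames))  # indices of the sharpest frame per window
```

Only the selected key frames would then be uploaded to a recognition service, which is the bandwidth/accuracy trade-off the photo-count study examines.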
Non-smooth convex optimization programs such as L1 minimization produce state-of-the-art results in many signal and image processing applications. Despite the progress in algorithms to solve these programs, they are still too computationally expensive for many real-time applications. Using recent results describing dynamical systems that efficiently solve these types of programs, we demonstrate through simulation that custom analog IC implementations of this system could potentially perform compressed sensing recovery for real-time applications at rates approaching 500 kHz. Furthermore, we show that this architecture can implement several other optimization programs of recent interest, including Smoothly Clipped Absolute Deviations and group L1 minimization.
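As one example of the penalties mentioned, the Smoothly Clipped Absolute Deviations (SCAD) penalty of Fan and Li (2001) has a thresholding operator that acts like soft-thresholding near zero but leaves large coefficients unshrunk. The sketch below is a plain NumPy implementation of that operator, not the analog IC architecture itself; the default a = 3.7 is the value commonly suggested in the statistics literature.

```python
import numpy as np

def scad_threshold(u, lam, a=3.7):
    """Thresholding operator for the SCAD penalty (Fan & Li, 2001):
    soft-thresholding for |u| <= 2*lam, identity for |u| > a*lam,
    and a linear interpolation in between, so large coefficients
    are not biased toward zero."""
    u = np.asarray(u, dtype=float)
    absu = np.abs(u)
    soft = np.sign(u) * np.maximum(absu - lam, 0.0)
    mid = ((a - 1) * u - np.sign(u) * a * lam) / (a - 2)
    return np.where(absu <= 2 * lam, soft,
           np.where(absu <= a * lam, mid, u))

print(scad_threshold([0.5, 2.5, 10.0], lam=1.0))
```

In the dynamical-system view, implementing SCAD amounts to replacing the soft-threshold nonlinearity with this piecewise function while keeping the rest of the network unchanged.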