With the success of deep learning in a wide variety of areas, many deep multi-task learning (MTL) models have been proposed, claiming performance improvements obtained by sharing learned structure across several related tasks. However, the dynamics of multi-task learning in deep neural networks are still not well understood at either the theoretical or the experimental level. In particular, the usefulness of different task pairs is not known a priori. Practically, this means that properly combining the losses of different tasks becomes a critical issue in multi-task learning, as different methods may yield different results. In this paper, we benchmarked different multi-task learning approaches using a shared-trunk architecture with task-specific branches across three different MTL datasets. For the first dataset, i.e., Multi-MNIST (Modified National Institute of Standards and Technology database), we thoroughly tested several weighting strategies, including simply adding task-specific cost functions together, dynamic weight average (DWA), and uncertainty weighting, each with various amounts of training data per task. We find that multi-task learning typically does not improve performance for a user-defined combination of tasks. Further experiments on diverse tasks, network architectures, and datasets suggested that multi-task learning requires careful selection of both task pairs and weighting strategies to equal or exceed the performance of single-task learning.
INDEX TERMS Dynamic weighting average, multi-MNIST, multi-objective optimization, multi-task learning, uncertainty weighting.
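The three weighting strategies named in the abstract above can be illustrated with a minimal sketch. The abstract gives no formulas, so the ones below are assumptions taken from the commonly cited formulations: DWA weights each task by a softmax over the ratio of its two most recent losses, and uncertainty weighting scales each loss by a log-variance term that would normally be learned. The function names, the temperature default, and the sample loss values are all hypothetical.

```python
import math


def sum_loss(task_losses):
    """Baseline: add the task-specific losses with equal weight."""
    return sum(task_losses)


def dwa_weights(prev_losses, prev_prev_losses, temperature=2.0):
    """Dynamic weight average (assumed formulation).

    Each task's weight grows with the ratio of its last two losses, so a
    task whose loss is falling more slowly receives more weight. The
    weights are normalized to sum to the number of tasks.
    """
    ratios = [l1 / l2 for l1, l2 in zip(prev_losses, prev_prev_losses)]
    exps = [math.exp(r / temperature) for r in ratios]
    total = sum(exps)
    return [len(ratios) * e / total for e in exps]


def uncertainty_weighted_loss(task_losses, log_vars):
    """Uncertainty weighting (assumed regression-style formulation).

    log_vars[i] stands for log(sigma_i^2); in a real model it would be a
    trainable parameter. Here it is a plain number for illustration.
    """
    return sum(0.5 * math.exp(-s) * loss + 0.5 * s
               for loss, s in zip(task_losses, log_vars))


# Hypothetical losses for two tasks over the two previous epochs.
weights = dwa_weights(prev_losses=[0.8, 0.5], prev_prev_losses=[1.0, 0.5])
current = [0.7, 0.5]
print("equal-weight loss:", sum_loss(current))
print("DWA weights:", weights)
print("DWA loss:", sum(w * l for w, l in zip(weights, current)))
print("uncertainty loss:", uncertainty_weighted_loss(current, [0.0, 0.0]))
```

In this example the second task's loss has stalled (ratio 1.0) while the first is still falling (ratio 0.8), so DWA gives the second task the larger weight.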
Process variations in advanced CMOS process nodes limit the benefits of scaling for analog designs. In the presence of increasing random intra-die variations, mismatch becomes a significant design challenge in circuits such as comparators. In this paper we describe and demonstrate the details of a statistical element selection (SES) methodology that relies on choosing a subset of selectable circuit elements (e.g., input transistors in a comparator) to achieve the desired specification (e.g., offset). Silicon results from a 65nm test chip demonstrate that SES can achieve an order of magnitude better matching than both redundancy and Pelgrom-model sizing given the same core circuit area.
I. INTRODUCTION
Continuous advancement of CMOS process technology over the past four decades has made inexpensive integrated circuit products with significant processing capabilities an everyday reality. Cost pressures have resulted in substantial integration of analog and digital blocks on the same die, forcing analog designers to adapt to processes that were built for digital systems [1,2]. As we rapidly approach the physical limits of scaling, one of the major challenges for analog circuits has been to ensure consistently high yield in the presence of increasing variability in these nanoscale CMOS processes.
In this paper we explore the benefits of a statistical element selection (SES) methodology for analog circuits that is based on post-manufacturing tuning to accommodate large-scale process variations [3,4]. SES exploits inherent random variations to improve the matching of transistors and to increase yield for matching-critical circuits such as comparators. A subset of k elements is selected among an identically laid out set of N elements to provide the best matching performance. As the number of available subsets among a set of N elements increases exponentially (2^N - 1), it is possible to achieve impressive matching performance with near-minimum size unit elements. The elements might be individual transistors, pairs of transistors, or passive components.
We present a general methodology to determine the appropriate (N, k) numbers and the size of the unit element to ensure that a desired matching specification is met. We present measurement results from a 65nm CMOS test chip using SES-based comparators to validate the model predictions. The generalized methodology can be used for a wide variety of analog circuits, such as current sources, differential amplifiers, and comparators, that rely on precise matching of components.
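The selection idea described above can be illustrated with a small Monte Carlo sketch. It is not the authors' methodology: it assumes each selectable element contributes an independent Gaussian offset, that the offset of a group of k elements is the average of their individual offsets, and that the best subset is found by exhaustive search. The names `best_subset` and `simulate` and all numeric values are hypothetical.

```python
import itertools
import random


def count_nonempty_subsets(n):
    """Number of non-empty subsets of n elements, i.e. 2^N - 1."""
    return 2 ** n - 1


def best_subset(offsets, k):
    """Exhaustively pick the k indices whose summed offset is closest to zero."""
    return min(itertools.combinations(range(len(offsets)), k),
               key=lambda idx: abs(sum(offsets[i] for i in idx)))


def simulate(n, k, sigma, trials, seed=0):
    """Compare the mean |offset| of a selected subset with a fixed subset.

    Assumption: element offsets are i.i.d. Gaussian with standard
    deviation sigma, and a group's offset is the average of its members.
    """
    rng = random.Random(seed)
    selected_total, fixed_total = 0.0, 0.0
    for _ in range(trials):
        offsets = [rng.gauss(0.0, sigma) for _ in range(n)]
        idx = best_subset(offsets, k)
        selected_total += abs(sum(offsets[i] for i in idx)) / k
        # Baseline without selection: always use the first k elements.
        fixed_total += abs(sum(offsets[:k])) / k
    return selected_total / trials, fixed_total / trials


selected, fixed = simulate(n=10, k=5, sigma=1.0, trials=200)
print("non-empty subsets of 10 elements:", count_nonempty_subsets(10))
print("mean |offset| with selection:   ", selected)
print("mean |offset| without selection:", fixed)
```

Because the fixed subset is always one of the candidates, the selected subset can never do worse in this model; how much better it does depends on N, k, and the offset distribution.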
Peripheral I/O data-rates for PCs and mobile computing platforms continue to scale to meet high-bandwidth applications including high-resolution displays and large-capacity external storage. The bandwidth requirements will soon exceed the data-rates of current standards such as PCI Express and USB. A low-power, low-cost serial link is needed for the next-generation peripheral interface that can scale to 32Gb/s per lane. Recent publications have demonstrated 28 to 32Gb/s rates [1][2]. However, the circuit power and channel characteristics are not suitable for mainstream PC and mobile markets. A low-profile connector and cable assembly prototype is developed for these markets, where the link architecture and design are optimized for the channel characteristics. This paper describes a data-rate-scalable 32Gb/s serial link that features a bidirectional transceiver, a source-series terminated (SST) 3-tap FFE, a continuous-time linear equalizer (CTLE) with an active inductor, a 6-tap DFE, and clock calibration and adaptation circuitry. Figure 26.2.1(a) illustrates the interconnect topology. The primary components are PCBs, packages, connectors and cable. The 8-lane cable assembly consists of consumer-grade 32AWG shielded twisted pair copper wires. The height and width of the connector are 3mm and 13mm, respectively. The bidirectional transceiver architecture is shown in Fig. 26.2.1(b). An LC-VCO-based PLL generates a quarter-rate clock that is distributed to each transceiver lane using regulated CMOS buffers similar to [3]. In order to save power, the transceivers are grouped into 4 lanes to form a bundle. The DLL and clock multiplier are common to the 4 lanes and are included in the bundle clock circuitry. The TX is based on a half-rate clock architecture, while the RX uses a quarter-rate clock architecture. After the DLL, the quarter-rate clock is multiplied to half-rate and distributed locally to 4 TXs. 
The output stage is a low-swing segmented SST driver using NMOS devices for channel termination. 3-tap TX pre-emphasis is implemented using switching devices to short differential outputs similar to [4]. The RX front-end consists of a CTLE and 6-tap DFE. The CTLE has an active inductor to provide up to 4dB of peaking while minimizing silicon area [5]. Other than the LC-VCO, no passive inductors are used for bandwidth extension or I/O pad-capacitance reduction. A quadrature clock generator (Q-Gen) provides I/Q clocks for the 4-way interleaved DFE. For channels with less than 12dB of loss and data rates up to 12Gb/s, a separate 2-way interleaved low-power (LP) latch with a 2× oversampled CDR is used, which reduces power consumption by allowing the CTLE and DFE to be disabled. There are two separate CDR logic blocks, one in the lane and one in the bundle, that independently control the quadrature phase interpolator (PI). The bundle CDR aggregates the phase-recovery information from all 4 lanes. To minimize the area and pad capacitance of the bidirectional transceiver, the SST TX can be configured to be RX termination as shown in Fig. 26.2.2. When functioning as an output...
This paper details the design of an 8-lane bidirectional link for both within-the-box and external communications in 22 nm CMOS technology. A low-profile connector with a high-density cable assembly ensures a data rate of up to 32 Gb/s per lane while maintaining channel loss below 25 dB. Channel equalization is performed by a combination of a 3-tap feed-forward equalizer (FFE), a single-stage continuous-time linear equalizer (CTLE), and a 6-tap decision-feedback equalizer (DFE). Collaborative timing recovery is used to enable lane characterization without degrading jitter performance. Phase error decimation, with a conditional phase detection scheme, is used to reduce the DFE complexity by 50%. Power consumption over a wide range of data rates from 4 to 32 Gb/s is reduced by using regulated CMOS clocking with lane bundling, a low-swing transmitter with a source-series terminated (SST) driver, and a highly reconfigurable receiver with an active inductor CTLE. At a lane data rate of 32 Gb/s, over a 0.5 m cable with 16 dB of loss, a transceiver lane consumes 205 mW from a 1.07 V supply. The power scales down to 26 mW from a 0.72 V supply at 8 Gb/s, when transmitting over a channel with 8 dB loss. The active silicon area per lane is 0.079 mm².
Index Terms: Active inductor CTLE, bidirectional link, collaborative CDR, conditional phase detection, decision-feedback equalizer (DFE), phase error decimation, regulated CMOS clocking, source-series terminated (SST) driver.
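The two power figures quoted above can be converted to energy per bit, a common way to compare link efficiency across data rates. The sketch below only does that arithmetic on the numbers stated in the abstract; the function name is hypothetical, and energy per bit is a derived quantity that the abstract itself does not report.

```python
def energy_per_bit_pj(power_mw, rate_gbps):
    """Energy per bit in pJ/bit.

    mW / (Gb/s) = 1e-3 W / 1e9 bit/s = 1e-12 J/bit, so the ratio of the
    two quoted units is already in picojoules per bit.
    """
    return power_mw / rate_gbps


# Operating points stated in the abstract: (power in mW, rate in Gb/s).
operating_points = {
    "32 Gb/s, 1.07 V, 16 dB channel": (205.0, 32.0),
    "8 Gb/s, 0.72 V, 8 dB channel": (26.0, 8.0),
}

for label, (power_mw, rate_gbps) in operating_points.items():
    print(f"{label}: {energy_per_bit_pj(power_mw, rate_gbps):.2f} pJ/bit")
```

This gives roughly 6.41 pJ/bit at 32 Gb/s and 3.25 pJ/bit at 8 Gb/s, so under these figures each bit costs about half as much energy at the lower rate and supply.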
Accent mismatch is a critical problem for end-to-end ASR. This paper aims to address this problem by building an accent-robust RNN-T system with domain adversarial training (DAT). We unveil the magic behind DAT and provide, for the first time, a theoretical guarantee that DAT learns accent-invariant representations. We also prove that performing the gradient reversal in DAT is equivalent to minimizing the Jensen-Shannon divergence between domain output distributions. Motivated by the proof of equivalence, we introduce reDAT, a novel technique based on DAT, which relabels data using either unsupervised clustering or soft labels. Experiments on 23K hours of multi-accent data show that DAT achieves results competitive with accent-specific baselines on both native and non-native English accents, and up to 13% relative WER reduction on unseen accents; our reDAT yields further relative improvements over DAT of 3% and 8% on non-native accents of American and British English.
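The abstract ties gradient reversal to the Jensen-Shannon (JS) divergence between domain output distributions. As a reminder of what that quantity measures, the sketch below computes the JS divergence between two discrete distributions using its standard definition. It does not reproduce the paper's proof or training setup, and the example distributions are made up for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for discrete distributions; terms with p_i = 0 are zero."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL of p and q to their mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical output distributions for two accent domains.
accent_a = [0.7, 0.2, 0.1]
accent_b = [0.1, 0.3, 0.6]

print("JS(a, b):", js_divergence(accent_a, accent_b))
print("JS(a, a):", js_divergence(accent_a, accent_a))
print("upper bound log 2:", math.log(2))
```

The divergence is zero when the two distributions coincide and reaches log 2 when they do not overlap at all, so driving it toward zero corresponds to making the domains indistinguishable.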