2016 23rd International Conference on Pattern Recognition (ICPR)
DOI: 10.1109/icpr.2016.7900183
Faster training of very deep networks via p-norm gates

Abstract: A major contributing factor to the recent advances in deep neural networks is structural units that let sensory information and gradients propagate easily. Gating is one such structure, acting as a flow control. Gates are employed in many recent state-of-the-art recurrent models such as LSTM and GRU, and in feedforward models such as Residual Nets and Highway Networks. This enables learning in very deep networks with hundreds of layers and helps achieve record-breaking results in vision (e.g., ImageNet …
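The abstract is cut off before it defines the p-norm gate itself. As a rough illustration of what the title suggests, here is a minimal sketch assuming the gate generalizes the standard highway coupling t + c = 1 to t**p + c**p = 1; the function name, parameterization, and this exact constraint are assumptions, not the paper's confirmed formulation.

```python
import torch

def highway_pnorm(x, W_h, b_h, W_t, b_t, p=2.0):
    """One highway-style layer with an assumed p-norm gate.

    Standard highway units couple the transform gate t and the carry
    gate c by t + c = 1.  Under the assumed coupling t**p + c**p = 1,
    choosing p > 1 lets both gates stay large at the same time, so more
    signal and gradient can pass through each layer.
    """
    h = torch.tanh(x @ W_h + b_h)      # candidate transformation
    t = torch.sigmoid(x @ W_t + b_t)   # transform gate in (0, 1)
    c = (1.0 - t**p) ** (1.0 / p)      # carry gate from the p-norm constraint
    return t * h + c * x

# Toy check: with p = 1 this reduces to the standard highway layer.
d = 8
x = torch.randn(1, d)
W_h, W_t = torch.randn(d, d) / d**0.5, torch.randn(d, d) / d**0.5
b = torch.zeros(d)
print(highway_pnorm(x, W_h, b, W_t, b, p=2.0).shape)  # torch.Size([1, 8])
```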

Cited by 15 publications (16 citation statements) · References 17 publications
“…Figure 2 shows the Long-Deep Recurrent Neural Network (LD-RNN) that we have designed for the story point prediction system. It is composed of four components arranged sequentially: (i) word embedding, (ii) document representation using Long Short-Term Memory (LSTM) [34], (iii) deep representation using Recurrent Highway Net (RHWN) [35]; and (iv) differentiable regression. Given a document which consists of a sequence of words s = (w_1, w_2, ..., w_n), e.g.…”
Section: Approach (mentioning, confidence: 99%)
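The quote lists the four LD-RNN components in order. Below is a minimal PyTorch-style sketch of that pipeline; the class name LDRNN, the hidden size, and the simple feedforward stand-in for the RHWN block (a shared-weight highway sketch appears after a later quote) are illustrative assumptions, not the cited paper's exact design.

```python
import torch
import torch.nn as nn

class LDRNN(nn.Module):
    """Sketch of the four LD-RNN components, in the order quoted."""
    def __init__(self, vocab_size, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d)      # (i) word embedding
        self.lstm = nn.LSTM(d, d, batch_first=True)   # (ii) document representation
        self.deep = nn.Sequential(                    # (iii) stand-in for the RHWN
            nn.Linear(d, d), nn.Tanh(),               #      (see the shared-weight
            nn.Linear(d, d), nn.Tanh(),               #      highway sketch below)
        )
        self.head = nn.Linear(d, 1)                   # (iv) differentiable regression

    def forward(self, tokens):                        # tokens: (batch, seq) word ids
        emb = self.embed(tokens)                      # s = (w_1, ..., w_n) -> vectors
        _, (h_n, _) = self.lstm(emb)                  # last hidden state = doc vector
        return self.head(self.deep(h_n[-1]))         # scalar story-point estimate

model = LDRNN(vocab_size=1000)
print(model(torch.randint(0, 1000, (2, 30))).shape)  # torch.Size([2, 1])
```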
“…This gating scheme is highly effective: while traditional deep neural nets cannot go beyond several layers, the Highway Net can have up to a thousand layers [41]. In previous work [35] we found that the operation in Eq. (2) can be repeated multiple times with exactly the same set of parameters.…”
Section: B. Deep Representation Using Recurrent Highway Network (mentioning, confidence: 99%)
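The quote's key point is that the highway update can be iterated with one shared parameter set, so representational depth grows without growing the parameter count. A minimal sketch of that weight-tying idea follows; the module name, gate parameterization, and step count are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RecurrentHighway(nn.Module):
    """Highway update applied `steps` times with a single shared weight set."""
    def __init__(self, d, steps=4):
        super().__init__()
        self.lin_h = nn.Linear(d, d)   # candidate transformation
        self.lin_t = nn.Linear(d, d)   # transform gate
        self.steps = steps

    def forward(self, x):
        for _ in range(self.steps):    # exactly the same parameters every step
            h = torch.tanh(self.lin_h(x))
            t = torch.sigmoid(self.lin_t(x))
            x = t * h + (1.0 - t) * x  # gated highway update (cf. the quoted Eq. (2))
        return x

layer = RecurrentHighway(d=16, steps=8)
print(sum(p.numel() for p in layer.parameters()))  # 544, independent of steps
```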
“…Their model is composed of four components: (1) Word Embedding, (2) Document representation using Long-Short Term Memory (LSTM) [22], [23], (3) Deep representation using Recurrent Highway Network (RHWN) [24], and (4) Differentiable Regression.…”
Section: Deep-SE (mentioning, confidence: 99%)
“…In case of recurrent nets, the query information is also propagated through the internal state of the controller. For simplicity, in this paper, we implement the controller and the memory updates using skip-connections [33], [34]…”
Section: Recurrent Skip-connections (mentioning, confidence: 99%)
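The quote says the controller and memory updates are implemented with skip-connections, which let the previous state pass directly to the next step and ease gradient flow. The sketch below shows one generic way to wrap a recurrent update in a skip-connection; the GRU cell, class name, and sizes are assumptions, not the cited papers' actual implementation.

```python
import torch
import torch.nn as nn

class SkipGRUCell(nn.Module):
    """Recurrent controller step with an additive skip-connection."""
    def __init__(self, d_in, d_state):
        super().__init__()
        self.cell = nn.GRUCell(d_in, d_state)

    def forward(self, x, state):
        # The skip term carries the previous state (and hence the query
        # information propagated through it) straight to the next step.
        return state + self.cell(x, state)

cell = SkipGRUCell(8, 16)
state = torch.zeros(1, 16)
for _ in range(5):                     # unrolled controller steps
    state = cell(torch.randn(1, 8), state)
print(state.shape)  # torch.Size([1, 16])
```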