Ideally, self-assembly should rapidly and efficiently produce stable, correctly assembled structures. We study the tradeoff between enthalpic and entropic costs in self-assembling systems, using RecA-mediated homology search as an example. Earlier work suggested that RecA searches could produce stable final structures with high stringency using a slow testing process that follows an initial rapid search of ~9–15 bases. In this work, we show that, as a result of entropic and enthalpic barriers, simultaneously testing all ~9–15 bases as separate individual units results in a longer overall search time than testing them in groups and stages.
Microarray experiments on gene expression inevitably generate missing values, which impedes further downstream biological analysis. It is therefore important to estimate the missing values accurately. Most existing imputation methods tend to suffer from overfitting. In this study, we propose two regularized local learning methods for microarray missing value imputation. Motivated by the grouping effect of L2 regularization, after selecting the target gene, we train an L2 Regularized Local Least Squares imputation model (RLLSimpute_L2) on the target gene and its neighbors to estimate the missing values of the target gene. Furthermore, RLLSimpute_L2 imputes the missing values in ascending order of each target gene's missing rate, which allows previously estimated values to be fully utilized. Beyond L2, we also explore L1 regularization and propose an L1 Regularized Local Least Squares imputation model (RLLSimpute_L1). To evaluate their effectiveness, we conducted extensive experimental studies on six benchmark datasets covering both time series and non-time series cases. We compared nine state-of-the-art imputation methods with RLLSimpute_L2 and RLLSimpute_L1 in terms of three performance metrics. The comparative results indicate that RLLSimpute_L2 outperforms its competitors, achieving smaller imputation errors and better structure preservation of differentially expressed genes.
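The core step described above — fitting an L2-regularized (ridge) local least squares model on a target gene and its most correlated neighbors, then predicting the target's missing entries — can be sketched as follows. This is a minimal illustration under assumed conventions (genes as rows, samples as columns, `np.nan` marking missing values; the function name, `k`, and `lam` are illustrative, not the paper's actual interface or parameter settings):

```python
import numpy as np

def rllsimpute_l2_single(X, target, k=5, lam=1.0):
    """Impute the missing entries of one target gene (row of X) from its
    k most correlated fully observed neighbor genes, using closed-form
    L2-regularized (ridge) least squares. Illustrative sketch only."""
    miss = np.isnan(X[target])           # columns to impute
    obs = ~miss                          # columns where the target is observed
    # candidate neighbors: genes with no missing values
    cands = [g for g in range(X.shape[0])
             if g != target and not np.isnan(X[g]).any()]
    # rank candidates by |correlation| with the target on observed columns
    corr = [abs(np.corrcoef(X[g, obs], X[target, obs])[0, 1]) for g in cands]
    nbrs = [cands[i] for i in np.argsort(corr)[::-1][:k]]
    A = X[nbrs][:, obs].T                # design matrix: observed columns
    b = X[target, obs]
    # ridge solution: w = (A^T A + lam I)^{-1} A^T b
    w = np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ b)
    filled = X[target].copy()
    filled[miss] = X[nbrs][:, miss].T @ w
    return filled
```

The ridge penalty `lam` is what produces the grouping effect the abstract mentions: correlated neighbor genes receive similar weights instead of one neighbor absorbing all the signal. The sequential, missing-rate-ordered imputation over all target genes would wrap repeated calls of a step like this.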
Mining discrete patterns in binary data is important for many data analysis tasks, such as data sampling, compression, and clustering. For example, replacing individual records with their patterns would greatly reduce data size and simplify subsequent data analysis tasks. As a straightforward approach, rank-one binary matrix approximation has been actively studied recently for mining discrete patterns from binary data. It factorizes a binary matrix into the product of one binary pattern vector and one binary presence vector, while minimizing mismatching entries. However, this approach suffers from two serious problems. First, if all records are replaced with their respective patterns, noise could make up as much as 50% of the resulting approximate data, because the approach simply assumes that a pattern is present in a record as long as their matching entries outnumber their mismatching entries. Second, the two error types, 1-becoming-0 and 0-becoming-1, are treated equally, while in many application domains they should be distinguished. To address these two issues, we propose weighted rank-one binary matrix approximation. It enables a tradeoff between accuracy and succinctness in the approximate data and allows users to impose their own preferences on the importance of the different error types. However, as proved in the paper, the associated decision problem is NP-complete. To solve it, several mathematical programming formulations are provided, from which 2-approximation algorithms are derived for some special cases. An adaptive tabu search heuristic is presented for solving the general problem, and our experimental study shows the effectiveness of the heuristic.
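The weighted objective described above can be made concrete with a small sketch: find a binary presence vector `u` and pattern vector `v` so that the outer product `u vᵀ` approximates the binary matrix `X`, charging `w10` per 1-becoming-0 error and `w01` per 0-becoming-1 error. This is a simple alternating local search for illustration, not the paper's adaptive tabu search, and the initialization and parameter names are assumptions:

```python
import numpy as np

def weighted_rank1(X, w10=1.0, w01=1.0, iters=10):
    """Alternating heuristic sketch for weighted rank-one binary matrix
    approximation. Returns (u, v, cost): u marks which rows contain the
    pattern v; 1->0 errors cost w10 and 0->1 errors cost w01."""
    m, n = X.shape
    v = X[X.sum(axis=1).argmax()].copy()   # init pattern from the densest row
    u = np.zeros(m, dtype=int)

    def cost(u, v):
        A = np.outer(u, v)
        return (w10 * np.sum((X == 1) & (A == 0))
                + w01 * np.sum((X == 0) & (A == 1)))

    for _ in range(iters):
        # update each u_i given v: compare cost of u_i = 0 vs u_i = 1
        c0 = w10 * X.sum(axis=1)
        c1 = (w10 * ((X == 1) & (v == 0)).sum(axis=1)
              + w01 * ((X == 0) & (v == 1)).sum(axis=1))
        u = (c1 < c0).astype(int)
        # update each v_j given u, symmetrically
        c0 = w10 * X.sum(axis=0)
        c1 = (w10 * ((X == 1) & (u[:, None] == 0)).sum(axis=0)
              + w01 * ((X == 0) & (u[:, None] == 1)).sum(axis=0))
        v = (c1 < c0).astype(int)
    return u, v, cost(u, v)
```

Setting `w01` much larger than `w10`, for instance, discourages the pattern from covering zeros, which is the kind of user preference the weighted formulation is meant to express; the strict inequality in the row update also mirrors the abstract's first point, since a pattern is assigned only when it strictly reduces the weighted error.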
Given the many recent advanced embedding models, selecting the pre-trained word embedding (a.k.a., word representation) models best suited to a specific downstream task is non-trivial. In this paper, we propose a systematic approach, called ETNLP, for extracting, evaluating, and visualizing multiple sets of pre-trained word embeddings to determine which embeddings should be used in a downstream task. We demonstrate the effectiveness of the proposed approach on our pre-trained word embedding models in Vietnamese to select which models are suitable for a named entity recognition (NER) task. Specifically, we create a large Vietnamese word analogy list to evaluate and select the pre-trained embedding models for the task. We then utilize the selected embeddings for the NER task and achieve new state-of-the-art results on the task's benchmark dataset. We also apply the approach to another downstream task, privacy-guaranteed embedding selection, and show that it helps users quickly select the most suitable embeddings. In addition, we create an open-source system using the proposed systematic approach to facilitate similar studies on other NLP tasks. The source code and data are available at https://github.com/vietnlp/etnlp.
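The word-analogy evaluation mentioned above is typically scored with the standard 3CosAdd method: for a quadruple a : b :: c : d, predict the word whose vector is closest to b − a + c and check whether it is d. A minimal sketch of that scoring step, under assumed inputs (a word-to-vector dict and a list of quadruples; the function name is illustrative, not ETNLP's actual API):

```python
import numpy as np

def analogy_accuracy(emb, quads):
    """Score analogy quadruples (a, b, c, d) with 3CosAdd: predict the
    nearest word (by cosine) to b - a + c, excluding a, b, c themselves,
    and report the fraction of quadruples where the prediction equals d."""
    words = list(emb)
    # row-normalize so a dot product is cosine similarity
    M = np.stack([emb[w] / np.linalg.norm(emb[w]) for w in words])
    correct = 0
    for a, b, c, d in quads:
        q = emb[b] - emb[a] + emb[c]
        q = q / np.linalg.norm(q)
        sims = M @ q
        for w in (a, b, c):              # exclude the query words
            sims[words.index(w)] = -np.inf
        if words[int(np.argmax(sims))] == d:
            correct += 1
    return correct / len(quads)
```

Running a scorer like this over the same analogy list for each candidate embedding set gives the comparable numbers needed to pick an embedding for the downstream NER task.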
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations: citations that display the context of the citation and indicate whether the citing article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.