Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics 2019
DOI: 10.18653/v1/P19-1514

Scaling up Open Tagging from Tens to Thousands: Comprehension Empowered Attribute Value Extraction from Product Title

Abstract: Supplementing product information by extracting attribute values from titles is a crucial task in the e-Commerce domain. Previous studies treat each attribute only as an entity type and build one set of NER tags (e.g., BIO) for each of them, leading to a scalability issue that does not fit the large-sized attribute systems of real-world e-Commerce. In this work, we propose a novel approach to support value extraction scaling up to thousands of attributes without losing performance: (1) We propose to regard attribute a…
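The formulation sketched in the abstract, and in the citation statements below, is a single extractor that takes the attribute as a query alongside the title, so one shared B/I/O tag set covers every attribute. The following PyTorch sketch is a hedged illustration of that idea, not the authors' implementation; all names and layer sizes are assumptions, and the two-LSTM-plus-cross-attention layout is taken from the citing paper's description quoted further down.

```python
# Hypothetical sketch of the "attribute as query" formulation: one shared
# B/I/O tag head, with the attribute fed in as a query, instead of one NER
# tag set per attribute. Sizes and names are illustrative only.
import torch
import torch.nn as nn

class AttributeAsQueryTagger(nn.Module):
    def __init__(self, vocab_size=30000, dim=128, num_tags=3):  # tags: B, I, O
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Separate contextual encoders for the title (context) and the
        # attribute (query), per the LSTM-based description quoted below.
        self.title_lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.attr_lstm = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        # Cross-attention: every title token attends over the attribute tokens.
        self.cross_attn = nn.MultiheadAttention(2 * dim, num_heads=4,
                                                batch_first=True)
        self.tagger = nn.Linear(4 * dim, num_tags)  # one shared BIO head

    def forward(self, title_ids, attr_ids):
        t, _ = self.title_lstm(self.embed(title_ids))  # (B, Lt, 2*dim)
        a, _ = self.attr_lstm(self.embed(attr_ids))    # (B, La, 2*dim)
        attended, _ = self.cross_attn(query=t, key=a, value=a)
        # Concatenate title states with attribute-aware states, then tag.
        return self.tagger(torch.cat([t, attended], dim=-1))  # (B, Lt, 3)

# The same head serves any attribute: adding a new attribute changes only
# attr_ids, not the tag set, which is the scalability claim of the abstract.
model = AttributeAsQueryTagger()
title_ids = torch.randint(0, 30000, (2, 20))  # batch of 2 titles, 20 tokens
attr_ids = torch.randint(0, 30000, (2, 4))    # e.g. tokens of "brand name"
print(model(title_ids, attr_ids).shape)       # torch.Size([2, 20, 3])
```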

Cited by 47 publications (99 citation statements: 0 supporting, 99 mentioning, 0 contrasting). References 19 publications. Citing publications range from 2020 to 2024.
“…If we remove the distilled MLM and the no-answer classifier by setting both [loss weights] to 0, our model degenerates to the standard question answering model with BERT [7]. If we further replace the BERT contextual layer of the QA component with the BiLSTM layer, our model regresses to the sequence tagging model in [50]. Moreover, if we also remove the question (attribute) from the QA model, our model degenerates to the attribute-dependent OpenTag method [54], which is not able to scale to a large attribute set.…”
Section: Discussion (mentioning)
confidence: 99%
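A hedged reading of the ablation described above: the citing paper's QA model combines a span-extraction loss with a no-answer classifier and a distilled masked language model, weighted by two coefficients whose symbols were lost in extraction. The sketch below uses alpha and beta purely as placeholders for those weights; setting both to zero recovers the plain QA objective, exactly the degeneration the quote describes.

```python
# Placeholder reconstruction of the multi-task objective in the quote above.
# alpha and beta stand in for the (unrecoverable) original weight symbols.
def total_loss(qa_loss, na_loss, mlm_loss, alpha=0.5, beta=0.5):
    # alpha = beta = 0 removes the no-answer classifier and the distilled
    # MLM, degenerating the model to a standard BERT QA model.
    return qa_loss + alpha * na_loss + beta * mlm_loss

print(total_loss(1.2, 0.4, 0.9))            # full multi-task objective
print(total_loss(1.2, 0.4, 0.9, 0.0, 0.0))  # ablated: plain QA loss only
```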
“…The most recent attribute value extraction model [50] employs two separate LSTM-based contextual layers for the context and the question respectively, followed by a cross-attention layer to join the outputs of the two layers. Different from them, we utilize a single contextual encoder with the self-attention mechanism developed in BERT [7].…”
Section: Contextual Layer (mentioning)
confidence: 99%
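The contrast drawn in this statement is between a two-encoder design (separate LSTMs for context and question joined by cross-attention, as in [50]) and a single jointly self-attentive encoder. A minimal sketch of the single-encoder packing follows, using the Hugging Face transformers library as an assumed stand-in for the citing paper's BERT encoder; the example attribute and title strings are invented.

```python
# Sketch of the single-encoder alternative: pack the attribute (question) and
# title (context) into one sequence and encode jointly with self-attention,
# so no separate cross-attention layer is needed.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

# Encoded as "[CLS] attribute [SEP] title [SEP]": self-attention lets every
# title token interact with the attribute tokens directly.
inputs = tokenizer("brand name", "nike mens running shoes size 10",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, seq_len, 768)
```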