Abstract: This paper presents ongoing research in clinical information extraction. This work introduces a new genre of text that is poorly written, noise-prone, ungrammatical, and full of cryptic content. A corpus of clinical progress notes drawn from an Intensive Care Service has been manually annotated with more than 15,000 clinical named entities in 11 entity types. This paper reports on the challenges involved in creating the annotation schema and in recognising and annotating clinical named entities. The informat…
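“…research on recognition is being conducted [3,4]. The dictionary lookup features use entries that are already catalogued in term dictionaries, stop-word lists, abbreviation lists, and lists of organization names, ministry names, airlines, educational institutions, and the like. <true, 3, "the">, <false, 9, "president">, <false, 2, "of">, <true, 5, "apple">, <false, 4, "eats">, <false, 2, "an">, <false, 5, "apple"> …”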
Section: “…thereby improving retrieval accuracy; in various fields, named entity…” (unclassified)
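The tuples quoted in the citation statement above look like per-token feature vectors. The sketch below reproduces them under two assumptions that the excerpt does not state: that the underlying sentence is "The president of Apple eats an apple." and that the three fields are (starts with a capital letter, token length, lowercased token). The function name `token_features` is hypothetical.

```python
def token_features(sentence):
    """Return (is_capitalized, length, lowercased) for each whitespace token.

    Assumed feature layout; the excerpt only shows the resulting tuples.
    """
    features = []
    for token in sentence.split():
        word = token.strip(".,;:!?")  # drop surrounding punctuation
        if not word:
            continue
        features.append((word[0].isupper(), len(word), word.lower()))
    return features


# Assumed example sentence, chosen so the output matches the quoted tuples.
features = token_features("The president of Apple eats an apple.")
for feature in features:
    print(feature)
```

Under these assumptions the two occurrences of "apple" differ only in the capitalization flag, which is the kind of cue such word-level features give a recognizer.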
Named entity recognition is required to improve the retrieval accuracy of patent documents, or of similar patents, based on their claims and patent descriptions. In this paper, we propose an automatic named entity recognition method for patents that uses a conditional random field, one of the best-performing methods in machine learning research. The named entity recognition system was built from a tagged training corpus of 660,000 words, and 70,000 words were used as a test set for evaluation. The experiment shows an accuracy of 93.6% and a Kappa coefficient of 0.67 between manual tagging and the automatic tagging system. This figure is better than the Kappa coefficient of 0.6 obtained for manually tagged results, and it shows that an automatic named entity tagging system can serve as a practical replacement for manual tagging of patent documents.
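The abstract above reports both an accuracy and a Kappa coefficient. As a minimal sketch of how the two relate, the code below computes Cohen's kappa for two label sequences; the abstract does not say which kappa variant was used, so Cohen's is an assumption, and the toy label sequences are invented for illustration only.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of positions where the two taggers agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected chance agreement from each tagger's label distribution.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both taggers used one identical label throughout
    return (observed - expected) / (1.0 - expected)


# Invented toy data: "O" marks tokens outside any entity.
manual = ["O", "O", "ORG", "O", "PER", "O", "O", "ORG", "O", "O"]
automatic = ["O", "O", "ORG", "O", "O", "O", "O", "ORG", "O", "O"]
kappa = cohen_kappa(manual, automatic)
print(round(kappa, 2))
```

In this toy case the taggers agree on 9 of 10 tokens, yet kappa is about 0.75, because most tokens carry the same majority label and much of the agreement is expected by chance. The same effect may explain how a high accuracy can coexist with a more moderate kappa.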
“…This phenomenon is quite common in many domains (Alex et al., 2007; Byrne, 2007; Wang, 2009; Màrquez et al., 2007). However, much of the work on NER copes only with non-nested entities which are also called flat entities and neglects nested entities.…”
Entity mentions embedded in longer entity mentions are referred to as nested entities. Most named entity recognition (NER) systems deal only with flat entities and ignore the inner nested ones, and so fail to capture finer-grained semantic information in the underlying texts. To address this issue, we propose a novel neural model to identify nested entities by dynamically stacking flat NER layers. Each flat NER layer is based on the state-of-the-art flat NER model that captures sequential context representation with a bidirectional long short-term memory (LSTM) layer and feeds it to the cascaded CRF layer. Our model merges the output of the LSTM layer in the current flat NER layer to build a new representation for detected entities and subsequently feeds them into the next flat NER layer. This allows our model to extract outer entities by taking full advantage of information encoded in their corresponding inner entities, in an inside-to-outside way. Our model dynamically stacks the flat NER layers until no outer entities are extracted. Extensive evaluation shows that our dynamic model outperforms state-of-the-art feature-based systems on nested NER, achieving 74.7% and 72.2% on the GENIA and ACE2005 datasets, respectively, in terms of F-score.
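The inside-to-outside control flow described in the abstract above can be sketched as follows. This is a toy under loud assumptions: the hypothetical `RULES` table stands in for the trained LSTM-CRF flat tagger, and collapsing a span into a placeholder unit stands in for merging LSTM outputs. Only the loop "run a flat layer, merge detected entities, repeat until nothing new is found" reflects the description; none of the names come from the paper.

```python
# Hypothetical toy rules standing in for a trained flat NER layer.
RULES = {
    ("il-2",): "PROTEIN",
    ("<PROTEIN>", "gene"): "DNA",
    ("<DNA>", "promoter"): "DNA",
}


def flat_ner(units):
    """One flat pass: return non-overlapping (i, j, label) spans over units."""
    keys = [unit["key"] for unit in units]
    spans, i = [], 0
    while i < len(units):
        match = None
        for pattern, label in RULES.items():
            j = i + len(pattern)
            if tuple(keys[i:j]) == pattern:
                match = (i, j, label)
                break
        if match:
            spans.append(match)
            i = match[1]  # skip past the matched span
        else:
            i += 1
    return spans


def merge(units, spans):
    """Collapse each detected span into one unit for the next layer."""
    by_start = {span[0]: span for span in spans}
    merged, i = [], 0
    while i < len(units):
        if i in by_start:
            _, j, label = by_start[i]
            merged.append({"key": f"<{label}>",
                           "start": units[i]["start"],
                           "end": units[j - 1]["end"]})
            i = j
        else:
            merged.append(units[i])
            i += 1
    return merged


def nested_ner(tokens):
    """Stack flat layers until a layer extracts no entities."""
    units = [{"key": tok.lower(), "start": k, "end": k + 1}
             for k, tok in enumerate(tokens)]
    entities = []
    while True:
        spans = flat_ner(units)
        if not spans:
            break  # no outer entities found: stop stacking
        for i, j, label in spans:
            # Record the span in original token offsets (end exclusive).
            entities.append((units[i]["start"], units[j - 1]["end"], label))
        units = merge(units, spans)
    return entities


entities = nested_ner(["the", "IL-2", "gene", "promoter"])
for start, end, label in entities:
    print(start, end, label)
```

On this invented example the first layer finds the innermost mention, and each later layer builds a longer mention on top of the merged unit from the layer before it, so the number of layers adapts to the nesting depth of the input.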
“…Ideal annotation should be accurate, thus requiring intensive knowledge and context awareness, and it should be automatic at the same time, since expert work is time consuming. Many efforts have been made in this field, from named entity recognition (NER) to information extraction (Ciravegna et al., 2004; Kiryakov et al., 2004), both in open domain (Uren et al., 2006; Cucerzan, 2007; Mihalcea and Csomai, 2007) and particular domains (Wang, 2009; Liu et al., 2011). Most cases of NER or information extraction focus on a small set of categories to be annotated, such as Person, Location, Organization, Misc, etc.…”