Most state-of-the-art named entity recognition (NER) systems rely on handcrafted features and on the output of other NLP tasks such as part-of-speech (POS) tagging and text chunking. In this work we propose a language-independent NER system that uses automatically learned features only. Our approach is based on the CharWNN deep neural network, which uses word-level and character-level representations (embeddings) to perform sequential classification. We perform an extensive set of experiments using two annotated corpora in two different languages: the HAREM I corpus, which contains texts in Portuguese; and the SPA CoNLL-2002 corpus, which contains texts in Spanish. Our experimental results give evidence of the contribution of neural character embeddings for NER. Moreover, we demonstrate that the same neural network which has been successfully applied to POS tagging can also achieve state-of-the-art results for language-independent NER, using the same hyperparameters, and without any handcrafted features. For the HAREM I corpus, CharWNN outperforms the state-of-the-art system by 7.9 points in F1-score for the total scenario (ten NE classes). For the SPA CoNLL-2002 corpus, CharWNN outperforms the state-of-the-art system by 0.8 points in F1.
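The core idea of CharWNN, concatenating a word-level embedding with a character-level embedding produced by a convolution-plus-max-pooling over the word's characters, can be sketched as below. All dimensions, the random initialization, and the example word are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Hypothetical sizes (the paper's real hyperparameters differ).
WORD_DIM, CHAR_DIM, CONV_DIM, WINDOW = 100, 10, 50, 3

rng = np.random.default_rng(0)
char_vocab = {c: i for i, c in enumerate("abcdefghijklmnopqrstuvwxyz")}
char_emb = rng.normal(size=(len(char_vocab), CHAR_DIM))     # char lookup table
conv_W = rng.normal(size=(CONV_DIM, WINDOW * CHAR_DIM))     # convolution filters

def char_level_embedding(word: str) -> np.ndarray:
    """Convolve over character embeddings and max-pool to a fixed-size vector."""
    chars = [char_emb[char_vocab[c]] for c in word.lower() if c in char_vocab]
    while len(chars) < WINDOW:                # pad short words to one window
        chars.append(np.zeros(CHAR_DIM))
    windows = [np.concatenate(chars[i:i + WINDOW])
               for i in range(len(chars) - WINDOW + 1)]
    # Max over positions keeps the strongest filter response per dimension,
    # making the result independent of word length.
    return np.max(conv_W @ np.array(windows).T, axis=1)

word_emb = rng.normal(size=WORD_DIM)          # stand-in for a word lookup table
joint = np.concatenate([word_emb, char_level_embedding("Lisboa")])
print(joint.shape)                            # (WORD_DIM + CONV_DIM,) -> (150,)
```

The joint vector is what the downstream sequential classifier consumes; the character half lets the model pick up capitalization and morphology cues without handcrafted features.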
The computational module of several MPEG-based video encoders, which includes the well-known algorithms of the Discrete Cosine Transform, Hadamard Transform, and Quantization, is widely used to identify and compress spatial redundancy in intra (raw input) or inter (computed residue) pixel data matrices. For some modern multimedia applications, like high-definition (HD H.264/AVC) or scalable (H.264/SVC) encoder solutions, the demand for fast module implementations becomes critical. Practical experiments indicate that, inside an H.264 computational module, the quantization module normally represents a real bottleneck for fast hardware implementations.
Considering that, we propose a complete integrated solution for the H.264 computational module, which incorporates the direct and inverse algorithms of the Discrete Cosine Transform, Hadamard Transform, and Quantization with minimal communication delays. This paper also presents a practical study of distinct levels of parallelism for the quantization, to demonstrate its influence and to optimize global encoder complexity and performance. All proposed alternatives were designed using the hardware description language VHDL and implemented on commercial FPGA boards to obtain experimental results.
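The forward path of the computational module described above, integer transform followed by quantization of a 4x4 residue block, can be illustrated with the following simplified sketch. The quantization-step formula and the example block are illustrative simplifications; the H.264 standard actually uses per-QP multiplication-factor tables and bit shifts rather than a floating-point division.

```python
import numpy as np

# H.264 4x4 forward integer transform matrix (standard Cf).
Cf = np.array([[1,  1,  1,  1],
               [2,  1, -1, -2],
               [1, -1, -1,  1],
               [1, -2,  2, -1]])

def transform_quantize(block: np.ndarray, qp: int) -> np.ndarray:
    """Forward integer transform + scalar quantization of a 4x4 block.

    Simplified: real H.264 folds the transform's post-scaling into the
    quantizer via integer multiplication factors and right shifts.
    """
    W = Cf @ block @ Cf.T                 # core 2-D integer transform
    qstep = 0.625 * (2 ** (qp / 6))       # quantizer step doubles every 6 QP
    return np.round(W / qstep).astype(int)

residue = np.arange(16).reshape(4, 4) - 8  # hypothetical inter residue block
print(transform_quantize(residue, qp=24))
```

The division by `qstep` is the part that, implemented naively, serializes the datapath; replicating it (the distinct levels of parallelism studied in the paper) is what relieves the quantization bottleneck in hardware.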
Text mining (TM) is a semi-automated, multi-step process able to turn unstructured data into structured data. TM's relevance has increased with the application of machine learning (ML) and deep learning (DL) algorithms in its various steps. When applied to biomedical literature, text mining is named biomedical text mining, and its specificity lies in both the type of documents analyzed and the language and concepts retrieved. The array of documents that can be used ranges from scientific literature to patents or clinical data, and the biomedical concepts often include, but are not limited to, genes, proteins, drugs, and diseases. This review aims to gather the leading tools for biomedical TM, summarily describing and systematizing them. We also surveyed several resources to compile the most valuable ones for each category.
Background
Blood cancers (BCs) are responsible for over 720,000 yearly deaths worldwide. Their prevalence and mortality rate uphold the relevance of research related to BCs. Despite the availability of different resources establishing Disease-Disease Associations (DDAs), the knowledge is scattered and not accessible in a straightforward way to the scientific community. Here, we propose SicknessMiner, a biomedical Text-Mining (TM) approach towards the centralization of DDAs. Our methodology encompasses Named Entity Recognition (NER) and Named Entity Normalization (NEN) steps, and the DDAs retrieved were compared to the DisGeNET resource for qualitative and quantitative comparison.
Results
We obtained the DDAs via co-mention using our SicknessMiner, or via gene- or variant-disease similarity using DisGeNET. SicknessMiner was able to retrieve around 92% of the DisGeNET results, and nearly 15% of the SicknessMiner results were specific to our pipeline.
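The co-mention strategy described above can be sketched as a pair count over normalized disease mentions: after the NER and NEN steps map each mention to a canonical identifier, two diseases appearing in the same abstract support an association. The abstracts and MeSH identifiers below are made up for illustration and are not output of the actual pipeline.

```python
from itertools import combinations
from collections import Counter

# Each set holds the normalized disease IDs found in one abstract
# (hypothetical NER + NEN output; IDs are illustrative).
abstracts = [
    {"MESH:D008223", "MESH:D007938"},   # e.g. lymphoma and leukemia
    {"MESH:D008223", "MESH:D007938"},
    {"MESH:D008223", "MESH:D009101"},   # e.g. lymphoma and multiple myeloma
]

ddas = Counter()
for mentions in abstracts:
    # Sorting makes (a, b) and (b, a) count as the same association.
    for a, b in combinations(sorted(mentions), 2):
        ddas[(a, b)] += 1               # each co-mention supports the DDA

print(ddas.most_common(1))
```

Counting co-mentions across a large corpus, instead of relying on curated gene or variant overlaps as DisGeNET does, is what lets this kind of pipeline surface associations that are specific to the literature it mines.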
Conclusions
SicknessMiner is a valuable tool to extract disease-disease relationships from raw input corpora.