Cloud computing can provide a more cost-effective way to deploy scientific workflows than traditional distributed computing environments such as clusters and grids. Because scientific datasets are large, data placement plays an important role in scientific cloud workflow systems, both for improving system performance and for reducing data transfer cost. The traditional task-level data placement strategy considers only the datasets shared within individual workflows when reducing data transfer cost. However, when multiple workflows share data, a task-level strategy is not necessarily good enough, because it ignores datasets shared across workflows. In this paper, we construct a novel workflow-level data placement model that treats multiple workflows as a whole. We then propose a two-stage data placement strategy: during the workflow build-time stage it pre-allocates initial datasets to proper datacenters, and during the runtime stage it dynamically distributes newly generated datasets to appropriate datacenters. Both stages use an efficient discrete particle swarm optimization algorithm to place flexible-location datasets. Comprehensive experiments demonstrate that our workflow-level data placement strategy can be more cost-effective than its task-level counterpart for data-sharing scientific cloud workflows.
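To illustrate how a discrete particle swarm optimization (PSO) can search over placements of flexible-location datasets, here is a minimal Python sketch. It is not the paper's algorithm. The cost model (each task runs in whichever datacenter minimizes the volume of input data it must pull, and each input dataset stored elsewhere costs its size to transfer), the capacity penalty, and the update rule (each particle component is copied from the global best or personal best with fixed probabilities, or randomly mutated) are all assumptions, and the names `placement_cost` and `discrete_pso` are hypothetical. Fixed-location datasets stay pinned; only flexible-location datasets are encoded in a particle.

```python
import random
from typing import Dict, List, Sequence, Tuple


def placement_cost(pos: Sequence[int], flexible: List[str], fixed: Dict[str, int],
                   sizes: Dict[str, float], tasks: List[List[str]],
                   capacity: List[float]) -> float:
    """Hypothetical transfer-cost model for one candidate placement."""
    # Fixed-location datasets are pinned; flexible ones take their datacenter from pos.
    loc = dict(fixed)
    loc.update(zip(flexible, pos))
    cost = 0.0
    for inputs in tasks:
        # Assume each task runs where it has to pull the least input data.
        cost += min(
            sum(sizes[d] for d in inputs if loc[d] != dc)
            for dc in range(len(capacity))
        )
    # Large penalty for exceeding a datacenter's (assumed) storage capacity.
    used = [0.0] * len(capacity)
    for d, dc in loc.items():
        used[dc] += sizes[d]
    cost += 1e6 * sum(max(0.0, u - c) for u, c in zip(used, capacity))
    return cost


def discrete_pso(flexible, fixed, sizes, tasks, capacity, n_particles=20,
                 iters=100, p_global=0.4, p_personal=0.3, p_mutate=0.05,
                 seed=0) -> Tuple[List[int], float]:
    """Simplified discrete PSO: a particle assigns a datacenter index to every flexible dataset."""
    rng = random.Random(seed)
    n_dc = len(capacity)

    def cost(p):
        return placement_cost(p, flexible, fixed, sizes, tasks, capacity)

    swarm = [[rng.randrange(n_dc) for _ in flexible] for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_cost = [cost(p) for p in swarm]
    g = min(range(n_particles), key=lambda i: pbest_cost[i])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]

    for _ in range(iters):
        for i, p in enumerate(swarm):
            for k in range(len(p)):
                r = rng.random()
                if r < p_global:
                    p[k] = gbest[k]                # move toward the global best
                elif r < p_global + p_personal:
                    p[k] = pbest[i][k]             # move toward this particle's best
                elif r < p_global + p_personal + p_mutate:
                    p[k] = rng.randrange(n_dc)     # random exploration
                # otherwise keep the current component (inertia)
            c = cost(p)
            if c < pbest_cost[i]:
                pbest[i], pbest_cost[i] = p[:], c
                if c < gbest_cost:
                    gbest, gbest_cost = p[:], c
    return gbest, gbest_cost


if __name__ == "__main__":
    # Two datacenters; "a" and "b" are fixed, "x" and "y" are flexible (toy data).
    best, c = discrete_pso(
        flexible=["x", "y"],
        fixed={"a": 0, "b": 1},
        sizes={"a": 10, "b": 10, "x": 1, "y": 1},
        tasks=[["a", "x"], ["b", "y"]],
        capacity=[100.0, 100.0],
    )
    print(best, c)
```

Under these assumptions, a two-stage scheme could call `discrete_pso` once at build time for the initial datasets, then again at runtime with already-placed datasets moved into `fixed` and the newly generated ones listed in `flexible`.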
Named entity recognition (NER) is a typical sequence labeling problem that plays an important role in natural language processing (NLP) systems. In this paper, we discuss the details of applying a comprehensive model that combines neural networks and a conditional random field (CRF) to Chinese NER tasks, and how to discover character-level features when implementing a word-level NER system. We compare Chinese and English with respect to modeling character embeddings. Based on our analysis, we developed a NER system that works well on the ACE 2004 dataset and the SIGHAN Bakeoff 2006 MSRA dataset without relying on any gazetteers or handcrafted features. We obtained an F1 score of 82.3% on MSRA 2006.
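The decoding step of a neural network plus CRF tagger can be illustrated with Viterbi search over a linear-chain CRF. The minimal Python sketch below assumes that the network has already produced per-character emission scores and that the CRF supplies a tag-to-tag transition matrix and start scores. The function name `viterbi`, the BIO tag set, and all numbers in the usage example are hypothetical and are not taken from the paper's model.

```python
from typing import List, Sequence, Tuple


def viterbi(emissions: Sequence[Sequence[float]],
            transitions: Sequence[Sequence[float]],
            start: Sequence[float]) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][j]: score of tag j at position t (assumed to come from a neural network)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]: score of starting the sequence with tag j
    """
    n_tags = len(start)
    score = [start[j] + emissions[0][j] for j in range(n_tags)]
    backptrs: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_score, ptr = [], []
        for j in range(n_tags):
            # Best previous tag i for ending at tag j at position t.
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emissions[t][j])
            ptr.append(best_i)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final tag.
    last = max(range(n_tags), key=lambda j: score[j])
    path = [last]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return path, score[last]


# Toy example for the characters 张三说 ("Zhang San says") with tags O, B-PER, I-PER.
TAGS = ["O", "B-PER", "I-PER"]
NEG = -1e4  # large negative score approximating a forbidden transition
emissions = [[0.1, 2.0, 0.5], [0.3, 0.2, 1.5], [2.0, 0.1, 0.4]]
transitions = [[0.0, 0.0, NEG],  # O -> I-PER is disallowed
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
start = [0.0, 0.0, NEG]          # a sequence cannot start with I-PER
path, best = viterbi(emissions, transitions, start)
print([TAGS[k] for k in path], best)  # expected: ['B-PER', 'I-PER', 'O'] 5.5
```

Using a large negative transition score rather than a hard constraint keeps the decoder generic while still steering it away from invalid BIO sequences such as O followed by I-PER.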