Radio frequency interference (RFI) detection and excision is one of the key steps in the data processing pipeline of the Five-hundred-meter Aperture Spherical radio Telescope (FAST). The FAST telescope, due to its high sensitivity and large data rate, requires more accurate and efficient RFI flagging methods than its counterparts. In the last decades, approaches based upon artificial intelligence (AI), such as codes using Convolutional Neural Network (CNN), have been proposed to identify RFI more reliably and efficiently. However, RFI flagging of FAST data with such methods has often proved to be erroneous, with further manual inspections required. In addition, network construction as well as training dataset preparation for effective RFI flagging has imposed significant additional workloads. Therefore, rapid deployment and adjustment of AI approaches for different observations is impractical to implement with existing algorithms. To overcome such problems, we propose a model named RFI-Net. With the input of raw data without any processing, RFI-Net can detect RFI automatically, producing corresponding masks without any alteration of the original data. Experiments with RFI-Net using simulated astronomical data show that our model has outperformed existing methods in terms of both precision and recall. Besides, compared with other models, our method can obtain the same relative accuracy with less training data, thus saving effort and time required to prepare the training set. Further, the training process of RFI-Net can be accelerated, with overfittings being minimised, compared with other CNN codes. The performance of RFI-Net has also been evaluated with observing data obtained by FAST and Bleien Observatory. Our results demonstrate the ability of RFI-Net to accurately identify RFI with fine-grained, high-precision masks that required no further modification.
We present an analysis of meteorological data from the second generation of the Kunlun Automated Weather Station (KLAWS-2G) at Dome A, Antarctica during 2015 and 2016. We find that a strong temperature inversion exists for all the elevations up to 14 m that KLAWS-2G can reach, and lasts for more than 10 hours for 50% or more of the time when temperature inversion occurs. The average wind speeds at 4 m elevation are 4.2 m s −1 and 3.8 m s −1 during 2015 and 2016, respectively. The strong temperature inversion and moderate wind speed lead to a shallow turbulent boundary layer height at Dome A. By analyzing the temperature and wind shear profiles, we note telescopes should be elevated by at least 8 m above the ice. We also find that the duration of temperature inversions, and the wind speed, vary considerably from year to year. Therefore, long-term and continuous data are still needed for the site survey at Dome A.
With the arrival of cloud computing and Big Data, many scientific applications with large amount of data can be abstracted as scientific workflows and running on a cloud environment. Distributing these datasets intelligently can decrease data transfers efficiently during the workflow's execution. In this paper, we proposed a 2-stage data placement strategy. In the initial stage, we cluster the datasets based on their correlation, and allocate these clusters onto data centers. Compared with existing works, we have incorporated the data size into correlation calculation, and have proposed a new type of data correlation for the intermediate data named "the first order conduction correlation". Hence the data transmission cost can be measured more reasonable. In the runtime stage, the re-distribution algorithm can adjust data layout according to the changed factors, and the overhead of re-layout itself has also been measured. Compared with previous work, simulation results show that our proposed strategy can effectively reduce the time consumption of data movements during the workflow execution.
Background:The multiple sequence alignment (MSA) is a classic and powerful technique for sequence analysis in bioinformatics. With the rapid growth of biological datasets, MSA parallelization becomes necessary to keep its running time in an acceptable level. Although there are a lot of work on MSA problems, their approaches are either insufficient or contain some implicit assumptions that limit the generality of usage. First, the information of users' sequences, including the sizes of datasets and the lengths of sequences, can be of arbitrary values and are generally unknown before submitted, which are unfortunately ignored by previous work. Second, the center star strategy is suited for aligning similar sequences. But its first stage, center sequence selection, is highly time-consuming and requires further optimization. Moreover, given the heterogeneous CPU/GPU platform, prior studies consider the MSA parallelization on GPU devices only, making the CPUs idle during the computation. Co-run computation, however, can maximize the utilization of the computing resources by enabling the workload computation on both CPU and GPU simultaneously. Results: This paper presents CMSA, a robust and efficient MSA system for large-scale datasets on the heterogeneous CPU/GPU platform. It performs and optimizes multiple sequence alignment automatically for users' submitted sequences without any assumptions. CMSA adopts the co-run computation model so that both CPU and GPU devices are fully utilized. Moreover, CMSA proposes an improved center star strategy that reduces the time complexity of its center sequence selection process from O(mn 2 ) to O(mn). The experimental results show that CMSA achieves an up to 11× speedup and outperforms the state-of-the-art software. Conclusion: CMSA focuses on the multiple similar RNA/DNA sequence alignment and proposes a novel bitmap based algorithm to improve the center star strategy. We can conclude that harvesting the high performance of modern GPU is a promising approach to accelerate multiple sequence alignment. Besides, adopting the co-run computation model can maximize the entire system utilization significantly. The source code is available at https:// github.com/wangvsa/CMSA.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.