Cluster ensembles have recently emerged as a powerful alternative to standard cluster analysis, aggregating several input data clusterings to generate a single output clustering with improved robustness and stability. Since the early work, these techniques have held great promise; however, most of them generate the final solution from incomplete information about a cluster ensemble. The underlying ensemble-information matrix reflects only cluster-data point relations, while relations among clusters are generally overlooked. This paper presents a new link-based approach to improve the conventional matrix. It does so using the similarity between clusters, which is estimated from a link network model of the ensemble. In particular, three new link-based algorithms are proposed for the underlying similarity assessment. The final clustering result is generated from the refined matrix using two different consensus functions, one based on feature-based partitioning and one on graph-based partitioning. This approach is the first to address and explicitly employ the relationship between input partitions, which has not been emphasized by recent studies of matrix refinement. The effectiveness of the link-based approach is empirically demonstrated over 10 data sets (synthetic and real) and three benchmark evaluation measures. The results suggest that the new approach is able to efficiently extract information embedded in the input clusterings and regularly achieves higher clustering quality than several state-of-the-art techniques.
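The matrix refinement idea can be illustrated with a short Python sketch. It is not the paper's implementation. It assumes one plausible link-based similarity: a weighted connected-triple measure. In this measure, two clusters from different clusterings are linked with Jaccard weight |X∩Y|/|X∪Y|, and two clusters from the same clustering are similar when they share linked neighbours. A hypothetical decay factor `decay` keeps every inter-cluster similarity below 1. The function names (`cluster_membership`, `link_weights`, `wct_similarity`, `refined_matrix`) are hypothetical. Each input clustering is a list of labels over the same data points.

```python
import numpy as np


def cluster_membership(ensemble):
    """List every cluster as (clustering index, label, frozenset of member points)."""
    clusters = []
    for m, labels in enumerate(ensemble):
        for lab in sorted(set(labels)):
            members = frozenset(i for i, l in enumerate(labels) if l == lab)
            clusters.append((m, lab, members))
    return clusters


def link_weights(clusters):
    """Link network: edge weight between two clusters is the Jaccard overlap of their members."""
    k = len(clusters)
    W = np.zeros((k, k))
    for a in range(k):
        for b in range(a + 1, k):
            sa, sb = clusters[a][2], clusters[b][2]
            shared = len(sa & sb)
            if shared:
                W[a, b] = W[b, a] = shared / len(sa | sb)
    return W


def wct_similarity(W, decay=0.9):
    """Connected-triple similarity: sum of min(w_xz, w_yz) over common neighbours z,
    rescaled so the largest off-diagonal value equals `decay` (assumed < 1)."""
    k = W.shape[0]
    S = np.zeros((k, k))
    for x in range(k):
        for y in range(x + 1, k):
            S[x, y] = S[y, x] = np.minimum(W[x], W[y]).sum()
    if S.max() > 0:
        S = S / S.max() * decay
    np.fill_diagonal(S, 1.0)
    return S


def refined_matrix(ensemble, decay=0.9):
    """Points x clusters matrix: 1 for membership, otherwise the similarity between
    that cluster and the cluster holding the point in the same clustering."""
    clusters = cluster_membership(ensemble)
    S = wct_similarity(link_weights(clusters), decay)
    n = len(ensemble[0])
    owner = {}  # (clustering index, point) -> column of the cluster containing the point
    for c, (m, _, members) in enumerate(clusters):
        for i in members:
            owner[(m, i)] = c
    RM = np.zeros((n, len(clusters)))
    for c, (m, _, members) in enumerate(clusters):
        for i in range(n):
            RM[i, c] = 1.0 if i in members else S[c, owner[(m, i)]]
    return RM


ensemble = [[0, 0, 1, 1], [0, 0, 0, 1], [0, 1, 1, 1]]
RM = refined_matrix(ensemble)
print(RM.shape)  # (4, 6)
```

Each row still contains exactly one 1 per input clustering. The other entries, which would be unknown zeros in the conventional binary matrix, now hold graded inter-cluster similarities.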
Presentation and applied case study of a system-wide workflow which supports rapid, systematic and efficient continuous seeded cooling crystallisation process design, with the aim to deliver a robust, consistent process with tight control of particle attributes.
The lack of a commercial laboratory-, pilot- and small-manufacturing-scale dead-end continuous filtration and drying unit is a significant gap in the development of continuous pharmaceutical manufacturing processes for new active pharmaceutical ingredients (APIs). To move small-scale pharmaceutical isolation forward from traditional batch Nutsche filtration to continuous processing, a continuous filter dryer prototype unit (CFD20) was developed in collaboration with Alconbury Weston Ltd. The performance of the prototype was evaluated by comparison with manual best practice, exemplified by a modified Biotage VacMaster unit, to gather data and process understanding for API filtration and washing. The ultimate objective was to link the chemical and physical attributes of an API slurry with equipment and processing parameters to improve API isolation processes. Filtration performance was characterized by assessing the filtrate flow rate through the application of Darcy's law. The impact on product crystal size distribution and product purity was investigated using classical analytical methods. The overall performance of the two units was similar, showing that the prototype CFD20 can match best manual lab practice for filtration and washing while allowing continuous processing and real-time data logging. This result is encouraging, and the data gathered provide further insight to inform the development of the CFD20.
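Darcy's law is commonly applied to dead-end filtration data in the following way, illustrated here with a rough Python sketch. The sketch rests on three assumptions:

- the filtration runs at constant pressure;
- the cake is incompressible;
- the cake and filter medium act as resistances in series.

Under these assumptions, the integrated rate equation is linear in t/V versus V. The function names, the synthetic parameter values, and the symbols (specific cake resistance `alpha`, medium resistance `r_m`, dry cake mass per unit filtrate volume `solids_conc`) are illustrative assumptions. They are not data or procedures from the CFD20 study.

```python
import numpy as np


def flow_rate(V, alpha, r_m, area, dp, viscosity, solids_conc):
    """Darcy's law with cake and medium in series:
    Q = dV/dt = A * dP / (mu * (alpha * c * V / A + R_m))."""
    return area * dp / (viscosity * (alpha * solids_conc * V / area + r_m))


def fit_constant_pressure(t, V, area, dp, viscosity, solids_conc):
    """Fit the integrated form t/V = a*V + b and recover alpha [m/kg] and R_m [1/m],
    where a = mu*alpha*c / (2*A**2*dP) and b = mu*R_m / (A*dP)."""
    t = np.asarray(t, dtype=float)
    V = np.asarray(V, dtype=float)
    a, b = np.polyfit(V, t / V, 1)
    alpha = 2.0 * a * area**2 * dp / (viscosity * solids_conc)
    r_m = b * area * dp / viscosity
    return alpha, r_m


# Synthetic check with assumed values (SI units): A = 20 cm^2, dP = 0.5 bar, water-like filtrate
area, dp, mu, c = 2e-3, 5e4, 1e-3, 50.0
alpha_true, r_m_true = 1e10, 1e10
V = np.linspace(1e-5, 1e-4, 10)  # 10 mL to 100 mL of filtrate
t = mu * alpha_true * c / (2 * area**2 * dp) * V**2 + mu * r_m_true / (area * dp) * V
alpha, r_m = fit_constant_pressure(t, V, area, dp, mu, c)
print(f"alpha = {alpha:.3e} m/kg, R_m = {r_m:.3e} 1/m")
```

The slope of the t/V versus V line carries the cake resistance, and the intercept carries the medium resistance. With real data, the early portion recorded during cake formation often deviates from the line and is frequently excluded before fitting.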
A key challenge during the transition from laboratory/small batch to continuous manufacturing is developing a process strategy that can easily be adopted for a larger batch or continuous process. Industrial practice follows two steps. First, the isolation strategy for a new drug or process is developed in batch, using the design of experiments (DoE) approach to determine the best isolation conditions. Second, the selected isolation parameters are transferred to large batch equipment or a continuous isolation process. This transfer stage requires a series of extra investigations to evaluate the effect of different equipment geometry, or even to adapt the selected parameters to a different isolation mechanism (e.g., from dead-end to cross-flow filtration). The consequences are increased R&D cost and time and higher material consumption. The CFD25 is an isolation device used first to develop an isolation strategy in batch (optimization mode) using a screening DoE approach, and then to verify the transferability of the strategy to a semicontinuous process (production mode). A D-optimal screening DoE was used to determine the effect of varying input slurry properties, such as solid loading, particle size distribution, and crystallization solvent, on filtration and washing performance and on the characteristics of the dry isolated product. A series of crystallization solvents (ethanol, isopropanol, and 3-methylbutan-1-ol) and wash solvents (n-heptane, isopropyl acetate, and n-dodecane) were used. To mimic a real isolation process, the paracetamol-related impurities acetanilide and metacetamol were dissolved in the mother liquor. The selected batch isolation strategy was used for the semicontinuous isolation run.
Several throughput and filtration parameters were measured to evaluate the consistency of the product isolated during a continuous experiment: cake resistance and flow rate, cake residual liquid content and composition, cake purity, particle-particle aggregation, and the extent and strength of agglomerates. These were compared with the isolated product properties obtained during batch process development. Overall, the CFD25 is a versatile tool. It allows both new chemical entity process development in batch and production of the active pharmaceutical ingredient in semicontinuous mode, using the same process parameters and without changing equipment. The isolated product properties obtained during the semicontinuous run are broadly comparable between samples, although the residual solvent content and composition differ between some samples due to filter plate blockage. In general, the mean properties obtained during semicontinuous operation are comparable with the product properties predicted by the DoE.
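A D-optimal screening design selects a small set of runs that maximizes det(XᵀX) for an assumed model. The Python sketch below uses a simple point-exchange search over a full candidate grid. Its choices are hypothetical illustrations, not the design actually used with the CFD25:

- two-level solid loading and particle size distribution;
- three-level crystallization and wash solvents, with effects coding;
- a main-effects-only model;
- a run count of 12;
- the function names.

```python
import itertools

import numpy as np

CRYST = ["ethanol", "isopropanol", "3-methylbutan-1-ol"]
WASH = ["n-heptane", "isopropyl acetate", "n-dodecane"]


def effects_code(level, n_levels):
    """Sum-to-zero coding: n_levels - 1 columns, last level coded as all -1."""
    row = np.zeros(n_levels - 1)
    if level < n_levels - 1:
        row[level] = 1.0
    else:
        row[:] = -1.0
    return row


def model_row(run):
    """Main-effects model row: intercept, loading, PSD, solvent effects, wash effects."""
    load, psd, cryst, wash = run
    return np.concatenate(([1.0, load, psd],
                           effects_code(cryst, len(CRYST)),
                           effects_code(wash, len(WASH))))


def log_det(X, ridge=1e-8):
    """log det(X'X + ridge*I); the small ridge lets singular designs still be ranked."""
    return np.linalg.slogdet(X.T @ X + ridge * np.eye(X.shape[1]))[1]


def d_optimal(n_runs, seed=0, max_sweeps=50):
    """Point-exchange search for a D-optimal subset of the full candidate grid."""
    # Candidates: loading (-1 low, +1 high), PSD (-1 fine, +1 coarse), solvent index, wash index
    candidates = list(itertools.product([-1, 1], [-1, 1], range(3), range(3)))
    X_all = np.array([model_row(c) for c in candidates])
    rng = np.random.default_rng(seed)
    design = [int(k) for k in rng.choice(len(candidates), size=n_runs, replace=False)]
    best = log_det(X_all[design])
    for _ in range(max_sweeps):
        improved = False
        for i in range(n_runs):
            for j in range(len(candidates)):
                trial = design.copy()
                trial[i] = j  # replicate runs are allowed in this sketch
                value = log_det(X_all[trial])
                if value > best + 1e-9:
                    design, best, improved = trial, value, True
        if not improved:
            break
    sign, logdet = np.linalg.slogdet(X_all[design].T @ X_all[design])
    return [candidates[k] for k in design], (logdet if sign > 0 else -np.inf)


runs, score = d_optimal(12)
for load, psd, cryst, wash in runs:
    print(load, psd, CRYST[cryst], WASH[wash])
```

Changing `n_runs` or adding interaction columns to `model_row` changes which runs are selected. Dedicated DoE software would typically also report design efficiency measures, which this sketch omits.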
In this work, we present a microfluidic approach that allows nucleation studies to be performed under different fluid dynamic conditions. We determine primary nucleation rates and nucleation kinetic parameters for adipic acid solutions using liquid/liquid segmented flow in capillary tubes, in which the crystallizing medium is partitioned into small droplets. We do so by measuring the probability of crystal presence within individual droplets as a function of time, droplet volume, and supersaturation, under two conditions: stagnant (motionless droplets) and flow (moving droplets). We compared the experimental results with the predictions of the classical nucleation theory model and of the mononuclear nucleation mechanism model. From this comparison, we conclude that adipic acid nucleates mainly via a heterogeneous mechanism under both fluid dynamic conditions. Furthermore, we show that flow conditions enhance the primary nucleation rate by increasing the kinetic parameters of the process without affecting the thermodynamic parameters. We discuss a possible mechanism: the internal recirculation that occurs within moving droplets enhances the attachment frequency of nucleation.
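The droplet-based analysis can be sketched in Python under common assumptions for such experiments:

- Nucleation in each droplet is a Poisson process with a constant rate J.
- Detection lags nucleation by a fixed growth time t_g (the single-nucleus picture), so the fraction of droplets containing crystals is P(t) = 1 - exp(-J V (t - t_g)).
- J follows the classical nucleation theory form J = A exp(-B / ln^2 S), where A is the kinetic (pre-exponential) factor and B the thermodynamic parameter.

These expressions, the function names, and the synthetic parameter values are illustrative assumptions. The sketch only shows how J, t_g, A, and B can be extracted from such data; it is not the authors' fitting procedure.

```python
import numpy as np


def crystal_probability(t, J, V, t_g=0.0):
    """Fraction of droplets with a detected crystal: 1 - exp(-J*V*(t - t_g)) for t > t_g."""
    t = np.asarray(t, dtype=float)
    lag = np.clip(t - t_g, 0.0, None)  # no detection before the growth time
    return 1.0 - np.exp(-J * V * lag)


def fit_rate(t, P, V):
    """Linearize -ln(1 - P) = J*V*t - J*V*t_g and fit a line to the points with 0 < P < 1."""
    t = np.asarray(t, dtype=float)
    P = np.asarray(P, dtype=float)
    mask = (P > 0.0) & (P < 1.0)
    slope, intercept = np.polyfit(t[mask], -np.log(1.0 - P[mask]), 1)
    return slope / V, -intercept / slope  # (J, t_g)


def fit_cnt(S, J):
    """Fit ln J = ln A - B / ln(S)**2 and return (A, B)."""
    x = 1.0 / np.log(np.asarray(S, dtype=float)) ** 2
    slope, intercept = np.polyfit(x, np.log(np.asarray(J, dtype=float)), 1)
    return np.exp(intercept), -slope


# Synthetic check: 1 uL droplets (1e-9 m^3), J = 1e7 m^-3 s^-1, t_g = 20 s
V = 1e-9
t = np.linspace(0.0, 300.0, 31)
P = crystal_probability(t, J=1e7, V=V, t_g=20.0)
J_fit, t_g_fit = fit_rate(t, P, V)

# Synthetic CNT data with assumed A = 1e9 m^-3 s^-1 and B = 2.0
S = np.array([1.5, 2.0, 2.5, 3.0])
J_S = 1e9 * np.exp(-2.0 / np.log(S) ** 2)
A_fit, B_fit = fit_cnt(S, J_S)
```

Fitting stagnant-droplet and moving-droplet data separately in this way would show whether flow changes the kinetic factor A while leaving the thermodynamic parameter B unchanged.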
This chapter discusses the fundamental aspects of nucleation and particle formation in the context of continuous crystallization, with a main focus on nucleation and crystal growth. Classical crystallization phenomena, whether fundamental, thermodynamically driven, or kinetically driven, are not covered here. Primary and secondary nucleation, as well as some methods of nuclei generation in continuous crystallization, are discussed. The chapter also addresses the performance of continuous crystallization processes and challenges with process kinetics and control, such as seeding, mixing, and process dynamics.
Although attempts have been made to solve the problem of clustering categorical data via cluster ensembles, with results competitive with conventional algorithms, these techniques unfortunately generate a final data partition based on incomplete information. The underlying ensemble-information matrix presents only cluster-data point relations, with many entries left unknown. The paper presents an analysis suggesting that this problem degrades the quality of the clustering result. It then presents a new link-based approach that improves the conventional matrix by discovering unknown entries through the similarity between clusters in an ensemble. In particular, an efficient link-based algorithm is proposed for the underlying similarity assessment. To obtain the final clustering result, a graph partitioning technique is then applied to a weighted bipartite graph formulated from the refined matrix. Experimental results on multiple real data sets suggest that the proposed link-based method almost always outperforms both conventional clustering algorithms for categorical data and well-known cluster ensemble techniques.
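A weighted bipartite graph of this kind can be partitioned in several ways. It has data points on one side, clusters on the other, and refined-matrix entries as edge weights. The Python sketch below swaps in spectral co-clustering (Dhillon's normalized-SVD method) and splits the points into two groups by the sign of the second singular vector. The partitioner, the two-group restriction, the function names, and the toy matrix are assumptions for illustration, not the technique used in the paper.

```python
import numpy as np


def bipartite_adjacency(RM):
    """Full adjacency of the bipartite graph: points 0..n-1, clusters n..n+k-1."""
    n, k = RM.shape
    A = np.zeros((n + k, n + k))
    A[:n, n:] = RM
    A[n:, :n] = RM.T
    return A


def spectral_bipartition(RM):
    """Two-way spectral co-clustering of the points.

    The SVD of D1^-1/2 RM D2^-1/2 carries the same information as the normalized
    eigenproblem of the full bipartite adjacency, at lower cost.
    """
    d1 = RM.sum(axis=1)  # weighted degree of each point node
    d2 = RM.sum(axis=0)  # weighted degree of each cluster node
    An = RM / np.sqrt(d1)[:, None] / np.sqrt(d2)[None, :]
    U, _, _ = np.linalg.svd(An)
    z = U[:, 1] / np.sqrt(d1)  # second left singular vector, rescaled
    return (z > 0).astype(int)


# Toy refined matrix: points 0-2 belong to clusters 0 and 2, points 3-5 to clusters 1 and 3,
# with small similarity-derived entries elsewhere
RM = np.array([
    [1.0, 0.1, 1.0, 0.1],
    [1.0, 0.1, 1.0, 0.1],
    [1.0, 0.1, 1.0, 0.1],
    [0.1, 1.0, 0.1, 1.0],
    [0.1, 1.0, 0.1, 1.0],
    [0.1, 1.0, 0.1, 1.0],
])
A = bipartite_adjacency(RM)
labels = spectral_bipartition(RM)
print(labels)
```

For more than two groups, Dhillon's method stacks several singular vectors and runs k-means on them. The zero threshold used here simplifies that step, and the sign of a singular vector, and hence which group is labelled 1, can differ between runs or libraries.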