Cluster Analysis in Practice: Dealing with Outliers in Managerial Research

Marques et al. 2022, Rev. contab. finanç.

This article investigated the influence of organizational configurations on startup performance. The gap it addressed was the need to analyze several factors simultaneously, allowing for equifinality in explaining startup performance. A survey of 112 startups in southern Brazil was conducted, and the configurations were derived with cluster analysis. Chi-squared tests and analysis of covariance (ANCOVA) were used to identify the effect of organizational configurations on startup performance. The results reinforced the assumptions of the configurational approach, highlighting the interdependence of imperatives in explaining organizational performance. The main distinguishing characteristics of the three startup configurations found were: size; characteristics of the information from the management control system (MCS); entrepreneurial orientation (EO); cost leadership strategy (CLS); acceleration; and entrepreneurial source of investment (ESI). The results showed that differences in the characteristics of the MCS information and in the level of EO represent a deviation from the ideal configuration and are associated with a drop in performance. The paper extends knowledge of the imperatives investigated in the startup context and of how they interact to form configurations. The configurations proved relevant in explaining performance, corroborating the idea of equifinality: two distinct configurations showed similar performance. By analyzing the best-performing configurations, managers can assess which configuration their startup belongs to and guide actions to improve the startup success rate.
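The abstract mentions chi-squared tests relating configuration membership to performance. As a minimal sketch, assuming performance is recorded as a categorical level, the Pearson chi-squared statistic for a configuration-by-performance contingency table can be computed as follows. The function names and counts are hypothetical and do not come from the study, and the sketch stops at the statistic and its degrees of freedom rather than a p-value.

```python
# Hypothetical sketch: Pearson chi-squared statistic for an r x c table of
# counts (rows = cluster/configuration, columns = performance level).
# The table used below is invented for illustration only.


def chi_squared(table: list[list[int]]) -> float:
    """Return the Pearson chi-squared statistic for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if rows and columns were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    return stat


def degrees_of_freedom(table: list[list[int]]) -> int:
    """(rows - 1) * (columns - 1) for an r x c contingency table."""
    return (len(table) - 1) * (len(table[0]) - 1)


# Hypothetical counts: 3 configurations x 2 performance levels (low, high).
counts = [[12, 8], [5, 15], [14, 6]]
print(round(chi_squared(counts), 2))  # 8.94
print(degrees_of_freedom(counts))     # 2
```

In practice, `scipy.stats.chi2_contingency` returns the statistic, p-value, degrees of freedom and expected counts in one call; note that by default it applies Yates' continuity correction when there is exactly one degree of freedom.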

Section: Discussion (mentioning)


Influence of organizational configurations on startup performance

Marques et al. 2022, Rev. contab. finanç.

“…K-means clustering is a popular method; however, we should perhaps experiment using the k-medoids clustering method. k-medoids is a partitioning method that is best suited for domains requiring robustness to outliers, inconsistent distance metrics, or the dataset with no clear definition of mean or median [53]. The k-medoids algorithm returns medoids which are the actual data points in the dataset.…”
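To illustrate why medoids are actual data points and are less affected by outliers than means, the sketch below compares the mean and the medoid of a small, made-up one-dimensional sample. The helper names are illustrative assumptions; the medoid is taken as the member with the smallest total distance to all others, and the distance function is a parameter, so any dissimilarity can be plugged in, which is what lets k-medoids work with non-Euclidean metrics.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean: the kind of centre k-means uses for a cluster."""
    return sum(values) / len(values)


def medoid(points: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the member of `points` with the smallest total distance to all members."""
    return min(points, key=lambda c: sum(dist(c, p) for p in points))


def absolute_difference(a: float, b: float) -> float:
    """Illustrative 1-D dissimilarity; any other distance could be used."""
    return abs(a - b)


# Hypothetical 1-D sample; 100.0 plays the role of an outlier.
data = [1.0, 2.0, 3.0, 4.0, 100.0]

print(mean(data))                         # 22.0: pulled towards the outlier
print(medoid(data, absolute_difference))  # 3.0: an actual data point in the bulk
```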

Section: Limitations (mentioning)


Data-driven versus a domain-led approach to k-means clustering on an open heart failure dataset

Jasinska-Piadlo, Bond, Biglarbeigi et al. 2022, Int J Data Sci Anal

Domain-driven data mining of health care data poses unique challenges. The aim of this paper is to explore the advantages and challenges of a ‘domain-led approach’ versus a ‘data-driven approach’ to a k-means clustering experiment. In the domain-led approach, clinical experts in heart failure selected the variables used for k-means clustering, whereas in the data-driven approach feature selection was performed by applying principal component analysis to the multidimensional dataset. Six of the seven features selected by physicians were among the 26 features that contributed most to the significant principal components used in the k-means algorithm. The data-driven approach showed an advantage over the domain-led approach for feature selection by removing the risk of bias that domain experts can introduce. Whilst the domain-led approach may prevent the discovery of knowledge hidden in variables not routinely considered clinically important, domain knowledge played an important role at the interpretation stage of the clustering experiment, providing context and preventing far-fetched conclusions. The data-driven approach was accurate in identifying clusters with distinct features at the physiological level. To promote domain-led data mining, as a result of this experiment we developed a practical checklist for integrating domain knowledge into data mining projects.
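For reference, the following is a minimal sketch of standard k-means (Lloyd's algorithm) on numeric feature vectors. It does not reproduce the paper's PCA-based or expert-based feature selection: it assumes the features have already been chosen and scaled, and the function names and sample points are hypothetical. Because each centre is a coordinate-wise mean, a single extreme value can shift it, which is why k-means is described as sensitive to outliers.

```python
import random

Point = tuple[float, ...]


def squared_distance(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(cluster: list[Point]) -> Point:
    # Coordinate-wise mean; one extreme point can pull this far away.
    return tuple(sum(coords) / len(cluster) for coords in zip(*cluster))


def nearest(p: Point, centres: list[Point]) -> int:
    return min(range(len(centres)), key=lambda i: squared_distance(p, centres[i]))


def kmeans(points: list[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate assignment and mean-update steps until stable."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)  # k data points drawn without replacement
    for _ in range(max_iter):
        clusters: list[list[Point]] = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centres)].append(p)
        # Keep the previous centre if a cluster ends up empty.
        updated = [centroid(c) if c else centres[i] for i, c in enumerate(clusters)]
        if updated == centres:
            break
        centres = updated
    labels = [nearest(p, centres) for p in points]
    return centres, labels


# Hypothetical, already-scaled 2-D features with two obvious groups.
pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centres, labels = kmeans(pts, k=2)
print(centres, labels)
```

Note that k must be fixed before running, and results can depend on the initial centres; the fixed `seed` only makes a run reproducible.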

“…Though k-means clustering has become a popular data-clustering algorithm [69,70], it is sensitive to outliers [68]. An alternative clustering method that is robust to outliers is the partitioning around medoids (PAM) algorithm [69,71]. PAM requires that the optimal number of clusters be determined before the algorithm is applied [70].…”
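The snippet notes that PAM needs the number of clusters fixed before it is applied. The following is a simplified sketch of PAM in that spirit, assuming k is supplied by the caller: a greedy BUILD phase picks initial medoids, and a SWAP phase exchanges a medoid with a non-medoid whenever this lowers the total dissimilarity. It accepts the first improving swap it finds rather than the best swap per pass, as in the original formulation, so it is an illustration under those assumptions rather than a reference implementation; the function names and data are hypothetical.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def total_cost(points: Sequence[T], medoids: list[int], dist: Callable[[T, T], float]) -> float:
    """Sum of each point's dissimilarity to its closest medoid (medoids are indices)."""
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)


def pam(points: Sequence[T], k: int, dist: Callable[[T, T], float]):
    n = len(points)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of points")
    # BUILD: greedily add the point that most reduces the total cost.
    medoids: list[int] = []
    while len(medoids) < k:
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(points, medoids + [i], dist),
        )
        medoids.append(best)
    # SWAP: replace a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(points, medoids, dist)
    improved = True
    while improved:
        improved = False
        for slot in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:slot] + [h] + medoids[slot + 1:]
                candidate_cost = total_cost(points, candidate, dist)
                if candidate_cost < cost:
                    medoids, cost, improved = candidate, candidate_cost, True
    # Assign every point to its closest medoid.
    labels = [min(range(k), key=lambda j: dist(p, points[medoids[j]])) for p in points]
    return [points[m] for m in medoids], labels, cost


def absolute_difference(a: float, b: float) -> float:
    return abs(a - b)


data = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # hypothetical, two obvious groups
medoids, labels, cost = pam(data, k=2, dist=absolute_difference)
print(medoids)  # [2.0, 11.0]
print(labels)   # [0, 0, 0, 1, 1, 1]
print(cost)     # 4.0
```

Each SWAP pass evaluates up to k(n - k) candidate swaps at O(kn) cost each, so this naive version suits only small datasets; established implementations, such as `pam` in R's `cluster` package, are preferable for real analyses.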

Section: Cluster Analysis (mentioning)


Benefit Segmentation of Tourists to Geosites and Its Implications for Sustainable Development of Geotourism in the Southern Lake Tana Region, Ethiopia

et al. 2022

Geotourism is a sustainable type of tourism that focuses on the geological and geomorphological heritage of an area and its associated cultural and biodiversity features. Although geotourism is rapidly growing in popularity, research on the demand side, particularly on segmenting tourists to geosites and understanding their profiles, is limited. This makes it difficult to design effective policies for developing geotourism sustainably. Hence, the main objectives of this study were to segment and profile tourists to geosites based on the benefits sought, and to show the implications for the sustainable development of geotourism. Based on a survey of 415 tourists, this study clustered tourists to geosites in the southern Lake Tana region of Ethiopia according to the benefits sought, using a factor–cluster method. The study identified four distinct segments: Activity–Nature Lovers, Culture Lovers, Nature–Culture Lovers, and Want-It-Alls. These segments differed in their demographic, trip, and behavioral characteristics. The findings imply that, for sustainable development, destination managers and marketers need to tailor geotourism product development and marketing strategies to the needs and characteristics of each market segment.