We examine whether stock price effects can be automatically predicted by analyzing unstructured textual information in financial news. Accordingly, we enhance existing text mining methods to evaluate the information content of financial news as an instrument for investment decisions. The main contribution of this paper is the use of more expressive features to represent text and the employment of market feedback as part of our word selection process. In our study, we show that robust feature selection, when combined with complex feature types, lifts classification accuracy significantly above that of previous approaches. This is because our approach selects semantically relevant features and thus reduces the problem of over-fitting when applying a machine learning approach. The methodology can be transferred to any other application area that provides textual information and corresponding effect data.
Elastic n-tier applications have non-stationary workloads that require adaptive control of the resources allocated to them. This presents not only an opportunity in pay-as-you-use clouds, but also a challenge: virtual machines must be allocated dynamically and appropriately. Previous approaches based on control theory, queuing networks, and machine learning work well for some situations, but each model has its own limitations due to inaccuracies in performance prediction. In this paper we propose a multi-model controller, which integrates adaptation decisions from several models and chooses the best. The focus of our work is an empirical model, based on detailed measurement data from previous application runs. The main advantage of the empirical model is that it returns high-quality performance predictions based on measured data. For new application scenarios, we use other models or heuristics as a starting point, and all performance data are continuously incorporated into the empirical model's knowledge base. Using a prototype implementation of the multi-model controller, a cloud testbed, and an n-tier benchmark (RUBBoS), we evaluated and validated the advantages of the empirical model. For example, measured data show that it is more effective to add two nodes as a group, one for each tier, when two tiers approach saturation simultaneously.
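The controller idea in the abstract above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the "best" decision is chosen, so this sketch assumes the empirical model is preferred whenever it holds measured data for a configuration, and a heuristic is used otherwise. The names `EmpiricalModel`, `heuristic_predict`, and `choose_config`, the `(app_servers, db_servers)` configuration tuple, and the 100 requests/s per-server capacity are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical configuration: (app_servers, db_servers).
Config = tuple[int, int]


@dataclass
class EmpiricalModel:
    """Knowledge base of measured throughput, keyed by (config, workload)."""
    measurements: dict = field(default_factory=dict)

    def record(self, config: Config, workload: int, throughput: float) -> None:
        # Every observed run is added to the knowledge base.
        self.measurements[(config, workload)] = throughput

    def predict(self, config: Config, workload: int) -> Optional[float]:
        # Returns None when this scenario has never been measured.
        return self.measurements.get((config, workload))


def heuristic_predict(config: Config, workload: int) -> float:
    # Assumed fallback: the smaller tier limits throughput,
    # at a made-up capacity of 100 requests/s per server.
    app, db = config
    return float(min(workload, 100 * min(app, db)))


def predict(empirical: EmpiricalModel, config: Config, workload: int):
    # Assumed selection rule: measured data wins over the heuristic.
    measured = empirical.predict(config, workload)
    if measured is not None:
        return measured, "empirical"
    return heuristic_predict(config, workload), "heuristic"


def choose_config(empirical, candidates, workload, target):
    """Return the smallest candidate predicted to reach the target throughput."""
    for config in sorted(candidates, key=sum):
        throughput, _source = predict(empirical, config, workload)
        if throughput >= target:
            return config
    return None


kb = EmpiricalModel()
candidates = [(1, 1), (2, 1), (1, 2), (2, 2)]
first_choice = choose_config(kb, candidates, workload=150, target=150)

# A later measured run overrides the heuristic for that scenario.
kb.record((2, 1), 150, 160.0)
second_choice = choose_config(kb, candidates, workload=150, target=150)
```

With an empty knowledge base the decision rests entirely on the heuristic; once a measurement for a candidate is recorded, that measurement takes precedence for the same workload.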
The complexity of today's large-scale enterprise applications requires system administrators to monitor enormous numbers of metrics and to reconfigure their hardware and software at run-time without a thorough understanding of the monitoring results. The Elba project is designed to achieve automated iterative staging to mitigate the risk of violating Service Level Objectives (SLOs). As part of Elba, we undertake performance characterization of systems to detect bottlenecks in their configurations. In this paper, we introduce the concrete bottleneck detection approach used in Elba, and then show its robustness and accuracy in various configuration scenarios. We utilize a well-known benchmark application, RUBiS (Rice University Bidding System), to evaluate the classifier with respect to successful identification of different bottlenecks.
The performance evaluation of database servers in N-tier applications is a serious challenge due to requirements such as non-stationary complex workloads and global consistency management when replicating database servers. We conducted an experimental evaluation of database server scalability and bottleneck identification in N-tier applications using the RUBBoS benchmark. Our experiments comprise a full scale-out mesh with up to nine database servers and three application servers. Additionally, the four-tier system was run in a variety of configurations, including two database management systems (MySQL and PostgreSQL), two hardware node types (normal and low-cost), and two database replication techniques (C-JDBC and MySQL Cluster). In this paper we present the analysis of results generated with a read-intensive interaction pattern (browse-only workload) in the client emulator. These empirical data can be divided into two kinds. First, for a relatively small number of servers, we find simple hardware resource bottlenecks. Consequently, system throughput increases with an increasing number of database (and application) servers. Second, when sufficient hardware resources are available, we find non-obvious database-related bottlenecks that limit system throughput. While the first kind of bottleneck shows that there are similarities between database and application/web server scalability, the second kind shows that database servers have significantly higher sophistication and complexity that require in-depth evaluation and analysis.