Abstract. The choice of the kernel function is crucial to most applications of support vector machines (SVMs). In this paper, however, we show that in the case of text classification, term-frequency transformations have a larger impact on SVM performance than the kernel itself. We discuss the role of importance weights (e.g., document frequency and redundancy), which is not yet fully understood in terms of model complexity and computational cost, and we show that time-consuming lemmatization or stemming can be avoided even when classifying a highly inflectional language such as German.
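To make the terms above concrete, the following is a minimal sketch of a term-frequency transformation combined with an importance weight. It is not the method evaluated in the paper: the function names (`tf_transform`, `idf_weights`, `vectorize`), the three schemes, the use of inverse document frequency as the importance weight, and the toy corpus are all assumptions chosen for illustration. The redundancy weight mentioned in the abstract is not implemented here.

```python
import math
from collections import Counter


def tf_transform(count, scheme="log"):
    """Map a raw term count to a transformed frequency (illustrative schemes)."""
    if scheme == "raw":
        return float(count)
    if scheme == "log":
        return math.log(1.0 + count)
    if scheme == "binary":
        return 1.0 if count > 0 else 0.0
    raise ValueError(f"unknown scheme: {scheme}")


def idf_weights(docs):
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}


def vectorize(doc, idf, scheme="log"):
    """Sparse, L2-normalized vector {term: weight} for one tokenized document."""
    counts = Counter(doc)
    vec = {t: tf_transform(c, scheme) * idf.get(t, 0.0) for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm > 0 else vec


# Hypothetical toy corpus of already tokenized documents.
docs = [
    ["kernel", "svm", "svm", "text"],
    ["text", "frequency", "frequency", "frequency"],
    ["stemming", "german", "text"],
]
idf = idf_weights(docs)
vectors = [vectorize(d, idf) for d in docs]
```

In this sketch a term that occurs in every document (here `text`) receives an importance weight of zero, and the logarithmic scheme dampens the effect of repeated terms. The resulting sparse vectors could then be passed to any SVM implementation with the kernel of choice.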
Science Created by You (SCY) is a project on learning in science and technology domains. SCY uses a pedagogical approach that centres around products, called 'emerging learning objects' (ELOs), that are created by students. Students work individually and collaboratively in SCY-Lab (the general SCY learning environment) on 'missions' that are guided by socio-scientific questions (for example, 'How can we design a CO2-friendly house?'). Fulfilling SCY missions requires a combination of knowledge from different content areas (e.g., physics, mathematics, biology, as well as social sciences). While on a SCY mission, students perform several types of learning activities that can be characterised as productive processes (experiment, game, share, explain, design, etc.); they encounter multiple resources, collaborate with varying coalitions of peers, and use changing constellations of tools and scaffolds. The configuration of SCY-Lab adapts to the actual learning situation and may provide advice to students on appropriate learning activities, resources, tools and scaffolds, or on peer students who can support the learning process. The SCY project is aimed at students between 12 and 18 years old. In the course of the project, a total of four SCY missions will be developed, of which one is currently available.
We define behavior as a set of actions performed by some actor during a period of time. We consider the problem of analyzing a large collection of behaviors by multiple actors; more specifically, identifying typical behaviors and spotting anomalous behaviors. We propose an approach that leverages topic modeling techniques, namely LDA (Latent Dirichlet Allocation) ensembles, to represent categories of typical behaviors by topics obtained by applying topic modeling to a behavior collection. When such methods are applied to natural-language text, the quality of the extracted topics is usually judged by the semantic relatedness of the terms pertinent to the topics. This criterion, however, is not necessarily applicable to topics extracted from non-textual data, such as action sets, since relationships between actions may not be obvious. We have developed a suite of visual and interactive techniques that support the construction of an appropriate combination of topics based on other criteria, such as distinctiveness and coverage of the behavior set. Two case studies, one analyzing operation behaviors in a security management system and one analyzing visiting behaviors in an amusement park, together with an expert evaluation of the first case study, demonstrate the effectiveness of our approach.
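The definition of behavior given above can be illustrated with a short sketch that turns an event log into per-actor, per-period bags of actions, the kind of count matrix a topic model such as LDA takes as input. This is only a sketch under stated assumptions, not the authors' pipeline: the `(actor, timestamp, action)` event format, the one-day period, the function names, and the sample events are all hypothetical.

```python
from collections import Counter, defaultdict
from datetime import datetime


def behaviors_from_log(events):
    """Group (actor, ISO timestamp, action) events into per-actor, per-day bags.

    Assumption: one behavior = all actions of one actor on one calendar day.
    """
    bags = defaultdict(Counter)
    for actor, timestamp, action in events:
        day = datetime.fromisoformat(timestamp).date()
        bags[(actor, day)][action] += 1
    return dict(bags)


def count_matrix(bags):
    """Return (behavior keys, action vocabulary, rows of action counts)."""
    vocab = sorted({action for bag in bags.values() for action in bag})
    keys = sorted(bags)
    rows = [[bags[key][action] for action in vocab] for key in keys]
    return keys, vocab, rows


# Hypothetical event log; actor and action names are made up.
events = [
    ("alice", "2024-01-01T09:00:00", "login"),
    ("alice", "2024-01-01T09:05:00", "view_alarm"),
    ("alice", "2024-01-01T09:06:00", "view_alarm"),
    ("bob", "2024-01-01T10:00:00", "login"),
    ("alice", "2024-01-02T09:00:00", "login"),
]
keys, vocab, rows = count_matrix(behaviors_from_log(events))
```

Each row of `rows` plays the role of a document and each action the role of a term, so the matrix can be handed to any LDA implementation; running several models with different settings would give the pool of candidate topics from which an ensemble is selected.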