Tf–idf

Uther, William; Ciaramita, Massimiliano; Berendt, Bettina; Kołcz, Aleksander; Grobelnik, Marko; Mladenić, Dunja; Witbrock, Michael; Risch, John; Bohn, Shawn; Poteet, Steve; Kao, Anne; Quach, Lesley; Wu, Jason; Keogh, Eamonn; Miikkulainen, Risto; Flener, Pierre; Schmid, Ute; Zheng, Fei; Webb, Geoffrey I.; Nijssen, Siegfried

doi:10.1007/978-0-387-30164-8_832

Cited by 84 publications

(25 citation statements)

References 0 publications

“…For building conventional ML models, we chose Random Forest (RF) [2] and Support Vector Machine (SVM) [41] as candidate models since these models are observed to be superior performers in the classification of short texts like social media comments [35]. The features that we used in these model building are Regex status and sentiment features along with TF-IDF Vectorizer [33]. The sentiment-related features used in our work are sentiment polarity at the comment level, which can be positive, negative or neutral, and polarity score, which indicates how strongly the sentiment has been expressed in the comment.…”

Section: ML-based Classification (mentioning)

confidence: 99%

Towards offensive language detection and reduction in four Software Engineering communities

Cheriyan

Savarimuthu

Cranefield

2021

Evaluation and Assessment in Software Engineering

Software Engineering (SE) communities such as Stack Overflow have become unwelcoming, particularly through members' use of offensive language. Research has shown that offensive language drives users away from active engagement within these platforms. This work aims to explore this issue more broadly by investigating the nature of offensive language in comments posted by users in four prominent SE platforms: GitHub, Gitter, Slack and Stack Overflow (SO). It proposes an approach to detect and classify offensive language in SE communities by adopting natural language processing and deep learning techniques. Further, a Conflict Reduction System (CRS), which identifies offence and then suggests what changes could be made to minimize offence, has been proposed. Beyond showing the prevalence of offensive language in over 1 million comments from four different communities, which ranges from 0.07% to 0.43%, our results show promise in successful detection and classification of such language. The CRS system has the potential to drastically reduce manual moderation efforts to detect and reduce offence in SE communities.
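
For readers unfamiliar with the TF-IDF features mentioned in the citation statement above, one common formulation of the weighting is given below. Several variants exist (smoothed or normalized forms, for example), and the quoted work does not state which one it uses, so this is only an illustrative assumption:

$$\mathrm{tfidf}(t, d) = \mathrm{tf}(t, d) \cdot \log\frac{N}{\mathrm{df}(t)}$$

Here $\mathrm{tf}(t, d)$ is the relative frequency of term $t$ in document $d$, $N$ is the number of documents in the collection, and $\mathrm{df}(t)$ is the number of documents that contain $t$. A term that appears in every document gets weight $\log 1 = 0$, which is why very common words contribute little to the feature vector.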

“…For each evaluator, we first collect the feature descriptions of all 115 clone groups that were resolved as valid in the intra-clone group validation phase. To reduce manual effort and chances of incurring human error while analyzing all $\binom{115}{2} = 6555$ combinations of feature descriptions, we form a subset of the clone group descriptions of each evaluator based on TF-IDF [114] similarity. After stemming all words in the descriptions, we calculate pair-wise similarity between all feature descriptions of an evaluator using a TF-IDF similarity score.…”

Section: Inter-clone Group Dissimilarity Validation (mentioning)

confidence: 99%

FACER: An API Usage-based Code-example Recommender for Opportunistic Reuse

Abid

Shamail

Basit

et al. 2021

Preprint

To save time, developers often search for code examples that implement their desired software features. Existing code search techniques typically focus on finding code snippets for a single given query, which means that developers need to perform a separate search for each desired functionality. In this paper, we propose FACER (Feature-driven API usage-based Code Examples Recommender), a technique that avoids repeated searches through opportunistic reuse. Specifically, given the selected code snippet that matches the initial search query, FACER finds and suggests related code snippets that represent features that the developer may want to implement next. FACER first constructs a code fact repository by parsing the source code of open-source Java projects to obtain methods’ textual information, call graphs, and Application Programming Interface (API) usages. It then detects unique features by clustering methods based on similar API usages, where each cluster represents a feature or functionality. Finally, it detects frequently co-occurring features across projects using frequent pattern mining and recommends related methods from the mined patterns. To evaluate FACER, we run it on 120 Java Android apps from GitHub. We first manually validate that the detected method clusters represent methods with similar functionality. We then perform an automated evaluation to determine the best parameters (e.g., similarity threshold) for FACER. We recruit 10 professional developers along with 39 experienced students to judge FACER’s recommendation of related methods. Our results show that, on average, FACER’s recommendations are 80% precise. We also survey a total of 20 professional Android and Java developers to understand their code search and reuse experiences, and also to obtain their feedback on the usability and usefulness of FACER. The survey results show that 95% of our surveyed professional developers find the idea of related method recommendations useful during code reuse.
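
The pair-wise TF-IDF similarity step described in the FACER citation statement above can be illustrated with a minimal sketch. It is not the authors' implementation: the tokenizer, the unsmoothed `tf * log(N / df)` weighting, the use of cosine similarity, and the sample `descriptions` are all assumptions made for illustration, and stemming is omitted.

```python
import math
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lowercase and split on whitespace (assumption: no stemming here)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either has zero norm."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def pairwise_similarity(docs):
    """Score every unordered pair of documents, n * (n - 1) / 2 pairs in total."""
    vectors = tfidf_vectors(docs)
    return {
        (i, j): cosine(vectors[i], vectors[j])
        for i, j in combinations(range(len(docs)), 2)
    }


# Hypothetical feature descriptions, invented for this example.
descriptions = [
    "play audio file",
    "play audio stream",
    "open database connection",
]
scores = pairwise_similarity(descriptions)
```

With 115 descriptions, `pairwise_similarity` would score the 6555 pairs mentioned in the quote. Ranking or thresholding those scores is one way to select a smaller subset for manual review, though the exact selection rule is not given in the excerpt.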

“…Then obtained: Centroid 1=0.3 and Centroid 2=0.3 as explained in Table 5 and the visualization explained in Figure 2. Next calculate the distance of each data with each weight using (1) [24,25]:…”

Section: Figure 1 TF-IDF Data On Graph (mentioning)

confidence: 99%

Marketplace affiliates potential analysis using cosine similarity and vision-based page segmentation

Zulfikar

Irfan

Ghufron

et al. 2020

Bulletin EEI

One success factor of an online affiliate is determined by the quality of the content source. Therefore, affiliate marketplaces need to do an objective assessment to retrieve content data that will be used to choose the right product in the appropriate product filter. Usually, the selection is not made using a good and measured system so that the selection of product content is only based on parts that are not in accordance with what is seen or subjective. However, if analyzed using a good and measurable system will produce an objective product content and can have a positive impact on users because the selection is based on factual data. The purpose of this research is to analyze the potential of the affiliate marketplace by combining cosine similarity with vision-based page segmentation. This is a new breakthrough made for optimization to get the best content in accordance with the required criteria. This work will produce a number of product recommendations that are appropriate for publication and then made use of for comparison that matches the required criteria. At the limited evaluation stage, the performance of the proposed model obtained satisfactory results, in which 5 queries tested were all as expected.
