2021 IEEE/ACM 18th International Conference on Mining Software Repositories (MSR) 2021
DOI: 10.1109/msr52588.2021.00077

Search4Code: Code Search Intent Classification Using Weak Supervision

Cited by 8 publications (4 citation statements)
References 20 publications
“…Rao et al note that a significant bottleneck in this area of research is the lack of datasets as search logs can not be made public due to privacy laws (see also Section 5) [4]. In their later work [17], they release a limited dataset of anonymized software development-related search queries mined from Bing logs between September 1, 2019 and August 31, 2020. The dataset contains more than 11,000 real-world search queries related to the C# and Java programming languages, which they group into 7 categories based on their search intent: API, HowTo, Installation, Debug, Learn, Navigational and Miscellaneous.…”
Section: Software Development-related Queries (mentioning)
confidence: 99%
“…In this work, we select the Java programming language-related queries released by Rao et al [17], which consists of 6,596 queries. Some of the queries in the dataset are similar (e.g., 'java api', and 'java apis', 'java queue', and 'java queues', 'java for loop', and 'for loop java' ), and some are noisy (e.g., 'java chicken', 'java apple' ).…”
Section: Software Development-related Queries (mentioning)
confidence: 99%
“…Moreover, they describe the limits of anonymisation of such data and how secure anonymisation could help. Rao et al (2021) discuss how their data was anonymised and how they filter out queries that were entered by less than k users and could potentially contain sensitive information. Yamashita et al (2017) discuss how the history of Git repositories was re-written to remove personal data.…”
Section: Data Showcase (mentioning)
confidence: 99%
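
The filtering step mentioned in the statement above (dropping queries entered by fewer than k users) can be illustrated with a minimal sketch. This is not the authors' pipeline: the log format of `(user_id, query)` pairs, the lowercasing step, and the function name `filter_rare_queries` are all assumptions made for illustration.

```python
from collections import defaultdict


def filter_rare_queries(log, k):
    """Keep only queries issued by at least k distinct users.

    `log` is assumed to be an iterable of (user_id, query) pairs;
    this format is hypothetical, not the one used by Rao et al.
    """
    users_per_query = defaultdict(set)
    for user_id, query in log:
        # Normalise case and whitespace so trivial variants count together.
        users_per_query[query.strip().lower()].add(user_id)
    # A query seen from fewer than k users may identify an individual.
    return {q for q, users in users_per_query.items() if len(users) >= k}


# Toy log: one common query and one that only a single user typed.
log = [
    ("u1", "java for loop"),
    ("u2", "Java for loop"),
    ("u3", "java for loop"),
    ("u1", "john smith password reset"),
]
kept = filter_rare_queries(log, k=3)
```

Counting distinct users rather than raw occurrences matters here: a single user repeating a sensitive query many times should not push it over the threshold.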
“…To reduce the efforts of annotation, recent weak supervision (WS) frameworks have been proposed which focus on enabling users to leverage a diversity of weaker, often programmatic supervision sources [76,77,75] to label and manage training data in an efficient way. Recently, WS has been widely applied to various machine learning tasks in a diversity of domains: scene graph prediction [9], video analysis [23,92], image classification [12], image segmentation [35], autonomous driving [96], relation extraction [36,107,57], named entity recognition [82,53,50,45,27], text classification [78,100,85,86], dialogue system [63], biomedical [43,19,64], healthcare [20,17,21,80,93,81], software engineering [74], sensors data [24,39], E-commerce [66,103], and multi-agent systems [102].…”
Section: Introduction (mentioning)
confidence: 99%
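
To make the idea of programmatic supervision sources concrete, the sketch below labels search queries with a few keyword rules and combines them by majority vote. It is an illustration only: the rules, the function names, and the fallback to "Miscellaneous" are hypothetical, and the weak supervision frameworks cited above typically fit a learned label model over the rule outputs rather than taking a plain vote. The category names come from the intent classes quoted earlier on this page.

```python
from collections import Counter

ABSTAIN = None  # a rule returns this when it has no opinion


# Hypothetical keyword rules; each maps a lowercased query to a label or abstains.
def lf_howto(q):
    return "HowTo" if q.startswith("how to") else ABSTAIN


def lf_install(q):
    return "Installation" if "install" in q or "download" in q else ABSTAIN


def lf_debug(q):
    return "Debug" if "exception" in q or "error" in q else ABSTAIN


LABELING_FUNCTIONS = [lf_howto, lf_install, lf_debug]


def weak_label(query):
    """Combine rule votes by simple majority (a stand-in for a label model)."""
    q = query.lower()
    votes = [lf(q) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v is not ABSTAIN]
    if not votes:
        # Assumption: queries that no rule covers fall into the catch-all class.
        return "Miscellaneous"
    return Counter(votes).most_common(1)[0][0]


labels = [weak_label(q) for q in [
    "How to reverse a string in C#",
    "java nullpointerexception error",
    "java chicken",
]]
```

The noisy labels produced this way would normally serve as training data for a downstream classifier, which can generalise beyond the keywords the rules mention.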