2020
DOI: 10.48550/arxiv.2002.02314
Preprint

A Dataset for GitHub Repository Deduplication: Extended Description

citations
Cited by 1 publication
(1 citation statement)
references
References 0 publications
“…It has three stages: project discovery, data retrieval, and reorganization as shown in Figure 1, which is typical of most big data systems, that use the layered data approach where the initial layers accumulate and process raw data and the later layers produce cleaned/augmented data. We also perform data augmentation on the collected data, focusing on tasks like fork resolution [67] and author identity resolution [6,35]. The paper describes a rapidly evolving WoC prototype with some aspects of the system evolving over time.…”
Section: Building the WoC Infrastructure (mentioning)
confidence: 99%