Proceedings of the 17th International Conference on Mining Software Repositories 2020
DOI: 10.1145/3379597.3387500

A Dataset and an Approach for Identity Resolution of 38 Million Author IDs extracted from 2B Git Commits

Cited by 30 publications (21 citation statements)
References 16 publications
“…Dealing with duplicate or aliased entities from disparate sources of data has been discussed in literature [7,9,33]. As the final step of database preparation, LAGOON ingest batches are merged under a fusion process that de-duplicates entities across different batches by creating a single FusedEntity to represent them.…”
Section: Overview Of Lagoon (mentioning)
confidence: 99%
“…The disambiguation approach used in this analysis is based on simple heuristics and is therefore able to scale well to the large data sets used in our study. However, since conducting our study, new advanced disambiguation algorithms and data sets have been made available which should be considered for future studies (Amreen et al 2020; Fry et al 2020).…”
Section: Construct Validity (mentioning)
confidence: 99%
“…One paper explicitly mentioned that the data was used with consent of the organisation behind it (Gonzalez-Barahona et al 2015). One paper did discuss ethics issues and described how the data was pseudonymised (Fry et al 2020). Another paper had a detailed discussion on privacy, and legal and ethics issues (Robles et al 2014).…”
Section: Data Showcase (mentioning)
confidence: 99%