It was recently reported that men self-cite >50% more often than women across a wide variety of disciplines in the bibliographic database JSTOR. Here, we replicate this finding in a sample of 1.6 million papers from Author-ity, a version of PubMed with computationally disambiguated author names. More importantly, we show that the gender effect largely disappears when accounting for prior publication count in a multidimensional statistical model. Gender has the weakest effect on the probability of self-citation among an extensive set of features tested, including byline position, affiliation, ethnicity, collaboration size, time lag, subject-matter novelty, reference/citation counts, publication type, language, and venue. We find that self-citation is the hallmark of productive authors, of any gender, who cite their novel journal publications early and in similar venues, and more often cross citation-barriers such as language and indexing. As a result, papers by authors with short, disrupted, or diverse careers miss out on the initial boost in visibility gained from self-citations. Our data further suggest that this disproportionately affects women because of attrition and not because of disciplinary under-specialization.
We introduce several measures of novelty for a scientific article in MEDLINE based on the temporal profiles of its assigned Medical Subject Headings (MeSH). First, temporal profiles for all MeSH terms (and pairs of MeSH terms) were characterized empirically and modelled as logistic growth curves. Second, a paper's novelty is captured by its youngest MeSH (and pairs of MeSH) as measured in years and volume of prior work. Across all papers in MEDLINE published since 1985, we find that individual concept novelty is rare (2.7% of papers have a MeSH ≤ 3 years old; 1.0% have a MeSH ≤ 20 papers old), while combinatorial novelty is the norm (68% have a pair of MeSH ≤ 3 years old; 90% have a pair of MeSH ≤ 10 papers old). Furthermore, these novelty measures exhibit complex correlations with article impact (as measured by citations received) and authors' professional age.
Twitter uses machine learning to crop images, where crops are centered around the part predicted to be the most salient. In fall 2020, Twitter users raised concerns that the automated image cropping system on Twitter favored light-skinned over dark-skinned individuals, as well as concerns that the system favored cropping woman's bodies instead of their heads. In order to address these concerns, we conduct an extensive analysis using formalized group fairness metrics. We find systematic disparities in cropping and identify contributing factors, including the fact that the cropping based on the single most salient point can amplify the disparities because of an effect we term argmax bias. However, we demonstrate that formalized fairness metrics and quantitative analysis on their own are insufficient for capturing the risk of representational harm in automatic cropping. We suggest the removal of saliency-based cropping in favor of a solution that better preserves user agency. For developing a new solution that sufficiently address concerns related to representational harm, our critique motivates a combination of quantitative and qualitative methods that include human-centered design. CCS Concepts: • General and reference → Experimentation; • Human-centered computing → HCI theory, concepts and models; • Applied computing → Sociology; • Computing methodologies → Computer vision; • Social and professional topics → User characteristics.
scite is a Brooklyn-based organization that helps researchers better discover and understand research articles through Smart Citations–citations that display the context of the citation and describe whether the article provides supporting or contrasting evidence. scite is used by students and researchers from around the world and is funded in part by the National Science Foundation and the National Institute on Drug Abuse of the National Institutes of Health.