2017
DOI: 10.3145/epi.2017.sep.20

Semi-automatic generation of a corpus of Wikipedia articles on science and technology

Abstract: Eduard Aibar is an associate professor of Science and Technology Studies at the Arts & Humanities Department, Universitat Oberta de Catalunya (UOC). He has been the director of the Internet Interdisciplinary Institute and vice-president for research at UOC. He leads a research group on open science and innovation. His research has focused on the interaction between scientific and technological development and organizational and social change in areas such as eGovernment, town planning, and the Internet. He is …

Cited by 7 publications (5 citation statements). References 19 publications.
“…We followed the procedure described by Lam et al. [12] using a different set of categories to those used for the English Wikipedia, given the particularities of the categories in the Spanish Wikipedia. Several authors have stated that Wikipedia’s categories are more of a folksonomy than a true taxonomy [37–39], and cannot be completely relied upon to organize and navigate through its content. Furthermore, Wikipedia’s categories have tended to be more stable at the bottom (i.e., the category terms on Wikipedia pages do not change over time) than at the top level [40], making top-level categories less reliable because they are occasionally reorganized.…”
Section: Methods (mentioning)
confidence: 99%
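The category-based selection discussed in the statement above can be illustrated with a minimal sketch. It assumes a hypothetical toy graph (`CATEGORY_GRAPH`) in which each category maps to its subcategories and member articles, and a hypothetical helper `collect_articles`; it is not the procedure of Lam et al. [12] or of the cited paper. It only shows two practical consequences of the points quoted: because categories behave more like a folksonomy than a taxonomy, the traversal needs a visited set to survive cycles, and because lower-level categories are more stable, it starts from specific seed categories and limits how far down it goes.

```python
from collections import deque

# Hypothetical toy category graph: "Category:..." nodes are categories,
# all other nodes are articles. Optics links back to Physics (a cycle).
CATEGORY_GRAPH = {
    "Category:Physics": ["Category:Optics", "Laser"],
    "Category:Optics": ["Lens", "Laser", "Category:Physics"],
    "Category:Biology": ["Cell", "Category:Genetics"],
    "Category:Genetics": ["Gene", "Cell"],
}


def collect_articles(graph, seeds, max_depth):
    """Breadth-first traversal from seed categories (depth 0).

    Articles in any category up to max_depth levels below a seed are
    collected; a visited set prevents infinite loops on cycles.
    """
    seen = set(seeds)
    articles = set()
    queue = deque((seed, 0) for seed in seeds)
    while queue:
        node, depth = queue.popleft()
        for child in graph.get(node, []):
            if child in seen:
                continue  # already reached (possibly via a cycle)
            seen.add(child)
            if child.startswith("Category:"):
                if depth + 1 <= max_depth:
                    queue.append((child, depth + 1))
            else:
                articles.add(child)
    return articles


print(sorted(collect_articles(CATEGORY_GRAPH, ["Category:Physics"], 1)))
```

Breadth-first order is used so that each category is first reached along a shortest path, which keeps the depth limit meaningful even when the graph contains cycles.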
“…Even though a large amount of scientific and technological content is available on the web, most of it still belongs to closed, paid systems, as is the case with scientific journals and repositories. Wikipedia becomes an agent of transfer, using an organized structure that is accessible and linked to original sources (Minguillón et al, 2017). b. It is a system for the popularization and communication of scientific content that fosters the immediacy of the knowledge produced, given that it is published on the web immediately.…”
Section: Distinctive Characteristics of Wikipedia as a System of Div... (unclassified)
“…On the contrary to the common use of unsupervised machine learning methods, this work is based on supervised methods, incorporating the 'ground truth' knowledge from an expert classification scheme into the training/test data. Most of the related work based on Wikipedia utilizes the article interlinks or the category graph in conjunction with network analyses to identify articles/categories referring to disciplines or scientific concepts [33], [34], [37]. Those that use machine learning algorithms to classify Wikipedia articles as 'appropriate' or not in a specific context, train their models on a smaller number of manually engineered features and smaller datasets compared to the method presented in this work, which in its core module uses automatically extracted features of larger dimension and larger training/test datasets.…”
Section: Related Work (mentioning)
confidence: 99%
“…Then the Arts category and its related categories are mapped to UDC and compared for their structure. Minguillón et al. [34] present a semi-automatic method based on random walks to determine a subset of Wikipedia articles containing scientific and technological content. 60,108 Spanish Wikipedia pages in 340 communities were identified as containing scientific and technological content, reachable from 974 six-digit categories from the UNESCO nomenclature for fields of science and technology.…”
mentioning
confidence: 99%
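To give a rough sense of how random walks over the category graph can score which articles are reachable from a set of seed categories, here is a minimal, generic sketch. It is not the algorithm of Minguillón et al. [34]: the toy graph, the seed list, the parameters `walks_per_seed`, `max_steps` and `min_visits`, and the helper names `random_walk_visits` and `select_articles` are all hypothetical. In the cited work the seeds would correspond to categories mapped from the UNESCO nomenclature; here they are just two invented category names.

```python
import random
from collections import Counter

# Hypothetical toy graph: categories map to subcategories and member articles.
GRAPH = {
    "Category:Physics": ["Category:Optics", "Laser"],
    "Category:Optics": ["Lens", "Laser", "Category:Physics"],
    "Category:Biology": ["Cell", "Category:Genetics"],
    "Category:Genetics": ["Gene", "Cell"],
}


def random_walk_visits(graph, seeds, walks_per_seed=100, max_steps=5, seed=0):
    """Count how often each article is reached by short random walks
    that start at the seed categories."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    visits = Counter()
    for start in seeds:
        for _ in range(walks_per_seed):
            node = start
            for _ in range(max_steps):
                neighbours = graph.get(node)
                if not neighbours:  # an article (or empty category): walk ends
                    break
                node = rng.choice(neighbours)
                if not node.startswith("Category:"):
                    visits[node] += 1
    return visits


def select_articles(visits, min_visits):
    """Keep articles reached at least min_visits times."""
    return sorted(a for a, n in visits.items() if n >= min_visits)


visits = random_walk_visits(GRAPH, ["Category:Physics"])
print(select_articles(visits, min_visits=5))
```

Articles that are reached often from science and technology seeds score highly, while articles only reachable through other branches of the graph are never counted; a threshold such as `min_visits` then turns these scores into a candidate subset for manual review, which is what makes such a procedure semi-automatic rather than fully automatic.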