Abstract. We describe RDF123, a highly flexible open-source tool for translating spreadsheet data to RDF. Existing spreadsheet-to-RDF tools typically map only to star-shaped RDF graphs, i.e., each spreadsheet row is an instance, with each column representing a property. RDF123, on the other hand, allows users to define mappings to arbitrary graphs, thus enabling much richer spreadsheet semantics to be expressed. Furthermore, each row in the spreadsheet can be mapped with a different RDF scheme. Two interfaces are available. The first is a graphical application that allows users to create their mapping in an intuitive manner. The second is a Web service that takes as input a URL to a Google spreadsheet or CSV file and an RDF123 map, and provides RDF as output.
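To make the contrast between star-shaped and arbitrary-graph mappings concrete, here is a minimal sketch in plain Python. It does not use RDF123 or its map format: the `ex:` prefix, the `{column}` placeholder syntax, and the function names `star_mapping` and `template_mapping` are hypothetical, and triples are plain string tuples rather than objects from an RDF library.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def star_mapping(row_id: str, row: Dict[str, str]) -> List[Triple]:
    """Star-shaped mapping: the row becomes one subject and every column a property of it."""
    subject = f"ex:row{row_id}"  # hypothetical URI scheme for row instances
    return [(subject, f"ex:{column}", value) for column, value in row.items()]


def template_mapping(row: Dict[str, str], template: List[Triple]) -> List[Triple]:
    """Arbitrary-graph mapping: fill hypothetical {column} placeholders in each template triple."""
    return [
        (s.format(**row), p.format(**row), o.format(**row))
        for s, p, o in template
    ]


if __name__ == "__main__":
    row = {"name": "Alice", "advisor": "Bob", "dept": "CS"}
    for triple in star_mapping("1", row):
        print(triple)
    # Hypothetical template: links the row's people to each other and to a department,
    # producing a graph that is not a star around a single row node.
    template = [
        ("ex:{name}", "ex:advisedBy", "ex:{advisor}"),
        ("ex:{advisor}", "ex:memberOf", "ex:{dept}"),
    ]
    for triple in template_mapping(row, template):
        print(triple)
```

Because the template is just a parameter, a different template could be selected for each row, which loosely mirrors the point that rows may follow different RDF schemes.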
Pointwise mutual information (PMI) is a widely used word similarity measure, but there is no clear explanation of how it works. We explore how PMI differs from distributional similarity, and we introduce a novel metric, PMImax, that augments PMI with information about a word's number of senses. The coefficients of PMImax are determined empirically by maximizing a utility function based on the performance of automatic thesaurus generation. We show that PMImax outperforms traditional PMI in automatic thesaurus generation and in two word similarity benchmark tasks: human similarity ratings and TOEFL synonym questions. PMImax achieves a correlation coefficient comparable to that of the best knowledge-based approaches on the Miller-Charles similarity rating dataset.
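For reference, the standard PMI of two words x and y is log2(p(x, y) / (p(x) p(y))). The sketch below estimates these probabilities from per-context co-occurrence counts over a toy list of contexts; counting per context (rather than within a sliding window) is an assumption made for illustration. It computes plain PMI only: the sense-count coefficients of PMImax are not given in this abstract and are not reproduced here.

```python
import math
from typing import Sequence


def pmi(contexts: Sequence[Sequence[str]], x: str, y: str) -> float:
    """Plain PMI of words x and y, counting co-occurrence once per context."""
    sets = [set(c) for c in contexts]
    n = len(sets)                                     # N: number of contexts
    c_x = sum(1 for s in sets if x in s)              # contexts containing x
    c_y = sum(1 for s in sets if y in s)              # contexts containing y
    c_xy = sum(1 for s in sets if x in s and y in s)  # contexts containing both
    if c_xy == 0:
        return float("-inf")                          # never co-occur: log of 0
    # p(x, y) / (p(x) p(y)) = (c_xy / N) / ((c_x / N) * (c_y / N)) = c_xy * N / (c_x * c_y)
    return math.log2(c_xy * n / (c_x * c_y))


if __name__ == "__main__":
    contexts = [["cat", "dog"], ["cat", "dog"], ["cat", "fish"], ["bird"]]
    print(pmi(contexts, "cat", "dog"))   # log2(4/3), about 0.415
    print(pmi(contexts, "dog", "bird"))  # -inf
```

Note that PMI rewards rare co-occurring pairs: "cat" appears in most contexts, so its association with "dog" is only mildly positive even though they co-occur twice.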
Semantic textual similarity is a measure of the degree of semantic equivalence between two pieces of text. We describe the SemSim system and its performance in the *SEM 2013 and SemEval-2014 tasks on semantic textual similarity. At the core of our system lies a robust distributional word similarity component that combines Latent Semantic Analysis and machine learning, augmented with data from several linguistic resources. We used a simple term alignment algorithm to handle longer pieces of text. Additional wrappers and resources were used to handle task-specific challenges, including processing Spanish text, comparing text sequences of different lengths, handling informal words and phrases, and matching words with sense definitions. In the *SEM 2013 task on Semantic Textual Similarity, our best-performing system ranked first among the 89 submitted runs. In the SemEval-2014 task on Multilingual Semantic Textual Similarity, we ranked a close second in both the English and Spanish subtasks. In the SemEval-2014 task on Cross-Level Semantic Similarity, we ranked first in the Sentence-Phrase, Phrase-Word, and Word-Sense subtasks and second in the Paragraph-Sentence subtask.
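The abstract does not spell out the term alignment algorithm, so the following is only an illustrative sketch of one simple scheme, not the SemSim implementation: each word in one text is greedily paired with its most similar word in the other, the best scores are averaged, and the two directions are averaged for symmetry. The word similarity function `toy_sim` and its tiny lookup table `TOY_PAIRS` are hypothetical stand-ins for the LSA-based word similarity component described above.

```python
from typing import Callable, List

# A word similarity function maps two words to a score in [0, 1].
WordSim = Callable[[str, str], float]


def align_score(a: List[str], b: List[str], sim: WordSim) -> float:
    """Pair every word in `a` with its most similar word in `b` and average the scores."""
    if not a or not b:
        return 0.0
    return sum(max(sim(w, v) for v in b) for w in a) / len(a)


def text_similarity(a: List[str], b: List[str], sim: WordSim) -> float:
    """Symmetric text score: mean of the two alignment directions."""
    return (align_score(a, b, sim) + align_score(b, a, sim)) / 2


# Hypothetical toy word similarity: an exact match scores 1.0, a few listed pairs
# get a fixed score, and everything else scores 0.0.
TOY_PAIRS = {frozenset(("car", "automobile")): 0.9}


def toy_sim(w: str, v: str) -> float:
    if w == v:
        return 1.0
    return TOY_PAIRS.get(frozenset((w, v)), 0.0)


if __name__ == "__main__":
    s1 = "a car stopped".split()
    s2 = "an automobile stopped".split()
    print(round(text_similarity(s1, s2, toy_sim), 3))  # 0.633
```

Greedy max-alignment lets one word match several words in the other text; a stricter variant would enforce one-to-one matches, at extra cost.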
We describe UMBC's systems developed for the SemEval-2014 tasks on Multilingual Semantic Textual Similarity (Task 10) and Cross-Level Semantic Similarity (Task 3). Using an unsupervised approach, our best submission in the Multilingual task ranked second in both the English and Spanish subtasks. Our best systems for the Cross-Level task ranked second in the Paragraph-Sentence subtask and first in both the Sentence-Phrase and Word-Sense subtasks. The system ranked first for the Phrase-Word subtask but was not included in the official results due to a late submission.