Mining foreign language information resources

Watts, R.J.; Porter, A.L.

doi:10.1109/picmet.1999.787805

Cited by 2 publications

(2 citation statements)

References 6 publications


“…The VantagePoint factor map routine applies a small-increment Kaiser Varimax Rotation (yielding more attractive results but running slower than SPSS PCA in developmental tests). Our colleague, Bob Watts of the U.S. Army, has led the development of a more automated version of PCA with an optimization routine to determine a best solution (maximizing inclusion of records with fewest factors) based on selected parameter settings (Principal Components Decomposition, PCD) [30]. PCA is a basic form of factor analysis that allows terms to appear in multiple "factors" (we take the liberty to use that term in lieu of "principal components").…”

Section: Term Clumping (mentioning)

confidence: 99%

“Term clumping” for technical intelligence: A case study on dye-sensitized solar cells

Zhang, Porter, Hu et al. 2014

Technological Forecasting and Social Change

143


Tech Mining seeks to extract intelligence from Science, Technology & Innovation information record sets on a subject of interest. A key set of Tech Mining interests concerns which R&D activities are addressed in the publication and patent abstract records under study. This paper presents six "term clumping" steps that can clean and consolidate topical content in such text sources. It examines how each step changes the content, potentially to facilitate extraction of usable intelligence as the end goal. We illustrate for an emerging technology, dye-sensitized solar cells. In this case we were able to reduce some 90,980 terms & phrases to more user-friendly sets through the clumping steps as one indicator of success. The resulting phrases are better suited to contributing usable technical intelligence than the original results. We engaged seven persons knowledgeable about dye-sensitized solar cells (DSSCs) to assess the resulting content. These empirical results advanced the development of a semi-automated term clumping process that can enable extraction of topical content intelligence.


“…The VantagePoint factor map routine applies a small-increment Kaiser Varimax Rotation (yielding more attractive results, but running slower, than SPSS PCA in developmental tests). Our colleague, Bob Watts of the U.S. Army, has led development of a more automated version of PCA, with an optimization routine to determine a best solution (maximizing inclusion of records with fewest factors) based on selected parameter settings (Principal Components Decomposition, PCD) [21]. He has also empirically compared PCD (inductive) results with a deductive approach based on use of class codes [22].…”

Section: Review Of Related Literatures (mentioning)

confidence: 99%

Text Clumping for Technical Intelligence

Porter, Zhang 2012

Theory and Applications for Advanced Text Mining


This development responds to a challenge. Text mining software can conveniently generate very large sets of terms or phrases. Our examples draw from use of VantagePoint or, equivalently, Thomson Data Analyzer (TDA) software [ ] to analyze abstract record sets. A typical search on an ST&I topic of interest might yield, say, , records. One approach is to apply VantagePoint's Natural Language Processing (NLP) to the titles, and also to the abstracts and/or claims. We also take advantage of available topic-rich fields such as keywords and index terms. Merging these fields could well offer on the order of , terms and phrases in one field list. That list, unfortunately, will surely contain much noise and redundancy. The text clumping aim is to clean and consolidate such a list to provide rich, usable content information. As described, the text field of interest can contain terms (i.e., single words or unigrams) and/or phrases (i.e., multi-word noun + modifiers term sets). Herein, we focus on such NLP phrases, typically including many single words also. Some of the algorithms pertain especially to multi-word phrases, but, in general, many steps can usefully be applied to single-word term sets. Here we focus on analyzing NLP English noun-phrases, to be called simply "phrases."

Our larger mission is to generate effective Competitive Technical Intelligence (CTI). We want to answer basic questions of "Who is doing What, Where and When?" In turn, that information can be used to build "innovation indicators" that address users' CTI needs [ ]. Typically, those users might be:
• Information professionals compiling most relevant information resources
• Researchers seeking to learn about the nearby "research landscape"
• R&D managers wanting to invest in the most promising opportunities
