Kjell Stridh scite author profile

Patents are an important source of technological knowledge, but the number of existing patents is vast and growing quickly. This makes the development of tools and methodologies for quickly revealing patterns in patent collections important. In this paper, we describe how structured chemometric principles of multivariate data analysis can be applied to text analysis in a novel combination with common machine learning preprocessing methodologies. We demonstrate our methodology in two case studies. Using principal component analysis (PCA) on a collection of 12,338 patent abstracts from 25 big pharma companies revealed the sub‐fields in which the companies are active. Using PCA on a smaller collection of patents retrieved by searching for a specific term proved useful for quickly understanding how patent classifications relate to the search term. By using orthogonal projections to latent structures (O‐PLS) on patent classification schemes, we were able to separate patents at a more detailed level than with PCA. Lastly, we performed multi‐block modeling using OnPLS on bag‐of‐words representations of abstracts, claims, and detailed descriptions, respectively, showing that semantic variation relating to patent classification is consistent across multiple text blocks and is represented as globally joint variation. We conclude that using machine learning to transform unstructured data into structured data provides a good preprocessing tool for subsequent chemometric multivariate data analysis and yields an easily interpretable and novel workflow for understanding large collections of patents. We demonstrate this on collections of chemical and pharmaceutical patents.
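The general idea of applying PCA to bag‐of‐words text can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' workflow: the four short texts are invented placeholders rather than real patent abstracts, the function names are hypothetical, and whitespace tokenization plus mean-centering stands in for whatever preprocessing the paper actually used. The first principal component is approximated by power iteration; in practice a linear algebra library would normally be used instead.

```python
import math
import random
from collections import Counter


def bag_of_words(docs):
    """Return (vocabulary, count matrix) with one row per document."""
    counts = [Counter(doc.lower().split()) for doc in docs]
    vocab = sorted(set().union(*counts))
    return vocab, [[c[term] for term in vocab] for c in counts]


def mean_center(matrix):
    """Subtract each column mean so PCA describes variation around the average document."""
    n_rows = len(matrix)
    means = [sum(col) / n_rows for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def project(centered, w):
    """Score of each document along the direction w."""
    return [sum(x * wj for x, wj in zip(row, w)) for row in centered]


def first_component(centered, n_iter=200, seed=0):
    """Approximate the first PCA loading vector by power iteration on X^T X."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in centered[0]]
    for _ in range(n_iter):
        scores = project(centered, w)
        # Multiply by X^T: one updated weight per vocabulary term.
        w = [sum(s * row[j] for s, row in zip(scores, centered))
             for j in range(len(w))]
        norm = math.sqrt(sum(wj * wj for wj in w))
        w = [wj / norm for wj in w]
    return w, project(centered, w)


# Invented toy texts (assumption: stand-ins for patent abstracts).
docs = [
    "kinase inhibitor compound for treating cancer",
    "kinase inhibitor compound for treating tumor",
    "antibody formulation for treating infection",
    "antibody formulation for vaccine delivery",
]

vocab, counts = bag_of_words(docs)
loadings, scores = first_component(mean_center(counts))

for doc, score in zip(docs, scores):
    print(f"{score:+.2f}  {doc}")
```

In this toy example the two "kinase inhibitor" texts receive scores of one sign and the two "antibody" texts scores of the other, which is the kind of grouping a score plot would show. The overall sign of a principal component is arbitrary, so only the relative positions are meaningful.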


ChemInform Abstract: Claisen Rearrangements with Mesityl Oxide Dimethyl Ketal. Synthesis of Ipsdienone, E‐ and Z‐Ocimenone, 2,6‐Dimethyl‐2,7‐octadien‐4‐one and 2,6‐Dimethyl‐2,7‐octadien‐4‐ol.

Baeckstroem, Stridh, Li, et al. 1988. ChemInform.




Multivariate patent analysis—Using chemometrics to analyze collections of chemical and pharmaceutical patents
