Since patent data became accessible in the 1980s, we have known that research using this data, while providing tremendous opportunities, rests on important assumptions about how patents are actually generated by firms. It is well known that firm-level selection processes shape the likelihood that a firm decides to patent an invention. What is unknown is the extent to which these processes leave the results of work using patent data at risk of being distorted by sample selection bias. To understand the magnitude of this bias, we replicate two important prior studies using a novel, proprietary dataset containing more than 35,000 invention disclosures made by inventors within a single firm, only some of which went on to be patented. We find strong indications of significant selection bias in patent studies that examine the variance of creative outcome distributions and the impact of past experience on subsequent inventions. We highlight what the nature of this bias may mean for our current body of knowledge, and provide suggestions for how this issue should be addressed in future research.

JEL codes: O30, O32