2022
DOI: 10.1007/s11187-022-00609-6

Topic-based classification and identification of global trends for startup companies

Abstract: To foresee global economic trends, one needs to understand the present startup companies that soon may become new market leaders. In this paper, we explore textual descriptions of more than 250 thousand startups in the Crunchbase database. We analyze the 2009–2019 period by using topic modeling. We propose a novel classification of startup companies free from expert bias that contains 38 topics and quantifies the weight of each of these topics for all the startups. Taking the year of establishment and geograph…

Citations: cited by 19 publications (6 citation statements)
References: 61 publications (56 reference statements)
“…To train the STM model we use the following covariates: NDC version (first or updated), conditionality (yes or no), emissions per capita, emissions per GDP and implied emission change to 2030. To choose the appropriate number of topics, we followed the earlier literature 32,33 to compare STM models with three to 50 topics on such criteria as prediction accuracy (held-out log-likelihood), the extent to which popular words from topics overlap (exclusivity), and the degree to which words from the same topic appear in the same texts (semantic coherence). Figure S6 of the Supplementary Information summarises the results indicating that 21 topics balance these three criteria best.…”
Section: Topic Modelling
Citation type: mentioning
confidence: 99%
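One of the criteria named in the statement above, semantic coherence, can be illustrated with a small sketch. It assumes the co-occurrence definition commonly attributed to Mimno et al. (2011): for a topic's top words, sum log((D(v_i, v_j) + 1) / D(v_j)) over ordered word pairs, where D counts the documents containing the given words. The function name and the toy corpus are hypothetical and are not taken from the cited study or from the STM software.

```python
import math

def semantic_coherence(top_words, documents):
    """Co-occurrence coherence of one topic (sketch, assumed definition).

    top_words: the topic's most probable words, most probable first.
    documents: tokenised corpus, one list of words per document.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents that contain every word in `words`.
        return sum(all(w in doc for w in words) for doc in doc_sets)

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            denominator = doc_count(top_words[j])
            if denominator == 0:
                raise ValueError(f"{top_words[j]!r} never occurs in the corpus")
            # The +1 keeps the logarithm finite when the pair never co-occurs.
            score += math.log((doc_count(top_words[i], top_words[j]) + 1) / denominator)
    return score

# Toy corpus, invented purely for illustration.
toy_docs = [
    ["solar", "energy"],
    ["solar", "energy", "wind"],
    ["tax", "carbon"],
    ["carbon", "tax", "energy"],
]
coherent_topic = semantic_coherence(["solar", "energy"], toy_docs)
mixed_topic = semantic_coherence(["solar", "tax"], toy_docs)
```

Words that tend to appear in the same documents score higher than words that never co-occur, which is the property the statement describes. Comparing models across topic counts would additionally require exclusivity and held-out likelihood, which this sketch does not compute.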
“…This process reduces researcher bias because foreknowledge of document content does not affect the topic classifications (Zhang et al, 2021). The LDA topic model is widely used in patent content analysis (Wang et al, 2015; Zhang et al, 2021) and technology topics evaluation (Li et al, 2021; Wang et al, 2020; Savin et al, 2022a, 2022b). In order to apply LDA to the STO white papers, we first pre-processed the corpus by 1) converting words to lowercase, 2) removing standard English stop words and punctuation, and 3) lemmatizing all the words by means of the Natural Language Toolkit 6 lemmatiser. We then analysed the distribution of terms with domain experts and filtered out generic terms that appeared in more than 60% of the white papers (Zhang et al, 2021).…”
Section: Identifying Topics With LDA
Citation type: mentioning
confidence: 99%
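The pre-processing steps listed in the statement above can be sketched roughly as follows. This is a minimal illustration, not the cited authors' code: the stop-word list is a tiny placeholder, the `lemmatize` argument is a hypothetical hook (the quoted study uses the Natural Language Toolkit lemmatiser, which could be passed in here), and the function name and the example texts are invented. Only the 60% threshold comes from the statement.

```python
import string
from collections import Counter

# Placeholder stop-word list; a real pipeline would use a standard English list.
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"})

# Map every punctuation character to a space so hyphenated words split cleanly.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def preprocess(texts, lemmatize=lambda word: word, max_doc_share=0.6):
    """Lowercase, drop punctuation and stop words, lemmatise, filter generic terms."""
    tokenised = []
    for text in texts:
        words = text.lower().translate(_PUNCT_TO_SPACE).split()
        tokenised.append([lemmatize(w) for w in words if w not in STOP_WORDS])

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for doc in tokenised for term in set(doc))
    limit = max_doc_share * len(texts)

    # Keep only terms that appear in at most `max_doc_share` of the documents.
    return [[term for term in doc if doc_freq[term] <= limit] for doc in tokenised]

# Invented example texts, for illustration only.
example_texts = [
    "The token is a security token.",
    "Token holders vote on proposals.",
    "Investors buy the token for dividends.",
    "A platform for real estate.",
]
cleaned = preprocess(example_texts)
```

In this example "token" occurs in three of the four texts (75%), so the frequency filter removes it as a generic term, while rarer words such as "security" survive.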
“…STM is a method to classify textual responses into distinct topics developed specifically for open-ended survey responses [12]. It has already been used in about 60 articles on a wide range of topics [13], including those published on public perceptions of climate change and air pollution [14–16], associations with carbon taxation and its fairness perception [17], people’s beliefs about others’ climate beliefs [18], classification of startup companies and identification of their global trends [19], and studies published on environmental innovation and societal transitions [20]. Applications of topic modelling have been even broader including, for example, tracing development of agricultural and water technologies over long time [21] and understanding public attitudes towards municipal solid waste sorting policy in China [22].…”
Section: Introduction
Citation type: mentioning
confidence: 99%