A new predicting model of an entrepreneurial behaviour: interpretable machine learning on GEM data for multi-year and multi country analysis
International Entrepreneurship and Management Journal
Published online on May 07, 2026
Abstract
Identifying who is likely to engage in entrepreneurship, both in Entrepreneurial Intention (EI) and in realised Entrepreneurial Outcome Behaviour (EOB), is essential for designing training, incubation, and policy interventions. Using eight waves of Global Entrepreneurship Monitor (GEM) data (2014–2021) from 43 countries (n > 1.2 million), this paper identifies a compact and generalisable predictive framework for EI and EOB based on interpretable tree-based machine learning: Classification and Regression Trees (CART) and Stochastic Gradient Boosting (TreeNet). The aim is to test whether a small, theory-driven subset of variables can predict entrepreneurial behaviour across countries and years. To this end, a construct-level framework embeds intention models (TPB/EEM) within micro-foundations of management (RBV: human and social capital; dynamic capabilities: perceiving-seizing-transforming) and the institutional/economic context. The resulting five-variable EI model achieves a sensitivity of 75%, compared with 12% for a baseline logistic regression; the six-variable EOB model achieves a sensitivity of 82%, compared with 15% for the baseline. Furthermore, performance remains stable across pre-pandemic and COVID-era data. Finally, the model highlights actionable levers (strengthening self-efficacy, reducing institutional frictions that slow the intention-to-action transition, and reinforcing ties with role models) that support scalable, low-cost interventions. The approach illustrates the ability of parsimonious, transparent, and self-optimising algorithms to uncover and maintain predictive structures in very large, class-imbalanced behavioural data while remaining consistent with the theory underlying the chosen constructs and algorithms.
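Because the abstract reports sensitivity rather than overall accuracy, a minimal sketch of the metric may help show why the distinction matters when positives are rare. The toy labels, the `always_no` baseline, and the `flags` predictions below are illustrative assumptions, not GEM data or the paper's models.

```python
def sensitivity(y_true, y_pred):
    """Share of actual positives (label 1) that are predicted positive."""
    true_pos = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    false_neg = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return true_pos / (true_pos + false_neg)


def accuracy(y_true, y_pred):
    """Share of all cases that are predicted correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Hypothetical toy sample: 10 entrepreneurs (1) among 100 respondents.
y_true = [1] * 10 + [0] * 90

# Baseline that never predicts entrepreneurship.
always_no = [0] * 100

# Hypothetical model: finds 8 of the 10 positives, raises 20 false alarms.
flags = [1] * 8 + [0] * 2 + [1] * 20 + [0] * 70

print(accuracy(y_true, always_no), sensitivity(y_true, always_no))  # 0.9 0.0
print(accuracy(y_true, flags), sensitivity(y_true, flags))          # 0.78 0.8
```

In this toy case the baseline looks better on accuracy yet identifies none of the entrepreneurs, which is the kind of gap a sensitivity comparison is meant to expose.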