Abstract: We provide complete source code for building a fundamental industry classification based on publicly available and freely downloadable data. We compare various fundamental industry classifications by running a horserace of short-horizon trading signals (alphas) utilizing open-source heterotic risk models (https://ssrn.com/abstract=2600798) built using such industry classifications. Our source code includes various stand-alone and portable modules, e.g., for downloading/parsing web data, etc.
Keywords: industry classification; fundamental; open source; source code; stocks; hierarchy; GICS; BICS; ICB; NAICS; SIC; TRBC; quantitative trading; trading signal; alpha; risk model; mean-reversion; optimization; short-horizon; backtest; simulation; download
Introduction
Fundamental industry classifications such as GICS, BICS, ICB, NAICS, SIC, TRBC, etc. (GICS = Global Industry Classification Standard (by MSCI and Standard & Poor's); BICS = Bloomberg Industry Classification System; ICB = Industry Classification Benchmark (by the London Stock Exchange's FTSE); NAICS = North American Industry Classification System (by Mexico's Instituto Nacional de Estadística y Geografía, Statistics Canada, also known as Statistique Canada, and the United States Office of Management and Budget); SIC = Standard Industrial Classification (by United States government agencies); TRBC = Thomson Reuters Business Classification) are widely used in a variety of fields, including economic applications (for economics, financial economics and accounting related literature, see, e.g., ; for a recent review, see, e.g., [24]; for other applications and more generally related literature, see, e.g., [25][26][27][28][29][30][31][32][33][34][35]), general population and healthcare related studies (see, e.g., [36] and references therein), and (quantitative) finance/trading, including risk modeling (for related literature, see, e.g., ; for applications to risk modeling within quantitative finance, see, e.g., [60][61][62]; for statistical/data mining related methods, see, e.g., [63][64][65][66][67][68][69]). An industry classification (i.e., a taxonomy) groups companies into baskets (e.g., industries) based on one or more similarity criteria, which differ from one classification to another. Fundamental industry classifications are generally expected to be based on pertinent fundamental/economic data, such as companies' products and services, revenue sources, suppliers, competitors, partners, etc. They are essentially independent of pricing data and, if well built, tend to be rather stable out-of-sample, as companies seldom jump industries.
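To make the notion of "grouping companies into baskets" concrete, here is a minimal sketch of a three-level hierarchy (sector, industry, sub-industry) and its binary membership matrix at a chosen level. The tickers, labels, and the helper names `baskets` and `membership_matrix` are hypothetical and purely illustrative; they are not taken from GICS, BICS, or any other real classification, nor from the source code accompanying this paper. Basket labels are built from the full path prefix so that identically named nodes under different parents do not collide.

```python
from collections import defaultdict

# Hypothetical toy data: ticker -> (sector, industry, sub-industry).
# Labels are illustrative only, not from any real classification.
CLASSIFICATION = {
    "AAA": ("Tech", "Software", "Application Software"),
    "BBB": ("Tech", "Software", "Systems Software"),
    "CCC": ("Tech", "Hardware", "Semiconductors"),
    "DDD": ("Energy", "Oil & Gas", "Exploration & Production"),
}


def basket_label(path, level):
    """Label a basket by its full path prefix, e.g. 'Tech / Software'."""
    return " / ".join(path[: level + 1])


def baskets(classification, level):
    """Group tickers into baskets at a hierarchy level (0 = sector, 1 = industry, 2 = sub-industry)."""
    groups = defaultdict(list)
    for ticker, path in classification.items():
        groups[basket_label(path, level)].append(ticker)
    return dict(groups)


def membership_matrix(classification, level):
    """Binary matrix: rows = tickers, columns = baskets; entry is 1 iff the ticker is in the basket."""
    tickers = sorted(classification)
    labels = sorted({basket_label(p, level) for p in classification.values()})
    col = {label: j for j, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in tickers]
    for i, ticker in enumerate(tickers):
        matrix[i][col[basket_label(classification[ticker], level)]] = 1
    return tickers, labels, matrix


if __name__ == "__main__":
    print(baskets(CLASSIFICATION, 1))
    tickers, labels, matrix = membership_matrix(CLASSIFICATION, 0)
    print(labels)
    for ticker, row in zip(tickers, matrix):
        print(ticker, row)
```

Because each company sits in exactly one basket at every level, each row of the matrix has a single 1. In multifactor risk models, a matrix of this kind is commonly used as the industry factor loadings. This sketch does not reproduce how the heterotic risk models cited above construct their factors.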
Many industry classifications are developed commercially, and acquiring such data involves nontrivial costs. Even government-developed classifications such as NAICS or SIC (see below) are not exactly free. This is for two main reasons. First, simply specifying a hierarchical structure (e.g., a complete list of, say, sectors, industries and sub-industries as in BICS) is only the tip of th...