
## Choice of K in K-fold Cross Validation for Classification in Financial Market


Cross Validation is often used as a tool for model selection across classifiers. As discussed in detail in the paper https://ssrn.com/abstract=2967184, Cross Validation is typically performed in the following steps:

• Step 1: Divide the original sample into K subsamples; each subsample, typically of equal size, is referred to as one fold (altogether, K folds).
• Step 2: In turn, keep one fold as a holdout sample for Validation and perform Training on the remaining K-1 folds; this step is repeated for K iterations so that each fold serves as the holdout exactly once.
• Step 3: The performance statistic (e.g., Misclassification Error) aggregated over the K iterations reflects the overall K-fold Cross Validation performance of a given classifier.
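The three steps above can be sketched in a few lines of code. The nearest-centroid classifier below is an illustrative stand-in, not one of the classifiers studied in the paper:

```python
import numpy as np

def kfold_cv_error(X, y, K, train_fn, predict_fn, seed=0):
    """Steps 1-3: split into K folds, hold each fold out once, average the error."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, K)                    # Step 1: K (near-)equal folds
    errors = []
    for k in range(K):                                # Step 2: K iterations
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != k])
        model = train_fn(X[train_idx], y[train_idx])  # train on K-1 folds
        pred = predict_fn(model, X[test_idx])         # validate on the holdout fold
        errors.append(np.mean(pred != y[test_idx]))   # misclassification error
    return float(np.mean(errors))                     # Step 3: overall CV performance

# Illustrative classifier: predict the class whose centroid is nearest.
def train_centroid(X, y):
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}

def predict_centroid(model, X):
    classes = list(model)
    dists = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes])
    return np.array(classes)[dists.argmin(axis=0)]
```

Any classifier that exposes a train/predict pair can be plugged into `kfold_cv_error` in this way.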

However, one question often pops up: how should K be chosen in K-fold Cross Validation? The rule-of-thumb choice suggested by the literature, which is largely based on non-financial-market data, is K=10. The question is: does this also hold for the Financial Market?

In the following paper, in the context of the Financial Market, we compare a range of choices of K in K-fold Cross Validation for the following eight popular classifiers:

• Neural Network
• Support Vector Machine
• Ensemble
• Discriminant Analysis
• Naïve Bayes
• K-nearest Neighbours
• Decision Tree
• Logistic Regression
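As a toy version of that comparison, one can estimate the CV misclassification error for several values of K on the same data set. The synthetic data and the scikit-learn Logistic Regression below are illustrative assumptions, not the financial-market data or implementations used in the paper:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

# Illustrative synthetic data; the paper uses financial-market data instead.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1000)

for K in (2, 5, 10, 20):
    cv = KFold(n_splits=K, shuffle=True, random_state=0)
    accuracy = cross_val_score(clf, X, y, cv=cv, scoring="accuracy")
    print(f"K={K:2d}  misclassification error = {1 - accuracy.mean():.3f}")
```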

For those who want to know a bit more, the paper is available at https://ssrn.com/abstract=2967184

## Parameter Selection in Classification for Financial Market


In practice, we often have to make parameterization choices for a given classifier in order to achieve optimal classification performance; to name a few examples:

• Neural Network: e.g., the choice of Activation Function; the number of hidden units
• Support Vector Machine: e.g., the choice of Kernel Function
• Ensemble: e.g., the number of Learning Cycles for Bagging
• Discriminant Analysis: e.g., Linear vs. Quadratic; regularization choices for the covariance matrix
• Naïve Bayes: e.g., Kernel choices; bandwidth selection
• K-nearest Neighbours: e.g., distance metrics; k in kNN
• Decision Tree: e.g., impurity measure choices; tree size constraints
• Logistic Regression
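For one concrete example of such tuning, the sketch below selects k for a hand-rolled k-Nearest-Neighbours classifier by minimizing the K-fold CV misclassification error over a candidate grid. The synthetic data, the grid, and K=5 are illustrative assumptions, not the paper's setup:

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k):
    """Plain k-nearest-neighbours majority vote (Euclidean distance, labels in {0, 1})."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    neighbours = np.argsort(dists, axis=1)[:, :k]
    return (y_train[neighbours].mean(axis=1) > 0.5).astype(int)

def cv_error(X, y, k, K=5, seed=0):
    """K-fold CV misclassification error of kNN for a given k."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), K)
    errs = []
    for f in range(K):
        test_idx = folds[f]
        train_idx = np.concatenate([folds[j] for j in range(K) if j != f])
        pred = knn_predict(X[train_idx], y[train_idx], X[test_idx], k)
        errs.append(np.mean(pred != y[test_idx]))
    return float(np.mean(errs))

# Illustrative synthetic data; tune k by picking the lowest CV error on the grid.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (40, 2)), rng.normal(4, 1, (40, 2))])
y = np.array([0] * 40 + [1] * 40)
grid = [1, 3, 5, 7, 9]
errors = {k: cv_error(X, y, k) for k in grid}
best_k = min(errors, key=errors.get)
```

The same pattern (evaluate each candidate setting by CV, keep the best) applies to the other parameterization choices listed above.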

In the following paper, we discuss in detail how these parameterization choices are made in the context of the Financial Market, and how the parameters are tuned in order to achieve optimal performance for each classifier mentioned above. The paper is available at https://ssrn.com/abstract=2967184; the presentation slides give a summary: https://ssrn.com/abstract=2973065