An Empirical Comparison of V-fold Penalisation and Cross Validation for Model Selection in Distribution-Free Regression

Model selection is a crucial issue in machine-learning and a wide variety of penalisation methods (with possibly data dependent complexity penalties) have recently been introduced for this purpose. However their empirical performance is generally not well documented in the literature. It is the goal of this paper to investigate to which extent such recent techniques can be successfully used for the tuning of both the regularisation and kernel parameters in support vector regression (SVR) and the complexity measure in regression trees (CART). This task is traditionally solved via V-fold cross-validation (VFCV), which gives efficient results for a reasonable computational cost. A disadvantage however of VFCV is that the procedure is known to provide an asymptotically suboptimal risk estimate as the number of examples tends to infinity. Recently, a penalisation procedure called V-fold penalisation has been proposed to improve on VFCV, supported by theoretical arguments. Here we report on an extensive set of experiments comparing V-fold penalisation and VFCV for SVR/CART calibration on several benchmark datasets. We highlight cases in which VFCV and V-fold penalisation provide poor estimates of the risk respectively and introduce a modified penalisation technique to reduce the estimation error.

Data and Resources

Additional Info

Field Value
Source https://hal.science/hal-00762828
Author Dhanjal, Charanpal, Baskiotis, Nicolas, Clémençon, Stéphan, Usunier, Nicolas
Maintainer CCSD
Last Updated June 1, 2026, 05:48 (UTC)
Created June 1, 2026, 05:48 (UTC)
Identifier hal-00762828
Language en
Rights https://about.hal.science/hal-authorisation-v1/
contributor Machine Learning and Information Retrieval (MALIRE) ; Laboratoire d'Informatique de Paris 6 (LIP6) ; Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)-Université Pierre et Marie Curie - Paris 6 (UPMC)-Centre National de la Recherche Scientifique (CNRS)
creator Dhanjal, Charanpal
date 2012-12-08T00:00:00
harvest_object_id 231ae314-8b35-4431-8539-c0a9b52eb48c
harvest_source_id 3374d638-d20b-4672-ba96-a23232d55657
harvest_source_title test moissonnage SELUNE
metadata_modified 2026-01-19T00:00:00
relation info:eu-repo/semantics/altIdentifier/arxiv/1212.1780
set_spec type:UNDEFINED