I would point out, as others have done, that you have multiplicity and overfitting problems. The p-value obtained should be adjusted for all the combinations that you looked at. Given a sufficiently large number of variables, even in a completely random sample we would expect to find some number of combinations that appear "significant" by chance alone. Repeating the exercise on a new sample will generally produce a different combination with the lowest observed p-value.
Minimizing involves an extremum, and extrema are particularly variable and vulnerable to overfitting. It is thus particularly likely that the "optimal" model found using one sample will not be reproducible and will not be the optimal model found using another.
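To make this concrete, here is a minimal simulation sketch, not a recommended procedure. Everything in it is an assumption for illustration: Y, Z and the Xs are independent noise, the sample size and number of Xs are arbitrary, the helper names (z_pvalue, search) are hypothetical, and the p-value uses a normal approximation to the t distribution, which is only reasonable for fairly large n.

```python
import itertools
import math
import random
from statistics import NormalDist


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def z_pvalue(y, z, xs):
    """Approximate two-sided p-value for Z in an OLS fit of Y on 1, Z and xs."""
    n = len(y)
    basis = []  # intercept and covariates, orthogonalised by Gram-Schmidt
    for col in [[1.0] * n] + xs:
        for b in basis:
            c = dot(col, b) / dot(b, b)
            col = [ci - c * bi for ci, bi in zip(col, b)]
        basis.append(col)
    # Partial out the intercept and covariates from Y and Z (Frisch-Waugh-Lovell).
    ry, rz = list(y), list(z)
    for b in basis:
        cy, cz = dot(ry, b) / dot(b, b), dot(rz, b) / dot(b, b)
        ry = [v - cy * bi for v, bi in zip(ry, b)]
        rz = [v - cz * bi for v, bi in zip(rz, b)]
    beta = dot(ry, rz) / dot(rz, rz)
    rss = sum((a - beta * b) ** 2 for a, b in zip(ry, rz))
    df = n - len(xs) - 2  # intercept and Z plus one per covariate
    t = beta / math.sqrt(rss / df / dot(rz, rz))
    # Assumption: normal approximation to the t distribution.
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))


def search(seed, n=200, n_x=6):
    """Fit every subset of pure-noise Xs; return (p with no Xs, best subset, min p)."""
    rng = random.Random(seed)
    y = [rng.gauss(0, 1) for _ in range(n)]
    z = [rng.gauss(0, 1) for _ in range(n)]
    xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_x)]
    pvals = {}
    for r in range(n_x + 1):
        for subset in itertools.combinations(range(n_x), r):
            pvals[subset] = z_pvalue(y, z, [xs[j] for j in subset])
    best = min(pvals, key=pvals.get)
    return pvals[()], best, pvals[best]


for seed in range(5):
    p_fixed, best, p_min = search(seed)
    print(f"seed {seed}: p(no Xs) = {p_fixed:.3f}, min p = {p_min:.3f}, subset = {best}")
```

Because the model with no Xs is among the candidates, the minimum can never be larger than the pre-specified p-value, even though nothing here is related to anything else. Changing the seed plays the role of drawing a new sample, and the printed subsets show whether the same combination comes out on top each time.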
One way of addressing reproducibility in general is dividing your overall sample into training and validation samples, and after building the model using the training sample, using the validation sample to check if the selected model is reproducible. Even if the repetition results in a low p-value, you may find it does not result in the lowest.
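The split-sample check described above might be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: the data are simulated noise, the 50/50 split and the sizes are arbitrary, the function names are hypothetical, and the p-value again relies on a normal approximation to the t distribution. With real data you would shuffle the rows before splitting; here the simulated rows are already in random order.

```python
import itertools
import math
import random
from statistics import NormalDist


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def z_pvalue(y, z, xs):
    """Approximate two-sided p-value for Z in an OLS fit of Y on 1, Z and xs."""
    n = len(y)
    basis = []  # intercept and covariates, orthogonalised by Gram-Schmidt
    for col in [[1.0] * n] + xs:
        for b in basis:
            c = dot(col, b) / dot(b, b)
            col = [ci - c * bi for ci, bi in zip(col, b)]
        basis.append(col)
    ry, rz = list(y), list(z)
    for b in basis:  # partial out intercept and covariates from Y and Z
        cy, cz = dot(ry, b) / dot(b, b), dot(rz, b) / dot(b, b)
        ry = [v - cy * bi for v, bi in zip(ry, b)]
        rz = [v - cz * bi for v, bi in zip(rz, b)]
    beta = dot(ry, rz) / dot(rz, rz)
    rss = sum((a - beta * b) ** 2 for a, b in zip(ry, rz))
    t = beta / math.sqrt(rss / (n - len(xs) - 2) / dot(rz, rz))
    return 2.0 * (1.0 - NormalDist().cdf(abs(t)))  # normal approximation


def all_subset_pvalues(y, z, xs):
    """Map each subset of covariate indices to the p-value of Z in that model."""
    idx = range(len(xs))
    subsets = [s for r in range(len(xs) + 1) for s in itertools.combinations(idx, r)]
    return {s: z_pvalue(y, z, [xs[j] for j in s]) for s in subsets}


# Simulated pure-noise data (assumption: no true relationship anywhere).
rng = random.Random(2020)
n, n_x = 400, 5
y = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]
xs = [[rng.gauss(0, 1) for _ in range(n)] for _ in range(n_x)]

half = n // 2
train_p = all_subset_pvalues(y[:half], z[:half], [x[:half] for x in xs])
valid_p = all_subset_pvalues(y[half:], z[half:], [x[half:] for x in xs])

chosen = min(train_p, key=train_p.get)  # model selected on the training half
print(f"chosen subset {chosen}: training p = {train_p[chosen]:.3f}")
print(f"same subset on validation half: p = {valid_p[chosen]:.3f}")
print(f"validation-best subset: {min(valid_p, key=valid_p.get)}")
```

The comparison of interest is between the last two lines of output: whether the subset chosen on the training half still gives a low p-value on the validation half, and whether it is also the subset the validation half would have picked on its own.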
In general, models of this nature are hypothesis-generating, while p-values convey an impression of a confirmation, of the existence of a quantum of evidence to establish a proposition. For this reason, it might be best not to use p-values for model-building, as you may convey a false impression that your model has a level of reliability and reproducibility that it simply doesn't. If you are going to use p-values, at least adjust them for multiplicity.
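As a rough idea of what a multiplicity adjustment does to a minimized p-value, the sketch below applies the Bonferroni and Sidak corrections for the number of subsets searched. The inputs (10 candidate Xs, a smallest observed p-value of 0.001) are hypothetical. Note the assumptions: Sidak treats the tests as independent, and all-subsets models fitted to the same data are strongly correlated, so both figures should be read as crude and probably conservative.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value for the smallest of m p-values."""
    return min(1.0, p * m)


def sidak(p, m):
    """Sidak-adjusted p-value, assuming the m tests are independent."""
    return 1.0 - (1.0 - p) ** m


n_x = 10          # hypothetical number of candidate Xs
m = 2 ** n_x      # number of models in an all-subsets search
p_min = 0.001     # hypothetical smallest observed p-value for Z

print(f"models searched: {m}")
print(f"Bonferroni-adjusted p: {bonferroni(p_min, m):.3f}")
print(f"Sidak-adjusted p:      {sidak(p_min, m):.3f}")
```

Since an all-subsets search over N candidate Xs examines 2^N models, the adjustment factor grows very quickly with N.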
------------------------------
Jonathan Siegel
Director Clinical Statistics
------------------------------
Original Message:
Sent: 07-20-2020 11:34
From: Terry Meyer
Subject: Minimizing p-values
I have a methodology question. Suppose I have a dependent variable, Y, and N+1 independent variables, Z and X1 thru XN. I want to choose a subset of the Xs to minimize the p-value of Z in a regression of Y on Z and the subset of Xs. Is there an analytical means of choosing the subset? Is there a practical means other than an 'all subsets' regression? Thank you for any help.
------------------------------
Terry Meyer
------------------------------