On the other hand, in the presence of the spurious feature, the full model can fit the training data perfectly with a smaller norm by assigning weight \(1\) for the feature \(s\) (\(|<\theta^\text<-s>>|_2^2 = 4\) while \(|<\theta^\text<+s>>|_2^2 + w^2 = 2 < 4\)).
Generally, in the overparameterized regime, since the number of training examples is less than the number of features, there are some directions of data variation that are not observed in the training data. In this example, we do not observe any information about the second and third features. However, the non-zero weight for the spurious feature leads to a different assumption for the unseen directions. In particular, the full model does not assign chicas escort Jackson MS weight \(0\) to the unseen directions. Indeed, by substituting \(s\) with \(<\beta^\star>^\top z\), we can view the full model as not using \(s\) but implicitly assigning weight \(\beta^\star_2=2\) to the second feature and \(\beta^\star_3=-2\) to the third feature (unseen directions at training).
Inside example, removing \(s\) decreases the error getting a test shipments with a high deviations out-of no for the second element, while deleting \(s\) advances the mistake getting a test shipping with high deviations off zero for the 3rd feature.
Drop in accuracy in test time depends on the relationship between the true target parameter (\(\theta^\star\)) and the true spurious feature parameters (\(<\beta^\star>\)) in the seen directions and unseen direction
As we saw in the previous example, by using the spurious feature, the full model incorporates \(<\beta^\star>\) into its estimate. Շարունակել դիտել …