Hi Frederick & Co,
Thank you for sharing your awesome work with us! I was wondering if you've already put some thought in how your approach can be extended to analyzing the quality of training data for object detection.
Your technique is fairly straightforward to understand in the context of image classification, where there is one classification head and one can easily find a layer close to the output (e.g., use the weights that connect the layer before the logit layer to the logit layer, as mentioned in the FAQ doc), allowing for the computation of a TracInCP score per frame. But how does one do this in the case of an object detector where there are typically two heads (one for classification, one for bounding box regression) and a varying number of objects per frame? Do you believe that object-level gradients for the objects in a training frame can be aggregated in a meaningful way (e.g., mean or max of gradients across objects for a frame?) such that one can still compute a meaningful dot product between aggregated loss gradients for a training frame and aggregated loss gradients for a test frame? Would this have to be done separately for classification and regression (e.g. a classification proponent may turn out to be a regression opponent, and that would be useful information to surface)?
Are you aware of similar attempts at extending your work to object detection?
Thank you for taking the time to share your thoughts on this, @frederick0329 !
Hi Frederick & Co,
Thank you for sharing your awesome work with us! I was wondering if you've already put some thought in how your approach can be extended to analyzing the quality of training data for object detection.
Your technique is fairly straightforward to understand in the context of image classification, where there is one classification head and one can easily find a layer close to the output (e.g., use the weights that connect the layer before the logit layer to the logit layer, as mentioned in the FAQ doc), allowing for the computation of a TracInCP score per frame. But how does one do this in the case of an object detector where there are typically two heads (one for classification, one for bounding box regression) and a varying number of objects per frame? Do you believe that object-level gradients for the objects in a training frame can be aggregated in a meaningful way (e.g., mean or max of gradients across objects for a frame?) such that one can still compute a meaningful dot product between aggregated loss gradients for a training frame and aggregated loss gradients for a test frame? Would this have to be done separately for classification and regression (e.g. a classification proponent may turn out to be a regression opponent, and that would be useful information to surface)?
Are you aware of similar attempts at extending your work to object detection?
Thank you for taking the time to share your thoughts on this, @frederick0329 !