Improve intermediate layer extraction explanation #1338
base: master
This looks good! I've left a proposal to improve the description of the algorithm's output in the docstring.
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Rephrased version (trying to simplify):
Note: The algorithm outputs a time series of class activations or embedding vectors, with a 2D shape [time, feature vector]. Feature vector values will be flattened if the `output` parameter is set to extract an intermediate layer with multiple dimensions.
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
"\n"
Same comments as for TensorflowPredictEffnetDiscogs
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Same comment as for TensorflowPredictEffnetDiscogs
"Note: The output of this algorithm is 2D, which is suitable for extracting embeddings or "
"class activations (the output shape is, e.g., [time, number of classes]). If the output "
"parameter is set to an intermediate layer with more dimensions, the output will be "
"flattened to 2D.\n"
Same comment as for TensorflowPredictEffnetDiscogs
@@ -66,6 +68,11 @@ AlgorithmStatus TensorToVectorReal::process() {
  _timeStamps = tensor.dimension(2);
  _featsSize = tensor.dimension(3);

  if (_channels != 1 && !_warned) {
    E_WARNING("TensorToVectorReal: The channel axis (dimension 1) of the input tensor has size larger than 1, but the output of this algorithm is 2D. The batch, channel, and time axes (dimensions 0, 1, 2) will be flattened to the first dimension of the output matrix.");
We output a vector of vectors of reals, so the "matrix" terminology may be misleading.
TensorToVectorReal converts tensors to 2D arrays by flattening all axes but the last one into the first dimension.
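
The flattening can be illustrated outside Essentia. The following is a minimal sketch in plain Python, not the C++ implementation: `flatten_all_but_last` is a hypothetical helper, the tensor sizes are made up, and the row-major order (batch, then channel, then time) is an assumption based on the wording of the warning message.

```python
def flatten_all_but_last(tensor):
    """Flatten a nested-list tensor [batch][channel][time][feature] to 2D.

    Assumption: rows are emitted in row-major order (batch, then channel,
    then time). This mirrors the described behaviour, not the C++ source.
    """
    rows = []
    for batch in tensor:
        for channel in batch:
            for frame in channel:
                # Each innermost list (the feature axis) becomes one output row.
                rows.append(list(frame))
    return rows


# Hypothetical intermediate layer with shape [1, 2, 3, 4]
# (batch=1, channels=2, time=3, features=4), filled with dummy values
# that encode their own (channel, time, feature) position.
tensor = [[[[float(c * 100 + t * 10 + f) for f in range(4)]
            for t in range(3)]
           for c in range(2)]]

flat = flatten_all_but_last(tensor)
print(len(flat), len(flat[0]))  # 6 4
```

With more than one channel, the first dimension of the result is no longer a pure time axis: here rows 0 to 2 belong to channel 0 and rows 3 to 5 to channel 1.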
Model-specific prediction algorithms (e.g., TensorflowPredictVGGish) use this algorithm to return 2D arrays, since they are primarily intended for time-wise predictions or embeddings. However, these algorithms can also be used to extract intermediate layers of the models, which may have more than two dimensions. In this case, all dimensions but the last one will be flattened. To address this:
Note that it is also possible to retrieve intermediate layers with their original shape using TensorflowPredict as discussed here.
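
If only the flattened output is at hand and the original layer dimensions are known, the rows can in principle be regrouped afterwards. This is a different route from using TensorflowPredict, and the sketch below is only an illustration: `regroup_rows` is a hypothetical helper, the sizes are invented, and it assumes the rows were flattened in row-major order (batch, then channel, then time).

```python
def regroup_rows(rows, batch, channels, frames):
    """Regroup 2D rows into a nested list [batch][channel][time][feature].

    Assumption: rows were flattened in row-major order, so the row for
    (b, c, t) sits at index (b * channels + c) * frames + t.
    """
    if len(rows) != batch * channels * frames:
        raise ValueError("row count does not match the requested shape")
    return [[[list(rows[(b * channels + c) * frames + t])
              for t in range(frames)]
             for c in range(channels)]
            for b in range(batch)]


# Dummy flattened output: 6 rows of 2 features, standing in for a layer
# whose original shape was [1, 2, 3, 2] (made-up sizes for illustration).
rows = [[float(i), float(i) + 0.5] for i in range(6)]
layer = regroup_rows(rows, batch=1, channels=2, frames=3)
print(layer[0][1][0])  # [3.0, 3.5]
```

The shape check matters because the 2D output carries no record of the original dimensions; a wrong guess that happens to match the row count would silently regroup the data incorrectly.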