FeatUp: A Model-Agnostic Framework
for Features at Any Resolution

ICLR 2024

Paper Code 🤗 Demo Colab Notebook

Stephanie Fu*, Mark Hamilton*, Laura E. Brandt, Axel Feldmann, Zhoutong Zhang, William T. Freeman

*Equal Contribution

TL;DR: FeatUp improves the spatial resolution of any model's features by 16-32x without changing their semantics.
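
A minimal usage sketch in PyTorch (an illustration, not code from this page): the public FeatUp code release exposes pretrained upsamplers through torch.hub, and the hub repository string, the "dino16" entrypoint, and the upsampler.model attribute below follow that release's example usage; verify the exact names and arguments against the repository before relying on them.

import torch
import torchvision.transforms as T
from PIL import Image

# Load a frozen backbone paired with its FeatUp upsampler (hub entrypoint assumed; see the repo).
upsampler = torch.hub.load("mhamilton723/FeatUp", "dino16")
upsampler.eval()

transform = T.Compose([
    T.Resize(224),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
image = transform(Image.open("example.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    hr_feats = upsampler(image)        # upsampled high-resolution features
    lr_feats = upsampler.model(image)  # original low-resolution backbone features

print(lr_feats.shape, hr_feats.shape)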

Examples

[Video comparisons on several scenes: raw DINO features vs. DINO + FeatUp]

Any Backbone:

[Video: features from ViT, DINO, DINOv2, CLIP, and RN50, each shown with and without FeatUp]

Improve Downstream Tasks without Retraining:

[Video: semantic segmentation probe, depth probe, and CAMs (horse, bars), each shown with and without FeatUp]

Upsamples Every Feature Dimension:

[Video: the first nine PCA components of DINO features (1-3, 4-6, 7-9), shown with and without FeatUp]

Abstract

Deep features are a cornerstone of computer vision research, capturing image semantics and enabling the community to solve downstream tasks even in the zero- or few-shot regime. However, these features often lack the spatial resolution to directly perform dense prediction tasks like segmentation and depth prediction because models aggressively pool information over large areas. In this work, we introduce FeatUp, a task- and model-agnostic framework to restore lost spatial information in deep features. We introduce two variants of FeatUp: one that guides features with high-resolution signal in a single forward pass, and one that fits an implicit model to a single image to reconstruct features at any resolution. Both approaches use a multi-view consistency loss with deep analogies to NeRFs. Our features retain their original semantics and can be swapped into existing applications to yield resolution and performance gains even without re-training. We show that FeatUp significantly outperforms other feature upsampling and image super-resolution approaches in class activation map generation, transfer learning for segmentation and depth prediction, and end-to-end training for semantic segmentation.

FeatUp: Upsampling Model Representations with Self-Supervision

FeatUp upsamples any deep network's features to arbitrary resolution while retaining the original semantics. We learn a high-res feature map by enforcing consistency across many low-res "views", which are formed by perturbing and featurizing the input image.

Inspired by NeRF, which learns an implicit scene representation by enforcing image consistency across multiple views, we learn a view-consistent implicit network that outputs features at any queried resolution. This upsampler can also be parameterized as a feedforward joint bilateral upsampling (JBU) module that can be dropped into any existing pipeline and trained end-to-end.
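
To make the training objective concrete, here is a minimal, self-contained sketch of the multi-view consistency idea. It is a simplification under stated assumptions: the frozen backbone is replaced by a toy featurize function (16x average pooling), the high-resolution feature map is an explicit learnable tensor rather than the implicit network or JBU module, the learned downsampler is plain average pooling, and apply_view stands in for the random jitters used to form views.

import torch
import torch.nn.functional as F

def featurize(img):
    # Stand-in for a frozen backbone: "features" here are just 16x average-pooled pixels.
    return F.avg_pool2d(img, kernel_size=16)

def apply_view(x, flip, shift):
    # One cheap "view": an optional horizontal flip plus a small horizontal pixel shift.
    if flip:
        x = torch.flip(x, dims=[-1])
    return torch.roll(x, shifts=shift, dims=-1)

image = torch.rand(1, 3, 224, 224)                                 # input image
hr_feats = torch.nn.Parameter(0.01 * torch.randn(1, 3, 224, 224))  # learnable high-res feature map
opt = torch.optim.Adam([hr_feats], lr=1e-2)

for step in range(200):
    flip = bool(torch.rand(()) < 0.5)
    shift = int(torch.randint(-8, 9, ()))
    # Target: perturb the image, then featurize it to obtain one low-res view.
    lr_target = featurize(apply_view(image, flip, shift))
    # Prediction: perturb the high-res map the same way, then downsample it.
    lr_pred = F.avg_pool2d(apply_view(hr_feats, flip, shift), kernel_size=16)
    loss = F.mse_loss(lr_pred, lr_target)  # multi-view reconstruction loss
    opt.zero_grad()
    loss.backward()
    opt.step()

The same structure underlies both variants: the implicit network replaces the explicit hr_feats tensor, while the feedforward JBU module amortizes this optimization across many images.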

Results

Above: both variants of FeatUp (the implicit network and the feedforward JBU module) resolve high-resolution details that other methods cannot. Additionally, our features lie in the same space as the input features, making them usable in downstream architectures without re-training.


Above: Upsampled features from a variety of vision backbones. FeatUp introduces spatial resolution while preserving semantics.

Downstream Evaluations

We evaluate FeatUp on a variety of downstream tasks from the broader literature, including linear probe transfer learning, where features are directly used for depth estimation and semantic segmentation. Additionally, we evaluate CAM quality and weakly-supervised object localization performance. Across the board, our methods qualitatively and quantitatively improve performance on these downstream tasks - see our supplementary material for more examples.
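
As an illustration of the linear-probe setup (a sketch with placeholder shapes and random data, not the exact evaluation code): a frozen backbone, optionally followed by FeatUp, produces feature maps, and a single 1x1 convolution is trained on top of them for dense prediction. The class count, feature width, and tensors below are hypothetical stand-ins.

import torch
import torch.nn as nn
import torch.nn.functional as F

num_classes, feat_dim = 27, 384                          # e.g. a 27-class segmentation task, ViT-S feature width
probe = nn.Conv2d(feat_dim, num_classes, kernel_size=1)  # per-pixel linear classifier
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)

# Placeholder batch; in practice feats come from a frozen backbone (+ FeatUp) and labels from a dataset.
feats = torch.randn(2, feat_dim, 112, 112)
labels = torch.randint(0, num_classes, (2, 112, 112))

for step in range(100):
    logits = probe(feats)                   # (B, num_classes, H, W)
    loss = F.cross_entropy(logits, labels)  # dense cross-entropy against the label map
    opt.zero_grad()
    loss.backward()
    opt.step()

Because the upsampled features lie in the same space as the original ones, a probe trained on low-resolution features can also be applied to FeatUp's output without retraining.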

Paper

Bibtex

@inproceedings{
    fu2024featup,
    title={FeatUp: A Model-Agnostic Framework for Features at Any Resolution},
    author={Stephanie Fu and Mark Hamilton and Laura E. Brandt and Axel Feldmann and Zhoutong Zhang and William T. Freeman},
    booktitle={The Twelfth International Conference on Learning Representations},
    year={2024},
    url={https://openreview.net/forum?id=GkJiNn2QDF}
}

Contact

For feedback, questions, or press inquiries, please contact Mark Hamilton and Stephanie Fu.