I-Con: A Unifying Framework for Representation Learning

ICLR 2025


Shaden Alshammari, John Hershey, Axel Feldmann, William T. Freeman, Mark Hamilton 

TL;DR: We introduce a single equation that unifies >20 machine learning methods into a periodic table. We use this framework to make a state-of-the-art unsupervised image classifier.

As the field of representation learning grows, there has been a proliferation of different loss functions to solve different classes of problems. We introduce a single information-theoretic equation that generalizes a large collection of modern loss functions in machine learning. In particular, we introduce a framework showing that several broad classes of machine learning methods are precisely minimizing an integrated KL divergence between two conditional distributions: one over the supervisory signal and one over the learned representation. This viewpoint exposes a hidden information geometry underlying clustering, spectral methods, dimensionality reduction, contrastive learning, and supervised learning. This framework enables the development of new loss functions by combining successful techniques from across the literature. We not only present a wide array of proofs connecting over 23 different approaches, but also leverage these theoretical results to create state-of-the-art unsupervised image classifiers that achieve a +8% improvement over the prior state of the art for unsupervised classification on ImageNet-1K. We also demonstrate that I-Con can be used to derive principled debiasing methods that improve contrastive representation learners.

Information Contrastive Learning (I-Con)

Machine learning methods can seem like a collection of isolated techniques, but what if they all shared a deeper connection? Information Contrastive Learning (I-Con) uncovers this unity. At its core, I-Con reframes over 20 different machine learning methods as the same problem: approximating the relationships between points in a training dataset. Mathematically, the I-Con loss is the average KL divergence between two neighborhood distributions: a fixed supervisory distribution defined over the data and a distribution induced by the learned representation. By varying the neighborhood definitions, we show that I-Con seamlessly incorporates methods from clustering, dimensionality reduction, and contrastive learning.
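Concretely, writing p(j | i) for the supervisory neighborhood distribution of point i and q_φ(j | i) for the neighborhood distribution induced by the learned representation, the objective (stated here in our own notation, which paraphrases the paper) is:

```latex
\mathcal{L}(\phi)
  = \frac{1}{n} \sum_{i=1}^{n}
    D_{\mathrm{KL}}\!\left( p(\cdot \mid i) \,\middle\|\, q_\phi(\cdot \mid i) \right)
  = \frac{1}{n} \sum_{i=1}^{n} \sum_{j}
    p(j \mid i) \log \frac{p(j \mid i)}{q_\phi(j \mid i)}
```

Each named method then amounts to a particular choice of p and q_φ.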

Unifying Machine Learning Losses

To see how I-Con unifies common machine learning algorithms, let's look at a few specific examples. I-Con unifies popular methods in dimensionality reduction (SNE, t-SNE), contrastive learning (SimCLR), clustering (k-means), supervised learning, and many more through a single mathematical equation that matches two neighborhood distributions. In stochastic neighbor embedding (SNE), I-Con matches high-dimensional Gaussian neighborhoods with Gaussian neighborhoods computed from lower-dimensional embeddings. Replacing the low-dimensional embeddings with a probability distribution over clusters yields k-means. Switching Gaussian neighborhoods to graph edge-based neighbors yields spectral clustering. Using data-augmentation pairs as the supervisory neighborhoods, together with Gaussian neighborhoods over the learned embeddings, recovers the InfoNCE loss used in contrastive learning methods like CLIP and SimCLR. Beyond re-deriving existing methods, I-Con can also create new ones by mixing neighborhood definitions, such as our contrastive clustering method that combines augmentation pairs with cluster probabilities.
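To make the SNE example concrete, here is a minimal PyTorch sketch of the shared objective: one averaged-KL loss, instantiated with Gaussian neighborhoods over the raw data on the supervisory side and over a learned low-dimensional embedding on the other. The function names (`icon_loss`, `gaussian_neighborhoods`) are ours for illustration, not the paper's released code.

```python
import torch
import torch.nn.functional as F

def icon_loss(p, q, eps=1e-8):
    """Average KL divergence between row-stochastic neighborhood
    matrices: p[i, j] is the supervisory probability that j neighbors i,
    q[i, j] is the learned model's probability of the same."""
    kl = p * (torch.log(p + eps) - torch.log(q + eps))
    return kl.sum(dim=1).mean()

def gaussian_neighborhoods(x, sigma=1.0):
    """SNE-style neighborhoods: softmax over negative squared
    distances, excluding self-neighbors."""
    logits = -torch.cdist(x, x).pow(2) / (2 * sigma ** 2)
    mask = torch.eye(len(x), dtype=torch.bool)
    return F.softmax(logits.masked_fill(mask, float("-inf")), dim=1)

# SNE as an I-Con instance: the supervisory distribution comes from
# the high-dimensional data, the learned one from a 2-D embedding.
x = torch.randn(128, 50)                     # data (fixed)
z = torch.randn(128, 2, requires_grad=True)  # embedding (learned)
loss = icon_loss(gaussian_neighborhoods(x), gaussian_neighborhoods(z))
loss.backward()                              # optimize z with any optimizer
```

Swapping either side of `icon_loss` for a different neighborhood definition recovers the other entries in the table: cluster probabilities on the learned side, augmentation pairs or graph edges on the supervisory side.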

Deriving New Methods

The I-Con framework not only unifies existing methods but also guides the discovery of new algorithms. We organize methods into a periodic table by breaking them into rows and columns based on the types of neighborhood data they use for their learned and supervisory signals. Rows correspond to different types of learned distributions (like clusters or low-dimensional embeddings), while columns correspond to different types of supervisory signals (like Gaussian neighbors or graph edges). This structured organization reveals gaps where new methods can be developed. Additionally, this shared mathematical foundation allows ideas to be transferred across algorithms, leading to innovations like debiased clustering techniques.
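As one example of filling a gap in the table, here is a sketch of a contrastive clustering objective in the spirit of the paper's method: augmentation-pair supervision matched against neighborhoods derived from soft cluster assignments. This is a simplified stand-in under our own assumptions (batch layout, size-weighted cluster affinity), not the paper's exact debiased formulation.

```python
import torch
import torch.nn.functional as F

def icon_loss(p, q, eps=1e-8):
    # Same averaged-KL objective as in the sketch above.
    return (p * (torch.log(p + eps) - torch.log(q + eps))).sum(dim=1).mean()

def augmentation_neighborhoods(n_pairs):
    """Supervisory distribution from augmentation pairs. Assumes a
    batch of 2*n_pairs views laid out as [x_1..x_n, x'_1..x'_n]
    (a layout chosen here for illustration): each view's sole
    neighbor is its other augmented view."""
    p = torch.zeros(2 * n_pairs, 2 * n_pairs)
    i = torch.arange(n_pairs)
    p[i, i + n_pairs] = 1.0
    p[i + n_pairs, i] = 1.0
    return p

def cluster_neighborhoods(logits, eps=1e-8):
    """Learned distribution from soft cluster assignments: points are
    neighbors to the extent they fall in the same cluster, weighted
    inversely by cluster size (a simplified variant of a
    cluster-based neighborhood)."""
    phi = F.softmax(logits, dim=1)            # (n, k) soft assignments
    size = phi.sum(dim=0) + eps               # soft cluster sizes
    q = (phi / size) @ phi.T                  # (n, n) same-cluster affinity
    q = q.masked_fill(torch.eye(len(q), dtype=torch.bool), 0.0)
    return q / (q.sum(dim=1, keepdim=True) + eps)

# Matching the two distributions trains the cluster head directly.
cluster_logits = torch.randn(256, 100, requires_grad=True)  # 128 pairs, 100 clusters
loss = icon_loss(augmentation_neighborhoods(128), cluster_neighborhoods(cluster_logits))
loss.backward()
```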

Results

We compare I-Con-derived methods to state-of-the-art unsupervised image classification techniques like SCAN and TEMI on the ImageNet-1K dataset with DINO backbones. I-Con-based image clustering consistently outperforms existing approaches: our Debiased InfoNCE Clustering method improves over the prior state of the art, TEMI, across all backbone sizes. The improvements come from I-Con’s self-balancing loss and its ability to integrate insights from contrastive learning and dimensionality reduction into clustering.

BibTeX

@inproceedings{alshammari2025a,
    title={I-Con: A Unifying Framework for Representation Learning},
    author={Shaden Naif Alshammari and Mark Hamilton and Axel Feldmann and John R. Hershey and William T. Freeman},
    booktitle={The Thirteenth International Conference on Learning Representations},
    year={2025},
    url={https://openreview.net/forum?id=WfaQrKCr4X}
}