Unsupervised Discovery of Structure in Complex Systems

MIT PhD Thesis 2025

Thesis PDF

Mark Hamilton
PhD Advisor: William T. Freeman

TL;DR: How can we build algorithms that learn without human labels, so that we can use AI to help solve scientific challenges humans don't yet know how to solve?

Abstract

How does the human mind make sense of raw information without being taught how to see or hear? This thesis presents a unifying theory that describes how algorithms can learn and discover structure in complex systems, like natural images, audio, language, and video, without human input. This class of algorithms has the potential to extend our own understanding of the world by helping us to see previously unseen patterns in nature and science. At the core of this thesis' unified theory is the notion that relationships between deep network representations hold the key to discovering the structure of the world without human input. This work will begin with a few examples of this principle in action: discovering hidden connections that span cultures and millennia in the visual arts, discovering visual objects in large image corpora, classifying every pixel of our visual world, and rediscovering the meaning of words from raw audio, all without human labels. In the latter half of this thesis, we will present two unifying mathematical theories of unsupervised learning. The first will explain why relationships between deep features can rediscover the semantic structure of the natural world by connecting model explainability, cooperative game theory, and deep feature relationships. The second mathematical theory will show that relationships between representations can be used to unify over 20 common machine learning algorithms spanning 100 years of progress in the field of machine learning. In particular, we introduce a single equation that unifies classification, regression, large language modeling, dimensionality reduction, clustering, contrastive learning, and spectral methods. This thesis uses this unified equation as the basis for a "periodic table of representation learning" that predicts the existence of new types of algorithms. We show that one of these predicted algorithms is a state-of-the-art unsupervised image classification technique. Finally, this work will summarize the key findings and share ongoing and future directions.

Papers in This Thesis

MosAIc: Finding Artistic Connections across Culture with Conditional Image Retrieval

Paper Website Webinar Talk Code BibTex

Relationships between visual features can reveal hidden connections in the visual arts.

Unsupervised Semantic Segmentation by Distilling Feature Correspondences

Website Paper Talk Github BibTex

We show that it's possible to classify every pixel of the visual world without human labels.

FeatUp: A Model-Agnostic Framework for Features at Any Resolution

Website Paper Github BibTex

By looking at how deep features change, we can improve the resolution of deep visual features by up to 64x without changing their semantics.

Separating the "Chirp" from the "Chat": Self-supervised Visual Grounding of Sound and Language

Website Paper Talk Github BibTex

We show that it's possible to rediscover language without human labels or text, just by watching unlabelled videos.

Axiomatic Explanations for Visual Search, Retrieval, and Similarity Learning

Website Paper Talk BibTex

We introduce a unifying framework for understanding search engines and other unsupervised algorithms. This framework formally connects game theory, model explainability, and feature relationships, explaining why the feature relationships used throughout this thesis work so well.

I-Con: A Unifying Framework for Representation Learning

Website Paper Github BibTex

We introduce a single equation that unifies 23+ machine learning algorithms. We use this equation to introduce a periodic table of machine learning. Several of the works in this thesis appear in the table.

Contact

For feedback, questions, or press inquiries, please contact Mark Hamilton.