Days of Mayo

Based on personal data: iPhone photos from November 2022 to today

View the interactive →

Every pet owner knows this problem. You pick up a dog on a Tuesday in November, and from that moment your camera roll is never the same again. You photograph everything — the first night, the first snow, the ridiculous sleeping positions. Duplicates accumulate. Three years later you have 845 photos and videos and no idea what any of it looks like as a whole.

This is data trash. Affectionate, sentimental data trash — but trash nonetheless. No structure, no labels, no way to see the shape of it.

I wanted to see the shape of it.

After filtering out duplicates, burst shots, and non-dog photos, 845 images remain. The result is a scatter plot where each image becomes a dot, and similar-looking images land near each other. There are three ways to look at it:

Visual arranges photos by what they literally look like — texture, composition, colour. Outdoor walks with autumn leaves end up near each other. Blurry couch naps drift toward other couch naps. The puppy months — when she was still small enough to fit in a coat pocket — sit in their own area, visually distinct from the full-grown dog sprawled across the entire sofa.

Scene groups photos by what’s happening. Playing, sleeping, walking, being held — activities that look different in pixels but share the same vibe land together. A photo of her running through snow in 2023 might sit right next to a summer run from 2025.

Mood sorts by colour and lighting. Dark winter evenings in one corner, sun-drenched park photos in another. It’s the most subjective of the three, but also the most surprising — you can see the seasons in the scatter without looking at a single date.

You can zoom in anywhere and hover over photos to see them up close.

It is, I think, a better way to look at your photos than scrolling.


How it works

Export. Photos come from Apple Photos via osxphotos, queried by subject tag and shared album. Each image is resized to 512px; videos get a single keyframe at the midpoint via ffmpeg. Burst shots and near-duplicates are removed automatically.
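The video half of this step can be sketched as a command builder. This is a sketch, not the project's actual script: the filename arguments are hypothetical, and in practice the duration would first be read via ffprobe before the midpoint seek is issued.

```python
def midpoint_keyframe_cmd(video_path: str, out_path: str, duration_s: float) -> list[str]:
    """Build one ffmpeg call: seek to the midpoint, grab a single frame, scale to 512px."""
    midpoint = duration_s / 2.0
    return [
        "ffmpeg",
        "-ss", f"{midpoint:.3f}",   # seeking before -i uses fast keyframe seeking
        "-i", video_path,
        "-frames:v", "1",           # exactly one output frame
        "-vf", "scale=512:-2",      # 512px wide, height rounded to an even number
        "-y", out_path,
    ]
```

The resulting list would be handed to `subprocess.run(cmd, check=True)`, one call per video.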

Embed. Three separate feature vectors are computed per image. DINOv2 (ViT-S/14) captures texture, composition, and spatial structure — the “Visual” view. CLIP (ViT-B/32) captures semantic meaning — the “Scene” view. An HSV colour histogram (32 bins) captures lighting and palette — the “Mood” view.
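The two neural embeddings need model weights, but the "Mood" feature is simple enough to sketch in full. Here is a hue-only version using Python's standard colorsys module — the project's actual binning over H, S, and V together is an assumption I haven't reproduced; this shows the idea:

```python
import colorsys

def hue_histogram(pixels, bins=32):
    """32-bin hue histogram over (r, g, b) pixels in 0-255, normalised to sum to 1."""
    hist = [0] * bins
    for r, g, b in pixels:
        h, _, _ = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)  # h in [0, 1)
        hist[min(int(h * bins), bins - 1)] += 1
    total = sum(hist) or 1
    return [count / total for count in hist]
```

Two photos with the same palette — a dark winter evening, a sun-drenched park — end up with nearby histograms, and therefore nearby dots, regardless of what they depict.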

Reduce. Each embedding is projected to 2D with UMAP, preserving local neighbourhood relationships. Position in the scatter encodes genuine similarity, not time.
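The real projection is a one-liner with the umap-learn package: `umap.UMAP(n_components=2).fit_transform(embeddings)`. As a self-contained stand-in that shows the same embed-then-project shape, here is a PCA projection in NumPy — PCA preserves global variance rather than local neighbourhoods, which is exactly why the project uses UMAP instead. The random data is a placeholder; 384 is the real feature width of DINOv2 ViT-S/14.

```python
import numpy as np

def project_2d(embeddings: np.ndarray) -> np.ndarray:
    """Project n x d embeddings to n x 2 via PCA (a stand-in; the project uses UMAP)."""
    X = embeddings - embeddings.mean(axis=0)          # centre the point cloud
    _, _, Vt = np.linalg.svd(X, full_matrices=False)  # right singular vectors
    return X @ Vt[:2].T                               # top-2 principal axes

# 845 photos x 384-dim features (random stand-in for the real embeddings)
rng = np.random.default_rng(0)
xy = project_2d(rng.normal(size=(845, 384)))
```

Either way, the output is one (x, y) pair per photo, and those pairs are all the scatter plot needs.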

Display. Photos are packed into sprite atlases and drawn to a canvas element.
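Packing 845 thumbnails into a few large textures is plain grid arithmetic. A minimal sketch — the cell and atlas sizes here are assumptions, not the project's actual numbers:

```python
def atlas_layout(n_images: int, cell: int = 64, atlas_px: int = 4096):
    """Assign each image an (atlas index, x, y) slot in square sprite sheets."""
    per_row = atlas_px // cell         # cells per row: 64 at these sizes
    per_atlas = per_row * per_row      # cells per sheet: 4096
    slots = []
    for i in range(n_images):
        atlas, j = divmod(i, per_atlas)
        row, col = divmod(j, per_row)
        slots.append((atlas, col * cell, row * cell))
    return slots
```

At draw time, each visible dot becomes one `drawImage(atlas, sx, sy, w, h, dx, dy, w, h)` call against its slot, which is why a canvas can pan and zoom through hundreds of photos smoothly.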

View the interactive →