Days of Mayo
Based on personal data: iPhone pics from November 2022 to today
Every pet owner knows this problem. You pick up a dog on a Tuesday in November, and from that moment your camera roll is never the same again. You photograph everything: the first night, the first snow, the ridiculous sleeping positions. Duplicates accumulate. Three years later you have more photos and videos than you can count and no idea what any of it looks like as a whole.
This is data trash. Affectionate, sentimental data trash — but trash nonetheless. No structure, no labels, no way to see the shape of it.
I wanted to recycle this trash.
After filtering out duplicates, burst shots, and photos that include people, 845 unique images remain. The result is a scatter plot in which each picture becomes a dot, and pictures that look similar land near each other. There are three ways to look at it:
By look arranges photos by what they literally look like: texture, composition, framing. Outdoor walks with autumn leaves end up near each other. Blurry couch naps drift toward other couch naps. The puppy months, when she was still small enough to fit in a coat pocket, sit in their own area, visually distinct from the full-grown dog sprawled across the entire sofa.
By content groups photos by what’s happening. Playing, sleeping, walking, being held: activities that look different in pixels but share the same vibe land together. A photo of her running through snow in 2023 might sit right next to a summer run from 2025.
By color sorts by palette and lighting. Dark winter evenings in one corner, sun-drenched park photos in another. It’s the most subjective of the three, but also the most surprising: you can see the seasons in the scatter without looking at a single date.
You can zoom in anywhere and click on individual pictures to see them up close.
It is, I think, a better way to look at your dog photos than scrolling.
How does a photo end up as a dot?
A neural network looks at each image and turns it into a long list of numbers, a kind of fingerprint. Photos that show similar things get similar fingerprints. The fingerprints live in a big space with hundreds of dimensions, which is not something you can plot on a screen, so a second step squashes them down to two dimensions while trying to keep neighbours as neighbours. Fingerprints that were close in the big space stay close on the page; ones that were far apart usually end up far apart, though the map cares more about keeping neighbours together than about exact long distances. What you see is that flattened map. The axes don't mean anything on their own; only the distances between dots do.
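If you like code, the fingerprint idea fits in a few lines. This is a toy sketch: the three-dimensional vectors below stand in for the real hundreds-dimensional fingerprints, and the names and numbers are invented for illustration.

```python
# Toy fingerprints: three dimensions standing in for hundreds.
import numpy as np

couch_nap_1 = np.array([0.9, 0.1, 0.2])   # hypothetical vectors
couch_nap_2 = np.array([0.8, 0.2, 0.1])
snowy_walk  = np.array([0.1, 0.9, 0.7])

def similarity(a, b):
    # Cosine similarity: close to 1.0 means "very alike".
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

print(similarity(couch_nap_1, couch_nap_2))  # high: the naps cluster
print(similarity(couch_nap_1, snowy_walk))   # low: different scenes
```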
Technical summary
Export. Photos come from Apple Photos via osxphotos, queried by subject tag and shared album. Each image is resized to 512px; videos get a single keyframe at the midpoint via ffmpeg. Burst shots and near-duplicates are removed automatically.
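Two of these export details are easy to sketch in Python. The midpoint keyframe can be pulled with ffprobe and ffmpeg (assumed to be on PATH), and near-duplicate removal can be approximated with perceptual hashing via the ImageHash library; the phash approach and the distance threshold are my guesses at what "automatically" means here, not a confirmed part of the pipeline.

```python
# Sketch of two export details: a midpoint keyframe per video and a
# perceptual-hash pass that drops near-duplicates (both approximations).
import subprocess
from pathlib import Path

import imagehash  # pip install ImageHash
from PIL import Image

def midpoint_frame(video: Path, out_jpg: Path, size: int = 512) -> None:
    # Ask ffprobe for the duration, then grab one frame at duration/2,
    # scaled so the long edge is `size` pixels.
    duration = float(subprocess.check_output([
        "ffprobe", "-v", "error", "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1", str(video),
    ], text=True))
    subprocess.run([
        "ffmpeg", "-y", "-ss", str(duration / 2), "-i", str(video),
        "-frames:v", "1",
        "-vf", f"scale='if(gt(iw,ih),{size},-1)':'if(gt(iw,ih),-1,{size})'",
        str(out_jpg),
    ], check=True)

def drop_near_duplicates(paths: list[Path], max_distance: int = 5) -> list[Path]:
    # Keep a photo only if its perceptual hash is far enough (in Hamming
    # distance) from every photo already kept.
    kept, hashes = [], []
    for p in paths:
        h = imagehash.phash(Image.open(p))
        if all(h - seen > max_distance for seen in hashes):
            kept.append(p)
            hashes.append(h)
    return kept
```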
Embed. Three separate feature vectors are computed per image. DINOv2 (ViT-S/14) captures texture, composition, and spatial structure: the “By look” view. CLIP (ViT-B/32) captures semantic meaning: the “By content” view. An HSV colour histogram (32 bins) captures lighting and palette: the “By color” view.
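A sketch of how the three vectors might be computed. The model names match the ones above, but the loading code, the preprocessing, and the reading of “32 bins” as 32 bins per HSV channel (concatenated to 96 dimensions) are my assumptions.

```python
# Three feature vectors per image: DINOv2 for look, CLIP for content,
# an HSV histogram for colour. Preprocessing details are assumptions.
import clip   # pip install git+https://github.com/openai/CLIP
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
dino = torch.hub.load("facebookresearch/dinov2", "dinov2_vits14").to(device).eval()
clip_model, clip_preprocess = clip.load("ViT-B/32", device=device)

# Standard ImageNet normalisation; 224 is a multiple of the 14px patch size.
dino_preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

@torch.no_grad()
def embed(path: str) -> dict[str, np.ndarray]:
    img = Image.open(path).convert("RGB")

    # "By look": DINOv2 CLS-token embedding, 384 dims for ViT-S/14.
    look = dino(dino_preprocess(img).unsqueeze(0).to(device))[0].cpu().numpy()

    # "By content": CLIP image embedding, 512 dims for ViT-B/32.
    content = clip_model.encode_image(
        clip_preprocess(img).unsqueeze(0).to(device))[0].cpu().numpy()

    # "By color": a 32-bin histogram per HSV channel, concatenated.
    hsv = np.asarray(img.convert("HSV"))
    color = np.concatenate([
        np.histogram(hsv[..., c], bins=32, range=(0, 256), density=True)[0]
        for c in range(3)
    ])
    return {"look": look, "content": content, "color": color}
```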
Reduce. Each embedding is projected to 2D with t-SNE, preserving local neighbourhood relationships. Position in the scatter encodes genuine similarity, not time.
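Roughly, the reduction step with scikit-learn's t-SNE. Only the use of t-SNE is stated above; the perplexity, metric, and initialisation here are illustrative choices, not the project's actual settings.

```python
# Project one embedding matrix (n_images x n_dims) down to 2D.
import numpy as np
from sklearn.manifold import TSNE

def to_2d(embeddings: np.ndarray) -> np.ndarray:
    xy = TSNE(
        n_components=2,
        perplexity=30,      # roughly: how many neighbours each dot keeps
        metric="cosine",    # assumption: cosine suits these embeddings
        init="pca",
        random_state=42,
    ).fit_transform(embeddings)
    # Normalise to [0, 1] so positions drop straight onto a canvas.
    xy -= xy.min(axis=0)
    return xy / xy.max(axis=0)
```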
Display. Photos are packed into sprite atlases and drawn to a canvas element.
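The atlas half of that step can be sketched with Pillow: thumbnails get pasted into one big grid image, and the front end crops tile i back out at draw time. The tile size and grid width here are assumptions, not the project's actual values.

```python
# Pack thumbnails into a single sprite atlas, one fixed-size tile each.
from PIL import Image

def build_atlas(paths: list[str], tile: int = 64, cols: int = 32) -> Image.Image:
    rows = -(-len(paths) // cols)  # ceiling division
    atlas = Image.new("RGB", (cols * tile, rows * tile))
    for i, path in enumerate(paths):
        thumb = Image.open(path).convert("RGB").resize((tile, tile))
        atlas.paste(thumb, ((i % cols) * tile, (i // cols) * tile))
    return atlas
```

On the canvas side, tile i lives at column i % cols, row i // cols, so one drawImage call per dot can crop it straight from the atlas.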
View the interactive →