Saturday, August 12, 2023

Let's see if I've worked out the new AI

A transformer is just interpolation, but, like, really complicated?

And an LLM is a kind of interpolation that looks like extrapolation?


An LLM is fairly easy to describe since it's 1D: it computes some mathematical function of the previous words in an attempt to predict what the next word will be.

It extrapolates based on the input text, but the result is an interpolation of the training data. It's an extraordinarily expensive way of trying to figure out the conventional answer. If someone in the training data had started saying X, how would they have finished saying X?
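
Here's the crudest possible toy of that mechanism: a bigram lookup table. Everything in this sketch is invented for illustration, right down to the function names; a real LLM's function is vastly more complicated, but the shape of the job is the same.

```python
# A minimal sketch of next-word prediction as a function of previous words.
# This toy bigram counter stands in for the (vastly more complicated)
# function a real LLM learns; all names here are illustrative.
from collections import Counter, defaultdict

training_text = "the cat sat on the mat the cat ran".split()

# Count, for each word, which words followed it in the training data.
followers = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    followers[prev][nxt] += 1

def predict_next(word):
    """Return the most common follower of `word` in the training data."""
    if word not in followers:
        return None  # nothing to interpolate from: no prediction at all
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # -> 'cat' (it followed 'the' twice, 'mat' once)
```

Note that `predict_next` can only ever hand back something it has already seen; when the table has no entry, interpolation has nothing to say.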


Interpolation cannot create anything new. An LLM cannot extrapolate semantically; it can only try to make the output text match what it sees in the input text.
It's an extraordinarily crippled approach to AI, like trying to transport water in a bucket by first drilling a bunch of holes in the bucket. It's not like you can't get water from A to B using a Swiss-cheese bucket, but it takes an ungodly amount of effort.

Have you tried drinking with a fork? I gave up once the liquid wouldn't cover half the tines. It's an amusing thing to do, once. Gives you a sense of "nonzero" as in, "A mouthful of water using a fork is a nonzero amount."

When they try to refine the approach, they often attempt to drill more holes in the bucket, though sometimes they get the bright idea to drill smaller ones. Normally they throw more buckets ("compute") at it, in an attempt to pretend that money is no object and they're so high-status they don't suffer from scarcity.


The transformers take the surroundings of a pixel and try to work out what that pixel 'should' look like: if, in the training data, the surroundings of a pixel looked like Y, what is the correct value of the pixel itself, X? Because this is recursive, the transformation has to be applied many times in the hope of reaching an equilibrium. Imagine three regions: left, middle, right. Once you've adjusted the left to be consistent with itself and the middle, the middle is now being adjusted to a different picture than the left was adjusted to, and the right will be adjusted to yet a third picture. The model can't appreciate the picture as a whole, so you have to run it again and hope it converges.
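
To make that left/middle/right dance concrete, here's a toy I've invented for illustration (not any real model's actual update rule): each pass re-estimates every pixel from its immediate neighbours only, so no single pass produces a globally consistent picture, and you just keep running it and hope it settles.

```python
# Toy iterative local adjustment: each pass re-estimates every interior
# pixel purely from its neighbours, so a single pass leaves distant regions
# mutually inconsistent; the update must be repeated until (hopefully) it
# reaches an equilibrium. Illustrative sketch only.
import numpy as np

rng = np.random.default_rng(0)
image = rng.random(12)  # a 1-pixel-tall "image", left to right

for step in range(10_000):
    previous = image.copy()
    # Each interior pixel becomes the average of its two neighbours.
    image[1:-1] = 0.5 * (previous[:-2] + previous[2:])
    if np.max(np.abs(image - previous)) < 1e-6:
        print(f"settled after {step + 1} passes")  # no single pass suffices
        break
```

Even on twelve pixels it takes hundreds of passes before neighbouring regions stop contradicting each other.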

Basically, the prompts adjust the weights on the pixels. Can you see how this means there's a distinctly finite number of meaningfully different output pictures? There are only so many weight sets that make sense.
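
As a toy illustration of what "only so many weight sets" means (my own invented example, not how any real model stores pictures): if the prompt ultimately picks mixing weights over a fixed palette of patterns, and each weight has only a few usable settings, the number of distinct outputs is just the number of weight combinations.

```python
# Toy: output picture = weighted blend of a fixed palette of patterns.
# If each weight is quantized to a few usable levels, only finitely many
# meaningfully different pictures can come out. Purely illustrative.
import itertools
import numpy as np

patterns = np.eye(4)                 # stand-ins for learned image patterns

def render(weights):
    """Output picture = weighted blend of the fixed patterns."""
    return np.asarray(weights) @ patterns

levels = [0.0, 0.5, 1.0]             # each weight limited to 3 settings
outputs = {tuple(render(w)) for w in itertools.product(levels, repeat=4)}
print(len(outputs))                  # 81 = 3**4 distinct pictures, no more
```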

That it would have issues with fingers is predictable. Fingers are periodic, and the AI can't count them. It doesn't see fingers (or words); it just sees a bunch of numbers. The numbers don't mean anything to it: it can only tell that the numbers need to be larger or smaller, or so its masters claim. It can sort of tell that a finger should be next to other fingers, but it can't see any further than a finger or two away, because making the pixel regions too big will bring your CPU to its knees.
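
Here's the counting problem in miniature (a hypothetical toy, not any real architecture): if the model only ever sees a small local window, a five-fingered hand and a six-fingered hand present exactly the same set of local views.

```python
# Toy: a periodic "hand" seen only through small local windows.
# Every window of a periodic pattern looks the same, so a model with a
# narrow receptive field cannot tell five fingers from six.
def local_windows(hand, width=3):
    """All distinct views a width-limited model could ever see."""
    return {hand[i:i + width] for i in range(len(hand) - width + 1)}

five_fingers = "|_" * 5   # '|' = finger, '_' = gap
six_fingers = "|_" * 6

# Identical local views, globally different hands.
print(local_windows(five_fingers) == local_windows(six_fingers))  # True
```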
