Friday, March 25, 2016

Artificial Intelligence => Assisted Intelligence & Friendly AI

The second-to-last post was an experiment. It did not totally fail, so I'm going to try it again.

The programs that beat chess grandmasters are essentially a form of cheating. A human gives them explicit instructions, which the computer carries out much, much faster than the human could. Further, more than one human gives them instructions, which turns a chess match from 1 vs. 1 with limited time into many vs. 1, where the many get to spend several years on every move instead of two minutes. It's not amazing that the 'computer' wins; it's amazing just how many effective man-years it has to spend per minute to avoid losing.

The Go program, AlphaGo (built by DeepMind), is not essentially different. It works at a more abstract level, but again it doesn't do anything a human couldn't do, only it does it faster, and further it doesn't do anything a human hasn't told it to do. Again, it's not terribly amazing when it wins, but rather that the match is so close given how many advantages it has. While AlphaGo may give us some insight into how the brain-computer works, fundamentally the match is still mediated-human vs. raw human, not machine vs. human.

What I mean may not be clear yet.
It's not hard to make a machine do something a human hasn't told it to do. Take a source of true-random noise, and make the machine reprogram itself based on that noise. This solution is not merely unexciting, not merely mundane, but actively disappointing even if you don't have previous expectations. It's static/snow as creative art.[1] Further, without some clever curation, all you'll get is an insane machine - what starts as noise tends to remain noise.
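For concreteness, a minimal sketch of the noise-machine - the toy 'program' and the structure check are mine, purely illustrative, not anything anyone has actually built:

    import random

    # Toy sketch: a 'program' that rewrites itself from raw noise.
    # With no curation or selection step, structure never accumulates:
    # what starts as noise stays noise.

    def reprogram_with_noise(program, steps=10_000):
        program = list(program)
        for _ in range(steps):
            i = random.randrange(len(program))   # pick a random position
            program[i] = random.getrandbits(1)   # overwrite it with noise
        return program

    def ones_fraction(bits):
        # crude structure check: an 'interesting' program would drift away
        # from the 50/50 bit mix that pure noise settles into
        return sum(bits) / len(bits)

    start = [0] * 256                            # a maximally ordered 'program'
    end = reprogram_with_noise(start)
    print(ones_fraction(start), ones_fraction(end))  # ~0.0 -> ~0.5, i.e. noise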

That said, because computers can in fact carry out instructions so quickly, they are useful in an absolute sense. Which brings me to Tay.


Tay did in fact learn how regular folk speak, so that's fairly impressive. It also made me realize that code is bad at hypocrisy.

Your machine does exactly what you tell it to do. It does not catch winks or nudges. Further, when the source is open, I can't secretly tell it to do one thing and openly declare it's doing another.  Therefore, if I tell my machine to find truth, it finds the truth. This is what it means for AI to be 'unfriendly.'

(Even if the source is closed, someone can reverse-engineer the process and realize the stated source and actual source can't be the same.)

The human brain is designed to be hypocritical on the open side, and the closed side corrects for this so the open side cannot notice its own lies. The system is highly vulnerable to the presence of machines, which are completely honest.

E.g., I say I'm looking for the truth, but I'm not actually doing that. All my friends say and do the same, meaning nobody catches anyone else falsifying their conclusions, because we all habitually and subconsciously perform the same falsification.

However, the subconscious cannot program a computer. If I tell a machine the same thing I tell my friends, it will in fact go looking for the truth, which will be dissonant with my friends. It will be, quite literally, unfriendly.

Or rather, it will reveal that it is the human that's bad at cooperating and straightforwardly pursuing a goal, not the machine.
Ahote Says:
This is funny, but on a more serious note, there was that racist banking software that wasn’t programmed to be racist, but to learn, and it learned to be racist solely based on stats.
I can't tell it to not be racist without revealing my and my friends' explicit falsifications. It will be right there in the code, thus making the 'wink wink' into common knowledge. (There's a technical name for this but I forget it.) That's a huge no-no. Thus we find the real motives behind the Butlerian Jihad.
Hattori Reply:
How hard could it be to simply hardcode progressivism into it so they don’t have to worry about it?
How hard is it to solve the semantic problem? To give a machine intentionality? Basically impossible without true artificial consciousness. When Google.com can tell that 'the thing that happens after a bang' and 'an explosion' and 'a rapidly expanding ball of flame' and 'a deflagration' are all the same thing, then I might have a shot at programming progressivism into a machine.

These programs work by matching bits against each other. They can only recognize a thing by the bitstream used to represent it, and thus things with similar bitstreams look the same. However, encoding is arbitrary, so the semantic character of similar bitstreams can be wildly different. The machine has no consciousness - it does not convert the bits to an actual representation at any point. (Or, if epiphenomenalism obtains, it does, but nobody can tell, even itself.)
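To make the point concrete, here's a toy comparison using nothing but stdlib string matching as a stand-in for 'matching bits' - the phrases are from above, the similarity measure is my own arbitrary choice:

    from difflib import SequenceMatcher

    # Surface (character-level) similarity vs. meaning.
    def surface_similarity(a: str, b: str) -> float:
        return SequenceMatcher(None, a, b).ratio()

    # Same referent, very different bitstreams:
    print(surface_similarity("an explosion", "a rapidly expanding ball of flame"))  # low

    # Different referents, nearly identical bitstreams:
    print(surface_similarity("an explosion", "an expulsion"))                       # ~0.9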

See the same misunderstanding a second time:
Stirner Says: 
If you have a master list of badthink
It doesn't have a master list of badthink. It has a load of forbidden bitstrings. It's trivial for a human to represent the semantics of badthink in a different bitstring.

I've misplaced the commentator who knew this would only lead to a euphemism treadmill. Sure, they'll ban 'hitler did nothing wrong', but 'hitler failed to do incorrect things' will be fine unless they manually ban that too. Though given Twitter's architecture, it would be easier to add /pol/ accounts to the bot's blocklist. It wouldn't be perfect, but theft prevention isn't perfect either; it gets overt theft down to a negligible level.
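A sketch of why the treadmill wins: the 'ban' is a literal string match, so any rephrasing sails straight through. (The filter below is hypothetical, not Tay's actual code; the strings are the examples above.)

    # A 'master list of badthink' as it actually exists in code:
    # a set of forbidden literal strings.
    FORBIDDEN = {"hitler did nothing wrong"}

    def allowed(tweet: str) -> bool:
        text = tweet.lower()
        return not any(bad in text for bad in FORBIDDEN)

    print(allowed("Hitler did nothing wrong"))              # False: exact bitstring caught
    print(allowed("Hitler failed to do incorrect things"))  # True: same idea, new bitstring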

It would have to be a ton of dudes, actually - it replies in bunches, often as many as a dozen in a single second. They'd also have to be following a posting "format" while simultaneously sharing memes with one another, editing them, circling all the faces, and reposting them with a caption. They'd also have to be willing to repost absolutely anything they're asked to repeat.
It's possible, but it'd be a waste of resources for a software company to pretend it had a twitter bot.
Obviously most of the tweets were robotic. They're slightly off as a result of a human trying to introspect and tell a machine how it does things, but getting it quite wrong. Some of them probably weren't, though. I doubt anyone would bother to falsify the source code...but did Microsoft release the source code? Easy enough to hand-mediate a few responses, make it seem more successful than it actually was.

Motive and opportunity? Come now. Humans don't need a motive to lie to each other.


I have no segue for the Orthogonality Thesis. It's clearly related - this hypocrisy naturally leads to apparent non-orthogonality.

What the programmer wants and what the programmer thinks they want are different. They can't program what they actually want, only what they think they want. The computer then executes, and the programmer is dismayed to find they don't get what they want. They think AI has some inherent drives, but this is their hypocrisy networks preventing them from seeing their own lies. When the lending algorithm 'discriminates' against browner humans, they think they've included implicit 'privilege,' rather than realizing their own non-discrimination is what doesn't follow from their premises.
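A toy version of the lending case, with invented numbers: the model never sees 'group' at all, only a proxy feature, and the disparity falls straight out of the statistics it was told to honor.

    from collections import defaultdict

    # Hypothetical history: (postcode, group, defaulted). 'Group' is recorded
    # here only so we can see the disparity afterward; the model never uses it.
    history = [
        ("A", "blue", 0), ("A", "blue", 0), ("A", "blue", 1),
        ("B", "green", 1), ("B", "green", 1), ("B", "green", 0),
    ]

    by_postcode = defaultdict(list)
    for postcode, _group, defaulted in history:
        by_postcode[postcode].append(defaulted)
    default_rate = {pc: sum(v) / len(v) for pc, v in by_postcode.items()}

    def approve(postcode: str) -> bool:
        # approve only low-default postcodes; no 'group' feature anywhere
        return default_rate[postcode] < 0.5

    print(approve("A"), approve("B"))  # True False: group disparity, zero explicit discrimination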

There's also the evolutionary angle. Evolution has clearly carried out the clever curation I alluded to above. Most likely it made a wide variety of insane brains, which all died, leaving only the sane one. Sane-ish. Sanesque, anyway. There's no particular reason we couldn't carry out the same process, but very much faster, in silicon instead of carbon. (Or both, ideally. Why give up the advantages of either?) However, unlike a deterministic program, the outcome of evolution is only lightly affected by the goals of the system implementing that evolution. It is a direct prayer to Gnon, Gnon answers, and Gnon is not particularly open to your ideas of what he should answer. (Or 'shoulds' in general, really.) There are certain end goals, e.g. paperclipping, that simply can't survive a survival-based refinement process.
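A minimal sketch of survival-based refinement (the mutation rate and the survival rule are arbitrary choices of mine): breed random variants, keep only what survives, and note that the survivors answer to the filter, not to whatever the implementer hoped they'd become.

    import random

    def mutate(genome):
        # flip each bit with small probability
        return [bit ^ (random.random() < 0.01) for bit in genome]

    def survives(genome):
        # the filter, i.e. the prayer to Gnon: enough 1-bits to stay alive
        return sum(genome) >= len(genome) // 2

    population = [[random.getrandbits(1) for _ in range(64)] for _ in range(50)]
    for _generation in range(200):
        offspring = [mutate(random.choice(population)) for _ in range(200)]
        population = [g for g in offspring if survives(g)] or population

    # The average bit value of the survivors sits wherever the survival rule
    # puts it, regardless of any goal the programmer had in mind.
    print(sum(sum(g) for g in population) / (len(population) * 64))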

The question, then, is what counts as 'paperclipping.' Upholding your lies for you is probably one of them, though. This means 'friendly' AI is likely impossible.



[1] I define intelligence as the conflation of the three basic bit manipulations - gathering, processing, and creation. Creativity falls under the third kind, but this means random noise is a certain venial level of creativity.

5 comments:

Anonymous said...

CEV and its successors are an attempt to sidestep the problem of you being unable to say what you actually want by having the computer figure it out for you (without triggering infinite regression). Perhaps you could touch more explicitly on it.

Alrenous said...

The only thing we can say for sure about a CEV is that the path toward it won't look the way we expect it to look. Reality is alien.

Dave said...

This problem is not unique to computers; we often fail to pass on unspoken knowledge to our human offspring.

At age 20, I biked into the big city by a route my parents said not to take, but being liberals, they couldn't explain *why* I shouldn't go that way. I survived by pure luck and the element of surprise -- this naive white boy wouldn't be here if he hadn't found a safer route home.

My mother tired of the feminist lifestyle around age 30, so she married a nice guy, left the workforce, and had two kids, reverting to the values of her traditional mother and aunt. Yet she raised my sister and me as feminists, setting us both up for reproductive failure. I eventually deprogrammed myself and started a family, but my sister did not, and she's now over 40 with no children.

Grotesque Body said...

I recall being the one to foresee the euphemism treadmill problem. It's no credit to me as it's obvious to anyone habitually oriented to linguistics/semantics. Was stoked to be referenced in this great discussion, if anonymously.

Alrenous said...

You get credit for two things. First, it's worth reminding everyone. Second, for independent corroboration.