On the face of it, it's really fucking dumb.
Humans can't even get good birthday gifts for conspecifics. How are they supposed to know the desires of a future superintelligence that, by the scenario's own setup, cannot communicate with them? Doing tests instead of trusting your first guess is kind of the whole point of science, and you can't run tests on entities that don't exist yet.
On the other hand...
Yudkowsky and co. seem highly convinced that the basilisk meme has nonzero effectiveness: that it produces donations to MIRI (etc.), up to the point of self-destructive giving. It is implausible that Yudkowsky didn't know about the Streisand effect; the term was coined five years before the basilisk was posted.
A truly deviant Machiavellian who benefits from donations to AI research would then try to maximize the Streisand effect by maximizing the visible suppression effort, on the assumption that the two are positively correlated.
Yudkowsky reacted with the maximum plausible display of emotion and the maximum plausible suppression, restrained only by diminishing returns.
So either he's a true defector, or he's really, really, really dumb. Also, plz into emotional continence.
--
More dumb:
For it to be possible to defect on me, I have to define 'me' as including sensations I do not perceive, namely the sensations of future simulations of me. Or, alternatively, I do feel those sensations, in which case it's not acausal at all; it's just signalling across a spacelike separation, which is to say time travel. Because that wouldn't break the universe or anything.
Yudkowsky accepts that causal decision theory concludes you should defect in the one-shot prisoner's dilemma. In other words, he could have set out to discover that this conclusion is untrue, rather than inventing a whole new decision theory which, incidentally, creates the apparent possibility of basilisks.
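To be concrete about the standard claim, here's a minimal sketch of why causal decision theory defects in the one-shot prisoner's dilemma. The payoff numbers are the usual textbook values, my choice for illustration, not anything from Yudkowsky's writing: whatever the other player does, defecting scores higher.

```python
# Minimal sketch (textbook payoffs, higher is better) of why causal decision
# theory defects in a one-shot prisoner's dilemma: for either move the other
# player might make, "defect" yields the strictly higher payoff.

PAYOFF = {  # (my_move, their_move) -> my payoff
    ("cooperate", "cooperate"): 3,
    ("cooperate", "defect"):    0,
    ("defect",    "cooperate"): 5,
    ("defect",    "defect"):    1,
}

for their_move in ("cooperate", "defect"):
    best = max(("cooperate", "defect"),
               key=lambda my_move: PAYOFF[(my_move, their_move)])
    print(f"if they {their_move}, my best reply is: {best}")

# Prints "defect" both times: defection strictly dominates.
```

Defection dominates, so a CDT agent defects; the whole new-theory project exists to escape that verdict.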
"Since there was no upside to being exposed to Roko's Basilisk, its probability of being true was irrelevant."
Zeno's paradox was a brilliant dig at the idea that Greek philosophy understood physics, motion in particular. Equally, the basilisk shivs Yudkowsky's decision theory. But there's no upside to knowing Yudkowsky's theory has holes in it, now is there?
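(For the record, the dissolution of Zeno's dichotomy is just a convergent geometric series, standard modern math the Greeks didn't have: infinitely many steps, finite total.

\[
\sum_{n=1}^{\infty} \frac{1}{2^{n}} \;=\; \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \cdots \;=\; 1
\]

At constant speed the step times form the same series, so the runner finishes in finite time. The dig worked because it exposed a real gap that took better tools to close.)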
Classical decision theory already resists blackmail; it simply requires the investigator not to stop when they hit an emotionally valent conclusion, but to continue until the logic stabilizes.
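A toy expected-cost version of that argument, with made-up probabilities and costs; the only structural assumption is that a competent blackmailer only bothers threatening agents it predicts will pay:

```python
# Toy expected-cost check of the anti-blackmail argument. The numbers are
# invented for illustration; the assumed model is that blackmail attempts
# track the blackmailer's expected profit, so a known refuser simply
# isn't worth threatening.

COST_IF_THREAT_EXECUTED = 100.0   # harm if the blackmailer follows through
COST_OF_PAYING = 10.0             # price of giving in

def expected_cost(policy_pays: bool) -> float:
    p_threatened = 0.9 if policy_pays else 0.0
    cost_when_threatened = COST_OF_PAYING if policy_pays else COST_IF_THREAT_EXECUTED
    return p_threatened * cost_when_threatened

print(expected_cost(policy_pays=True))   # 9.0
print(expected_cost(policy_pays=False))  # 0.0
```

Once the blackmailer's incentives are folded in, "never pay" comes out ahead, which is all that "continue until the logic stabilizes" is asking for.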
--
Yudkowsky's Sequences are pretty okay. I want to know whether applying logic consistently really is that hard, or whether Yudkowsky isn't even genuinely trying. Also, plz into emotional continence.