Friday, February 10, 2023

P-hacking for Dummies

The way they explain p-hacking is intentionally obfuscated because p-hacking is egalitarian. (IOW evil, and they're evil-positive.) 

P-hacking is like running 1000 wind sprints, then selecting your top 10 times, adding them up, dividing by 10, and calling it your "average."
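A minimal sketch of the wind-sprint trick, for anyone who wants to see the size of the cheat. The sprinter is made up: the 15-second mean and 1-second spread are assumptions chosen only for illustration, and `honest_and_hacked_average` is a hypothetical helper name.

```python
import random

def honest_and_hacked_average(n_sprints=1000, keep=10, seed=0):
    """Compare the real average with the 'average' of the best few sprints."""
    rng = random.Random(seed)
    # Assumed sprinter: times scatter around 15 s with a 1 s standard deviation.
    times = [rng.gauss(15.0, 1.0) for _ in range(n_sprints)]
    honest = sum(times) / len(times)
    # The p-hack: keep only the fastest `keep` times and average those.
    best = sorted(times)[:keep]
    hacked = sum(best) / keep
    return honest, hacked

honest, hacked = honest_and_hacked_average()
print(f"honest average: {honest:.2f} s, 'average' of top 10: {hacked:.2f} s")
```

Same runner, same legs, and the reported number drops by seconds just from choosing which sprints count.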

Speedrunning videogames is dumb for so many reasons you can figure them out for yourself. I would like to suggest a less-dumb version, called retry%. You run 10 times, then divide your total time, failed runs included, by the number of successful runs. E.g. imagine ten runs of roughly ten minutes, or rather nine successful runs of around ten minutes plus one failed run of three minutes. 93/9 = 10.3 minutes. Retry% rewards players who don't take stupid risks.
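The retry% arithmetic as a short sketch. The function name `retry_percent` and the `(minutes, succeeded)` tuple layout are my own choices for illustration, not part of any existing speedrun tooling.

```python
def retry_percent(runs):
    """runs: list of (minutes, succeeded) pairs.

    Total time across ALL runs, failed ones included,
    divided by the number of successful runs.
    """
    total = sum(minutes for minutes, _ in runs)
    successes = sum(1 for _, ok in runs if ok)
    if successes == 0:
        raise ValueError("no successful runs, so no retry% score")
    return total / successes

# Nine successful ten-minute runs plus one failed three-minute run.
runs = [(10.0, True)] * 9 + [(3.0, False)]
score = retry_percent(runs)
print(round(score, 1))  # 10.3
```

A failed run costs you its time and adds nothing to the denominator, which is why risky strategies get punished.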

Normally the "world record" is held by whoever is the most tedium-resistant and they grind runs for six months to get one "record." In other words the actual holder of the record is the game's RNG, not the player. It rewards players who take the most stupid risks - or rather, offers the illusion of reward, as it's not like there are cash bounties for VG time records. In retry%, it's valid to simply demand a new ten runs from the record-holder, to show the time wasn't a fluke. Replication, bro.

Retry% illustrates p-hacking in more detail. To p-hack it, you run the game 1000 times, then select the best series of ten consecutive runs, using arbitrary start and stop points, effectively letting you RNG up a clean ten runs. If instead you have to pre-register, and the runs are explicitly numbered 1-10, it's sufficiently difficult to swap in a fake successful run for a failed run.
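A rough simulation of the cherry-picked window versus the pre-registered one. Everything about the player is assumed for illustration: a 70% success rate, successful runs near ten minutes, failed runs dying somewhere between one and nine minutes. `simulate` and `retry_percent` are hypothetical names.

```python
import random

def retry_percent(window):
    """Total time over all runs divided by the number of successful runs."""
    total = sum(minutes for minutes, _ in window)
    wins = sum(1 for _, ok in window if ok)
    return total / wins if wins else float("inf")

def simulate(n_runs=1000, window=10, seed=1):
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        ok = rng.random() < 0.7  # assumed 70% success rate
        # Assumed timings: clean runs near 10 min, failures die at 1-9 min.
        minutes = rng.gauss(10.0, 0.3) if ok else rng.uniform(1.0, 9.0)
        runs.append((minutes, ok))
    # Pre-registered: runs 1-10, named in advance.
    preregistered = retry_percent(runs[:window])
    # P-hacked: slide the start point anywhere and keep the best window.
    scores = [retry_percent(runs[i:i + window])
              for i in range(n_runs - window + 1)]
    cherry_picked = min(scores)
    # What the player actually averages over all 1000 runs.
    long_run = retry_percent(runs)
    return preregistered, cherry_picked, long_run

preregistered, cherry_picked, long_run = simulate()
print(f"pre-registered: {preregistered:.2f}  "
      f"cherry-picked: {cherry_picked:.2f}  long-run: {long_run:.2f}")
```

With nearly a thousand start points to choose from, some stretch of ten clean runs almost always exists, so the cherry-picked score looks like a player who never fails.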


If you collect a data set and notice something weird that you didn't think to look for, you have to adjust the p-value of the weird thing you noticed by the number of weird things you might have noticed. Usually it's easier to simply gather a new data set with the new guess pre-registered, rather than trying to carefully weight and measure all the things that didn't happen but could have. What does "could have" even mean? 

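Reminder that if p is just under 0.05, but there were 20 weird things that you could have noticed, the corrected p is 20 x 0.05 = 1 (the Bonferroni correction). Even counting exactly, with 20 independent chances the odds that at least one weird thing shows up by pure luck are about 64%.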
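The two numbers behind that reminder, as a sketch. It assumes the 20 possible weird things are independent tests, which real data rarely guarantees; the function names are my own.

```python
def bonferroni(p, m):
    """Bonferroni-adjusted p-value: multiply by the number of looks, cap at 1."""
    return min(1.0, p * m)

def familywise_error_rate(alpha, m):
    """Chance of at least one false positive in m independent tests at level alpha."""
    return 1.0 - (1.0 - alpha) ** m

print(bonferroni(0.05, 20))                       # 1.0
print(round(familywise_error_rate(0.05, 20), 2))  # 0.64
```

Bonferroni is the blunt, conservative version; the second function is the exact figure under independence. Either way the "significant" finding is mostly noise.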

Reminder that at p < 0.05, one in every 20 studies would be worthless, even if "peer review" weren't fraudulent. "The world's total number of scientific journal articles was estimated at 2.52 million in 2018." In other words, 2.52 million / 20 = a bare minimum of 126,000 papers that are flat wrong every year - more than enough to find "scientific" support for whatever nonsense you happen to want to ram into law.

Fun fact: more realistic estimates put the share of flat wrong papers at over 60%. Because duh, why are you even producing over 100,000 papers a year, let alone millions? How many genuine discoveries are there a year, a dozen? Two dozen? The total meaningful yearly scientific output fits in a pamphlet. Unless you're okay with hundreds and hundreds of copies of, "Yeah, what he said!" you have no choice but to produce truckloads of manure - worse than manure, really, since it can't be used as plant food.


P.S. Imagine school tests were actually supposed to be a valid measure of the % of the course's material you learned. kek

1 comment:

bad comedy said...

Why would anyone want to hack pee?