Author here: Yeah, I was surprised that there don't seem to be many options for extra optimizations in the Go compiler. I'd be curious whether Go experts have more insight or other compiler recommendations.
I doubt it's the GC kicking in, but you could run it with the following environment variable set just to ensure that it's not doing anything.
GOGC=-1
EDIT: Trying it out quickly shows a small improvement actually, but so small as to likely be noise as I was doing other things on the machine.
Summary
./mainnogc 1000 ran
1.01 ± 0.06 times faster than ./maingc 1000
You've set the Go code to use int64, but the other, faster implementations are using int32.