Concerning (2), I've had a look at the CPU load on my system during sampling, and it rises to about 14% of one CPU core for about one second in each minute. This is on a fairly powerful quad-core i5 machine, so I'd guess that on a much less powerful machine (a Raspberry Pi, for example) it could saturate the CPU for a second or so.
I'm not sure that I can do anything about this. During sampling there's quite a lot of data processing, and although the relevant code could probably be optimised to reduce CPU load, I doubt that there's much to be gained. There would also be plenty of scope for introducing new bugs.
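
For what it's worth, here is a rough back-of-envelope sketch of what those figures imply. It uses only the numbers quoted above (14% of one core, about one second, once a minute). The slowdown factor of 7 is purely a hypothetical value for illustration, not a measured figure for any particular machine, and the function names are my own.

```python
def average_load(peak_fraction, burst_seconds, period_seconds):
    """Mean fraction of one core used, averaged over the whole period."""
    return peak_fraction * burst_seconds / period_seconds


def saturated_seconds(peak_fraction, burst_seconds, slowdown):
    """Seconds a core `slowdown` times slower would spend at 100% load
    doing the same amount of work as the measured burst."""
    work_core_seconds = peak_fraction * burst_seconds
    return work_core_seconds * slowdown


# Figures from the measurement: 14% of one core for ~1 s in each minute.
avg = average_load(0.14, 1.0, 60.0)
print(f"average load: {avg:.2%} of one core")  # average load: 0.23% of one core

# ASSUMPTION: a hypothetical machine whose cores are 7x slower.
busy = saturated_seconds(0.14, 1.0, 7.0)
print(f"time at full load on slower core: {busy:.2f} s")  # 0.98 s
```

So averaged over the minute the load is tiny; it's only the short burst that might be noticeable on slow hardware, and how noticeable depends entirely on the real speed ratio.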