If you have a tiny model (robotics, RL) that is CPU-overhead bound, avoid frequent calls to eval() or train() in eager mode, or to model.parameters() or anything else that traverses your model. Prefer cached versions of these calls, as sketched below.
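A minimal sketch of the idea (the model, shapes, and loop are hypothetical): set the mode once and cache the parameter list outside the hot loop.

```python
import torch
from torch import nn

model = nn.Linear(8, 2)  # stand-in for a small policy network

# Do the module-tree traversals once, up front, and reuse the results.
params = list(model.parameters())  # instead of calling model.parameters() per step
model.eval()                       # instead of toggling eval()/train() per step

@torch.no_grad()
def act(obs: torch.Tensor) -> torch.Tensor:
    # Hot path: nothing here walks the module tree.
    return model(obs)

for _ in range(1_000):
    act(torch.randn(1, 8))
```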
Using hydra or similar fancy config objects: avoid calling cfg.attribute repeatedly in your code. Instead, cache the config values once in your script as global workspace variables; see the sketch below.
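For instance (a sketch assuming a standard Hydra setup with a conf/config.yaml; the fields optim.lr and train.n_steps are made up):

```python
import hydra
from omegaconf import DictConfig

@hydra.main(version_base=None, config_path="conf", config_name="config")
def main(cfg: DictConfig) -> None:
    # Pull values out of the DictConfig once: attribute access on OmegaConf
    # objects goes through resolution machinery on every call.
    lr = cfg.optim.lr
    n_steps = cfg.train.n_steps
    total = 0.0
    for _ in range(n_steps):
        total += lr  # the hot loop only touches plain Python floats/ints

if __name__ == "__main__":
    main()
```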
In general, in-place operations are not preferable to regular ones: you won't gain much memory or speed. Don't load your code with ReLU(inplace=True), mul_, or add_ unless absolutely necessary.
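A toy comparison of the two styles (hypothetical shapes): the in-place chain saves at most a few temporary allocations, and can break autograd if the overwritten tensor is needed for backward.

```python
import torch

x = torch.randn(1024)

# Out-of-place: clear, autograd-friendly, and usually just as fast.
y = torch.relu(x) * 2 + 1

# In-place chain: mutates x to save a few small temporaries, at best.
x.relu_().mul_(2).add_(1)
```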
I use line_profiler to check the code line by line (careful: CUDA ops are async, do not trust it for these!) - very useful to check CPU overhead: https://pypi.org/project/line-profiler/
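Typical usage (the script and function are hypothetical): decorate the function of interest and run the script through kernprof, which injects the @profile decorator.

```python
# Run with: kernprof -l -v tiny_step.py
# Under kernprof, @profile is injected as a builtin, so no import is needed.

@profile
def tiny_step(x):
    y = x * 2                               # per-line hit counts and timings
    z = sum(i * i for i in range(10_000))   # this line will dominate the report
    return y + z

if __name__ == "__main__":
    for _ in range(100):
        tiny_step(3.0)
```

For CUDA code, remember that kernel launches return immediately: the launching line looks cheap and the cost shows up on whatever line synchronizes next.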
torch.utils.benchmark.Timer is amazing for assessing the runtime of a whole isolated piece of code, but be mindful that the way it plays with global variables isn't always obvious and may differ from time.time() on occasion.
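A sketch of the usual pattern: the stmt runs in its own namespace, so pass in everything it needs explicitly via globals.

```python
import torch
from torch.utils import benchmark

x = torch.randn(1024, 1024)

# stmt is executed in a fresh namespace: it only sees what you pass via
# globals, not your module's actual global variables.
t = benchmark.Timer(stmt="x @ x", globals={"x": x})
print(t.timeit(100))  # handles CUDA sync for you, unlike a naive time.time() pair
```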
Good old cProfile with snakeviz is pretty cool too https://jiffyclub.github.io/snakeviz/
Again, not for CUDA ops, and not as fine-grained as line_profiler, but quite useful for macro-tracking of compute time.
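A minimal run (script and function names hypothetical): dump a profile to disk, then browse the call tree interactively.

```python
# Explore the result with: snakeviz out.prof
import cProfile

def work():
    return sum(i * i for i in range(1_000_000))

cProfile.run("work()", "out.prof")
```

Or, without touching the code: python -m cProfile -o out.prof my_script.py, followed by snakeviz out.prof.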