
PyTorch/XLA 2.4 improves Pallas and adds “eager mode”


And now, instead of having to call xm.mark_step(), you can call torch_xla.sync(). These improvements make it easier to convert your code over to PyTorch/XLA and improve the developer workflow. For more changes to API calls, check out the release notes.
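For example, here's a minimal sketch of the updated call (the torch_xla.device() alias and the tensor shapes are illustrative; see the release notes for the full list of renamed APIs):

```python
import torch
import torch_xla

# Get the XLA device (2.4 also adds torch_xla.device() as a top-level
# alias for xm.xla_device(); treat the alias as an assumption)
device = torch_xla.device()

x = torch.randn(4, 4, device=device)
y = x @ x  # recorded in the lazy graph, not yet executed

# Previously: xm.mark_step()
torch_xla.sync()  # compile and run the pending graph on the device
```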

Experimental eager mode

If you’ve been working with PyTorch/XLA for a while, you know that we refer to models being “lazily executed.” That means that PyTorch/XLA records a graph of operations before sending it to the target XLA device for execution. With the new eager mode, each operation is compiled and then immediately executed on the target hardware.

The catch is that TPUs themselves do not have a true eager mode, because instructions are not sent to the TPU one at a time by default. On TPUs, eager mode is achieved by adding a “mark_step” call after each PyTorch operation to force compilation and execution. The result behaves like eager mode, but as an emulation rather than a native feature.
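If you want to try it, here’s a minimal sketch of enabling eager mode, assuming the experimental torch_xla.experimental.eager_mode API described in the 2.4 release notes:

```python
import torch
import torch_xla
import torch_xla.experimental

# Experimental in 2.4: execute each operation immediately instead of
# accumulating a lazy graph (assumed API; check the release notes)
torch_xla.experimental.eager_mode(True)

device = torch_xla.device()
x = torch.randn(4, 4, device=device)
y = x @ x  # compiled and run right away; no explicit torch_xla.sync() needed
```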

Our intent with eager mode in this release is not for you to run it in production, but rather in your local development environment. We hope that eager mode makes it easier to debug your models locally on your own machine, without having to deploy them to a larger fleet of devices as most production systems require.

Cloud TPU info command line interface

If you’ve used Nvidia GPUs before, you may be familiar with the nvidia-smi tool, which you can use to debug your GPU workloads, identify which cores are being utilized, and see how much memory a given workload is consuming. Now there’s a similar command-line utility for Cloud TPUs that makes it easier to surface utilization and device information: tpu-info.
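As a sketch, installing and running it looks something like this (the PyPI package name is an assumption; check the tpu-info documentation for exact instructions):

```
# Install the CLI (package name assumed to be tpu-info on PyPI)
pip install tpu-info

# Print device and utilization info for the attached Cloud TPU chips
tpu-info
```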

