Running inference on mobile and embedded devices is challenging due to tight resource constraints; one has to work with limited hardware under strict power requirements. In this article, we want to showcase improvements in TensorFlow Lite's (TFLite) memory usage that make it even better for running inference at the edge.
Typically, a neural network can be thought of as a computational graph consisting of operators, such as CONV_2D or FULLY_CONNECTED, and tensors holding the intermediate computation results, called intermediate tensors. These intermediate tensors are typically pre-allocated to reduce inference latency, at the cost of memory space. However, this cost, when implemented naively, can't be taken lightly in a resource-constrained environment; it can take up a significant amount of space, sometimes even several times larger than the model itself. For example, the intermediate tensors in MobileNet v2 take up 26MB of memory (Figure 1), which is about twice as large as the model itself.
Figure 1. The intermediate tensors of MobileNet v2 (top) and a mapping of their sizes onto a 2D memory space (bottom). If each intermediate tensor uses a dedicated memory buffer (depicted with 65 distinct colors), they take up ~26MB of runtime memory.
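As a rough illustration (not part of the original post), the sketch below uses the TFLite Python interpreter to tally the size of every tensor buffer in a model. Note that get_tensor_details() lists all tensors, weights as well as intermediates, so the total is only a coarse proxy for the intermediate-tensor footprint; the model path is a placeholder.

```python
import numpy as np
import tensorflow as tf

# Placeholder path; substitute any .tflite model (e.g. MobileNet v2).
interpreter = tf.lite.Interpreter(model_path="mobilenet_v2.tflite")

# Pre-allocates tensor buffers up front, trading memory for latency
# as described above.
interpreter.allocate_tensors()

total_bytes = 0
for detail in interpreter.get_tensor_details():
    shape = detail["shape"]
    # Skip tensors with no concrete shape information.
    if shape is None or len(shape) == 0:
        continue
    total_bytes += int(np.prod(shape)) * np.dtype(detail["dtype"]).itemsize

# Coarse upper bound: includes weight tensors, not just intermediates.
print(f"Approximate total tensor footprint: {total_bytes / 2**20:.1f} MiB")
```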