Band is the first mobile inference platform to support multi-DNN workloads on heterogeneous mobile processors. Existing mobile deep learning frameworks such as TensorFlow Lite (TFLite) focus on single-DNN inference and therefore cannot fully handle multi-DNN workloads across heterogeneous processors. The limited operator support of individual accelerators complicates the problem further. Band tackles this challenge by partitioning DNNs into subgraphs, dynamically selecting optimal schedules, and accounting for fallback operators, that is, operators a given processor does not support. Evaluation results show that Band outperforms TensorFlow Lite by up to 5.04× for single-app multi-DNN workloads and achieves a 3.76× higher satisfaction rate for latency-critical multi-app scenarios.
On the strength of its novel findings and extensive evaluation, Band was published at MobiSys 2022.
Our team consisted of 5 people, and we implemented and evaluated the entire platform together. My individual contribution was designing and implementing the subgraph partitioning algorithm, which is the core concept of Band.
- Implemented Band as part of a 5-person team, gaining experience with collaborative practices such as code review and testing
- Built Band on top of the well-organized TensorFlow Lite C++ codebase, deepening my understanding of system design and C++-based development
- Designed and implemented techniques based on system profiling
- Gained an understanding of accelerator APIs such as OpenCL and NNAPI while implementing a platform that supports heterogeneous processors
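
To make the idea of subgraph partitioning with fallback operators more concrete, here is a minimal sketch in Python. It is not Band's actual algorithm: it assumes the model is a simple linear chain of operators, and the `SUPPORT` table, the `Subgraph` type, and the `partition` function are hypothetical names invented for illustration. The sketch groups consecutive operators that can run on the same set of processors, so an operator that only the CPU supports ends up in its own CPU-only subgraph.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Set, Tuple

# Hypothetical support table (an assumption for this sketch): which operator
# types each processor can execute. Real accelerator support lists differ.
SUPPORT: Dict[str, Set[str]] = {
    "CPU": {"CONV_2D", "RELU", "SOFTMAX", "CUSTOM_NMS"},
    "GPU": {"CONV_2D", "RELU", "SOFTMAX"},
    "NPU": {"CONV_2D", "RELU"},
}


@dataclass(frozen=True)
class Subgraph:
    op_indices: Tuple[int, ...]   # positions of the operators in the model
    processors: FrozenSet[str]    # processors able to run every operator


def supporting_processors(op: str, support: Dict[str, Set[str]]) -> FrozenSet[str]:
    """Return the set of processors that can execute the given operator."""
    procs = frozenset(p for p, ops in support.items() if op in ops)
    if not procs:
        raise ValueError(f"no processor supports operator {op!r}")
    return procs


def partition(ops: Sequence[str],
              support: Dict[str, Set[str]] = SUPPORT) -> List[Subgraph]:
    """Split a linear chain of operators into maximal runs of consecutive
    operators that share the same set of supporting processors."""
    per_op = [supporting_processors(op, support) for op in ops]
    subgraphs: List[Subgraph] = []
    start = 0
    for i in range(1, len(ops) + 1):
        # Close the current run at the end of the chain or when the
        # supporting-processor set changes.
        if i == len(ops) or per_op[i] != per_op[start]:
            subgraphs.append(Subgraph(tuple(range(start, i)), per_op[start]))
            start = i
    return subgraphs


if __name__ == "__main__":
    model = ["CONV_2D", "RELU", "SOFTMAX", "CUSTOM_NMS"]
    for sg in partition(model):
        print(sg.op_indices, sorted(sg.processors))
```

With the illustrative table above, the four-operator chain splits into three subgraphs: the first two operators can run on any processor, `SOFTMAX` is limited to the CPU and GPU, and `CUSTOM_NMS` falls back to the CPU. A scheduler could then choose a processor per subgraph at run time. Real models are general graphs rather than chains, so a full implementation would need to handle branching as well.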