Abstract
In this thesis, we address the following question: can one consolidate multi-scale aggregation while learning channel attention more efficiently? To this end, we apply channel-wise attention over multiple feature scales, which empirically proves able to replace local, single-scale attention modules. Attention mechanisms have been explored with CNNs across the spatial and channel dimensions. However, existing methods focus on capturing local interactions at a single scale. We therefore propose EMCA, a lightweight module that efficiently models global context; it integrates easily into any feed-forward CNN architecture and is trained end-to-end. We validate the proposed architecture through comprehensive experiments on image classification, object detection, and instance segmentation with different backbones.