A CONCEPTUAL FRAMEWORK FOR ROBUST FEW-SHOT LEARNING: INTEGRATING UNBALANCED OPTIMAL TRANSPORT AND SELF-SUPERVISED TRANSFORMER REPRESENTATIONS
DOI:
https://doi.org/10.24191/mjoc.vo11i1.9616
Keywords:
Few-Shot Learning, Metric Learning, Optimal Transport, Self-Supervised Learning, Sinkhorn Distance, Unbalanced Vision Transformer
Abstract
Few-shot learning (FSL) aims to enable deep models to generalise from extremely limited labelled data, yet its performance is often constrained by unstable metric matching, distribution imbalance, and weak structural representations in low-data regimes. This paper proposes a conceptual framework that integrates metric-based similarity learning, Unbalanced Optimal Transport (UOT) computed via the Unbalanced Sinkhorn Distance (USD), and self-supervised Transformer representations to address the theoretical and structural limitations of existing FSL approaches. The framework combines distribution-aware USD matching, feature representations from Vision Transformer (ViT) and Swin Transformer backbones enhanced by self-supervised learning (SSL), and metric-based inference within a single coherent pipeline. The work is intended as a theoretical foundation and research roadmap for future empirical studies on robust few-shot learning under realistic, distributionally complex conditions.
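As a rough illustration of the distribution-aware matching step named in the abstract, the sketch below computes an entropic unbalanced transport plan between two sets of token features using the standard scaling iterations with KL-relaxed marginals. It is not the authors' implementation: the function names, the cosine cost, the uniform token weights, the values of `eps` and `rho`, and the random arrays standing in for ViT/Swin token embeddings are all assumptions made for this example.

```python
import numpy as np


def unbalanced_sinkhorn_plan(cost, a, b, eps=0.1, rho=1.0, n_iters=200):
    """Entropic unbalanced OT plan with KL-relaxed marginals.

    Hypothetical helper for illustration. `eps` is the entropic
    regularisation strength and `rho` the weight of the KL marginal penalty.
    """
    K = np.exp(-cost / eps)       # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)     # tends to 1 (balanced Sinkhorn) as rho grows
    for _ in range(n_iters):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]


def cosine_cost(x, y):
    """Pairwise cosine distance between the rows of x and the rows of y."""
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    y = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - x @ y.T


# Random stand-ins for token features (assumption: 6 query tokens,
# 4 support tokens, 16-dimensional embeddings).
rng = np.random.default_rng(0)
query_tokens = rng.normal(size=(6, 16))
support_tokens = rng.normal(size=(4, 16))

a = np.full(6, 1.0 / 6)           # uniform mass on query tokens
b = np.full(4, 1.0 / 4)           # uniform mass on support tokens
cost = cosine_cost(query_tokens, support_tokens)

plan = unbalanced_sinkhorn_plan(cost, a, b)
transport_cost = float((plan * cost).sum())   # lower means a closer match
```

In a metric-based classifier, a score such as `-transport_cost` computed against each class's support tokens could serve as the class logit. Because the marginals are only softly enforced, poorly matching tokens can be left partly untransported, which is the usual motivation for the unbalanced formulation.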
License
Copyright (c) 2026 HAYATI ABD RAHMAN, Pang Yun

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.




