
Tutorials

The role of the tutorials is to provide a platform for a more intensive scientific exchange amongst researchers interested in a particular topic and as a meeting point for the community. Tutorials complement the depth-oriented technical sessions by providing participants with broad overviews of emerging fields. A tutorial can be scheduled for 1.5 or 3 hours.



Tutorial on
Seeing Through the User’s Eyes: Advances in Human-Centric Egocentric Vision


Instructor

Francesco Ragusa
University of Catania
Italy
 
Brief Bio
Francesco Ragusa is a Research Fellow at the University of Catania. He has been a member of the IPLAB (University of Catania) research group since 2015. He completed an Industrial Doctorate in Computer Science in 2021. During his PhD studies, he spent a period as a Research Student at the University of Hertfordshire, UK. He received his master's degree in Computer Science (cum laude) in 2017 from the University of Catania. Francesco has authored one patent and more than 10 papers in international journals and international conference proceedings. He serves as a reviewer for several international conferences in the fields of computer vision and multimedia, such as CVPR, ECCV, BMVC, WACV, ACM Multimedia, ICPR, and ICIAP, and for international journals, including TPAMI, Pattern Recognition Letters, and IET Computer Vision. Francesco Ragusa is a member of IEEE, CVF, and CVPL. He has been involved in different research projects and has focused on human-object interaction anticipation from egocentric videos as the key to analyzing and understanding human behavior in industrial workplaces. He has been co-founder and CEO of NEXT VISION s.r.l., an academic spin-off of the University of Catania, since 2021. His research interests concern Computer Vision, Pattern Recognition, and Machine Learning, with a focus on First Person Vision.
Abstract

Wearable devices equipped with cameras, sensors, and on-device AI capabilities are rapidly evolving, driven by the growing availability of commercial solutions and the integration of Augmented Reality into everyday workflows. These devices enable natural and continuous user-machine interaction and open the door to intelligent assistants that expand human capabilities. The combination of mobility, contextual awareness, and multimodal sensing makes wearable systems a unique platform for advanced AI and Computer Vision applications.


First-person (egocentric) vision, unlike traditional third-person approaches that observe the scene from an external point of view, captures the world directly from the user's perspective. This viewpoint provides privileged access to users’ actions, intentions, attention, and interactions with objects and the environment. Recent advances in multimodal learning, large-scale datasets, and foundation models have further accelerated research in egocentric understanding, task reasoning, and human-AI collaboration.


This tutorial will present an updated overview of the challenges and opportunities in egocentric vision, discussing its foundations while highlighting recent methodological breakthroughs and emerging applications.


Keywords

Wearable devices, first person vision, egocentric vision, augmented reality, egocentric datasets, action recognition, action anticipation, human-object interaction, procedural understanding

Aims and Learning Objectives

The participants will understand the main advantages of first-person (egocentric) vision over third-person vision for analyzing user behavior, building personalized applications, and predicting future events. Specifically, participants will learn about: 1) the main differences between third-person and first-person (egocentric) vision, including how the data is collected and processed; 2) the devices that can be used to collect data and provide services to users; 3) the algorithms that can be used to process first-person visual data, for instance to perform action recognition, human-object interaction detection, procedural understanding, and the prediction of future events.

Target Audience

First year PhD students, graduate students, researchers, practitioners.

Prerequisite Knowledge of Audience

Fundamentals of Computer Vision and Machine Learning (including Deep Learning)

Detailed Outline

The tutorial is divided into two parts and will cover the following topics:
Part I: History and motivation
• Agenda of the tutorial;
• Definitions, motivations, history, and research trends of First Person (Egocentric) Vision;
• Seminal works in First Person (Egocentric) Vision;
• Differences between Third Person and First Person Vision;
• First Person Vision datasets;
• Wearable devices to acquire and process first person visual data;
• Main research trends in First Person (Egocentric) Vision.
Part II: Fundamental tasks for first person vision systems
• Visual Localization;
• Attended Object Detection;
• Hand-Object Interaction;
• Procedural Understanding;
• Action and Object Anticipation;
• Dual-Agent Language Assistance;
• Industrial Applications.
The tutorial will cover the main technological tools (devices and algorithms) that can be used to build first-person vision applications, discuss challenges and open problems, and conclude with insights for future research in the field.

Secretariat Contacts
e-mail: visapp.secretariat@insticc.org
