October: Ego4D | News and features


The University of Bristol is part of an international consortium of 13 universities that have worked in partnership with Facebook AI to advance egocentric perception.

As a result of this initiative, the consortium has built the world’s largest egocentric dataset using commercially available head-mounted cameras.

Advances in Artificial Intelligence (AI) and Augmented Reality (AR) require learning from the same data that humans process in order to perceive the world. Our eyes enable us to explore places, understand people, manipulate objects and enjoy activities – from everyday tasks such as opening a door to the exciting interplay of a soccer game with friends.

Egocentric 4D Live Perception (Ego4D) is an extensive dataset that compiles 3,025 hours of wearable-camera footage from 855 participants in nine countries: the UK, India, Japan, Singapore, Saudi Arabia, Colombia, Rwanda, Italy and the US. The data cover a wide range of activities from the “egocentric” perspective – that is, from the point of view of the person carrying out the activity. The University of Bristol is the only UK representative in this diverse and international initiative, collecting 270 hours from 82 participants who recorded daily living activities of their choice – such as practicing a musical instrument, gardening, caring for a pet or assembling furniture.

“In the not too distant future, you might be wearing smart AR glasses that will walk you through a recipe or how to fix your bike – they might even remind you where you left your keys,” said Dima Damen, Professor of Computer Vision and principal investigator at the University of Bristol.

“However, in order for AI to move forward, it must understand the world and the experiences within it. AI seeks to learn about all aspects of human intelligence by processing the data that we perceive. To enable such automated learning, we need to capture and record our daily experiences ‘through our eyes’. That is what Ego4D offers.”

In addition to the recorded footage, a number of benchmarks are available to researchers. A benchmark is a problem definition together with manually collected labels against which models can be compared. The Ego4D benchmarks relate to the understanding of places, spaces, ongoing actions, upcoming actions and social interactions.

“Our five new, demanding benchmarks offer researchers a common goal on which to build fundamental research into real-world perception of visual and social contexts,” says Professor Kristen Grauman, technical lead at Facebook AI.

The ambitious project was inspired by the University of Bristol’s successful EPIC-KITCHENS dataset, which recorded participants’ daily kitchen activities in their homes and was, until now, the largest dataset in egocentric computer vision. EPIC-KITCHENS pioneered the “pause and narrate” approach, which gives an almost exact point in time at which each action takes place in the long and varied videos. Using this approach, the Ego4D consortium collected 2.5 million timestamped statements of ongoing actions in the videos, which is crucial for benchmarking the collected data.

Ego4D is a large and diverse dataset and set of benchmarks that will be invaluable to researchers working in the fields of augmented reality, assistive technology and robotics. The dataset will be publicly available in November this year to researchers who sign the Ego4D data usage agreement.

Additional information

University of Bristol Ego4D team:

Prof. Dima Damen – Professor of Computer Vision

Dr. Michael Wray – Postdoc

Mr. Will Price – PhD student

Mr. Jonathan Munro – PhD student

Mr. Adriano Fragomeni – PhD student

Members of the consortium:

  • University of Bristol, UK
  • Carnegie Mellon University (Pittsburgh, USA and Rwanda)
  • Georgia Tech, USA
  • Indiana University, USA
  • International Institute of Information Technology, Hyderabad, India
  • King Abdullah University of Science and Technology (KAUST), Saudi Arabia
  • Massachusetts Institute of Technology, USA
  • National University of Singapore, Singapore
  • Universidad de los Andes, Colombia
  • University of Catania, Italy
  • University of Minnesota, USA
  • University of Pennsylvania, USA
  • University of Tokyo, Japan

EPIC-KITCHENS is a collaboration with the University of Toronto (Canada) and the University of Catania (Italy), led by the University of Bristol, to collect and annotate the largest dataset of its kind (over 20 million frames), showing 45 people recorded in their own homes over several consecutive days.

The dataset was collected in four different countries and narrated in six languages to help with vision and language challenges. It offers a range of challenges, from object recognition to action prediction and activity modeling, in an unscripted, realistic day-to-day environment.

The size of publicly available datasets is critical to progress in this area, which is of great importance for robotics, healthcare and augmented reality.

Read more about EPIC-KITCHENS in our blog: EPIC-KITCHENS: Bringing helpful AI closer to reality

