Organisers
Abhinav Dhall, Australian National University, abhinav.dhall@anu.edu.au
Roland Goecke, University of Canberra / Australian National University, roland.goecke@ieee.org
Jyoti Joshi, University of Canberra, jyotijoshi10@gmail.com
Michael Wagner, University of Canberra / Australian National University, michael.wagner@canberra.edu.au
Tom Gedeon, Australian National University, tom@cs.anu.edu.au
Overview

The Emotion Recognition In The Wild Challenge and Workshop (EmotiW) 2013 Grand Challenge consists of an audio-video based emotion classification challenge, which mimics real-world conditions. Traditionally, emotion recognition has been performed on laboratory-controlled data. While undoubtedly worthwhile at the time, such lab-controlled data poorly represents the environment and conditions faced in real-world situations. With the increase in the number of video clips online, it is worthwhile to explore the performance of emotion recognition methods that work ‘in the wild’. The goal of this Grand Challenge is to define a common platform for the evaluation of emotion recognition methods in real-world conditions.

The database in the 2013 challenge is the Acted Facial Expressions in the Wild (AFEW) database, which has been collected from movies showing close-to-real-world conditions. Three sets for training, validation and testing will be made available. The challenge seeks participation from researchers working on emotion recognition who intend to create, extend and validate their methods on data recorded in real-world conditions.

Along with the challenge participation, we also invite researchers to submit their original unpublished work based around the theme of the challenge to the EmotiW workshop. Please note that the challenge papers will be published as part of the main-track ICMI proceedings, while the workshop papers will be published in separate workshop proceedings. The top papers (challenge and workshop) will be invited to submit an extended version to a reputed journal.

For any queries, please email: EmotiW2013@gmail.com
Program Committee
Akshay Asthana, Imperial College London
Nadia Bianchi-Berthouze, University College London
Carlos Busso, University of Texas, Dallas
Hazim Kemal Ekenel, Istanbul Technical University
Hatice Gunes, Queen Mary University of London
Zakia Hammal, Carnegie Mellon University
Gwen Littlewort, University of California San Diego
Elisa Martinez Marroquin, University of Canberra
Peter Christen, Ambertree Assistance Technologies
Stefan Scherer, University of Southern California
Bjoern Schuller, Technische Universitaet Muenchen
Nicu Sebe, University of Trento
Shiguang Shan, Chinese Academy of Sciences
Gaurav Sharma, Technicolor
Michel Valstar, University of Nottingham
Stefanos Zafeiriou, Imperial College London
Important Dates
Train data available: 20th March 2013
Test data available: 30th June 2013
Results deadline: 25th July 2013, 12:00 PM GMT (extended)
Paper submission deadline: 18th August 2013, 12:00 PM GMT (extended)
Notification: 7th September 2013
Camera-ready papers: 15th September 2013
Scope of Workshop Papers

- EmotiW challenge
- Multimodal emotion recognition
- Vision based temporal emotion analysis in the wild
- Vision based static facial expression analysis in the wild
- Audio based emotion recognition
- New emotional data corpus representing real-world conditions
- Facial feature tracking in the wild
- Emotion recognition applications
Data Protocol

The database will be divided into three sets for the challenge: training, validation and testing. The current version of the database, AFEW 1.0, available at cs.anu.edu.au/few, contains two sets; extended versions of these sets will be used for training and validation, while new, unseen data will be used for testing. The task is to classify a sample audio-video clip into one of seven categories: Anger, Disgust, Fear, Happiness, Neutral, Sadness and Surprise. The labelled training and validation sets will be made available early, and the new, unlabelled test set will be made available at the end of June 2013.

There are no separate video-only, audio-only or audio-video challenges. Participants are free to use either modality or both; results for all methods will be combined into one set in the end. Participants are allowed to use their own features and classification methods. The labels of the testing set are unknown. Participants will need to adhere to the definition of the training, validation and testing sets. In their papers, they may report results obtained on the training and validation sets, but only the results on the testing set will be taken into account for the overall Grand Challenge results. Participants will report the labels on the test set to the organisers, and the classification results will be computed and shared back with the participants during the testing phase (30 June – 15 July 2013). Participants will have up to five chances to submit test labels and receive classification results. The final results will be presented at the ICMI 2013 conference. Further, participants are expected to submit a paper describing their method and detailing their results.

To register your team, please click on this Registration Form. After filling in the form you will receive an email containing a license agreement, which needs to be signed by all members of the team. Along with participation in the challenge, researchers are also welcome to submit papers describing original work based on the theme of the challenge to the workshop, based on the topics above. Paper format and submission will be similar to ICMI papers. The details will be shared soon.
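The following is a minimal, hypothetical Python sketch of the train/validation/test discipline described above, not the organisers' code. It assumes per-clip feature vectors have already been extracted; the load_split() helper and the choice of SVM classifier are placeholders.

import numpy as np
from sklearn.svm import SVC

EMOTIONS = ["Anger", "Disgust", "Fear", "Happiness", "Neutral", "Sadness", "Surprise"]

def load_split(name):
    """Placeholder loader: returns (clip_ids, feature_matrix, labels).
    Labels are None for the unlabelled Test split."""
    raise NotImplementedError

train_ids, X_train, y_train = load_split("Train")
val_ids, X_val, y_val = load_split("Val")
test_ids, X_test, _ = load_split("Test")   # labels withheld by the organisers

# Learn the model on Train only; use Val only to choose hyper-parameters (see FAQ 4).
best_C, best_acc = None, -1.0
for C in (0.1, 1.0, 10.0):
    clf = SVC(kernel="rbf", C=C).fit(X_train, y_train)
    acc = float((clf.predict(X_val) == np.asarray(y_val)).mean())
    if acc > best_acc:
        best_C, best_acc = C, acc

# Retrain on Train with the chosen parameters and predict the Test labels,
# which can then be submitted to the organisers (up to five times).
final_clf = SVC(kernel="rbf", C=best_C).fit(X_train, y_train)
test_predictions = dict(zip(test_ids, final_clf.predict(X_test)))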
Baseline

We provide audio and video baselines. For video, the face is localized using the Mixture of Parts framework of Zhu and Ramanan (2012). The fiducial points generated by MoPS are used to align the face. After alignment, LBP-TOP features are extracted from non-overlapping 4x4 spatial blocks, and the LBP-TOP features from all blocks are concatenated into one feature vector. A non-linear SVM is learnt for emotion classification. The video-only baseline system achieves 27.2% classification accuracy on the validation set. The audio baseline is computed by extracting features using the OpenSmile toolkit and learning a linear SVM classifier; the audio-only system gives 19.5% classification accuracy. Feature-level fusion is also performed, where the audio and video features are concatenated and a non-linear SVM is learnt; the performance drops here to 22.2% classification accuracy. On the test set, which contains 312 video clips, audio only gives 22.4%, video only gives 22.7% and feature fusion gives 27.5%.

The table below summarises the baseline results on the validation and test sets for the audio-only, video-only and audio-video feature-fusion systems. The classwise classification accuracy is the number of correctly classified samples of an emotion class divided by the total number of samples of that class. The overall classification accuracy is the number of correctly classified samples in a set divided by the total number of samples in that set.
System                        Val      Test
Audio only                    19.5%    22.4%
Video only                    27.2%    22.7%
Audio-video feature fusion    22.2%    27.5%
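As a rough illustration of the two accuracy definitions above, and of concatenating per-block descriptors into a single clip-level feature vector, the Python sketch below assumes the LBP-TOP block histograms have already been computed with an external implementation; it is not the organisers' baseline code.

import numpy as np

def concat_block_features(block_histograms):
    """Concatenate the LBP-TOP histograms of the non-overlapping 4x4 spatial
    blocks of one aligned face clip into a single feature vector."""
    return np.concatenate([np.ravel(h) for h in block_histograms])

def classwise_accuracy(y_true, y_pred):
    """Classwise accuracy: correctly classified samples of an emotion class
    divided by the total number of samples of that class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {c: float((y_pred[y_true == c] == c).mean()) for c in np.unique(y_true)}

def overall_accuracy(y_true, y_pred):
    """Overall accuracy: correctly classified samples in a set divided by the
    total number of samples in that set."""
    return float((np.asarray(y_true) == np.asarray(y_pred)).mean())

# Example: overall_accuracy(["Happy", "Sad"], ["Happy", "Angry"]) returns 0.5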
Submitting Results

The test set is available from 30th June. A participating team can submit their test results for evaluation up to five times. Please email your results in a zip file to emotiw2013@gmail.com with the subject: [Labels] Team Name. The name of the zip file containing the results should be your team name. The zip file should contain 312 .txt files, corresponding to the video clips in the test set. Each .txt file should be named after the video clip sample; for example, the label file for the video sample 011145880 should be named 011145880.txt. Each label file should contain the single emotion label assigned to that clip by your system. The label can be one of the following: Angry / Disgust / Fear / Happy / Neutral / Sad / Surprise
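A small helper sketch, in Python, for packaging predictions in the requested format: one <clip id>.txt file per test clip, each containing a single label, zipped under the team name. The predictions dictionary and the team_name argument are assumptions about how you store your own results.

import os
import zipfile

VALID_LABELS = {"Angry", "Disgust", "Fear", "Happy", "Neutral", "Sad", "Surprise"}

def write_submission(predictions, team_name, out_dir="."):
    """predictions: dict mapping clip id (e.g. "011145880") to one label."""
    assert len(predictions) == 312, "expected one label for each test clip"
    zip_path = os.path.join(out_dir, team_name + ".zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        for clip_id, label in predictions.items():
            assert label in VALID_LABELS, "unknown label: " + label
            zf.writestr(clip_id + ".txt", label)   # e.g. 011145880.txt containing "Happy"
    return zip_path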
Paper Submission

The challenge papers will be published as part of the main conference proceedings. The workshop papers will be published as part of separate workshop proceedings. Please follow the ICMI conference template (see the Author instructions). The paper submission website is open. Please note that submissions should be double-blind.
FAQ

1. Is it necessary to use both the audio and video channels?
The challenge data contains audio, video and meta-data. The meta-data is composed of actor identity, age and gender. Participants are welcome to use any combination of modalities.

2. Can scene information other than face information be used?
Context analysis in facial expression recognition is a hot topic. Participants can use scene/background/body pose etc. information along with the face information.

3. Which face and fiducial points detector have you used?
We found Zhu and Ramanan's mixture-of-parts based detector useful in our experiments. The authors have made an implementation of their method publicly available at: LINK

4. Can I use both the train and validation data for learning my model?
No, please learn your model on the train set; you can optimize the parameters on the validation set.

5. Is the use of a commercial face detector such as Google Picasa OK?
Any face detector, whether commercial or academic, can be used to participate in the challenge. The paper accompanying the challenge result submission should contain clear details of the detectors/libraries used.

6. Can I learn my model on both the labelled train and unlabelled test data?
No, the data sets are subject independent and the test data is to be used for testing purposes only.

7. Can I use external data for training along with the data provided?
Participants are free to use external data for training along with the AFEW train partition.

8. Will the review process be anonymous?
The review process is double-blind.