Open MIC (Open Museum Identification Challenge) contains photos of exhibits captured in 10 distinct exhibition spaces of several museums, which showcase paintings, timepieces, sculptures, glassware, relics, science exhibits, natural history pieces, ceramics, pottery, tools and indigenous crafts. The goal of Open MIC is to stimulate research in domain adaptation, egocentric recognition and few-shot learning by providing a testbed complementary to the popular Office-31 dataset, on which existing methods already reach ~90% accuracy.

INTRODUCTION

For the source domain, we captured the photos in a controlled fashion using Android phones, e.g., we ensured that each exhibit is centered and non-occluded in the photos. We avoided adverse capturing conditions and did not mix multiple objects per photo unless they were all part of one exhibit. We captured 2–30 photos of each art piece from different viewpoints and distances in its natural setting.
For the target domain, we employed an egocentric setup to ensure an in-the-wild capturing process. We equipped several volunteers with cheap wearable cameras and let them stroll and interact with artworks at their discretion. Open MIC contains 10 distinct source-target subsets of images from 10 different kinds of museum exhibition spaces, each exhibiting various photometric and geometric challenges. We annotated each image with the labels of the art pieces visible in it. The wearable cameras were set to capture an image every 10 s and operated in the wild, i.e., the volunteers had no control over shutter, focus, centering, etc.
Therefore, the collected target subsets exhibit many realistic challenges, e.g., sensor noise, motion blur, occlusions, background clutter, varying viewpoints, scale changes, rotations, glares, transparency, non-planar surfaces, clipping, multiple exhibits, active light, color inconstancy and very large or small exhibits, to name but a few.
Every subset (10 distinct exhibition spaces) contains 37–166 exhibits to identify. We provide 5 train, 5 validation, and 5 test splits per exhibition. In total, our dataset contains 866 unique exhibit labels, 8560 source and 7596 target images.
Shown below are sample source images from our dataset:
Shown below are sample target images from our dataset:

EXHIBITIONS

Open MIC contains 10 distinct source-target subsets of images from 10 different kinds of museum exhibition spaces. They include:

Paintings from Shenzhen Museum (Shn),
Clocks and Watch Gallery (Clk), and the Indian and Chinese Sculptures (Scl) from the Palace Museum,
Xiangyang Science Museum (Sci),
European Glass Art (Gls) and the Collection of Cultural Relics (Rel) from the Hubei Provincial Museum,
Nature, Animals and Plants in Ancient Times (Nat) from Shanghai Natural History Museum,
Comprehensive Historical and Cultural Exhibits from Shaanxi History Museum (Shx),
Sculptures, Pottery and Bronze Figurines from the Cleveland Museum of Art (Clv),
Indigenous Arts from the Honolulu Museum of Art (Hon).
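For protocol (ii), which combines all ten subsets into a single label space of 866 identities, each exhibit needs a global id. The sketch below is a minimal illustration, not the loader shipped with the dataset: the `EXHIBITIONS` table only restates the list above, while `build_global_labels` and its input format (a dict from abbreviation to a list of local class names) are hypothetical.

```python
# Abbreviations restate the exhibition list above; descriptions are informal.
EXHIBITIONS = {
    "Shn": "Paintings, Shenzhen Museum",
    "Clk": "Clocks and Watch Gallery, Palace Museum",
    "Scl": "Indian and Chinese Sculptures, Palace Museum",
    "Sci": "Xiangyang Science Museum",
    "Gls": "European Glass Art, Hubei Provincial Museum",
    "Rel": "Collection of Cultural Relics, Hubei Provincial Museum",
    "Nat": "Nature, Animals and Plants in Ancient Times, Shanghai Natural History Museum",
    "Shx": "Comprehensive Historical and Cultural Exhibits, Shaanxi History Museum",
    "Clv": "Sculptures, Pottery and Bronze Figurines, Cleveland Museum of Art",
    "Hon": "Indigenous Arts, Honolulu Museum of Art",
}


def build_global_labels(local_classes):
    """Map (exhibition, local class) pairs to contiguous global ids.

    local_classes: dict from exhibition abbreviation to a list of local
    class names (hypothetical input format).
    """
    mapping = {}
    for abbr in EXHIBITIONS:  # fixed exhibition order keeps ids reproducible
        for name in local_classes.get(abbr, []):
            mapping[(abbr, name)] = len(mapping)
    return mapping
```

With the real class lists, the mapping would be expected to contain 866 entries. Keying on the (exhibition, class) pair avoids collisions if two exhibitions happen to reuse a local class name.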

BASELINES

To demonstrate the intrinsic difficulty of the Open MIC dataset, we provide the community with baseline accuracies obtained from:

fine-tuning CNNs on the source subsets (S) and testing on randomly chosen target splits,
fine-tuning on the target only (T) and evaluating on the remaining disjoint target splits,
fine-tuning on source+target (S+T) and evaluating on the remaining disjoint target splits,
training the state-of-the-art So-HoT domain adaptation algorithm (with Euclidean and non-Euclidean distances).
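The three fine-tuning baselines differ only in which data is used for training. A minimal sketch, assuming the target data of one exhibition is available as a list of disjoint splits and that one split is held out for evaluation; the function name, the `n_test` parameter and this split layout are hypothetical and may not match the split files distributed with the dataset.

```python
import random


def make_baseline(mode, source, target_splits, n_test=1, seed=0):
    """Return (train, test) image lists for the S, T or S+T baseline.

    source: list of source images of one exhibition.
    target_splits: list of disjoint target splits (each a list of images).
    n_test: number of target splits held out for testing (assumption).
    """
    rng = random.Random(seed)
    order = list(range(len(target_splits)))
    rng.shuffle(order)  # randomly choose which target splits are held out
    test_idx, train_idx = order[:n_test], order[n_test:]
    test = [img for i in test_idx for img in target_splits[i]]
    target_train = [img for i in train_idx for img in target_splits[i]]
    if mode == "S":
        train = list(source)
    elif mode == "T":
        train = target_train
    elif mode == "S+T":
        train = list(source) + target_train
    else:
        raise ValueError(f"unknown baseline mode: {mode}")
    return train, test
```

Because the held-out splits are never merged into the training list, the T and S+T baselines are always evaluated on target images disjoint from those seen in training.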

Kindly note that this is an identification dataset, i.e., each class defines a unique exhibit. Thus, the domain adaptation task can be posed either as a retrieval problem or as a classification problem (each specific exhibit has its own label).
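Both views can be scored with the same top-1 accuracy. The sketch below assumes per-class scores from a classifier for the classification view, and non-zero feature embeddings compared by cosine similarity against a labelled gallery (e.g. source images) for the retrieval view. The function names and the choice of cosine similarity are illustrative assumptions, not the evaluation code used in the paper.

```python
import math


def classify(scores):
    """Classification view: predict the label with the highest score."""
    return max(range(len(scores)), key=scores.__getitem__)


def _cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def retrieve(query, gallery):
    """Retrieval view: return the label of the most similar gallery item.

    gallery: list of (embedding, label) pairs, e.g. source-domain images.
    """
    _, label = max(gallery, key=lambda item: _cosine(query, item[0]))
    return label


def top1_accuracy(predictions, labels):
    """Fraction of predictions that equal the ground-truth exhibit label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)
```

Since each exhibit has exactly one label, a retrieved nearest neighbour and a classifier's argmax can be compared directly with `top1_accuracy`.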

DOMAIN ADAPTATION

We include the following evaluation protocols for domain adaptation (see the ECCV'18 paper cited below for more details). Kindly note that if you use our dataset, you do not have to run your algorithm on all of these protocols and all of their combinations. Just choose the protocol that suits you:

protocol (i): training/evaluation per exhibition subset (one experiment per one exhibition),
protocol (ii): training/testing on the combined set with 866 identity labels (one experiment for 10 combined exhibitions),
protocol (iii): testing w.r.t. 12 scene factors annotated by us:
object clipping (clp), low lighting (lgt), blur (blr), light glares (glr), background clutter (bgr), occlusions (ocl), in-plane rotations (rot), zoom (zom), tilted viewpoint (vpc), small size/far away (sml), object shadows (shd) and reflections (rfl), plus the clean view (ok),
protocol (iv): training/evaluation per exhibition subset (unsupervised Domain Adaptation).
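For protocol (iii), accuracy is reported per scene factor. A minimal sketch, assuming each target image carries a set of factor abbreviations and that an image tagged with several factors counts towards each of them; `accuracy_per_factor` is a hypothetical helper, and the actual annotation file format is described in the dataset's readme.

```python
from collections import defaultdict

# The 12 scene factors plus the clean view, abbreviated as in protocol (iii).
FACTORS = ["clp", "lgt", "blr", "glr", "bgr", "ocl",
           "rot", "zom", "vpc", "sml", "shd", "rfl", "ok"]


def accuracy_per_factor(predictions, labels, tags):
    """Accuracy restricted to the images annotated with each factor.

    tags: one set of factor abbreviations per image; an image with several
    distortions counts towards every factor it is tagged with (assumption).
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for pred, label, image_tags in zip(predictions, labels, tags):
        for factor in image_tags:
            totals[factor] += 1
            hits[factor] += int(pred == label)
    # Report only factors that occur at least once, in the canonical order.
    return {f: hits[f] / totals[f] for f in FACTORS if totals[f] > 0}
```

Counting multi-tagged images under every factor keeps each per-factor number easy to interpret, at the cost of the factor accuracies no longer summing to the overall accuracy.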
Below we illustrate (left) the results for training on all 866 identity labels and (right) how the adaptation accuracy depends on the photometric and geometric distortions of target images:
Below we illustrate the results for training/evaluation per exhibition subset under protocol (iv) (unsupervised domain adaptation):

The above results are based on recent popular algorithms: Invariant Hilbert Space (IHS), Unsupervised Domain Adaptation with Residual Transfer Networks (RTN) and Joint Adaptation Networks (JAN).

FEW-SHOT LEARNING

We include the following evaluation protocols for one-shot learning. Kindly note that if you use our dataset, you do not have to run your algorithm on all of these protocols and all of their combinations. Just choose the protocol that suits you:

protocol (v): 1-shot L-way training on each combined target split (p1: shn+hon+clv, p2: clk+gls+scl, p3: sci+nat, p4: shx+rlc) and testing on the remaining combined target splits (this protocol checks task-to-task generalisation),
protocol (vi): 1-shot L-way training on each source split (the ten exhibitions defined above) and testing on the corresponding target split (this protocol checks source-to-target generalisation),
protocol (vii): 1-shot L-way training on each combined source split (p1,...,p4) and testing on the remaining combined target splits (this protocol checks both task-to-task and source-to-target generalisation).
Also, see our CVPR 2019 and 2020 papers listed below for the latest few-shot learning results and protocols.
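The grouping used by protocols (v) and (vii) and the sampling of a 1-shot L-way episode can be sketched as follows. This is a generic episode sampler written under stated assumptions (images grouped per class in a dict, `n_query` query images per class, hypothetical function names), not the SoSN data pipeline.

```python
import random

# Combined groups used by protocols (v) and (vii), abbreviations as listed above.
GROUPS = {
    "p1": ["shn", "hon", "clv"],
    "p2": ["clk", "gls", "scl"],
    "p3": ["sci", "nat"],
    "p4": ["shx", "rlc"],
}


def protocol_v_pairs():
    """(train group, test groups) pairs: train on one group, test on the rest."""
    return [(g, [h for h in GROUPS if h != g]) for g in GROUPS]


def sample_episode(images_by_class, l_way, n_query=1, rng=None):
    """Sample one 1-shot L-way episode.

    images_by_class: dict from class to a list of images (hypothetical layout).
    Returns (support, query) lists of (image, episode_label) pairs.
    """
    if rng is None:
        rng = random.Random()
    # Only classes with enough images for 1 support + n_query queries qualify.
    eligible = [c for c, imgs in images_by_class.items() if len(imgs) >= 1 + n_query]
    classes = rng.sample(eligible, l_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        picks = rng.sample(images_by_class[cls], 1 + n_query)
        support.append((picks[0], episode_label))  # the single "shot"
        query.extend((img, episode_label) for img in picks[1:])
    return support, query
```

Relabelling classes to 0..L-1 inside each episode is what lets a model trained on one group be tested on exhibits it has never seen.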
Below we illustrate the results for protocol (v) using our SoSN network (84x84/224x224 image crops):
Below we illustrate the results for protocol (vi) using our SoSN network (84x84/224x224 image crops):

PUBLICATIONS

For more details on the data, protocols, evaluations and algorithms, see the following publications. We kindly ask you to cite the following paper(s) when using our dataset:

Museum Exhibit Identification Challenge for Domain Adaptation and Beyond,
P. Koniusz, Y. Tas, H. Zhang, M. Harandi, F. Porikli, R. Zhang,
European Conference on Computer Vision (ECCV), 2018, bibtex.
(oral, ~2% acceptance rate; ECCV'18 talk on YouTube)
Power Normalizing Second-order Similarity Network for Few-shot Learning,
H. Zhang, P. Koniusz,
Winter Conference on Applications of Computer Vision (WACV), 2019, bibtex. Also, see the GitHub code.
Few-Shot Learning via Saliency-guided Hallucination of Samples,
H. Zhang, J. Zhang, P. Koniusz,
Computer Vision and Pattern Recognition (CVPR), 2019. Also, see the GitHub code.
Adaptive Subspaces for Few-Shot Learning,
C. Simon, P. Koniusz, R. Nock, M. Harandi,
Computer Vision and Pattern Recognition (CVPR), 2020. Also, see the Supp. Mat. and the GitHub code.

REQUEST FORM

Our dataset license mostly follows fair-use regulations, making the data available for academic, non-commercial use only. The license grants royalty-free, non-exclusive, non-transferable, attribution, 'no derivatives' rights. Please read the license carefully and fill in the requested details below. We will verify your request and send you an e-mail with a password. Access to the data expires automatically after 30 days. If you have any questions or concerns, if you do not receive access to the data within 48 hours of your request, or if you need access immediately, send an e-mail to Open MIC.

Data Licence (OpenMIC dataset)

1. Terminology
'Data' means the data and other information (in whatever form or manner it is expressed) made available to you under this licence as at the date it is accessed by you, but does not include any inventions, patents, design rights or trademarks of CSIRO or any other person or any computer programs used in the making or operation of a database. 'Authors' mean Dr. P. Koniusz and Dr. R. Zhang. 'You/User' means you as Data user. 'Acknowledgement' means that if you use the Data, you must give credits to the Authors and cite the source of the Data. 'No derivatives' means that if you remix, transform, edit, or build upon the Data in any way, you may not distribute the modified material. "Authors' Employers" means the Authors' past/current/future employers/organisations.

2. Grant of licence
2.1 Authors grant you a royalty-free, non-exclusive, non-transferable, attribution, 'No derivatives' licence to use the Data solely for your non commercial purposes. For the purposes of this licence, 'commercial purpose' means to sell, hire, exchange or otherwise use or exploit the Data (whether in its original or any adapted form, or incorporated in or used in the provision of any products or services) for profit or gain.
2.2 The rights granted under this licence are personal to the User and are not sub-licensable or capable of assignment. You may not distribute the Data to any other person (see Exceptions in Section 6) nor attempt to obtain patent coverage on or assert any other intellectual property rights over the Data. Notwithstanding the foregoing, you may not create or distribute digital or hardcopy products or non-editable digital images (eg: pdf files) based on or containing the Data (see Exceptions in Section 6).
2.3 You must comply with any protocols concerning the attribution of rights in the Data. However, without limiting the obligation that the Data be used for non-commercial purposes only, you must not represent that Authors or 'Authors' Employers' take responsibility for the accuracy or correctness of the Data or any output of any application of the Data (including any prediction or the output of any modelling), or that Authors (or 'Authors' Employers') support or endorse your use of the Data or any products derived or conclusions or predictions drawn from the Data. Where you breach a term of this licence it terminates immediately.

3. Disclaimer/No warranties
3.1 You acknowledge that the Data may have inherent defects or deficiencies and that any use you make of it is at your own risk. To the maximum extent permitted by law, the Data is provided 'as is' and Authors make no express or implied representation or warranty concerning the fitness for any purpose, accuracy, currency and reliability of the Data, nor that its use will not infringe any third party rights. Any use and/or interpretation of the Data is done so entirely at the User's own risk and you accept all risk and responsibility for any losses, damages, costs and other consequences arising from use of the Data or any defect in the Data.
3.2 If you are a consumer under the Australian Consumer Law, certain guarantees and rights may be conferred on you which cannot be excluded, restricted or modified. If so, then to the maximum extent permitted by law, you agree that Authors' (and 'Authors' Employers') liability under those or any similarly non-excludable guarantees, warranties or rights is limited, at Authors' (and 'Authors' Employers') option, to resupply of the relevant goods or resupply of the relevant services, or the payment of the cost of resupplying the relevant goods or services.

4. Indemnity
You indemnify and release Authors (and 'Authors' Employers') against any and all claims, demands, suits, liability, loss or expense arising directly or indirectly from: (a) your use of the Data; (b) third party use of any Data or other products and creations derived from the Data; and (c) any breach of this Agreement by you.

5. General
5.1 Entire Agreement. This licence constitutes the entire agreement between us. Authors (and 'Authors' Employers') are not bound by any additional provisions that may appear in any communication from you.
5.2 Variations to be in writing. This licence may not be varied except in writing between us.
5.3 Governing Law. This licence will be governed by the laws in force in the Australian Capital Territory, Australia.
5.5 Authority. Where you enter this licence on behalf of an entity or organisation, you warrant that you have authority to do so.
5.6 Disputes. If there is a dispute between us that cannot be resolved then the matter must be referred to the Australian Commercial Disputes Centre for arbitration in accordance with the Centre's Guidelines on Arbitration. The decision of the arbitrator (including any award as to costs) will be final and binding.

6. Exceptions, Additional Clauses, and Guidelines
6.1 The User acknowledges that the Data may contain material that is subject to copyright owned or controlled by third parties. As such, the User agrees to use the Data at the User's own risk and for non-commercial research purposes only.
6.2 The User may share Data with other members of a scientific research project team for the sole purpose of collaborating on the scientific research project. For clarity, this will not constitute 're-distribution' and each member accessing the Data will constitute a 'User' bound by the terms and conditions of this data licence.
6.2 The User may publish a small proportion of the Data (a maximum of 20 images) in scientific publications for the sole purpose of demonstrating the manner in which the User has utilised the Data in the User's scientific research, and provided the User does not derive any monetary benefit from the scientific publication. The User acknowledges that when choosing and publishing Data under this clause, the User must not publish Data containing material that is subject to copyright owned or controlled by a third party.
6.3 The User may modify the Data for the User's private research and study (including for use in the User's scientific research). The User acknowledges the User must not publish such modifications. The User may agree further rights to the Data with the Authors, but acknowledges that the Authors will grant such rights at their complete discretion.
6.4 The User agrees to provide Acknowledgement to the Authors in any work that results from the use of the Data. Acknowledgment can be made by citing the Authors' relevant scientific publication(s) (as indicated on the website where the Data is made available to the User).
6.5 The Authors reserve the right to terminate the User's access to the Data at any time.
All fields are required, including a valid e-mail address to which we will send your password after approval.

DATASET/DOWNLOAD (SMALL SIZE)

Once you have obtained a valid password, you can download our files instantly (log in with your e-mail address, followed by the password from our e-mail).

First, go through the following 'readme' file for details of what each folder of our archives contains:

_readme.txt

Below we provide versions of our dataset at resolutions of 256, 512 and 1024px. You can choose the quality needed for your experiments, but we expect 256 or 512px to be sufficient if you work with CNNs. The following archives contain full images and crops. We used the crops in our ECCV'18 paper as well as for one-shot learning:

256_OpenMIC.zip (full size is 0.5 GB, bilinear interpolation)
512_OpenMIC.zip (full size is 1.54 GB, bilinear interpolation)
1024_OpenMIC.zip (full size is 5.2 GB, bicubic interpolation)
target_splits_eccv2018.zip (labels used by us in the ECCV'18 paper)
See also the multi-label annotations in the ADDITIONAL LABELS section at the bottom of this page.

DATASET/DOWNLOAD (LARGE CROPS)

Below are crops (3 per image) in high resolution of approximately 2048x2048px. Note that each exhibition archive is large, i.e., 1-3GB per file, and to evaluate your algorithm on any of the protocols listed above, you will need to download all 10 of the following files:

DATASET/DOWNLOAD (FULL IMAGES)

Below are the whole images at full resolution (over 2048px). Note that each exhibition archive is large, i.e., 1-3GB per file:

FEW-SHOT LEARNING

Below are files prepared for use with 'Power Normalizing Second-order Similarity Network for Few-shot Learning':

Below are files prepared for use with 'Adaptive Subspaces for Few-Shot Learning':

openmic_dsn_fewshot.zip (full size is 0.26 GB)

ADDITIONAL LABELS

Below are labels with multiple annotations per image in the target data (some of our ECCV'18 experiments use them), as well as the lists of source and target background images (labelled as -1). We also provide annotations of the geometric and photometric distortions observed in the target images (the last file).
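One way to use these annotations is to count a prediction as correct when it matches any exhibit visible in the image, and to skip background-only images labelled -1. This scoring rule and the in-memory format below (one set of labels per image) are assumptions for illustration; the ECCV'18 experiments may score multi-label images differently, and `multilabel_top1_accuracy` is a hypothetical helper.

```python
BACKGROUND = -1  # label used for background images in the provided lists


def multilabel_top1_accuracy(predictions, label_sets, skip_background=True):
    """Count a prediction as correct if it matches any exhibit in the image.

    label_sets: one set of exhibit labels per target image (hypothetical
    in-memory form of the multi-label annotations).
    """
    correct = total = 0
    for pred, labels in zip(predictions, label_sets):
        if skip_background and labels == {BACKGROUND}:
            continue  # background-only image: no exhibit to identify
        total += 1
        correct += int(pred in labels)
    return correct / total if total else 0.0
```

Skipping background-only images keeps them from diluting identification accuracy; setting `skip_background=False` instead treats -1 as a regular class that a model must predict.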

Quick Links

Open MIC (ACCV 2018 Workshop)
Open MIC (ECCV 2018 Dataset Request)
Open MIC (citation/bibtex)
Open MIC (download)