Wednesday, May 14, 2008
Limitations of Content-based Image Retrieval
This blog entry is a summary of a viewpoint paper (http://www.theopavlidis.com/technology/CBIR/PaperB/Apr08.htm) . It exists for the purpose of allowing readers to post comments on the paper.
In the paper I discuss my impressions from the current state of the art and then I express opinions on what might be fruitful approaches. I find the current results in CBIR very limited in spite of over 20 years of research efforts. Certainly, I am not the only one who thinks that way, the lead editorial of a recent special issue of the IEEE Proceedings on multimedia retrieval was titled "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?"
I offer certain reasons for this state of affairs, especially for the discrepancy between high quality results shown in papers and poorer results in practice. The main reason seems to be that the lessons about feature selection and the "curse of dimensionality" in pattern recognition have been ignored in CBIR. Because there is little connection between pixel statistics and the human interpretation of an image (the "semantic gap") the use of large number of generic features makes highly likely that results will not be scalable, i.e. they will not hold on collections of images other than the ones used during the development of the method. In other words, the transformation from images to features (or other descriptors) is many-to-one and when the data set is relatively small, there are no collisions. But as the size of the set increases unrelated images are likely to be mapped into the same features.
I propose that generic CBIR will have to wait both for algorithmic advances in image understanding and advances in computer hardware. In the meantime I suggest that efforts should be focused on retrieval of images in specific applications where it is feasible to derive semantically meaningful features.
The paper has two appendices with examples of image retrieval. One presents the results obtained from some on line systems and the other presents some experiments I conducted to demonstrate how a method that yields impressive results in the author(s) paper gives poor results in independent tests.
I have posted a page (http://www.theopavlidis.com/technology/CBIR/challenge.htm) with image that, I believe, would challenge the current CBIR methodologies.
In the paper I discuss my impressions from the current state of the art and then I express opinions on what might be fruitful approaches. I find the current results in CBIR very limited in spite of over 20 years of research efforts. Certainly, I am not the only one who thinks that way, the lead editorial of a recent special issue of the IEEE Proceedings on multimedia retrieval was titled "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?"
I offer certain reasons for this state of affairs, especially for the discrepancy between high quality results shown in papers and poorer results in practice. The main reason seems to be that the lessons about feature selection and the "curse of dimensionality" in pattern recognition have been ignored in CBIR. Because there is little connection between pixel statistics and the human interpretation of an image (the "semantic gap") the use of large number of generic features makes highly likely that results will not be scalable, i.e. they will not hold on collections of images other than the ones used during the development of the method. In other words, the transformation from images to features (or other descriptors) is many-to-one and when the data set is relatively small, there are no collisions. But as the size of the set increases unrelated images are likely to be mapped into the same features.
I propose that generic CBIR will have to wait both for algorithmic advances in image understanding and advances in computer hardware. In the meantime I suggest that efforts should be focused on retrieval of images in specific applications where it is feasible to derive semantically meaningful features.
The paper has two appendices with examples of image retrieval. One presents the results obtained from some on line systems and the other presents some experiments I conducted to demonstrate how a method that yields impressive results in the author(s) paper gives poor results in independent tests.
I have posted a page (http://www.theopavlidis.com/technology/CBIR/challenge.htm) with image that, I believe, would challenge the current CBIR methodologies.
Labels: CBIR, image processing