Friday, May 23, 2008

 

The Curse of Proof by Example

What is a proof that a method in image analysis (machine vision) does what it claims to do? For example, how do we know that a given method of image retrieval will retrieve images similar to a given query? Unfortunately, publication standards are such that most papers present only proofs by example.

This is a big topic, but I cannot help adding one more "joke" to the list "How X proves that all odd numbers are prime" (see http://www.gdargaud.net/Humor/OddPrime.html). A machine vision person's proof: "3 is prime, 5 is prime, 7 is prime, 9 hmm, 11 is prime, 13 is prime, oh well, we have an 83% success rate, let's publish."

The problem with machine vision publications is not only that the reported success rate may be too low for practical applications. It is also that the choice of examples is fairly sloppy, with little thought given to how representative they are of the population of interest.

Wednesday, May 14, 2008

 

Limitations of Content-based Image Retrieval

This blog entry is a summary of a viewpoint paper (http://www.theopavlidis.com/technology/CBIR/PaperB/Apr08.htm). It exists for the purpose of allowing readers to post comments on the paper.

In the paper I discuss my impressions of the current state of the art and then express opinions on what might be fruitful approaches. I find the current results in CBIR very limited in spite of over 20 years of research efforts. Certainly, I am not the only one who thinks that way: the lead editorial of a recent special issue of the IEEE Proceedings on multimedia retrieval was titled "The Holy Grail of Multimedia Information Retrieval: So Close or Yet So Far Away?"

I offer certain reasons for this state of affairs, especially for the discrepancy between the high quality results shown in papers and the poorer results in practice. The main reason seems to be that the lessons about feature selection and the "curse of dimensionality" in pattern recognition have been ignored in CBIR. Because there is little connection between pixel statistics and the human interpretation of an image (the "semantic gap"), the use of a large number of generic features makes it highly likely that results will not be scalable, i.e. they will not hold on collections of images other than the ones used during the development of the method. In other words, the transformation from images to features (or other descriptors) is many-to-one, and when the data set is relatively small there are no collisions. But as the size of the set increases, unrelated images are likely to be mapped into the same features.
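The collision argument can be illustrated with a back-of-the-envelope sketch. It is not taken from the paper: it assumes, purely for illustration, that each image lands independently and uniformly on one of a fixed number of distinguishable feature signatures (a birthday-problem idealization). The function name and the figure of one million signatures are hypothetical. Real feature spaces are far from uniform, so the numbers only show the trend: collisions are rare in a small collection and nearly certain in a large one.

```python
def collision_probability(num_images: int, num_signatures: int) -> float:
    """Chance that at least two images share a feature signature.

    Assumption (not from the paper): every image maps independently and
    uniformly to one of `num_signatures` possible signatures.
    """
    if num_images > num_signatures:
        return 1.0  # pigeonhole: more images than signatures
    p_no_collision = 1.0
    for i in range(num_images):
        # The (i+1)-th image must avoid the i signatures already taken.
        p_no_collision *= 1.0 - i / num_signatures
    return 1.0 - p_no_collision


if __name__ == "__main__":
    signatures = 1_000_000  # hypothetical count of distinguishable signatures
    for n in (100, 1_000, 10_000):
        print(f"{n:>6} images: P(collision) = {collision_probability(n, signatures):.3f}")
```

Under these assumptions a method tuned on a few hundred images would almost never see two unrelated images collide, while the same method on tens of thousands of images would see it almost surely.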

I propose that generic CBIR will have to wait both for algorithmic advances in image understanding and advances in computer hardware. In the meantime I suggest that efforts should be focused on retrieval of images in specific applications where it is feasible to derive semantically meaningful features.

The paper has two appendices with examples of image retrieval. One presents the results obtained from some online systems, and the other presents some experiments I conducted to demonstrate how a method that yields impressive results in its authors' paper gives poor results in independent tests.

I have posted a page (http://www.theopavlidis.com/technology/CBIR/challenge.htm) with images that, I believe, would challenge the current CBIR methodologies.



Sunday, September 16, 2007

 

Why Health Guidelines are Unreliable

Caution: This posting contains mathematical expressions that may cause stress.

Today's New York Times Magazine has an article by Gary Taubes pointing out the limitations of epidemiology in telling people what is good for them. I found the article far too long for its content, and it takes a rather narrow view of the topic. Taubes has just written a book about diets, "Good Calories, Bad Calories", but it is not out yet (publication date is Sept. 25) and therefore I have no idea whether he deals with the issue of this blog.

My point here is that a major limitation of health guidelines is the lack of scientific foundations behind them.

Let us consider the example of weight. There is the infamous body mass index (BMI) that is supposed to have an optimal value around 25. To find your BMI you take your weight in kilograms and divide it by the square of your height in meters. (You can tell right away the European origin of the measure. For the conversion you need to multiply pounds by 0.4536 to get kilograms, which is the same as dividing by about 2.2, and inches by 0.0254 to get meters.) You can of course express your ideal weight in kilograms from the formula below, provided the height is expressed in meters.

Ideal Weight = 25 * height^2
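For readers who would rather let a computer do the arithmetic, here is a minimal sketch of the BMI and the ideal weight formula. The function and constant names are made up for this illustration. The target value of 25 is the one quoted above, and the conversion factors are the exact definitions of the pound and the inch.

```python
LB_TO_KG = 0.45359237  # one pound in kilograms (exact by definition)
IN_TO_M = 0.0254       # one inch in meters (exact by definition)


def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms over height in meters squared."""
    return weight_kg / height_m ** 2


def ideal_weight_kg(height_m: float, target_bmi: float = 25.0) -> float:
    """Weight in kilograms that gives `target_bmi` at the given height."""
    return target_bmi * height_m ** 2


def bmi_from_imperial(weight_lb: float, height_in: float) -> float:
    """BMI from pounds and inches, converting to metric units first."""
    return bmi(weight_lb * LB_TO_KG, height_in * IN_TO_M)


if __name__ == "__main__":
    for h in (1.60, 1.70, 1.80, 1.90):
        print(f"height {h:.2f} m -> ideal weight {ideal_weight_kg(h):.2f} kg")
```
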


Let us now look at the results for people 1.60m (5 feet 3 inches), 1.70m (5 feet 7 inches), 1.80m (5 feet 11 inches), and 1.90m (6 feet 3 inches). The weights (in kilograms) we get from the formula are respectively 64, 72, 81, and 90. Do you notice that the weight comes out close to the number expressed by the decimals in your height? When I was growing up in Greece doctors were saying that your ideal weight in kilograms should be equal to the number expressed by the decimals in your height (for example, 70 kg for a height of 1.70m). The mathematical formula for that is a linear equation:

Old Ideal Weight = 100 * (height - 1)


That apparently was an old heuristic and the new "scientific" guidelines wanted to be close to it. The straight line of the linear equation is tangent to the parabola of the quadratic equation, and the two meet at a height of 2 meters and a weight of 100 kg. However, they are very close over a wide range of heights. For height 1.80m the difference between the two formulas is only 1kg, and for height 1.70m it is 2.25kg. I suspect the old formula was derived as a gross approximation to a set of observed weights and heights, most likely from recruits in a Western European army. (I have no evidence for that, but data from army recruits have often been used in population studies because they were readily available.)
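The tangency and the small differences can be checked with a line of algebra. In the sketch below, h stands for the height in meters; the symbol is introduced here only for the derivation.

```latex
\begin{align*}
25h^2 - 100(h - 1) &= 25\,(h^2 - 4h + 4) = 25\,(h - 2)^2 \\
h = 2.00 &: \quad 25\,(0)^2 = 0 \text{ kg} \\
h = 1.80 &: \quad 25\,(0.2)^2 = 1 \text{ kg} \\
h = 1.70 &: \quad 25\,(0.3)^2 = 2.25 \text{ kg}
\end{align*}
```

The difference is a perfect square that vanishes only at h = 2, so the line touches the parabola at that single point without crossing it, and the quadratic formula never gives a lower weight than the linear one.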

However, a linear formula is itself too coarse an approximation for a large range of heights (if we want to include, for example, women and children and not only young men), so the quadratic formula given first was adopted. Of course, the new formula had to agree with the old for the heights common to young men, hence the coefficient value of 25.


But there is another disturbing fact in that formula. Why is the weight expressed as the square of the height? Elementary solid geometry tells us that the volume of an object is proportional to the third power of its linear dimensions, so the weight should be proportional to the cube of the height. However, strength is proportional to the cross section of your arms and legs, so it is proportional to the square of the height. As a result, the ratio of strength over volume is a declining function of height, and that is why insects are far more mobile than dogs and dogs are more mobile than humans. Because for any given person height is determined by genetics and childhood nutrition, it must be taken as a given. Then the formula for proper weight must contain both a cubic term (to account for the volume) and a quadratic term (to account for the need for strength). The formula for the BMI is certainly flawed, but nobody has figured out the right formula.
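The scaling argument can be written out in symbols. The notation is introduced here only for illustration: W is weight, S is strength, h is height, and the bodies being compared are assumed to have the same shape and density.

```latex
\[
W \propto h^3, \qquad S \propto h^2
\quad\Longrightarrow\quad
\frac{S}{W} \propto \frac{h^2}{h^3} = \frac{1}{h}
\]
```

Under that assumption, doubling the height multiplies the weight by eight but the strength only by four, so the strength available per unit of weight is cut in half.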

Not surprisingly, there have been articles pointing out that the BMI formula gives weights that are too low for tall people, but the public is not aware of them. Another flaw of the formula (also recognized in the literature) is that it does not distinguish between weight due to muscle and weight due to fat. Again, that knowledge is not widely available.

Yes, you must be concerned about gaining excess weight, but what counts as excess weight is not what the formula tells you. Your best bet is to find a health practitioner who takes a realistic view of weight and discuss the problem with that person.


