TRYST WITH A LEGEND

Calyampudi Radhakrishna Rao was born on September 10, 1920, in Huvvina Hadagalli, Karnataka, to C.D. Naidu and Lakshmikanthamma. He received an M.A. in mathematics from Andhra University in 1940, an M.A. in statistics from Calcutta University in 1943, a Ph.D. from Cambridge in 1948 for his thesis "Statistical Problems in Biological Classification", and an Sc.D. from the same university in 1965. Thanks to the Cramer-Rao inequality and the Rao-Blackwellisation of unbiased estimators, he is now a household name among statisticians and engineers alike. He is the author of more than 280 papers and 11 books in diverse fields such as Estimation Theory, Multivariate Analysis, Characterization Problems, Combinatorics and Design of Experiments, Matrix Algebra, Generalised Inverses of Matrices, Differential-Geometric Methods in Statistics, and Mathematical Genetics, among others. His contributions to the development of the Indian Statistical Institute and the foundation of statistics in India go down in history. Among the innumerable distinguished positions that C.R. Rao has occupied are the Directorship and the Jawaharlal Nehru Chair at the I.S.I., a Distinguished Professorship at Pittsburgh, and the Eberly Professorship at Pennsylvania State University, a position he continues to hold to this day.

Some excerpts from the interview:

1> Sir, how has statistics as a subject evolved from its past, and how will it evolve in the future?

A> Well, historically, statistics was used mainly by the government to take policy decisions. Actually, statistics as used by rulers started maybe two thousand years ago. If you look at the Bible, they have described how to take a census. In the Indian context, we have Kautilya's Arthashastra, which is also full of statistics. During the Mughal period, you have a publication, the Ain-i-Akbari, which has information on everything in minute detail.

By the way, do you know whether the word statistics is an English or a French or a German word? It sounds English, right? But it is actually a German word, and it was coined by a German administrator by the name of Achenwall. By statistics he meant all the information needed by the government about the people. Now, the English people did not like the use of the German word, and they coined the word 'publistics', which was used in England for several years. Later on, since it is a more complicated word, they opted for the word statistics. Actually, somebody in Scotland collected information on all the people in Scotland and published it in the form of 'Statistics of Scotland', which attracted criticism from the English.

But statistics as a method of inference started only in 1900. The idea is that whenever you have some data, there is some amount of uncertainty in the inference you draw from it, because of errors in measurements and because we can't take a complete set of measurements on any phenomenon; you can observe only a part of the phenomenon. So, when we want to infer from statistics to the theory or mechanism which is producing the observations, there is some amount of uncertainty. Now, it was realized in 1900 that if you can measure the amount of uncertainty in the data, then you can use it for decision-making purposes; and if you know the amount of uncertainty, you know the loss you will incur by taking a certain decision. In 1900, Karl Pearson started the use of chi-square. That is the first example of using some criterion for deciding whether a certain hypothesis is true or not.
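(For reference, the goodness-of-fit statistic Pearson introduced compares observed and expected counts; in modern notation it is usually written as

\[
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i},
\]

where the $O_i$ are the observed frequencies in $k$ classes and the $E_i$ are the frequencies expected under the hypothesized distribution; large values of $\chi^2$ indicate a poor fit.)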
He was comparing observed and expected frequencies (this chi-square formula, you know), and that was invented by Karl Pearson. He had developed a system of frequency curves, or frequency distributions, and he wanted to know which one would fit a given set of data. His hypothesis was that a certain family of frequency distributions is applicable to the given data. So he needed the chi-square to measure the uncertainty in the hypothesis, and when the probability was high, he accepted the hypothesis.

Once it was shown that it is possible to make decisions on the basis of data by computing the amount of uncertainty in the data, people started using statistics in several areas. Probably the first area of science where statistics was used was demography, where they computed life tables for insurance purposes; then came applications in agriculture, which led to the design of experiments. Very early on, psychologists used statistics to study the components of human intelligence that can be measured, inferred through performance in various tests like intelligence tests and so on. Then came applications in industry, in quality control (did you study industrial statistics?). Then came design of experiments in quality control to find the optimum mix of factors, how to combine various components in order to produce a certain given quantity.

Then came applications in medicine, for medical diagnosis. If you go to a doctor, he takes a large number of tests so that proper treatment can be administered. Now, how does he use the information from all these tests to arrive at a proper diagnosis? How will he synthesize all this information from the various tests, which he can look at only one at a time? He says, 'test for cholesterol, good', 'blood pressure, not so good', and so on. So how will he put all the information together? That is where multivariate analysis comes in, which tells us how to combine the information from the various tests and make a proper diagnosis. So, this is the most useful application of statistics.

2> We have heard of the phrase 'Lies, Damned Lies, and Statistics'. Has that idea changed now? (CRR interjects with a 'Yes, of course.') Do people have faith in statistical results now?

A> Well, in sciences like physics, chemistry, etc., statistics is not used so much. Actually, physicists say that they do not have any use for statistics, because they have a very precise way of generating measurements, and so the errors in physical measurements are much smaller than usual. Sow a variety of wheat to measure the production: well, it all depends on the fertility of the soil. But measurements taken by physicists are very, very accurate. Secondly, they can generate a very large number of observations; they can take measurements repeatedly, which you cannot do in other sciences. So, when you have a large number of measurements, you don't need refined statistical methods; a simple argument is sufficient and accurate. Because of this, physicists do not use statistics very much. But in all sciences where measurements cannot be made without error and there is a limitation on the number of observations, statistics is essential.
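(As an illustration of the point about large numbers of measurements: if $n$ independent measurements have standard deviation $\sigma$, the standard error of their mean is

\[
\mathrm{SE}(\bar{x}) = \frac{\sigma}{\sqrt{n}},
\]

so with the very large $n$ that physicists can obtain, the uncertainty in the average becomes so small that simple arguments suffice.)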