Monday, October 13, 2003

1. On clarifying the oft-confused terms Multinomial Logit Model (MNL) and Conditional Logit Model (CLM):

The standard source for CLM is McFadden's "Conditional Logit Analysis of Qualitative Choice Behavior" (1973). There is some citation confusion in the literature: the year is sometimes given as 1974, e.g. by Louviere et al. (2000) or by Long (1997). But in fact the true citation year is 1973; I downloaded the original paper from McFadden's website. This very important article appeared in a book edited by P. Zarembka, "Frontiers in Econometrics", pp. 105-142.

I believe McFadden was the first to derive this econometric model from the theoretical RUM model of Thurstone (1927, "A Law of Comparative Judgment", Psychological Review). The term RUM (Random Utility Model) itself was coined by Marschak (1960), referring to Thurstone's model. The MNL, on the other hand, was a mere extension of the binary logit model, first used in Theil's (1969) International Economic Review paper, "A Multinomial Extension of the Linear Logit Model."

When I was writing my proposal last year, my main source was Louviere et al. (2000). They never mention the term Conditional Logit Model. But, as I found later, what they discuss throughout the book is McFadden's Conditional Logit Model. This use of the term MNL to refer to CLM is shared by many other sources, including Ben-Akiva and Lerman (1985). To be precise, I should have used the term CLM instead of MNL in my proposal. In fact, I made a clarifying footnote in the paper I presented in Montreal (footnote number 5, page 6). I copy:

"The difference between the MNL and CLM is that in the latter case, the values of the choice characteristics vary accross choices, while the parameters are common across choices. Here, the likelihood of a choice decision is calculated conditional on the nature of the choices that defines the choice sets. In the former case, however, the values of the variables are common across choices for the same person, but the parameters vary across choices". Note: the term "conditional" has a rigorous rationale econometric-wise, see McFadden's 1984 article in Handbook of Econometrics.

The main difference between the two is that the standard, original MNL assumes that the choice probabilities depend on individual characteristics only, while the CLM considers the effects of choice characteristics as well (Maddala, 1983). Powers and Xie (2000) put it this way: "In the standard MNL, explanatory variables are invariant with outcome [or, "choice" in our case - AAP] categories, but their parameters vary with outcome. In the CLM, explanatory variables [may - AAP] vary by outcome as well as by individuals, whereas their parameters are assumed constant over all the outcome categories".
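To make the contrast concrete for myself, here is a minimal sketch of the two probability formulas. The function names, the variable labels (income, price, size) and all numbers are my own hypothetical illustration, not taken from any of the sources above.

```python
import numpy as np

def softmax(v):
    """Turn a vector of systematic utilities into choice probabilities."""
    e = np.exp(v - np.max(v))  # subtract the max for numerical stability
    return e / e.sum()

def mnl_probs(x, betas):
    """Standard MNL: x holds one person's characteristics, shape (K,);
    betas holds one parameter vector per alternative, shape (J, K)."""
    return softmax(betas @ x)

def clm_probs(z, beta):
    """CLM: z holds the attributes of each alternative, shape (J, K);
    beta is a single parameter vector shared by all alternatives, shape (K,)."""
    return softmax(z @ beta)

# Hypothetical numbers, for illustration only.
x = np.array([1.0, 0.5])          # e.g. intercept and income of one person
betas = np.array([[0.0, 0.0],     # base alternative normalized to zero
                  [0.2, 1.0],
                  [-0.3, 0.4]])
z = np.array([[2.0, 1.0],         # e.g. price and size of each alternative
              [1.5, 3.0],
              [1.0, 2.0]])
beta = np.array([-0.8, 0.5])

p_mnl = mnl_probs(x, betas)
p_clm = clm_probs(z, beta)
```

Note that in the CLM form, anything that takes the same value for every alternative (such as a person's income entered on its own) shifts all utilities equally and drops out of the probabilities. That is why individual characteristics need alternative-specific parameters, as in the MNL form, to have any effect.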

2. On why alternative-specific constants (ASCs) are needed in the CLM specification:

Take Earnhart's 2001 study (or, why he doesn't run into the problem we do). He has 3 alternatives in each choice set, with generic labels: House 1, House 2, House 3. However, he associates each of these alternatives with a particular natural feature: respectively "Water-based amenity", "Land-based amenity", and "No amenity" (yes, this last one sounds funny). So, e.g., his House 1 is always the water-based amenity. He actually has subdivisions of these three amenities. For example, "Long Island Sound", "Saltwater Marsh", "Freshwater Marsh", "River/Stream", and "Lake/Pond" are all in the "Water-based amenity" group. He uses pictures to represent these different amenities. It seems that he uses the generic labels "House 1" etc. in place of the more specific "Water-based amenity" etc. to minimize strategic behavior by the respondents (there is a literature on generic vs. non-generic labels, related to cognitive response). Now, in his ASC specification, since he has 3 alternatives, he should include 2 ASCs. He uses "No amenity" as the base and creates two dummies, for "Water" and "Land". Next, for the SP (stated preference) part, Earnhart includes only hypothetical houses. In his RP (revealed preference) part, he has the one house actually bought by the respondent, plus two other houses sold in the same town in the same month and year, picked at random.

Our problem: we cannot tie the generic house labels to other inherent attributes that are choice-specific. We can only use "Hypothetical Home" versus "Status Quo Home." This is the culprit when combining RP and SP: in our RP set, the chosen home is always (again, always) the actual home. Hence the perfect collinearity with the outcome variable.
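A small sketch of why this breaks the estimation, under simplifying assumptions of my own: binary RP choice sets, the HH dummy as the only regressor, and a hypothetical sample of 100 observations. Because the status quo home is chosen every time, the log-likelihood keeps rising as the HH constant goes to minus infinity, so no finite estimate exists.

```python
import numpy as np

def rp_loglik(asc_hh, n_obs):
    """Log-likelihood of n_obs binary RP choice sets in which the status quo
    home is always chosen and the only regressor is the HH dummy."""
    # P(SQ chosen) = 1 / (1 + exp(asc_hh)); take its log in a stable way.
    return -n_obs * np.logaddexp(0.0, asc_hh)

# Hypothetical grid of candidate values for the HH constant.
ascs = np.array([0.0, -2.0, -5.0, -10.0, -20.0])
lls = np.array([rp_loglik(a, n_obs=100) for a in ascs])

# The log-likelihood improves at every step toward minus infinity
# and approaches, but never reaches, its upper bound of zero.
always_improving = bool(np.all(np.diff(lls) > 0))
```

Earnhart avoids this because his dummies (Water, Land) are not tied one-to-one to which house was actually bought.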

Louviere et al.'s book doesn't say explicitly that the ASC is a must. However, they include an ASC in all their models and examples. So does McFadden's pioneering 1973 article. I believe Ben-Akiva and Lerman (1985) and Ken Train (1986) are among the first to recognize the econometric importance of the ASC, i.e. to ensure that the error term still has zero mean. Intuitively, the ASC serves as an indicator of the difference in utility between alternatives, all else equal.
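One way to write out the zero-mean argument, in notation of my own choosing (U for utility, V for its systematic part, c_j for the mean of the unobserved part of alternative j; none of these symbols are taken from the sources above):

```latex
U_{ij} = V_{ij} + \varepsilon_{ij}, \qquad E[\varepsilon_{ij}] = c_j
\;\Longrightarrow\;
U_{ij} = (c_j + V_{ij}) + \varepsilon^{*}_{ij},
\qquad \varepsilon^{*}_{ij} = \varepsilon_{ij} - c_j,
\qquad E[\varepsilon^{*}_{ij}] = 0 .
```

The ASC plays the role of c_j. Since only utility differences matter for the choice, only the differences between each c_j and that of a base alternative are identified, which is why a model with J alternatives carries J - 1 ASCs (2 ASCs for Earnhart's 3 alternatives).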

My earlier note (response to SC's question):

"Variable HH --hypothetical home -- in our model serves more like a "constant" (by construction, conditional logit does not have "the usual constant" like MLN, since it
cancels out, mathematically. However, we have to somehow include alternative-specific constant to ensure that the estimation sample for each alternative exactly equals the proportion of decisionmakers in the sample that actually chose that alternative). What this ASC gives us when included is three-fold. First, it provides a zero mean for unobserved utility, and second as noted by Train (1986, pg. 25) it can mitigate inaccuracies due to IIA property. For this second issue, we don't need to worry because we ask respondents binary choice.). What's the interpretation? The minus sign there indicates that in general, respondents are reluctant to changes (they don't like to move to the so-called hypothetical homes). This is also confirmed when I calculated the probability of choosing alternative HH;
e.g. for base model, average Prob(HH is chosen) is 0.17 and average Prob(SQ is chosen) is 0.83. This estimates are close to actual frequency of HH and SQ -- status quo home-- chosen in the survey. I don't have the figures for the completed 954 survey handy now; but the figures for 908 survey were 22% HH and 75% SQ"
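As a rough check on the sign, consider a stripped-down binary logit that contains nothing but the HH constant, with SQ as the base. This is a simplifying assumption of mine; our base model has other regressors too. In that case the constant is just the log-odds of the two shares, and it is negative whenever HH is chosen less often than SQ. The function names below are hypothetical.

```python
import math

def asc_from_shares(share_hh, share_sq):
    """HH constant (SQ as base) in a binary logit with no other regressors."""
    return math.log(share_hh / share_sq)

def prob_hh(asc_hh):
    """P(HH chosen) implied by that constant."""
    return 1.0 / (1.0 + math.exp(-asc_hh))

# Average probabilities quoted in the note above.
asc = asc_from_shares(0.17, 0.83)
recovered_share = prob_hh(asc)
```

Plugging the constant back in returns the HH share it was built from, which is the share-matching property of the ASC in its simplest form.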
