chevrotin commented on May 29, 2024

Alright.. Will do

arteagac commented on May 29, 2024

Hi @chevrotin, you can pass initial values to xlogit using the init_coeff parameter of the fit method (see example below). You just need to make sure to pass your initial values following the order in which xlogit creates the coefficients. The order of the coefficients is varnames + sd of varnames + scale_factor. An easy way to figure out the order is to run a test estimation and follow the order of the coefficients in the summary (this test estimation does not need to be successful, as the goal is simply to figure out the order of the coefficients).

import numpy as np
from xlogit import MixedLogit

varnames = ["a", "b", "c"]
ml = MixedLogit()
ml.fit(X=df[varnames],
    y=df["choice"],
    varnames=varnames,
    ids=df["chid"],
    randvars={"a": "n", "b": "n", "c": "n"},
    ...,
    init_coeff=np.array([.1, .1, .1, .1, .1, .1, .1])  # Order -> [a, b, c, sd.a, sd.b, sd.c, scale_factor]
    )

Regarding your other question, the current version of xlogit does not allow the scale_factor to follow a distribution. I know this would be a valuable addition to xlogit, but I have not had the chance to implement this yet.

chevrotin commented on May 29, 2024

Thanks Cristian. The code works. I managed to estimate a WTP-space model with my data. Providing starting values, derived from a preference space ML model, seems to do the trick.

A few things to note: the model works with a lower number of draws (4,000); however, it runs into a memory problem (similar to our discussion in logitr) when I specify 10,000. I have 8 random variables. I imagine it would become much more constraining once you allow correlated parameters in future versions.

I wonder if CuPy allows for some degree of memory management, where it can pass some of the memory burden to the hard drive. Essentially, you have a rocket here. The estimation took less than 15 minutes. We have plenty of room to dial the speed down to allow for a larger model.

error message: cupy.cuda.memory.OutOfMemoryError: Out of memory allocating 2,691,840,000 bytes (allocated so far: 32,640,581,120 bytes).

arteagac commented on May 29, 2024

I think the batch processing functionality in xlogit can fix your GPU OutOfMemoryError. Simply pass the batch_size=1000 parameter to the fit method to tell xlogit to process 1,000 random draws at a time. This avoids overflowing the GPU memory, as xlogit processes one batch at a time, computes the log-likelihoods, and averages them at the end, which does not affect the final estimates or log-likelihood. Please let me know if this works for you. You can also increase batch_size depending on your GPU memory size, but the maximum for you would be around 4,000, which is where you ran out of GPU RAM. The batch processing functionality allows estimations using very large numbers of random draws; this is how I estimated a model with half a million draws using a 6GB GPU and 16GB of RAM. Section 5.4 in xlogit's paper shows how this functionality works, along with stats on RAM and GPU memory use.
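
For reference, a minimal sketch of what such a call could look like (df, varnames, and the randvars values are placeholders carried over from the earlier example, not from this comment):

from xlogit import MixedLogit

ml = MixedLogit()
ml.fit(X=df[varnames],
    y=df["choice"],
    varnames=varnames,
    ids=df["chid"],
    randvars={"a": "n", "b": "n", "c": "n"},
    n_draws=10000,     # total number of random draws
    batch_size=1000)   # process 1,000 draws at a time to cap GPU memory use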

Regarding the offloading to disk, this is feasible, but it requires a separate implementation and would be quite slow. However, the batch processing functionality should solve most of the issues, and the only limitation would be the computer's RAM. Note that 16GB of RAM was enough for half a million draws, so this should not be that big of a constraint.

chevrotin commented on May 29, 2024

Ah.. thanks. Glad that you have already solved that. Wasn't reading carefully enough. Will try the batch function.

arteagac commented on May 29, 2024

Sure, please let me know how it works. Unfortunately, xlogit's documentation does not properly elaborate on batch processing and WTP space models. I definitely need to spend some time expanding the documentation to better highlight some of the nice features implemented in xlogit.

chevrotin commented on May 29, 2024

If you need a helping hand for documentation, I'll be glad to help.

arteagac commented on May 29, 2024

That would actually be fantastic, and I would love your help. If you can create some examples for WTP space models in a Google Colab notebook, I can directly integrate that notebook into the documentation. It would be great if one example included passing initial values to the estimation using the init_coeff parameter, and another showed the use of the batch_size functionality to handle a large number of random draws. You could use the Yogurt dataset or any other open-access dataset of your preference. For the examples, I try to use a tutorial-like style, as illustrated in the examples below:

arteagac commented on May 29, 2024

Thanks a lot @chevrotin!

chevrotin commented on May 29, 2024

https://colab.research.google.com/drive/1bjuh4-oAcsz5m9rN4uYnmQtqYPeC-Vwf?usp=sharing

@arteagac Please see the examples of both starting values and using batch_size to estimate with ndraws = 50000.

Note that the results differ a bit from logitr. I think this might be because of the multiple starting values that John has implemented. The log-likelihood score suggests that the solution might be a local maximum.

arteagac commented on May 29, 2024

Hi @chevrotin,
I am glad that the batch processing functionality worked for you and that you managed to estimate a model with 50k random draws. Thank you so much for creating the examples; I will integrate them into the documentation sometime this weekend. Yes, the results differ a bit from logitr's, and logitr has a slightly better log-likelihood. I will try replicating @jhelvy's multi-start approach to see if this brings xlogit's log-likelihood closer to logitr's.

Dear @jhelvy, can you please provide me some insight into how your multi-start approach samples the initial values? What upper and lower bounds do you use for the random sampling of initial values? I am trying to figure out why logitr yields a slightly better final log-likelihood despite using a lower number of draws. Any ideas why this might be happening? Is it perhaps the multi-start? Do you maybe use Sobol draws by default? I don't think the cause is the optimization routine, because we both use Limited-Memory BFGS and similar tolerance values (>= 1e-6).

jhelvy commented on May 29, 2024

Cool thread here folks. A couple things:

  1. You don't need GMNL to allow the scale (price) parameter to follow a distribution. You can just specify a distribution for that parameter. In {logitr} this is handled with the randScale parameter, which can be normal (n), log-normal (ln), or censored normal (cn). That said, I am actually going to restrict this in the future to disallow normal, since you can sometimes get a negative draw in the simulation, and that basically breaks everything (the scale term must be strictly positive). This came up in another {logitr} issue and was the culprit, so it really shouldn't be allowed to be normal. Also, while theoretically log-normal should be okay (it enforces positivity), in practice those models really struggle to converge. Because log-normals have long tails, the simulations often get a few large numbers in the draws, and when you scale the whole utility model by some large number, it bounces all over the place and diverges. WTP space models in general are rather sensitive to the scale parameter, so I often end up just choosing a fixed scale parameter. Censored normal is a good compromise in that it forces positivity while not producing too many large-value draws (see the sketch after this list).
  2. For setting up the starting values, you can see my heuristics here. I've added a handful of hard-coded rules, depending on the context, to make sure I'm using reasonable starting values. The default is to draw random values between (-1, 1). I allow the user to pass in different boundaries as well. After I draw those values, I run through these checks to override some. For example, if a negative value was drawn for the scale parameter, I overwrite it to be positive; otherwise the model won't converge. These are purely heuristics that I've arrived at through trial and error.
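
For intuition, a small numpy sketch (mine, not logitr code; the mean, sd, and the exact censoring rule logitr uses are assumptions for illustration) contrasting the three candidate distributions for the scale draws:

import numpy as np

rng = np.random.default_rng(42)
z = rng.standard_normal(100_000)
mu, sigma = 0.5, 0.5  # hypothetical mean and sd for the scale term

normal = mu + sigma * z                   # can go negative -> invalid scale draws
lognormal = np.exp(mu + sigma * z)        # strictly positive, but a long right tail
censored = np.maximum(mu + sigma * z, 0)  # censored normal: negative draws piled at zero

print(f"negative normal draws:   {(normal < 0).mean():.1%}")
print(f"largest log-normal draw: {lognormal.max():.1f}")
print(f"draws censored to zero:  {(censored == 0).mean():.1%}")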

This is also why I went through the trouble of building in a multi-start loop. I had enough convergence issues with WTP space models that I didn't want users to have to do this much work - just let it iterate more to find a better solution. But this is also true for mixed logit in the preference space. Mixed logit models have non-convex log-likelihood functions, so really we should be running them from different starting points to be a little more confident in the solutions.

arteagac commented on May 29, 2024

Thanks a lot for your detailed explanation, @jhelvy. I will check the heuristics included in your multi-start loop. Given the importance of the multi-start functionality, I might include it in xlogit.

chevrotin commented on May 29, 2024

Glad to hear from you, John. Here are some of my thoughts:

@jhelvy It's true that GMNL is not necessary for WTP space; plain mixed logit can do it. In my own implementations, though, GMNL tends to converge better. To avoid the long tail, you can use draws from a truncated normal (±2) as the underlying normal distribution for the log-normal price (see the sketch below).
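
A minimal sketch of this idea using scipy (illustrative only; mu and sigma are hypothetical parameters of the underlying normal):

import numpy as np
from scipy.stats import truncnorm

mu, sigma = 0.0, 0.5  # hypothetical mean and sd of the underlying normal
z = truncnorm.rvs(-2, 2, size=100_000, random_state=42)  # standard normal truncated to +-2
scale_draws = np.exp(mu + sigma * z)  # positive draws with the long tail cut off
assert scale_draws.max() <= np.exp(mu + 2 * sigma)  # no draw beyond exp(mu + 2*sigma)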

@arteagac: I'm kinda curious about xlogit's implementation of WTP space. I take it that the scale parameter is assumed to be fixed and therefore does not have a standard deviation associated with it. Or does it? Also, does it automatically take the negative of price (when scale_factor = price)?

Merry Christmas to both of you, Cristian and John.

arteagac commented on May 29, 2024

Hi @chevrotin, you are correct: in xlogit the scale parameter is fixed and does not have an associated standard deviation. Yes, xlogit uses the negative of the scale variable, as it follows the WTP space formulation shown below (see $p_{nj}$), which was kindly provided by @jhelvy. Merry Christmas to you too!

[image: WTP space utility formulation]
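
In the usual notation (a reconstruction consistent with logitr's documentation of the WTP space model), the utility of alternative $j$ for person $n$ is

$$u_{nj} = \lambda \left( \omega' x_{nj} - p_{nj} \right) + \varepsilon_{nj}$$

where $\lambda$ is the scale parameter, $\omega$ is the vector of WTP coefficients, $x_{nj}$ are the non-price attributes, and $p_{nj}$ is the price, which enters with a negative sign.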

jhelvy commented on May 29, 2024

@chevrotin I don't have a way to allow bounds on the underlying normal for a log-normal parameter. I just use cn for the scale parameter in a WTP space model and it works reasonably well.

But I'm curious: how have you been able to implement GMNL for WTP space models? Do you use custom code? I've struggled to get it to converge for a WTP space setup using alternatives like {gmnl} and {apollo} (see, for example, this article).

WTP space models are just weird, and in my experience there haven't been many developers who have made them easy to estimate. Other than {logitr}, there's a routine in Stata that seems pretty robust, and of course there's xlogit. I guess you could go Bayesian with Stan too and set this up, but at that point you're doing a lot of coding yourself and you really, really need to know what you're doing. I wanted WTP space models to be easily accessible to researchers who aren't programming experts, so I've tried to make the default settings in {logitr} work well for most.

chevrotin commented on May 29, 2024

@jhelvy I use the GMNL estimator in Stata. It usually takes days to estimate.

You are right, there aren't many routines that are fast and complete.

jhelvy commented on May 29, 2024

Days!? 😲

Wow... I have no idea why. For licensed software, you'd think they'd spend some time on a little performance engineering. I don't have gmnl on the list to support in logitr, though; there are just too many other things to add right now that are more pressing, like maybe support for xlogit.

arteagac commented on May 29, 2024

Dear @chevrotin,
I finally had the chance to integrate the example you kindly created into the docs (see link below):
https://xlogit.readthedocs.io/en/latest/notebooks/wtp_space.html

Thank you again for the time you dedicated to developing this example.

Regarding the slight difference with logitr's final log-likelihood, I don't think it is caused by the fixed scale parameter. I believe logitr also uses a fixed scale parameter by default, which you can change to random using the randScale option; I will confirm this with @jhelvy. Also, I am still working on figuring out whether the multi-start approach helps xlogit get a log-likelihood closer to logitr's.

Dear @jhelvy,
Could you please kindly confirm the following two questions about logitr?

  1. Does logitr use a fixed scale parameter by default for models in WTP space?
  2. How does the multi-start approach sample values when you pass pre-computed initial coefficients? For instance, if the initial coefficients are based on the estimates of a model in preference space, how does the multi-start approach use these pre-computed initial coefficients?

jhelvy commented on May 29, 2024

Yes, logitr uses a fixed scale parameter by default. It's only random if the user specifies so using randScale.

In logitr, a user can specify a set of starting points using the startVals argument. These values are only used as the starting values for the first run if numMultiStarts > 1; after that, random starting values are used. If no startVals are provided, a zero vector is used for the first run, after which random values are again used.
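
For xlogit users, the same logic could be mirrored in a few lines of Python. This is a sketch under assumptions: the (-1, 1) sampling range copies logitr's default, and the fitted model is assumed to expose its final log-likelihood as loglikelihood (check your xlogit version); none of this is an official xlogit multi-start API.

import numpy as np
from xlogit import MixedLogit

def multistart_fit(start_vals, n_starts=10, seed=123, **fit_kwargs):
    # First run uses the supplied starting values; subsequent runs draw
    # random values in (-1, 1), mirroring logitr's behavior described above.
    rng = np.random.default_rng(seed)
    best = None
    for i in range(n_starts):
        init = np.asarray(start_vals) if i == 0 else rng.uniform(-1, 1, len(start_vals))
        ml = MixedLogit()
        ml.fit(init_coeff=init, **fit_kwargs)
        if best is None or ml.loglikelihood > best.loglikelihood:
            best = ml
    return best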

arteagac commented on May 29, 2024

Thanks a lot for the clarification, John!
