Analysis jailhouse repo with PaSta about pasta HOT 19 CLOSED

rsarky commented on August 29, 2024

I wanted to analyse the jailhouse repo as it is much smaller than the other repos.

from pasta.

Comments (19)

rsarky commented on August 29, 2024

Hmm, in that case I guess I should also set the date filter so that it roughly overlaps the commit range specified in UPSTREAM.

Oh, I notice you have already mentioned this in your comment. Never mind.

rralf commented on August 29, 2024

Generally, working on small datasets is an excellent idea for development. The problem is that we lack public inboxes for Jailhouse. I could give you my local mbox, but it's easier for you to choose Linux.

Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.

Then choose a time window of one month, and reduce the commit range to roughly the same month. This makes things manageable on a desktop machine.
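
One way to sanity-check that the mail window and the commit window line up is to compare the two date ranges. The following is a minimal sketch, assuming you have already looked up the first and last commit dates of your chosen range yourself. The dates are placeholders for illustration, and `overlap_days` is a hypothetical helper, not part of PaStA.

```python
from datetime import date


def overlap_days(mail_start, mail_end, commit_start, commit_end):
    """Return the number of days shared by two closed date ranges (0 if disjoint)."""
    start = max(mail_start, commit_start)
    end = min(mail_end, commit_end)
    if end < start:
        return 0
    return (end - start).days + 1


# Placeholder dates for illustration only: a one-month mail window and a
# commit range that starts a little earlier and ends a little later.
mail_window = (date(2020, 2, 1), date(2020, 3, 1))
commit_window = (date(2020, 1, 13), date(2020, 3, 15))

shared = overlap_days(*mail_window, *commit_window)
print(f"mail and commit windows share {shared} days")
```

If the result is zero, or far smaller than the mail window, the two ranges probably need adjusting before an analysis is worth running.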

rsarky commented on August 29, 2024

> Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.

This does help. I was able to run pasta analyse rep on the repository successfully.
However, pasta analyse upstream tries to cache roughly 83k commits, which hangs my system :(

As an aside:
Since I already had a local clone of the Linux repository on my system, instead of running git submodule update linux I created a symlink called repo and pointed it to my local clone.
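
For anyone who wants to script the symlink step, here is a minimal sketch. The directory layout (a `repo` link inside `resources/linux`) is an assumption, since the comment does not say where the link lives, and `link_local_clone` is a hypothetical helper. The demonstration runs in a throwaway directory so nothing real is touched.

```python
import tempfile
from pathlib import Path


def link_local_clone(resources_dir: Path, local_clone: Path, name: str = "repo") -> Path:
    """Create resources_dir/name as a symlink to an existing local clone."""
    link = resources_dir / name
    if link.is_symlink() or link.exists():
        raise FileExistsError(f"{link} already exists; remove it first")
    link.symlink_to(local_clone, target_is_directory=True)
    return link


# Demonstration in a scratch directory; the paths are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    clone = Path(tmp) / "my-linux-clone"  # stands in for the local clone
    clone.mkdir()
    resources = Path(tmp) / "resources" / "linux"  # assumed layout
    resources.mkdir(parents=True)

    link = link_local_clone(resources, clone)
    points_to_clone = link.resolve() == clone.resolve()
    print(f"{link.name} -> {link.resolve().name}")
```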

rralf commented on August 29, 2024

Ok, we can fix that.

Try in your config:
UPSTREAM = "v5.5-rc6..origin/master"
[...]
[mbox]
MINDATE = 2020-02-01
MAXDATE = 2020-03-01

and only activate the alsa-devel ML.

Thanks
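
A quick way to double-check such settings before a long run is to read them back and compute the window length. This is a minimal sketch that treats the fragment as INI-style text purely for illustration; how PaStA itself reads its config is not shown in this thread, and the `[PaStA]` section name is an assumption.

```python
import configparser
from datetime import date

# Fragment modelled on the settings above. The [PaStA] section name is an
# assumption; the comment only shows the UPSTREAM key itself.
FRAGMENT = """
[PaStA]
UPSTREAM = "v5.5-rc6..origin/master"

[mbox]
MINDATE = 2020-02-01
MAXDATE = 2020-03-01
"""

parser = configparser.ConfigParser()
parser.optionxform = str  # keep keys upper-case instead of lower-casing them
parser.read_string(FRAGMENT)

# configparser keeps the surrounding quotes, so strip them by hand.
upstream = parser["PaStA"]["UPSTREAM"].strip('"')
min_date = date.fromisoformat(parser["mbox"]["MINDATE"])
max_date = date.fromisoformat(parser["mbox"]["MAXDATE"])

if min_date >= max_date:
    raise ValueError("MINDATE must be earlier than MAXDATE")

window_days = (max_date - min_date).days
first_rev, last_rev = upstream.split("..")
print(f"{first_rev} .. {last_rev}, mail window of {window_days} days")
```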

rralf commented on August 29, 2024

Yep, exactly, see the UPSTREAM range above. That roughly matches. Should be sufficient for playing around with PaStA.

rsarky commented on August 29, 2024

Hmm, the given commit range and date filter did give me some output, but I couldn't get any mappings between patches and commits. I need to play around with those two parameters, I guess.
Closing this thread.

rralf commented on August 29, 2024

Did not give you any mappings? That's strange. I had this configuration in a demo today, and we saw at least some (I guess it was 70 or so) mappings. Did you recreate the caches?

$ ./pasta sync -clear all
$ ./pasta sync -mbox -create all

rralf commented on August 29, 2024

Ah, another tip:
$ cd resources
$ git checkout master
$ git submodule update

Maybe you're running on an outdated state of the resources.

rsarky commented on August 29, 2024

> $ git submodule update

This was taking an immense amount of time for me over a choppy network, so I decided to use a local clone of the Linux repository instead. That shouldn't be an issue, I guess?

rsarky commented on August 29, 2024

> Did not give you any mappings? That's strange. I had this configuration in a demo today, and we saw at least some (I guess it was 70 or so) mappings. Did you recreate the caches?
>
> $ ./pasta sync -clear all
> $ ./pasta sync -mbox -create all

This didn't help.
My output file basically shows all patch equivalence classes, followed by all the upstream commits.
There are no mappings.
The only reason for this that I can see is that I am using a local clone instead of running git submodule update. I will try with that, I guess.

rralf commented on August 29, 2024

No, that should not be a problem.

so you did run, in this order:

analyse rep
rate
analyse upstream
rate
?

rralf commented on August 29, 2024

Submodules are only used to have everything tied together. You can use local checkouts as well.

rsarky commented on August 29, 2024

> No, that should not be a problem.
>
> so you did run, in this order:
>
> analyse rep
> rate
> analyse upstream
> rate
> ?

Yup, the same order.
I also increased the mailbox span to 3 months.

rralf commented on August 29, 2024

Okay that's really strange. Please find my config here: http://vmexit.de/~ralf/config

Try to delete all caches (e.g.: rm resources/linux/resources/*pkl; rm resources/linux/mbox-result), copy over the config and try:

$ pasta sync -mbox -create all
$ pasta analyse rep
$ pasta rate
$ pasta analyse upstream
$ pasta rate

Here on my machine, this gives me 84 mappings against upstream with default thresholds in that time window.
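
The clean-up and the command order above can be wrapped in a small driver so that no step is skipped by accident. This is a sketch under stated assumptions: `pasta` is assumed to be on PATH, `clear_caches` and `run_steps` are hypothetical helpers, and the demonstration only performs a dry run against a scratch directory that mimics the layout from the comment.

```python
import subprocess
import tempfile
from pathlib import Path

# Command order as listed above; "pasta" is assumed to be on PATH.
STEPS = [
    ["pasta", "sync", "-mbox", "-create", "all"],
    ["pasta", "analyse", "rep"],
    ["pasta", "rate"],
    ["pasta", "analyse", "upstream"],
    ["pasta", "rate"],
]


def clear_caches(project_dir: Path) -> list:
    """Delete resources/*pkl and mbox-result below project_dir; return what was removed."""
    targets = sorted((project_dir / "resources").glob("*pkl"))
    targets.append(project_dir / "mbox-result")
    removed = []
    for path in targets:
        if path.is_file():
            path.unlink()
            removed.append(path)
    return removed


def run_steps(steps, dry_run=True):
    """Run the steps in order, stopping at the first failure; only list them in a dry run."""
    done = []
    for argv in steps:
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
        done.append(" ".join(argv))
    return done


# Dry run against a scratch directory; no real caches are touched.
with tempfile.TemporaryDirectory() as tmp:
    project = Path(tmp) / "resources" / "linux"
    (project / "resources").mkdir(parents=True)
    (project / "resources" / "example.pkl").touch()
    (project / "mbox-result").touch()

    removed_names = [p.name for p in clear_caches(project)]

planned = run_steps(STEPS, dry_run=True)
print(removed_names)
print("\n".join(planned))
```

Passing `dry_run=False` would execute the commands for real, so it is worth reviewing the printed plan first.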

rsarky commented on August 29, 2024

Will try this right now. Thanks for being so patient.

rsarky commented on August 29, 2024

Got some mappings now!! ✨

One thing I noticed: when I deleted the mbox-result file and it got recreated, it had significantly fewer lines.

This makes me wonder: when rerunning the analysis on a mailbox, do we append to the mbox-result file instead of rewriting it?

rralf commented on August 29, 2024

Aah, I might know what went wrong:

Initially, mbox-result gets created the first time 'analyse rep' is run. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.

So if you change the config but leave the old mbox-result in place, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.

The best thing is to start with a clean mbox-result after committing changes to the config.
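
A simple heuristic for spotting this situation is to compare modification times: if the config is newer than mbox-result, the result file was probably produced under different settings. This is only a sketch of that idea, not something PaStA is known to do, and `result_is_stale` is a hypothetical helper.

```python
import os
import tempfile
from pathlib import Path


def result_is_stale(config: Path, result: Path) -> bool:
    """True if result exists and was last written before config was last modified."""
    if not result.exists():
        return False  # nothing to be stale; a fresh file will be created
    return result.stat().st_mtime < config.stat().st_mtime


# Demonstration with scratch files and explicit timestamps.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "config"
    result = Path(tmp) / "mbox-result"
    config.touch()
    result.touch()

    os.utime(result, (1_000, 1_000))  # result written first
    os.utime(config, (2_000, 2_000))  # config edited afterwards
    stale_after_edit = result_is_stale(config, result)

    os.utime(result, (3_000, 3_000))  # result regenerated after the edit
    stale_after_rerun = result_is_stale(config, result)

print(stale_after_edit, stale_after_rerun)
```

Modification times can be misleading after a checkout or a copy, so a positive result is a hint to start clean rather than proof of a mismatch.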

rsarky commented on August 29, 2024

> Aah, I might know what went wrong:
>
> Initially, mbox-result gets created the first time 'analyse rep' is run. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.

You mean analyse upstream, right?

> So if you change the config but leave the old mbox-result in place, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
>
> The best thing is to start with a clean mbox-result after committing changes to the config.

We could add a flag to clean mbox-result instead of doing so manually.

rralf commented on August 29, 2024

>> Aah, I might know what went wrong:
>> Initially, mbox-result gets created the first time 'analyse rep' is run. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.
>
> You mean analyse upstream, right?

Yes, sorry, mixed it up.

>> So if you change the config but leave the old mbox-result in place, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
>> The best thing is to start with a clean mbox-result after committing changes to the config.
>
> We could add a flag to clean mbox-result instead of doing so manually.

Hmm. I'd rather abort in that case and ask the user for manual intervention. But wait, we actually already do: https://github.com/lfd/PaStA/blob/master/bin/pasta_analyse.py#L190

Did you see that warning during your analysis?
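
The "abort and ask for manual intervention" policy could look roughly like the following. This only illustrates the idea; it is not the code behind the link above, and `ensure_fresh_result` and `StaleResultError` are made-up names.

```python
import tempfile
from pathlib import Path


class StaleResultError(RuntimeError):
    """Raised when an existing result file blocks a fresh analysis."""


def ensure_fresh_result(result: Path) -> None:
    """Refuse to continue if result already exists; never delete it automatically."""
    if result.exists():
        raise StaleResultError(
            f"{result} already exists. Remove it manually before starting a new analysis."
        )


with tempfile.TemporaryDirectory() as tmp:
    result = Path(tmp) / "mbox-result"

    ensure_fresh_result(result)  # no file yet, so this passes silently
    result.write_text("placeholder\n")  # simulate the output of an earlier run

    try:
        ensure_fresh_result(result)
        aborted = False
    except StaleResultError as err:
        aborted = True
        print(f"aborting: {err}")
```

Compared with a flag that deletes the file automatically, aborting leaves the decision with the user, which matches the preference stated in the comment.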
