Comments (19)
Hmm, in that case I guess I should also set the date filter so that it roughly overlaps the commit range specified in UPSTREAM.
Oh, I notice you have already mentioned this in your comment. Never mind.
from pasta.
Generally, working on small datasets is an excellent idea for development. The problem is that we lack public inboxes for Jailhouse. I could give you my local mbox, but it's easier for you to choose Linux.
Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.
Then choose a time window of one month, and reduce the range of commits to roughly the same month. This makes things manageable on a desktop machine.
from pasta.
Choose Linux, edit the config in resources/linux/config and simply deactivate all huge lists. Just leave one small list activated. Let's say the alsa mailing list.
This does help. I was able to run pasta analyse rep successfully on the repository.
However, pasta analyse upstream tries to cache ~83k commits, which hangs my system :(
As an aside: since I already had a local clone of the Linux repository on my system, instead of running git submodule update linux I created a symlink called repo and pointed it to my local clone.
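The symlink approach described above can be sketched in Python, the language PaStA itself is written in. This is only a sketch under assumptions: the helper name link_local_clone and the directory in which the repo link is created are illustrative and not taken from PaStA.

```python
import os
from pathlib import Path


def link_local_clone(project_dir, local_clone, link_name="repo"):
    """Point <project_dir>/<link_name> at an existing local clone.

    Hypothetical helper: where the link has to live is an assumption,
    not something PaStA documents in this thread.
    """
    project_dir = Path(project_dir)
    local_clone = Path(local_clone).resolve()
    if not local_clone.is_dir():
        raise FileNotFoundError(f"no such clone: {local_clone}")

    link = project_dir / link_name
    if link.is_symlink():
        # Replace a stale link from an earlier attempt.
        link.unlink()
    elif link.exists():
        # Refuse to touch a real directory, e.g. an initialised submodule.
        raise FileExistsError(f"{link} exists and is not a symlink")

    os.symlink(local_clone, link, target_is_directory=True)
    return link
```

The helper refuses to overwrite a real directory, so an already initialised (or empty, uninitialised) submodule directory has to be removed by hand first.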
from pasta.
Ok, we can fix that.
Try in your config:
UPSTREAM = "v5.5-rc6..origin/master"
[...]
[mbox]
MINDATE = 2020-02-01
MAXDATE = 2020-03-01
and only activate the alsa-devel ML.
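To sanity-check such a window before starting a long analysis, the values can be read back with a few lines of Python. This is a sketch under assumptions: the excerpt is treated as ini-style text, the [PaStA] section name stands in for the part elided above, and read_window is a hypothetical helper. The real config format may differ.

```python
import configparser
from datetime import date

# Illustrative excerpt. The [PaStA] section name is an assumption;
# the values are the ones suggested in this thread.
EXAMPLE_CONFIG = """
[PaStA]
UPSTREAM = "v5.5-rc6..origin/master"

[mbox]
MINDATE = 2020-02-01
MAXDATE = 2020-03-01
"""


def read_window(text):
    """Return the commit range and the mail date window from config text."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # Values keep their surrounding quotes in ini-style parsing; strip them.
    upstream = parser["PaStA"]["UPSTREAM"].strip('"')
    start, _, end = upstream.partition("..")

    mindate = date.fromisoformat(parser["mbox"]["MINDATE"])
    maxdate = date.fromisoformat(parser["mbox"]["MAXDATE"])
    if maxdate <= mindate:
        raise ValueError("MAXDATE must be later than MINDATE")

    return {
        "upstream": (start, end),
        "mindate": mindate,
        "maxdate": maxdate,
        "days": (maxdate - mindate).days,
    }
```

Whether the mail window actually overlaps the commit range still has to be checked against the commit dates in the repository; the sketch only validates the window itself.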
Thanks
from pasta.
Yep, exactly, see the UPSTREAM range above. That roughly matches. Should be sufficient for playing around with PaStA.
from pasta.
Hmm, the given commit range and date filter did give me some output. But couldn't get any mappings between patches and commits. Need to play around with the above 2 parameters I guess.
Closing this thread.
from pasta.
Did not give you any mappings? That's strange. I had this configuration in a demo today, and we saw at least some mappings (I guess it was 70 or so). Did you recreate the caches?
$ ./pasta sync -clear all
$ ./pasta sync -mbox -create all
from pasta.
Ah, another tip:
$ cd resources
$ git checkout master
$ git submodule update
Maybe you're running on a too old state of the resources.
from pasta.
$ git submodule update
This was taking an immense amount of time for me over a choppy network, so I decided to use a local clone of the Linux repository instead. That shouldn't be an issue, I guess?
from pasta.
Did not give you any mappings? That's strange. I had this configuration in a demo today, and we saw at least some mappings (I guess it was 70 or so). Did you recreate the caches?
$ ./pasta sync -clear all
$ ./pasta sync -mbox -create all
This didn't help.
My output file basically shows all patch equivalence classes, followed by all the upstream commits.
There are no mappings.
The only reason for this that I can see is that I'm using a local clone instead of running git submodule update. Will try with that, I guess.
from pasta.
No, that should not be a problem.
so you did run, in this order:
- analyse rep
- rate
- analyse upstream
- rate
?
from pasta.
Submodules are only used to have everything tied together. You can use local checkouts as well.
from pasta.
No, that should not be a problem.
so you did run, in this order:
- analyse rep
- rate
- analyse upstream
- rate
?
Yup the same order.
I also increased the mailbox span to 3 months.
from pasta.
Okay that's really strange. Please find my config here: http://vmexit.de/~ralf/config
Try to delete all caches (e.g.: rm resources/linux/resources/*pkl; rm resources/linux/mbox-result), copy over the config and try:
$ pasta sync -mbox -create all
$ pasta analyse rep
$ pasta rate
$ pasta analyse upstream
$ pasta rate
Here on my machine, this gives me 84 mappings against upstream with default thresholds in that time window.
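The sequence above can also be scripted so that the steps always run in the same order and stop at the first failure. This is a minimal sketch: the commands are the ones listed in this thread, while PIPELINE and run_pipeline are hypothetical names, and the sketch assumes pasta is on the PATH and is started from the right working directory.

```python
import subprocess

# Order as given in this thread: rate runs after each analyse step.
PIPELINE = [
    ["pasta", "sync", "-mbox", "-create", "all"],
    ["pasta", "analyse", "rep"],
    ["pasta", "rate"],
    ["pasta", "analyse", "upstream"],
    ["pasta", "rate"],
]


def run_pipeline(steps=PIPELINE, runner=subprocess.run):
    """Run each step in order; check=True aborts on the first failure."""
    for cmd in steps:
        print("$", " ".join(cmd))
        runner(cmd, check=True)
```

Passing a different runner makes it possible to do a dry run that only records the commands instead of executing them.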
from pasta.
Will try this right now. Thanks for being so patient
from pasta.
Got some mappings now!! ✨
One thing I noticed: when I deleted the mbox-result file and it got recreated, it had significantly fewer lines.
This makes me wonder: when rerunning the analysis on a mailbox, do we append to the mbox-result file instead of rewriting it?
from pasta.
Aah, I might know what went wrong:
Initially, mbox-result gets created when 'analyse rep' is run for the first time. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.
So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
The best thing is to start with a clean mbox-result after committing changes to the config.
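Starting from a clean state can be done with a small helper instead of typing the rm commands by hand. This is a sketch, not part of PaStA: clear_analysis_state is a hypothetical name, and the paths simply mirror the rm commands suggested earlier in this thread (the *pkl files under resources/ and the mbox-result file).

```python
from pathlib import Path


def clear_analysis_state(project_dir, dry_run=True):
    """Remove pickled caches and mbox-result below project_dir.

    Returns the list of affected paths. With dry_run=True (the default)
    nothing is deleted, so the list can be reviewed first.
    """
    project_dir = Path(project_dir)

    # Same pattern as the rm command in the thread: resources/*pkl
    targets = sorted((project_dir / "resources").glob("*pkl"))

    result = project_dir / "mbox-result"
    if result.exists():
        targets.append(result)

    if not dry_run:
        for path in targets:
            path.unlink()
    return targets
```

For the setup discussed here, project_dir would be resources/linux. The default dry run is deliberate, since recreating the caches can take a long time.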
from pasta.
Aah, I might know what went wrong:
Initially, mbox-result gets created when 'analyse rep' is run for the first time. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.
You mean analyse upstream, right?
So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
The best thing is to start with a clean mbox-result after committing changes to the config.
We could add a flag to clean mbox-result instead of doing so manually.
from pasta.
Aah, I might know what went wrong:
Initially, mbox-result gets created when 'analyse rep' is run for the first time. This is the basis for all further analyses. Commits are added when 'analyse succ' is started.
You mean analyse upstream, right?
Yes, sorry, mixed it up.
So if you change the config but leave the old mbox-result, it may still contain mails that aren't reachable any longer. I would have to look at the code to see what happens in that case.
The best thing is to start with a clean mbox-result after committing changes to the config.
We could add a flag to clean mbox-result instead of doing so manually.
Hmm. I'd rather abort in that case and ask the user for manual intervention. But wait, we actually already do: https://github.com/lfd/PaStA/blob/master/bin/pasta_analyse.py#L190
Did you see that warning during your analysis?
from pasta.