rtcto / rtc2git
A tool made for migrating code from an existing IBM RTC SCM repository into a Git repository
Home Page: https://rtc.to
License: MIT License
The script is executing! Yay! Unfortunately, I'm seeing some errors with the lcsm commands, and the changes aren't being put in my git repo:
I've tried running with and without useProvidedHistory, and I get the same errors either way. I've also tried running with the stream's name instead of UUID. Any ideas on how to fix?
Because the parameter "self.workspace" is put straight into the command line without quoting, any command that uses the workspace name fails if the name contains spaces.
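A minimal sketch of one way to quote the name, assuming the command is assembled as a shell string. The helper `build_list_workspaces_command` and the example URL are hypothetical; the `list workspaces -n ... -r ...` form is the one quoted elsewhere on this page. Note that `shlex.quote` targets POSIX shells, so on Windows it may be safer to pass an argument list to `subprocess` instead of a single string.

```python
import shlex


def build_list_workspaces_command(workspace, repo_url, scmcommand="lscm"):
    """Build a shell command string with the workspace name and URL quoted."""
    return "{} list workspaces -n {} -r {}".format(
        scmcommand, shlex.quote(workspace), shlex.quote(repo_url))


# A workspace name containing spaces stays a single shell argument.
command = build_list_workspaces_command("Migration Workspace",
                                        "https://rtc.example.com/ccm")
```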
As a dedicated migration person, I want to start my migration from a specific (old) baseline in order to get the complete history.
To achieve that, the baseline of the components (of the workspace) needs to be replaced.
I often experience that lscm hangs whereas scm always works. It would be nice to add a configuration value for this.
Currently the master branch contains only the initial commit. This could be confusing for users who clone the migrated repository for the first time.
The following commands would make it possible to turn a specific branch into the new master:
git branch -m master initialCommit (rename master branch locally to something else)
git branch -m myStreamBranch master (rename branch where migration took place to master)
git push -f origin master (push the current master and override the existing one in the repo aka .git folder)
Here is what I know:
Stream A and Stream B have common history.
I migrated Stream A. The end result of File 1 in Git matched the end result in RTC, so all was well.
I migrated the portion of Stream B from the branching point forward. Then I rebased the history so that Stream A had a complete set of history. The end result of File 1 in Git did not match the end result in RTC.
It turns out that some of the Git commits were missed in the migration of Stream A prior to the branch point. I expected that any code changes that were not migrated as part of a Git commit (for example, if the script stopped because of a merge conflict and then was manually restarted) would be included in the very next Git commit. This would result in the code changes being associated with the wrong comment, but I was ok with that if it only happened occasionally. Upon investigation, that is not what actually happens. The code changes that are missed do not go into the very next Git commit--they go into the next Git commit that touches that file. In my case, the missed Git commit happened before the branching point. The catchup of the missed code change happened after the branching point. Unfortunately, in my case, this means that Stream B never got this code change--resulting in an incorrect file.
I'm going to think this through some more...would committing file changes whenever the script stops fix the problem?
After completing one stream, the script attempted to switch branches, but the following was displayed:
However, the script continued on with the lscm commands. Does this mean that the changes the script is accepting and committing are actually being committed into the previous branch?
Also, any ideas on what caused this problem and how to avoid it?
When migrating I experience that the script gives up on conflicting change sets if there is a need to accept more than two together.
If there is one change set followed by a "merge" change set everything works fine. But in the code base I am working on sometimes accepting e.g. five change sets together is required to avoid conflicts. I propose retryacceptincludingnextchangeset be made into a loop instead of just trying the next change set. This way it would continue discarding/accepting until it succeeds (or the author differs/comment does not contain the word merge).
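The proposed loop could be sketched roughly as follows. This is not the script's actual `retryacceptincludingnextchangeset`; `try_accept` and `may_combine` are hypothetical callbacks standing in for the discard/accept step and for the author/"merge" comment check, and `max_group` is an assumed safety limit.

```python
def accept_with_following(changes, start, try_accept, may_combine, max_group=10):
    """Try changes[start] alone, then grow the group one change set at a time.

    Returns the number of change sets accepted together, or 0 when giving up.
    """
    group = [changes[start]]
    next_index = start + 1
    while True:
        if try_accept(group):
            return len(group)
        if next_index >= len(changes) or len(group) >= max_group:
            return 0  # nothing left to add, or the group grew too large
        candidate = changes[next_index]
        if not may_combine(group[0], candidate):
            return 0  # e.g. different author and no "merge" in the comment
        group.append(candidate)
        next_index += 1


# Example: the conflict only disappears once three change sets are accepted together.
accepted = accept_with_following(
    ["cs1", "cs2", "cs3", "cs4"], 0,
    try_accept=lambda group: len(group) >= 3,
    may_combine=lambda first, candidate: True)
```

The caller can then skip `accepted` entries in its own loop, or fall back to the manual prompt when the result is 0.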
Also to support unattended migrations adding an option to automatically attempt multiple accepts would be nice.
Currently you always need to specify a git reponame (like migration.git) and a workspacename for the rtc workspace (Migration_Workspace).
To improve usability, there should be default values for these two.
The question should be "Do you want to recreate the workspace (name)? [Y/n]"
Default is Yes. N will reuse the workspace
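A small sketch of how the answer to such a [Y/n] question might be interpreted; `should_recreate_workspace` is a hypothetical helper, and treating every answer other than "n"/"no" as yes is an assumption.

```python
def should_recreate_workspace(answer):
    """Interpret the reply to a [Y/n] prompt; empty input means the default (yes)."""
    return answer.strip().lower() not in ("n", "no")
```

It could be combined with `input('Do you want to recreate the workspace (%s)? [Y/n] ' % name)`.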
Workspaces can be listed with the following command:
lscm list workspaces -n "NAME" -r "URL"
The encoding should be configurable. If nothing is configured, the default encoding should be used (encoding = None).
This issue is resolved when the encoding is configurable in the config and a wiki-entry has been made about how to configure the encoding properly with the magic.properties
See discussion #26 (comment)
So far I have always used an existing prepared workspace for the migration (in order to have less code to migrate, so that I can test the code faster and solve bugs like #7).
With this issue resolved, it should not matter whether you prepared a workspace yourself or let the migration create it. The migration should deal well with both situations.
Probably #6 needs to be solved first.
Currently branches get created and pushed.
However, they never get merged into the master branch.
This should be done at the end.
There should be two steps for the initial commit. One commit should be just the adding of gitignore.
Another initial commit should only happen if the load of the workspace creates files (e.g a git diff).
Sample-Comment: Im doing some "strange" changes
The git command will fail due to missing escaping of the double quotes in the comment.
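One way to sidestep the escaping altogether is to avoid the shell and hand git an argument list. The sketch below only builds that list; `build_commit_args` is a hypothetical helper, not a function from the script.

```python
def build_commit_args(comment):
    """Return the git commit command as an argument list (no shell quoting needed)."""
    return ["git", "commit", "-m", comment]


args = build_commit_args('Im doing some "strange" changes')
# The list can then be run with subprocess.check_call(args, cwd=repo_dir).
```

Because no shell parses the list, quotes in the comment reach git unchanged, and the same holds for line breaks.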
To prevent repository bloat, migrators want to ignore binary files (if so configured).
The configuration should at least list some file types to ignore, such as .zip or .jar.
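As an illustration, a configured list of file types could be turned into .gitignore patterns like this. The helper name and the accepted input forms are assumptions, not part of the tool.

```python
def gitignore_patterns(extensions):
    """Turn extensions such as '.zip' or 'jar' into .gitignore glob patterns."""
    patterns = []
    for ext in extensions:
        # Accept ".zip", "zip" and "*.zip" alike.
        ext = ext.strip().lstrip("*").lstrip(".")
        if ext:
            patterns.append("*." + ext)
    return patterns


patterns = gitignore_patterns([".zip", "jar", " *.exe "])
```

The resulting lines could be appended to the .gitignore file that is added in the initial commit.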
Some background ...
These are reasons why to avoid big repositories:
http://blogs.atlassian.com/2014/09/ci-git-repos/
These are tips how to handle big files (if not otherwise possible)
http://blogs.atlassian.com/2014/05/handle-big-repositories-git/
Hi, thank you for your guide.
I am very confused: why is there an option OldestStream?
I want to migrate ALL history from a stream to git. Think about this case: The project only has one stream and all changesets are going to this stream.
So, in my understanding, the steps should be:
create a new git repo
list all changesets
loop the changesets in an order from old to new
check out one changeset at a time
copy and commit to git repo
end loop
Thanks.
It would be really valuable if snapshots in RTC were represented by tags in the corresponding Git history.
You'd need to read the list of snapshots of the stream since the workspace doesn't show them, but it should be possible to match the baselines of the components to the snapshots from the stream.
At the moment you have to convert your .jazzignore files by hand.
The tool prompts you to do so at the end of the migration.
The first step would be to do this automatically.
And the final version should also track those files during the migration.
My branch jazzignore is intended to hold the implementation.
If we migrate an RTC project with WorkItems and source code using rtc2jira and rtc2git, we would like to maintain the existing connections between commits and WorkItems.
In RTC it was possible to assign work items to change sets, and we should keep these connections.
If a commit message in Git follows a certain pattern (NUMBER: WorkitemDescription), we can add a prefix so that the commit targets the new system.
Because we keep the same numbers in both systems, a prefix is sufficient and there is no need for a conversion table (old number to new number).
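A minimal sketch of the prefixing step, assuming the commit message starts with the work item number followed by a colon and a space. The prefix "PROJ-" and the helper name are hypothetical.

```python
import re

# Matches a leading "NUMBER: " as in "1234: WorkitemDescription".
WORKITEM_PATTERN = re.compile(r"^(\d+): ")


def add_workitem_prefix(message, prefix):
    """Prefix a leading work item number; leave other messages unchanged."""
    return WORKITEM_PATTERN.sub(
        lambda match: "{}{}: ".format(prefix, match.group(1)), message, count=1)


prefixed = add_workitem_prefix("1234: Fix login dialog", "PROJ-")
```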
At the moment, when something bad happens or you just want to begin a migration on an existing git repo, you need to change code in order to resume the migration (see the wiki entry).
I think the script should detect whether it should resume or not.
My script had been running for quite a while and had successfully created 3 git branches to match 3 of my 5 streams. As I was watching the script's output, I saw it try to get the list of changeentries, and it choked with an out of bounds type exception on splittledlines[somenumber]. (I didn't get a screenshot with the details). I restarted the script, and I noticed that it checked out the first branch in the stream list (not the branch it had been working on) and listed that it was getting 286 changes. I opened the workspace in RTC and confirmed that it was accepting changes in the very first stream listed in config.ini. It's as if the script did not resume where it had left off and instead started from the beginning.
The script then got stuck on a merge conflict, so I stopped it, resolved the conflict in RTC, and restarted the script. This time I saw something about getting changes 1/274. It's like the script resumed where it had left off in this case.
In what cases does the script start from the beginning and in what cases does the script resume where it left off?
It seems that >, -> and --> are all replaced with "to" when translating the commit message from RTC to Git. Is this really necessary? I don't see why > should be disallowed in a git commit message.
I'm going through and verifying the post-migration content matches what is in RTC. I have a file that I'm assuming was renamed at some point during its history as the file begins with a lowercase in the migrated Git repo but does not in the RTC repo. The content of the files is identical. I'm wondering if the Git commands used in the script did not take into account the file rename? Does that ring any bells with you?
The situation was as follows:
Analysis
We figured out that the workspace was reset to the oldest state.
Cause
The reason was that the only baseline on this stream didn't contain anything (Initial Baseline).
This means that setting the components to the baseline during the migration resets the whole workspace, and therefore resets the already migrated workspace.
The 2nd run of the migration triggered a reset of the components to the baseline.
Solution
Luckily we can detect this situation. If the part up to the baseline creation is already migrated, we don't have any changesets to accept.
Therefore, if we have 0 changesets to accept up to the branch point, we simply don't do anything there and continue with comparing the workspace with the stream.
@romixch : You can link your commit/fix to this issue
@romixch @ohumbel : I created this issue for documentation purpose
Every time we add a new option to the config file, we need to adjust our config files, even if we don't need this new option.
To improve that, we should define fallback values for most of the options (where it makes sense).
Current implementation:
scmcommand = generalsection['ScmCommand']
Implementation with fallback values:
scmcommand = generalsection.get('ScmCommand', "lscm")
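A self-contained sketch of the fallback behaviour with Python's configparser, assuming the section is called "General" (the section name and the example repository URL are assumptions):

```python
import configparser

config = configparser.ConfigParser()
# A config file that does not define ScmCommand at all.
config.read_string("[General]\nRepo = https://rtc.example.com/ccm\n")
generalsection = config["General"]

# The second positional argument of SectionProxy.get is the fallback value.
scmcommand = generalsection.get("ScmCommand", "lscm")
repo = generalsection.get("Repo", "")
```

With this, existing config files keep working when a new option is introduced.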
To be able to run the lscm/scm tools provided by RTC, you need to have the environment variable JAVA_HOME set and add the -vm parameter in scm.ini.
We should describe that in the wiki and link it from the readme.
@romixch Your part?
When collecting change sets to accept together in case of merge conflicts, only change sets from the same component should be accepted together.
Thanks to a Stack Overflow user I found out that there is a way to access all changesets of a component.
It's possible to use the command "lscm list changesets".
With this, the config flag UseProvidedHistory (https://github.com/WtfJoke/rtc2git/wiki/Getting-your-History-Files) is probably not necessary anymore and should therefore be replaced completely by the command above.
Fixing this issue will save users a lot of manual work, since they no longer have to provide the history files.
I have a workspace with multiple components having the same root directory names. So to avoid conflicts I need to specify -i when loading the workspace.
I propose this be made configurable (or maybe just always use -i when loading).
As a dedicated migration person, I want the changes to be accepted/committed in the same date/time order in which they were committed in RTC, in order to stay consistent with Git's internal dates.
Currently all changes of one component get accepted (ordered by date within the component). After that, the script moves on to the next component and repeats the process.
With this issue, the behaviour should change so that changes are accepted sorted by date, independent of their component.
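The intended ordering could be sketched as follows. `ChangeEntry` here is a hypothetical stand-in for the script's change entry objects, and the sample data is made up for illustration.

```python
from collections import namedtuple
from datetime import datetime

# Hypothetical stand-in for the script's change entry objects.
ChangeEntry = namedtuple("ChangeEntry", ["revision", "component", "date"])


def sort_across_components(entries_by_component):
    """Merge the per-component lists and order all entries by date."""
    merged = []
    for entries in entries_by_component.values():
        merged.extend(entries)
    return sorted(merged, key=lambda entry: entry.date)


entries = {
    "ComponentA": [ChangeEntry("a1", "ComponentA", datetime(2015, 3, 1, 9, 0)),
                   ChangeEntry("a2", "ComponentA", datetime(2015, 3, 3, 9, 0))],
    "ComponentB": [ChangeEntry("b1", "ComponentB", datetime(2015, 3, 2, 9, 0))],
}
ordered = [entry.revision for entry in sort_across_components(entries)]
```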
It would be nice if everything was logged to a file - not only the accept messages.
Migration on Windows will fail if you have too long paths. The problem is in handle_captitalization_filename_changes where an os.chdir is executed. This will fail if the path is too long:
FileNotFoundError: [WinError 206] The filename or extension is too long
Instead of doing an os.chdir maybe call git ls-files with the folder as argument. Git can be configured to work with long paths like this: git config --system core.longpaths true
(this works for me at least).
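A rough sketch of the suggested alternative: instead of changing into the (possibly very long) directory, the folder is passed to git as a pathspec and the command is run from the repository root. `build_ls_files_args` is a hypothetical helper; whether this avoids the Windows limit in every case is untested here.

```python
def build_ls_files_args(folder):
    """Return a git ls-files call restricted to one folder (relative to the repo root)."""
    # "--" separates options from the pathspec.
    return ["git", "ls-files", "--", folder]


args = build_ls_files_args("Architekturdokumentation/src/main/asciidoc")
# Possible use: subprocess.check_output(args, cwd=repo_root)
```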
A user reported that there are problems when comments contain line breaks (see #20 (comment)).
As a person who is migrating an RTC repo, I want line breaks in comments to have no negative effect on the migration, so that I can run my migration without any problems and the comments are transferred 1:1 to Git.
It would be nice to have an option to control what change sets the conflict resolver picks to accept.
Right now, only change sets belonging to the same author or change sets with "merge" in the comment text are accepted together. I have several examples where change sets from different authors need to be accepted together to resolve a conflict.
IMO, the resolver should continue accepting change sets together until there are no more change sets, and only then give up. It should succeed at some point.
I want to implement command line support in order to make it easier to start multiple instances with different configurations.
The command line should also support a resume function, so that it isn't necessary anymore to edit the script.
Some sample commands could be:
-c PATHTOCONFIG
-r resume
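A minimal argparse sketch of these two options. The long option names, the default "config.ini" and the help texts are assumptions.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line; argv=None falls back to sys.argv[1:]."""
    parser = argparse.ArgumentParser(description="Migrate RTC streams to Git")
    parser.add_argument("-c", "--config", default="config.ini",
                        help="path to the config file")
    parser.add_argument("-r", "--resume", action="store_true",
                        help="resume a previously started migration")
    return parser.parse_args(argv)


args = parse_args(["-c", "other_config.ini", "-r"])
```

Several instances could then be started side by side, each with its own `-c` value.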
For some unknown reason the RTC workspace contains conflicts and outgoing changes (despite the fact that the script doesn't check in anything in RTC) after the changes of the next stream get accepted.
This issue should try to find the cause and/or try to avoid such behaviour.
I have already tried to fix this issue with different approaches, but so far I haven't found a solution.
However, this might be an issue of RTC itself... Somehow it seems it can't handle that.
One approach which should be tested is to compare the workspace directly with the baseline of the components of the head stream. This would result in a longer migration (each stream would get pulled up to the highest stream instead of branching off while pulling up).
My situation is as follows:
huo@BISONWS1256:~/stuff/temp/rtc2gitMigration/Architecture$ git status
On branch BP_Architektur_Stream
Changes to be committed:
(use "git reset HEAD <file>..." to unstage)
modified: Architekturdokumentation/build.gradle
renamed: Architekturdokumentation/src/main/asciidoc/arc42-template.adoc -> Architekturdokumentation/src/main/asciidoc/BisonProcessArchitekturModernisiert.adoc
huo@BISONWS1256:~/stuff/temp/rtc2gitMigration/Architecture$ git status --porcelain
M Architekturdokumentation/build.gradle
R Architekturdokumentation/src/main/asciidoc/arc42-template.adoc -> Architekturdokumentation/src/main/asciidoc/BisonProcessArchitekturModernisiert.adoc
huo@BISONWS1256:~/stuff/temp/rtc2gitMigration/Architecture$ git status -z
M Architekturdokumentation/build.gradle^@R Architekturdokumentation/src/main/asciidoc/BisonProcessArchitekturModernisiert.adoc^@Architekturdokumentation/src/main/asciidoc/arc42-template.adoc^@huo@BISONWS1256:~/stuff/temp/rtc2gitMigration/Architecture$
Here the ^@ denotes the zero delimiter.
Note that after the 2nd one there is a capital A which is part of the filename.
This leads to the following traceback:
Traceback (most recent call last):
File "migration.py", line 86, in <module>
migrate()
File "migration.py", line 68, in migrate
rtc.acceptchangesintoworkspace(rtc.getchangeentriestoaccept(changeentries, history))
File "/home/huo/gitrepos/rtcTo/rtc2git/rtcFunctions.py", line 213, in acceptchangesintoworkspace
Commiter.addandcommit(changeEntry)
File "/home/huo/gitrepos/rtcTo/rtc2git/gitFunctions.py", line 52, in addandcommit
Commiter.handle_captitalization_filename_changes()
File "/home/huo/gitrepos/rtcTo/rtc2git/gitFunctions.py", line 74, in handle_captitalization_filename_changes
os.chdir(directoryofnewfile)
FileNotFoundError: [Errno 2] No such file or directory: '/home/huo/stuff/temp/rtc2gitMigration/Architecture/hitekturdokumentation/src/main/asciidoc'
huo@BISONWS1256:~/gitrepos/rtcTo/rtc2git$
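The traceback is consistent with the second path of the rename entry being treated as a separate status line, so that its first three characters ("Arc") were cut off as if they were a status code. A sketch of parsing `git status -z` that consumes the extra field for renames and copies (`parse_status_z` is a hypothetical helper, not the script's code, and the sample paths are shortened):

```python
def parse_status_z(output):
    """Parse `git status -z` output into (status, path, original_path) tuples."""
    fields = output.split("\0")
    entries = []
    index = 0
    while index < len(fields):
        field = fields[index]
        index += 1
        if len(field) < 4:
            continue  # e.g. the empty field after the final NUL
        status, path = field[:2], field[3:]
        original = None
        # Renames and copies are followed by one more field: the original path.
        if ("R" in status or "C" in status) and index < len(fields):
            original = fields[index]
            index += 1
        entries.append((status, path, original))
    return entries


sample = ("M  Architekturdokumentation/build.gradle\0"
          "R  docs/BisonProcessArchitekturModernisiert.adoc\0"
          "docs/arc42-template.adoc\0")
entries = parse_status_z(sample)
```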
retryacceptincludingnextchangeset was recently broken. Now it only accepts nextchangeentry. It should accept change and nextchangeentry together.
Hi there,
right now we are also trying to migrate our RTC Content to GIT.
Any plans for a license? We would like to use and extend your software for our one-time migration (contributing our changes back to you).
Cheers
Michael
Thanks to a Stack Overflow user I found out that there is a way to find the earliest baseline information of a component.
The following command can be used: lscm list baselines --components
That way it's probably possible to remove the "InitialBaselines" and the "Oldest Stream" options from the config.
This issue can be closed when either one or both of those options can be replaced, or when a comment is written explaining why this can't be accomplished.
Instead of migrating using baseline comparison I was wondering if it might be easier to just use change sets. So the migration process would be something like:
Would this not work?
One possible step during the migration of an SCM system is to have both systems running in parallel up to a certain point.
In that case I want to have an easy way to keep the git repository up to date. At the moment you can achieve the same by resuming the script, but it's a bit of an overhead.
So I would like to have a special function which only compares a workspace to the current stream and accepts the changes one by one.
When the migration is finished, the repository doesn't contain the most recent changes made on the stream after the baseline tagging (e.g. hotfixes, version fixes, fixes on certain releases).
At the end of the migration, I want each branch to be compared against its corresponding stream in order to get the latest changes which happened on this stream.
While the lscm/scm commands are properly quoted, the following problems still exist:
History file not found: ~/rtc2git/History/History_BT_Spider_'Cross main stream'.txt
Executed Command: "git show-ref --verify --quiet refs/heads/'Cross main stream'"
fatal: 'Cross main stream' is not a valid branch name.
Executed Command: "git push origin 'Cross main stream'_branchpoint"
fatal: 'Cross main stream_branchpoint' is not a valid branch name.
fatal: remote part of refspec is not a valid name in Cross main stream_branchpoint
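One possible direction is to derive a valid branch name from the stream name instead of quoting it. The sketch below covers only part of git's ref name rules (whitespace and the characters ~ ^ : ? * [ and backslash); `to_branch_name` and the underscore replacement are assumptions.

```python
import re


def to_branch_name(stream_name):
    """Replace whitespace and characters git forbids in ref names with '_'."""
    name = re.sub(r"[\s~^:?*\[\\]+", "_", stream_name.strip())
    # Ref names must not contain ".." or start/end with "/" or end with ".".
    name = re.sub(r"\.{2,}", ".", name)
    return name.strip("/.")


branch = to_branch_name("Cross main stream")
branchpoint = branch + "_branchpoint"
```

The same sanitized name could also be used for the history file name.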
I'm running migration.py, and I'm getting the following error:
'lcsm' is not recognized as an internal or external command, operable program or batch file.
Where can I get lcsm?
Occasionally, the script will report, "Press Enter to try to accept it with next changeset together, press any other key to skip this changeset and continue." When I press any other key, nothing happens. I have reproduced this several times. I have to stop the script and restart it.
I'm using Windows 7 and running the script in Command Prompt.
To keep things clean, the "Logs" folder should always be deleted when the script is started with initialize.