Saturday, May 30, 2015

Subversion (SVN) To Git Migration

Summary

Many blogs and pages explain how to use git-svn for this work. I found that many of them work for simple use cases but become cumbersome for complex, non-standard Subversion repository layouts. This document tries to provide comprehensive guidance to make your migration from Subversion to Git a stress-free process.

Repository Layout

Your repository layout has a deep impact on the migration effort. Most Subversion repositories fall into one of the three layout categories below.

Standard Layout

A Standard Layout repository strictly follows the trunk, branches and tags naming convention, with exact case-sensitive top level directory names. There should be no other top level directory that you care about besides those three.
WARNING: If you need to import any other top level directory as a branch, the repository is no longer a Standard Layout, and you need to follow the instructions in the Non-Standard Layout section.
WARNING: Some repositories use a title case convention, i.e. top level directories named Trunk, Branches and Tags. Those repositories are NOT Standard Layout because of the upper case letters. They are categorized as Semi-Standard Layout.
Note: For the purpose of this discussion, the naming convention of individual branches and tags is irrelevant, although strictly speaking one would expect a branch to be named major.minor (e.g. 1.2, 1.3) and a tag to be named major.minor.revision (e.g. 1.2.0, 1.3.4).
To migrate a Subversion repository with Standard Layout, follow the instructions in Using Stash Import Utility if you use Atlassian Stash as your Git repository manager, or use svn2git, which works with any Git repository.

Standard Layout repository example.

project-one
    +-- branches
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- tags
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- trunk
Note: A repository that doesn't have all three top level directories still conforms to Standard Layout as long as the top level directories it does have follow the naming convention. For example, the repository below has no tags directory but is still considered Standard Layout.
some-service
    +-- branches
    +-- trunk

Semi-Standard Layout

Semi-Standard Layout has a directory structure similar to Standard Layout but doesn't follow the strict top level directory naming convention. It has the same trunk, branches and tags concepts, but the top level directories are named differently: some use title case, some use the singular form, and some use completely different names. There should be no other top level directory that you care about besides those three.
WARNING: If you need to import any other top level directory as a branch, the repository is no longer a Semi-Standard Layout, and you need to follow the instructions in the Non-Standard Layout section.
To migrate a Subversion repository with Semi-Standard Layout, follow the instructions in Using Stash Import Utility if you use Atlassian Stash as your Git repository manager, or use svn2git, which works with any Git repository.

Below are some examples of Semi-Standard Layout.

Example 1: use title case Trunk, Branches and Tags.
project-two
    +-- Branches
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- Tags
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- Trunk

Example 2: use singular trunk, branch and tag.
repo1
    +-- branch
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- tag
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- trunk

Example 3: use main, features and releases instead of trunk, branches and tags.
STOP! Some repositories store binary artifacts under a Releases directory. That is fundamentally different from a tags directory, and release binaries should never be imported into Git.
repo2
    +-- features
            +-- branchname1
            +-- Branchname2
            +-- ...
    +-- releases
            +-- tagName1
            +-- tagname2
            +-- ...
    +-- main
Note: A repository that doesn't have all three top level directories still conforms to Semi-Standard Layout as long as the rest of its directory structure does. For example, the repository below has no tags directory but is still considered Semi-Standard Layout.
some-service
    +-- features
    +-- main

Non-Standard Layout

Non-Standard Layout refers to repositories that are in neither Standard nor Semi-Standard Layout. Their branches all reside directly under the root of the project directory. For example:

project-three
    +-- Current
    +-- DEV
    +-- QA
    +-- UAT
    +-- Release (binary)
You'll need to use svn2git to migrate such repositories to Git.
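
As a rough aid for the categories above, the sketch below checks whether a list of top level directory names matches the Standard Layout exactly. The function name classify_layout is hypothetical, and it deliberately does not try to tell Semi-Standard from other layouts, because that depends on what each directory means in your project.

```shell
#!/bin/sh
# classify_layout NAME...  (hypothetical helper, not part of any tool)
# Prints "standard" when every name is exactly trunk, branches or tags;
# otherwise prints "not standard: review manually".
classify_layout() {
    if [ "$#" -eq 0 ]; then
        echo "not standard: review manually"
        return 0
    fi
    for name in "$@"; do
        name=${name%/}    # `svn ls` prints directories with a trailing slash
        case "$name" in
            trunk|branches|tags) ;;    # exact, case-sensitive match
            *) echo "not standard: review manually"; return 0 ;;
        esac
    done
    echo "standard"
}

classify_layout trunk/ branches/ tags/
classify_layout Trunk/ Branches/ Tags/
```

With a real repository the names could come from svn ls, for example classify_layout $(svn ls https://svn.somecompany.com/svn/project-one/). This simple form assumes that no directory name contains spaces.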

Using Stash Import Utility

Stash has a built-in utility to import a Subversion repository, available as part of the repository creation wizard or through the settings page of the repository. It can import Subversion repositories with Standard and Semi-Standard Layout.
STOP! if you haven't read the Repository Layout section above.
STOP! if you need to resync from Subversion to Git later. Consider using svn2git instead.
1. Create a new repository
  • If you already have a repository created and it is still empty, move on to the next step.
  • If you need to reuse a repository that is not empty, you cannot use this utility. Use svn2git instead.
Follow the instructions below to create a new Git repository in Stash.
  1. Go to your Stash web interface and log in.
  2. Select the Stash project that is appropriate for the repository you are creating.
  3. Click the "Create Repository" button next to the Stash project name in the upper-left corner of the project page. If you don't see the button and you are already signed in to Stash, you need to request permission.
  4. Enter the name of the repository and click the "Create repository" button.
You should get a confirmation page telling you that an empty repository has been created. At the bottom of the page there is a button labeled "Import from SVN". Click it to start importing, then move on to step 3.

2. Reusing an empty repository
If you have an empty repository that you want to use as the target of the migration, go to the project and click the Settings tab. If you don't see the Settings tab, you need to ask for permission. On the settings page, click the "Import from SVN" link at the end of the left menu to start importing.

3. Import repository
  • Enter the URL of the Subversion repository that you are importing.
  • For a Standard Layout project, you can leave the Trunk, Branches and Tags fields at their defaults. Otherwise, enter the proper values for your repository. You can leave the Branches and Tags fields at their defaults if you don't have branches or tags.
  • Enter your username and password for the SVN repository.
The table below lists three examples:
Layout         Trunk    Branches    Tags    Comments
Standard       trunk    branches/*  tags/*  Leave the default values in place.
Semi-Standard  Trunk    Branches/*  Tags/*  Enter the values for Trunk, Branches and Tags.
Semi-Standard  Current  branches/*  tags/*  For a single branch import, enter the value for Trunk and leave Branches and Tags as default.
Click the "Import" button. Stash first gathers the author list, which may take a few seconds to a few minutes, and then returns with this message:
Default authors mapping created. Review it, adjust if necessary and proceed with the import.

4. Review author mapping
Before Stash actually imports anything, it gathers author information and asks you to review the author mapping. Click the "Continue" button to open a popup window where you can review the author mapping. Make any necessary changes and click the "Continue" button on the popup.
Stash will start importing the repository and update the progress on the page. You can safely close the browser now. You will receive an email telling you that the import is in progress and another email when it is done.

Congratulations! Now you have completed the migration from Subversion to Git.

Using svn2git

svn2git is a great tool for migrating from Subversion to Git. It uses git-svn under the hood but does a much better, cleaner job. svn2git is the better choice unless you are a guru of both Subversion and Git and a master of git-svn.

Install svn2git

Installation commands for various platforms are listed below, but your mileage may vary. If you get it running on a platform not listed here, please update this wiki.
Mac/Linux
 sudo gem install svn2git
Windows (Command Prompt)
 gem install svn2git
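
Since svn2git is installed as a Ruby gem and drives git-svn, it can help to confirm the tools are on the PATH before starting a long conversion. The sketch below is one way to do that on Mac/Linux. The helper name missing_tools is hypothetical, and the list of commands to check is an assumption based on the description above.

```shell
#!/bin/sh
# missing_tools CMD...  (hypothetical helper)
# Prints the name of each command that cannot be found on PATH.
missing_tools() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || echo "$cmd"
    done
}

# Nothing is printed when everything is installed.
missing_tools git ruby gem svn2git

# git-svn is a git subcommand, not a separate executable name,
# so it is checked by asking git to run it.
git svn --version >/dev/null 2>&1 || echo "git-svn"
```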

Run svn2git

Always run svn2git in an empty directory. After svn2git finishes, the directory becomes the root of the converted Git repository that mirrors the Subversion repository.
Note: If you ran svn2git or git-svn with wrong parameters and want to start over, you must delete the entire directory and recreate it first; otherwise you'll get confusing errors.
WARNING: svn2git can take a very long time on a large repository. Make sure to run it on a machine that is close to the Subversion repository server (low network latency).
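
The start-over step described in the note can be wrapped in a small function so that the delete and the recreate always happen together. This is only a sketch: reset_workdir is a hypothetical name, and because it deletes everything under the given directory, it refuses a few obviously dangerous arguments.

```shell
#!/bin/sh
# reset_workdir DIR  (hypothetical helper)
# Deletes DIR with everything in it, then recreates it empty.
reset_workdir() {
    dir=$1
    case "$dir" in
        ""|/|.|..)
            echo "reset_workdir: refusing to remove '$dir'" >&2
            return 1
            ;;
    esac
    rm -rf -- "$dir" && mkdir -p -- "$dir"
}

# Example (run from the parent directory of the failed conversion):
#   reset_workdir project-one && cd project-one
```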


Standard Layout Repository

To convert a Standard Layout repository, all you need to do is run svn2git with the URL of the Subversion repository, e.g.
mkdir project-one
cd project-one
svn2git https://svn.somecompany.com/svn/project-one/

Semi-Standard Layout Repository

A Semi-Standard Layout repository has the same directory structure as a Standard Layout repository. The difference is in the directory names for trunk, branches and tags, which you specify with the corresponding option. If you have no branches or tags, you can omit that option.

Example 1
mkdir project-two
cd project-two
svn2git https://svn.somecompany.com/svn/project-two/ --trunk Trunk --branches Branches --tags Tags

Example 2
mkdir project-simple
cd project-simple
svn2git https://svn.somecompany.com/svn/project-simple/ --trunk Current
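
The same pattern applies to the repo2 and some-service layouts shown earlier (main, features, releases). The sketch below only prints the command line so it can be reviewed before running. The helper name svn2git_cmd and the two URLs are assumptions made for illustration. Where the examples above simply omit an option, this sketch uses svn2git's --nobranches and --notags options to state explicitly that the directory does not exist.

```shell
#!/bin/sh
# svn2git_cmd URL TRUNK [BRANCHES] [TAGS]  (hypothetical helper)
# Prints the svn2git command line for a Semi-Standard Layout; it does not run it.
# A missing BRANCHES or TAGS argument becomes --nobranches or --notags.
svn2git_cmd() {
    url=$1
    trunk=$2
    branches=${3:-}
    tags=${4:-}
    cmd="svn2git $url --trunk $trunk"
    if [ -n "$branches" ]; then
        cmd="$cmd --branches $branches"
    else
        cmd="$cmd --nobranches"
    fi
    if [ -n "$tags" ]; then
        cmd="$cmd --tags $tags"
    else
        cmd="$cmd --notags"
    fi
    echo "$cmd"
}

# Placeholder URLs following the pattern used in this post.
svn2git_cmd https://svn.somecompany.com/svn/repo2/ main features releases
svn2git_cmd https://svn.somecompany.com/svn/some-service/ main features
```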

Non-Standard Layout Repository

For Non-Standard Layout repositories, the difficult part is that the branches reside in top level directories. As a first step, ignore them and set up the conversion with the parameters that best match a Standard or Semi-Standard Layout.

Using project-three as an example,
project-three
    +-- Current
    +-- DEV
    +-- QA
    +-- UAT
    +-- Release (binary)

We will take DEV as trunk and Current, UAT and QA as branches. Let's start by ignoring the top level branches and setting up DEV first.
 
mkdir project-three
cd project-three
git svn init --trunk DEV --prefix svn/ https://svn.somecompany.com/svn/project-three/

Wait for it to complete, then edit the .git/config file to add three lines to the svn-remote section for Current, UAT and QA.
WARNING: Make sure all branches are defined and there are no typos in the names; they are case-sensitive. If you miss one and want to include it later, you'll need to clean up the directory and start from scratch.
[svn-remote "svn"]
 url = https://svn.somecompany.com/svn/project-three
 fetch = DEV:refs/remotes/svn/trunk
 fetch = Current:refs/remotes/svn/Current
 fetch = UAT:refs/remotes/svn/UAT
 fetch = QA:refs/remotes/svn/QA
Note: You can also rename a branch in this process. For example, to rename the Current branch in Subversion to Test in Git, use
 fetch = Current:refs/remotes/svn/Test
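
Because a typo in these lines means starting over, it may be safer to generate the values than to type them. The sketch below builds the right-hand side of a fetch line from a Subversion directory name and an optional new Git name. The helper name fetch_refspec is hypothetical, and the svn/ prefix matches the --prefix svn/ used in the init command above.

```shell
#!/bin/sh
# fetch_refspec SVN_DIR [GIT_NAME]  (hypothetical helper)
# Prints the value of a "fetch =" line that maps a top level Subversion
# directory to a remote ref under the svn/ prefix.
# GIT_NAME defaults to SVN_DIR when no rename is wanted.
fetch_refspec() {
    svn_dir=$1
    git_name=${2:-$1}
    echo "$svn_dir:refs/remotes/svn/$git_name"
}

fetch_refspec UAT             # prints UAT:refs/remotes/svn/UAT
fetch_refspec Current Test    # prints Current:refs/remotes/svn/Test
```

Instead of editing .git/config by hand, the value can be appended with git config --add svn-remote.svn.fetch "$(fetch_refspec UAT)", run inside the project-three directory.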
Then run fetch and rebase. The fetch operation may take a long time to complete if you have a big repository.
git svn fetch
svn2git --rebase

Push to Git Repository Server

If you don't already have a remote Git repository to push to, create one first. Then push all branches to the remote repository.
Find the URL of the remote Git repository, which is the same URL you would use to clone it. Run the commands below in the directory of the local Git repository you just imported from Subversion, replacing <URL> with the URL of the remote Git repository.

git remote add origin <URL>
git push origin --all
# --all pushes branches only, so push the converted tags separately
git push origin --tags

Congratulations! You have now completed the migration from Subversion to Git. It is safe to delete your local copy of the repository, but keep it if you need to resync from Subversion to Git again.

Resync From Subversion to Git

You can start the migration while still allowing later changes in Subversion to be brought into Git. This is useful when you are moving an actively developed repository. Often you need to prepare the build scripts to use the new Git repository while development and build activity continue in the old Subversion repository.
WARNING: Two-way sync is neither tested nor recommended. We discourage committing to both the old Subversion repository and the new Git repository and then attempting to keep them in sync. It will be difficult and risky, if it is possible at all.
Go into the local Git repository that you created with svn2git and run the commands below.

svn2git --rebase
git push origin --all
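
If the resync is repeated often, the commands can be wrapped so that the push is skipped whenever the rebase fails. The sketch below is one possible shape, not a tested production script. The names run and resync and the DRY_RUN variable are hypothetical. The extra git push origin --tags is an addition to the commands above, included because --all pushes branches only.

```shell
#!/bin/sh
# run CMD...  (hypothetical helper)
# Executes the command, or only prints it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# resync REPO_DIR  (hypothetical helper)
# Brings new Subversion commits into the local repository, then pushes
# branches and tags to origin. Stops at the first step that fails.
resync() {
    (
        cd "$1" || exit 1
        run svn2git --rebase || exit 1
        run git push origin --all || exit 1
        run git push origin --tags
    )
}

# Example: preview the commands without touching anything.
#   DRY_RUN=1; resync /path/to/project-one
```

The subshell keeps the cd from changing the caller's working directory.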

Q & A

Q: Can svn2git support author mapping?
A: Yes, see https://github.com/nirvdrum/svn2git#authors
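
As a starting point for the authors file, the sketch below turns the output of svn log --quiet into one mapping line per Subversion user, in the "svnuser = Name <email>" form that svn2git's --authors option reads. The helper name authors_template and the example.com addresses are placeholders; the real names and emails have to be filled in by hand.

```shell
#!/bin/sh
# authors_template  (hypothetical helper)
# Reads `svn log --quiet` output on stdin and prints one
# "user = user <user@example.com>" line per distinct author.
authors_template() {
    # Revision lines look like: r123 | jdoe | 2015-05-30 12:00:00 -0400 (...)
    awk -F' [|] ' '/^r[0-9]+ [|] / { print $2 }' | sort -u |
    while IFS= read -r user; do
        printf '%s = %s <%s@example.com>\n' "$user" "$user" "$user"
    done
}

# Example:
#   svn log --quiet https://svn.somecompany.com/svn/project-one/ | authors_template > authors.txt
#   svn2git https://svn.somecompany.com/svn/project-one/ --authors authors.txt
```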