Parallel Principle of Git Workflow

Submitted by Calvin on Mon, 11/05/2018 - 00:09
WF1

Background

I was working for a small web development company that was facing challenges juggling a lot of projects and developers of various skill levels. Platforms like Pantheon were starting to offer hosting options that made it "easy" to manage and test code before releasing it to production. Knowledge of git and version control was low. There was a combination of solo developing and project teams working together.

I encountered problems with code quality and deployment across several projects. I wanted to offer a picture of how this could be improved through a better development workflow and to eliminate uncertainty.

Many things came from this, including the consolidation of all project code to a central repository, and the use of virtualization tools to eliminate local setup issues.

A particular challenge this raised was confusion around the GIT workflow I suggested. Many developers only had limited experience with GIT. FTP was still in use in many cases. The only known workflow up until then was a serial one, as I describe below. I wrote this post, in an effort to clarify the differences.

Objective

The goal of our git branching workflow is a robust process that addresses 3 criteria:

  • Scalability: for project and team size.
  • Clear State: Anyone can look at the state of things and pick up where someone else left off. New code can get pushed through without being impeded by other work in progress.
  • Consistency: Same general workflow regardless of host or test environments.

Serial vs Parallel

A common git workflow is a serial process across 3 envrionments.

develop -> test -> live

In cases like Pantheon's standard workflow, there is only one branch, master which equals the 'dev' environment. Git tags are used to pin a commit in the master branch to the test and live environments while the dev environment is really the last commit of master.

Master branch looks like this (Letters represent commits):

A - B-live - C - D - E-test - F - G-HEAD

If everything in E gets sign off from the client the live tag moves to that E commit and the process starts over. A new test tag can be added to HEAD when the latest code is ready for testing.

A - B - C - D - E-live - F - G-HEAD-test

If I make a new feature by branching from master which is our current dev, that will include ALL the commits that exist in master, including any new code from other developers that has been created since the tag to live.

ALL that code has not been released to 'live'. Most of it untested or unapproved.

Consider again:

A - B-live - C - D - E-test - F - G-HEAD

What happens if we need to get commit G (a hot fix) to live but NOT include commits C, D, E, and F? You can't just tag G as live because commit G CONTAINS all those other commits. Commits that are not, and should not yet be live, if ever!

If your Git Fu is strong you can make a new branch from B-live and cherry pick the commit you want from G. But then how do you inject it into the commit history after the current live, but before all the untested, unapproved, or experimental stuff? Yes it can be done, but it's tedious and complex and gets worse as you scale to multiple devs or many active feature developments.

It's DIRTY!

The process is not robust because it requires specialized manual intervention to deal with a common issue.

In a parallel workflow, your branches are clean. Master will only EVER contain code that has been QA tested and Signed Off on by the product owner.

Our testing branches are as clean as possible so our tests are as close to the live experience as possible.

The test environments have separate branches so they can be strategically destroyed and recreated from the clean master when needed. (i.e stage after a live release, or develop whenever it's too dirty.)

We ALWAYS branch new feature branches from the CLEANEST and most STABLE branch in order to not accidentally poison the repo by including unapproved code as described in the serial workflow.

A parallel setup looks like this:

MASTER = LIVE

Each environment has it's own branch, cut from a clean master.

*--*--------  master
 \  \________  stage
   \_________  develop
  • master STABLE - Live code! This branch only ever receives code that is tested, singed off and clean.
  • stage MOSTLY STABLE - We believe this code is ready! This branch only receives features that have passed technical Qualtiy Assurance testing in the develop environment and are ready for review by the product owner.
  • develop UNSTABLE - Where we make sure the code is ready. This branch receives feature branches that may never progress beyond the develop environment. They may go back for further refinement or be abandoned.

Parallel flow looks like this (Feature branch from the cleanest env):

Make a clean develop branch at the beginning of project. master -> develop

Make a clean staging branch at the beginning of release cycle. master -> staging-x.y.z

Make feature branches for development tasks at the beginning of each task.

*---*---*---------------  master
 \   \   \
  \   \   *---*---*       feature/A--short-description
   \   \       \   \
    \   *-------\---*     stage
     \           \
      *-----------*----X  develop

Feature Branch Image

  • Feature: Merge to develop for testing and internal QA.
  • If feature passes testing on develop, merge the passed feature branch to stage for Client Approval and Sign Off.

Deploy a release to master (live), of a completed set of features in the staging branch.

*---*---*-------------------*---  master
 \   \   \                 /
  \   \   *---*---*       /       feature/A--short-description
   \   \       \   \     /
    \   *-------\---*---*         stage
     \           \
      *-----------*------------X  develop

[1] [2]

Release Image (with additional hot fix branch)

Doing it this way we can easily take any feature at any time to any environment if it passes approval. In the case of a hot fix, where it needs to go out before a larger staged release, it can be tested on develop and signed off on stage. It can then be tagged with its own release version (x.y.z where z increments for hot fixes), and deployed live to master because it will only include what's already in master plus ONLY the changes for the hot fix.

If a feature is branched from develop or stage, we're back in the same situation we had with the serial workflow. We have a bunch of code that is not related to our task and is not on master, contaminating our feature branch.

In those contaminated situations we have a breakdown in the process where it will take complex manual work to identify and extract the code we need from the code we don't want. It's a very inefficient break from standard operating procedure.

Does This Meet Our Objectives?

Scaling for Release and Team Size

First, let me be clear that I think the foundation of this, or any, git workflow is not about the size of the team or the project. It's first for quality control. So the process of moving chunks of code through standard stages of testing should not be the question. All code should be tested and then presented to the client for their own review and approval.

Our issues of scale are about how we provide the best, most consistent, Client and Developer experience, despite the scope of a release cycle or the size of a team.

Release Size

The freedom to move any feature or hotfix through the "state of code" at any time, improves the developer experience. Especially for a lead in charge of managing code from multiple sources simultaneously, as well as in slower moving maintenance projects where things may start and stop.

You can have a team working on a bundle of features for a release, or a single developer rushing a hotfix through to production fast. In fact, you can do both both simultaneously!

If a project has several developers working on multiple distinct sets of features at once, you could add additional types of branches to maintain those groups of code as "epics". Just keep the parallel principle in mind and be conscious of contamination.

An epic branches from master. Features branch from and PR against the epic. The epic branch presumably has closely related code from features that depend on each other. It can be tested in a separate epic environment, or the epic, once ready, can be treated as a "super feature", and moved through the test environments as a feature. Doing this cleanly requires careful consideration by someone with technical oversight on the project.

Clear State and Consistency

Scaling for Team Size is really about the Clear State and Consistency Objectives.

The complaint I hear here is, "this is too complex for one developer on a small project". I that case, I remind you, the purpose of these foundational steps is not about how many people are involved, it's about quality control. A solo developer requires the same quality control steps as a team, whether other team members perform QA tasks, or they do it themselves. It still needs to be done to make sure the new code plays nice with existing code, and we can deliver something of quality to the client.

[3] - A note on simplifying and automation

Once we get to a long term maintenance situation of a CMS that's actually complex software, it's valuable to know that anyone on the team can come to the code base at any time and know at a glance what's happening and how to complete their own tasks efficiently. They shouldn't be wondering what to do, what branch to create a feature from, where to merge, how to test, etc.

If the last person to work on this project is not around, can't remember, and it's (probably) not documented, someone has to analyze environments, tags and commit logs to figure out where it's safe to branch from, or how code is deployed to any given environment. You're increasing uncertainty and inefficiency in the process.

Pre-Release

What if your project is not live yet? There is no "live" branch to branch features from!

There are 2 ways you can go here. Use your test environments the way you normally would. If you have no "live", stage becomes your cleanest and most stable branch. At least we know it's QA tested and may or may not be signed off. [1] This may be ok because the whole thing is a work in progress with lots of time to find bugs.

If your project is managed in sprints where your goal is to get X set of features completed and signed off, at the end of each sprint you can commit the signed off work to 'master', even though it has no live environment. Master then, is still the source of truth for the next set of features in the next sprint.

Or, you could make it a part of your process to just merge signed off features into master, similar to that "deploy every feature immediately" scenario [2].

Conclusion

The concept I hope I've illustrated, is that we can save a lot of the hassle of trying to react to mistakes, and greatly improve Code Quality, Client Experience and Developer Experience by sticking to a parallel git workflow that effectively prevents unapproved code from entering a clean production branch.

It allows us to be flexible and adaptable based on project needs, but still follow the same core conventions. Developers will not have any confusion about how one project is run, vs another, nor about the state of any work in progress.

This leads to much less friction in day to day code and project management, even if a project stalls out for a while. It's easier to automate because there is less need of complex analysis for code management. We can use automation tools to test and deploy our key branches. [3]

[1] If we were to be VERY strict about contamination, we would not merge staging into master, we would create a release branch from master and only merge approved features into that, avoiding any possibility of including things from stage that may have failed Client Sign Off and gone back for more development work. I make the assumption that we have a target set of features to get passed on stage and released together.

[2] It's possible to make every feature it's own release. You could then deploy each feature on demand as soon as it gets sign off. If your project's development cycle is that hyper active this may be a good option. If your project has a more moderate tempo of development, then a grouped release saves the overhead of multiple live deployments, with the addition of hotfixes for rapid one off deployment.

[3] On a team where you have devs "soloing", tasks through to production and no additional QA team members to test features you might want to eliminate the step of testing in the develop environment. If you trust the solo developer (and the task description) to judge that the feature they create meets the criteria, developer could do testing on their local as develop environment. Automated testing is always helpful, but would be extra helpful in these cases, at least for key site functionality. You'll save the overhead of pushing to develop and waiting for QA to sign off before moving to stage. The developer can PR directly to staging.