Planning System Cutovers
By Dave Nielsen
The team has worked hard and built a system that satisfies all the business requirements, tested it, verified that it meets all the quality standards set for it, and delivered it on time. Now you can sit back and congratulate yourself on what a great job you did managing this project. Well, not quite. Your job isn’t done until the new system is in production and responsibility for support has been turned over to operations. To get there you need to plan the perfect cutover.
Is the System Ready?
Before planning the cutover, there are some questions that need to be answered: “How will the system perform in the production environment?”, “Do we have all the data we need for production?”, and “Can the new system support all the users who need to use it?” Let’s tackle the questions in order.
The answer to the first question, obviously, should be “it will perform just as well as it did in the test environment.” You can answer the question this way because a complete round of testing was performed in a test environment which duplicates the production environment in every way except the presence of the users. This environment is sometimes called the staging environment. The server which runs the system should be a clone of the production server. If the system runs in a distributed environment, both host and client should be cloned on a network that duplicates the production network. Frequently our new or updated systems must run in an environment with a standard operating system and additional “off the shelf” (OTS) software. The operating system and OTS software in the staging environment should be exact duplicates of what will run in the production environment, down to the version numbers. Ensure that any patches that will be applied to the production system are also applied to the staging environment.
If your new system requires an updated operating system or updated ancillary software, ensure that you install the same operating system, ancillary software, and patches in the production environment as in the staging environment. Testing your new or updated system on the same hardware and software it will run on in production is a critical part of testing, because software frequently behaves differently depending on the hardware/OS combination it runs on. The system may work on the new combination yet behave differently in some areas, so the full suite of tests, not just a subset, should be run in the staging environment.
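One way to keep the two environments honest is to compare their software inventories before testing starts. The sketch below is only an illustration: the component names and version strings are made up, and how you collect the inventories will depend on your own tooling.

```python
def find_environment_drift(staging, production):
    """Return {component: (staging_version, production_version)} for every mismatch.

    A component missing from one environment is reported with None on that side.
    """
    drift = {}
    for component in sorted(set(staging) | set(production)):
        staging_version = staging.get(component)
        production_version = production.get(component)
        if staging_version != production_version:
            drift[component] = (staging_version, production_version)
    return drift


# Hypothetical inventories; real ones would come from your own tooling.
staging = {"os": "10.4", "database": "12.1", "app_server": "8.5", "os_patch": "KB-17"}
production = {"os": "10.4", "database": "12.0", "app_server": "8.5"}

drift = find_environment_drift(staging, production)
for component, (staging_version, production_version) in drift.items():
    print(f"{component}: staging={staging_version} production={production_version}")
```

An empty result means the inventories match; anything else is a difference to resolve before you trust the staging test results.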
Most systems nowadays require some information to be stored and retrieved. This may be minimal, such as a set of userids and passwords for user login, or it may be extensive, requiring a large relational database. Data that must be propagated from the previous production environment must be identified and a plan crafted for capturing it and installing it in the new system during cutover. In the meantime, testing in the staging environment should be done with a data set that mimics production data as closely as possible in volume and distribution. In systems with a large relational database, this is usually accomplished by taking a snapshot of the production data, translating it to the new data format, and loading it into the staging environment. This translation process is a key to the cutover. During the cutover, the process used to translate and load data into the staging environment must be repeated to port the data to the new production environment. Not only does the process need to be repeatable, it needs to be streamlined so that download, translation, and load occur quickly.
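Because the same download, translate, and load sequence is run for staging and again at cutover, it helps to script it and time each step. The following is a minimal sketch under stated assumptions: the three step functions and the record layout are hypothetical stand-ins for whatever your real extract and load tools do.

```python
import time


def run_timed_steps(steps, initial=None):
    """Run steps in order, passing each result to the next; return (result, timings)."""
    timings = {}
    data = initial
    for name, func in steps:
        start = time.perf_counter()
        data = func(data)
        timings[name] = time.perf_counter() - start
    return data, timings


# Hypothetical stand-ins for the real download, translation, and load tools.
def download(_):
    return [{"USER_ID": "1", "NAME": " Ada "}, {"USER_ID": "2", "NAME": "Grace"}]


def translate(rows):
    # Convert old-format records to the (assumed) new format.
    return [{"user_id": int(r["USER_ID"]), "name": r["NAME"].strip()} for r in rows]


def load(rows):
    # Stand-in for inserting rows into the new database.
    return {r["user_id"]: r for r in rows}


loaded, timings = run_timed_steps(
    [("download", download), ("translate", translate), ("load", load)]
)
for step, seconds in timings.items():
    print(f"{step}: {seconds:.4f} s")
```

The timings recorded during the staging run give you a first estimate of how long the data freeze will last at cutover.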
Your project may or may not have delivered performance improvements. If it did, these improvements need to be verified in the staging environment. If no performance improvements were required, the new system should perform at least as well as the old one. Testing should include measuring the performance of frequently used functions under load. For example, if the maximum number of users the system must support is 1,000, how quickly is the 1,000th user logged in? Benchmarks should be specified for performance, load, and stress testing, and testing against these benchmarks should be performed in the staging environment. Only when all the benchmarks have been met or exceeded are you ready for cutover.
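The “all benchmarks met or exceeded” rule is easy to express as a simple gate. In the sketch below the benchmark names, targets, and measurements are invented for illustration; your own benchmarks would come from your performance, load, and stress test plans.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    name: str
    target_seconds: float    # slowest acceptable response time
    measured_seconds: float  # what was observed in the staging environment

    def met(self) -> bool:
        return self.measured_seconds <= self.target_seconds


def ready_for_cutover(benchmarks):
    """Return (ready, names_of_failed_benchmarks); ready only if nothing failed."""
    failures = [b.name for b in benchmarks if not b.met()]
    return (not failures, failures)


# Hypothetical figures, for illustration only.
benchmarks = [
    Benchmark("login of 1,000th user", target_seconds=3.0, measured_seconds=2.4),
    Benchmark("report generation under load", target_seconds=10.0, measured_seconds=12.5),
]

ready, failures = ready_for_cutover(benchmarks)
print("Ready for cutover" if ready else f"Not ready, failed: {failures}")
```

A single missed benchmark keeps the gate closed, which matches the rule that every benchmark must be met before cutover.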
Are the Users Ready?
The system may be ready for the users, but are the users ready for the system? New systems generally deliver something the business community needs: new functionality to meet new market demands, reduced effort, performance improvements, and so on. Users must be readied so that they can take advantage of the new system as soon as it is activated. New functionality generally means training the user community, but beyond this, users must be told when the new system will be implemented. Cutting over without notifying the user community, even a trained user community, will result in a deluge of calls to support.
Cutovers require a window during which neither the new nor the old system is available. This is especially true of systems which use large volumes of data. Data must be frozen so that it can be downloaded from the old system, translated, and uploaded to the new system. Users will not have access to the data during this time, so they should be notified in time to plan ahead for the period of inactivity. Cutting the new system into service during normal working hours without notifying the user community will certainly trigger a deluge of calls to support and may cause more damage if a user misses a deadline as a result. Your cutover will include a bulletin before the shutdown of the old system, but your user community should be educated in the cutover process well in advance of the bulletin.
The Cutover Plan
Work that can be done without disrupting the production environment should be done in advance of the cutover. Tasks such as hardware installations, database installations, OS installations, and software installations should all be done ahead of time wherever possible. The cutover plan needs to identify and schedule all the activities that must occur at the time of cutover. Cutover of new systems which replace existing ones will typically have to happen during off-peak hours. Your plan should identify that time, called zero hour, which marks the start of your cutover activities. For each task, the plan should include the activity to be performed, the responsible prime, and the amount of time allotted. Identifying a backup to the prime will give you an extra layer of security.
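A plan of this kind can be kept as a simple list of tasks from which the timetable is calculated. The sketch below is illustrative only: the task names, people, durations, and zero hour are all invented, and it assumes the tasks run strictly one after another.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class CutoverTask:
    activity: str
    prime: str                    # person responsible for the task
    minutes: int                  # time allotted to the task
    backup: Optional[str] = None  # stands in if the prime is unavailable


def build_schedule(zero_hour, tasks):
    """Return (start, end, task) tuples, assuming tasks run back to back."""
    schedule = []
    start = zero_hour
    for task in tasks:
        end = start + timedelta(minutes=task.minutes)
        schedule.append((start, end, task))
        start = end
    return schedule


# Hypothetical plan; names, durations, and zero hour are made up.
zero_hour = datetime(2024, 6, 1, 23, 0)
tasks = [
    CutoverTask("Shut down old system", "Pat", 15, backup="Lee"),
    CutoverTask("Download production data", "Sam", 60, backup="Kim"),
    CutoverTask("Translate and load data", "Sam", 90),
    CutoverTask("Smoke test", "Alex", 30, backup="Pat"),
]

schedule = build_schedule(zero_hour, tasks)
for start, end, task in schedule:
    print(f"{start:%a %H:%M}-{end:%H:%M}  {task.activity} (prime: {task.prime})")

# Tasks with no backup are worth a second look before the plan is approved.
missing_backup = [t.activity for t in tasks if t.backup is None]
```

Listing the tasks that have no backup is a quick way to spot where the extra layer of security is missing.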
The first task will be the bulletin notifying the user community that the shutdown will occur at zero hour. You may want to issue several bulletins to ensure that all users receive the notification (for example, users who log on 30 minutes before shutdown won’t receive a bulletin you sent 1 hour before shutdown). The next task is the download of any data from the production environment. The download, translation, and loading of data should follow the procedure defined during testing.
There are several ways of approaching the cutover: provide a new production environment and retire the old one, reuse the existing production environment, or swap the staging and production environments. Which approach you take will determine your next steps and will be influenced by how much money the organization has to spend on the system and how mission critical the system is. Providing a new production environment and retiring the old one, or swapping the staging and production environments, will allow you to perform activities like hardware installations, database installations, and software upgrades before the cutover. Reusing the existing production environment will likely require you to perform these activities during cutover. Each activity should be proven on the staging environment and timed so that a reasonable duration can be estimated for your cutover plan. Since the goal here is to limit the amount of downtime, try to schedule as many activities in parallel as possible without risking failure. Once the production environment has been upgraded, you can load the translated data.
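To see how much downtime parallel scheduling saves, you can calculate the earliest finish time of each activity from its duration and prerequisites. This is a rough sketch under stated assumptions: the activities, durations, and dependencies are invented, and it assumes enough people are available to run independent activities at the same time.

```python
def earliest_finish(durations, depends_on):
    """Return {task: finish minute} when each task starts as soon as its prerequisites end."""
    finish = {}

    def resolve(task, visiting=()):
        if task in finish:
            return finish[task]
        if task in visiting:
            raise ValueError(f"circular dependency involving {task}")
        prerequisites = depends_on.get(task, [])
        start = max((resolve(p, visiting + (task,)) for p in prerequisites), default=0)
        finish[task] = start + durations[task]
        return finish[task]

    for task in durations:
        resolve(task)
    return finish


# Hypothetical durations (minutes) timed on the staging environment.
durations = {
    "download_data": 60,
    "translate_data": 45,
    "upgrade_os": 50,
    "upgrade_database": 40,
    "load_data": 30,
}
depends_on = {
    "translate_data": ["download_data"],
    "upgrade_database": ["upgrade_os"],
    "load_data": ["translate_data", "upgrade_database"],
}

finish = earliest_finish(durations, depends_on)
parallel_minutes = max(finish.values())
sequential_minutes = sum(durations.values())
print(f"In parallel: {parallel_minutes} min, one after another: {sequential_minutes} min")
```

With these made-up figures the data work and the environment upgrades overlap, so the downtime is set by the longer of the two chains plus the final load rather than by the sum of every task.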
The next step should be a “smoke test” of the new production system. The smoke test should be thorough enough to ensure that no cutover steps have been missed but streamlined enough that it can be completed quickly. User logins are always a good candidate for smoke tests. Any work that is performed frequently by users is another.
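A smoke test can be as simple as a short list of named checks that are run in order and reported on. The sketch below uses a made-up in-memory “system” in place of the real one, and the third check is deliberately written to fail so you can see how a missed cutover step would show up.

```python
def run_smoke_tests(checks):
    """Run every check; return a list of (name, passed, detail) tuples."""
    results = []
    for name, check in checks:
        try:
            passed = bool(check())
            detail = "" if passed else "check returned a false value"
        except Exception as exc:  # a crash counts as a failed check
            passed, detail = False, f"{type(exc).__name__}: {exc}"
        results.append((name, passed, detail))
    return results


# Hypothetical stand-in for the new production system.
fake_system = {"users": {"asmith": "secret"}, "orders": {1001: "open"}}

checks = [
    ("user login", lambda: fake_system["users"].get("asmith") == "secret"),
    ("open an existing order", lambda: fake_system["orders"][1001] == "open"),
    ("open a migrated order", lambda: fake_system["orders"][9999] == "open"),
]

results = run_smoke_tests(checks)
all_passed = all(passed for _, passed, _ in results)
for name, passed, detail in results:
    print(f"{'PASS' if passed else 'FAIL'}  {name}  {detail}")
```

Running every check, rather than stopping at the first failure, gives the team the full picture in one pass, which matters when the clock is running.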
The last step will be to perform any OS updates necessary to point users to the new system and to issue a bulletin notifying them that the system is ready. This bulletin is also an opportunity to inform users of any changes in the system, such as version numbers and new features. Make the new version number obvious and point users to any documentation describing the upgrade. Arrange to have the new system monitored for a time after cutover. Since most cutovers occur during non-peak usage, the monitoring should last at least until the system has experienced a period of peak usage, for example a Monday morning.
Your plan should always be tested on the staging environment to ensure that it is complete (i.e. all necessary activities are identified) and that durations are reasonable. If the team has trouble completing a task at 10:00 am on a Tuesday when no one is breathing down their necks, they will fail when they attempt it at midnight on Saturday with the VP Operations looking on.
Rollback Strategy
Remember I mentioned that a smoke test and monitoring are essential parts of the plan? What happens if the smoke test fails or monitoring reveals an unacceptable degree of system degradation during peak usage? The answer is a rollback. The rollback restores the previous system and data to the production environment and is like another cutover. The rollback strategy will depend on the cutover approach used. Does the production environment need to be altered to roll back, or is it simply a matter of installing the old database and data in the staging environment and pointing users to it? The rollback strategy should be tested along with the cutover plan to ensure that it works. This may seem like a lot of work, since you will likely have to roll the staging environment forward again before cutover, but it is well worth the effort, especially on mission critical systems.
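It helps to agree on the rollback trigger before cutover night rather than debating it at 2:00 am. The sketch below shows one possible rule, and it rests on assumptions: “unacceptable degradation” is taken to mean a response time more than 10% slower than the old system’s baseline, a threshold your organization would need to set for itself.

```python
def cutover_decision(smoke_test_passed, baseline_seconds, observed_seconds, tolerance=0.10):
    """Return "rollback" or "continue" using the pre-agreed (assumed) rule."""
    if not smoke_test_passed:
        return "rollback"
    # Degradation beyond the tolerated margin over the old system's baseline.
    if observed_seconds > baseline_seconds * (1 + tolerance):
        return "rollback"
    return "continue"


# Hypothetical peak-usage measurements, in seconds.
print(cutover_decision(True, baseline_seconds=2.0, observed_seconds=2.1))
print(cutover_decision(True, baseline_seconds=2.0, observed_seconds=2.5))
```

Writing the rule down in advance means the decision at cutover time is a matter of reading the numbers, not negotiating them.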
And Finally
The cutover plan, including the rollback strategy, should be reviewed with subject matter experts (SMEs) from previous cutovers and with the support group. SMEs from previous cutovers are a valuable source of information about what works well in your organization and what doesn’t. Lessons Learned are another valuable source of information, but the authors of the lessons are even more valuable. The support group will bear the brunt of any slip-ups in planning or executing the cutover, so they should feel comfortable that the plan hasn’t missed any steps and that execution will deliver them a supportable system when the handoff happens after cutover.
The cutover plan will be a key deliverable at the Gate Review meeting, or pre-Gate meeting, held to determine cutover readiness. The cutover plan won’t seem like an important deliverable during the rush to complete the project by the deadline but paying attention to the details of that plan well in advance of the cutover will be well worth your effort. Failure to pay attention to this critical deliverable will spoil all the hard work the team has expended to get you to this point. No matter how well the team performed during the rest of the project, a disaster at cutover time will be remembered long after the good work is forgotten. A bad cutover is like a fumble on the goal line at the Super Bowl. The running back may have gained 150 yards during the game but will be remembered for costing his team the game with his fumble.
Dave Nielsen is a principal with three O Project Solutions, the vendors of AceIt©. Dave was also the key architect responsible for the creation of the product. AceIt© has prepared Project Managers from around the world to pass their PMP® exams. You can find endorsements from some of his customers on three O’s web site (http://www.threeo.ca/).