Migrating the Mail System to a New Server: Unexpected Problems.

July 30, 2013

There is plenty of information on the web about server migration. A lot of attention goes to the choice of hosting provider, support quality, server configuration and operating system. But have you ever had to decide which IP address to choose for your server? We had not, but from now on we will.

Preparation Step

For various reasons we decided to migrate our primary mail server to a new hosting provider. Finding one wasn’t a problem: there are plenty of server configurations on the market that can satisfy even the most demanding client.

We actively use virtualization on our servers to isolate services and simplify maintenance. Virtualization lets us migrate services with minimal downtime and minimal expense. In our case we had to move the virtual machine to the new host, change its IP address and update the DNS configuration.

Since spam is a growing problem, it is no longer enough for a mail system to have just a public IP address and a fully qualified domain name (FQDN). The minimal requirements are:

the server name should resolve to its IP address (for example, the server name mail.example.com should resolve to the IP address 10.112.48.16);
the reverse DNS record for that IP address should point back to the server name.
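
The two requirements above can be checked together. The following is a minimal sketch, not the procedure we actually used: the function names are made up for illustration, and the lookups go through whatever resolver the local system is configured with. The lookup functions are passed in as parameters so the comparison logic can be tried without network access.

```python
import socket
from typing import Callable, Set


def system_forward(hostname: str) -> Set[str]:
    """IPv4 addresses the system resolver returns for hostname."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return set(addresses)


def system_reverse(ip: str) -> Set[str]:
    """Names returned by the reverse lookup for ip (primary name plus aliases)."""
    name, aliases, _ = socket.gethostbyaddr(ip)
    return {name, *aliases}


def normalize(name: str) -> str:
    """Compare DNS names case-insensitively and without the trailing dot."""
    return name.rstrip(".").lower()


def names_match(
    hostname: str,
    ip: str,
    forward: Callable[[str], Set[str]] = system_forward,
    reverse: Callable[[str], Set[str]] = system_reverse,
) -> bool:
    """True when hostname resolves to ip AND the reverse record of ip names hostname."""
    forward_ok = ip in forward(hostname)
    reverse_ok = normalize(hostname) in {normalize(n) for n in reverse(ip)}
    return forward_ok and reverse_ok
```

Calling `names_match("mail.example.com", "10.112.48.16")` with the default lookups would query the real DNS, so it only makes sense once both records have been published.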

Additional technologies such as SPF and DKIM let the receiving side verify that the sender is authorized to send mail for a particular domain. Because working mail is very important to us, we took several steps before the migration to minimize mail system downtime:

we copied the virtual machine image to the new host without halting the system (this does not guarantee the integrity of the data inside the virtual machine, because the data keeps changing while it runs);
we asked the new hosting provider’s support to set the reverse DNS record for the IP address 10.112.48.16 to the domain name mail.example.com;
we set the update time (TTL) of the mail.example.com record in the DNS zone to 10 minutes, so that once the host’s IP address changes, the updated information reaches other servers on the web within 10 minutes (whether or not they have recently looked up mail.example.com).

We then postponed the final step of the migration until the end of the next day, to make sure the new update time setting had reached all servers.
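
The reason for the wait is simple arithmetic: a server that cached the record just before we lowered the update time may keep the old value, with the old update time, until that old update time runs out. Here is a small sketch of the calculation. The 24-hour previous update time and the timestamps are assumptions chosen for illustration, not values from our zone.

```python
from datetime import datetime, timedelta


def earliest_safe_cutover(ttl_lowered_at: datetime, old_ttl_seconds: int) -> datetime:
    """Moment after which no cache should still hold the record with the old TTL."""
    return ttl_lowered_at + timedelta(seconds=old_ttl_seconds)


def change_visible_by(ip_changed_at: datetime, new_ttl_seconds: int) -> datetime:
    """Moment by which caches honouring the new TTL have fetched the new address."""
    return ip_changed_at + timedelta(seconds=new_ttl_seconds)


# Hypothetical numbers: old TTL of 24 hours, new TTL of 10 minutes.
lowered = datetime(2013, 7, 1, 12, 0)
cutover = earliest_safe_cutover(lowered, old_ttl_seconds=24 * 3600)
visible = change_visible_by(cutover, new_ttl_seconds=10 * 60)
print(cutover)  # 2013-07-02 12:00:00
print(visible)  # 2013-07-02 12:10:00
```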

Final migration step

The next day we checked the settings on the new host once again and, as they were correct, began the final phase of the migration.

First we shut down the mail service’s virtual machine. Since we have a backup for our mail service (backup MX), we should not lose any mail, so halting the main system isn’t critical for us. Then we synchronized the data of the old and new virtual machines. At that point only the differences between the two systems had to be copied, since most of the data had already been copied while the system was running and both systems were now down. This guaranteed data integrity after the migration.
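
To illustrate why the second pass is fast, here is a minimal sketch of the "copy only the differences" idea: split both images into fixed-size blocks, hash each block, and transfer only the blocks whose hashes differ. This is an illustration of the principle, not the tool we used; in practice a delta-transfer tool such as rsync does this kind of work. The block size and function names are assumptions.

```python
import hashlib
from typing import List


def block_digests(data: bytes, block_size: int) -> List[bytes]:
    """SHA-256 digest of every block_size-sized block in data."""
    return [
        hashlib.sha256(data[offset:offset + block_size]).digest()
        for offset in range(0, len(data), block_size)
    ]


def changed_blocks(first_copy: bytes, final_image: bytes, block_size: int = 4096) -> List[int]:
    """Indexes of blocks in final_image that are new or differ from first_copy."""
    old = block_digests(first_copy, block_size)
    new = block_digests(final_image, block_size)
    return [
        index
        for index, digest in enumerate(new)
        if index >= len(old) or digest != old[index]
    ]
```

With the source machine shut down, the final image no longer changes, so after the listed blocks are copied the two images are identical. The sketch does not handle an image that shrank between the two passes.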

We set the new server’s IP address in the system configuration and changed the host name record in the DNS to point to the new address. The reverse name for this IP address had already been changed during the preparation step. Only the SPF settings in the DNS still had to be corrected, and then the system could be started. Ten minutes later (the time needed for the settings to become available to any server on the web) we checked sending and receiving of mail, looked at the system monitoring stats and glanced at the clock: the downtime was less than 20 minutes (it’s up to you to decide whether that is a lot, but personally I was pleased).
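
The SPF correction is easy to get wrong after an address change, so a quick check helps. The sketch below only understands `ip4:` mechanisms and ignores everything else in the record (`mx`, `a`, `include:` and so on), which is a deliberate simplification. The records shown are invented examples, not our real SPF records, and 10.20.30.40 is a made-up stand-in for the old address.

```python
import ipaddress


def spf_allows_ip(spf_record: str, ip: str) -> bool:
    """True if ip falls inside any passing ip4: mechanism of the SPF record."""
    address = ipaddress.ip_address(ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return False  # not an SPF version 1 record
    for term in terms[1:]:
        term = term.lstrip("+")  # "+" is the default (pass) qualifier
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if address in network:
                return True
    return False


# Hypothetical records before and after the migration.
old_record = "v=spf1 ip4:10.20.30.40 -all"
new_record = "v=spf1 ip4:10.112.48.16 -all"
print(spf_allows_ip(old_record, "10.112.48.16"))  # False
print(spf_allows_ip(new_record, "10.112.48.16"))  # True
```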

The next day

The next day began with complaints from our support team that customers were not receiving our replies. We examined the mail system logs for errors (looking through the records related to the clients who had not received our replies) and found that:

most of our replies had been successfully accepted by our clients’ mail servers;
a small part of our mail (to clients of the Office 365 service) had been rejected with a reference to bad sender reputation (the report mentioned the Office 365 service’s internal IP address reputation database).

Because our mail system’s reputation was at stake, we re-checked all the settings (server name, its IP address, the reverse name of the server’s IP address, SPF settings) once again, and everything was right. Then we checked whether our server was present in any block lists, using the mxtoolbox.com service. Neither the IP address of our server nor our domain name was listed in any blocking database.
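
For readers who want to script such a check instead of using a web service: public DNS-based block lists are queried by reversing the octets of the IP address, appending the list's zone and looking up the resulting name. An answer means "listed", a "no such name" error means "not listed". The sketch below assumes this convention. The zone `dnsbl.example.org` is a placeholder, not a real list, and this is not how mxtoolbox.com works internally as far as we know.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the name to look up: reversed IPv4 octets followed by the list zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str, resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the block list zone returns any address for ip."""
    try:
        resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False  # name does not exist: the address is not on this list
    return True


print(dnsbl_query_name("10.112.48.16", "dnsbl.example.org"))
# 16.48.112.10.dnsbl.example.org
```

As our case shows, a clean result here proves little: the Office 365 rejections referred to an internal reputation database that such public lookups cannot see.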

To solve the problem with our address in the Office 365 database, we sent an unblock request to their support (a couple of minutes later we received an automatic response saying that our request had been accepted for consideration and that the reaction time would be up to 24 hours). Meanwhile our support team tried to reach our clients by alternative methods, which of course had a negative effect on our company’s reputation.

One day later

24 hours later we still had not received any answer from Office 365 support and nothing had changed (mail was still being rejected). We decided that contacting every mail service would be ineffective, so we needed a different solution. As the problem was related to the new server’s IP address, the most obvious fix was to change the IP address. We requested an additional IP address from our hosting provider… and the request was declined because there were no free IPv4 addresses. Almost all servers on the internet use IPv4 addresses, and this was the first time a hosting provider had refused to sell us an additional one.

There were some other servers at our new hosting provider with one or two unused IP addresses each, so our next step was to copy our mail system to those servers and test sending mail to the Office 365 service from every available IP address. The result was astonishing: we checked six different IP addresses from different ranges, and all of them had similar problems.

The situation was getting worse: a growing number of customers didn’t see our replies in their mailboxes, although the replies had been sent and even received. We tested sending mail to a newly created Gmail account, and our mail was indeed received, but it landed in the `Spam` folder rather than the `Inbox`. Whether out of “customer care” or for some other reason, the Gmail user interface diligently hides the folder with mail marked as spam.

We began asking our customers to add our addresses to their contacts and to check their Spam folder. Many of them admitted later that they had found our replies in the Spam folder and had not suspected that support had answered them. The problem was partially solved, but that wasn’t enough.


Next we tried adding DKIM to the SPF we had already been using. Testing showed that although our mail system now correctly authorized all outgoing mail, this had absolutely no influence on the chance of ending up in spam. An interesting fact: mail always landed in spam for new accounts that had no previous correspondence with us, and never for our own staff accounts.
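
One way to confirm that the receiving side accepts the SPF and DKIM checks is to open a delivered test message and read its `Authentication-Results` header, which many mail services add on receipt. The sketch below pulls the verdicts out of such a header. The sample message is invented for illustration and is not taken from our logs, and real headers vary between providers.

```python
import re
from email import message_from_string
from typing import Dict

# Hypothetical received message; only the headers matter here.
SAMPLE = (
    "Authentication-Results: mx.example.net;\n"
    " spf=pass smtp.mailfrom=support@example.com;\n"
    " dkim=pass header.d=example.com\n"
    "From: support@example.com\n"
    "Subject: Re: your ticket\n"
    "\n"
    "Hello\n"
)


def auth_results(raw_message: str) -> Dict[str, str]:
    """Map each method (spf, dkim, dmarc) to the verdict found in the headers."""
    message = message_from_string(raw_message)
    results = {}
    for header in message.get_all("Authentication-Results", []):
        for method, verdict in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header, flags=re.IGNORECASE):
            results[method.lower()] = verdict.lower()
    return results


print(auth_results(SAMPLE))  # {'spf': 'pass', 'dkim': 'pass'}
```

In our tests this kind of check came back clean, and the mail still went to spam: passing authentication says who sent the message, not whether the sending address has a good history.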

Out of curiosity we created new accounts with different mail services (Gmail, Yahoo, AOL, etc.), and most of them rejected our mail or marked it as spam. Oddly enough, messages sent from these new accounts to each other were also marked as spam.

Three days later

Based on the test results we concluded that the history of the sending server’s IP address has the most significant influence on the “spam or not spam?” decision. It seems we had moved to a server whose IP address had recently been used by spammers. On the brink of despair, when we were ready to move back to our old host, we decided to try one more of our old servers, hosted with another provider, which had never been used for sending mail. Luckily it had a free IP address.

After we migrated the test system to this “new old” server and set up all the addresses and DNS records, everything just began to work. It’s rather amazing, but all the servers that had treated us as spammers and refused to work with us for the past few days simply began to accept our mail and deliver it to our customers. Yep, just like that. Later, for testing purposes, we even removed DKIM and all the other improvements made in a hurry while trying to solve the issue. Even without them everything just works.

Summary

It is hard to summarize this situation or to give any particular advice. We just hope that after reading this article you will be able to take some steps to avoid the problem we faced. We’ll never know how and when this situation would have ended if we hadn’t had that old server.

P.S.

Two days later we received an answer from Office 365 support saying that our request had been handled and the IP address had been removed from the database. They had promised to do it within 24 hours, but it took them 5 days.

