Testing the DR site
Not stopping at achieving rapid replication in five minutes flat, Hindustan Dorr-Oliver also tested its DR setup by running its business off the secondary site.
By Harshal Kallyanpur
Engineering, procurement and construction company Hindustan Dorr-Oliver moved from a legacy infrastructure to an SAP-based centralized ERP system sometime in 2008. The company is headquartered in Mumbai with a presence in cities such as Bangalore, Chennai, Delhi, Ahmedabad and Kolkata. It has a 1,300-strong workforce spread across 30 sites. It migrated almost 90% of its applications to SAP’s ECC 6.0, hosted on IBM P6 servers running the AIX OS.
For the four years since then, Hindustan Dorr-Oliver has been running end-to-end on the SAP ERP system. The company generates a lot of data from the engineering projects that it undertakes. It stores this data in a repository, which serves as a reference point for analysis and learning when it gets a new project with similar requirements.
“The SAP application takes in a lot of information from engineering projects and purchase requisitions for these projects are generated based upon this information. The availability of this information and the SAP application is critical for us,” said Ajay Khanvilkar, General Manager - IT, Hindustan Dorr-Oliver Ltd.
He explained that every project was allotted a certain budget and that the historical information helped the project team analyze and arrive at project requirements, which had fewer errors and helped ensure that purchase requisitions stayed within budget. This helped the company boost profitability as it got the project execution costs right from the get go.
All of these activities generated a large amount of data, which was traditionally backed up on to tape libraries in a fairly automated manner. The company’s project sites run 24×7 and its factories operate for up to 16 hours a day, while the corporate IT team is available only during office hours.
“The need was felt for an efficient disaster recovery mechanism, which would ensure round-the-clock availability of applications and data, even in the absence of the core IT team. After discussions with the management, we took the call to set up a secondary site for disaster recovery,” informed Khanvilkar.
The company’s primary data center, where the IBM servers are hosted, is located at Mumbai, while the secondary data center was set up at its factory in Ahmedabad. The secondary site mirrors the primary in terms of the application and operating systems stack but runs on higher configuration servers from IBM.
As the company’s IT infrastructure revolves around IBM servers running AIX, it was looking for a solution that would be the best fit for its DR requirements. It evaluated IBM’s DoubleTake RecoverNow for AIX. It had several discussions with the company’s channel partners to understand some of the use cases for the solution, the challenges faced by other companies that had adopted it and the benefits that they had gained from their deployments.
The implementation
After receiving positive feedback from several users and having concluded that the solution was a best fit for IBM AIX environments, Hindustan Dorr-Oliver went ahead and started implementing the solution in September 2010. The implementation was completed sometime in March 2011.
During the early days of the implementation, the company faced a challenge in terms of backing up data. The secondary site was ready, with the servers, operating system, SAP application and databases in place. However, the primary site in Mumbai had close to 230 GB of data that had to be backed up and restored at the secondary site for DR.
Manual replication failed
The company decided to back up the data after office hours on to a portable storage device and have it manually sent over to the secondary site. It would then be replicated at the secondary site and be made available before the company resumed business the next day at around 9 pm. This entire process would take about 12 to 15 hours.
“However, after backing up and restoring almost 60% of the data, we realized that the restoration process had failed and that the system could not back up the entire data. We lost almost a month’s time trying to solve this problem with several attempts being made,” said Khanvilkar.
Over the wire
The DoubleTake team from IBM, along with the IT team at Hindustan Dorr-Oliver, decided that data had to be directly backed up to the secondary site over an Internet connection. The company went in for an MPLS-based 2 Mbps VPN connection between its Mumbai and Ahmedabad data centers and sent the data for backup over the wire.
It took Hindustan Dorr-Oliver 12 hours to transfer the data from the primary to the secondary site. Once the transfer was completed, it wanted to ensure that the data at the secondary site was completely backed up and synchronized with that at the primary site.
“We decided to do a test run to determine if the data backup and restore process was successful. We asked the employees to go about executing transactions as if it was business as usual. At the secondary site, we monitored the DR infrastructure to check if data was being replicated in real-time. The DR solution managed to replicate data in just five minutes, much faster than we expected.”
According to Khanvilkar, engineering companies such as Hindustan Dorr-Oliver do not frequently generate large amounts of data. The company therefore configured the DR solution so that data generated at the primary site would be replicated to the DR site every 15 minutes.
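A fixed replication interval implies a simple staleness rule: if the DR copy is older than one interval plus some margin, something has gone wrong. The following is a minimal sketch of that rule in Python, not the DoubleTake product's interface. The function name `replication_is_stale`, the five-minute grace margin and the sample timestamps are all hypothetical; only the 15-minute interval comes from the account above.

```python
from datetime import datetime, timedelta

# The 15-minute interval is the figure quoted in the article.
REPLICATION_INTERVAL = timedelta(minutes=15)
# Assumption: allow a small margin before treating a late cycle as a problem.
GRACE = timedelta(minutes=5)


def replication_is_stale(last_replicated_at, now,
                         interval=REPLICATION_INTERVAL, grace=GRACE):
    """Return True if the DR copy is older than one interval plus the grace margin."""
    return now - last_replicated_at > interval + grace


# Hypothetical timestamps, chosen only to illustrate the check.
now = datetime(2011, 3, 1, 10, 0)
print(replication_is_stale(datetime(2011, 3, 1, 9, 50), now))  # 10-minute lag
print(replication_is_stale(datetime(2011, 3, 1, 9, 30), now))  # 30-minute lag
```

With these sample values, the first call reports the copy as current and the second reports it as stale. In practice the timestamp of the last successful cycle would have to come from whatever logs or status output the replication tool provides.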
Proof of the pudding
The company took its DR testing one step further and conducted a DR drill in which it shut down all of its production sites, including the primary data center. It simultaneously moved its operations to the secondary data center and asked its employees to check that their data was consistent and that applications were available.
Khanvilkar commented that, given the consistency of data and applications, the employees were hardly able to tell whether the information was being delivered from the primary or the secondary site. The company continued to transact through the secondary data center for an hour before switching back to the primary. It then found that data generated on the DR site had been successfully replicated on the primary site and that the secondary site had seamlessly resumed its DR duties.
Benefits & RoI
The implementation has allowed the company to eliminate the costs, time and effort associated with its traditional method of manually transferring backed-up data. The devices used for manually transferring data from the primary to the secondary site could be used for only two or three tries, after which they had to be discarded. The MPLS VPN line that the company now uses is a more reliable option with a predictable cost.
From a manpower standpoint, Hindustan Dorr-Oliver has a person dedicated to the task of keeping a check every hour on whether data is being replicated from the primary to the secondary site without any exceptions.
According to Khanvilkar, Hindustan Dorr-Oliver has not witnessed major downtime since the implementation of the DR solution. He explained that, given how the downtime or non-availability of data and applications could affect the company’s revenue, Hindustan Dorr-Oliver was convinced that the investment made in the DR solution was justified. The company estimated that it would be able to see an ROI within six months to a year.