An innovative technology that simplifies data accessibility by geo-spreading data across multiple locations
Geo-spreading is the practice of distributing data across multiple storage systems located in different places. Object storage technology can be used to achieve geo-spreading and help ensure uninterrupted availability of data, even when an entire data center is lost. This is possible because object storage uses erasure coding. Unlike traditional data protection schemes, erasure coding breaks the data into shards, computes additional parity shards from them, and stores all of the shards across multiple storage media. The advantage of this approach is that only a subset of the shards is needed to regain all of the data; with common codes such as Reed-Solomon, any set of surviving shards as large as the number of original data shards is enough. Distributing the shards across locations further increases the protection and availability of the data.
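To make the shard-and-parity idea concrete, here is a minimal Python sketch of a k-of-n erasure code in the Reed-Solomon style. It is an illustration under simplifying assumptions, not how any particular product implements erasure coding: it works on single byte-sized symbols over the prime field GF(257), and the function names `encode` and `decode` are hypothetical. The k data symbols are treated as points on a polynomial of degree below k; parity shards are extra points on the same polynomial, so any k surviving shards determine it and therefore the original data.

```python
# Minimal k-of-n erasure code sketch (Reed-Solomon style) over the prime field GF(257).
# Assumption: illustration only; real object stores use optimized codes over GF(2^8)
# and encode large blocks, not single symbols.
P = 257  # prime modulus; every byte value 0..255 is a valid field element


def _interpolate(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points (mod P)."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if i != j:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, P - 2, P) is the modular inverse of den (Fermat's little theorem)
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total


def encode(data, n):
    """Turn k data symbols into n shards; shards 1..k hold the data itself (systematic)."""
    k = len(data)
    points = list(zip(range(1, k + 1), data))
    parity = [(x, _interpolate(points, x)) for x in range(k + 1, n + 1)]
    return points + parity  # list of (shard_id, value)


def decode(shards, k):
    """Rebuild the k data symbols from ANY k surviving shards."""
    if len(shards) < k:
        raise ValueError("too many shards lost: need at least k survivors")
    basis = shards[:k]
    return [_interpolate(basis, x) for x in range(1, k + 1)]


if __name__ == "__main__":
    data = list(b"DATA")                  # k = 4 data symbols
    shards = encode(data, n=6)            # 4 data shards + 2 parity shards
    survivors = [shards[1], shards[3], shards[4], shards[5]]  # shards 1 and 3 lost
    print(bytes(decode(survivors, k=4)))  # b'DATA'
```

Because parity values here can reach 256, they do not fit in a byte; production codes use GF(2^8) arithmetic to avoid that, and the prime field is chosen only because its arithmetic is easy to follow.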
3 reasons why you should consider geo-spreading
– Data remains available even if there is a data center failure. Performance may be reduced, but the data stays accessible because there is no need to wait for a RAID rebuild to complete.
– Failed components can be replaced at a convenient time, since data availability is not affected as long as failures stay within the erasure coding limits.
– Only a fraction of extra capacity is needed to store the parity shards, compared with keeping full copies of the data.
When implementing erasure coding, it is considered best practice for each drive to hold no more than one shard of an object, and for each node to hold no more shards than the system can afford to lose. This keeps data available even when a drive or a whole node fails. For example, object storage solutions such as Active Archive and ActiveScale use an eighteen-symbol object sharding approach that can survive the loss of up to five of the 18 shards (18/5) in a single data center deployment. This protects against the loss of up to five drives or nodes; however, it does not protect against an outage of the data center itself.
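The placement rules above can be checked mechanically. The following Python sketch is a hypothetical helper, not part of any product's tooling. It assumes the 18/5 scheme behaves like a Reed-Solomon-style code, so that any 13 of the 18 shards rebuild an object, and it represents each shard's location as a `(node, drive)` pair; the node and drive names are made up for the example.

```python
from collections import Counter

# Assumed 18/5 scheme: 18 shards per object, any 5 of which may be lost.
N_SHARDS = 18
MAX_LOSSES = 5


def placement_ok(placement, max_losses=MAX_LOSSES):
    """Check one object's shard placement against the two rules of thumb.

    placement: list of (node, drive) pairs, one entry per shard.
    Rule 1: no drive holds more than one shard of the object.
    Rule 2: no node holds more shards than the scheme can afford to lose.
    """
    if len(placement) != N_SHARDS:
        raise ValueError(f"expected {N_SHARDS} shards, got {len(placement)}")
    per_drive = Counter(placement)
    per_node = Counter(node for node, _ in placement)
    return max(per_drive.values()) <= 1 and max(per_node.values()) <= max_losses


def readable_after_node_loss(placement, failed_node, max_losses=MAX_LOSSES):
    """True if the shards lost with failed_node stay within the erasure coding limit."""
    lost = sum(1 for node, _ in placement if node == failed_node)
    return lost <= max_losses


if __name__ == "__main__":
    # Hypothetical layouts: 6 nodes x 3 drives versus 3 nodes x 6 drives.
    spread = [(f"node{n}", f"drive{d}") for n in range(6) for d in range(3)]
    dense = [(f"node{n}", f"drive{d}") for n in range(3) for d in range(6)]
    print(placement_ok(spread), placement_ok(dense))  # True False
    print(readable_after_node_loss(spread, "node0"))  # True
    print(readable_after_node_loss(dense, "node0"))   # False
```

The dense layout violates rule 2, and the check on `node0` shows the consequence: losing that one node removes six shards, one more than the scheme tolerates.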
Storing data in a single location can be risky for an enterprise or service provider. If a natural disaster or any other incident strikes that location, the entire data center could go offline or access to it could be blocked. To prevent such a scenario, organizations commonly deploy data across multiple geographic locations or zones. Sites in different cities draw on different power sources and are unlikely to be hit by the same natural calamity at the same time. If investing in separate geographic locations is not feasible, the data can at minimum be distributed across different buildings on the same campus.
Key Considerations
When using a three-data-center geo-spread arrangement, the erasure coding should permit rehydration of the data even if one-third of the shards are unavailable. This ensures that component failures can be managed and that, if an entire data center goes offline, access to the data is not lost. At first, geo-spreading might seem to make data management tedious and complex, with three sites to run and each location requiring separate management. However, because the 3-geo configuration presents a single namespace, the entire storage can be managed centrally.
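The one-third requirement comes down to simple arithmetic. The sketch below is a hypothetical Python helper that assumes shards are spread round-robin (as evenly as possible) across sites and that any n - m of an object's n shards can rebuild it. The scheme parameters in the example are illustrative, not the configuration of any specific product.

```python
def shards_per_site(n, sites=3):
    """Round-robin placement: count how many of the n shards land on each site."""
    counts = [0] * sites
    for shard in range(n):
        counts[shard % sites] += 1
    return counts


def survives_site_loss(n, m, sites=3, extra_failures=0):
    """True if losing the fullest site plus extra_failures more shards leaves data readable.

    n: total shards per object; m: how many shards may be lost (any n - m rebuild it).
    """
    worst_site = max(shards_per_site(n, sites))
    return worst_site + extra_failures <= m


if __name__ == "__main__":
    print(survives_site_loss(18, 5))                    # False: a site holds 6 shards
    print(survives_site_loss(18, 6))                    # True: exactly one-third may be lost
    print(survives_site_loss(18, 6, extra_failures=1))  # False: no margin for one more failure
```

The last case shows that a scheme tolerating exactly one site's worth of shards has no margin left for a failed drive at a surviving site while the other site is down, so a design meant to ride out both needs m larger than n/3.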
Why You Should Care
Protecting access to your data is a paramount concern for any organization, so it is important to understand how that data is protected. Sooner or later, some piece of hardware will fail. With object storage, however, when a drive or node fails, the system automatically starts creating replacement shards. With a 3-geo configuration, these archival systems can deliver robust durability, which means a very low probability of data loss.
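The durability claim can be made more concrete with a back-of-the-envelope calculation. Data is lost only if more than m of an object's n shards fail before the system has finished creating replacements, which under independence is the binomial tail sum over i from m + 1 to n of C(n, i) p^i (1 - p)^(n - i). The Python sketch below computes that probability under a loudly simplifying assumption: each shard fails independently with the same hypothetical probability p during one repair window. Real failures are often correlated (a whole site going down, for instance), so treat the numbers as an illustration only.

```python
from math import comb


def loss_probability(n, m, p):
    """Probability that more than m of n shards fail within one repair window.

    Assumes independent shard failures, each with the same probability p (hypothetical).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(m + 1, n + 1))


if __name__ == "__main__":
    p = 0.01  # hypothetical per-shard failure probability per repair window
    print(f"single copy:       {loss_probability(1, 0, p):.1e}")
    print(f"18/5 erasure code: {loss_probability(18, 5, p):.1e}")
```

Faster automatic repair shortens the window and therefore lowers p, which is why automatic creation of replacement shards matters for durability.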
Authored by Vivek Tyagi, Director for India business development, SanDisk brand commercial sales and support, Western Digital Corporation