Research Summary

Distributed storage systems (DSS) are widely used as the backbone of large-scale storage, where they provide reliability and data availability. Individual storage units are unreliable and subject to transient or permanent failures, yet the data must remain protected and available to users. This is achieved by introducing redundancy into the data, which incurs a storage overhead. In addition, because failures occur frequently in large-scale storage systems, a considerable volume of network traffic is dedicated to repairing failed storage nodes. Although one would like to simultaneously minimize the repair bandwidth and maximize the storage efficiency of the system, it has been shown that there is a trade-off between the two: one can be optimized only at the expense of the other.
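
To make the trade-off concrete, the following minimal sketch evaluates the standard cut-set bound from the regenerating-codes literature and its two extreme points. The notation is an assumption of this sketch and does not appear in the summary above: a file of size F is stored so that any k nodes suffice to recover it, each node stores alpha symbols, and a failed node is repaired by downloading beta symbols from each of d helper nodes. The function names are hypothetical.

```python
from fractions import Fraction


def cut_set_capacity(k, d, alpha, beta):
    """Largest file size allowed by the cut-set bound:
    F <= sum_{i=0}^{k-1} min(alpha, (d - i) * beta)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(F, k, d):
    """Minimum-storage extreme: smallest per-node storage alpha = F / k."""
    alpha = Fraction(F, k)
    beta = Fraction(F, k * (d - k + 1))
    return alpha, beta


def mbr_point(F, k, d):
    """Minimum-bandwidth extreme: repair traffic d * beta equals alpha."""
    beta = Fraction(2 * F, k * (2 * d - k + 1))
    alpha = d * beta
    return alpha, beta


if __name__ == "__main__":
    F, k, d = 12, 3, 4  # illustrative parameters only
    for name, point in (("MSR", msr_point), ("MBR", mbr_point)):
        alpha, beta = point(F, k, d)
        print(name, "storage per node:", alpha, "repair bandwidth:", d * beta)
```

With these illustrative parameters, the minimum-storage point stores 4 symbols per node but needs 8 symbols of repair traffic, while the minimum-bandwidth point needs only 16/3 symbols of repair traffic at the cost of storing 16/3 symbols per node. Both points meet the bound with equality, which is the sense in which neither quantity can be reduced without increasing the other.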

We have designed a novel coding scheme, called cascade regenerating codes, for distributed storage systems. Our construction provides encoding and decoding algorithms for storage, as well as an efficient mechanism for repairing failed storage units. These universally structured codes can operate at all the optimum points of the storage-bandwidth trade-off.