Cloud Replication
The Backblaze Cloud Replication feature enables the copying of data from a source bucket to a destination bucket through the Backblaze B2 Native API.
Backblaze's Cloud Replication feature lets you replicate one or many objects to one or more buckets, either within a region or between regions. This feature can serve multiple use cases, such as geographically distributing copies of a data set to support disaster recovery objectives, reducing concentration risk, meeting compliance requirements, or moving data closer to end users for faster access irrespective of edge infrastructure.
What Cloud Replication Provides
The Cloud Replication feature allows you to automate the following:
- Configuring a rule to replicate data between a Source bucket and Destination bucket
- Setting automatic retries for any transient errors encountered during replication
- Adding the replicationStatus field to every source and destination file
Key Notes
Key areas to keep in mind when using the Cloud Replication feature:
- This feature currently requires the B2 Native API. The S3 Compatible API does not include Cloud Replication at this time. However, once a replication rule is created, it applies to all files, regardless of whether they are created by the B2 Native or S3 Compatible API.
- When replicating across regions (for example, from the US to Europe), you need an account set up in each region because each account can be associated with only one region.
- When configuring a replication source, you must provide an Application Key that has read access to the bucket. Similarly, for destination buckets, an Application Key that has write access to the destination bucket must be provided.
- It is important to set up the destination bucket before the source bucket. If you try to set the replication configuration for a source bucket that references a destination bucket that does not exist, the configuration attempt will fail.
- Ensure that both the source and destination accounts are configured correctly for billing; if they are not, you will be unable to save the related Cloud Replication rule.
- You cannot use a Master Key as the Application Key for Cloud Replication.
- If Object Lock is set on a source bucket, it must also be enabled on the destination bucket or the replication will return a failed status.
- When creating replication rules, the writeBucketReplications capability must be set on the Application Key (readBucketReplications provides the ability to view, but not change, replication rules).
- To replicate all existing files in a bucket, rather than only newly added files, set the value of includeExistingFiles to true. See below for more details on the fields that can be set for Cloud Replication rules.
- Once a file has begun replicating, that is, once it has transitioned to a pending replication status, the replication rule's Source Application Key at the time of the transition will be used to perform the replication.
- If the replication rule is deleted or updated after the file has transitioned into a pending replication status, the replication will still be performed. If you want the replication to fail instead, delete the Source Application Key or remove the destination's sourceToDestinationKeyMapping entry.
Note that the following are not replicated:
- Replica Files: Any files that are replicated cannot be set as the source of a new replication.
- SSE-C files: Files encrypted with customer-managed keys (SSE-C).
- Hide Markers: Hide markers will not be replicated. This means that the files hidden in a source will not be hidden in the destination. Hide markers can be applied to the destination after replication completes.
- Metadata Changes: Once a file has been replicated, the replica's metadata will not be updated when the source file's metadata (e.g. object lock settings) changes.
Lifecycle rules apply to replica files in the same manner as they apply to source files. When replicating files between buckets with different lifecycle rules, be aware of the effect that the different lifecycle rules will have on your files.
Example 1:
- You have a source bucket with lifecycle rules that will hide files after 30 days and then delete them five days later.
- You have a destination bucket with lifecycle rules that will hide files after 10 days and then delete them one day later.
In this situation, you might have a file in the source bucket which is 15 days old. If you replicate that file to the destination bucket, the file will become hidden. Specifically, the file will become hidden the next time the system applies the lifecycle rules. Lifecycle rules are applied once per day.
Example 2:
- You have a source bucket with lifecycle rules that will hide files after 10 days and then delete them one day later.
- You have a destination bucket with lifecycle rules that will hide files after 30 days and then delete them five days later.
In this situation, you might have a file in the source bucket which reaches an age of 11 days. Due to the source bucket's lifecycle rules, this file will become hidden after 10 days and then deleted on the 11th day. However, if you replicate that file to the destination bucket before it is deleted, the file will remain visible in the destination bucket until it is 30 days old, even though the original file was already deleted from the source bucket.
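To make the timing concrete, the lifecycle rules in Example 1 could be expressed as B2 lifecycleRules entries like the following sketch (the values mirror the example; the empty fileNamePrefix applies each rule to all files in its bucket):

# Example 1, expressed as B2 lifecycle rules (illustrative values only).
source_lifecycle_rules = [
    {"fileNamePrefix": "", "daysFromUploadingToHiding": 30, "daysFromHidingToDeleting": 5}
]
destination_lifecycle_rules = [
    {"fileNamePrefix": "", "daysFromUploadingToHiding": 10, "daysFromHidingToDeleting": 1}
]

A 15-day-old source file replicated into the destination bucket is already past the destination's 10-day threshold, so it is hidden the next time lifecycle rules run.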
Creating a Cloud Replication Rule with the B2 Native API
Using the API
There are three steps to leveraging the Cloud Replication feature:
- Create the Cloud Replication Rule
- Add a new file or a new version of a file
- Check the replication status of the file by looking at the source and destination metadata
Note on Multiple Rules
Cloud Replication allows the creation of two rules per bucket, which means that each file can be used as a source in at most two rules. Adding any further rules will result in an error. Destination buckets are not limited in the same way; a single destination bucket can act as the target for many replicated sources.
1. Creating a Replication Rule
In common with all B2 Native API operations, you must first authenticate via b2_authorize_account
. Use the following functions to manage Cloud Replication rules.
- b2_create_bucket: creates a new bucket in your account
- b2_update_bucket: updates a bucket
- b2_list_buckets: lists all buckets in your account
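The following is a minimal sketch of the authorization step in Python, assuming the third-party requests library, the v2 endpoint path, and placeholder key values; the api_url, auth_token, and account_id it captures are reused by the later sketches.

import requests

# Placeholder credentials; use an application key that has the
# readBucketReplications / writeBucketReplications capabilities.
KEY_ID = "<applicationKeyId>"
APPLICATION_KEY = "<applicationKey>"

# b2_authorize_account takes the key ID and key as HTTP basic auth.
resp = requests.get(
    "https://api.backblazeb2.com/b2api/v2/b2_authorize_account",
    auth=(KEY_ID, APPLICATION_KEY),
)
resp.raise_for_status()
auth = resp.json()

api_url = auth["apiUrl"]                  # base URL for later calls
auth_token = auth["authorizationToken"]   # value for the Authorization header
account_id = auth["accountId"]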
To create a replication rule, include replicationConfiguration when you call b2_create_bucket or b2_update_bucket. At least one of asReplicationSource or asReplicationDestination is required, but they can also both be present.
For each replication, you configure the source bucket with an asReplicationSource section and the destination bucket with an asReplicationDestination section.
Example of a Source rule:
{ "accountId": "12f634bf3cbz", "bucketId": "e1256f0973908bfc71ed0c1z", "replicationConfiguration": { "asReplicationSource": { "replicationRules": [ { "destinationBucketId": "3f46fe8276c62b414506021y", "fileNamePrefix": "", "includeExistingFiles": false, "isEnabled": true, "priority": 1, "replicationRuleName": "replication-us-east" } ], "sourceApplicationKeyId": "00512f95cf4dcf0000000004z" }, } }
Example of a destination configuration, which is deployed on the destination account. The source for this bucket is set up by the source bucket's owner in the source configuration, so the destination only needs the source key ID to map the two buckets:
{ "accountId": "12f634bf3cbz", "bucketId": "e1256f0973908bfc71ed0c1z", "replicationConfiguration": { "asReplicationDestination": { "sourceToDestinationKeyMapping": { "00512f95cf4dcf0000000004z": "00512f95cf4dcf0000000004y", } } } }
The source replication configuration is set within the asReplicationSource section, and the destination replication app key mapping is set within the asReplicationDestination section.
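As a minimal sketch of applying the source configuration above, the call below posts it to b2_update_bucket, reusing the api_url, auth_token, and account_id variables from the earlier authorization sketch; the bucket and key IDs are the placeholder values from the example.

import requests

# api_url, auth_token, and account_id come from the b2_authorize_account sketch above.
source_bucket_id = "e1256f0973908bfc71ed0c1z"   # placeholder source bucket ID

resp = requests.post(
    f"{api_url}/b2api/v2/b2_update_bucket",
    headers={"Authorization": auth_token},
    json={
        "accountId": account_id,
        "bucketId": source_bucket_id,
        "replicationConfiguration": {
            "asReplicationSource": {
                "replicationRules": [
                    {
                        "destinationBucketId": "3f46fe8276c62b414506021y",
                        "fileNamePrefix": "",
                        "includeExistingFiles": False,
                        "isEnabled": True,
                        "priority": 1,
                        "replicationRuleName": "replication-us-east",
                    }
                ],
                "sourceApplicationKeyId": "00512f95cf4dcf0000000004z",
            }
        },
    },
)
resp.raise_for_status()
print(resp.json())   # updated bucket object, including its replication configuration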
Cloud Replication rules created as described above can be turned off by calling b2_update_bucket and setting the isEnabled parameter to false. To delete a rule, call b2_list_buckets, if necessary, to obtain the bucket's list of replication rules, remove the rule from the replicationRules JSON array, and call b2_update_bucket, supplying the modified replicationRules. Note that the API layer allows rules to be turned off or deleted even when they were created from the Web UI.
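The delete flow described above might look like the following sketch; existing_config stands for the bucket's current replicationConfiguration (for example, as retrieved via b2_list_buckets), and the rule name and bucket ID are placeholders.

import requests

# api_url, auth_token, and account_id come from the b2_authorize_account sketch;
# existing_config is assumed to hold the bucket's current replicationConfiguration dict.
rules = existing_config["asReplicationSource"]["replicationRules"]

# Delete a rule by removing it from the replicationRules array;
# to merely pause it, set its isEnabled field to False instead.
existing_config["asReplicationSource"]["replicationRules"] = [
    r for r in rules if r["replicationRuleName"] != "replication-us-east"
]

resp = requests.post(
    f"{api_url}/b2api/v2/b2_update_bucket",
    headers={"Authorization": auth_token},
    json={
        "accountId": account_id,
        "bucketId": "e1256f0973908bfc71ed0c1z",   # placeholder source bucket ID
        "replicationConfiguration": existing_config,
    },
)
resp.raise_for_status()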
When creating a replication rule, you must provide an Application Key that has read access to the source bucket and write access to the destination bucket. Also, keep in mind that failures can occur due to billing issues, so make sure the source and destination accounts are set up to accommodate the fees associated with Cloud Replication. Contact support if you run into any issues.
Call b2_list_buckets to retrieve all of the replication rules associated with one or more buckets. Keep in mind that deleted replication rules will not be returned.
Each replication rule has the following fields:
- destinationBucketId: The ID of the bucket to replicate new file uploads to (cannot be the same as the source bucket).
- fileNamePrefix: Only files matching this prefix will be replicated.
- priority: Priority for resolving conflicts if two or more rules conflict. In general, a file is replicated according to all rules. However, if two or more rules have the same destination, the priority value is used to select which rule applies; the higher priority value wins.
- isEnabled: A true/false boolean indicating whether the rule is enabled. If the rule is disabled, it will not be applied. In the Web UI, this value represents the unpaused state.
- includeExistingFiles: A true/false boolean indicating whether existing files in the bucket will be replicated.
- replicationRuleName: A customer-provided string that serves as a name or description for the rule.
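As an illustration of these fields, and of the two-rule limit mentioned earlier, a source configuration with two rules pointing at different destination buckets might look like the following sketch (all IDs and names are placeholders):

two_rule_source_config = {
    "asReplicationSource": {
        "replicationRules": [
            {
                "destinationBucketId": "3f46fe8276c62b414506021y",
                "fileNamePrefix": "",
                "includeExistingFiles": False,     # only new files and versions
                "isEnabled": True,
                "priority": 1,
                "replicationRuleName": "replicate-to-us-east",
            },
            {
                "destinationBucketId": "9c83ab1450662b41fe8276c6",
                "fileNamePrefix": "",
                "includeExistingFiles": True,      # also replicate existing files
                "isEnabled": True,
                "priority": 2,
                "replicationRuleName": "replicate-to-eu-central",
            },
        ],
        "sourceApplicationKeyId": "00512f95cf4dcf0000000004z",
    }
}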
Note on Application Keys:
To successfully create a working replication rule, the Application Keys must have specific capabilities.
For source buckets, the Application Key specified must have the following capabilities:
- readFiles
- readFileLegalHolds
- readFileRetentions
For destination buckets, the Application Key specified must have the following capabilities:
- writeFiles
- writeFileLegalHolds
- writeFileRetentions
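For illustration, a bucket-restricted source key with exactly these capabilities could be created with b2_create_key as sketched below (key name and bucket ID are placeholders; a destination key would be created the same way in the destination account with the write capabilities).

import requests

# api_url, auth_token, and account_id come from the b2_authorize_account sketch.
resp = requests.post(
    f"{api_url}/b2api/v2/b2_create_key",
    headers={"Authorization": auth_token},
    json={
        "accountId": account_id,
        "keyName": "replication-source-key",      # placeholder name
        "bucketId": "e1256f0973908bfc71ed0c1z",   # placeholder source bucket ID
        "capabilities": ["readFiles", "readFileLegalHolds", "readFileRetentions"],
    },
)
resp.raise_for_status()
new_key = resp.json()
print(new_key["applicationKeyId"])   # use this as the rule's sourceApplicationKeyId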
Note on File Name Prefix:
The fileNamePrefix parameter allows API calls to specify a prefix that restricts replication to files with matching names; this option is not available in the Web UI.
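For example, a rule restricted to a hypothetical logs/ prefix might look like the following fragment (all values are placeholders):

prefix_rule = {
    "destinationBucketId": "3f46fe8276c62b414506021y",
    "fileNamePrefix": "logs/",    # only file names beginning with "logs/" are replicated
    "includeExistingFiles": False,
    "isEnabled": True,
    "priority": 2,
    "replicationRuleName": "replicate-logs-only",
}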
2. Adding a New File (or Version)
There is no difference in the process of uploading files to buckets that have Cloud Replication rules, but keep the following in mind:
- New files, and new versions of existing files, will be replicated regardless of whether they are created via the API or Web UI.
- Keep in mind that the replication engine runs on a distributed system, so the time to complete replication depends on the number of other replication jobs scheduled, the number of files to replicate, and the size of the files to replicate.
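For completeness, the following sketch adds a new file version to the source bucket using the standard b2_get_upload_url / upload flow, again assuming the requests library and the variables from the authorization sketch; the bucket ID, file name, and contents are placeholders. Replication to the destination then happens asynchronously.

import hashlib
from urllib.parse import quote

import requests

# api_url and auth_token come from the b2_authorize_account sketch.
source_bucket_id = "e1256f0973908bfc71ed0c1z"   # placeholder source bucket ID

# 1. Get an upload URL and a one-time upload authorization token.
resp = requests.post(
    f"{api_url}/b2api/v2/b2_get_upload_url",
    headers={"Authorization": auth_token},
    json={"bucketId": source_bucket_id},
)
resp.raise_for_status()
upload = resp.json()

# 2. Upload the file; the replication rule picks up the new version automatically.
data = b"hello, replication"
resp = requests.post(
    upload["uploadUrl"],
    headers={
        "Authorization": upload["authorizationToken"],
        "X-Bz-File-Name": quote("example/hello.txt"),
        "Content-Type": "text/plain",
        "X-Bz-Content-Sha1": hashlib.sha1(data).hexdigest(),
    },
    data=data,
)
resp.raise_for_status()
file_id = resp.json()["fileId"]   # used below when checking replication status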
3. Check Replication Status
File Info Metadata
To see the replication status of a file, and whether the file is itself a replica, call b2_get_file_info. The replicationStatus field provides the given file's replication status and reflects up to the two rules that can be defined for each file.
The replicationStatus may have one of the following values:
- pending: The file is in the process of being replicated. If there are two rules, at least one of them is in process. Check again later to see whether the file has left this status.
- completed: This represents a successful replication. If two rules are configured, both rules have completed successfully.
- failed: A non-recoverable error has occurred, such as insufficient permissions. The system will not try to process this file again. If two rules are configured, at least one has failed.
- replica: The file was created by the replication process. Note that replica files cannot be used as the source for further replication.
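A status check might look like the following sketch, which calls b2_get_file_info for the file uploaded above and reads its replicationStatus (the file_id variable is carried over from the upload sketch).

import requests

# api_url and auth_token come from the b2_authorize_account sketch;
# file_id identifies the uploaded source file (or a replica in the destination bucket).
resp = requests.post(
    f"{api_url}/b2api/v2/b2_get_file_info",
    headers={"Authorization": auth_token},
    json={"fileId": file_id},
)
resp.raise_for_status()
info = resp.json()

# Expected values: "pending", "completed", "failed", or "replica"; the field
# may be null or absent for files that are not covered by any replication rule.
print(info.get("replicationStatus"))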
Existing File Replication
When a replication rule is configured to include existing files, the replica files are created with the same upload timestamp as their source file. This has several ramifications:
- Backblaze does not guarantee any ordering of individual file version replications. While replication of existing files is ongoing, it is possible that a filename-based query will have different results between the source and destination buckets. This situation is temporary, and both buckets will provide the same response once the existing files have finished replicating. Consider disabling lifecycle rules in the destination bucket if this will be an issue.
- Existing files will be replicated to the destination bucket regardless of the destination bucket's lifecycle rule configuration. It is possible that some newly created file replicas will be hidden or deleted when lifecycle rules are next applied (typically every 24 hours).
- Replica files are created with the same upload timestamp as their source file. If the destination bucket has a default object lock retention period set, the retention period is calculated based on the timestamp inherited from the source file.
Account Setup
Cross-Regional Replication and Authenticated Accounts
To create a replication rule that copies from US East to US West, you must connect two separate accounts, and you will need to log in to each account separately to manage its files. When you set up a Cloud Replication rule in the Web Console for the source account, you can authenticate to an account in the destination region and select a destination bucket.