Create a ZIP file in a Backblaze B2 Bucket
If you have a set of files in Backblaze B2 Cloud Storage, you may want to automatically combine the files into a single zip file and store it in a Backblaze B2 bucket. The Backblaze B2 GitHub page provides a sample application for this purpose.
This web app accepts a list of files to be compressed and the name of a zip file to be created. Since reading data from cloud object storage, compressing it, and then writing the compressed data back can take some time, the app responds with HTTP status `202 ACCEPTED` as soon as it receives and parses a request, then launches a background job to perform the work.
The app is implemented in Python using the Flask web application framework and the flask-executor task queue. You can run the app in the Flask development server, the Gunicorn WSGI HTTP Server, or a Docker container.
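The accept-then-work pattern the app uses can be sketched with nothing but the Python standard library. In this sketch, `ThreadPoolExecutor` stands in for flask-executor, and the function and payload names are illustrative, not taken from the sample app:

```python
from concurrent.futures import ThreadPoolExecutor

# Stand-in for flask-executor: a thread pool that runs jobs off the request thread
executor = ThreadPoolExecutor(max_workers=4)

def zip_files(files, target):
    # Placeholder for the real work: read each file from B2,
    # stream it into a zip, and write the zip back to the bucket
    return f'wrote {target} containing {len(files)} files'

def handle_request(payload):
    # Parse the request, hand the work to the background pool, and
    # return immediately - the caller would see 202 ACCEPTED
    future = executor.submit(zip_files, payload['files'], payload['target'])
    return 202, future

status, job = handle_request({'files': ['a.txt', 'b.txt'], 'target': 'out.zip'})
print(status)        # 202, returned before the job finishes
print(job.result())  # wrote out.zip containing 2 files
```

The key point is that `handle_request` returns before the zip job completes, which is exactly why the real app answers with `202 ACCEPTED` rather than `200 OK`.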
Create a Backblaze B2 Account, Bucket, and Application Key
Follow these instructions, as necessary:
- Create a Backblaze B2 Account
- Create a Backblaze B2 Bucket
- Create an Application Key with access to the bucket you wish to use
Be sure to copy the application key as soon as you create it, as you will not be able to retrieve it later.
Download the Source Code
```
$ git clone git@github.com:backblaze-b2-samples/b2-zip-files.git
Cloning into 'b2-zip-files'...
remote: Enumerating objects: 60, done.
remote: Counting objects: 100% (60/60), done.
...
$ cd b2-zip-files
```
Configuration
The app reads its configuration from a set of environment variables. The easiest way to manage these in many circumstances is via a `.env` file. Copy the included `.env.template` to `.env`:
$ cp .env.template .env
Now edit `.env`, pasting in your application key, its ID, bucket name, and endpoint:
```
LOGLEVEL=DEBUG
AWS_ACCESS_KEY_ID='<Your Backblaze B2 Application Key ID>'
AWS_SECRET_ACCESS_KEY='<Your Backblaze B2 Application Key>'
AWS_ENDPOINT_URL='<Your bucket endpoint, prefixed with https://, for example, https://s3.us-west-004.backblazeb2.com>'
BUCKET_NAME='<Your Backblaze B2 bucket name>'
SHARED_SECRET='<A long random string known only to the app and its authorized clients>'
```
You can configure different buckets for input and output files, if you wish, by replacing the `BUCKET_NAME` line with the following:
```
INPUT_BUCKET_NAME='<Bucket with files to be zipped>'
OUTPUT_BUCKET_NAME='<Bucket for zip files>'
```
Note that, if you do use two buckets, your application key needs to have permissions to access both.
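One way an app might resolve this either/or configuration is to fall back from the two specific names to the single shared name. This is a sketch, assuming the variable names above; the helper name is illustrative, not taken from the sample app:

```python
import os

def resolve_buckets(env=None):
    # Prefer explicit input/output bucket names; fall back to the
    # single BUCKET_NAME for both roles if they are not set
    env = os.environ if env is None else env
    default = env.get('BUCKET_NAME')
    input_bucket = env.get('INPUT_BUCKET_NAME', default)
    output_bucket = env.get('OUTPUT_BUCKET_NAME', default)
    if not input_bucket or not output_bucket:
        raise ValueError('Set BUCKET_NAME, or both INPUT_BUCKET_NAME and OUTPUT_BUCKET_NAME')
    return input_bucket, output_bucket

print(resolve_buckets({'BUCKET_NAME': 'my-bucket'}))
# ('my-bucket', 'my-bucket')
print(resolve_buckets({'INPUT_BUCKET_NAME': 'in', 'OUTPUT_BUCKET_NAME': 'out'}))
# ('in', 'out')
```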
Running the App in Docker
The easiest way to run the app is via Docker, since Docker is then the only prerequisite.
First, build a Docker image. You can tag it to make it easier to work with later:
```
$ docker build -t docker-user-name/b2-zip-files .
[+] Building 7.5s (12/12) FINISHED                      docker:desktop-linux
 => [internal] load build definition from Dockerfile                    0.0s
 => => transferring dockerfile: 978B                                    0.0s
 => [internal] load metadata for docker.io/library/python:3.10          0.9s
...
```
Now you can start a Docker container, reading the environment variables from `.env`. Gunicorn is installed in the Docker container and is configured to listen on port 8000, so you will need to use Docker's `-p` option to bind port 8000 to an available port on your machine. For example, if you wanted the Docker container to listen on port 80, you would run:
```
$ docker run -p 80:8000 --env-file .env docker-user-name/b2-zip-files:latest
[2024-06-28 23:04:47 +0000] [1] [DEBUG] Current configuration:
  config: python:config.gunicorn
  wsgi_app: None
...
DEBUG:app.py:Connected to B2, my-bucket exists.
```
Once the app is running, you can send it a request.
You can publish the image to a repository and run it in a container on any cloud provider that supports Docker. For example, to deploy the app to AWS Fargate for Amazon ECS, you would push your image to Amazon Elastic Container Registry, then create an Amazon ECS Linux task for the Fargate launch type.
Running the App on the Local Machine
Create a Python Virtual Environment
Virtual environments allow you to encapsulate a project's dependencies. We recommend that you create a virtual environment, as follows:
$ python3 -m venv .venv
You must then activate the virtual environment before installing dependencies:
$ source .venv/bin/activate
You will need to reactivate the virtual environment, with the same command, if you close your Terminal window and return to the app later.
Install Python Dependencies
$ pip install -r requirements.txt
Running the App in the Flask Development Server
Once you have configured the app, created a virtual environment, and installed the dependencies, the simplest way to run the app is in the Flask development server. By default, the app will listen on `http://127.0.0.1:5000`:
```
$ flask run
DEBUG:app.py:Connected to B2, my-bucket exists.
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on http://127.0.0.1:5000
INFO:werkzeug:Press CTRL+C to quit
```
You can use the `--host` and `--port` options to configure a different interface and/or port:
```
$ flask run --host=0.0.0.0 --port=8000
DEBUG:app.py:Connected to B2, my-bucket exists.
 * Debug mode: off
INFO:werkzeug:WARNING: This is a development server. Do not use it in a production deployment. Use a production WSGI server instead.
 * Running on all addresses (0.0.0.0)
 * Running on http://127.0.0.1:8000
 * Running on http://192.168.69.12:8000
INFO:werkzeug:Press CTRL+C to quit
...
```
Once the app is running, you can send it a request.
Running the App in Gunicorn
Gunicorn does not read environment variables from a `.env` file, but you can use the shell to work around that if you are running Gunicorn from the command line:
```
$ (export $(cat .env | xargs) && gunicorn --config python:config.gunicorn app:app)
[2024-06-28 14:21:43 -0700] [56698] [INFO] Starting gunicorn 22.0.0
[2024-06-28 14:21:43 -0700] [56698] [INFO] Listening at: http://0.0.0.0:8000 (56698)
[2024-06-28 14:21:43 -0700] [56698] [INFO] Using worker: sync
[2024-06-28 14:21:43 -0700] [56711] [INFO] Booting worker with pid: 56711
[2024-06-28 14:21:43 -0700] [56712] [INFO] Booting worker with pid: 56712
[2024-06-28 14:21:43 -0700] [56713] [INFO] Booting worker with pid: 56713
DEBUG:app.py:Connected to B2, my-bucket exists.
...
```
Once the app is running, you can send it a request.
If you are running Gunicorn as a service, you must ensure that you set the above variables in its environment.
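For example, a systemd unit can load the variables directly from the `.env` file via `EnvironmentFile`. This is only a sketch; the unit name and paths are illustrative and will differ on your system:

```ini
# /etc/systemd/system/b2-zip-files.service (illustrative)
[Unit]
Description=B2 zip files app
After=network-online.target

[Service]
WorkingDirectory=/opt/b2-zip-files
EnvironmentFile=/opt/b2-zip-files/.env
ExecStart=/opt/b2-zip-files/.venv/bin/gunicorn --config python:config.gunicorn app:app
Restart=on-failure

[Install]
WantedBy=multi-user.target
```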
Sending Requests to the App
However you run the app, clients send requests in the same way, setting the `Authorization` and `Content-Type` HTTP headers and sending a JSON payload.
- The `Authorization` header must be of the form `Authorization: Bearer <your shared secret>`
- The `Content-Type` header must specify JSON content: `Content-Type: application/json`
- The payload must be JSON, of the form:

```
{
  "files": [
    "path/to/first/file.pdf",
    "path/to/second/file.txt",
    "path/to/third/file.csv"
  ],
  "target": "path/to/output/file.zip"
}
```
For example, using `curl` with the `-i` option to send a request from the Mac/Linux command line:
```
$ curl -i -d '{
    "files": [
      "path/to/first/file.pdf",
      "path/to/second/file.txt",
      "path/to/third/file.csv"
    ],
    "target": "path/to/output/file.zip"
  }' http://127.0.0.1:8080 -H 'Content-Type: application/json' -H 'Authorization: Bearer my-long-random-string-of-characters'
HTTP/1.1 202 ACCEPTED
Server: gunicorn
Date: Fri, 28 Jun 2024 23:17:24 GMT
Connection: close
Content-Type: text/html; charset=utf-8
Content-Length: 0
```
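You can construct the same request from Python. This sketch only builds the headers and body (the helper name is illustrative), which you could then POST to the app with any HTTP client:

```python
import json

def build_zip_request(files, target, shared_secret):
    # Assemble the headers and JSON body the app expects
    headers = {
        'Content-Type': 'application/json',
        'Authorization': f'Bearer {shared_secret}',
    }
    body = json.dumps({'files': files, 'target': target})
    return headers, body

headers, body = build_zip_request(
    ['path/to/first/file.pdf', 'path/to/second/file.txt'],
    'path/to/output/file.zip',
    'my-long-random-string-of-characters',
)
print(headers['Authorization'])    # Bearer my-long-random-string-of-characters
print(json.loads(body)['target'])  # path/to/output/file.zip
```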
Note that, as mentioned above, the app responds to the request immediately with `202 ACCEPTED`. You should be able to see the app's progress in the Flask/Gunicorn/Docker log output. For example:
```
[2024-06-28 23:17:24 +0000] [27] [DEBUG] POST /
DEBUG:app.py:Request: {
    "files": [
      "path/to/first/file.pdf",
      "path/to/second/file.txt",
      "path/to/third/file.csv"
    ],
    "target": "path/to/output/file.zip"
  }
DEBUG:app.py:Opening my-bucket/path/to/output/file.zip for writing as a ZIP
DEBUG:app.py:Writing my-bucket/path/to/first/file.pdf to ZIP
DEBUG:app.py:Wrote my-bucket/path/to/first/file.pdf to ZIP
...
DEBUG:app.py:Finished writing my-bucket/path/to/output/file.zip in 11.175 seconds.
DEBUG:app.py:Read 1667163 bytes, wrote 1116999 bytes, compression ratio was 67%
DEBUG:app.py:Currently using 70 MB
```
Provided you use a target file name that does not already exist, your client can periodically poll the target file name until it is available. Here's a minimal example of how to do so using Boto3, the AWS SDK for Python:
```python
import time

import boto3
from botocore.exceptions import ClientError

# Boto3 picks up AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, and
# AWS_ENDPOINT_URL from the environment, as set in .env above
s3_client = boto3.client('s3')

# bucket and key identify the zip file the app is writing
while True:
    try:
        # Get information on the object; this succeeds once the zip exists
        s3_client.head_object(
            Bucket=bucket,
            Key=key
        )
        print(f'{bucket}/{key} is available')
        break
    except ClientError as err:
        if err.response['ResponseMetadata']['HTTPStatusCode'] == 404:
            # The object was not found - sleep for a second, then try again
            time.sleep(1)
        else:
            # Some other problem!
            raise
```
Going Further
To view the entire project, see the Backblaze B2 GitHub page. Feel free to fork this repository and use it as a starting point for your own app. Let us know at [email protected] if you come up with something interesting.