Pixeltable Integration with Backblaze B2

This guide shows you how to store and process images, video, audio, and documents in Backblaze B2 Cloud Storage while referencing them directly from Pixeltable using s3:// URLs.

Prerequisites

You need Python 3.9 or later and pip. Before you begin, complete the following steps.

Enable Backblaze B2 on your account.
Create a B2 bucket.
Consider Lifecycle Rules to transition or clean up temporary objects created by pipelines.
If you need immutable data retention, enable Object Lock on your B2 bucket before ingest (bucket-level setting).
Create an Application Key ID and Application Key with permissions to the target bucket.
Use separate Application Keys per environment (dev/stage/prod) and restrict to the minimum required bucket permissions.
For cross-account access, share credentials through your CI secrets manager and inject them as environment variables at runtime.
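Because credentials are injected as environment variables at runtime, a pipeline can check for them before it starts any work. The following is a minimal sketch, not part of Pixeltable or boto3: `missing_b2_env` and `require_b2_env` are hypothetical helpers, and the variable names are the ones this guide sets in "Configure Credentials for Backblaze B2".

```python
import os

# Variable names used in the "Configure Credentials for Backblaze B2" section.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL")


def missing_b2_env(env=None):
    """Return the required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def require_b2_env(env=None):
    """Raise a clear error if any required variable is missing."""
    missing = missing_b2_env(env)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_b2_env()` at the top of a job makes a misconfigured environment fail immediately with the names of the missing variables, rather than partway through an insert.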

Install Dependencies

Run the following command to install the necessary dependencies. The commented lines show how to create and activate a virtual environment first, which is optional.

# macOS / Linux
# python3 -m venv .venv && source .venv/bin/activate

# Windows
# python -m venv .venv && .venv\Scripts\activate

pip install --upgrade pixeltable boto3

Configure Credentials for Backblaze B2

Pixeltable can access s3:// objects using the same credentials lookup as boto3.

Set the following environment variables on the system where your Pixeltable code runs (for example, your local machine, server, or container), replacing the placeholder values with your own. If your bucket is in another region, substitute the correct regional S3 endpoint.

macOS/Linux
export AWS_ACCESS_KEY_ID="<your B2 application key ID>"
export AWS_SECRET_ACCESS_KEY="<your B2 application key>"
export AWS_ENDPOINT_URL="https://s3.us-west-004.backblazeb2.com"

Windows (setx applies to new terminal windows, so open a new one afterward)
setx AWS_ACCESS_KEY_ID "<your B2 application key ID>"
setx AWS_SECRET_ACCESS_KEY "<your B2 application key>"
setx AWS_ENDPOINT_URL "https://s3.us-west-004.backblazeb2.com"
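For your own boto3 scripts, the endpoint can also be passed explicitly instead of relying on `AWS_ENDPOINT_URL`. The sketch below is illustrative only: `b2_endpoint_url` and `b2_client_kwargs` are hypothetical helpers, and the URL pattern is an assumption generalized from the example endpoint in this guide, so confirm the endpoint shown for your bucket before using it.

```python
import os


def b2_endpoint_url(region):
    """Build an S3-compatible endpoint URL from a region such as 'us-west-004'.

    Assumption: other regions follow the same pattern as this guide's example.
    """
    region = region.strip()
    if not region:
        raise ValueError("region must not be empty")
    return f"https://s3.{region}.backblazeb2.com"


def b2_client_kwargs(region, env=None):
    """Collect keyword arguments for boto3.client("s3", **kwargs) (hypothetical helper)."""
    env = os.environ if env is None else env
    return {
        "endpoint_url": b2_endpoint_url(region),
        "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
        "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
    }


# Usage (requires boto3 and real credentials):
#   import boto3
#   s3 = boto3.client("s3", **b2_client_kwargs("us-west-004"))
```

This only affects clients you create yourself. Pixeltable still resolves s3:// URLs through the environment variables described above.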

Verify your connection to Backblaze B2.

# Run this in your terminal after setting the environment variables
python3 - <<'PYCODE'
import boto3
s3 = boto3.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
PYCODE

# Run this in PowerShell after setting the environment variables
# (PowerShell has no <<'PYCODE' heredoc; pipe a here-string to Python instead)
@'
import boto3
s3 = boto3.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
'@ | python -

Reference B2 objects from Pixeltable

Pixeltable supports inserting media via local paths, HTTPS URLs, and s3:// URLs. With the environment configured above, s3:// will resolve against your B2 endpoint.
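When a bucket holds many objects, the s3:// URLs can be generated from a listing instead of typed by hand. This is a minimal sketch under stated assumptions: `list_s3_urls` and `to_rows` are hypothetical helpers, and `client` is expected to behave like `boto3.client("s3")` (it uses the `list_objects_v2` paginator). The client is passed in so the functions can be tried without network access.

```python
def list_s3_urls(client, bucket, prefix=""):
    """Return s3:// URLs for every object under prefix.

    `client` is expected to behave like boto3.client("s3").
    """
    paginator = client.get_paginator("list_objects_v2")
    urls = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when no objects match.
        for obj in page.get("Contents", []):
            urls.append(f"s3://{bucket}/{obj['Key']}")
    return urls


def to_rows(urls, column, start_id=1):
    """Turn URLs into insert-ready dicts with sequential ids."""
    return [{"id": i, column: url} for i, url in enumerate(urls, start=start_id)]
```

With a real client, `to_rows(list_s3_urls(s3, "my-bucket", "path/to/"), "img")` produces rows in the same shape as the image example that follows. Every object under the prefix is included regardless of file type, so choose a prefix that contains only the media you want.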

Images

# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
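import pixeltable as pxt

# Make sure the parent directory exists before creating tables under it
pxt.create_dir("media", if_exists="ignore")

images = pxt.create_table("media.images", {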
    "id": pxt.Int,
    "img": pxt.Image,
})

# Insert objects that already exist in your B2 bucket
rows = [
    {"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
    {"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
]
images.insert(rows)
PYCODE

# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
# (PowerShell has no <<'PYCODE' heredoc; pipe a here-string to Python instead)
@'
import pixeltable as pxt

# Make sure the parent directory exists before creating tables under it
pxt.create_dir("media", if_exists="ignore")

images = pxt.create_table("media.images", {
    "id": pxt.Int,
    "img": pxt.Image,
})

# Insert objects that already exist in your B2 bucket
rows = [
    {"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
    {"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
]
images.insert(rows)
'@ | python -

Videos and documents

# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
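import pixeltable as pxt

# Make sure the parent directory exists before creating tables under it
pxt.create_dir("media", if_exists="ignore")

videos = pxt.create_table("media.videos", {"vid": pxt.Video})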
videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])

docs = pxt.create_table("media.docs", {"doc": pxt.Document})
docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
PYCODE

# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
# (PowerShell has no <<'PYCODE' heredoc; pipe a here-string to Python instead)
@'
import pixeltable as pxt

# Make sure the parent directory exists before creating tables under it
pxt.create_dir("media", if_exists="ignore")

videos = pxt.create_table("media.videos", {"vid": pxt.Video})
videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])

docs = pxt.create_table("media.docs", {"doc": pxt.Document})
docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
'@ | python -
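If one bucket mixes images, videos, and documents, the URLs can be sorted by file extension before they are inserted into the matching table. The following is a sketch only: `MEDIA_COLUMNS` is a hypothetical extension map and `group_rows_by_column` a hypothetical helper. The column names `img`, `vid`, and `doc` are the ones used in the examples above.

```python
from pathlib import PurePosixPath

# Hypothetical extension map; extend it to match the formats in your bucket.
MEDIA_COLUMNS = {
    ".jpg": "img", ".jpeg": "img", ".png": "img",
    ".mp4": "vid", ".mov": "vid",
    ".pdf": "doc",
}


def group_rows_by_column(urls):
    """Group s3:// URLs into insert-ready rows keyed by column name."""
    grouped = {}
    for url in urls:
        suffix = PurePosixPath(url).suffix.lower()
        column = MEDIA_COLUMNS.get(suffix)
        if column is None:
            continue  # skip formats the map does not cover
        grouped.setdefault(column, []).append({column: url})
    return grouped
```

Each group can then go to its own table, for example `videos.insert(grouped["vid"])` when the `"vid"` key is present. Rows for a table that also has an `id` column, such as the images table, need that field added first.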

Running AI transforms and queries

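You can index and search media or run models directly. The following example creates an image embedding index on the images table from the previous section and finds the images most similar to a sample. It uses a Hugging Face CLIP model, which may require additional packages (such as transformers and torch) beyond those installed earlier.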

# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
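import pixeltable as pxt
from pixeltable.functions.huggingface import clip

# Each snippet runs in a new Python process, so reopen the table created earlier
images = pxt.get_table("media.images")

images.add_embedding_index(
    "img",
    embedding=clip.using(model_id="openai/clip-vit-base-patch32")
)

# Use the image stored in the row with id 1 as the query
sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]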
res = (
    images.order_by(images.img.similarity(sample), asc=False)
          .limit(5)
          .select(images.id, images.img)
          .collect()
)
print(res)
PYCODE

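# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
# (PowerShell has no <<'PYCODE' heredoc; pipe a here-string to Python instead)
@'
import pixeltable as pxt
from pixeltable.functions.huggingface import clip

# Each snippet runs in a new Python process, so reopen the table created earlier
images = pxt.get_table("media.images")

images.add_embedding_index(
    "img",
    embedding=clip.using(model_id="openai/clip-vit-base-patch32")
)

# Use the image stored in the row with id 1 as the query
sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]
res = (
    images.order_by(images.img.similarity(sample), asc=False)
          .limit(5)
          .select(images.id, images.img)
          .collect()
)
print(res)
'@ | python -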
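Conceptually, ordering by similarity ranks rows by how close their embedding vectors are to the embedding of the query image. The plain-Python sketch below illustrates that ranking idea only. It assumes cosine similarity as the metric, uses made-up two-dimensional vectors in place of real CLIP embeddings, and is not how Pixeltable implements its index. `cosine_similarity` and `top_k` are hypothetical names.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    if norm == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / norm


def top_k(query, embeddings, k=5):
    """Return up to k (id, score) pairs, most similar first."""
    scored = [(item_id, cosine_similarity(query, vec)) for item_id, vec in embeddings.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: ids mapped to illustrative vectors, not real embeddings.
toy_embeddings = {1: [1.0, 0.0], 2: [0.0, 1.0], 3: [1.0, 1.0]}
ranked = top_k([1.0, 0.0], toy_embeddings, k=2)
```

Here `ranked` lists id 1 first, because its vector points in the same direction as the query, followed by id 3. This mirrors how the query above returns the closest images first when ordered with `asc=False`.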

Additional Resource

This notebook shows how to use Pixeltable to extract and manage video frames stored in Backblaze B2, organizing the frames into a queryable, multimodal dataset that can support downstream AI and data processing workflows such as indexing, transformation, and model inference.
