Pixeltable Integration with Backblaze B2
This guide shows you how to store and process images, video, audio, and documents in Backblaze B2 Cloud Storage while referencing them directly from Pixeltable using s3:// URLs.
Prerequisites
You must have Python 3.9+ and pip installed, and complete the following steps.
Create a B2 bucket (see the sketch after these prerequisites).
Consider Lifecycle Rules to transition or clean up temporary objects created by pipelines.
If you need immutable data retention, enable Object Lock on your B2 bucket before ingest (it is a bucket-level setting).
Create an Application Key ID and Application Key with permissions to the target bucket.
Use separate Application Keys per environment (dev/stage/prod) and restrict to the minimum required bucket permissions.
For cross-account access, share credentials via your CI secrets manager and inject as environment variables at runtime.
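If you prefer to script the bucket setup, the boto3 sketch below creates a bucket and attaches a simple lifecycle rule through B2's S3-compatible API. It assumes the credentials configured in the sections that follow; the bucket name, prefix, retention period, and endpoint are placeholders, so verify the calls against your B2 region before relying on them.
# Python sketch; run it the same way as the verification snippet below
import boto3
# Placeholder endpoint and bucket name; substitute your own values
s3 = boto3.client("s3", endpoint_url="https://s3.us-west-004.backblazeb2.com")
bucket = "my-bucket"
# Create the bucket; Object Lock (immutable retention) must be enabled at
# creation time, e.g. via ObjectLockEnabledForBucket=True
s3.create_bucket(Bucket=bucket)
# Expire temporary pipeline objects under the tmp/ prefix after 7 days
s3.put_bucket_lifecycle_configuration(
    Bucket=bucket,
    LifecycleConfiguration={
        "Rules": [
            {
                "ID": "expire-tmp",
                "Filter": {"Prefix": "tmp/"},
                "Status": "Enabled",
                "Expiration": {"Days": 7},
            }
        ]
    },
)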
Install Dependencies
Enter the following command to install the necessary dependencies:
# macOS / Linux
# python3 -m venv .venv && source .venv/bin/activate
# Windows
# python -m venv .venv && .venv\Scripts\activate
pip install --upgrade pixeltable boto3
Configure Credentials for Backblaze B2
Pixeltable can access s3:// objects using the same credentials lookup as boto3.
Set the following environment variables on the system where your Pixeltable code runs (for example, your local machine, a server, or a container), replacing the values with your own. If your bucket is in another region, substitute the correct regional S3 endpoint.
macOS/Linux
export AWS_ACCESS_KEY_ID="<your B2 application key ID>"
export AWS_SECRET_ACCESS_KEY="<your B2 application key>"
export AWS_ENDPOINT_URL="https://s3.us-west-004.backblazeb2.com"
Windows
setx AWS_ACCESS_KEY_ID "<your B2 application key ID>"
setx AWS_SECRET_ACCESS_KEY "<your B2 application key>"
setx AWS_ENDPOINT_URL "https://s3.us-west-004.backblazeb2.com"
(setx applies to new sessions only; open a new terminal before continuing.)
Verify your connection to Backblaze B2.
# Run this in your terminal after setting the environment variables
python3 - <<'PYCODE'
import boto3
s3 = boto3.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
PYCODE
# Run this in PowerShell after setting the environment variables
@'
import boto3
s3 = boto3.client("s3")
print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
'@ | python -
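The Pixeltable examples below reference objects that already exist in your bucket. If you still need to upload test media, a minimal boto3 sketch such as the following works with the same credentials; the local file name, bucket, and object key are placeholders.
# Python sketch; run it the same way as the verification snippet above
import boto3
s3 = boto3.client("s3")  # picks up the AWS_* environment variables set above
# Hypothetical names: substitute your own local file, bucket, and object key
s3.upload_file("cat.jpg", "my-bucket", "path/to/cat.jpg")
print(s3.head_object(Bucket="my-bucket", Key="path/to/cat.jpg")["ContentLength"])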
Reference B2 objects from Pixeltable
Pixeltable supports inserting media via local paths, HTTPS URLs, and s3:// URLs. With the environment configured above, s3:// URLs resolve against your B2 endpoint.
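As an illustration of mixing source types, the sketch below inserts a local file, an HTTPS URL, and a B2 object into one table; the paths, URLs, and the demo directory are placeholders, and only sources that actually exist will insert successfully.
# Python sketch; run it the same way as the examples below
import pixeltable as pxt
pxt.create_dir("demo")  # parent directory for the illustrative table
t = pxt.create_table("demo.mixed_sources", {"img": pxt.Image})
t.insert([
    {"img": "/data/local/cat.jpg"},                 # local path
    {"img": "https://example.com/images/dog.png"},  # HTTPS URL
    {"img": "s3://my-bucket/path/to/bird.jpg"},     # B2 object via s3://
])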
Images
# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
import pixeltable as pxt
pxt.create_dir("media")  # tables below live in the 'media' directory; it must exist first
images = pxt.create_table("media.images", {
    "id": pxt.Int,
    "img": pxt.Image,
})
# Insert objects that already exist in your B2 bucket
rows = [
{"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
{"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
]
images.insert(rows)
PYCODE
# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
@'
import pixeltable as pxt
pxt.create_dir("media")  # tables below live in the 'media' directory; it must exist first
images = pxt.create_table("media.images", {
    "id": pxt.Int,
    "img": pxt.Image,
})
# Insert objects that already exist in your B2 bucket
rows = [
{"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
{"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
]
images.insert(rows)
'@ | python -
Videos and Documents
# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
import pixeltable as pxt
# Assumes the 'media' directory created in the Images example already exists
videos = pxt.create_table("media.videos", {"vid": pxt.Video})
videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])
docs = pxt.create_table("media.docs", {"doc": pxt.Document})
docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
PYCODE
# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
@'
import pixeltable as pxt
# Assumes the 'media' directory created in the Images example already exists
videos = pxt.create_table("media.videos", {"vid": pxt.Video})
videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])
docs = pxt.create_table("media.docs", {"doc": pxt.Document})
docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
'@ | python -
Running AI transforms and queries
You can index and search media or run models directly. Example: create an image embedding index and find similar images.
# Run this in your terminal after activating your virtual environment
# and setting your AWS environment variables
python3 - <<'PYCODE'
import pixeltable as pxt
from pixeltable.functions.huggingface import clip  # may also require the torch and transformers packages
images = pxt.get_table("media.images")  # the table created in the Images example above
images.add_embedding_index(
"img",
embedding=clip.using(model_id="openai/clip-vit-base-patch32")
)
sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]
res = (
images.order_by(images.img.similarity(sample), asc=False)
.limit(5)
.select(images.id, images.img)
.collect()
)
print(res)
PYCODE
# Run this in PowerShell after activating your virtual environment
# and setting your AWS environment variables
@'
import pixeltable as pxt
from pixeltable.functions.huggingface import clip  # may also require the torch and transformers packages
images = pxt.get_table("media.images")  # the table created in the Images example above
images.add_embedding_index(
"img",
embedding=clip.using(model_id="openai/clip-vit-base-patch32")
)
sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]
res = (
images.order_by(images.img.similarity(sample), asc=False)
.limit(5)
.select(images.id, images.img)
.collect()
)
print(res)
'@ | python -
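Because the index above is built with a CLIP model, you can also rank images against a natural-language description. This is a minimal sketch assuming your Pixeltable version supports text queries against an image embedding index; the query string is only an example.
# Python sketch; run it the same way as the examples above
import pixeltable as pxt
images = pxt.get_table("media.images")
sim = images.img.similarity("a photo of a cat sleeping")  # hypothetical query text
res = (
    images.order_by(sim, asc=False)
    .limit(5)
    .select(images.id, images.img)
    .collect()
)
print(res)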