Pixeltable Integration with Backblaze B2

    Article summary

    This guide shows you how to store and process images, video, audio, and documents in Backblaze B2 Cloud Storage while referencing them directly from Pixeltable using s3:// URLs.

    Prerequisites

    Make sure that you have Python 3.9 or later and pip installed, then complete the following steps.

    1. Enable Backblaze B2 on your account.

    2. Create a B2 bucket.
      Consider Lifecycle Rules to hide or delete temporary objects created by pipelines.
      If you need immutable data retention, enable Object Lock on your B2 bucket before ingest (bucket-level setting).

    3. Create an Application Key ID and Application Key with permissions to the target bucket.
      Use a separate Application Key for each environment (dev/stage/prod) and restrict each key to the minimum required bucket permissions.
      For cross-account access, share credentials through your CI secrets manager and inject them as environment variables at runtime.

    Install Dependencies

    Run the following command to install Pixeltable and boto3. To work in a virtual environment, first uncomment and run the line for your platform:

    # macOS / Linux
    # python3 -m venv .venv && source .venv/bin/activate
    
    # Windows
    # python -m venv .venv && .venv\Scripts\activate
    
    pip install --upgrade pixeltable boto3

    Configure Credentials for Backblaze B2

    Pixeltable can access s3:// objects using the same credentials lookup as boto3.

    1. Set the following environment variables on the system where your Pixeltable code runs (for example, your local machine, server, or container), replacing the values with your own. The endpoint shown is for the us-west-004 region; if your bucket is in another region, substitute that region's S3 endpoint.

      macOS/Linux
      export AWS_ACCESS_KEY_ID="<your B2 application key ID>"
      export AWS_SECRET_ACCESS_KEY="<your B2 application key>"
      export AWS_ENDPOINT_URL="https://s3.us-west-004.backblazeb2.com"
      Windows (setx only affects new terminal sessions, so open a new window afterward)
      setx AWS_ACCESS_KEY_ID "<your B2 application key ID>"
      setx AWS_SECRET_ACCESS_KEY "<your B2 application key>"
      setx AWS_ENDPOINT_URL "https://s3.us-west-004.backblazeb2.com"
    2. Verify your connection to Backblaze B2.

      # Run this in your terminal after setting the environment variables
      python3 - <<'PYCODE'
      import boto3
      s3 = boto3.client("s3")
      print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
      PYCODE
      # Run this in PowerShell after setting the environment variables.
      # PowerShell has no <<'PYCODE' heredoc, so pipe a here-string to Python instead.
      @'
      import boto3
      s3 = boto3.client("s3")
      print([b["Name"] for b in s3.list_buckets()["Buckets"]])  # should not error
      '@ | python -
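
    The variables set in step 1 can also be checked before any network call is made. The following is a minimal sketch, not part of the official setup: the helper names b2_endpoint and missing_b2_env are hypothetical, and the endpoint pattern is inferred from the example URL in step 1.

```python
import os

# The three variables that step 1 asks you to set.
REQUIRED_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_ENDPOINT_URL")


def b2_endpoint(region: str) -> str:
    """Build an S3-compatible endpoint URL, following the pattern shown in step 1."""
    return f"https://s3.{region}.backblazeb2.com"


def missing_b2_env(env=None) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_b2_env()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Credentials found; endpoint:", os.environ["AWS_ENDPOINT_URL"])
```

    Running this before the boto3 check makes it easier to tell a missing variable apart from a wrong key or endpoint.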

    Reference B2 objects from Pixeltable

    Pixeltable supports inserting media via local paths, HTTPS URLs, and s3:// URLs. With the environment configured above, s3:// will resolve against your B2 endpoint.
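
    An s3:// URL is the bucket name followed by the object key. As an illustration only, the sketch below builds and splits such URLs with the standard library; the helper names s3_url and split_s3_url are hypothetical and are not part of Pixeltable or boto3.

```python
from urllib.parse import urlsplit


def s3_url(bucket: str, key: str) -> str:
    """Join a bucket name and an object key into an s3:// URL."""
    return f"s3://{bucket}/{key.lstrip('/')}"


def split_s3_url(url: str) -> tuple[str, str]:
    """Split an s3:// URL into (bucket, key).

    Keys containing '?' or '#' are not handled by this simple sketch.
    """
    parts = urlsplit(url)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3:// URL: {url!r}")
    return parts.netloc, parts.path.lstrip("/")


if __name__ == "__main__":
    url = s3_url("my-bucket", "path/to/cat.jpg")
    print(url)
    print(split_s3_url(url))
```

    The bucket and key returned by split_s3_url are the same values that boto3 calls take as Bucket and Key, which can help when you check that an object exists before inserting it.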

    Images

    # Run this in your terminal after activating your virtual environment
    # and setting your AWS environment variables
    python3 - <<'PYCODE'
    import pixeltable as pxt
    
    # The "media" directory must exist before tables are created inside it
    pxt.create_dir("media", if_exists="ignore")
    
    images = pxt.create_table("media.images", {
        "id": pxt.Int,
        "img": pxt.Image,
    })
    
    # Insert objects that already exist in your B2 bucket
    rows = [
        {"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
        {"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
    ]
    images.insert(rows)
    PYCODE
    # Run this in PowerShell after activating your virtual environment
    # and setting your AWS environment variables
    @'
    import pixeltable as pxt
    
    # The "media" directory must exist before tables are created inside it
    pxt.create_dir("media", if_exists="ignore")
    
    images = pxt.create_table("media.images", {
        "id": pxt.Int,
        "img": pxt.Image,
    })
    
    # Insert objects that already exist in your B2 bucket
    rows = [
        {"id": 1, "img": "s3://my-bucket/path/to/cat.jpg"},
        {"id": 2, "img": "s3://my-bucket/path/to/dog.png"},
    ]
    images.insert(rows)
    '@ | python -

    Videos and documents

    # Run this in your terminal after activating your virtual environment
    # and setting your AWS environment variables
    python3 - <<'PYCODE'
    import pixeltable as pxt
    
    # Create the "media" directory if an earlier example has not already done so
    pxt.create_dir("media", if_exists="ignore")
    
    videos = pxt.create_table("media.videos", {"vid": pxt.Video})
    videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])
    
    docs = pxt.create_table("media.docs", {"doc": pxt.Document})
    docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
    PYCODE
    # Run this in PowerShell after activating your virtual environment
    # and setting your AWS environment variables
    @'
    import pixeltable as pxt
    
    # Create the "media" directory if an earlier example has not already done so
    pxt.create_dir("media", if_exists="ignore")
    
    videos = pxt.create_table("media.videos", {"vid": pxt.Video})
    videos.insert([{"vid": "s3://my-bucket/videos/clip.mp4"}])
    
    docs = pxt.create_table("media.docs", {"doc": pxt.Document})
    docs.insert([{"doc": "s3://my-bucket/docs/report.pdf"}])
    '@ | python -
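
    When a bucket holds mixed media, the object keys need to be routed to the matching table. The sketch below is one possible way to do that, under stated assumptions: the function name rows_by_kind and the extension map are illustrative and are not a list of formats that Pixeltable supports.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-kind map; extend it to match your own data.
KIND_BY_SUFFIX = {
    ".jpg": "image", ".jpeg": "image", ".png": "image",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".wav": "audio",
    ".pdf": "document",
}


def rows_by_kind(bucket: str, keys: list[str]) -> dict[str, list[str]]:
    """Group object keys by media kind and return s3:// URLs for each kind."""
    grouped: dict[str, list[str]] = {}
    for key in keys:
        kind = KIND_BY_SUFFIX.get(PurePosixPath(key).suffix.lower())
        if kind is None:
            continue  # skip objects with an unrecognized extension
        grouped.setdefault(kind, []).append(f"s3://{bucket}/{key}")
    return grouped


if __name__ == "__main__":
    keys = ["path/to/cat.jpg", "videos/clip.mp4", "docs/report.pdf", "notes.txt"]
    for kind, urls in rows_by_kind("my-bucket", keys).items():
        print(kind, urls)
```

    Each list of URLs can then be wrapped into rows, for example {"vid": url} for the videos table, and passed to insert. The keys themselves would typically come from a bucket listing, such as the list_objects_v2 paginator in boto3.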

    Running AI transforms and queries

    You can index and search media or run models directly. Example: create an image embedding index and find similar images.

    # Run this in your terminal after activating your virtual environment
    # and setting your AWS environment variables
    python3 - <<'PYCODE'
    import pixeltable as pxt
    from pixeltable.functions.huggingface import clip
    
    # Each script runs in a new process, so look up the table created earlier
    images = pxt.get_table("media.images")
    
    images.add_embedding_index(
        "img",
        embedding=clip.using(model_id="openai/clip-vit-base-patch32")
    )
    
    # Use the image stored in row id == 1 as the query image
    sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]
    res = (
        images.order_by(images.img.similarity(sample), asc=False)
              .limit(5)
              .select(images.id, images.img)
              .collect()
    )
    print(res)
    PYCODE
    # Run this in PowerShell after activating your virtual environment
    # and setting your AWS environment variables
    @'
    import pixeltable as pxt
    from pixeltable.functions.huggingface import clip
    
    # Each script runs in a new process, so look up the table created earlier
    images = pxt.get_table("media.images")
    
    images.add_embedding_index(
        "img",
        embedding=clip.using(model_id="openai/clip-vit-base-patch32")
    )
    
    # Use the image stored in row id == 1 as the query image
    sample = images.where(images.id == 1).select(images.img).collect()[0]["img"]
    res = (
        images.order_by(images.img.similarity(sample), asc=False)
              .limit(5)
              .select(images.id, images.img)
              .collect()
    )
    print(res)
    '@ | python -

