Native API String Encoding

    Strings are passed into and out of the Backblaze B2 Cloud Storage APIs in four places: JSON objects, HTTP headers, URL paths, and URL query strings. This article explains how to encode Unicode characters in each of them.

    For all of the Backblaze API operations and their corresponding documentation, see API Documentation.

    JSON

    Most Backblaze B2 API calls take a JSON request and return JSON results. In these calls, strings are represented in UTF-8, and special characters are escaped in the usual JSON way: most characters can be included directly in a string, and only characters such as quotes, backslashes, and newlines need the normal JSON escapes.

    If you prefer, you can represent any character using backslash-u syntax with the UTF-16 value of the character. The following two JSON strings are equivalent:

    "自由.txt"
    "\u81ea\u7531.txt"
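One way to confirm that the two spellings are equivalent is to decode both with a JSON parser. This is a minimal sketch using Python's standard `json` module; any conforming JSON parser should give the same result.

```python
import json

# The same file name written two ways: literal characters, and backslash-u
# escapes of the UTF-16 values U+81EA and U+7531.
literal = '"自由.txt"'
escaped = '"\\u81ea\\u7531.txt"'

decoded_literal = json.loads(literal)
decoded_escaped = json.loads(escaped)

print(decoded_literal == decoded_escaped)  # True
print(decoded_escaped)                     # 自由.txt
```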

    Note that backslash-u syntax in JSON is in UTF-16, so for code points above 0xffff, you must use the high and low surrogates for the character. For example, the code point 0x10400 is the character DESERET CAPITAL LETTER LONG I, and is written like this in JSON: "\ud801\udc00". For more information, see Wikipedia's UTF-16 article.
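The surrogate arithmetic is short enough to check by hand. The sketch below (the helper name `utf16_surrogates` is hypothetical) applies the standard UTF-16 rule: subtract 0x10000, add the top 10 bits to 0xD800, and add the bottom 10 bits to 0xDC00.

```python
import json

def utf16_surrogates(code_point):
    """Split a code point above U+FFFF into its UTF-16 high and low surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point does not need a surrogate pair")
    offset = code_point - 0x10000        # a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits
    return high, low

high, low = utf16_surrogates(0x10400)    # DESERET CAPITAL LETTER LONG I
escape = "\\u%04x\\u%04x" % (high, low)
print(escape)                            # \ud801\udc00

# A JSON parser recombines the escaped pair into the single character U+10400.
print(json.loads('"%s"' % escape) == "\U00010400")  # True
```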

    In values returned by Backblaze B2, UTF-8 characters appear directly in the string wherever JSON allows it. Backblaze B2 generates backslash-u escapes only for characters with code points below U+0020 (control characters).

    The JSON library for your language probably handles all of this correctly already, so you should not need to do anything special.
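For example, Python's standard `json` module escapes every non-ASCII character by default; passing `ensure_ascii=False` keeps the characters as-is, which is closer to the form Backblaze B2 returns. Both outputs are valid JSON and decode to the same value. The `fileName` key below is only illustrative.

```python
import json

payload = {"fileName": "自由.txt"}

escaped = json.dumps(payload)                     # non-ASCII becomes \uXXXX
direct = json.dumps(payload, ensure_ascii=False)  # characters kept as-is
control = json.dumps("a\x01b", ensure_ascii=False)

print(escaped)  # {"fileName": "\u81ea\u7531.txt"}
print(direct)   # {"fileName": "自由.txt"}
print(control)  # "a\u0001b"  (control characters are still escaped)

# Either form parses back to the same object; encode the text as UTF-8
# to get the bytes for an HTTP request body.
assert json.loads(escaped) == json.loads(direct) == payload
body = direct.encode("utf-8")
```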

    URL Encoding

    URL encoding replaces "unsafe" bytes in the UTF-8 form of a string with their percent-encoding: "%" followed by the byte's two-digit hex value. Backblaze B2 uses this encoding for values in HTTP headers, in the URL path, and in the URL query string.

    Do not percent-encode most standard HTTP headers, such as Content-Type. The standards for these headers already define how their values are encoded; for example, some Content-Type values contain spaces, and those spaces must not be percent-encoded.

    You must percent-encode the values of the custom headers X-Bz-File-Name and X-Bz-Info-*. These values can contain arbitrary Unicode strings, so percent-encode them before sending them to Backblaze B2. Backblaze B2 percent-encodes them in all of its responses.

    When you download a file by name, you must URL-encode the file name before you append it to the download URL. When you pass parameters in a URL query string instead of a JSON request body, you must URL-encode all of them.
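The sketch below shows the header, download-URL, and query-string cases with Python's standard library (Python 3.7 or later, where `quote()` leaves "~" unencoded). The download host, bucket name, file name, and `X-Bz-Info-author` key are hypothetical; the `/file/<bucketName>/<fileName>` path is assumed from the download-by-name URL form, the query parameter names are illustrative, and other required request headers are omitted.

```python
from urllib.parse import quote

# Punctuation B2 leaves unencoded; quote() also never encodes letters,
# digits, or "_.-~". Spaces come out as "%20", which B2 accepts.
B2_SAFE = "/!$'()*;=:@"

def b2_url_encode(s):
    return quote(s, safe=B2_SAFE)

file_name = "notes/日本語 2024.txt"                # hypothetical file name

# 1. Upload headers: standard headers as-is, B2 custom headers encoded.
headers = {
    "Content-Type": "text/plain; charset=utf-8",  # not percent-encoded
    "X-Bz-File-Name": b2_url_encode(file_name),
    "X-Bz-Info-author": b2_url_encode("José"),    # hypothetical info key
}

# 2. Download by name: "/" in the file name stays literal.
download_url = "https://f000.example.com"         # hypothetical host
bucket_name = "my-bucket"                         # hypothetical bucket
url = f"{download_url}/file/{b2_url_encode(bucket_name)}/{b2_url_encode(file_name)}"

# 3. Query string: every name and value is encoded, so "&" becomes %26.
params = {"prefix": "notes/Q&A", "startFileName": "日本語"}
query = "&".join(f"{b2_url_encode(k)}={b2_url_encode(v)}" for k, v in params.items())

print(headers["X-Bz-File-Name"])  # notes/%E6%97%A5%E6%9C%AC%E8%AA%9E%202024.txt
print(url)
print(query)  # prefix=notes/Q%26A&startFileName=%E6%97%A5%E6%9C%AC%E8%AA%9E
```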

    The characters that do not need percent-encoding are a subset of the printable ASCII characters: upper-case letters, lower-case letters, digits, ".", "_", "-", "/", "~", "!", "$", "'", "(", ")", "*", ";", "=", ":", and "@". You must replace every other byte of the UTF-8 string with "%" followed by the two-digit hex value of the byte.
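The rule above can be implemented directly, without relying on a library's idea of which characters are safe. This is a minimal sketch (the function name `b2_min_encode` is hypothetical); it encodes a space as "%20", which Backblaze B2 accepts.

```python
import string

# The safe characters listed above; every other byte is percent-encoded.
SAFE = frozenset(string.ascii_letters + string.digits + "._-/~!$'()*;=:@")

def b2_min_encode(s):
    """Percent-encode the UTF-8 bytes of s, leaving only safe characters as-is."""
    out = []
    for byte in s.encode("utf-8"):
        ch = chr(byte)
        if ch in SAFE:                    # bytes >= 0x80 are never in SAFE
            out.append(ch)
        else:
            out.append("%%%02X" % byte)   # e.g. 0xE6 -> "%E6"
    return "".join(out)

print(b2_min_encode("日本語"))    # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(b2_min_encode("a&b/c d"))  # a%26b/c%20d
```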

    These characters are safe in HTTP headers, in URL paths, and in URL query strings. Some other characters, such as "&", are safe in some contexts but not others; for consistency, Backblaze B2 requires that you percent-encode them every time. You may also percent-encode the safe characters, except that you must not encode "/" as "%2F" when it is used in a URL path.
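Encoding more than the minimum is allowed. As a sketch, the hypothetical `b2_full_encode` below percent-encodes every byte except "/"; for every entry in the Test Cases section it produces the `fullyEncoded` value.

```python
def b2_full_encode(s):
    """Percent-encode every UTF-8 byte of s except "/", which must stay
    literal when s is used in a URL path."""
    return "".join(
        "/" if byte == 0x2F else "%%%02X" % byte
        for byte in s.encode("utf-8")
    )

print(b2_full_encode("a/b.txt"))  # %61/%62%2E%74%78%74
print(b2_full_encode("自由"))      # %E8%87%AA%E7%94%B1
```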

    The "+" character is a special case: it is used to encode a space. Backblaze B2 always encodes a space as "+" and accepts either "+" or "%20". A literal plus sign must be percent-encoded as "%2B".

    For example, the string "hello world" is encoded as "hello+world", and the string "日本語" is encoded as "%E6%97%A5%E6%9C%AC%E8%AA%9E".
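Python's standard library can produce the "+" form directly. In this sketch (Python 3.7 or later), `quote_plus()` with the safe punctuation listed above yields the minimally encoded form, and `unquote_plus()` accepts either "+" or "%20" for a space.

```python
from urllib.parse import quote_plus, unquote_plus

# Safe punctuation from the list above; quote_plus() also never encodes
# letters, digits, or "_.-~".
B2_SAFE = "/!$'()*;=:@"

def b2_url_encode(s):
    # A space becomes "+", and a literal "+" becomes "%2B".
    return quote_plus(s, safe=B2_SAFE)

def b2_url_decode(s):
    # "+" and "%20" both decode to a space; "%2B" decodes to "+".
    return unquote_plus(s)

print(b2_url_encode("hello world"))  # hello+world
print(b2_url_encode("日本語"))        # %E6%97%A5%E6%9C%AC%E8%AA%9E
print(b2_url_decode("a+b%20c%2Bd"))  # a b c+d
```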

    Test Cases

    If you are not sure whether your URL encoder handles this correctly, you can verify it with the following test cases (in JSON). Your encoder should produce either the fullyEncoded or the minimallyEncoded string, and your decoder should be able to decode either one.

    [
      {"fullyEncoded": "%20", "minimallyEncoded": "+", "string": " "},
      {"fullyEncoded": "%21", "minimallyEncoded": "!", "string": "!"},
      {"fullyEncoded": "%22", "minimallyEncoded": "%22", "string": "\""},
      {"fullyEncoded": "%23", "minimallyEncoded": "%23", "string": "#"},
      {"fullyEncoded": "%24", "minimallyEncoded": "$", "string": "$"},
      {"fullyEncoded": "%25", "minimallyEncoded": "%25", "string": "%"},
      {"fullyEncoded": "%26", "minimallyEncoded": "%26", "string": "&"},
      {"fullyEncoded": "%27", "minimallyEncoded": "'", "string": "'"},
      {"fullyEncoded": "%28", "minimallyEncoded": "(", "string": "("},
      {"fullyEncoded": "%29", "minimallyEncoded": ")", "string": ")"},
      {"fullyEncoded": "%2A", "minimallyEncoded": "*", "string": "*"},
      {"fullyEncoded": "%2B", "minimallyEncoded": "%2B", "string": "+"},
      {"fullyEncoded": "%2C", "minimallyEncoded": "%2C", "string": ","},
      {"fullyEncoded": "%2D", "minimallyEncoded": "-", "string": "-"},
      {"fullyEncoded": "%2E", "minimallyEncoded": ".", "string": "."},
      {"fullyEncoded": "/", "minimallyEncoded": "/", "string": "/"},
      {"fullyEncoded": "%30", "minimallyEncoded": "0", "string": "0"},
      {"fullyEncoded": "%31", "minimallyEncoded": "1", "string": "1"},
      {"fullyEncoded": "%32", "minimallyEncoded": "2", "string": "2"},
      {"fullyEncoded": "%33", "minimallyEncoded": "3", "string": "3"},
      {"fullyEncoded": "%34", "minimallyEncoded": "4", "string": "4"},
      {"fullyEncoded": "%35", "minimallyEncoded": "5", "string": "5"},
      {"fullyEncoded": "%36", "minimallyEncoded": "6", "string": "6"},
      {"fullyEncoded": "%37", "minimallyEncoded": "7", "string": "7"},
      {"fullyEncoded": "%38", "minimallyEncoded": "8", "string": "8"},
      {"fullyEncoded": "%39", "minimallyEncoded": "9", "string": "9"},
      {"fullyEncoded": "%3A", "minimallyEncoded": ":", "string": ":"},
      {"fullyEncoded": "%3B", "minimallyEncoded": ";", "string": ";"},
      {"fullyEncoded": "%3C", "minimallyEncoded": "%3C", "string": "<"},
      {"fullyEncoded": "%3D", "minimallyEncoded": "=", "string": "="},
      {"fullyEncoded": "%3E", "minimallyEncoded": "%3E", "string": ">"},
      {"fullyEncoded": "%3F", "minimallyEncoded": "%3F", "string": "?"},
      {"fullyEncoded": "%40", "minimallyEncoded": "@", "string": "@"},
      {"fullyEncoded": "%41", "minimallyEncoded": "A", "string": "A"},
      {"fullyEncoded": "%42", "minimallyEncoded": "B", "string": "B"},
      {"fullyEncoded": "%43", "minimallyEncoded": "C", "string": "C"},
      {"fullyEncoded": "%44", "minimallyEncoded": "D", "string": "D"},
      {"fullyEncoded": "%45", "minimallyEncoded": "E", "string": "E"},
      {"fullyEncoded": "%46", "minimallyEncoded": "F", "string": "F"},
      {"fullyEncoded": "%47", "minimallyEncoded": "G", "string": "G"},
      {"fullyEncoded": "%48", "minimallyEncoded": "H", "string": "H"},
      {"fullyEncoded": "%49", "minimallyEncoded": "I", "string": "I"},
      {"fullyEncoded": "%4A", "minimallyEncoded": "J", "string": "J"},
      {"fullyEncoded": "%4B", "minimallyEncoded": "K", "string": "K"},
      {"fullyEncoded": "%4C", "minimallyEncoded": "L", "string": "L"},
      {"fullyEncoded": "%4D", "minimallyEncoded": "M", "string": "M"},
      {"fullyEncoded": "%4E", "minimallyEncoded": "N", "string": "N"},
      {"fullyEncoded": "%4F", "minimallyEncoded": "O", "string": "O"},
      {"fullyEncoded": "%50", "minimallyEncoded": "P", "string": "P"},
      {"fullyEncoded": "%51", "minimallyEncoded": "Q", "string": "Q"},
      {"fullyEncoded": "%52", "minimallyEncoded": "R", "string": "R"},
      {"fullyEncoded": "%53", "minimallyEncoded": "S", "string": "S"},
      {"fullyEncoded": "%54", "minimallyEncoded": "T", "string": "T"},
      {"fullyEncoded": "%55", "minimallyEncoded": "U", "string": "U"},
      {"fullyEncoded": "%56", "minimallyEncoded": "V", "string": "V"},
      {"fullyEncoded": "%57", "minimallyEncoded": "W", "string": "W"},
      {"fullyEncoded": "%58", "minimallyEncoded": "X", "string": "X"},
      {"fullyEncoded": "%59", "minimallyEncoded": "Y", "string": "Y"},
      {"fullyEncoded": "%5A", "minimallyEncoded": "Z", "string": "Z"},
      {"fullyEncoded": "%5B", "minimallyEncoded": "%5B", "string": "["},
      {"fullyEncoded": "%5C", "minimallyEncoded": "%5C", "string": "\\"},
      {"fullyEncoded": "%5D", "minimallyEncoded": "%5D", "string": "]"},
      {"fullyEncoded": "%5E", "minimallyEncoded": "%5E", "string": "^"},
      {"fullyEncoded": "%5F", "minimallyEncoded": "_", "string": "_"},
      {"fullyEncoded": "%60", "minimallyEncoded": "%60", "string": "`"},
      {"fullyEncoded": "%61", "minimallyEncoded": "a", "string": "a"},
      {"fullyEncoded": "%62", "minimallyEncoded": "b", "string": "b"},
      {"fullyEncoded": "%63", "minimallyEncoded": "c", "string": "c"},
      {"fullyEncoded": "%64", "minimallyEncoded": "d", "string": "d"},
      {"fullyEncoded": "%65", "minimallyEncoded": "e", "string": "e"},
      {"fullyEncoded": "%66", "minimallyEncoded": "f", "string": "f"},
      {"fullyEncoded": "%67", "minimallyEncoded": "g", "string": "g"},
      {"fullyEncoded": "%68", "minimallyEncoded": "h", "string": "h"},
      {"fullyEncoded": "%69", "minimallyEncoded": "i", "string": "i"},
      {"fullyEncoded": "%6A", "minimallyEncoded": "j", "string": "j"},
      {"fullyEncoded": "%6B", "minimallyEncoded": "k", "string": "k"},
      {"fullyEncoded": "%6C", "minimallyEncoded": "l", "string": "l"},
      {"fullyEncoded": "%6D", "minimallyEncoded": "m", "string": "m"},
      {"fullyEncoded": "%6E", "minimallyEncoded": "n", "string": "n"},
      {"fullyEncoded": "%6F", "minimallyEncoded": "o", "string": "o"},
      {"fullyEncoded": "%70", "minimallyEncoded": "p", "string": "p"},
      {"fullyEncoded": "%71", "minimallyEncoded": "q", "string": "q"},
      {"fullyEncoded": "%72", "minimallyEncoded": "r", "string": "r"},
      {"fullyEncoded": "%73", "minimallyEncoded": "s", "string": "s"},
      {"fullyEncoded": "%74", "minimallyEncoded": "t", "string": "t"},
      {"fullyEncoded": "%75", "minimallyEncoded": "u", "string": "u"},
      {"fullyEncoded": "%76", "minimallyEncoded": "v", "string": "v"},
      {"fullyEncoded": "%77", "minimallyEncoded": "w", "string": "w"},
      {"fullyEncoded": "%78", "minimallyEncoded": "x", "string": "x"},
      {"fullyEncoded": "%79", "minimallyEncoded": "y", "string": "y"},
      {"fullyEncoded": "%7A", "minimallyEncoded": "z", "string": "z"},
      {"fullyEncoded": "%7B", "minimallyEncoded": "%7B", "string": "{"},
      {"fullyEncoded": "%7C", "minimallyEncoded": "%7C", "string": "|"},
      {"fullyEncoded": "%7D", "minimallyEncoded": "%7D", "string": "}"},
      {"fullyEncoded": "%7E", "minimallyEncoded": "~", "string": "~"},
      {"fullyEncoded": "%7F", "minimallyEncoded": "%7F", "string": "\u007f"},
      {"fullyEncoded": "%E8%87%AA%E7%94%B1", "minimallyEncoded": "%E8%87%AA%E7%94%B1", "string": "\u81ea\u7531"},
      {"fullyEncoded": "%F0%90%90%80", "minimallyEncoded": "%F0%90%90%80", "string": "\ud801\udc00"}
    ]

    Sample Code

    Java example
    // Requirements: Gson (com.google.code.gson:gson)
    // NOTE: Run with assertions enabled, for example: java -ea B2
    
    import com.google.gson.Gson;
    import com.google.gson.reflect.TypeToken;
    
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.Reader;
    import java.io.UnsupportedEncodingException;
    import java.lang.reflect.Type;
    import java.net.URLDecoder;
    import java.net.URLEncoder;
    import java.util.List;
    
    public class B2 {
        // URLEncoder encodes a space as "+" and "/" as "%2F"; B2 needs "/" left as-is.
        public static String b2UrlEncode(String s) throws UnsupportedEncodingException {
            return URLEncoder.encode(s, "UTF-8").replace("%2F", "/");
        }
    
        // URLDecoder decodes both "+" and "%20" to a space, and "%2B" to "+".
        public static String b2UrlDecode(String s) throws UnsupportedEncodingException {
            return URLDecoder.decode(s, "UTF-8");
        }
    
        public static void runTestCases() {
            Gson gson = new Gson();
            Type listType = new TypeToken<List<TestCase>>() {}.getType();
            // try-with-resources always closes the reader and avoids the
            // NullPointerException a finally block hits when the file cannot be opened.
            try (Reader stringEncodingTestCases = new FileReader("/tmp/cases.json")) {
                List<TestCase> testCases = gson.fromJson(stringEncodingTestCases, listType);
                for (TestCase aTestCase : testCases) {
                    assert aTestCase.string.equals(b2UrlDecode(aTestCase.fullyEncoded));
                    assert aTestCase.string.equals(b2UrlDecode(aTestCase.minimallyEncoded));
                    String encoded = b2UrlEncode(aTestCase.string);
                    assert encoded.equals(aTestCase.fullyEncoded) || encoded.equals(aTestCase.minimallyEncoded);
                }
            } catch (IOException iox) {
                // Also covers FileNotFoundException and UnsupportedEncodingException.
                iox.printStackTrace();
            }
        }
    
        public static void main(String[] args) {
            runTestCases();
        }
    
        public static class TestCase {
            public String fullyEncoded;
            public String minimallyEncoded;
            public String string;
        }
    }
    Python example
    import json
    import unittest
    from urllib.parse import quote, unquote_plus
    
    def b2_url_encode(s):
        """URL-encodes a string to be sent to B2 in an HTTP header.
    
        quote() percent-encodes the UTF-8 bytes of s and leaves "/" as-is.
        """
        return quote(s, safe='/', encoding='utf-8')
    
    def b2_url_decode(s):
        """Decodes a string returned from B2 in an HTTP header.
    
        unquote_plus() turns "+" into a space and decodes %XX sequences as UTF-8.
        """
        return unquote_plus(s, encoding='utf-8')
    
    class TestEncodeDecode(unittest.TestCase):
        # This assumes that the test cases are in a file called "cases.json".
        def test_encode_decode(self):
            with open('cases.json', encoding='utf-8') as f:
                cases = json.load(f)
            for item in cases:
                self.assertEqual(item['string'], b2_url_decode(item['fullyEncoded']))
                self.assertEqual(item['string'], b2_url_decode(item['minimallyEncoded']))
                self.assertIn(b2_url_encode(item['string']), [item['fullyEncoded'], item['minimallyEncoded']])
    
    if __name__ == '__main__':
        unittest.main()
    Swift example
    import Foundation
    
    extension String {
        func b2UrlEncode() -> String? {
            // Start from the characters allowed in a URL path, then remove the
            // ones B2 requires to be percent-encoded.
            var b2CharacterSet = CharacterSet.urlPathAllowed
            b2CharacterSet.remove(charactersIn: "&+,")
            return addingPercentEncoding(withAllowedCharacters: b2CharacterSet)
        }
        func b2UrlDecode() -> String? {
            // B2 encodes a space as "+" (a literal "+" arrives as "%2B"), so turn
            // every "+" back into a space before decoding the %XX sequences.
            return replacingOccurrences(of: "+", with: " ").removingPercentEncoding
        }
    }
    
    // Run a test
    let pathToEncodeJson = "<PATH TO TEST CASES>"
    do {
        let jsonTestCaseData = try Data(contentsOf: URL(fileURLWithPath: pathToEncodeJson))
        if let encodingTestCases = try JSONSerialization.jsonObject(with: jsonTestCaseData) as? [[String: String]] {
            for anEncodingTestCase in encodingTestCases {
                let string = anEncodingTestCase["string"]!
                let fullyEncoded = anEncodingTestCase["fullyEncoded"]!
                let minimallyEncoded = anEncodingTestCase["minimallyEncoded"]!
                assert(string == minimallyEncoded.b2UrlDecode(), "Failed decoding minimally-encoded test.")
                assert(string == fullyEncoded.b2UrlDecode(), "Failed decoding fully-encoded test.")
                let encoded = string.b2UrlEncode()
                assert(encoded == fullyEncoded || encoded == minimallyEncoded, "Failed encoding test.")
            }
        }
    } catch {
        print(error)
    }
    Ruby example
    require 'uri'
    require 'test/unit'
    require 'json'
    
    module URI
        class B2
            def self.b2_url_encode(str)
                URI.encode_www_form_component(str.force_encoding(Encoding::UTF_8)).gsub("%2F", "/")
            end
            def self.b2_url_decode(str)
                URI.decode_www_form_component(str, Encoding::UTF_8)
            end
        end
    end
    
    class B2StringEncodingTest < Test::Unit::TestCase
        def test_string_encoding()
            encodingTestCases = JSON.parse(File.read("cases.json"))
            encodingTestCases.each do |oneTestCase|
                assert(oneTestCase["string"] == URI::B2.b2_url_decode(oneTestCase["fullyEncoded"]), "Failed decoding fully-encoded test.")
                assert(oneTestCase["string"] == URI::B2.b2_url_decode(oneTestCase["minimallyEncoded"]), "Failed decoding minimally-encoded test.")
                encoded = URI::B2.b2_url_encode(oneTestCase["string"])
                assert(oneTestCase["fullyEncoded"] == encoded || oneTestCase["minimallyEncoded"] == encoded, "Failed encoding test.")
            end
        end
    end
    C# example
    using System;
    using System.Web.Script.Serialization;
    using System.Collections.Generic;
    using System.Diagnostics;
    
    namespace B2CustomExtension
    {
        public static class B2StringExtension
        {
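            static public string b2UrlEncode(this string str)
            {
                // EscapeDataString percent-encodes "/" as "%2F"; B2 needs "/" left
                // literal so that file names keep their path separators.
                return Uri.EscapeDataString(str).Replace("%2F", "/");
            }
    
            static public string b2UrlDecode(this string str)
            {
                // B2 encodes a space as "+" (a literal "+" arrives as "%2B"), so turn
                // every "+" into a space before unescaping the %XX sequences.
                return Uri.UnescapeDataString(str.Replace("+", " "));
            }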
        }
    }
    
    namespace B2
    {
        using B2CustomExtension;
        using System.IO;
    
        class TestCase
        {
            public string String { get; set; }
            public string FullyEncoded { get; set; }
            public string MinimallyEncoded { get; set; }
        }
    
        class TestStringEncoding
        {
            static void Main(string[] args)
            {
                string testCaseData = System.IO.File.ReadAllText("C:\\cases.json", System.Text.Encoding.UTF8);
                JavaScriptSerializer ser = new JavaScriptSerializer();
                var testCases = ser.Deserialize<List<TestCase>>(testCaseData);
                foreach (TestCase aTestCase in testCases)
                {
                    Debug.Assert(aTestCase.FullyEncoded.b2UrlDecode() == aTestCase.String);
                    Debug.Assert(aTestCase.MinimallyEncoded.b2UrlDecode() == aTestCase.String);
                    Debug.Assert(aTestCase.String.b2UrlEncode() == aTestCase.FullyEncoded || aTestCase.String.b2UrlEncode() == aTestCase.MinimallyEncoded);
                }
            }
        }
    }
    PHP example
    <?php
    // rawurlencode() percent-encodes the UTF-8 bytes (a space becomes "%20");
    // "/" is then restored so file names keep their path separators.
    function b2_url_encode($str) {
    	return str_replace('%2F', '/', rawurlencode($str));
    }
    
    // urldecode() turns "+" into a space and decodes %XX sequences, so both
    // "+" and "%20" decode to a space and "%2B" decodes to "+".
    function b2_url_decode($str) {
    	return urldecode($str);
    }
    
    $jsonCases = file_get_contents("cases.json");
    if ($jsonCases === false) {
    	die("Unable to open file!");
    }
    $testCases = json_decode($jsonCases);
    
    foreach ($testCases as $case) {
    	# Test Encode
    	$encodedStr = b2_url_encode($case->string);
    	if ($encodedStr !== $case->fullyEncoded && $encodedStr !== $case->minimallyEncoded) {
    		echo $case->string . " failed encoding: " . $encodedStr . " (encoded) should be " . $case->fullyEncoded . " or " . $case->minimallyEncoded . "\n";
    	}
    	# Test Decode: both encodings must decode to the original string
    	$decodedStrMax = b2_url_decode($case->fullyEncoded);
    	$decodedStrMin = b2_url_decode($case->minimallyEncoded);
    	if ($decodedStrMax !== $case->string || $decodedStrMin !== $case->string) {
    		echo $case->string . " failed decoding: " . $decodedStrMax . " (fullyEncoded) / " . $decodedStrMin . " (minimallyEncoded) should be " . $case->string . "\n";
    	}
    }
    ?>
