Encoding Strings
Strings are passed into and out of the B2 APIs in four places: in JSON objects, in HTTP headers, in the URL path, and in the URL query string. This page covers how Unicode characters are encoded in each of them.
JSON
Most of the API calls in B2 pass a request in JSON, and return results in JSON. For these calls, strings are represented in UTF-8, with "interesting" characters encoded in the normal JSON way. Most characters can just be included directly in the strings, with the normal escaping for newlines, quotes, and such.
If you prefer, any character can be represented using backslash-u syntax with the UTF-16 value of the character. These two JSON strings are equivalent:
"自由.txt"
"\u81ea\u7531.txt"
Note that backslash-u syntax in JSON is in UTF-16, so for code points above 0xffff, you will need to use the high and low surrogates for the character. For example, the code point 0x10400 is the character DESERET CAPITAL LETTER LONG I, and is written like this in JSON: "\ud801\udc00".
See Wikipedia's UTF-16 article for more information.
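The surrogate arithmetic described above can be sketched in a few lines of Python. This is only an illustration; the helper name utf16_escape is hypothetical and not part of any B2 library.

```python
import json


def utf16_escape(code_point):
    """Return the JSON backslash-u escape for a single Unicode code point."""
    if code_point <= 0xFFFF:
        return "\\u{:04x}".format(code_point)
    # Code points above 0xffff are split into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return "\\u{:04x}\\u{:04x}".format(high, low)


escaped = utf16_escape(0x10400)
print(escaped)  # \ud801\udc00

# A JSON parser turns the surrogate pair back into the single character.
decoded = json.loads('"' + escaped + '"')
print(decoded == "\U00010400")  # True
```

In practice a JSON library performs this conversion for you; the sketch is only meant to show where the two escapes come from.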
In values returned from B2, the UTF-8 will be included directly in the string wherever allowed. The only time backslash-u syntax is generated by B2 is for characters with code points below \u0020.
The JSON library for the language you are using probably already does the right thing, and you shouldn't have to worry about any of this.
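As a rough illustration of a library doing the right thing, Python's standard json module can write non-ASCII characters directly as UTF-8 and uses backslash-u only for a control character. The "fileName" key here is just a placeholder for this sketch, and the output shows Python's behavior, not a guarantee about B2's serializer.

```python
import json

name = "自由\u0001.txt"  # contains one control character (code point 0x01)

# ensure_ascii=False keeps non-ASCII characters as-is instead of escaping them.
text = json.dumps({"fileName": name}, ensure_ascii=False)
print(text)  # {"fileName": "自由\u0001.txt"}

# Parsing returns the original string, whichever spelling was used.
round_tripped = json.loads(text)["fileName"]
print(round_tripped == name)  # True
```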
URL Encoding
URL Encoding replaces "non-safe" bytes in a UTF-8 string with the "percent encoding" of hex values for those bytes. This encoding is used for values in HTTP headers, in the URL path, and in the URL query string.
Most HTTP headers, such as Content-Type, should not be percent-encoded. The standards for these headers already cover how values should be encoded. Some values for Content-Type include spaces, and should not be percent-encoded, because that would be non-standard.
The values of the custom headers X-Bz-File-Name and X-Bz-Info-* must be percent-encoded. These values can contain arbitrary Unicode strings. You should percent-encode them before sending them to B2, and B2 will percent-encode them in all responses.
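A minimal sketch of preparing such headers in Python follows. The file name, the "author" info key, and its value are made up for illustration; only the X-Bz-File-Name and X-Bz-Info-* header names come from the text above.

```python
from urllib.parse import quote, unquote_plus

file_name = "reports/自由 2024.txt"  # hypothetical file name
author = "Zo\u00eb & co"             # hypothetical X-Bz-Info-* value

headers = {
    # Standard header: left as-is, not percent-encoded.
    "Content-Type": "text/plain; charset=utf-8",
    # Custom B2 headers: percent-encode the UTF-8 bytes, keeping "/" literal.
    "X-Bz-File-Name": quote(file_name, safe="/"),
    "X-Bz-Info-author": quote(author, safe="/"),
}

print(headers["X-Bz-File-Name"])    # reports/%E8%87%AA%E7%94%B1%202024.txt
print(headers["X-Bz-Info-author"])  # Zo%C3%AB%20%26%20co

# Values in responses are percent-encoded too, so decode them before use.
print(unquote_plus(headers["X-Bz-Info-author"]) == author)  # True
```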
When downloading a file by name, the file name must be URL-encoded before appending it to the download URL. And when using a query string instead of passing a JSON request, the parameters must all be URL-encoded.
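The two cases can be sketched as follows. The base URL and the query parameter names below are placeholders chosen for this example, not values defined by B2; substitute whatever your account and the call you are making actually require.

```python
from urllib.parse import quote, urlencode

download_base = "https://download.example.com/file/my-bucket/"  # placeholder URL
file_name = "photos/日本語 notes.txt"

# URL path: percent-encode the name, but keep "/" as a literal separator.
path_url = download_base + quote(file_name, safe="/")
print(path_url)

# Query string: every parameter value is URL-encoded (urlencode uses "+" for spaces).
params = {"fileName": file_name, "note": "a+b & c"}  # placeholder parameter names
query = urlencode(params)
print(query)
```

Note that urlencode encodes "/" as "%2F" inside a query value; the restriction on "%2F" described on this page applies to the URL path.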
The characters that do not need percent-encoding are a subset of the printable ASCII characters: upper-case letters, lower-case letters, digits, ".", "_", "-", "/", "~", "!", "$", "'", "(", ")", "*", ";", "=", ":", and "@". All other byte values in a UTF-8 string must be replaced with "%" and the two-digit hex value of the byte.
This list of safe characters is safe in HTTP headers, in URL paths, and in URL query strings. Some characters, such as "&", are safe in some contexts and not others. For consistency, B2 requires %-encoding them all the time. You may %-encode all characters if you prefer, except that you may not encode "/" as "%2F" when used in a URL path.
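To make the rule concrete, here is a hand-rolled encoder that follows the safe-character list literally. It is a sketch rather than a recommended implementation (a library encoder is usually the better choice), and the names B2_SAFE and percent_encode are made up for this example.

```python
import string

# The safe characters listed above, as a set of byte values.
B2_SAFE = frozenset(
    (string.ascii_letters + string.digits + "._-/~!$'()*;=:@").encode("ascii")
)


def percent_encode(s):
    """Percent-encode every UTF-8 byte of s that is not in the safe list."""
    out = []
    for byte in s.encode("utf-8"):
        if byte in B2_SAFE:
            out.append(chr(byte))
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(percent_encode("dir/file name.txt"))  # dir/file%20name.txt
print(percent_encode("a&b,c"))              # a%26b%2Cc
print(percent_encode("自由"))                # %E8%87%AA%E7%94%B1
```

This version writes a space as "%20", which B2 accepts alongside "+".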
The "+" character is a special case that is used to encode spaces. B2 will always encode a space as "+", and will accept either "+" or "%20". A plus sign must be percent-encoded as "%2B".
For example, the string "hello world" is encoded as "hello+world", and the string "日本語" is encoded as "%E6%97%A5%E6%9C%AC%E8%AA%9E".
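Python's standard urllib.parse module happens to follow the same convention for spaces and plus signs, so it can be used to check these examples. This is a sketch of the behavior described above, not B2's own implementation.

```python
from urllib.parse import quote_plus, unquote_plus

# Encoding: a space becomes "+", and a literal plus sign becomes "%2B".
print(quote_plus("hello world", safe="/"))  # hello+world
print(quote_plus("1+1", safe="/"))          # 1%2B1
print(quote_plus("日本語", safe="/"))        # %E6%97%A5%E6%9C%AC%E8%AA%9E

# Decoding: "+" and "%20" both come back as a space.
print(unquote_plus("hello+world"))    # hello world
print(unquote_plus("hello%20world"))  # hello world
print(unquote_plus("1%2B1"))          # 1+1
```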
Test Cases
If you're not sure whether your URL encoder does the right thing, you can use these test cases (in JSON) to check. Your encoder should produce either the fullyEncoded or the minimallyEncoded string, and should be able to decode either one.
[ {"fullyEncoded": "%20", "minimallyEncoded": "+", "string": " "}, {"fullyEncoded": "%21", "minimallyEncoded": "!", "string": "!"}, {"fullyEncoded": "%22", "minimallyEncoded": "%22", "string": "\""}, {"fullyEncoded": "%23", "minimallyEncoded": "%23", "string": "#"}, {"fullyEncoded": "%24", "minimallyEncoded": "$", "string": "$"}, {"fullyEncoded": "%25", "minimallyEncoded": "%25", "string": "%"}, {"fullyEncoded": "%26", "minimallyEncoded": "%26", "string": "&"}, {"fullyEncoded": "%27", "minimallyEncoded": "'", "string": "'"}, {"fullyEncoded": "%28", "minimallyEncoded": "(", "string": "("}, {"fullyEncoded": "%29", "minimallyEncoded": ")", "string": ")"}, {"fullyEncoded": "%2A", "minimallyEncoded": "*", "string": "*"}, {"fullyEncoded": "%2B", "minimallyEncoded": "%2B", "string": "+"}, {"fullyEncoded": "%2C", "minimallyEncoded": "%2C", "string": ","}, {"fullyEncoded": "%2D", "minimallyEncoded": "-", "string": "-"}, {"fullyEncoded": "%2E", "minimallyEncoded": ".", "string": "."}, {"fullyEncoded": "/", "minimallyEncoded": "/", "string": "/"}, {"fullyEncoded": "%30", "minimallyEncoded": "0", "string": "0"}, {"fullyEncoded": "%31", "minimallyEncoded": "1", "string": "1"}, {"fullyEncoded": "%32", "minimallyEncoded": "2", "string": "2"}, {"fullyEncoded": "%33", "minimallyEncoded": "3", "string": "3"}, {"fullyEncoded": "%34", "minimallyEncoded": "4", "string": "4"}, {"fullyEncoded": "%35", "minimallyEncoded": "5", "string": "5"}, {"fullyEncoded": "%36", "minimallyEncoded": "6", "string": "6"}, {"fullyEncoded": "%37", "minimallyEncoded": "7", "string": "7"}, {"fullyEncoded": "%38", "minimallyEncoded": "8", "string": "8"}, {"fullyEncoded": "%39", "minimallyEncoded": "9", "string": "9"}, {"fullyEncoded": "%3A", "minimallyEncoded": ":", "string": ":"}, {"fullyEncoded": "%3B", "minimallyEncoded": ";", "string": ";"}, {"fullyEncoded": "%3C", "minimallyEncoded": "%3C", "string": "<"}, {"fullyEncoded": "%3D", "minimallyEncoded": "=", "string": "="}, {"fullyEncoded": "%3E", 
"minimallyEncoded": "%3E", "string": ">"}, {"fullyEncoded": "%3F", "minimallyEncoded": "%3F", "string": "?"}, {"fullyEncoded": "%40", "minimallyEncoded": "@", "string": "@"}, {"fullyEncoded": "%41", "minimallyEncoded": "A", "string": "A"}, {"fullyEncoded": "%42", "minimallyEncoded": "B", "string": "B"}, {"fullyEncoded": "%43", "minimallyEncoded": "C", "string": "C"}, {"fullyEncoded": "%44", "minimallyEncoded": "D", "string": "D"}, {"fullyEncoded": "%45", "minimallyEncoded": "E", "string": "E"}, {"fullyEncoded": "%46", "minimallyEncoded": "F", "string": "F"}, {"fullyEncoded": "%47", "minimallyEncoded": "G", "string": "G"}, {"fullyEncoded": "%48", "minimallyEncoded": "H", "string": "H"}, {"fullyEncoded": "%49", "minimallyEncoded": "I", "string": "I"}, {"fullyEncoded": "%4A", "minimallyEncoded": "J", "string": "J"}, {"fullyEncoded": "%4B", "minimallyEncoded": "K", "string": "K"}, {"fullyEncoded": "%4C", "minimallyEncoded": "L", "string": "L"}, {"fullyEncoded": "%4D", "minimallyEncoded": "M", "string": "M"}, {"fullyEncoded": "%4E", "minimallyEncoded": "N", "string": "N"}, {"fullyEncoded": "%4F", "minimallyEncoded": "O", "string": "O"}, {"fullyEncoded": "%50", "minimallyEncoded": "P", "string": "P"}, {"fullyEncoded": "%51", "minimallyEncoded": "Q", "string": "Q"}, {"fullyEncoded": "%52", "minimallyEncoded": "R", "string": "R"}, {"fullyEncoded": "%53", "minimallyEncoded": "S", "string": "S"}, {"fullyEncoded": "%54", "minimallyEncoded": "T", "string": "T"}, {"fullyEncoded": "%55", "minimallyEncoded": "U", "string": "U"}, {"fullyEncoded": "%56", "minimallyEncoded": "V", "string": "V"}, {"fullyEncoded": "%57", "minimallyEncoded": "W", "string": "W"}, {"fullyEncoded": "%58", "minimallyEncoded": "X", "string": "X"}, {"fullyEncoded": "%59", "minimallyEncoded": "Y", "string": "Y"}, {"fullyEncoded": "%5A", "minimallyEncoded": "Z", "string": "Z"}, {"fullyEncoded": "%5B", "minimallyEncoded": "%5B", "string": "["}, {"fullyEncoded": "%5C", "minimallyEncoded": "%5C", "string": "\\"}, 
{"fullyEncoded": "%5D", "minimallyEncoded": "%5D", "string": "]"}, {"fullyEncoded": "%5E", "minimallyEncoded": "%5E", "string": "^"}, {"fullyEncoded": "%5F", "minimallyEncoded": "_", "string": "_"}, {"fullyEncoded": "%60", "minimallyEncoded": "%60", "string": "`"}, {"fullyEncoded": "%61", "minimallyEncoded": "a", "string": "a"}, {"fullyEncoded": "%62", "minimallyEncoded": "b", "string": "b"}, {"fullyEncoded": "%63", "minimallyEncoded": "c", "string": "c"}, {"fullyEncoded": "%64", "minimallyEncoded": "d", "string": "d"}, {"fullyEncoded": "%65", "minimallyEncoded": "e", "string": "e"}, {"fullyEncoded": "%66", "minimallyEncoded": "f", "string": "f"}, {"fullyEncoded": "%67", "minimallyEncoded": "g", "string": "g"}, {"fullyEncoded": "%68", "minimallyEncoded": "h", "string": "h"}, {"fullyEncoded": "%69", "minimallyEncoded": "i", "string": "i"}, {"fullyEncoded": "%6A", "minimallyEncoded": "j", "string": "j"}, {"fullyEncoded": "%6B", "minimallyEncoded": "k", "string": "k"}, {"fullyEncoded": "%6C", "minimallyEncoded": "l", "string": "l"}, {"fullyEncoded": "%6D", "minimallyEncoded": "m", "string": "m"}, {"fullyEncoded": "%6E", "minimallyEncoded": "n", "string": "n"}, {"fullyEncoded": "%6F", "minimallyEncoded": "o", "string": "o"}, {"fullyEncoded": "%70", "minimallyEncoded": "p", "string": "p"}, {"fullyEncoded": "%71", "minimallyEncoded": "q", "string": "q"}, {"fullyEncoded": "%72", "minimallyEncoded": "r", "string": "r"}, {"fullyEncoded": "%73", "minimallyEncoded": "s", "string": "s"}, {"fullyEncoded": "%74", "minimallyEncoded": "t", "string": "t"}, {"fullyEncoded": "%75", "minimallyEncoded": "u", "string": "u"}, {"fullyEncoded": "%76", "minimallyEncoded": "v", "string": "v"}, {"fullyEncoded": "%77", "minimallyEncoded": "w", "string": "w"}, {"fullyEncoded": "%78", "minimallyEncoded": "x", "string": "x"}, {"fullyEncoded": "%79", "minimallyEncoded": "y", "string": "y"}, {"fullyEncoded": "%7A", "minimallyEncoded": "z", "string": "z"}, {"fullyEncoded": "%7B", "minimallyEncoded": 
"%7B", "string": "{"}, {"fullyEncoded": "%7C", "minimallyEncoded": "%7C", "string": "|"}, {"fullyEncoded": "%7D", "minimallyEncoded": "%7D", "string": "}"}, {"fullyEncoded": "%7E", "minimallyEncoded": "~", "string": "~"}, {"fullyEncoded": "%7F", "minimallyEncoded": "%7F", "string": "\u007f"}, {"fullyEncoded": "%E8%87%AA%E7%94%B1", "minimallyEncoded": "%E8%87%AA%E7%94%B1", "string": "\u81ea\u7531"}, {"fullyEncoded": "%F0%90%90%80", "minimallyEncoded": "%F0%90%90%80", "string": "\ud801\udc00"} ]
Sample Code
Java
// Requirements: com.google.gson (Gson)
// NOTE: Remember to run java with assertions enabled ("-ea").
import com.google.gson.Gson;
import com.google.gson.reflect.TypeToken;

import java.io.FileReader;
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.lang.reflect.Type;
import java.util.ArrayList;
import java.util.List;

public class B2 {

    // URLEncoder percent-encodes "/", which must stay literal in a URL path, so restore it.
    public static String b2UrlEncode(String s) throws UnsupportedEncodingException {
        return java.net.URLEncoder.encode(s, "UTF-8").replace("%2F", "/");
    }

    // URLDecoder accepts both "+" and "%20" as a space.
    public static String b2UrlDecode(String s) throws UnsupportedEncodingException {
        return java.net.URLDecoder.decode(s, "UTF-8");
    }

    public static void runTestCases() {
        Gson gson = new Gson();
        // try-with-resources closes the reader, and never calls close() on null
        // when the file cannot be opened.
        try (FileReader stringEncodingTestCases = new FileReader("/tmp/cases.json")) {
            Type listType = new TypeToken<ArrayList<TestCase>>() {}.getType();
            List<TestCase> testCases = gson.fromJson(stringEncodingTestCases, listType);
            for (TestCase aTestCase : testCases) {
                assert aTestCase.string.equals(b2UrlDecode(aTestCase.fullyEncoded));
                assert aTestCase.string.equals(b2UrlDecode(aTestCase.minimallyEncoded));
                String encoded = b2UrlEncode(aTestCase.string);
                assert encoded.equals(aTestCase.fullyEncoded) || encoded.equals(aTestCase.minimallyEncoded);
            }
        } catch (IOException iox) {
            // Also covers FileNotFoundException and UnsupportedEncodingException.
            iox.printStackTrace();
        }
    }

    public static void main(String[] args) {
        runTestCases();
    }

    public static class TestCase {
        public String fullyEncoded;
        public String minimallyEncoded;
        public String string;
    }
}
Python
# Python 3
import json
import unittest
import urllib.parse


def b2_url_encode(s):
    """URL-encodes a Unicode string to be sent to B2 in an HTTP header.

    quote() leaves "/" unencoded by default.
    """
    return urllib.parse.quote(s.encode('utf-8'))


def b2_url_decode(s):
    """Decodes a Unicode string returned from B2 in an HTTP header.

    Returns a Python str. unquote_plus() treats both "+" and "%20"
    as a space and decodes the bytes as UTF-8.
    """
    return urllib.parse.unquote_plus(s)


class TestEncodeDecode(unittest.TestCase):

    # This assumes that the test cases are in a file called "cases.json".
    def test_encode_decode(self):
        with open('cases.json') as f:
            cases = json.load(f)
        for item in cases:
            self.assertEqual(item['string'], b2_url_decode(item['fullyEncoded']))
            self.assertEqual(item['string'], b2_url_decode(item['minimallyEncoded']))
            self.assertIn(b2_url_encode(item['string']), [item['fullyEncoded'], item['minimallyEncoded']])


if __name__ == '__main__':
    unittest.main()
Swift
import Foundation

extension String {
    // Percent-encode everything outside the URL-path-safe set, and also
    // encode "&", "+", and ",", which B2 requires to be encoded.
    func b2UrlEncode() -> String? {
        var b2CharacterSet = CharacterSet.urlPathAllowed
        b2CharacterSet.remove(charactersIn: "&+,")
        return addingPercentEncoding(withAllowedCharacters: b2CharacterSet)
    }

    // A space may arrive as "+", so convert "+" to a space before
    // removing the percent-encoding (a literal plus arrives as "%2B").
    func b2UrlDecode() -> String? {
        return replacingOccurrences(of: "+", with: " ").removingPercentEncoding
    }
}

// Run a test
let pathToEncodeJson = "<PATH TO TEST CASES>"
do {
    let jsonTestCaseData = try Data(contentsOf: URL(fileURLWithPath: pathToEncodeJson))
    let parsed = try JSONSerialization.jsonObject(with: jsonTestCaseData, options: [])
    if let encodingTestCases = parsed as? [[String: String]] {
        for anEncodingTestCase in encodingTestCases {
            let string = anEncodingTestCase["string"]!
            let fullyEncoded = anEncodingTestCase["fullyEncoded"]!
            let minimallyEncoded = anEncodingTestCase["minimallyEncoded"]!
            assert(string == minimallyEncoded.b2UrlDecode()!, "Failed decoding minimally-encoded test.")
            assert(string == fullyEncoded.b2UrlDecode()!, "Failed decoding fully-encoded test.")
            let encoded = string.b2UrlEncode()!
            assert(encoded == fullyEncoded || encoded == minimallyEncoded, "Failed encoding test.")
        }
    }
} catch {
    print("\(error)")
}
Ruby
require 'uri'
require 'test/unit'
require 'json'

module URI
  class B2
    def self.b2_url_encode(str)
      URI.encode_www_form_component(str.force_encoding(Encoding::UTF_8)).gsub("%2F", "/")
    end

    def self.b2_url_decode(str)
      URI.decode_www_form_component(str, Encoding::UTF_8)
    end
  end
end

class B2StringEncodingTest < Test::Unit::TestCase
  def test_string_encoding()
    encodingTestCases = JSON.parse(File.read("cases.json"))
    encodingTestCases.each do |oneTestCase|
      assert(oneTestCase["string"] == URI::B2.b2_url_decode(oneTestCase["fullyEncoded"]), "Failed decoding fully-encoded test.")
      assert(oneTestCase["string"] == URI::B2.b2_url_decode(oneTestCase["minimallyEncoded"]), "Failed decoding minimally-encoded test.")
      encoded = URI::B2.b2_url_encode(oneTestCase["string"])
      assert(oneTestCase["fullyEncoded"] == encoded || oneTestCase["minimallyEncoded"] == encoded, "Failed encoding test.")
    end
  end
end
C#
using System;
using System.Web.Script.Serialization;
using System.Collections.Generic;
using System.Diagnostics;

namespace B2CustomExtension
{
    public static class B2StringExtension
    {
        static public string b2UrlEncode(this string str)
        {
            // EscapeDataString percent-encodes "/", which must stay literal
            // in a URL path, so restore it anywhere it appears.
            return Uri.EscapeDataString(str).Replace("%2F", "/");
        }

        static public string b2UrlDecode(this string str)
        {
            // A space may arrive as "+", so convert "+" to a space before
            // unescaping (a literal plus arrives as "%2B").
            return Uri.UnescapeDataString(str.Replace("+", " "));
        }
    }
}

namespace B2
{
    using B2CustomExtension;
    using System.IO;

    class TestCase
    {
        public string String { get; set; }
        public string FullyEncoded { get; set; }
        public string MinimallyEncoded { get; set; }
    }

    class TestStringEncoding
    {
        static void Main(string[] args)
        {
            string testCaseData = File.ReadAllText("C:\\cases.json", System.Text.Encoding.UTF8);
            JavaScriptSerializer ser = new JavaScriptSerializer();
            var testCases = ser.Deserialize<List<TestCase>>(testCaseData);
            foreach (TestCase aTestCase in testCases)
            {
                Debug.Assert(aTestCase.FullyEncoded.b2UrlDecode() == aTestCase.String);
                Debug.Assert(aTestCase.MinimallyEncoded.b2UrlDecode() == aTestCase.String);
                Debug.Assert(aTestCase.String.b2UrlEncode() == aTestCase.FullyEncoded || aTestCase.String.b2UrlEncode() == aTestCase.MinimallyEncoded);
            }
        }
    }
}
PHP
<?php
$myfile = fopen("cases.json", "r") or die("Unable to open file!");
$jsonCases = fread($myfile, filesize("cases.json"));
fclose($myfile);
$json = json_decode($jsonCases);
$numTestCases = count($json);
for ($i = 0; $i < $numTestCases; $i++) {
    # Test Encode
    # rawurlencode() percent-encodes "/", which must stay literal in a URL path, so restore it.
    $encodedStr = str_replace("%2F", "/", rawurlencode($json[$i]->{"string"}));
    if ($encodedStr !== $json[$i]->{"fullyEncoded"} && $encodedStr !== $json[$i]->{"minimallyEncoded"}) {
        echo $json[$i]->{"string"} . " failed encoding " . $encodedStr . "(encoded) should be " . $json[$i]->{"fullyEncoded"} . " or " . $json[$i]->{"minimallyEncoded"} . "\n";
    }
    # Test Decode
    # urldecode() accepts both "+" and "%20" as a space; rawurldecode() would leave "+" alone.
    $decodedStrMax = urldecode($json[$i]->{"fullyEncoded"});
    $decodedStrMin = urldecode($json[$i]->{"minimallyEncoded"});
    # Both encodings must decode to the original string, so fail if either one does not.
    if ($decodedStrMax !== $json[$i]->{"string"} || $decodedStrMin !== $json[$i]->{"string"}) {
        echo $json[$i]->{"string"} . " failed decoding " . $decodedStrMax . "(fullyEncoded)/" . $decodedStrMin . "(minimallyEncoded) should be " . $json[$i]->{"string"} . "\n";
    }
}
?>