[Data Structures and Algorithms] Implementing a Base64 variant: how to build a custom Base64 encoding

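In some situations standard Base64 encoding cannot meet a particular business requirement, and we want the simplest possible way to build a Base64 variant that does. A variant here means an encoding that borrows the idea of Base64 but defines its own alphabet. The description below is based on Java; all data types and syntax are Java's.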

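The idea behind Base64:

The idea behind Base64 encoding is simple. Each byte is 8 bits long. Base64 takes 3 bytes (24 bits) and regroups them into 4 units of 6 bits, each stored in a byte whose top two bits are 0. The largest integer such a byte can hold is 63, so the values 0 to 63 index an alphabet of 64 characters, and mapping each value through the alphabet yields the Base64 string.
The regrouping logic is as follows:
Given the bytes ByteA, ByteB and ByteC:
1. Take the first 6 bits of ByteA and pad two 0 bits on the high side to get the first encoded byte (EnByteA).
2. Take the last 2 bits of ByteA and the first 4 bits of ByteB, and pad two 0 bits on the high side to get the second encoded byte (EnByteB).
3. Take the last 4 bits of ByteB and the first 2 bits of ByteC, and pad two 0 bits on the high side to get the third encoded byte (EnByteC).
4. Take the last 6 bits of ByteC and pad two 0 bits on the high side to get the fourth encoded byte (EnByteD).
For example: aaaaaaaa bbbbbbbb cccccccc
is split into: 00aaaaaa 00aabbbb 00bbbbcc 00cccccc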

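In Java this is expressed as follows (ByteA, ByteB and ByteC are assumed to be int values in 0..255, for example obtained with b & 0xff, because Java's byte type is signed):
EnByteA = (ByteA >> 2) & 0x3f
EnByteB = (((ByteA & 0x3) << 4) | (ByteB >> 4)) & 0x3f
EnByteC = (((ByteB & 0xf) << 2) | (ByteC >> 6)) & 0x3f
EnByteD = ByteC & 0x3f

where:
0x3f: 00111111
0x3: 00000011
0xf: 00001111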
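The split above can be sketched as a small self-contained method. The class and method names (`SextetSplit`, `split`) are made up for illustration and are not part of the original article; the sketch widens each byte with `& 0xff` before shifting so that the formulas operate on values 0 to 255.

```java
import java.util.Arrays;

class SextetSplit {
    // Splits three bytes into four 6-bit values, each in 0..63.
    static int[] split(byte a, byte b, byte c) {
        int byteA = a & 0xff; // widen to 0..255; Java bytes are signed
        int byteB = b & 0xff;
        int byteC = c & 0xff;
        return new int[] {
            (byteA >> 2) & 0x3f,                          // 00aaaaaa
            (((byteA & 0x3) << 4) | (byteB >> 4)) & 0x3f, // 00aabbbb
            (((byteB & 0xf) << 2) | (byteC >> 6)) & 0x3f, // 00bbbbcc
            byteC & 0x3f                                  // 00cccccc
        };
    }

    public static void main(String[] args) {
        // 'a' 'b' 'c' are 0x61 0x62 0x63
        System.out.println(Arrays.toString(split((byte) 'a', (byte) 'b', (byte) 'c')));
        // prints [24, 22, 9, 35]
    }
}
```

With the standard alphabet, the indices 24, 22, 9 and 35 map to "YWJj", which is the usual Base64 encoding of "abc".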

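As described above, Base64 encodes bytes in groups of 3. When the number of bytes is not a multiple of 3, the leftover bytes cannot fill a group. Base64 still encodes them with the rules above, but uses padding characters to stand for the missing bytes: if one byte is missing to complete a group, a single "=" is appended; if two bytes are missing, "==" is appended.

Handling of leftover bytes:
When one byte is left over:
EnByteA = (ByteA >> 2) & 0x3f
EnByteB = (ByteA & 0x3) << 4
EnByteC = padding character (written directly to the output; it is not part of the 64-character alphabet)
EnByteD = padding character

When two bytes are left over:
EnByteA = (ByteA >> 2) & 0x3f
EnByteB = (((ByteA & 0x3) << 4) | (ByteB >> 4)) & 0x3f
EnByteC = (ByteB & 0xf) << 2
EnByteD = padding character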
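For comparison, the standard padding behaviour can be observed with the JDK's own `java.util.Base64` encoder. This is only a reference point for the rules above, not the custom variant; the class name `PaddingDemo` is invented for this example.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class PaddingDemo {
    // Encodes a string with the JDK's standard (RFC 4648) Base64 encoder.
    static String encode(String s) {
        return Base64.getEncoder().encodeToString(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println(encode("abc")); // YWJj  (full group, no padding)
        System.out.println(encode("ab"));  // YWI=  (one byte missing)
        System.out.println(encode("a"));   // YQ==  (two bytes missing)
    }
}
```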

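This handling of leftover bytes preserves all the byte information, but it makes the encoding malleable: more than one encoded string can decode to the same bytes. How so? Suppose one byte is left over. EnByteB then only carries the low 2 bits of ByteA (in bit positions 5 and 4), so the low 4 bits of EnByteB are ignored during decoding. Whatever value those 4 bits hold, a decoder that does not validate them still recovers ByteA. An attacker can therefore tamper with the last data character, observe that decoding still succeeds, and use that to guess the encoding rules.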
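The effect can be checked with a few lines of arithmetic. The sketch below is hypothetical (the names `TrailingBitsDemo` and `decodeSingle` are not from the article) and models only the decoding of a single leftover byte from its two 6-bit values, assuming a decoder that does not validate the unused bits.

```java
class TrailingBitsDemo {
    // Rebuilds the one leftover byte from its two 6-bit values,
    // the way a lenient decoder does: the low 4 bits of enByteB are dropped.
    static int decodeSingle(int enByteA, int enByteB) {
        return ((enByteA << 2) | (enByteB >> 4)) & 0xff;
    }

    public static void main(String[] args) {
        int enByteA = 24;   // top six bits of 'a' (0x61)
        int canonical = 16; // low two bits of 'a', shifted left by 4
        for (int low = 0; low < 16; low++) {
            int enByteB = canonical | low; // vary only the ignored bits
            System.out.println(enByteB + " -> " + decodeSingle(enByteA, enByteB));
        }
    }
}
```

All 16 variants of the second value decode to 97, the byte value of 'a', so 16 different final data characters are accepted for the same input.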

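Custom alphabet

In standard Base64 the alphabet consists of A-Z, a-z, 0-9, '+' and '/'. To meet our own requirements we can define a custom alphabet; any 64 distinct symbols will do, and the "=" padding can also be replaced with a scheme of our own.
For example:
Define the alphabet as follows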

private static final char[] Base64Alphabet = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
			'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
			'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3',
			'4', '5', '6', '7', '8', '9', '#', '@' };
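Before using a custom alphabet it is worth checking that it really contains 64 distinct characters and that none of them collides with the characters reserved for padding. The following is a minimal sketch under that assumption; `AlphabetCheck` and `isValid` are names made up for this example, and the reserved characters passed in `main` are the two marker characters used by the final implementation later in this article.

```java
import java.util.HashSet;
import java.util.Set;

class AlphabetCheck {
    // Returns true if the alphabet has exactly 64 distinct characters
    // and none of them is one of the reserved (padding/marker) characters.
    static boolean isValid(char[] alphabet, char... reserved) {
        if (alphabet.length != 64) {
            return false;
        }
        Set<Character> seen = new HashSet<>();
        for (char r : reserved) {
            seen.add(r);
        }
        for (char c : alphabet) {
            if (!seen.add(c)) {
                return false; // duplicate character, or clash with a reserved one
            }
        }
        return true;
    }

    public static void main(String[] args) {
        char[] alphabet =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789#@".toCharArray();
        System.out.println(isValid(alphabet, '-', '_')); // prints true
    }
}
```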

Define the decode table as follows

private static final byte[] Base64Reversal = {-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, -1, -1, -1, -1, -1, -1, 63, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1, -1, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51};

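The decode table can be generated programmatically.
The decode-table array (reversal) is given a length of 123 because the largest ASCII value in the alphabet is 122 ('z'), so the array needs at least 123 entries.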

byte[] reversal = new byte[123];
for (int i = 0; i < reversal.length; i++) {
	reversal[i] = -1;
}
byte i = 0;
for (char c : Base64Alphabet) {
	int index = (int) c;
	reversal[index] = i;
	i++;
}

With the decode table defined this way, the decoding logic can obtain the 6-bit value of a character directly via Base64Reversal[char].
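One caveat with a direct array lookup is that a character with a code above 122 would index past the end of the table. The sketch below wraps the lookup in a bounds check; it is an illustration only, and the names `ReverseLookup`, `buildReversal` and `valueOf` are assumptions that do not appear in the original code. It also assumes every alphabet character has a code below 123.

```java
import java.util.Arrays;

class ReverseLookup {
    static final char[] ALPHABET =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789#@".toCharArray();
    static final byte[] REVERSAL = buildReversal(ALPHABET);

    // Builds the decode table: REVERSAL[c] is the index of c in the alphabet,
    // or -1 if c is not an alphabet character.
    static byte[] buildReversal(char[] alphabet) {
        byte[] reversal = new byte[123]; // 'z' (122) is the largest code used
        Arrays.fill(reversal, (byte) -1);
        for (int i = 0; i < alphabet.length; i++) {
            reversal[alphabet[i]] = (byte) i;
        }
        return reversal;
    }

    // Returns 0..63 for an alphabet character, or -1 for anything else,
    // including characters whose code is beyond the end of the table.
    static int valueOf(char c) {
        return c < REVERSAL.length ? REVERSAL[c] : -1;
    }

    public static void main(String[] args) {
        System.out.println(valueOf('A'));      // 0
        System.out.println(valueOf('#'));      // 62
        System.out.println(valueOf('@'));      // 63
        System.out.println(valueOf('='));      // -1
        System.out.println(valueOf('\u4e2d')); // -1, no exception
    }
}
```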

Final implementation:

package com.kingsoft.passport.util;

import java.nio.charset.Charset;

/**
 * base64
 * @author yuzhanchao
 *
 */
public class Base64 {

	private static final Charset DEAFAULT_CHARSET = Charset.forName("utf-8");
	
	private static final char[] Base64Alphabet = { 'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N',
			'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i',
			'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3',
			'4', '5', '6', '7', '8', '9', '#', '@' };

	private static final byte[] Base64Reversal = {
			-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
			-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 62, // 0-34 unused, 35 '#'
			-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, // 36-47 unused
			52, 53, 54, 55, 56, 57, 58, 59, 60, 61, // 48-57 '0'-'9'
			-1, -1, -1, -1, -1, -1, 63, // 58-63 unused, 64 '@'
			0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, // 65-90 'A'-'Z'
			-1, -1, -1, -1, -1, -1, // 91-96 unused
			26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51 }; // 97-122 'a'-'z'

	private static final char Bit4Symbol = '-';

	private static final char Bit8Symbol = '_';

	public static String encode(String str) {
		byte[] bytes = str.getBytes(DEAFAULT_CHARSET);
		int sourceLength = bytes.length;
		int groupLength = sourceLength / 3;
		int remainder = sourceLength % 3;
		int resultLength = groupLength * 4;
		if (remainder == 1) {
			resultLength += 3;
		} else if (remainder == 2) {
			resultLength += 4;
		}

		char[] alphabet = Base64Alphabet;

		char[] result = new char[resultLength];

		int sourceCursor = 0;
		int targetCursor = 0;
		for (int i = 0; i < groupLength; i++) {
			int byte1 = bytes[sourceCursor++] & 0xff;
			int byte2 = bytes[sourceCursor++] & 0xff;
			int byte3 = bytes[sourceCursor++] & 0xff;
			result[targetCursor++] = alphabet[(byte1 >> 2 & 0x3f)];
			result[targetCursor++] = alphabet[((((byte1 & 0x3) << 4) | (byte2 >> 4))) & 0x3f];
			result[targetCursor++] = alphabet[((((byte2 & 0xf) << 2) | (byte3 >> 6))) & 0x3f];
			result[targetCursor++] = alphabet[byte3 & 0x3f];
		}

		if (remainder > 0) {
			// Mask with 0xff as in the main loop: a signed byte would be
			// sign-extended, and byte2 >> 4 would then overwrite the two
			// bits taken from byte1 whenever byte2 is 0x80 or above.
			int byte1 = bytes[sourceCursor++] & 0xff;
			result[targetCursor++] = alphabet[byte1 >> 2 & 0x3f];
			if (remainder == 1) {
				result[targetCursor++] = alphabet[((byte1 & 0x3) << 4) & 0x3f];
				result[targetCursor++] = Bit8Symbol;
			} else {
				int byte2 = bytes[sourceCursor++] & 0xff;
				result[targetCursor++] = alphabet[(((byte1 & 0x3) << 4) | (byte2 >> 4)) & 0x3f];
				result[targetCursor++] = alphabet[((byte2 & 0xf) << 2)];
				result[targetCursor++] = Bit4Symbol;
			}
		}

		return new String(result);
	}

	public static String decode(String str) {
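		char[] source = str.toCharArray();
		int strLength = source.length;
		if (strLength == 0) {
			// encode("") returns "", so return early instead of reading source[-1]
			return "";
		}
		int groupLength = 0;
		int remainder = 0;
		char last = source[strLength - 1];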
		if (last == Bit8Symbol) {
			groupLength = (strLength - 3) / 4;
			remainder = 1;
		} else if (last == Bit4Symbol) {
			groupLength = strLength / 4 - 1;
			remainder = 2;
		} else {
			groupLength = strLength / 4;
		}

		byte[] reversal = Base64Reversal;

		byte[] result = new byte[groupLength * 3 + remainder];

		int sourceCursor = 0;
		int targetCursor = 0;
		for (int i = 0; i < groupLength; i++) {
			byte sbyte1 = reversal[source[sourceCursor++]];
			byte sbyte2 = reversal[source[sourceCursor++]];
			byte sbyte3 = reversal[source[sourceCursor++]];
			byte sbyte4 = reversal[source[sourceCursor++]];
			result[targetCursor++] = (byte) ((sbyte1 << 2) | (sbyte2 >> 4));
			result[targetCursor++] = (byte) ((sbyte2 << 4) | (sbyte3 >> 2));
			result[targetCursor++] = (byte) (((sbyte3 & 0x3) << 6) | sbyte4);
		}
		
		if (remainder > 0) {
			byte sbyte1 = reversal[source[sourceCursor++]];
			byte sbyte2 = reversal[source[sourceCursor++]];
			result[targetCursor++] = (byte) ((sbyte1 << 2) | (sbyte2 >> 4));
			if (remainder == 2) {
				byte sbyte3 = reversal[source[sourceCursor++]];
				result[targetCursor++] = (byte) ((sbyte2 << 4) | (sbyte3 >> 2));
			}
		}

		return new String(result, DEAFAULT_CHARSET);
	}

}
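The class above does not use "=" padding. A single leftover byte is written as two alphabet characters followed by '_', and two leftover bytes as three alphabet characters followed by '-'. The length arithmetic that encode and decode rely on can be sketched in isolation as follows; `TailLengths`, `encodedLength` and `decodedLength` are hypothetical helper names introduced only for this illustration.

```java
class TailLengths {
    // Number of output characters for byteCount input bytes.
    static int encodedLength(int byteCount) {
        int full = byteCount / 3 * 4;
        switch (byteCount % 3) {
            case 1:  return full + 3; // 2 data characters + '_'
            case 2:  return full + 4; // 3 data characters + '-'
            default: return full;
        }
    }

    // Number of decoded bytes, derived from the length and the last character.
    static int decodedLength(String encoded) {
        int n = encoded.length();
        if (n == 0) {
            return 0;
        }
        char last = encoded.charAt(n - 1);
        if (last == '_') {
            return (n - 3) / 4 * 3 + 1;
        }
        if (last == '-') {
            return (n / 4 - 1) * 3 + 2;
        }
        return n / 4 * 3;
    }

    public static void main(String[] args) {
        System.out.println(encodedLength(1) + " " + decodedLength("YQ_"));  // 3 1
        System.out.println(encodedLength(2) + " " + decodedLength("YWI-")); // 4 2
        System.out.println(encodedLength(3) + " " + decodedLength("YWJj")); // 4 3
    }
}
```

Two-byte and three-byte inputs both produce four characters, so only the trailing '-' tells the decoder which case it is handling. This is why neither marker character may appear in the alphabet.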

Added: 2022-03-16 22:43:30 Updated: 2022-03-16 22:51:13
