How to Predict an Encoded Base64 Value
Learn how to predict the likely data type
Why is Base64 predictable?
Base64 is an encoding scheme, meaning that it converts one representation of data into another representation. The underlying data remains the same, and is not encrypted, so it remains decodable. This makes encoded portions of Base64 strings predictable given a particular input, meaning one can deduce certain common characters. Using the techniques in this article, you can learn to predict the likely data type of an encoded Base64 string.
Identify Base64 strings
Before continuing: if you have a string but are unsure whether it is Base64 encoded, a tell-tale sign is the = character at the end of the string. If one or more are present, it's likely to be Base64-encoded data. An equals sign is not a requirement for a Base64-encoded string, and many such strings do not contain one, so don't rely solely on this technique to identify Base64 strings. It is an easy way to quickly identify the general encoding type, so you can then predict the data type as outlined in this article. If the string is of interest to you, convert it with a Base64 decoder such as this website (Base64decode.one).
Predictable data types
When developing software, engineers have the choice of several markup languages and data-interchange formats, such as JSON, HTML, XML, and related formats. All these textual formats must have predictable structures to be effective, which also makes them predictable when they're encoded using certain schemes, such as Base64. For example, a JSON string will look like this:
{
  "success": "true",
  "response": {
    "userId": 123
  }
}
Notice how the JSON object starts and ends with a curly brace. Because JSON objects always start with an { (open curly brace) and are often followed by a " (double quote) to define a key, this part of the string is predictably encoded every time. There are exceptions to this rule with all data formats, such as if there is a comment or other extraneous data at the beginning of the encoded string. For some formats, a string can start with more than one particular character. For example, a JSON object will start with an { (open curly brace), while a JSON array will start with an [ (open bracket).
Common patterns cheat sheet
Use this cheat sheet to identify the likely data type of an encoded Base64 string. Look for these common patterns:
JSON: eyJ at beginning, due to {" (open curly brace & double quote)
JSON: ewo, ewp at beginning, due to { (open curly brace) and new line
JSON, success true: eyJzdWNjZXNzIjoidHJ1ZSJ9
HTML, XML: P at beginning, due to < (less than symbol)
HTTPS URL: aHR0cHM6Ly at beginning, due to https://
As you work with Base64 strings, you will notice more patterns. Over time, you can compile your own cheat sheet based on your frequent workflows. Show off your prediction skills the next time you are with a coworker and see a familiar Base64 string pattern.
Automated prediction without decoding
This article focuses on quick human predictions. However, software can predict a wider variety of string formats without decoding the string. There likely isn't a practical reason to implement this, but it is a fun experiment. Such experimental software can handle many more variations of the formats mentioned above, plus other formats that have too many variations or inconsistencies to be memorized manually.
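One way such software could work, as an added example that is not part of the original article: it derives each Base64 signature from a plaintext prefix instead of hard-coding it, then matches on leading characters only. The function names, labels and the "[" entry are assumptions chosen here; the other prefixes come from the cheat sheet.

```python
import base64

# Plaintext prefixes from the article's cheat sheet, plus "[" for JSON arrays.
# Labels are wording chosen for this example. Longest prefixes come first.
SIGNATURES = [
    (b"https://", "HTTPS URL"),
    (b'{"', "JSON object, compact"),
    (b"{\n", "JSON object, pretty-printed"),
    (b"[", "JSON array"),
    (b"<", "HTML or XML"),
]

def stable_prefix(plain: bytes) -> str:
    """Base64 characters fully fixed by `plain`, whatever bytes follow it."""
    # Each character carries 6 bits, so n known bytes fix floor(8n / 6) characters.
    encoded = base64.b64encode(plain + b"\x00\x00").decode("ascii")
    return encoded[: len(plain) * 8 // 6]

def next_chars(plain: bytes, followers: bytes) -> set:
    """Values the first not-yet-fixed character can take for the given next bytes."""
    i = len(plain) * 8 // 6
    return {base64.b64encode(plain + bytes([f, 0])).decode("ascii")[i]
            for f in followers}

def predict(b64_text: str) -> str:
    """Guess the data type from leading characters only; the input is never decoded."""
    text = b64_text.strip()
    for plain, label in SIGNATURES:
        if text.startswith(stable_prefix(plain)):
            return label
    return "unknown"

if __name__ == "__main__":
    # Constructed sample inputs.
    for sample in ("eyJzdWNjZXNzIjoidHJ1ZSJ9", "aHR0cHM6Ly9leGFtcGxlLmNvbQ==", "PGh0bWw+"):
        print(sample, "->", predict(sample))
    # Why the cheat sheet lists eyJ and ewo/ewp rather than only ey and ew:
    print(next_chars(b'{"', b"abcXYZ_"), next_chars(b"{\n", b' \t"}'))
```

A one-byte signature fixes only one Base64 character, so P also matches data beginning with =, > or ?, and W matches X, Y and Z as well as [; treat those results as hints.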
You will need to write documentation for the experimental software. When you do, make sure to read when the word "Base64" should be capitalized so your experiment looks and feels more professional. Maybe someone will find a real-world use for it that way!
Last updated on December 2, 2019