Base64 is a method of encoding a file into normal printable characters so that it can be more easily transmitted electronically. This is particularly useful for binary files where the contents may contain special control, or other "non-printable", characters that could get lost in translation went sent electronically from A to B.
From WikiPedia:
"Base64 encoding schemes are commonly used when there is a need to encode binary data that needs to be stored and transferred over media that are designed to deal with textual data. This is to ensure that the data remains intact without modification during transport."
Base64 is in common use - some places you will find it are:
- » email attachments
- » binary data in XML documents
- » embedded images in HTML pages
- » code obfuscation
- » SAS001 Password encoding (Shhh! don't tell anyone!)
- » Email attachments using the EMAIL engine of the FILENAME statement
- » Numeric values with the XML LIBNAME engine when XMLDOUBLE=PRECISION is in effect
To work directly with Base64 data in SAS you would (previously) need to take a look at the algorithms used to encode/decode streams of bytes and then implement those methods in SAS. You can find the information to do this in RFC 4648 if you would like to rebuild this from the ground up. There is also a SAS Global Forum 2011 paper by Erik Tilanus titled "The Trilogy on e-Mailing, Part 3: Handling e-Mail Attachments with SAS®", where the Base64 decoding method of manipulating 4 x 6-bit bytes back into 3 x 8-bit bytes is shown as implemented in the SAS Data Step language!
With the introduction of the BASE64Xw.d format/informat with SAS 9.2, this allows me to "cheat" a bit and create some simpler code to help work with Base64 data in SAS. While these new formats/informats let me work above the bit/byte-level manipulation required, they are limited to a maximum string length of 32k - which is fine for working with data field values, but probably not long enough if I want to encode a whole file. When working out how to break a longer string (or a stream of bytes from a file) down so that it can be handled by the provided format/informat, the trick here is:
- » For encoding, 3 characters = 4 characters
- » For decoding, 4 characters = 3 characters
So, all I have to do for encoding is read my source "stream of bytes" in multiples of 3 bytes/characters, and for decoding read in multiples of 4 bytes/characters.
Let's look at a data step that can encode a file of any size into Base64:
Hopefully the code is commented enough let you understand what is happening. I am basically reading in an Excel file one byte at a time until I have collected 57 bytes (57 is divisible by 3), then I apply the SAS BASE64Xw.d format and write it out to another (plain text) file where it the result is a 76 character string - (57 / 3) * 4 = 76.
Decoding is a bit easier, but pretty much a reverse of the above:
Here we can use the SAS BASE64Xw.d informat directly on the INPUT statement to read in the lines of Base64 encoded data. The semi-tricky part is going through the resulting "string" byte by byte to output those bytes to a file exactly as they are - which is the DO loop getting the BYTE of each byte in the string before outputting that byte using the SAS $CHARw. format.
Probably the most useful routine out of the above is the encoding. One example of using this is to embed an image file directly into a HTML page - that way the image will always show up as it doesn't rely on a separate image file to be present.
The core reason that I ended up creating these routines was to upload files using SOAP via a web service... more on that in a later post!
What I ended up with was a single macro that I could call to either encode/decode a file, and would also take either a FILEREF or a FILENAME as parameters for both the input and output files (it works this out by itself). The source code to the macro is attached to this Article.
The syntax to this macro is:
);
Utilising this macro as part of the previous example (embedding an image) would be:
I have also created a DI Studio Transformation so that I can use this functionality in my DI Studio jobs (SAS Package attached to this Article):
The transform provides a nice GUI for entering the parameters:
Compatibility | |
---|---|
SAS 9.2 | Yes |
SAS 9.3 | Yes |
This article was originally posted 8May2012 at http://www.scorpiosoftware.com.au/2012/05/sas-and-base64/
Comments
2 comments
Is it possible to use your sas macro to decode a base64 blob stored in a MySQL variable into a txt file?
I have a whole table of blob variable to decode in multiple text files.
Thanks.
Hi Yves, The macro as described will only work with Base64 data in an external file, but with a slight tweak to the decoding routine above you could use the following code as a start:
data _null_;
length b64 $ 32767 byte $ 1; /* < -- Increase b64 length to max */
/* Change INFILE and INPUT statements */
set mysql.table_with_b64blobs;
b64=input(blob_var,$base64X32767.);
b64length=length(blob_var);
/* The rest is the same */
putlog "NOTE: Detected Base64 Line Length of " b64length;
file "D:\Temp\my_decoded_file.png" recfm=F lrecl= 1;
do i=1 to (b64length/4)*3;
byte=byte(rank(substr(b64,i, 1)));
put byte $char1.;
end;
run;
The caveat is that because the max string length that can be manipulated in SAS is 32k, your Base64 blob fields in MySQL have to contain 32k characters or less.
Article is closed for comments.