Base64 encoding using String

Hello,

I'm trying to Base64-encode a String, but the encoder requires an unsigned char[].
I've tried lots of ways to convert the String to unsigned char, have got nowhere with it, and am reaching out to you.
I realise there are similar topics and I have tried understanding them but to no avail.
I haven't seen one relating to this specific topic.
Please could you help as I'm just going round in circles.

#include "base64.hpp"

void setup() {

  String Test = "Test";

  // unsigned char string[] = "String example";
  unsigned char string[] = {Test};

  unsigned char base64[21]; // 20 bytes for output + 1 for null terminator

  // encode_base64() places a null terminator automatically, because the output is a string
  unsigned int base64_length = encode_base64(string, strlen((char *) string), base64);

  printf("%d\n", base64_length); 
  printf((char *) base64); 

}

void loop() {
  // put your main code here, to run repeatedly:

}

The error I get: Compilation error: cannot convert 'String' to 'unsigned char' in initialization

The requirement is that the input stays a String, not something else like char buf[] etc.
Thank you.

You can try to use

unsigned int base64_length = encode_base64(Test.c_str, strlen(Test.c_str), base64);

See .c_str()

void setup() {

  String Test = "Test";

  // unsigned char string[] = "String example";
  unsigned char string1[] = {'T','e','s','t', '\0'};
  unsigned char string2[] = {"Test"};
  unsigned char string3[] = "Test";
  const unsigned char* string4 = (const unsigned char*) Test.c_str();
}


Thanks for the reply. Unfortunately this did not work; here is the error:

Compilation error: invalid use of non-static member function 'const char* String::c_str() const'

Perhaps it should have been:

unsigned int base64_length = encode_base64(Test.c_str(), strlen(Test.c_str()), base64);

or maybe

unsigned int base64_length = encode_base64(Test.c_str(), Test.length(), base64);

Thank you, everyone, for helping. I got it working with:

unsigned char* string4 = (unsigned char*) Test.c_str();

In my earlier attempts I did try the (unsigned char*) cast; all that was missing was the star in unsigned char* string4.

Thanks once again, great forum!!

This error occurs when c_str is named without being called, e.g. Test.c_str instead of Test.c_str(). Not sure what else is going on; at a guess, the fragments you are posting here don't match your actual sketch.

Here is a complete sketch that compiles on my machine.

void setup() {

  String Test = "Test";

  // unsigned char string[] = "String example";
  const unsigned char *string = (const unsigned char *) Test.c_str();

  unsigned char base64[21]; // 20 bytes for output + 1 for null terminator

  // encode_base64() places a null terminator automatically, because the output is a string
  // I don't have the library installed, so I will just comment out this bit
  //unsigned int base64_length = encode_base64(string, strlen((char *) string), base64);
  //
  // printf("%d\n", base64_length); 
  // printf((char *) base64); 

}

void loop() {
  // put your main code here, to run repeatedly:

}


OK, but out of curiosity, I still don't understand a couple of things.
Why are you always using "unsigned char" (equivalent to "byte") for strings, instead of just "char" (used for ASCII characters), and why do you need to cast "Test.c_str()" to "(const unsigned char *)", when it already returns a "(const char *)"?

This is obviously not a bug, as it's a legitimate cast. What I mean is that the code below works the same way as yours but is simpler and more straightforward:

  String Test = "Test";
  const char *strTest = Test.c_str();

I am just curious: why don't the following println() calls show the same output when the operands are the same?

void setup() 
{
  Serial.begin(9600);
  char ch1 = 65;
  byte ch2 = 65;
  Serial.println(ch1); //shows: A
  Serial.println(ch2);  //shows: 65
}

void loop() {}

I was wondering the same thing.

  String Test = "Test";
  auto strTest = Test.c_str();

They call two different overloads of the println() function in the Print class.

Because "byte" is a numeric value, and "char" is a character. Even if both are set with the same numeric value, the latter is automatically recognized by "println()" as an ASCII character (65 is the ASCII code for the letter 'A').
These are some of the basics of the C programming language.

More precisely, that's C++: the behavior comes from function overloading, which isn't available in C.

That's not the point of my post, I was talking about variable and constant types, not the C++ overloading.

My point was that @GolamMostafa would never have asked the question if we were programming in C, as println() would only accept one variable type and would only print the argument one way.

I had always thought that the compiler looked at the data type and did one of the following:

1. Casting, when the data type is char:
Serial.println((char)65); // 65 -----> on the ASCII monitor shows: A

2. Using the default decimal base when the data type is byte:

Serial.println((byte)65);
==> Serial.println(65, DEC);
==> Serial.write(0x36);  // shows: 6
==> Serial.write(0x35);  // shows: 5
==> Serial.write(0x0D);  // carriage return
==> Serial.write(0x0A);  // newline

One high-level frame is broken down into four low-level frames.

Now that I've heard of "function overloading", a C++ feature, I believe that is what does what I described above.
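Roughly, yes: the compiler picks the overload from the argument's static type at compile time. Below is a desktop C++ sketch (not Arduino code) of that selection. The two functions named `line()` are hypothetical stand-ins for `Print::println(char)` and `Print::println(unsigned char, int = DEC)`; each returns the characters that would be sent, including the "\r\n" that println() appends.

```cpp
#include <string>

// Hypothetical stand-in for println(char): the value is sent as one character.
std::string line(char c) {
    return std::string(1, c) + "\r\n";
}

// Hypothetical stand-in for println(unsigned char, DEC): the value
// (an Arduino byte) is formatted as decimal digits.
std::string line(unsigned char b) {
    return std::to_string(static_cast<unsigned>(b)) + "\r\n";
}
```

With `char ch1 = 65;` and `unsigned char ch2 = 65;`, `line(ch1)` gives "A\r\n" and `line(ch2)` gives "65\r\n". Note that a bare literal such as `line(65)` would be ambiguous with only these two overloads; Arduino's Print also declares an int overload, which is why `Serial.println(65)` compiles.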

Base64 converts binary data into a restricted ASCII alphabet. If the original data is already pure ASCII text, encoding it in Base64 gives no technical advantage for storage or transmission efficiency: it expands the data by about 33 percent, because every 3 input bytes become 4 ASCII characters.
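To put numbers on that, here is a small C++ sketch of the output size of standard padded Base64 (the function name is made up for illustration): every started group of 3 input bytes becomes 4 characters. For the 14-character "String example" in the OP's commented-out line that gives 20 characters, which matches the `base64[21]` buffer (20 + 1 for the null terminator).

```cpp
#include <cstddef>

// Hypothetical helper: padded Base64 length for n input bytes,
// excluding the terminating '\0'. (n + 2) / 3 counts started 3-byte groups.
std::size_t base64_encoded_length(std::size_t n) {
    return 4 * ((n + 2) / 3);
}
```

For "Test" (4 bytes) this gives 8, so a 9-byte output buffer is enough.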

That's why the API expects a buffer of unsigned char (aka bytes) rather than ASCII text.

That being said, a String (or c-string) can hold UTF-8 data and thus is better seen as a byte stream as well.

The challenge was that c_str() returns const char *, while the API expects unsigned char *; that type mismatch was cleared by the cast.
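A desktop C++ sketch of that cast, assuming `std::string` as a stand-in for Arduino's `String` (both provide `c_str()` returning `const char *`) and a hypothetical `sum_bytes()` with the same kind of `(pointer, length)` parameters as the encoder. The cast only changes how the existing buffer is viewed; keeping `const` in it is a little safer than `(unsigned char *)`, since the buffer behind `c_str()` is not meant to be written through.

```cpp
#include <string>

// Hypothetical byte-oriented API shaped like encode_base64():
// it reads `length` raw bytes, each as a value in 0..255.
unsigned int sum_bytes(const unsigned char *data, unsigned int length) {
    unsigned int total = 0;
    for (unsigned int i = 0; i < length; ++i) {
        total += data[i];
    }
    return total;
}

// Passes a string's buffer to the byte API. c_str() yields const char *,
// so it is reinterpreted (not copied) as const unsigned char *.
unsigned int sum_string_bytes(const std::string &text) {
    const unsigned char *bytes =
        reinterpret_cast<const unsigned char *>(text.c_str());
    return sum_bytes(bytes, static_cast<unsigned int>(text.length()));
}
```

For "Test" the bytes are 84 + 101 + 115 + 116 = 416. For UTF-8 text such as "é" (bytes 0xC3 0xA9) the values stay 195 and 169, because they are read through unsigned char.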

The encode_base64 function, which the OP does not show, has a poorly declared signature.

unsigned char and "just plain" char are distinct types (as is the rarely used signed char: three variants in all). byte is in turn equivalent to unsigned char: Arduino defines it as uint8_t, which is unsigned char on the usual targets.

Semantically, byte is "just a number" from 0 to 255 and char is text. Base64 encoding converts bytes (binary data, like a PNG or JPEG image file) into a string, so it should read from byte* and write to char*. Instead, it apparently reads from and writes to unsigned char*, which matches the former but not the latter.

And while byte* and char* are both pointers to buffers of 8-bit, single-byte elements, as pointers they are not directly interchangeable, while the plain values are.

void setup() {
  char c = 'A';
  unsigned char b = c; // fine, not even a warning
  auto pc = &c;
  auto pb = &b;
  pb = pc; // nope
}

void loop() {}
error: invalid conversion from 'char*' to 'unsigned char*' [-fpermissive]
    6 |   pb = pc; // nope
      |        ^~
      |        |
      |        char*

Base64 output is composed only of ASCII characters A–Z, a–z, 0–9, +, / and possibly = for padding. These values lie in the range 43–122, which fits safely in both signed and unsigned char. From a strictly semantic perspective, a char * would indeed be sufficient and make sense.

One reason someone still uses unsigned char * is simply symmetry with the input type. Base64 encoders typically accept arbitrary binary data as unsigned char *. Returning the result in the same type keeps the API consistent and indicates that the function operates on byte buffers rather than on text or C strings.

Another reason is that the signedness of plain char in C is implementation-defined: on some targets char is signed, on others it is unsigned. Using unsigned char removes that ambiguity and guarantees that every element of the buffer is treated as a value in the range 0–255 during bit manipulation. That is also a practical motive for defensive coding: Base64 encoders shift and mask byte values extracted from the input, and unsigned char rules out sign extension during those operations.
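A small C++ sketch of the sign-extension problem. It uses `signed char` explicitly so the result does not depend on whether plain char is signed on the target; the function names are hypothetical. Both functions pack the bytes 'a', 'A' and a third byte into the 24-bit group an encoder works on, the way a typical encoder might.

```cpp
// Packs 'a' (0x61), 'A' (0x41) and `last` into a 24-bit Base64 group.
// `last` is promoted to long; a negative value is sign-extended, so its
// high 1-bits overwrite the bits that belong to 'a' and 'A'.
long group_from_signed(signed char last) {
    return (0x61L << 16) | (0x41L << 8) | last;
}

// Same packing, but `last` is zero-extended: only its low 8 bits are set.
long group_from_unsigned(unsigned char last) {
    return (0x61L << 16) | (0x41L << 8) | last;
}

// First 6-bit index of a group into the alphabet A-Z, a-z, 0-9, +, /.
int first_index(long group) {
    return static_cast<int>((group >> 18) & 0x3F);
}
```

For the third byte 0xC3, the unsigned version yields the group 0x6141C3, whose first index is 24 ('Y'), as expected for 'a'. Through signed char the group collapses to -61 and, with the usual arithmetic right shift, the first index becomes 63 ('/'), so the output would be wrong even though the first input byte was plain ASCII.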

Note also that many low level libraries consistently represent buffers as unsigned char * even when the data happens to be printable ASCII. This convention is common in cryptographic and encoding code where the buffer is conceptually a sequence of bytes rather than a C string.

So I would not call this a bad API, but the author could have made it easier to use by, for example, offering overloads.
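A minimal sketch of that idea in desktop C++, with `std::string` standing in for Arduino's `String`. The name `b64_encode` and both signatures are hypothetical, not the OP's library: the core function reads bytes and writes char text (the byte-in, char-out shape argued for earlier in the thread), and a one-argument overload performs the cast once so text callers never need it.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical core encoder: reads raw bytes, writes padded Base64 text and
// a terminating '\0'. `out` must hold 4 * ((length + 2) / 3) + 1 chars.
// Returns the number of characters written, excluding the '\0'.
std::size_t b64_encode(const std::uint8_t *in, std::size_t length, char *out) {
    static const char alphabet[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t o = 0;
    for (std::size_t i = 0; i < length; i += 3) {
        // Pack up to three bytes into a 24-bit group; missing bytes count as 0.
        std::uint32_t group = std::uint32_t(in[i]) << 16;
        if (i + 1 < length) group |= std::uint32_t(in[i + 1]) << 8;
        if (i + 2 < length) group |= std::uint32_t(in[i + 2]);

        // Emit four 6-bit indices; positions past the input become '=' padding.
        out[o++] = alphabet[(group >> 18) & 0x3F];
        out[o++] = alphabet[(group >> 12) & 0x3F];
        out[o++] = (i + 1 < length) ? alphabet[(group >> 6) & 0x3F] : '=';
        out[o++] = (i + 2 < length) ? alphabet[group & 0x3F] : '=';
    }
    out[o] = '\0';
    return o;
}

// Convenience overload for text: the cast lives here, once.
std::string b64_encode(const std::string &text) {
    std::string out(4 * ((text.size() + 2) / 3) + 1, '\0');  // +1 for '\0'
    std::size_t n = b64_encode(
        reinterpret_cast<const std::uint8_t *>(text.data()), text.size(), &out[0]);
    out.resize(n);  // drop the terminator slot
    return out;
}
```

With this, `b64_encode("Test")` returns "VGVzdA==" without any cast at the call site. On Arduino a similar overload could presumably take `const String &` and use `c_str()` and `length()`; that variant is an untested assumption here.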