Splitting string with colons and spaces?

Edit: using noorpari76's solution, thanks

#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>

const std::regex ws_re(":| +");
void printTokens(const std::string& input)
{
    std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
               std::sregex_token_iterator(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
}

int main()
{
    const std::string text1 = "...:---:...";
    std::cout<<"no whitespace:\n";
    printTokens(text1);

    std::cout<<"single whitespace:\n";
    const std::string text2 = "..:---:... ..:---:...";
    printTokens(text2);

    std::cout<<"multiple whitespaces:\n";
    const std::string text3 = "..:---:...   ..:---:...";
    printTokens(text3);
}

strtok() can take multiple delimiters, including space. Always check the references: https://www.cplusplus.com/reference/cstring/strtok/

The String class is totally unnecessary here, and it creates additional problems, such as heap fragmentation, on low-memory Arduinos.

Example:

char input_string[50] = "Hel lo,Wor ld!";
char * token;
char strings[4][10] = {0}; //up to four strings to receive the parsed data (each 9 characters max)

void setup()
{
  Serial.begin(9600);
  Serial.println();
  byte index = 0;  // position in the strings array
  token = strtok(input_string, " ,");  // get the first part
  while (token != NULL && index < 4)  // stop when the array is full
  {
    strncpy(strings[index], token, sizeof(strings[index]) - 1);  // copy at most 9 characters
    strings[index][sizeof(strings[index]) - 1] = '\0';  // strncpy() does not always terminate
    Serial.println( strings[index] ); //print each substring
    index++;                 // move to the next array position
    token = strtok(NULL, " ,");  // get the next part
  }
}

void loop()
{
}

Output

Hel
lo
Wor
ld!

The space is a separator; why do you want to keep it? From my point of view, putting a space back in is something to do when you display the data.

When you update your code, please post the complete code so we don't have to merge it (and run the risk of doing it incorrectly).

Translate in which sense? From e.g. English to Dutch? Can you give an example?

E.g. 'Hello, world' in Dutch would be 'Hallo, wereld'.

I'd suggest using the <regex> library if your compiler supports C++11.

#include <string>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <regex>

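// Delimiter: a single ':' or a run of one or more spaces.
const std::regex ws_re(":| +");

// Print each token of input on its own line.
void printTokens(const std::string& input)
{
    // The -1 selects the text between delimiter matches rather than the matches themselves.
    std::copy( std::sregex_token_iterator(input.begin(), input.end(), ws_re, -1),
               std::sregex_token_iterator(),
               std::ostream_iterator<std::string>(std::cout, "\n"));
}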

int main()
{
    const std::string text1 = "...:---:...";
    std::cout<<"no whitespace:\n";
    printTokens(text1);

    std::cout<<"single whitespace:\n";
    const std::string text2 = "..:---:... ..:---:...";
    printTokens(text2);

    std::cout<<"multiple whitespaces:\n";
    const std::string text3 = "..:---:...   ..:---:...";
    printTokens(text3);
}

The library is documented on cppreference. If you are not familiar with regular expressions: in the code above, const std::regex ws_re(":| +"); matches either a ':' character or one or more spaces (the pipe symbol '|' means 'or', and '+' means 'one or more of whatever precedes the plus sign'). You can then use this regular expression to tokenize any input with std::sregex_token_iterator. For more complex cases than spaces, there is the wonderful regex101.
The only disadvantage I can think of is that the regex engine is likely to be slower than a simple handwritten tokenizer.

Don't "conjoin" the substrings. You are in control!
