Skip to content

Store strings inside library, instead using external char * #100

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

romansavrulin
Copy link

This strings should be stored somewhere, let it be inside the library, because in that case we can use temp variables for them outside the library.

Usecase:

AsyncMqttClient _mqtt;

void loadConfig(){
        JSONConfig jc("/mqtt.json");
	auto &j = jc.load();
	auto addr = j[F("addr")].as<char *>();
	auto port = j[F("port")].as<int>();
	if (addr && port) _mqtt.setServer(addr, port);
	auto user = j[F("user")].as<char *>();
	auto pass = j[F("pass")].as<char *>();
	if (user && pass) _mqtt.setCredentials(user, pass);
}

Without patch external variables will be out of the scope when mqtt will try to reconnect and one will get exception.

@romansavrulin romansavrulin changed the title Store strings inside library, instead using char * Store strings inside library, instead using external char * Apr 21, 2018
@timpur
Copy link
Collaborator

timpur commented Apr 28, 2018

I respectivly disagree, allowing you to use pointers means that you can delare the host as a constant or fixed width char eg const char[] = "host.name.com". Taking pointers alows the user of the lib to decide how they want to store the strings. I dont think there is much to gain, but more lost here. if you want to use Strings you can (https://www.arduino.cc/reference/en/language/variables/data-types/string/functions/c_str/).

Also core libaries should always try avoid using String beacuse there quite costly on ebedded devices. https://hackingmajenkoblog.wordpress.com/2016/02/04/the-evils-of-arduino-strings/

@romansavrulin
Copy link
Author

@timpur I know the cost of the solution.

Performance: You nothing to do with that Strings during normal work. It's only convenient container for char *. So no any measurable performance drop.

The memory: const char * is useless and will eat your RAM the same way String do, unless you provide separate variables for PROGMEM C strings. This will bloat library interface and implementation dramatically to bring in separate functions for parsing them. If you avoid, your library users will get random panics, depending on PROGMEM's string length and parsing routines. (See unaligned memory access for PROGMEM strings on esp8266)

Maintenance: Obviously, using Strings makes things much cleaner and allows better user experience.

So, Using Strings makes sense.

@timpur
Copy link
Collaborator

timpur commented Apr 29, 2018

Did you actually read the article ?

Strings make copies all other the place with out you knowing, this can lead the heap fragmentation, something I personally try to avoid.

@tonton81
Copy link

char arrays are preferred over Strings, Strings will eventually break your program in the long run due to it’s dynamic allocation

@tonton81
Copy link

...and causes memory leaks...

@timpur
Copy link
Collaborator

timpur commented Apr 29, 2018

@romansavrulin
You also said to improve user experience, but you haven't change the headers of any functions to take strings. So the use is seeing no benifits. This is kinda pointless then?

@tonton81
Copy link

const char* doesnt waste memory, it uses storage, same as PROGMEM.

@timpur timpur added the discussion Converstation label Apr 29, 2018
@romansavrulin
Copy link
Author

romansavrulin commented Apr 29, 2018

@timpur @tonton81 Okay, let's go step-by step.

To PROGMEM or to CONST

Read this links and think about world order one more time:

If you are still not confident about const keyword on that tricky platform, let's proceed to

Variables placement test

Here's a simple test sketch.

//36 bytes in total (0x24 bytes)
char * char_variable = "this is char *"; //15 with trailing zero
const char * const_char_variable = "this is const char *"; //21 with trailing zero



void setup() {
  // put your setup code here, to run once:

}

void loop() {
  // put your main code here, to run repeatedly:
  String progmem_char_variable = (F("This is real progmem")); //21 bytes with trailing zero (0x15 bytes)

  Serial.print(char_variable);
  Serial.print(const_char_variable);
  Serial.print(progmem_char_variable);
}

I compile it with 2.4.1 ESP-8266 arduino and will generate map file by adding -Wl,-Map "-Wl,{build.path}/{build.project_name}.map" to recipe.c.combine.pattern= parameter in platform.txt. Map file will be placed into temporarely build folder.

Map file analysis

The map file says the following

this is the pointers. it's placed into RAM, as expected.

.data.const_char_variable
                0x000000003ffe84dc        0x4 /var/folders/vf/jdzgfhk92252hqsgnsmnp7n00000gn/T/arduino_build_833821/sketch/sketch_apr29a.ino.cpp.o
                0x000000003ffe84dc                const_char_variable
 .data.char_variable
                0x000000003ffe84e0        0x4 /var/folders/vf/jdzgfhk92252hqsgnsmnp7n00000gn/T/arduino_build_833821/sketch/sketch_apr29a.ino.cpp.o
                0x000000003ffe84e0                char_variable

Here's .rodata section and ALL char *, const char * and other const data initializers IS PLACED INTO RAM TOO. So whenever you mark any type const it will anyway go to RAM on ESP8266 and some ARDUINOs. So, it will eat RAM anyway.

*(.rodata.*)
 .rodata.str1.1 // "this is char *\0" and "this is const char *\0" concatenated into single array 
                0x000000003ffe8870       0x24 /var/folders/vf/jdzgfhk92252hqsgnsmnp7n00000gn/T/arduino_build_833821/sketch/sketch_apr29a.ino.cpp.o

To FLASH goes ONLY strings that is used with F("") helpers or variables that is marked with PROGMEM explicitly. This is how things work for now.

// F("This is real progmem")
 .irom.text     0x0000000040233df9       0x15 /var/folders/vf/jdzgfhk92252hqsgnsmnp7n00000gn/T/arduino_build_833821/sketch/sketch_apr29a.ino.cpp.o

F("") heplers are kinda const char*, but you need special functions (strlen_P and the family) to use them correct way.

ELF analysis

To ensure this is the data we are looking for, let's dump sections

~/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/bin/xtensa-lx106-elf-objdump -sj .rodata sketch_apr29a.ino.elf | less

We will get the following

sketch_apr29a.ino.elf:     file format elf32-xtensa-le

Contents of section .rodata:
 3ffe8510 322e322e 31286366 64343866 33290000  2.2.1(cfd48f3)..
...

 3ffe8870 74686973 20697320 636f6e73 74206368  this is const ch
 3ffe8880 6172202a 00746869 73206973 20636861  ar *.this is cha
 3ffe8890 72202a00 2f557365 72732f72 6f6d616e  r *./Users/roman

and .irom

~/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/bin/xtensa-lx106-elf-objdump -sj .irom0.text sketch_apr29a.ino.elf | less

sketch_apr29a.ino.elf:     file format elf32-xtensa-le

Contents of section .irom0.text:
 40201010 327c4f61 1c000060 00000060 1c0f0060  2|Oa...`...`...`

...

 40233df0 2110f8f1 12c1500d f0546869 73206973  !.....P..This is
 40233e00 20726561 6c207072 6f676d65 6d003c3c   real progmem.

So, assumption that const keyword doesn't place data in FLASH and still wastes RAM is correct.

Test data

Here's .map and .elf files for entertainment

map_elf.zip

@romansavrulin
Copy link
Author

@timpur @tonton81

Next, You've mentioned heap fragmentation. What can go wrong with heap fragmentation after setting String container once (most likely for that variables usecases) per program run?

If you really care about heap fragmentation, please, be consistent and carefully read this library's code.
Then, fix all the places where it uses vectotrs that is relocaded after each packed reception and other stl:: stuff like std::function, etc...

Or better admit that you are using modern cpp and stop the holly war between c and cpp approaches.

Peace...

@timpur
Copy link
Collaborator

timpur commented Apr 29, 2018

@romansavrulin
You still didn't address how this PR will benifit the end user, no function headers have changed to take a string ? Kinda seems pointless to only use strings underneath in the lib. Don't see how the user is benifiting from this ? (At current state of the PR) ?

Edit: also you make a good point about heep and the lib, I might go though and optimise it a bit. Yes std vectors and std functions might not be best for heap, but they have valid use (could be cleaned up).

@romansavrulin
Copy link
Author

romansavrulin commented Apr 29, 2018

@timpur Good question.

I haven't touched headers for interface compatibility reason. If you agree to pull-in String, I can change them too. It will be better to add both String and FlashStringHelper overloads. But this will lead to temporary strings creation during parameter set, as you've correctly mentioned above.

The current PR solves memory management issues. For now, if user will somehow free the memory where parameters points to, the library will crash or go crazy. It can be easily achieved by both stack and dynamic allocations.

All the following snippets will cause panic, because variables will be out of the scope when setup exists.

AsyncMqttClient _mqtt;

void setup(){
  const char user[] = "username";
  const char pass[] = "pass";
  _mqtt.setCredentials(user, pass);
}
AsyncMqttClient _mqtt;

void setup(){
  String user = "username";
  String pass = "pass";
  _mqtt.setCredentials(user.c_str(), pass.c_str());
}

Using Strings inside the lib will ensure the data that is needed by the lib will be safe and sound, as it is copied inside String containers under the hood. It also ensures little memory overhead for data storage, due to dynamic allocations. You don't need to keep big static buffers for that data.

@tonton81
Copy link

on teensy, const is automatically out in flash during compile, no need for progmem or F() macros, the static print/println() are stored on flash as well without F() macro

surely before you complain about marvin’s library not working on teensy, ive ported his library over as well as many of the core functions of ESP, without using silly AT protocols. 3 megabaud rates i had marvins mqtt running for months... but enough about that, not all microcontrollers put strings (small ‘s’) in flash, and F() is not needed by all microcontrollers if the compiler puts it in flash automatically :)

@timpur
Copy link
Collaborator

timpur commented Apr 29, 2018

Tbh, if the user isn't benifiting any any way then there's not much point ? And if using strings for the user means memory management issues ? Then not sure about this ... ?

@tonton81
Copy link

how would it panic out of setup() scope? you do realize, internally, the class stores the username/password. i fail to see how a panic would occur “out of scope”

@tonton81
Copy link

i would also refrain from using STL c++ containers, they use dynamic allocations and adding Strings to the mix is bad, especially if the user is using interrupts. vectors (one of many STL containers) can corrupt the data when dealing with ISRs, none of the STL containers are ISR safe, and the random dynamic allocations on these microcontrollers are asking for trouble.

@romansavrulin
Copy link
Author

romansavrulin commented Apr 30, 2018

@timpur this examples will most likely panic or misbehave without my PR.

This PR adds automatic memory management. No additional actions need to be taken by user.

Without it user must keep an eye on structures where pointers point to. And if it points to the place that used to be valid at some point (e.g. setup in our case), but at the moment when the library needs this data again it's destroyed (say, reconnect event, that is definitely a bit later, when you've already exit setup), you will get unexpected behavior or panic.

@timpur
Copy link
Collaborator

timpur commented Apr 30, 2018

Yeah true let me talk to Marvin.

@romansavrulin
Copy link
Author

@tonton81

I'm absolutely sure about that cases.

When you use String in setup scope, cpp runtime will destroy both user and pass objects when setup exits. And internally that strings will cleanup the stuff they allocated on the heap.

But, wait... You've passed c_str() result inside this async lib, right? Now, the pointers _username and _password points to the place on the heap where both user and pass objects used to store their data. And this library don't even realize that both objects are destroyed because they are out of the scope of the function that used them. Much later, when you need to reconnect by some network error or any other reason, you will try to get the data by those pointers. And guess what's will be there at that time? You're right - garbage or other object's data.

The same thing is with const char[] user case. But structures are destroyed from the stack by compiler, when it deallocates stack segment for the function. If you see disassembly of

void setup(){
  const char user[] = "username";
  const char pass[] = "pass";

  Serial.print(user);
  Serial.print(pass);
}

you will see memcpy to the stack allocated char arrays (p.s. Serial.print(pass); is for keeping compiler from optimizing out this variable)

40201b24 <setup>:
40201b24:	e0c112        	addi	a1, a1, -32
40201b27:	fffc31        	l32r	a3, 40201b18 <_Z22__run_user_rf_pre_initv+0x4>
40201b2a:	012d      	mov.n	a2, a1
40201b2c:	940c      	movi.n	a4, 9
40201b2e:	7109      	s32i.n	a0, a1, 28
40201b30:	61c9      	s32i.n	a12, a1, 24
40201b32:	063d05        	call0	40207f04 <memcpy>
40201b35:	fff931        	l32r	a3, 40201b1c <_Z22__run_user_rf_pre_initv+0x8>
40201b38:	540c      	movi.n	a4, 5
40201b3a:	219b      	addi.n	a2, a1, 9
40201b3c:	063c45        	call0	40207f04 <memcpy>
40201b3f:	fff8c1        	l32r	a12, 40201b20 <_Z22__run_user_rf_pre_initv+0xc>
40201b42:	013d      	mov.n	a3, a1
40201b44:	0c2d      	mov.n	a2, a12
40201b46:	001c85        	call0	40201d10 <_ZN5Print5printEPKc>
40201b49:	0c2d      	mov.n	a2, a12
40201b4b:	319b      	addi.n	a3, a1, 9
40201b4d:	001c05        	call0	40201d10 <_ZN5Print5printEPKc>
40201b50:	7108      	l32i.n	a0, a1, 28
40201b52:	61c8      	l32i.n	a12, a1, 24
40201b54:	20c112        	addi	a1, a1, 32
40201b57:	f00d      	ret.n
40201b59:	000000        	ill
40201b5c:	233e35        	excw
40201b5f:	84e040        	extui	a14, a4, 0, 9
40201b62:	fe          	.byte 0xfe
40201b63:	3f          	.byte 0x3f
40201b64:	84dc      	bnez.n	a4, 40201b80 <loop+0x18>
40201b66:	fe          	.byte 0xfe
40201b67:	3f          	.byte 0x3f

The stack of the setup is definitely out of the scope when you need to reconnect.

@romansavrulin
Copy link
Author

And what is interesting, if you rewrite the same example the following way

void setup() {
  const char *user = "username";
  const char *pass = "pass";

  Serial.print(user);
  Serial.print(pass);
}

you won't get panic anymore, because pointers will point to .rodata directly.

40201b24 <setup>:
40201b24:	f0c112        	addi	a1, a1, -16
40201b27:	0261c2        	s32i	a12, a1, 8
40201b2a:	fffbc1        	l32r	a12, 40201b18 <_Z22__run_user_rf_pre_initv+0x4>
40201b2d:	fffb31        	l32r	a3, 40201b1c <_Z22__run_user_rf_pre_initv+0x8>
40201b30:	202cc0        	or	a2, a12, a12
40201b33:	3109      	s32i.n	a0, a1, 12
40201b35:	001c85        	call0	40201d00 <_ZN5Print5printEPKc>
40201b38:	fffa31        	l32r	a3, 40201b20 <_Z22__run_user_rf_pre_initv+0xc>
40201b3b:	0c2d      	mov.n	a2, a12
40201b3d:	001c05        	call0	40201d00 <_ZN5Print5printEPKc>
40201b40:	3108      	l32i.n	a0, a1, 12
40201b42:	21c8      	l32i.n	a12, a1, 8
40201b44:	10c112        	addi	a1, a1, 16
40201b47:	f00d      	ret.n
40201b49:	000000        	ill
40201b4c:	233e25        	excw
40201b4f:	84e040        	extui	a14, a4, 0, 9
40201b52:	fe          	.byte 0xfe
40201b53:	3f          	.byte 0x3f
40201b54:	84dc      	bnez.n	a4, 40201b70 <loop+0x18>
40201b56:	fe          	.byte 0xfe
40201b57:	3f          	.byte 0x3f

So, take care!

p.s. To take a single function dizassembly, just run

~/Library/Arduino15/packages/esp8266/tools/xtensa-lx106-elf-gcc/1.20.0-26-gb404fb9-2/bin/xtensa-lx106-elf-objdump -d sketch_apr29a.ino.elf | awk -v RS= '/^[[:xdigit:]].*<setup>/'

@tonton81
Copy link

for classes as well, when initializing i would never put a variable storage as a const or a pointer, i prefer copying it directly and using the internal copy in future
ex
i never:
const char *
the setCredentials should be copying it to a sized char array , in the source file its const char *, so yeah that seems to be a problem

when i ported over the ESP core to teensy with mqtt etc, i never used pointers or const variables because of issues like this, and it allowed dynamic updates..

@romansavrulin
Copy link
Author

Hello, @timpur ! Is there any news on that PR? Will it appear in mainline?

@timpur
Copy link
Collaborator

timpur commented Jun 11, 2018

Hmmm, I understand the problem, just not keen on using the String class to solve this. TBH I'd prefer copying the mem manually.

Thinking depending on the input type it will copy or not, that way if you know what your doing you can use a global const that doesn't copy the mem, other wise it will be copied.

@romansavrulin
Copy link
Author

@timpur what are you actually fighting for when copying manually? Saving few bytes? You won't save - char const * doesn't have any benefit on memory, as I showed you above. So, does manual copy implementation really worth it then?

@timpur
Copy link
Collaborator

timpur commented Jun 12, 2018

@romansavrulin, okay, but still not really keen on using String, but can you update this PR to include the function signatures also and anything else that makes sense, then well review :)

@tve
Copy link

tve commented Nov 13, 2018

I disagree with the initial premise:

This strings should be stored somewhere, let it be inside the library, because in that case we can use temp variables for them outside the library.

In my use-cases most of the long-lived strings, such as user name, password, etc come from settings in flash that are loaded into memory at boot time and pretty much stay there. This means I already have memory allocated and can just point the MQTT library at that. If the library made a copy, then those strings would be in memory twice, because I can't easily get rid of the copy I already have.
What is missing in the current design is proper documentation that these pointers passed into the MQTT library need to point to stable storage.

@bertmelis
Copy link
Contributor

I'm with @tve on this PR. Okay to close?

@arkq
Copy link

arkq commented Apr 3, 2025

In the past 2 days I've been searching for a bug, why the client is not able to establish connection with the mosquitto broker. In the logs I've seen Client <unknown> disconnected due to protocol error. Finally, I had to read the implementation of the library to see that the signature setClientId(const char* clientId) is somewhat misleading... The library does not modify the string, BUT it stores the pointer to passed memory address... I've been using a local buffer to create client ID on the fly. The buffer was overwritten when I've exited the scope, and instead of a string, a garbage was sent during MQTT connect...

It would be nice to have a documentation saying that the memory passed to setters needs to be valid throughout the lifetime of client object. Preferably, such documentation should be available not in the doc but in the header file, so any decent IDE will pull it as a contextual function description.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Converstation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants