The EMG server can often run for years without issues, but some combinations of configuration options and traffic patterns may cause it to crash. We take those crashes very seriously, so here is how you can help us find and fix those as quickly as possible.
First, as always, carefully examine the log/general file. Any warnings or errors there should be addressed.
Next, please make sure core files are enabled. This is done by running “ulimit -c unlimited” before starting the emgd process. The setting applies to the shell where it is run and is inherited by any child processes that shell creates. When the EMG Watchdog is used to start emgd, the command must therefore be run in the shell script that starts the watchdog, before the watchdog itself is started; this script is normally /etc/rc.local. After this, the easiest way to make sure emgd picks up the setting is to restart the machine. When using the new scripts introduced in version 8.0.1, instead make sure the command is present in the run_emgd.sh script, and then run “emgd -stop”. When the emgd process restarts, it will have the new setting.
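As a minimal sketch of the inheritance described above, the snippet below raises the limit and then reads back the limit of a process through the Linux /proc interface. The helper name show_core_limit is made up for this illustration; it is not part of EMG.

```shell
#!/bin/sh
# Raise the core file size limit for this shell; child processes inherit it.
# This can fail if the hard limit is lower, so report that instead of aborting.
ulimit -c unlimited 2>/dev/null || echo "could not raise core limit" >&2

# Print the core file limit of a running process (Linux /proc interface).
# show_core_limit is a hypothetical helper used only in this example.
show_core_limit() {
    grep 'Max core file size' "/proc/$1/limits"
}

# Check the current shell; for emgd, pass its process id instead.
show_core_limit "$$"
```

Running the same check against the process id of a running emgd shows whether it actually picked up the setting.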
You may also need to enable core files on the system level, and figure out exactly where they are stored. Different Linux distributions use different methods for this, so you may have to do some web searching, using the information in your /etc/os-release file.
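On most Linux systems the kernel setting /proc/sys/kernel/core_pattern decides where core files end up, so it is a reasonable first thing to look at. The sketch below only reads that file and /etc/os-release; how to change the setting depends on the distribution and is not covered here.

```shell
#!/bin/sh
# Show the distribution name and version, useful when searching for instructions.
[ -r /etc/os-release ] && grep -E '^(NAME|VERSION)=' /etc/os-release

# Show where the kernel sends core files.
core_pattern=$(cat /proc/sys/kernel/core_pattern 2>/dev/null)
echo "core_pattern: ${core_pattern:-unknown}"

case "$core_pattern" in
    "")   echo "could not read core_pattern" ;;
    "|"*) echo "core files are piped to a helper program" ;;
    /*)   echo "core files are written to an absolute path" ;;
    *)    echo "core files are written relative to the working directory of the crashed process" ;;
esac
```

If the pattern starts with "|", the helper program named there (for example systemd-coredump or apport, depending on the distribution) decides where the files are stored.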
Next, check the file size of the core file. If you get multiple files and they are all around some suspiciously even number such as 20GB, the issue is most likely that emgd runs out of memory for some reason. In that case, just send an email to support@braxo.se explaining the situation, and make sure you include the following things:
- The output from “emgd -v”.
- Your full configuration: the server.cfg file, the files in the dbconfig/lastok directory, any host-specific server.cfg files, any relevant routing files, etc.
- As detailed a description as possible of when the crash occurs.
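For the file size check mentioned before this list, a small sketch like the following can list the core files with readable sizes. The directory and the "core*" file name pattern are assumptions; the actual location and naming depend on your system settings, and list_core_sizes is a hypothetical helper.

```shell
#!/bin/sh
# List core files in a directory with human-readable sizes.
# The directory and the "core*" name pattern are assumptions; adjust both.
list_core_sizes() {
    find "$1" -maxdepth 1 -type f -name 'core*' -exec ls -lh {} +
}

# Example: look in the current directory.
list_core_sizes .
```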
For other file sizes, please make sure gdb is installed. Then follow the steps below.
- Run: gdb /the/path/to/emgd /the/path/to/one/core/file, replacing the paths with the correct values.
- Within gdb, run: bt
- If this shows something meaningful, such as an actual call stack and not just a single line or 1000 identical ones, run: bt full
- Somewhere in this stack trace you will most likely have a pointer “connector”, and perhaps also a “cc” and a “qe”. In these cases, run “up” or “down” within the stack trace to get to the correct level. Then run “p connector->name”, “p cc->nameno”, or “p qe->id_s”, depending on which pointer you find. In the “qe” case, please look up the id in the connector log files to find the relevant connector(s). The corresponding connector.* and pdu.* files may then have useful information. Please make sure they contain information from the time of the crash.
- Send all this output to us. As always, please also include the same information as listed above.
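The two backtrace commands can also be captured to a file without an interactive session, using gdb's batch mode. This is only a sketch: the paths are the same placeholders as in the steps above, and the output file name emgd-backtrace.txt is made up for this example.

```shell
#!/bin/sh
# Placeholder paths; replace with the real emgd binary and core file.
EMGD=/the/path/to/emgd
CORE=/the/path/to/one/core/file
# Hypothetical output file name, chosen for this example.
OUT=emgd-backtrace.txt

# -batch makes gdb exit after running the -ex commands in order.
if command -v gdb >/dev/null 2>&1 && [ -r "$EMGD" ] && [ -r "$CORE" ]; then
    gdb -batch -ex "bt" -ex "bt full" "$EMGD" "$CORE" > "$OUT" 2>&1
    echo "backtrace written to $OUT"
else
    echo "gdb, emgd binary, or core file not found; adjust the paths above" >&2
fi
```

The “up”, “down”, and “p” steps still need an interactive gdb session, since the right stack level has to be picked by hand.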
Please note that the core files themselves are of little use to us.