Introduction
Guest blogger: Andrin Linggi
Today we solved a tricky problem for a customer.
At this customer, we need to start a Java program with a bash script that is stored on a remote server. The Java program runs in the background until it is killed by another script.
Starting the program with the script works fine when we run it locally on the remote server, and also when we start it over an ssh command.
user@remote-server $ /path/to/script.sh
user@ansible-server $ ssh remote-server /path/to/script.sh
But with Ansible and the shell module it doesn't work. Ansible doesn't report any error, and an application config file that the start script also creates was created on the remote server. Nevertheless, the expected process wasn't running.
Our first idea was that something had to be different between the two calls (locally and with Ansible). We checked the environment variables, the classpath and the folder handling in the bash script, but nothing helped…
To troubleshoot further, we added a pgrep at the end of the script to see whether the Java process had started and was running. And… it was running as long as the startup script was running!
someuser 1514 1.4 62.8 7435456 3820348 ? Sl Apr14 306:42 /path/to/java ...
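A check like this can be sketched as a small function at the end of the start script. This is only an illustration: the function name report_process and the pattern passed to it are hypothetical, and the only tool assumed is pgrep with its -f option, which matches against the full command line.

```shell
#!/usr/bin/env bash
# report_process is a hypothetical helper, not taken from the original script.
# pgrep -f matches the pattern against the full command line and
# exits with status 0 when at least one matching process exists.
report_process() {
    local pattern="$1"
    if pgrep -f "$pattern" >/dev/null; then
        echo "running: $pattern"
    else
        echo "not running: $pattern"
    fi
}

# Example call at the end of a start script (the pattern is an assumption):
#   report_process "/path/to/java"
```

Because the check runs while the start script is still alive, it can only show that the process was started, not that it survives after the script returns.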
Now it became clear that Ansible was killing the process. This is apparently a known feature of Ansible, which has some implications for how you need to write your start/stop scripts on Linux. See GitHub.
Ansible:
- Kills any subprocesses still hanging around after it runs the shell module, but
- Leaves nohup:ed processes running
Many sysadmins write start/stop scripts that start a program with:
/path/to/my/program >/dev/null 2>&1 &
This mostly works, as your standard shell or ssh does not wildly kill your spawned-off processes.
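The proper way to do this, though, is to use nohup to break the background process away from any TTYs and from the file handles for stdin and stdout. With nohup, the process also ignores the hangup signal (SIGHUP) that is sent when the calling session ends.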
nohup /path/to/my/program >/dev/null 2>&1 &
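Since the program is later killed by another script, the nohup line can be wrapped in a pair of helpers that also remember the PID. The following is a minimal sketch under assumptions: start_detached, stop_detached and the PID file location are hypothetical names for illustration, not part of the customer's scripts. It also redirects stdin from /dev/null, which the one-liner above leaves untouched.

```shell
#!/usr/bin/env bash
# start_detached is a hypothetical helper: it starts a command with nohup,
# detaches stdin, stdout and stderr, and stores the PID in a file so that
# a separate stop script can find the process later.
start_detached() {
    local pidfile="$1"
    shift
    nohup "$@" </dev/null >/dev/null 2>&1 &
    echo "$!" > "$pidfile"
}

# stop_detached is the hypothetical counterpart: it reads the PID file,
# terminates the process if it is still alive, and removes the file.
stop_detached() {
    local pidfile="$1"
    local pid
    pid="$(cat "$pidfile")"
    if kill -0 "$pid" 2>/dev/null; then
        kill "$pid"
    fi
    rm -f "$pidfile"
}

# Example usage (the PID file path is an assumption):
#   start_detached /tmp/my-program.pid /path/to/my/program
#   stop_detached /tmp/my-program.pid
```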
So, to solve our issue, we added nohup before the java call, and it worked as long as we did not pipe the stdout from the java process to anything else. But that would be too easy…
The next problem we had was that the java call was piped to logsave to handle the logs. nohup and pipes are not best friends: nohup wraps only the single command that follows it, not the whole pipeline. One possible workaround is to inline a script in our script and spawn off a shell that runs the Java program and logsave. Thanks Stack Overflow!
nohup $SHELL << EOF &
java blabla | logsave blabla
EOF
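The same idea can also be written without a here-document by passing the whole pipeline to the child shell as one string. This is a sketch of a variation, not what we deployed: run_pipeline_detached is a hypothetical helper name, and the fallback to /bin/sh when SHELL is unset is an assumption.

```shell
#!/usr/bin/env bash
# run_pipeline_detached is a hypothetical helper. It hands a complete
# pipeline to a child shell as a single string, so nohup wraps the shell
# and the shell in turn runs both sides of the pipe.
# The ignored SIGHUP is inherited by the commands the child shell starts.
run_pipeline_detached() {
    local pipeline="$1"
    nohup "${SHELL:-/bin/sh}" -c "$pipeline" </dev/null >/dev/null 2>&1 &
    echo "$!"
}

# Example usage with the placeholders from above:
#   run_pipeline_detached 'java blabla | logsave blabla'
```

The function prints the PID of the child shell, not of the Java process, so a stop script would still need its own way to find the Java process.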
Hopefully someone finds this explanation useful if they run into a similar situation.