Bernd, another possibility is a mismatch between the message that "virsh destroy" prints and the messages that force_stop() in the Pacemaker agent expects to receive. Pacemaker decides whether the destroy succeeded or failed by matching against the concatenation of the exit code and the (lowercased) text output by virsh. If either of those has changed between virsh versions, and especially if virsh destroy ever exits with a non-zero status, you'll get that OCF error.
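For illustration, here is a minimal sketch of that matching logic. The case patterns are copied from the force_stop() excerpt quoted below and wrapped in a hypothetical classify_destroy helper. The sample messages are made up for the example, not captured from a real virsh. The point is that a non-zero exit whose text matches none of the three "already gone" patterns falls into the failure branch.

```shell
#!/bin/sh
# Sketch: reproduce how force_stop() classifies the result of "virsh destroy".
# classify_destroy is a hypothetical helper; the patterns come from the agent.
classify_destroy() {
    local ex translate
    ex=$1                                             # exit status of virsh
    translate=$(printf '%s' "$2" | tr 'A-Z' 'a-z')    # lowercased virsh output
    # The exit status is prepended to the text so one case statement sees both.
    case $ex$translate in
        *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
        *"error:"*"failed to get domain"*)
            echo "success" ;;   # domain already gone: treated as success
        [!0]*)
            echo "failure" ;;   # non-zero exit with an unrecognized message
        0*)
            echo "poll" ;;      # zero exit: wait until the domain is stopped
    esac
}

# Illustrative (made-up) messages, not real virsh output:
classify_destroy 0 "Domain 'vm1' destroyed"                 # prints "poll"
classify_destroy 1 "error: Failed to destroy domain 'vm1'"  # prints "failure"
classify_destroy 1 "error: domain is not running"           # prints "success"
```

Comparing the actual output of "virsh destroy" on the affected host against those three patterns would show which branch is being taken.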

Do you know what $VIRSH_OPTIONS ends up as in your Pacemaker config, particularly whether --graceful is specified?

Cheers,

- Peter

On Wed, 7 Oct 2020 at 18:13, Lentes, Bernd <bernd.lentes@helmholtz-muenchen.de> wrote:
Hi,

Is it possible that "virsh destroy" does not stop a domain?
I'm asking because I have some domains running in a two-node HA cluster (Pacemaker).
And sometimes one node gets fenced (killed) because it couldn't stop a domain.
That's very ugly.

This is also why I asked earlier what "virsh destroy" really does.
IIRC, kill -9 can't terminate a process that is in "D" state (uninterruptible sleep).
So if the domain's process is in "D" state, it can't be terminated. Right?
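Whether the domain's QEMU process really is in uninterruptible sleep can be checked from its process state. The following is a minimal, Linux-only sketch that assumes /proc is mounted. proc_state is a hypothetical helper. The libvirt PID-file path in the commented usage is an assumption about the qemu driver's layout.

```shell
#!/bin/sh
# Sketch (Linux-only assumption): print the scheduler state letter of a PID
# by reading the third field of /proc/<pid>/stat. "D" = uninterruptible sleep.
proc_state() {
    pid=$1
    stat=$(cat "/proc/$pid/stat" 2>/dev/null) || return 1
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so strip everything up to the last ") " before reading field 3.
    rest=${stat##*") "}
    echo "${rest%% *}"
}

# Hypothetical usage for a domain named vm1 (PID-file path is an assumption):
#   qemu_pid=$(cat /run/libvirt/qemu/vm1.pid)
#   [ "$(proc_state "$qemu_pid")" = "D" ] && echo "vm1 is in uninterruptible sleep"
proc_state $$
```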

Pacemaker tries to shut down or destroy a domain with a resource agent, which is a shell script similar
to an init script.

Here is an excerpt from the resource agent for virtual domains:

force_stop()
{
        local out ex translate
        local status=0

        ocf_log info "Issuing forced shutdown (destroy) request for domain ${DOMAIN_NAME}."
        out=$(LANG=C virsh $VIRSH_OPTIONS destroy ${DOMAIN_NAME} 2>&1)              # this is where the domain gets destroyed
        ex=$?
        translate=$(echo $out|tr 'A-Z' 'a-z')
        echo >&2 "$translate"
        case $ex$translate in
                *"error:"*"domain is not running"*|*"error:"*"domain not found"*|\
                *"error:"*"failed to get domain"*)
                        : ;; # unexpected path to the intended outcome, all is well (success)
                [!0]*)
                        ocf_exit_reason "forced stop failed"   # <============ so destroy apparently can fail
                        return $OCF_ERR_GENERIC ;;
                0*)
                        while [ $status != $OCF_NOT_RUNNING ]; do
                                VirtualDomain_status
                                status=$?
                        done ;;
        esac
        return $OCF_SUCCESS
}
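As quoted, the 0* branch calls VirtualDomain_status in a loop with no delay and no upper bound. If the domain never reports $OCF_NOT_RUNNING, the loop only ends when Pacemaker's stop timeout expires. The sketch below shows how such a wait could be bounded. wait_until_stopped and the stubbed VirtualDomain_status are hypothetical and are not part of the upstream agent. The numeric codes are the standard OCF ones (0, 1, 7).

```shell
#!/bin/sh
# Standard OCF return codes used by the excerpt.
OCF_SUCCESS=0
OCF_ERR_GENERIC=1
OCF_NOT_RUNNING=7

# Hypothetical stub: reports "running" for two polls, then "not running".
polls=0
VirtualDomain_status() {
    polls=$((polls + 1))
    [ "$polls" -ge 3 ] && return $OCF_NOT_RUNNING
    return $OCF_SUCCESS
}

# wait_until_stopped MAX_TRIES: hypothetical bounded variant of the 0* loop.
wait_until_stopped() {
    tries=0
    status=$OCF_SUCCESS
    while [ "$status" -ne "$OCF_NOT_RUNNING" ]; do
        if [ "$tries" -ge "$1" ]; then
            return $OCF_ERR_GENERIC     # give up instead of spinning forever
        fi
        VirtualDomain_status
        status=$?
        tries=$((tries + 1))
        # Pause between polls rather than busy-looping.
        [ "$status" -ne "$OCF_NOT_RUNNING" ] && sleep 1
    done
    return $OCF_SUCCESS
}

if wait_until_stopped 5; then echo "stopped after $polls polls"; fi   # prints "stopped after 3 polls"
```

Giving up early still makes the stop fail, so Pacemaker would still have to recover the resource (possibly by fencing). The difference is that the failure comes from the agent itself rather than from the operation timeout.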

The function force_stop is responsible for stopping (destroying) the domain,
and it handles the case where "virsh destroy" does not work.
Is there a developer who can explain what "virsh destroy" really does?
Or is there a separate mailing list for developers?

Bernd

--

Bernd Lentes
Systemadministration
Institute for Metabolism and Cell Death (MCD)
Building 25 - office 122
HelmholtzZentrum München
bernd.lentes@helmholtz-muenchen.de
phone: +49 89 3187 1241
phone: +49 89 3187 3827
fax: +49 89 3187 2294
http://www.helmholtz-muenchen.de/mcd

stay healthy
Helmholtz Zentrum München