[libvirt] char: Logging serial pty output when disconnected

Interactive access to a guest serial console can be enabled by hooking the serial device to a pty backend, e.g. -device isa-serial,chardev=cs0 -chardev pty,id=cs0. With libvirt this can be configured via <console type='pty'><target type='serial' port='0'/></console>. Output from the same serial device can also be logged to a file by adding logfile=/somefile to the -chardev option (<log file=/somefile/> in libvirt). Unfortunately output gets logged only when a client like virsh console is connected to the pty; otherwise qemu drops it on the floor. This makes chardev logging much less useful than it could be for debugging guest problems after the fact. Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for data consumed by the backend chr_write function. With the pty backend, pty_chr_write() returns 0 indicating that the data was not consumed when the pty is disconnected. Simply changing it to return len instead of 0 tricks the caller into logging the data even when the pty is disconnected. I don't know what problems this might cause, but one data point is that tcp_chr_write() already happens to work this way. Alternatively, qemu_chr_fe_write() could be modified to log everything passed to it, regardless of how much data chr_write claims to have consumed. The trouble is that the serial device retries writing unconsumed data, so when the pty is disconnected you'd see every character duplicated 4 times in the log file. Any opinions on either approach, or other suggestions? If there are no objections to the first one, I'll prepare a patch. --Ed

On Thu, Jan 26, 2017 at 05:07:16PM -0800, Ed Swierk wrote:
Interactive access to a guest serial console can be enabled by hooking the serial device to a pty backend, e.g. -device isa-serial,chardev=cs0 -chardev pty,id=cs0. With libvirt this can be configured via <console type='pty'><target type='serial' port='0'/></console>.
Output from the same serial device can also be logged to a file by adding logfile=/somefile to the -chardev option (<log file=/somefile/> in libvirt).
Unfortunately output gets logged only when a client like virsh console is connected to the pty; otherwise qemu drops it on the floor. This makes chardev logging much less useful than it could be for debugging guest problems after the fact.
Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for data consumed by the backend chr_write function. With the pty backend, pty_chr_write() returns 0 indicating that the data was not consumed when the pty is disconnected. Simply changing it to return len instead of 0 tricks the caller into logging the data even when the pty is disconnected. I don't know what problems this might cause, but one data point is that tcp_chr_write() already happens to work this way.
Alternatively, qemu_chr_fe_write() could be modified to log everything passed to it, regardless of how much data chr_write claims to have consumed. The trouble is that the serial device retries writing unconsumed data, so when the pty is disconnected you'd see every character duplicated 4 times in the log file.
Any opinions on either approach, or other suggestions? If there are no objections to the first one, I'll prepare a patch.
If the pty backend intends to just drop data into a blackhole when no client is connected, then its chr_write() impl should return the length of the data discarded, not zero. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On Fri, Jan 27, 2017 at 1:40 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Jan 26, 2017 at 05:07:16PM -0800, Ed Swierk wrote:
Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for data consumed by the backend chr_write function. With the pty backend, pty_chr_write() returns 0 indicating that the data was not consumed when the pty is disconnected. Simply changing it to return len instead of 0 tricks the caller into logging the data even when the pty is disconnected. I don't know what problems this might cause, but one data point is that tcp_chr_write() already happens to work this way.
Alternatively, qemu_chr_fe_write() could be modified to log everything passed to it, regardless of how much data chr_write claims to have consumed. The trouble is that the serial device retries writing unconsumed data, so when the pty is disconnected you'd see every character duplicated 4 times in the log file.
Any opinions on either approach, or other suggestions? If there are no objections to the first one, I'll prepare a patch.
If the pty backend intends to just drop data into a blackhole when no client is connected, then its chr_write() impl should return the length of the data discarded, not zero.
That's exactly the question: when no client is connected, should the pty backend just drop the data into a black hole, returning the length of the data discarded? Or should it return 0, letting the frontend device decide what to do with it? I can't discern a consistent pattern across all the char backends. The closest analog is the tcp backend, which does discard the data and return len. In contrast, several backends call io_channel_send{,_full}(), which returns -1 if the write would block or fails for any other reason. It's not clear there's much the frontend can do to recover from an error, but there's no consistent pattern across serial devices either. Most just ignore the return value. But the 16550A serial device retries 4 times after an error. Changing the pty backend to discard the data on the first attempt would bypass this retry mechanism. Is that a problem? --Ed

On Mon, Jan 30, 2017 at 08:58:25PM -0800, Ed Swierk wrote:
On Fri, Jan 27, 2017 at 1:40 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Jan 26, 2017 at 05:07:16PM -0800, Ed Swierk wrote:
Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for data consumed by the backend chr_write function. With the pty backend, pty_chr_write() returns 0 indicating that the data was not consumed when the pty is disconnected. Simply changing it to return len instead of 0 tricks the caller into logging the data even when the pty is disconnected. I don't know what problems this might cause, but one data point is that tcp_chr_write() already happens to work this way.
Alternatively, qemu_chr_fe_write() could be modified to log everything passed to it, regardless of how much data chr_write claims to have consumed. The trouble is that the serial device retries writing unconsumed data, so when the pty is disconnected you'd see every character duplicated 4 times in the log file.
Any opinions on either approach, or other suggestions? If there are no objections to the first one, I'll prepare a patch.
If the pty backend intends to just drop data into a blackhole when no client is connected, then its chr_write() impl should return the length of the data discarded, not zero.
That's exactly the question: when no client is connected, should the pty backend just drop the data into a black hole, returning the length of the data discarded? Or should it return 0, letting the frontend device decide what to do with it?
It should return len of data discarded.
I can't discern a consistent pattern across all the char backends. The closest analog is the tcp backend, which does discard the data and return len. In contrast, several backends call io_channel_send{,_full}(), which returns -1 if the write would block or fails for any other reason.
It's not clear there's much the frontend can do to recover from an error, but there's no consistent pattern across serial devices either. Most just ignore the return value. But the 16550A serial device retries 4 times after an error. Changing the pty backend to discard the data on the first attempt would bypass this retry mechanism. Is that a problem?
I don't think so - retrying in this way is pointless IMHO - it is just going to get the same result on every retry on 99% of occassions. Regards, Daniel -- |: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :| |: http://libvirt.org -o- http://virt-manager.org :| |: http://entangle-photo.org -o- http://search.cpan.org/~danberr/ :|

On 31/01/2017 04:39, Daniel P. Berrange wrote:
On Mon, Jan 30, 2017 at 08:58:25PM -0800, Ed Swierk wrote:
On Fri, Jan 27, 2017 at 1:40 AM, Daniel P. Berrange <berrange@redhat.com> wrote:
On Thu, Jan 26, 2017 at 05:07:16PM -0800, Ed Swierk wrote:
Currently qemu_chr_fe_write() calls qemu_chr_fe_write_log() only for data consumed by the backend chr_write function. With the pty backend, pty_chr_write() returns 0 indicating that the data was not consumed when the pty is disconnected. Simply changing it to return len instead of 0 tricks the caller into logging the data even when the pty is disconnected. I don't know what problems this might cause, but one data point is that tcp_chr_write() already happens to work this way.
Alternatively, qemu_chr_fe_write() could be modified to log everything passed to it, regardless of how much data chr_write claims to have consumed. The trouble is that the serial device retries writing unconsumed data, so when the pty is disconnected you'd see every character duplicated 4 times in the log file.
Any opinions on either approach, or other suggestions? If there are no objections to the first one, I'll prepare a patch.
If the pty backend intends to just drop data into a blackhole when no client is connected, then its chr_write() impl should return the length of the data discarded, not zero.
That's exactly the question: when no client is connected, should the pty backend just drop the data into a black hole, returning the length of the data discarded? Or should it return 0, letting the frontend device decide what to do with it?
It should return len of data discarded.
I can't discern a consistent pattern across all the char backends. The closest analog is the tcp backend, which does discard the data and return len. In contrast, several backends call io_channel_send{,_full}(), which returns -1 if the write would block or fails for any other reason.
It's not clear there's much the frontend can do to recover from an error, but there's no consistent pattern across serial devices either. Most just ignore the return value. But the 16550A serial device retries 4 times after an error. Changing the pty backend to discard the data on the first attempt would bypass this retry mechanism. Is that a problem?
I don't think so - retrying in this way is pointless IMHO - it is just going to get the same result on every retry on 99% of occassions.
Just to provide the full context, the retry happens even if you get EAGAIN, and in that case it does makes sense. But if the pty is disconnected I agree it should discard the data. Paolo

On Tue, Jan 31, 2017 at 7:34 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
On 31/01/2017 04:39, Daniel P. Berrange wrote:
I don't think so - retrying in this way is pointless IMHO - it is just going to get the same result on every retry on 99% of occassions.
Just to provide the full context, the retry happens even if you get EAGAIN, and in that case it does makes sense.
But if the pty is disconnected I agree it should discard the data.
Thanks for the feedback. Please refer to this proposed patch: http://patchwork.ozlabs.org/patch/721974/ . --Ed
participants (3)
-
Daniel P. Berrange
-
Ed Swierk
-
Paolo Bonzini