On 02/08/2018 03:30 AM, Lin Fu wrote:
I could feel that you want to express "Oh, shit! How terrible
the code it
is." from the reply, although across the screen. Anyway, I'm really glad
to thanks for your review and suggestions. :-)
No - just trying to be thorough with respect to expectations that you're
probably not aware of as someone who doesn't regularly follow or
contribute to libvirt. If it came off as oh this is crap, then sorry -
it wasn't intended that way. I'd rather have someone be explicit when
they don't like something than make me guess what I did wrong or that
wasn't following some project guidelines.
On 2018年02月08日 07:55, John Ferlan wrote:
> That's where I'd expect to find the details of why someone would want to
> choose dlm over lockd or sanlock. What goodies are provided? What
> advantage is there?
Be careful how you snip - loss of context of placement... This comment
was pointing out you didn't update the docs.
For those, assuming this scene: one customer has used cluster software
with DLM in their environment, why not choose DLM plugin instead of
sanlock/lockd ? Beside that, compared with sanlock/lockd, DLM don't need
the share storage, and sanlock would bring a little heavyheavily on disk
IO according to lockd RFC(2011-July/msg00337.html), but DLM only requires
network.
I'd choose dlm over other options too, but that's because I have a
pretty good idea of how it works and it's features. The on/off nature of
pthread/mutex locks has always felt like a step backwards for sharing
resources from what I cut my teeth on with the OpenVMS Lock Manager.
Still you need to "sell" why using dlm is better than the existing
sanlock or lockd.
My "vision" of dlm is that resource you're adding that's available
clusterwide in some manner should be able to have something that each
member of the cluster would name the resource in the same manner. What I
don't have is the mental picture of how say /dev/sdm on one host
"appears" on another host. If it was /dev/sdm, then it'd be really
simple to use that as a resource name and know it's the same resource
between hosts. When I see the device name being hashed and added as a
resource, then I wonder how different nodes in the cluster will know
it's the same thing. Although I do recall some code used the Resource
Name field rather judiciously to do some fairly interesting things. When
you see system space (e.g. kernel) address in a resource name, you know
something entertaining is occurring.
I have sent a RFC in 12/2017 to ack how the opinion is. Daniel said that
"it can be considered in scope for a libvirt lock manager plugin if you
want to try writing one"(2017-December/msg00706.html). So I draft this
patch set which could only make DLM run in libvirt(with configure flag
`--with-dlm` to compile 'dlm.so' module by force). Hope some suggestion
bringed from libvirt community to help me improve, if accepted DLM lock
manager plugin RFC, so I don't update 'docs/locking.html.in'.
So this series was really more of an RFC than a V1... It's ok, but
indicating RFC sets the expectations of the reviewer as opposed to
posting a V1.
-------------
The following is the reply contents, aim to answer why I do that
for most comments:
1) sorry for indent in my code. I coded them in two notebooks,
both using vim. However one is newer, I have not configured '.vimrc' in
that one.
Following posting guidelines of running make check syntax-check would
have avoided that. There are some that won't even *look* at code that
fails those basic tests.
2)
>> > +AC_DEFUN([LIBVIRT_CHECK_CPG],[
>> > + dnl in some distribution, Version is `UNKNOW` in libcpg.pc
>
> UNKNOWN
> doesn't make it very useful does it?
In some distribution, for example, openSUSE Tumbleweed:
$ cat /usr/lib64/pkgconfig/libcpg.pc
prefix=/usr
exec_prefix=${prefix}
libdir=/usr/lib64
includedir=${prefix}/include
Name: cpg
Version: UNKNOWN
Description: cpg
Requires:
Libs: -L${libdir} -lcpg
Cflags: -I${includedir}
using `LIBVIRT_CHECK_PKG([CPG], [libcpg], [xxx])` directly would make
build error. So I decide to ignore 'version'. I also don't know why
that is.
Seems like Tumbleweed needs to have a bug report filed in order to get
the Version properly filled in. If you live with UNKNOWN, then the
problem is how do you really know if the version someone has installed
will have what you want without peeking into the library and looking for
specific routines. I think for some reason we do that with some of the
crypto m4 files, but then again I try to avoid these files.
3)
>> > +#define LOCK_RECORD_FILE_PATH "/tmp/libvirtd-dlm-file"
>
> Is somewhere in /tmp the best/only place to put this? What not similar
> to lockd/sanlock using /var/lib/libvirt/dlm... (which would seemingly
> need to be created on installation).
In my opinion, if one node reboots, DLM cluster would drop locks belong
that node, most of Linux distribution would clean '/tmp' directory...
And if /tmp is very small by some admin's choice (done for various
reasons), then what happens to your code? There are mechanisms to store
in XML data that needs to be saved between libvirtd restarts and even
host reboots. It's really not clear where this fits. Perhaps a
/var/run/libvirt type file that would be cleaned up differently.
4)
>> > +
>> > +#define STATUS "STATUS"
>> > +#define RESOURCE_NAME "RESOURCE_NAME"
>> > +#define LOCK_ID "LOCK_ID"
>> > +#define LOCK_MODE "LOCK_MODE"
>> > +#define VM_PID "VM_PID"
>> > +
>> > +#define BUFFERLEN 128
>
> Yuck - for a stack variable...
>> > + for (i = 0, str = raw, status = 0; ; str = NULL, i++) {
>> > + subtoken = strtok_r(str, " \n", &saveptr);
>> > + if (subtoken == NULL)
>> > + break;
>> > +
>> > + switch(i) {
>> > + case 0:
>
> These are magic numbers?
Badly code wants to provided a readable file, like that:
STATUS
RESOURCE_NAME LOCK_MODE VM_PID
0
1501e418dedd122b050c91ec61650ae30d3c217806125d3c2f365989143aa28d
EXMODE 21857
The first line is the form header, the following is the lock
information.
You could consider XML or JSON format...
5)
>> > + char *name;
>> > + uint32_t mode;
>> > + uint32_t lkid;
>
> I would think uint32_t would be limiting... Why not just unsigned int?
view in '/usr/include/libdlm.h', it use `uint32_t` instead of
`unsigned int`.
It's more of a I'm used to seeing unsigned int rather than uint32_t
which is more limiting.
6)
>> > + virMutexLock(&(lockListWait.listMutex));
>> > + virListAddTail(&lock->entry, &(lockListWait.list));
>> > + virMutexUnlock(&(lockListWait.listMutex));
>
> What is this list for? I think this also is completely separable. I
> wouldn't use a list either... 10s of domains probably works fine. 100s
> of domains perhaps not as well 1000s of domains starts being a
> bottleneck for sure.
Hashtable is a good idea, and I wanted to use it but I had never read
it's code... Chosing list because that there won't be too much vms in
one host, is it offten common that there is more than 50 vms in normal
usage?
Lots of recent examples using virHashCreate, virHashAddEntry,
virHashRemoveEntry, etc. A hash table can/will resize as necessary.
Assuming usage is a dangerous thing. You're not just talking domains,
you're factoring in shared storage between domains and leases into your
resource counts.
Recent examples have been assuming no one would ever have more than
(say) 200 disks defined for a domain for which we want to obtain domain
statistics. Well it happened and we chased a way to handle that given
the limitation of size of RPC messages. More recently some block device
with more than 200 backing stores generated a very large pile of data
where there was lots of duplication.
7)
>> > + if (safewrite(fd, buffer, strlen(buffer)) != strlen(buffer)) {
>> > + virReportSystemError(errno,
>> > + _("unable to write lock information
'%s' to file '%s'"),
>> > + buffer,
NULLSTR(driver->lockRecordFilePath));
>> > + return;
>> > + }
>
> Not sure I understand the reason of the file
Compared with sanlock and lockd, this implement of DLM has no daemon,
some area is needed to record lock information in order to resume lock
after libvirtd reboot. I chose file.
Hmm. you're using virtlockd and creating lock_driver_dlm, but then
virtlockd isn't running? I really didn't dig too deep and think that
much about how this integrates at all.
8)
>> > + * found with another mode, and -ENOENT if no orphan.
>> > + *
>> > + * cast/bast/param are (void *)1 because the kernel
>> > + * returns errors if some are null.
>> > + */
>> > +
>> > + status = dlm_ls_lockx(lockspace, mode, &lksb,
LKF_PERSISTENT|LKF_ORPHAN,
>> > + name, strlen(name), 0,
>> > + (void *)1, (void *)1, (void *)1,
>
> ^^ These would seem to relate to the same parametes used for
> dlm_lock[_wait]. They are actually quite useful when it comes to lock
> conversions. You may want to considering investigating that...
>
>
> Still seems like a "bug" to me to need to pass 1 as a callback function
> or a blocking callback function.
>
>> > + NULL, NULL);
>
> Why not dlm_is_lock() instead since you're not using xid and timeout
The code is from lvm2, I try to use `dlm_lock_wait`, however it failed,
I don't know why. `(void *)1` is the same reason. Maybe a bug from
libdlm or 'dlm.ko'.
Perhaps more time spent understanding dlm interactions will help. Just
copying an existing implementation without understanding what the
interactions are may not fit into the model you're trying to create. I
honestly don't have the cycles to learn all that dlm has to offer right
now. But I do think you do need to spend more time understanding how
libvirt works today and how dlm will make things better and then how
exactly to accomplish that in an understandable manner. My biggest
concern is someone would add a feature like this and then not continue
working with/on libvirt. The dlm concepts are a departure from existing
locking models. If you want to get known in the libvirt community then
pick some of the simple tasks from the bite sized tasks page:
https://wiki.libvirt.org/page/BiteSizedTasks
that still need addressing...
9)
>> > +
>> > + /* create thread to receive notification from kernel */
>
> notification for what? what gets called?
This is implemented by libdlm, the result is writted in `lksb`.
int dlm_lock_wait(uint32_t mode,
struct dlm_lksb *lksb,
uint32_t flags,
const void *name,
unsigned int namelen,
uint32_t parent, /* unused */
void *bastarg,
void (*bastaddr) (void *bastarg),
void *range); /* unused */
Again - a bit of context... This was a dlm_ls_pthread_init call made in
virLockManagerDlmSetup. So something that's called once creates a thread
to do what? I didn't research it, but it's something you have to know
and understand so that if someone asks you can have the answer.
An lksb is a "lock status block", but that's on a lock by lock basis.
IIRC, those are written when lock conversions occur. If you used the
"dlm_lock" API, then used the completion routines, you can then check
the lksb for the status of the attempted operation. I'm very rusty in my
recollection and my description is more my understanding of it.
10)
>> > + priv->hasRWDisks = true;
>> > + /* ignore disk resource without error */
>> > + return 0;
>> > + }
>> > + }
>> > +
>> > + if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name,
&newName) < 0)
>
>> Interesting _lockd uses SHA256 and _sanlock uses MD5...
>
>
>> > + /* we need format the lock information, so the lock name
must be the constant length */
>> > + if (virCryptoHashString(VIR_CRYPTO_HASH_SHA256, name,
&newName) < 0)
>> > + goto cleanup;
>
> Neither _lockd nor _sandisk do any sort of Hash on this name.
>
>
>> > + memset(&lksb, 0, sizeof(lksb));
>> > + rv = dlm_ls_lock_wait(lockspace, priv->resources[i].mode,
>> > + &lksb, LKF_NOQUEUE|LKF_PERSISTENT,
>> > + priv->resources[i].name,
strlen(priv->resources[i].name),
>
> This is where I wonder why we cannot pass the path to the disk to dlm
> instead of the hash
It's interesting here. Originally I use MD5 instead of SHA256, but I
found that:
typedef enum {
VIR_CRYPTO_HASH_MD5, /* Don't use this except for historic compat */
VIR_CRYPTO_HASH_SHA256,
VIR_CRYPTO_HASH_LAST
} virCryptoHash;
Futhermore, I find the bug patch related sanlock name: sanlock's name
length is limited 48, there is always someone use more than 48 characters
(see
https://bugzilla.redhat.com/show_bug.cgi?id=735443). The same
situation, why not use the constant length? DLM limits name is 64 bytes
to avoid some potential bugs. Besides, it's convenient to format write
it to file.
I'm still wondering why a hash is being used. That's an implementation
detail which I haven't yet come to grips with given what I know from my
history with OpenVMS (think about that /dev/sdm example above).
In OpenVMS "disks" that were resources in a cluster shared with some SAN
within the cluster connectivity fabric. Each member of the cluster had
an ID (I think you have that with the CPG above) and shared resources
would be named with that ID - it's been over 15 years, so too long to
remember the details.
Still point is, for a dlm environment to be effective device naming
between cluster members is going to be important.
11)
>> > + * 'ensure sanlock socket is labelled with the VM process
label',
>> > + * however, fixing sanlock socket security labelling remove
related
>> > + * code. Now, `fd` parameter is useless.
>> > + */
>> > + if (fd)
>> > + *fd = -1;
>
> But you just set it to -1??!
Maybe I can submit a patch to cut those code. Afterall it's useless now.
Nothing's been accepted you'd just remove this from future patches.
12)
>> > + virCheckFlags(0, -1);
>> > +
>> > + if (state)
>> > + *state = NULL;
>
> Why would this return NULL? My recollection is there is some sort of
> inquiry API for dlm.
Inquiry API is disappeared now... Only lock/unlock refered to libdlm
source code and reference book.
http://people.redhat.com/ccaulfie/docs/rhdlmbook.pdf
Chapter 4.3. Lock Query Operations
You need to think in terms of what libvirt consumers could use or need
and perhaps not specifically something that dlm provides.
--
Regards
Lin Fu(river)
Don't forget to update your local libvirt.git repo in order to add the
pertinent information so that the next series you post will have it.
Sorry if there's an incomplete thoughts above - I've been distracted
today and this evening - it's dinner time now.
John