[libvirt] Repack git repo?

So I was cloning the repo the other day and it took ages, which got me thinking about what the problem might be. Part of it looks to be that our pack is split among ~250 files, so whenever somebody clones the repo, git needs to repack everything into a single pack each time. That may take ages on a processor as slow as an Atom. However, from reading some docs, it looks like 'git gc --aggressive' is not advised and 'git repack' is preferred instead. Any thoughts?

Michal

On Thu, Aug 03, 2017 at 09:16:13AM +0200, Michal Privoznik wrote:
> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
I created a 'tmp' repo and ran 'repack' on it, but afaict, there's no appreciable difference.

Regards,
Daniel
--
|: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o- https://fstop138.berrange.com :|
|: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|

On Thu, Aug 03, 2017 at 08:57:47AM +0100, Daniel P. Berrange wrote:
> On Thu, Aug 03, 2017 at 09:16:13AM +0200, Michal Privoznik wrote:
>> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
> I created a 'tmp' repo and ran 'repack' on it, but afaict, there's no appreciable difference.
Another thought I had was removing the po/*.po files from the repo. DV would add them to the release, but they would not take up 100MB in the repository when they are not needed for development builds. It would not cut down on the data that needs to be transferred, since they are in the history, but at least it would take some weight off the repo. After a few years we could do a split as Linux did and have one repo with the complete history and one that goes from v4.0.0 onwards.

Just my $.02
-- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list

On Thu, Aug 03, 2017 at 10:14:02 +0200, Martin Kletzander wrote:
> On Thu, Aug 03, 2017 at 08:57:47AM +0100, Daniel P. Berrange wrote:
>> On Thu, Aug 03, 2017 at 09:16:13AM +0200, Michal Privoznik wrote:
>>> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
>> I created a 'tmp' repo and ran 'repack' on it, but afaict, there's no appreciable difference.
> Other thought I had was having the po/*.po files removed from the repo. DV would add them to the release, but they would not take 100MB in the repository when they are not needed for development builds. It would not cut down on the data that need to be transferred since they are in the history, but at least would take some weight off the repo. After few years we could do a split as linux did and have one repo with complete history and one that goes from v4.0.0 onwards.
NACK. There's no real benefit in crippling the repo like this. You can always make a shallow clone of the repo if you don't want to get the complete history.

Jirka
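For readers who have not used the shallow clone mentioned above, here is a minimal sketch. It builds a throwaway local repository so it can be run anywhere; the directory names, file names and commit messages are made up for illustration. Against the real repository the equivalent would presumably be 'git clone --depth 1 git://libvirt.org/libvirt.git'.

```shell
#!/bin/sh
# Sketch only: every path and name below is hypothetical.
set -e
demo=$(mktemp -d)

# Build a small "upstream" repository with five commits.
git init -q "$demo/upstream"
cd "$demo/upstream"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3 4 5; do
    echo "change $i" > file.txt
    git add file.txt
    git commit -q -m "commit $i"
done
cd "$demo"

# --depth is ignored for plain local paths, so use a file:// URL
# to go through the normal transport.
git clone -q --depth 1 "file://$demo/upstream" shallow

# Compare how much history each copy holds.
full_count=$(git -C upstream rev-list --count HEAD)
shallow_count=$(git -C shallow rev-list --count HEAD)
echo "upstream commits: $full_count, shallow clone commits: $shallow_count"
```

A shallow clone can be deepened later with 'git fetch --unshallow' if the complete history turns out to be needed after all.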

On 08/03/2017 09:57 AM, Daniel P. Berrange wrote:
> On Thu, Aug 03, 2017 at 09:16:13AM +0200, Michal Privoznik wrote:
>> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
> I created a 'tmp' repo and ran 'repack' on it, but afaict, there's no appreciable difference.
In fact, there is. I just finished running 'git repack -a -d' over the 'tmp' repo and here are the results:

$ time git clone git://libvirt.org/libvirt.git libvirt_temp.git
Cloning into 'libvirt_temp.git'...
remote: Counting objects: 236385, done.
remote: Compressing objects: 100% (38422/38422), done.
remote: Total 236385 (delta 200296), reused 232761 (delta 196975)
Receiving objects: 100% (236385/236385), 297.08 MiB | 5.55 MiB/s, done.
Resolving deltas: 100% (200296/200296), done.

real 2m40.089s
user 1m2.831s
sys 0m2.970s

$ time git clone git://libvirt.org/tmp tmp.git
Cloning into 'tmp.git'...
remote: Counting objects: 236365, done.
remote: Compressing objects: 100% (35400/35400), done.
remote: Total 236365 (delta 200277), reused 236065 (delta 199977)
Receiving objects: 100% (236365/236365), 297.19 MiB | 6.17 MiB/s, done.
Resolving deltas: 100% (200277/200277), done.

real 1m16.209s
user 1m7.782s
sys 0m2.940s

In the first case, the network transmission took ~54s, so prep work on server took ~1m45s. In the second case, network transmission took 48s, so prep work took just ~28s. Therefore I think it makes sense to run the command. If nobody objects I can do that later today.

Michal
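The consolidating effect of 'git repack -a -d' can be reproduced on a throwaway repository. The sketch below is an illustration only: the temporary directory, file names and commit messages are made up, and it says nothing about timings on the real server. It piles up several small packs with incremental repacks, then merges them and compares the pack counts reported by 'git count-objects -v'.

```shell
#!/bin/sh
# Sketch only: every path and name below is hypothetical.
set -e
demo=$(mktemp -d)
git init -q "$demo/repo"
cd "$demo/repo"
git config user.email "demo@example.com"
git config user.name "Demo"

# Create a few commits and pack the new loose objects after each one,
# so that several small packs accumulate.
for i in 1 2 3 4; do
    echo "change $i" > "file$i.txt"
    git add "file$i.txt"
    git commit -q -m "commit $i"
    git repack -q -d    # without -a: packs only the loose objects
done

# "git count-objects -v" prints a "packs: N" line.
packs_before=$(git count-objects -v | awk '/^packs:/ {print $2}')

# -a puts everything reachable into a single new pack,
# -d deletes the packs that this makes redundant.
git repack -q -a -d

packs_after=$(git count-objects -v | awk '/^packs:/ {print $2}')
echo "packs before: $packs_before, after: $packs_after"
```

The same 'git count-objects -v' check can be run inside any existing clone to see how many packs it currently holds.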

On Thu, Aug 03, 2017 at 11:33:29AM +0200, Michal Privoznik wrote:
> On 08/03/2017 09:57 AM, Daniel P. Berrange wrote:
>> On Thu, Aug 03, 2017 at 09:16:13AM +0200, Michal Privoznik wrote:
>>> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
>> I created a 'tmp' repo and ran 'repack' on it, but afaict, there's no appreciable difference.
> In fact, there is. I just finished running 'git repack -a -d' over the 'tmp' repo and here are the results:
>
> $ time git clone git://libvirt.org/libvirt.git libvirt_temp.git
> Cloning into 'libvirt_temp.git'...
> remote: Counting objects: 236385, done.
> remote: Compressing objects: 100% (38422/38422), done.
> remote: Total 236385 (delta 200296), reused 232761 (delta 196975)
> Receiving objects: 100% (236385/236385), 297.08 MiB | 5.55 MiB/s, done.
> Resolving deltas: 100% (200296/200296), done.
>
> real 2m40.089s
> user 1m2.831s
> sys 0m2.970s
>
> $ time git clone git://libvirt.org/tmp tmp.git
> Cloning into 'tmp.git'...
> remote: Counting objects: 236365, done.
> remote: Compressing objects: 100% (35400/35400), done.
> remote: Total 236365 (delta 200277), reused 236065 (delta 199977)
> Receiving objects: 100% (236365/236365), 297.19 MiB | 6.17 MiB/s, done.
> Resolving deltas: 100% (200277/200277), done.
>
> real 1m16.209s
> user 1m7.782s
> sys 0m2.940s
>
> In the first case, the network transmission took ~54s, so prep work on server took ~1m45s. In the second case, network transmission took 48s, so prep work took just ~28s. Therefore I think it makes sense to run the command. If nobody objects I can do that later today.
Yep, that's fine with me.

Regards,
Daniel

On Thu, Aug 03, 2017 at 09:16:13 +0200, Michal Privoznik wrote:
> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does

250 files should not be an issue on today's computers. Maybe 2 orders of magnitude more would pose some issues.

> checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
> Any thoughts?

Clone from GitHub or some other hosting with more network connectivity/CPU power if you are bothered by this?

On Thu, Aug 03, 2017 at 10:33:47AM +0200, Peter Krempa wrote:
> On Thu, Aug 03, 2017 at 09:16:13 +0200, Michal Privoznik wrote:
>> So I was checking out the repo the other day and it took ages. So it got me thinking what might be the problem. Looks like a part of it is that our pack is split among ~250 files. Therefore when somebody does
> 250 files should not be an issue on today's computers. Maybe 2 orders of magnitude more would pose some issues.
>> checkout git needs to repack it into a single pack every time. And this may take ages on such slow processor as Atom is. However, reading some docs on this it looks like 'git gc --aggressive' is not advised rather than 'git repack'.
>> Any thoughts?
> Clone from github or some other hosting with more network connectivity/CPU power if you are bothered by this?
Yes, you can clone from GitHub and then switch origins, sure. I was just interested in how stuff is affected and how it should be handled =)
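The "clone from a mirror, then switch origins" workflow can be sketched as follows. The thread does not give a mirror URL, so this example uses two throwaway local repositories as stand-ins: "canonical" plays the role of the slow main server and "mirror.git" the faster hosting. All names are hypothetical.

```shell
#!/bin/sh
# Sketch only: "canonical" and "mirror.git" are local stand-ins,
# not real libvirt URLs.
set -e
demo=$(mktemp -d)

# The canonical repository (stand-in for the slow server).
git init -q "$demo/canonical"
cd "$demo/canonical"
git config user.email "demo@example.com"
git config user.name "Demo"
echo "hello" > README
git add README
git commit -q -m "initial commit"
cd "$demo"

# A bare copy of it (stand-in for the faster mirror).
git clone -q --bare canonical mirror.git

# 1. Do the expensive initial clone from the mirror.
git clone -q mirror.git work

# 2. Point "origin" at the canonical repository afterwards.
git -C work remote set-url origin "$demo/canonical"

# 3. Later fetches now talk to the canonical repository and only
#    transfer what the mirror did not already provide.
git -C work fetch -q origin
origin_url=$(git -C work remote get-url origin)
echo "origin now points at: $origin_url"
```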
participants (5)
- Daniel P. Berrange
- Jiri Denemark
- Martin Kletzander
- Michal Privoznik
- Peter Krempa