Migrating a vagrant-libvirt VM to a newer host: skip `vagrant package`, hand it to virsh
Moving a vagrant-libvirt VM between Linux hosts looks like a two-step
(vagrant package then vagrant up on the other side), and that's exactly
what I tried. Both steps died in the same place: fog-libvirt's stream
upload/download of a large qcow2 from/to libvirt's storage pool reset
mid-flight (Cannot recv data: Connection reset by peer, hung at 0%).
The streaming bug is in fog-libvirt's vol upload/download. The fix is to
bypass vagrant-libvirt's vol-upload/vol-download entirely: flatten
the overlay qcow2 against its backing file, drop the result in a libvirt
storage pool by hand, then virsh define + virsh start. Treat
vagrant-libvirt as the boot-time scaffolding only; the running VM is plain
libvirt after that.
1. Make the box self-contained
vagrant package exists to bake the VM's disk + metadata into a .box
file. With a libvirt provider the disk is usually a qcow2 with a backing
file (qemu-img info ... | grep "backing file"), and vol-download only
streams the overlay — you'd ship an incomplete box. Skip it and flatten
manually:
# on the source host
sudo qemu-img convert -O qcow2 -p \
/var/lib/libvirt/images/<vm-name>.img \
/path/to/box.img
box.img is ~30 GB on a 127 GiB / 20 GB-used Windows guest (sparse qcow2
expands into the file). The source vol keeps working — convert is
read-only on the input.
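A quick check before and after the flatten can save a wasted transfer. This is a minimal sketch, assuming qemu-img info prints a "backing file:" line for overlays (it does in the versions I have used); the helper name has_backing_file is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reads `qemu-img info` output on stdin and
# succeeds (exit 0) if the image depends on a backing file.
# The pattern does not match the separate "backing file format:" line.
has_backing_file() {
    grep -q '^backing file:'
}

# Usage (paths are the same placeholders as in the commands above):
#   sudo qemu-img info /var/lib/libvirt/images/<vm-name>.img | has_backing_file \
#       && echo "overlay: flatten before shipping"
#   qemu-img info /path/to/box.img | has_backing_file \
#       || echo "box.img is self-contained"
```

The helper only inspects text, so it can be tried against saved qemu-img info output without touching the image.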
2. Ship the box to the new host
# on the destination host, make a non-default pool first if /var is tight
sudo mkdir -p /home/<user>/libvirt-pool
sudo chown libvirt-qemu:libvirt /home/<user>/libvirt-pool
virsh pool-define-as homepool dir --target "/home/<user>/libvirt-pool"
virsh pool-build homepool && virsh pool-start homepool
virsh pool-autostart homepool
# convert directly into the pool (bypasses fog-libvirt stream upload)
sudo qemu-img convert -O qcow2 -p ~/box.img \
/home/<user>/libvirt-pool/<vm-name>.img
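# build the box on the destination so vagrant-cli recognises it;
# virtual_size is in GiB, rounded up from the byte count that
# qemu-img info prints in parentheses
cd ~ && cat > metadata.json <<EOF
{"provider":"libvirt","format":"qcow2","virtual_size":$(qemu-img info box.img | awk -F'[()]' '/virtual size/{split($2,b," "); print int((b[1]+1073741823)/1073741824)}')}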
EOF
tar cf gitbash-local.box metadata.json box.img
vagrant box add gitbash-local gitbash-local.box
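The commands above read ~/box.img on the destination, but nothing so far copies it there. One way to do that without going near fog-libvirt is plain rsync over ssh, followed by a checksum comparison. A sketch only: it reuses the dest host alias from step 3, the helper names are hypothetical, and it assumes rsync and sha256sum are installed on both hosts.

```shell
#!/bin/sh
# Hypothetical helper: copy a file into the home directory on dest.
# --partial keeps an interrupted transfer so a retry can reuse it.
ship_image() {
    rsync --partial --progress "$1" "dest:$(basename "$1")"
}

# Hypothetical helper: succeed only if the sha256 of local file $1
# equals the digest string $2 reported by the other side.
same_digest() {
    [ "$(sha256sum "$1" | cut -d' ' -f1)" = "$2" ]
}

# Usage:
#   ship_image /path/to/box.img
#   same_digest /path/to/box.img "$(ssh dest sha256sum box.img | cut -d' ' -f1)" \
#       || echo "checksum mismatch: copy again"
```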
Two libvirt gotchas to remember:
- The default pool path is /var/lib/libvirt/images. On a host with a
  small /var partition (Ubuntu 24.04 stock is 63 GB; apt and friends
  fill it fast), the default pool runs out of room long before you
  finish a 30 GB VM. Make a pool under /home and use that.
  df -h /var is part of pre-flight.
- Ubuntu's libvirt qemu runs as libvirt-qemu, not root. Pool
  directories need to be chown libvirt-qemu:libvirt and 771, or
  pool-build will fail with a permission error. Debian still uses root.
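The free-space pre-flight can be scripted so it fails loudly instead of halfway through a convert. A minimal sketch, assuming POSIX df -Pk output (one line per filesystem, sizes in 1 KiB blocks); the helper name has_room is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the filesystem holding directory $1
# has at least $2 GiB available.
has_room() {
    # Row 2, column 4 of `df -Pk` is the available space in KiB.
    avail_kb=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $(( $2 * 1024 * 1024 )) ]
}

# Usage: 30 is the size of the flattened image from step 1.
#   has_room /var/lib/libvirt/images 30 \
#       || echo "default pool too small: use a pool under /home"
```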
3. Hand the VM to virsh
Skip vagrant up. The virsh define of a saved dumpxml is enough:
# pull dumpxml from the source (and a network it depends on, e.g.
# vagrant-libvirt's management network)
virsh -c qemu:///system dumpxml <source-domain> > gitbash.xml
virsh -c qemu:///system net-dumpxml vagrant-libvirt > vagrant-libvirt.xml
# edit: change <name>, replace disk <source file=> with the new pool path,
# drop <uuid> from network (libvirt will generate a fresh one)
sed -i ... gitbash.xml
sed -i '/<uuid>/d' vagrant-libvirt.xml
# on the destination
scp gitbash.xml vagrant-libvirt.xml dest:/tmp/
ssh dest "virsh net-define /tmp/vagrant-libvirt.xml
virsh net-start vagrant-libvirt
virsh net-autostart vagrant-libvirt
virsh define /tmp/gitbash.xml
virsh start gitbash"
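The sed -i ... edit is elided above because it depends on the domain in question. Purely as an illustration, and assuming the single-quoted attributes that virsh dumpxml emits plus a domain with exactly one file-backed disk, a filter along these lines renames the domain and repoints its disk. The function name and the example values are hypothetical; a domain with a second disk or an attached ISO needs a narrower pattern.

```shell
#!/bin/sh
# Hypothetical filter: domain XML on stdin, edited XML on stdout.
# $1 = new domain name, $2 = new disk image path.
# Assumes neither value contains '|', '&' or a backslash.
retarget_domain() {
    sed -e "s|<name>[^<]*</name>|<name>$1</name>|" \
        -e "s|<source file='[^']*'|<source file='$2'|"
}

# Usage:
#   retarget_domain gitbash /home/<user>/libvirt-pool/<vm-name>.img \
#       < gitbash.xml > gitbash.new.xml
```

Writing to a new file instead of editing in place keeps the original dumpxml around for comparison.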
4. When the qemu versions don't match
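The machine= value in the dumpxml's <type arch= machine=...> names a
machine type from the source host's qemu, and the destination's qemu may
not know it. List the pc-q35 machine types the destination actually
supports and pick the latest: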
qemu-system-x86_64 -machine help | grep pc-q35
If the source was Debian-trixie with qemu 10 (pc-q35-10.0) and the
destination is Ubuntu 24.04 with qemu 8.2 (pc-q35-noble max), you have
to downgrade the <type> in the XML, and the Windows guest then re-runs
its HAL configuration on first boot: read 600 MB, write 130 MB, take
5–10 minutes through one or two auto-restarts. The counters from
virsh domblkstat <domain> keep climbing the whole time; that is not a
BSOD, it's the HAL reconfig running. Wait it out.
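Picking the newest machine type by eye gets tedious when the list is long. A small sketch, assuming each line of the -machine help output starts with the machine name and that GNU sort (for -V) is available; the function name is hypothetical. It only considers numerically versioned names, so distro aliases such as pc-q35-noble are ignored on purpose.

```shell
#!/bin/sh
# Hypothetical filter: `qemu-system-x86_64 -machine help` on stdin,
# newest numerically versioned pc-q35 machine type on stdout.
latest_q35() {
    # sort -V orders 8.2 < 9.1 < 10.0, which plain sort would not.
    grep -oE '^pc-q35-[0-9]+\.[0-9]+' | sort -V | tail -n 1
}

# Usage (on the destination host):
#   qemu-system-x86_64 -machine help | latest_q35
```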
5. What survives the move on its own
- Tailscale identity: the machine key lives on the disk; the IP
  (100.x.y.z) re-binds when the service comes back up.
- Cloudflare DNS records: no change needed (the IP is preserved).
6. What doesn't survive
- Service autostart types: HAL reconfig rewrites service-start graphs.
  The Tailscale service dropped to Manual after the move; first boot had
  no Tailscale IP until I ran Start-Service Tailscale from an elevated
  PowerShell. Verify on next reboot and
  Set-Service ... -StartupType Automatic if it didn't stick.
- vagrant CLI control: once you virsh start from the hand-written XML,
  the domain is no longer in vagrant's .vagrant.d index. vagrant up from
  the project dir on the new host would create a second domain alongside
  this one. Either keep ownership of the domain via virsh and don't
  vagrant up, or rebuild the project machines/<name>/libvirt/ state to
  satisfy vagrant-libvirt's bookkeeping.
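If the project directory also exists on the new host, a guard in front of vagrant up makes the accidental second domain harder to create. A sketch, assuming the hand-defined domain is called gitbash as in step 3; the helper name is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads `virsh list --all --name` output on stdin
# and succeeds if a domain called exactly $1 is already defined.
domain_defined() {
    # -F fixed string, -x whole-line match, -q quiet.
    grep -Fxq -- "$1"
}

# Usage:
#   virsh -c qemu:///system list --all --name | domain_defined gitbash \
#       && echo "gitbash is owned by virsh: skip 'vagrant up' here"
```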
The recovery path is also better now: the flattened box.img on the
source host, the source's original vol, and a tar-able metadata.json
are enough to rebuild on either host without going back to the original
cloud box.