On Mon, Jun 08, 2015 at 10:22:28AM +0100, Ian Campbell wrote: > On Mon, 2015-06-08 at 04:37 +0000, osstest service user wrote: > > flight 58119 libvirt real [real] > > http://logs.test-lab.xenproject.org/osstest/logs/58119/ > > > > Regressions :-( > > > > Tests which did not succeed and are blocking, > > including tests which could not be run: > > This has been failing for a while now, sorry for not brining it to your > attention sooner. > libxl: debug: libxl_event.c:638:libxl__ev_xswatch_deregister: watch w=0x7f805c25b248 wpath=/local/domain/0/device-model/1/state token=3/0: deregister slotnum=3 > libxl: error: libxl_exec.c:393:spawn_watch_event: domain 1 device model: startup timed out > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7f805c25b248: deregister unregistered > libxl: debug: libxl_event.c:652:libxl__ev_xswatch_deregister: watch w=0x7f805c25b248: deregister unregistered > libxl: error: libxl_dm.c:1564:device_model_spawn_outcome: domain 1 device model: spawn failed (rc=-3) > libxl: error: libxl_create.c:1373:domcreate_devmodel_started: device model did not start: -3 Hi, I've tried to debug this "device model: startup time out" issue that I'm seeing on OpenStack. What I've done is strace every single QEMU. It appear that QEMU take more than 10s to load... I have attached two logs of those strace. On the first one, qemu-dm-instance-00000344.log, qemu does not have finished to load its dynamic libraries, this is often the case on timeout. On the second one, qemu-dm-instance-000005b6.log, qemu did actually reach the running state, but it was to late. As additionnal information, I have some dstat log. While qemu-dm-instance-00000344.log is starting: ----system---- ----total-cpu-usage---- ------memory-usage----- -net/total-> time |usr sys idl wai hiq siq| used buff cach free| recv send> 22-06 19:04:42| 33 21 23 23 0 0|4841M 103M 577M 1436M| 682B 0 > 22-06 19:04:43| 19 3 30 48 0 0|4840M 103M 729M 1286M|3106B 13k> 22-06 19:04:44| 12 3 34 52 0 0|4842M 103M 840M 1172M|3778B 13k> 22-06 19:04:45| 3 1 52 44 0 0|4844M 103M 939M 1072M| 614B 0 > 22-06 19:04:46| 3 3 49 46 0 0|4846M 103M 1055M 954M| 606B 0 > 22-06 19:04:47| 3 1 40 56 0 0|4848M 103M 1144M 863M| 414B 0 > 22-06 19:04:48| 3 3 44 50 0 0|4852M 103M 1266M 736M| 590B 0 > 22-06 19:04:49| 3 1 38 58 0 0|4854M 103M 1354M 646M| 706B 0 > 22-06 19:04:50| 2 3 32 63 0 0|4857M 103M 1472M 526M| 616B 0 > 22-06 19:04:51| 3 1 36 60 0 0|4859M 103M 1558M 438M| 760B 0 > 22-06 19:04:52| 2 1 41 57 0 0|4862M 103M 1670M 323M| 544B 0 > 22-06 19:04:53| 1 1 52 46 0 0|4866M 103M 1789M 200M|1072B 724B> 22-06 19:04:54| 1 1 51 48 0 0|4868M 103M 1880M 106M| 838B 740B> 22-06 19:04:55| 12 3 33 52 0 0|4868M 103M 1940M 45.6M| 736B 1044B> 22-06 19:04:56| 17 5 22 56 0 0|4872M 103M 1936M 46.3M| 986B 1068B> And while qemu-dm-instance-000005b6.log is starting: time |usr sys idl wai hiq siq| used buff cach free| recv send> 22-06 23:49:50| 56 13 28 3 0 0|4973M 83.3M 448M 1454M|1188B 1280B> 22-06 23:49:51| 55 16 17 12 0 0|4943M 83.3M 647M 1285M|1418B 0 > 22-06 23:49:52| 31 8 31 31 0 0|4957M 83.3M 915M 1003M|5628B 24k> 22-06 23:49:53| 16 3 40 41 0 0|4983M 83.3M 1034M 858M| 164B 0 > 22-06 23:49:54| 16 3 45 37 0 0|4954M 83.3M 1132M 790M|1160B 0 > 22-06 23:49:55| 14 3 36 47 0 0|4979M 83.3M 1233M 662M| 524B 0 > 22-06 23:49:56| 24 3 27 47 0 0|5009M 83.3M 1326M 540M| 858B 0 > 22-06 23:49:57| 32 2 23 43 0 0|4966M 83.3M 1461M 448M|1076B 0 > 22-06 23:49:58| 14 5 49 32 0 0|4995M 83.3M 1472M 408M| 810B 64B> 22-06 23:49:59| 19 2 46 33 0 0|4961M 83.3M 1472M 441M| 174B 0 > 22-06 23:50:00| 16 2 57 25 0 0|4988M 83.3M 1473M 414M| 880B 0 > 22-06 23:50:01| 28 8 37 27 0 0|4979M 83.3M 449M 1447M| 652B 0 > 22-06 23:50:02| 47 6 34 12 0 0|4941M 83.3M 451M 1484M|1192B 1664B> 22-06 23:50:03| 45 4 47 4 0 0|4961M 83.4M 451M 1463M| 686B 0 > 22-06 23:50:04| 50 8 37 5 0 0|4985M 84.0M 442M 1448M|1848B 3384B> But I'm not sure that the same issue that happen on osstest's merlot[01] hosts. -- Anthony PERARD