Replication Tab - 400 Bad Request operation timed out

Replication on TrueNAS Scale 23.100.2 appears to be working without issue, but accessing the replication tab in TrueCommand 3.0.1 very quickly returns error 400 Bad Request operation timed out. Other functionality in TrueCommand appears to operate normally.

Thanks for the post. I’m not able to reproduce this locally, so I’ll need some more information.

Could you try turning on trace logging (API page, name is sys/log-level/set and the body is {“level”: “trace”}), go back to the replication page, and then send the Middleware Log (API page, “Middleware Log”, maybe the UI one for good measure, but that one isn’t terribly useful). Turn back to the regular log level when you’re done ({“level”: “info”}), as that gets quite noisy.

MW Log --------------------

{“level”:“WARN”,“ts”:“Apr 25 17:50:11.155295”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.155338”,“caller”:“dao/stats_daemon.go:79”,“msg”:“Failed to add stats values, retrying”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.37271201312248},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:0.5012531994987561},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:67},“net_in”:{“avg”:0.0791015625,“enX0”:0.2373046875,“enX1”:0,“enX2”:0,“tot”:0.2373046875},“net_out”:{“avg”:0.3372395833333333,“enX0”:1.01171875,“enX1”:0,“enX2”:0,“tot”:1.01171875},“net_total”:{“avg”:0.4163411458333333,“enX0”:1.2490234375,“enX1”:0,“enX2”:0,“tot”:1.2490234375},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.155655”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.155679”,“caller”:“dao/stats_daemon.go:88”,“msg”:“Failed to add stats values, Discarding these values.”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.37271201312248},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:0.5012531994987561},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:67},“net_in”:{“avg”:0.0791015625,“enX0”:0.2373046875,“enX1”:0,“enX2”:0,“tot”:0.2373046875},“net_out”:{“avg”:0.3372395833333333,“enX0”:1.01171875,“enX1”:0,“enX2”:0,“tot”:1.01171875},“net_total”:{“avg”:0.4163411458333333,“enX0”:1.2490234375,“enX1”:0,“enX2”:0,“tot”:1.2490234375},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.155944”,“caller”:“internal/server-list.go:428”,“msg”:“Failed to write realtime stats”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.203414”,“caller”:“api/methods.go:101”,“msg”:“Timed out sending message”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“name”:“tnseaside”,“msg”:“pool.query”,“id”:“b785c893-fecf-4f3a-985f-d2916afa899f”,“time”:“2m0s”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:11.203457”,“caller”:“api/health.go:274”,“msg”:“Failed to update pool stats on health event”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“name”:“tnseaside”,“err”:“operation timed out”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.161396”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.161455”,“caller”:“dao/stats_daemon.go:79”,“msg”:“Failed to add stats values, retrying”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.37271201312248},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:0.37593989962406593},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:67},“net_in”:{“avg”:0.17154947916666666,“enX0”:0.3349609375,“enX1”:0.1796875,“enX2”:0,“tot”:0.5146484375},“net_out”:{“avg”:0.3444010416666667,“enX0”:1.0185546875,“enX1”:0.0146484375,“enX2”:0,“tot”:1.033203125},“net_total”:{“avg”:0.5159505208333334,“enX0”:1.353515625,“enX1”:0.1943359375,“enX2”:0,“tot”:1.5478515625},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.161532”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.161552”,“caller”:“dao/stats_daemon.go:88”,“msg”:“Failed to add stats values, Discarding these values.”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.37271201312248},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:0.37593989962406593},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:67},“net_in”:{“avg”:0.17154947916666666,“enX0”:0.3349609375,“enX1”:0.1796875,“enX2”:0,“tot”:0.5146484375},“net_out”:{“avg”:0.3444010416666667,“enX0”:1.0185546875,“enX1”:0.0146484375,“enX2”:0,“tot”:1.033203125},“net_total”:{“avg”:0.5159505208333334,“enX0”:1.353515625,“enX1”:0.1943359375,“enX2”:0,“tot”:1.5478515625},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.161723”,“caller”:“internal/server-list.go:428”,“msg”:“Failed to write realtime stats”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:13.259670”,“caller”:“api/methods.go:101”,“msg”:“Timed out sending message”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“name”:“tnseaside”,“msg”:“pool.dataset.query”,“id”:“55096602-729d-4071-a86b-49c12eeb5ad5”,“time”:“2m0s”}
{“level”:“ERROR”,“ts”:“Apr 25 17:50:13.259749”,“caller”:“api/pool.go:53”,“msg”:“Failed to get pool stats”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“err”:“operation timed out”,“stacktrace”:“github.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).GatherPoolMetrics\n\t/build-mw/src/nas/api/pool.go:53\ngithub.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).UpdateHealth\n\t/build-mw/src/nas/api/health.go:278\ngithub.com/iXsystems/TrueCommand/middleware/internal.(*MemoryCache).InitializeSubscriptionHandlers.func4\n\t/build-mw/src/internal/server-list.go:438\ngithub.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).processEvent\n\t/build-mw/src/nas/api/event.go:80”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:15.167686”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:50:15.167759”,“caller”:“dao/stats_daemon.go:79”,“msg”:“Failed to add stats values, retrying”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.37352980641383},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:0.6273525006273589},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:67},“net_in”:{“avg”:0.5045572916666666,“enX0”:0.54296875,“enX1”:0.7314453125,“enX2”:0.2392578125,“tot”:1.513671875},“net_out”:{“avg”:0.4134114583333333,“enX0”:1.0673828125,“enX1”:0.111328125,“enX2”:0.0615234375,“tot”:1.240234375},“net_total”:{“avg”:0.91796875,“enX0”:1.6103515625,“enX1”:0.8427734375,“enX2”:0.30078125,“tot”:2.75390625},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}

UI Log -------------------
{
“id”: 43,
“updated_at”: “2024-04-25T17:51:09.544203349Z”,
“log_entry”: “Http failure response for [[[[removed URL http : \ \ … due to error, new user cannot add more thna 2 links in a post on the forum]]]] truecommand.lan/api/v1/nas/hna/replication/query: 400 Bad Request”
},
{
“id”: 42,
“updated_at”: “2024-04-25T17:48:09.785738626Z”,
“log_entry”: “Http failure response for [[[[removed URL http : \ \ … due to error, new user cannot add more thna 2 links in a post on the forum]]]] truecommand.lan/api/v1/nas/hna/replication/query: 400 Bad Request”
},
{
“id”: 41,
“updated_at”: “2024-04-25T17:46:41.902265359Z”,
“log_entry”: “Http failure response for [[[[removed URL http : \ \ … due to error, new user cannot add more thna 2 links in a post on the forum]]]] truecommand.lan/api/v1/nas/hna/replication/query: 400 Bad Request”
},
{
“id”: 40,
“updated_at”: “2024-04-25T15:52:43.41179398Z”,
“log_entry”: “Http failure response for [[[[removed URL http : \ \ … due to error, new user cannot add more thna 2 links in a post on the forum]]]] truecommand.lan/api/v1/nas/hna/replication/query: 400 Bad Request”
},

corresponding middleware log for 2024-0425T17:51:09 UI log…

{“level”:“WARN”,“ts”:“Apr 25 17:51:09.441735”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.441791”,“caller”:“dao/stats_daemon.go:79”,“msg”:“Failed to add stats values, retrying”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.3735775677113},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:1.1264079999999979},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:66.9},“net_in”:{“avg”:0.20149739583333334,“enX0”:0.4091796875,“enX1”:0.1953125,“enX2”:0,“tot”:0.6044921875},“net_out”:{“avg”:3.1695963541666665,“enX0”:9.5087890625,“enX1”:0,“enX2”:0,“tot”:9.5087890625},“net_total”:{“avg”:3.37109375,“enX0”:9.91796875,“enX1”:0.1953125,“enX2”:0,“tot”:10.11328125},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.441841”,“caller”:“rrd/cache.go:98”,“msg”:“Failed to start batch update”,“cd_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.441854”,“caller”:“dao/stats_daemon.go:88”,“msg”:“Failed to add stats values, Discarding these values.”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“rrd_daemon_id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“vals”:{“arc_miss”:{“avg”:0},“arc_used”:{“avg”:97.3735775677113},“cpu_temp”:{“avg”:0},“cpu_used”:{“avg”:1.1264079999999979},“d_busy”:{“avg”:0},“d_rb”:{“avg”:0},“d_rop”:{“avg”:0},“d_wb”:{“avg”:0},“d_wop”:{“avg”:0},“mem_used”:{“avg”:66.9},“net_in”:{“avg”:0.20149739583333334,“enX0”:0.4091796875,“enX1”:0.1953125,“enX2”:0,“tot”:0.6044921875},“net_out”:{“avg”:3.1695963541666665,“enX0”:9.5087890625,“enX1”:0,“enX2”:0,“tot”:9.5087890625},“net_total”:{“avg”:3.37109375,“enX0”:9.91796875,“enX1”:0.1953125,“enX2”:0,“tot”:10.11328125},“net_used”:{“avg”:0,“enX0”:0,“enX1”:0,“enX2”:0,“tot”:0}},“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.441914”,“caller”:“internal/server-list.go:428”,“msg”:“Failed to write realtime stats”,“id”:“ef5e6c8e-940a-4f2a-8588-9df0cd813407”,“err”:“rrd not available”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.458100”,“caller”:“api/methods.go:101”,“msg”:“Timed out sending message”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“name”:“tnseaside”,“msg”:“pool.dataset.query”,“id”:“80358638-9a34-49e9-a284-0c0b15e94cb4”,“time”:“2m0s”}
{“level”:“ERROR”,“ts”:“Apr 25 17:51:09.458148”,“caller”:“api/pool.go:53”,“msg”:“Failed to get pool stats”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“err”:“operation timed out”,“stacktrace”:“github.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).GatherPoolMetrics\n\t/build-mw/src/nas/api/pool.go:53\ngithub.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).UpdateHealth\n\t/build-mw/src/nas/api/health.go:278\ngithub.com/iXsystems/TrueCommand/middleware/internal.(*MemoryCache).InitializeSubscriptionHandlers.func4\n\t/build-mw/src/internal/server-list.go:438\ngithub.com/iXsystems/TrueCommand/middleware/nas/api.(*TrueNASServer).processEvent\n\t/build-mw/src/nas/api/event.go:80”}
{“level”:“WARN”,“ts”:“Apr 25 17:51:09.538998”,“caller”:“api/methods.go:101”,“msg”:“Timed out sending message”,“id”:“43cf8dbc-2307-4796-896b-e377e6d508ad”,“name”:“tnseaside”,“msg”:“replication.query”,“id”:“299d9788-0aef-4342-9184-6a2417ad0c5b”,“time”:“2m0s”}

Upgraded my primary nodes that do replication to 24.04 and TrueCommand no longer errors on the replication tab. Everything is snappy and accurate now.

Actually, the issue returns over time, but clears when I restart truecommand service.

Thank you for the updates, I really appreciate it. Sorry I haven’t gotten back.

I see the 400’s are being logged by the UI, and it’s from the backend looking at the replication tasks on the NAS. I think what’s going on is that at least one of the NASes has a broken connection. Something that I encountered in testing but struggle to reproduce. As a result, the NAS appears connected to TC but no RRD daemon exists all NAS APIs time out, which is what is causing the 400 error.

The solution is to restart the connection, either by removing the server entirely (all previous data will be unassociated and archived) or reauthenticating. Restarting the TC container will fix this too and is probably the best solution.

I’m not sure what’s leading the NAS connections to break over time. Usually I can blame this on WireGuard, but these are not on a VPN, right? I’ll examine the logs and create ticket.