04:10:24 .merges
04:10:24 -xmr-pr- 7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
05:36:51 .merges.
05:36:54 .merges
05:36:54 -xmr-pr- 7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
05:37:07 Alright, stuffing my face to wake up some, then will merge.
05:37:26 Sorry, I was 3/4ths brain dead last time I checked, so was not in a good head-space for attempting to merge.
05:41:29 ok, thanks
06:44:37 Merged, I think.
06:45:29 thanks
06:47:42 Of course, was just a bit brain-dead earlier from NYE celebrations.
10:07:47 .merges
10:07:47 Merge queue empty
10:13:20 .murders
10:13:28 I guess the murder queue is empty too
10:14:22 Shame.
10:33:56 I just started a fresh IBD, btw, between two boxes on my wifi. Most of the time the network is less than 2Mbps.
10:34:07 I've seen a spike up to 11Mbps
10:34:14 very brief
10:36:38 I also set limit-up on the serving node to 8MB/s
10:40:03 hm, the client node just lost its connection. Was using --add-exclusive-node too.
10:40:34 dropped after 87552
10:44:56 client node seemed to be deadlocked
11:04:27 restarted it. Client node is a quad-core laptop but I never see more than 80% CPU use.
11:05:15 would expect to see more since block verification is threaded
11:05:49 oo, a peak of 25Mbps
11:06:09 seems low for local net
11:06:34 I get the impression that the speed is really throttled by blocks/sec. bits/sec only goes up when there are full blocks.
11:07:19 and yes, seems slow
11:07:49 if speed increases with multiple incoming connections, one would assume it's a bottleneck on the providing node
11:07:52 not disk limited either
11:08:38 the providing node is relatively idle
11:08:53 disk shows 7% busy, hundreds of KB/sec reads
11:09:27 monerod using ~5% CPU there
11:10:02 hmm
11:10:12 sounds like it's artificially throttled then
11:11:22 dunno where, though
11:11:37 client node is running --db-sync-mode fast:async:10000
11:11:43 it is not waiting for any disk writes
11:12:27 how's memory activity?
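The two-node test setup described above could be sketched roughly like this; the peer address is hypothetical, and note that monerod's --limit-up takes a value in kB/s, so 8 MB/s is approximately 8192:

```shell
# Serving node: cap upload bandwidth at roughly 8 MB/s
# (--limit-up is in kB/s, so 8 MB/s ~= 8192)
monerod --limit-up 8192

# Client node: sync exclusively from the local peer (address is a
# placeholder), with batched async DB writes as mentioned above
monerod --add-exclusive-node 192.168.1.10:18080 \
        --db-sync-mode fast:async:10000
```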
11:12:34 10G free
11:12:38 on client node
11:12:43 4G cached
11:13:01 provider node 2.4G free, 17G cached
11:16:01 that's utilization. How about activity? Probably hard to monitor.
11:17:06 I just caught a few seconds using perf
11:17:33 seems to be bottlenecked in LMDB. How embarrassing.
11:18:08 man, and I deleted the "let's replace LMDB with Oracle just to be sure it's not the culprit" joke that popped into my head earlier :P
11:18:17 LOL
11:20:26 http://highlandsun.com/hyc/perf.txt
11:21:16 6.7% time in mdb_txn_safe, which is a serialization point. So we're losing concurrency there.
11:22:55 like ACID compliance?
11:23:40 not quite. This is in the DB driver for its own consistency control, not internal to LMDB.
11:24:07 that function exists to wrap LMDB transactions, because higher-level callers don't know about them
11:24:26 since the higher levels were all written for the non-transactional in-memory blockchain
11:26:16 I suspect that's why we're not getting concurrency in verification
11:27:37 seems like a good optimization target
11:28:36 yeah. Lots of work to remove that wrapper layer though, and make the higher-level code smarter about DB transactions.
11:30:04 is there an RPC call for print_net_stats?
11:31:22 there has to be
11:31:34 every monerod command uses RPC
11:31:47 (or can use it, to talk to itself)
11:31:56 it is not listed here: https://www.getmonero.org/resources/developer-guides/daemon-rpc.html
11:34:30 better to just read https://github.com/monero-project/monero/blob/master/src/rpc/core_rpc_server_commands_defs.h
11:34:42 COMMAND_RPC_GET_NET_STATS
11:35:56 ty
11:37:48 curl http://127.0.0.1:18081/get_net_stats -H 'Content-Type: application/json'
11:42:38 binaryFate: yes, thank you .. already integrated in my queries
11:58:20 bah.
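The curl call above can be wrapped in a small script to turn the cumulative counters into throughput figures. This is a sketch, not official tooling: the endpoint URL assumes a default local node, and the field names (total_bytes_in / total_bytes_out) follow the COMMAND_RPC_GET_NET_STATS definition in core_rpc_server_commands_defs.h linked above.

```python
import json
import urllib.request

# Default local daemon endpoint, as in the curl example above (assumed)
GET_NET_STATS_URL = "http://127.0.0.1:18081/get_net_stats"

def get_net_stats(url: str = GET_NET_STATS_URL) -> dict:
    """Fetch the raw cumulative network counters from a running monerod."""
    req = urllib.request.Request(
        url, data=b"{}", headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def rates(before: dict, after: dict, seconds: float) -> dict:
    """Average in/out throughput in bytes/sec between two samples
    taken `seconds` apart."""
    return {
        "in_bps": (after["total_bytes_in"] - before["total_bytes_in"]) / seconds,
        "out_bps": (after["total_bytes_out"] - before["total_bytes_out"]) / seconds,
    }
```

Sampling get_net_stats() twice, a few seconds apart, and feeding both dicts to rates() gives the observed bandwidth without any external monitoring tool.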
got a segfault; systemd stored the corefile compressed
11:58:28 zstd compressor dies decompressing the core file
11:58:54 what kind of idiot designs so much fragility into critical system facilities
12:25:57 hyc: one doesn't need to spend any effort to have fragility in their systems :P
12:27:11 yes, but the systemd folks have gone out of their way to do so
12:51:42 M5M400: anyway, this perf report still isn't telling the whole story. If those two LMDB functions were the actual bottleneck, CPU use would be at least 100%.
12:52:16 since it's less, and there's no I/O wait, and the network is fairly idle, the rest of the time is probably mutex locking
13:05:20 has anyone else done a full sync from scratch lately? I'm getting lots of failures on v0.17.1.8, but restarting monerod gets past them
13:06:15 for a couple hundred thousand blocks, till it happens again
13:06:21 Paste a level 1 log?
13:06:49 ok, next time it hits
13:08:01 it was twice in the first 250,000 blocks
13:09:27 of course, now I'm at 600k and it hasn't hit again
13:12:46 there it goes https://weuse.cash/2020/09/11/bitcoin-mixing-a-catch-22/
13:12:51 doh
13:12:57 https://paste.debian.net/1179404/
13:13:07 moneromooo: ^
13:13:49 ty
13:16:21 It's syncing off a single peer on the local network?
13:16:27 yes
13:16:33 exclusive node
13:16:51 and after I exit and restart, it just resumes syncing
13:41:33 hyc: there are two "Consistency failure in m_blocks_hash_check construction" messages in src/cryptonote_core/blockchain.cpp; is this the first or the second one, based on line numbers in the log?
13:47:58 looking...
13:48:36 line 4827
13:49:26 1st one
13:50:30 ty
13:51:37 now up to 1151100 and it hasn't happened again yet
13:52:46 What was the first height it did this?
13:53:00 first time it happened at block 282112, next at 535040
13:53:19 again at 544768
13:53:32 then 661504
13:53:55 Is pruning involved on any of these daemons?
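Since the same error string appears twice in blockchain.cpp, the two occurrences can only be told apart by line number, which is what the exchange above does by hand. grep -n automates that mapping; the snippet below runs it against a tiny stand-in file (the real target would be src/cryptonote_core/blockchain.cpp in the monero tree):

```shell
# Stand-in for blockchain.cpp: the real file contains the same error
# string twice, so only line numbers distinguish the occurrences.
cat > /tmp/blockchain_sample.cpp <<'EOF'
void a() { log("Consistency failure in m_blocks_hash_check construction"); }
void b() { log("Consistency failure in m_blocks_hash_check construction"); }
EOF
# -n prefixes each match with its line number
grep -n 'Consistency failure' /tmp/blockchain_sample.cpp
```

Comparing the printed line numbers against the line number in the level-1 log (4827 here) identifies which of the two code paths fired.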
13:54:03 nope
13:55:12 hasn't happened again since 661504
14:33:56 seemed to run into a problem at 130572, but it disconnected and reconnected automatically and continued
14:34:06 1303572
14:37:02 Known, assuming no other error.
14:45:02 well, looks like my node is frozen again, even though I am no longer updating the ban list with the ban command or with the DNS list
14:45:16 ok, what does frozen mean?
14:45:26 gdb, thread apply all bt
14:46:05 same as before. Systemd says it's running but it won't respond to RPC. Even doing a simple 'monerod status' just results in the system printing the name of the release version, then hanging pretty much forever until I hit ctrl+C.
14:51:14 (yeah, there was no error message on that disconnect/reconnect)
14:58:01 Lyza: ok, it did not happen to any of my nodes yet, so you will have to run it in gdb so that we can find out where it freezes
14:58:42 if it's still hung, attach gdb to the process and get the backtrace like mooo said
14:58:53 this time it seems to have happened while I was interacting with the node with a wallet. But yeah, I'm installing gdb right now; if y'all could give me a specific command to run it with, that would be helpful
14:59:16 if the process is still there, find out its PID
14:59:16 gdb /path/to/monerod `pidof monerod`
14:59:22 hyc: unfortunately already restarted, didn't even know that was a thing you could do
14:59:23 yeah, that ^
14:59:28 oh well
14:59:55 usually "gdb - " also works; gdb can figure out the pathname when it reads the process memory
15:00:52 so I don't need to do anything special when launching, I can just do that if it hangs again?
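The attach-and-backtrace procedure above can also be done non-interactively with gdb's batch mode, which avoids typing at the (gdb) prompt; the output filename is just an example:

```shell
# Attach to the running daemon, dump backtraces for every thread,
# then detach automatically, leaving the process running.
gdb -p "$(pidof monerod)" -batch -ex 'thread apply all bt' > monerod-backtrace.txt
```

This needs permission to ptrace the process (typically run as the same user, or root), and the resulting file is what to paste when reporting a freeze.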
15:01:17 this person also reported freezing: https://github.com/monero-project/monero/issues/7224#issuecomment-753482686
15:02:54 otherwise I have not read any reports about freezing yet
15:04:12 alright, well, I'll keep an eye out and try to upload some useful info if it happens again
15:09:58 Lyza: yes, then after it attaches, "thread apply all bt"
15:12:26 got it, ty
15:31:20 if https://github.com/monero-project/monero/pull/7254 gets a review, we can include it this release, else next one
15:34:36 reviewing it
15:48:02 reviewed
15:53:58 thanks
15:54:03 hyc: I can't repro. If I paste you a log patch and you can repro it, can you paste the logs?
15:54:53 https://paste.debian.net/hidden/3eeee02e/ is the patch.
15:55:10 Actually...
15:55:47 https://paste.debian.net/hidden/8066848e/
15:56:03 The 1500 bit was to try and exercise it more.
16:13:51 I've restarted the sync
16:14:08 may have hardware issues here, this laptop hasn't been used in a while
17:04:54 .merge+ 7254 7255
17:04:54 Added
17:24:30 https://github.com/monero-project/monero/pull/7264 needs an approval
17:27:16 done
17:27:20 .merge+ 7264 7263 7248 7261 7262
17:27:20 Added
17:27:21 thanks
17:27:28 ok, ^ would be release
17:29:35 we still have an issue where daemon sync bans false positives, but we've had this issue for a couple of releases now
17:57:17 Which case exactly?
18:01:52 just starting a chain sync from height 0, and I get a lot of false positive bans; currently at 40% synced and 24 peers banned
18:01:59 some of those 24 are in the block list, but not all
18:02:43 Did I ask for logs?
18:02:59 I don't remember
18:03:20 Well, send them again, assuming you had, because I don't remember either :)
18:04:27 Though it seems common enough that I should be able to repro when I get to it
18:04:58 Didn't repro hyc's though, so send logs if you have them.
18:05:12 restarted sync with log 1
18:06:57 hmm, is there a different log level I can use?
1 is super spammy during sync
18:07:44 You can always rm ~/.bitmonero/bitmonero.log-* from time to time if it fills up too much.
18:07:52 100 MB each by default.
18:11:11 1,blockchain:ERROR might do. I don't *think* a ban would have an interesting message in that category, and it's what adds the "block added" lines.
18:15:10 First block hash is unknown, dropping connection
18:15:16 can send you full logs
18:16:33 Oh. Right, I know what that is. It's when you start downloading a lot higher than the current height.
18:17:09 I've seen that today; I'd forgotten already. Adding to the list...
18:17:38 but I assume it's harmless apart from blocking some false positives during sync?
18:18:31 If you're asking if it can corrupt your DB or the like, it can't.
18:57:14 https://paste.debian.net/hidden/5c71d1a8/ is a likely fix, seems to be running well here.
19:03:47 ok, will test
20:27:50 .merge- 7254 7255
20:27:50 Removed
21:14:09 19:57 <+moneromooo> https://paste.debian.net/hidden/5c71d1a8/ is a likely fix, seems to be running well here. <-- seems better with this patch
21:14:23 after 10% synced, only 1 banned, and the one banned is in the block list
21:24:06 still 0 false positives after 20%
21:52:51 hyc: I have reproed it once. The logs show the two values are the same, so it looks like a race. But you had a single peer, so it doesn't make sense...
21:56:57 My log timings do show a race though.
21:57:04 Maybe two bugs.
21:59:40 hyc: https://paste.debian.net/1179470/
22:01:11 Hmm, not enough. Something else locking too.
22:03:16 Has to be the main lock: https://paste.debian.net/1179474/ (not run yet).
22:10:10 "protocol: handle receiving a block hash we've not added yet" <-- do you want to PR this?
22:12:19 done
22:13:40 thanks. After 60% synced I have 2 false positives, but I assume they are banned for other reasons.
22:13:52 idle or so
22:14:17 grep "dropping connection$", it'll tell you why.
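The grep suggested above can be tried against any bitmonero.log; here it runs on a tiny fabricated excerpt so the snippet is self-contained (the sample lines only mimic the "First block hash is unknown" case, they are not real log output):

```shell
# Fabricated two-line stand-in for ~/.bitmonero/bitmonero.log
printf '%s\n' \
  '2021-01-01 18:15:10 I First block hash is unknown, dropping connection' \
  '2021-01-01 18:15:11 I Synced 100/200' \
  > /tmp/bitmonero_sample.log

# Show only the lines that explain why a peer was dropped
grep 'dropping connection$' /tmp/bitmonero_sample.log
```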
22:20:04 restarted with log 1 now to see what the remaining false positives are
22:27:08 is it normal that I get so many of these? https://paste.debian.net/hidden/ad96263d/
22:27:52 Is your node pruned?
22:28:07 I'm not syncing a pruned node
22:28:25 Then it's not expected.
22:28:56 Oh, wait, not quite the same message I thought.
22:29:33 Then it'll depend on why it ends up there; more logs needed.
22:29:50 net.cn:DEBUG
22:29:58 And block_queue:DEBUG
22:30:12 And cn.block_queue:DEBUG
22:30:46 together with log 0 or log 1?
22:30:54 Either.
23:36:53 ok, glad you repro'd it. I've been running self-tests on my SSD because I thought it was flaking out on me.
23:37:23 Well, as I said, the one I got was a race between two threads getting data. You would have had just one.
23:38:00 hmmm
23:38:28 It'd also race with some other code, but that code only reads, so would not get you an assert there.
23:38:39 in the log file, some were e.g. P2P3 vs P2P1, so maybe there's more than one active at once
23:39:27 You'd need two sets of hashes at once from the exclusive peer. I don't think that can happen.
23:39:54 ok. Then I'll keep looking for flaky hardware here.
23:40:03 But merge the lock and see if it happens again, I guess :) If you can repro fairly easily, we'll know.
23:40:19 could*
23:41:32 I was also getting some DB PAGE_NOT_FOUND errors, which is why I'm testing my SSD now
23:41:58 but I suppose a concurrency issue could have corrupted a write too
23:42:12 ok, I'll rebuild the release branch with the patches
23:44:05 That was not doing anything with the DB but reads.
23:45:07 But someone reported https://github.com/monero-project/monero/issues/7259; might be a new DB-related bug.
23:47:53 I saw that; one of the backtraces may have been in the DB, but the other was not.