-
selsta
.merges
-
xmr-pr
7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
-
Snipa
.merges.
-
Snipa
.merges
-
xmr-pr
7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
-
Snipa
Alright, stuffing my face to wake up some, then will merge.
-
Snipa
Sorry, I was 3/4ths brain dead last time I checked, so was not in a good head-space for attempting to merge.
-
selsta
oki thanks
-
Snipa
Merged, I think.
-
luigi1111w
thanks
-
Snipa
Of course, was just a bit brain-dead earlier from NYE celebrations.
-
dEBRUYNE
.merges
-
xmr-pr
Merge queue empty
-
fluffypony
.murders
-
fluffypony
I guess the murder queue is empty too
-
Snipa
Shame.
-
hyc
I just started a fresh IBD between two boxes on my wifi. most of the time network is less than 2Mbps
-
hyc
I've seen a spike up to 11Mbps
-
hyc
very brief
-
hyc
I also set limit-up on the serving node to 8MB/s
-
hyc
hm, the client node just lost its connection. was using --add-exclusive-node too
-
hyc
dropped after 87552
-
hyc
client node seemed to be deadlocked.
-
hyc
restarted it. client node is a quadcore laptop but I never see more than 80% CPU use
-
hyc
would expect to see more since block verification is threaded
-
hyc
oo, a peak of 25Mbps
-
M5M400
seems low for local net
-
hyc
I get the impression that the speed is really throttled by blocks/sec. bits/sec only goes up when there are full blocks.
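hyc's observation can be checked with back-of-the-envelope arithmetic: if the sync is limited in blocks per second, the wire speed is just block rate times block size. The numbers below are hypothetical, chosen only to illustrate why tiny early-chain blocks keep the link nearly idle while full blocks spike it; they are not measurements from this sync.

```python
# Illustrative arithmetic only; block sizes and rates are hypothetical,
# not measured values from hyc's sync.

def sync_bandwidth_mbps(blocks_per_sec: float, avg_block_bytes: float) -> float:
    """Network throughput implied by a block-rate-limited sync."""
    return blocks_per_sec * avg_block_bytes * 8 / 1_000_000

# Small early-chain blocks: even a high block rate barely uses the link.
early = sync_bandwidth_mbps(blocks_per_sec=500, avg_block_bytes=400)   # 1.6 Mbps
# Full ~300 kB blocks at a far lower block rate push much more traffic.
full = sync_bandwidth_mbps(blocks_per_sec=10, avg_block_bytes=300_000)  # 24 Mbps
```

Under these made-up numbers the link sits under 2 Mbps for most of the chain and only spikes into the tens of Mbps once blocks fill up, matching the pattern hyc describes.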
-
hyc
and yes, seems slow
-
M5M400
if speed increases with multiple incoming connections one would assume it's a bottleneck on the providing node
-
hyc
not disk limited either
-
hyc
the providing node is relatively idle
-
hyc
disk shows 7% busy, hundreds of KB/sec reads
-
hyc
monerod using ~5% CPU there
-
M5M400
hmm
-
M5M400
sounds like artificially throttled then
-
hyc
dunno where tho
-
hyc
client node is running --db-sync-mode fast:async:10000
-
hyc
it is not waiting for any disk writes
-
M5M400
how's memory activity?
-
hyc
10G free
-
hyc
on client node
-
hyc
4G cached
-
hyc
provider node 2.4G free, 17G cached
-
M5M400
that's utilization. how about activity? probably hard to monitor
-
hyc
I just caught a few seconds using perf
-
hyc
seems to be bottlenecked in LMDB. how embarrassing
-
M5M400
man, and I deleted the "let's replace LMDB with oracle just to be sure it's not the culprit" joke that popped into my head earlier :P
-
hyc
LOL
-
hyc
-
hyc
6.7% time in mdb_txn_safe, which is a serialization point. so we're losing concurrency there
-
M5M400
like ACID compliance?
-
hyc
not quite. this is in the DB driver for its own consistency control, not internal to LMDB
-
hyc
that function exists to wrap LMDB transactions because higher level callers don't know about them
-
hyc
since the higher levels were all written for the non-transactional in-memory blockchain
-
hyc
I suspect that's why we're not getting concurrency in verification
-
M5M400
seems like a good optimization target
-
hyc
yeah. lots of work to remove that wrapper layer though, and make the higher level code smarter about DB transactions
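The serialization point hyc describes can be sketched as a toy model. This is not monerod code: `txn_lock` merely stands in for the single mdb_txn_safe wrapper, and the sleep stands in for verification work. The point is that verification can fan out across threads, but every store queues on one lock, so total CPU use stays well below the core count.

```python
# Toy model of a single-transaction-wrapper serialization point.
# Not monerod code; purely an illustration of lost concurrency.
import threading
import time

txn_lock = threading.Lock()  # stands in for the mdb_txn_safe wrapper

def verify_and_store(results: list) -> None:
    # "verification" could run fully in parallel...
    time.sleep(0.01)
    # ...but every store has to queue on the one transaction wrapper
    with txn_lock:
        results.append(threading.current_thread().name)

results: list = []
threads = [threading.Thread(target=verify_and_store, args=(results,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All four workers finish, but their DB writes were strictly serialized.
```

Removing the wrapper, as hyc says, would mean teaching the higher-level callers to manage transactions themselves rather than funneling everything through one choke point.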
-
mfoolb
is there a RPC call for print_net_stats?
-
hyc
there has to be
-
hyc
every monerod command uses RPC
-
hyc
(or can use it, to talk to itself)
-
mfoolb
-
hyc
-
hyc
COMMAND_RPC_GET_NET_STATS
-
mfoolb
ty
-
binaryFate
curl 127.0.0.1:18081/get_net_stats -H 'Content-Type: application/json'
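The same call binaryFate shows with curl can be built from the Python standard library. This sketch assumes a local monerod on the default unrestricted RPC port 18081; the request is only constructed here, not sent.

```python
# Build the equivalent of:
#   curl 127.0.0.1:18081/get_net_stats -H 'Content-Type: application/json'
# Assumes a local monerod on the default RPC port; not sent here.
import urllib.request

def net_stats_request(host: str = "127.0.0.1",
                      port: int = 18081) -> urllib.request.Request:
    return urllib.request.Request(
        f"http://{host}:{port}/get_net_stats",
        headers={"Content-Type": "application/json"},
    )

req = net_stats_request()
# With a running node: urllib.request.urlopen(req).read() returns the JSON stats.
```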
-
mfoolb
binaryFate: yes, thank you .. already integrated in my queries
-
hyc
bah. got a segfault, systemd stored the corefile compressed
-
hyc
zstd compressor dies decompressing the core file
-
hyc
what kind of idiot designs so much fragility into critical system facilities
-
sech1
hyc one doesn't need to spend any effort to have fragility in their systems :P
-
hyc
yes but the systemd folks have gone out of their way to do so
-
hyc
M5M400: anyway, this perf report still isn't telling the whole story. If those two LMDB functions were the actual bottleneck, CPU use would be at least 100%
-
hyc
since it's less, and there's no I/O wait, and network is fairly idle, then the rest of the time is probably mutex locking
-
hyc
has anyone else done a full sync from scratch lately? i'm getting lots of failures on v0.17.1.8, but restarting monerod gets past them
-
hyc
for a couple hundred thousand blocks till it happens again
-
moneromooo
Paste a level 1 log ?
-
hyc
ok next time it hits
-
hyc
was twice in the first 250,000 blocks
-
hyc
of course now I'm at 600k and it hasn't hit again
-
hyc
-
hyc
doh
-
hyc
-
hyc
moneromooo: ^
-
moneromooo
ty
-
moneromooo
It's syncing off a single peer on the local network ?
-
hyc
yes
-
hyc
exclusive node
-
hyc
and after I exit and restart it just resumes syncing
-
moneromooo
hyc: there are two "Consistency failure in m_blocks_hash_check construction" messages in src/cryptonote_core/blockchain.cpp, is this the first or the second one, based on line numbers in the log ?
-
hyc
looking...
-
hyc
line 4827
-
hyc
1st one
-
moneromooo
ty
-
hyc
now up to 1151100 and it hasn't happened again yet
-
moneromooo
What was the first height it did this ?
-
hyc
first time it happened at block 282112, next at 535040
-
hyc
again at 544768
-
hyc
then 661504
-
moneromooo
Is pruning involved on any of these daemons ?
-
hyc
nope
-
hyc
hasn't happened again since 661504
-
hyc
seemed to run into a problem at 130572 but it disconnected and reconnected automatically and continued
-
hyc
1303572
-
moneromooo
Known, assuming no other error.
-
Lyza
well. looks like my node is frozen again, even though I am no longer updating the ban list with the ban command or with the dns list
-
selsta
ok, what does frozen mean
-
moneromooo
gdb, thread apply all bt
-
Lyza
same as before. Systemd says it's running but it won't respond to RPC. Even doing a simple 'monerod status' just results in the system printing the name of the release version, then hanging pretty much forever until I hit ctrl+C
-
hyc
(yeah there was no error message on that disconnect/reconnect)
-
selsta
Lyza: ok, did not happen to any of my nodes yet, so you will have to run it in gdb so that we can find out where it freezes
-
hyc
if it's still hung, attach gdb to the process and get the backtrace like mooo said
-
Lyza
this time it seems to have happened while I was interacting with the node with a wallet. but yeah I'm installing gdb rn, if y'all could give me a specific command to run it with that would be helpful
-
hyc
if the process is still there, find out its PID
-
moneromooo
gdb /path/to/monerod `pidof monerod`
-
Lyza
hyc unfortunately already restarted, didn't even know that was a thing you could do
-
hyc
yeah that ^
-
hyc
oh well
-
hyc
usually also "gdb -p <pid>" works, gdb can figure out the pathname when it reads the process memory
-
Lyza
so I don't need to do anything special when launching, I can just do that if it hangs again?
-
selsta
-
selsta
otherwise I did not read any reports about freezing yet
-
Lyza
alright well I'll keep an eye out and try to upload some useful info if it happens again
-
hyc
Lyza: yes, then after it attaches, "thread apply all bt"
-
Lyza
got it ty
-
selsta
if monero-project/monero #7254 gets a review we can include it this release, else next one
-
sech1
reviewing it
-
sech1
reviewed
-
selsta
thanks
-
moneromooo
hyc: I can't repro. If I paste you a log patch and you can repro it, can you paste the logs ?
-
moneromooo
-
moneromooo
Actually...
-
moneromooo
-
moneromooo
The 1500 bit was to try and exercise it more.
-
hyc
I've restarted the sync
-
hyc
may have hardware issues here, this laptop hasn't been used in a while
-
selsta
.merge+ 7254 7255
-
xmr-pr
Added
-
selsta
-
moneromooo
done
-
selsta
.merge+ 7264 7263 7248 7261 7262
-
xmr-pr
Added
-
selsta
thanks
-
selsta
ok ^ would be release
-
selsta
we still have an issue that daemon sync bans false positives, but we had this issue for a couple releases now
-
moneromooo
Which case exactly ?
-
selsta
just starting a chain sync from height 0 and I get a lot of false positive bans, currently at 40% synced and 24 peers banned
-
selsta
some of those 24 are in the block list but not all
-
moneromooo
Did I ask for logs ?
-
selsta
I don’t remember
-
moneromooo
Well, send them again, assuming you had, because I don't remember either :)
-
moneromooo
Though it seems common enough I should be able to repro when I get to it
-
moneromooo
Didn't repro hyc's though so send logs if you have them.
-
selsta
restarted sync with log 1
-
selsta
hmm is there a different log level I can use? 1 is super spammy during sync
-
moneromooo
You can always rm ~/.bitmonero/bitmonero.log-* from time to time if it fills up too much.
-
moneromooo
100 MB each by default.
-
moneromooo
1,blockchain:ERROR might do. I don't *think* a ban would have an interesting message in that category, and it's what adds the "block added" lines
-
selsta
First block hash is unknown, dropping connection
-
selsta
can send you full logs
-
moneromooo
Oh. Right, I know what that is. It's when you start downloading a lot higher than the current height.
-
moneromooo
I've seen that today, I'd forgotten already. Adding to the list.
-
selsta
but I assume it's harmless apart from blocking some false positives during sync?
-
moneromooo
If you're asking if it can corrupt your db or the like, it can't.
-
moneromooo
paste.debian.net/hidden/5c71d1a8 is a likely fix, seems to be running good here.
-
selsta
ok will test
-
selsta
.merge- 7254 7255
-
xmr-pr
Removed
-
selsta
19:57 <+moneromooo> paste.debian.net/hidden/5c71d1a8 is a likely fix, seems to be running good here. <-- seems better with this patch
-
selsta
after 10% synced only 1 banned and the one banned is in the block list
-
selsta
still 0 false positives after 20%
-
moneromooo
hyc: I have reproed it once. The logs show the two values are the same, so it looks like a race. But you had a single peer, so it doesn't make sense...
-
moneromooo
My logs timings do show a race though.
-
moneromooo
Maybe two bugs.
-
moneromooo
-
moneromooo
Hmm, not enough. Something else locking too.
-
moneromooo
Has to be the main lock: paste.debian.net/1179474 (not run yet).
-
selsta
"protocol: handle receiving a block hash we've not added yet" <-- do you want to PR this?
-
moneromooo
done
-
selsta
thanks, after 60% synced I have 2 false positives, but I assume they are banned for other reasons
-
selsta
idle or so
-
moneromooo
grep "dropping connection$", it'll tell you why.
-
selsta
restarted with log 1 now to see what the remaining false positives are
-
selsta
is it normal that I get so many of these? paste.debian.net/hidden/ad96263d
-
moneromooo
Is your node pruned ?
-
selsta
I’m not syncing a pruned node
-
moneromooo
Then it's not expected.
-
moneromooo
Oh, wait, not quite the same message I thought.
-
moneromooo
Then it'll depend why it ends up there, more logs needed.
-
moneromooo
net.cn:DEBUG
-
moneromooo
And block_queue:DEBUG
-
moneromooo
And cn.block_queue:DEBUG
-
selsta
together with log 0 or log 1?
-
moneromooo
Either.
-
hyc
ok glad you repro'd it. I've been running self tests on my ssd cuz I thought it was flaking out on me
-
moneromooo
Well, as I said, the one I got was a race between two threads getting data. You would have had just one.
-
hyc
hmmm
-
moneromooo
It'd also race with some other code, but that code only reads, so would not get you an assert there.
-
hyc
in the log file, some were e.g. P2P3 vs P2P1 so maybe there's more than one active at once
-
moneromooo
You'd need two sets of hashes at once from the exclusive peer. I don't think that can happen.
-
hyc
ok. then I'll keep looking for flaky hardware here
-
moneromooo
But merge the lock and see if it happens again I guess :) If you can repro fairly easily, we'll know.
-
moneromooo
could*
-
hyc
I was also getting some DB PAGE_NOT_FOUND errors, which is why I'm testing my ssd now
-
hyc
but I suppose a concurrency issue could have corrupted a write too
-
hyc
ok I'll rebuild release branch with the patches
-
moneromooo
That was not doing anything with the db but reads.
-
moneromooo
But someone reported monero-project/monero #7259, might be a new db related bug.
-
hyc
I saw that, one of the backtraces may have been in the DB, but the other was not