-
selsta
.merges
-
xmr-pr
7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
-
Snipa
.merges.
-
Snipa
.merges
-
xmr-pr
7220 7234 7235 7237 7239 7243 7244 7246 7247 7249 7250 7251
-
Snipa
Alright, stuffing my face to wake up some, then will merge.
-
Snipa
Sorry, I was 3/4ths brain dead last time I checked, so was not in a good head-space for attempting to merge.
-
selsta
oki thanks
-
Snipa
Merged, I think.
-
luigi1111w
thanks
-
Snipa
Of course, was just a bit brain-dead earlier from NYE celebrations.
-
dEBRUYNE
.merges
-
xmr-pr
Merge queue empty
-
fluffypony
.murders
-
fluffypony
I guess the murder queue is empty too
-
Snipa
Shame.
-
hyc
I just started a fresh IBD between two boxes on my wifi. most of the time network is less than 2Mbps
-
hyc
I've seen a spike up to 11Mbps
-
hyc
very brief
-
hyc
I also set limit-up on the serving node to 8MB/s
-
hyc
hm, the client node just lost its connection. was using --add-exclusive-node too
-
hyc
dropped after 87552
-
hyc
client node seemed to be deadlocked.
-
hyc
restarted it. client node is a quadcore laptop but I never see more than 80% CPU use
-
hyc
would expect to see more since block verification is threaded
-
hyc
oo, a peak of 25Mbps
-
M5M400
seems low for local net
-
hyc
I get the impression that the speed is really throttled by blocks/sec. bits/sec only goes up when there are full blocks.
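hyc's observation can be checked with back-of-the-envelope arithmetic: if the sync is limited in blocks per second, the wire speed is just block rate times block size. The numbers below are hypothetical, chosen only to illustrate why tiny early-chain blocks keep the link nearly idle while full blocks spike it; they are not measurements from this sync.

```python
# Illustrative arithmetic only; block sizes and rates are hypothetical,
# not measured values from hyc's sync.

def sync_bandwidth_mbps(blocks_per_sec: float, avg_block_bytes: float) -> float:
    """Network throughput implied by a block-rate-limited sync."""
    return blocks_per_sec * avg_block_bytes * 8 / 1_000_000

# Small early-chain blocks: even a high block rate barely uses the link.
early = sync_bandwidth_mbps(blocks_per_sec=500, avg_block_bytes=400)   # 1.6 Mbps
# Full ~300 kB blocks at a far lower block rate push much more traffic.
full = sync_bandwidth_mbps(blocks_per_sec=10, avg_block_bytes=300_000)  # 24 Mbps
```

Under these made-up numbers the link sits under 2 Mbps for most of the chain and only spikes into the tens of Mbps once blocks fill up, matching the pattern hyc describes.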
-
hyc
and yes, seems slow
-
M5M400
if speed increases with multiple incoming connections one would assume it's a bottleneck on the providing node
-
hyc
not disk limited either
-
hyc
the providing node is relatively idle
-
hyc
disk shows 7% busy, hundreds of KB/sec reads
-
hyc
monerod using ~5% CPU there
-
M5M400
hmm
-
M5M400
sounds like artificially throttled then
-
hyc
dunno where tho
-
hyc
client node is running --db-sync-mode fast:async:10000
-
hyc
it is not waiting for any disk writes
-
M5M400
how's memory activity?
-
hyc
10G free
-
hyc
on client node
-
hyc
4G cached
-
hyc
provider node 2.4G free, 17G cached
-
M5M400
that's utilization. how about activity? probably hard to monitor
-
hyc
I just caught a few seconds using perf
-
hyc
seems to be bottlenecked in LMDB. how embarrassing
-
M5M400
man, and I deleted the "let's replace LMDB with oracle just to be sure it's not the culprit" joke that popped into my head earlier :P
-
hyc
LOL
-
hyc
-
hyc
6.7% time in mdb_txn_safe, which is a serialization point. so we're losing concurrency there
-
M5M400
like ACID compliance?
-
hyc
not quite. this is in the DB driver for its own consistency control, not internal to LMDB
-
hyc
that function exists to wrap LMDB transactions because higher level callers don't know about them
-
hyc
since the higher levels were all written for the non-transactional in-memory blockchain
-
hyc
I suspect that's why we're not getting concurrency in verification
-
M5M400
seems like a good optimization target
-
hyc
yeah. lots of work to remove that wrapper layer though, and make the higher level code smarter about DB transactions
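The serialization point hyc describes can be sketched as a toy model. This is not monerod code: `txn_lock` merely stands in for the single mdb_txn_safe wrapper, and the sleep stands in for verification work. The point is that verification can fan out across threads, but every store queues on one lock, so total CPU use stays well below the core count.

```python
# Toy model of a single-transaction-wrapper serialization point.
# Not monerod code; purely an illustration of lost concurrency.
import threading
import time

txn_lock = threading.Lock()  # stands in for the mdb_txn_safe wrapper

def verify_and_store(results: list) -> None:
    # "verification" could run fully in parallel...
    time.sleep(0.01)
    # ...but every store has to queue on the one transaction wrapper
    with txn_lock:
        results.append(threading.current_thread().name)

results: list = []
threads = [threading.Thread(target=verify_and_store, args=(results,))
           for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# All four workers finish, but their DB writes were strictly serialized.
```

Removing the wrapper, as hyc says, would mean teaching the higher-level callers to manage transactions themselves rather than funneling everything through one choke point.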
-
mfoolb
is there a RPC call for print_net_stats?
-
hyc
there has to be
-
hyc
every monerod command uses RPC
-
hyc
(or can use it, to talk to itself)
-
mfoolb
-
hyc
-
hyc
COMMAND_RPC_GET_NET_STATS
-
mfoolb
ty
-
binaryFate
curl 127.0.0.1:18081/get_net_stats -H 'Content-Type: application/json'
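The same call binaryFate shows with curl can be built from the Python standard library. This sketch assumes a local monerod on the default unrestricted RPC port 18081; the request is only constructed here, not sent.

```python
# Build the equivalent of:
#   curl 127.0.0.1:18081/get_net_stats -H 'Content-Type: application/json'
# Assumes a local monerod on the default RPC port; not sent here.
import urllib.request

def net_stats_request(host: str = "127.0.0.1",
                      port: int = 18081) -> urllib.request.Request:
    return urllib.request.Request(
        f"http://{host}:{port}/get_net_stats",
        headers={"Content-Type": "application/json"},
    )

req = net_stats_request()
# With a running node: urllib.request.urlopen(req).read() returns the JSON stats.
```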
-
mfoolb
binaryFate: yes, thank you .. already integrated in my queries
-
hyc
bah. got a segfault, systemd stored the corefile compressed
-
hyc
zstd compressor dies decompressing the core file
-
hyc
what kind of idiot designs so much fragility into critical system facilities
-
sech1
hyc one doesn't need to spend any effort to have fragility in their systems :P
-
hyc
yes but the systemd folks have gone out of their way to do so
-
hyc
M5M400: anyway, this perf report still isn't telling the whole story. If those two LMDB functions were the actual bottleneck, CPU use would be at least 100%
-
hyc
since it's less, and there's no I/O wait, and network is fairly idle, then the rest of the time is probably mutex locking
-
hyc
has anyone else done a full sync from scratch lately? i'm getting lots of failures on v0.17.1.8, but restarting monerod gets past them
-
hyc
for a couple hundred thousand blocks till it happens again
-
moneromooo
Paste a level 1 log ?
-
hyc
ok next time it hits
-
hyc
was twice in the first 250,000 blocks
-
hyc
of course now I'm at 600k and it hasn't hit again
-
hyc
-
hyc
doh
-
hyc
-
hyc
moneromooo: ^
-
moneromooo
ty
-
moneromooo
It's syncing off a single peer on the local network ?
-
hyc
yes
-
hyc
exclusive node
-
hyc
and after I exit and restart it just resumes syncing
-
moneromooo
hyc: there are two "Consistency failure in m_blocks_hash_check construction" messages in src/cryptonote_core/blockchain.cpp, is this the first or the second one, based on line numbers in the log ?
-
hyc
looking...
-
hyc
line 4827
-
hyc
1st one
-
moneromooo
ty
-
hyc
now up to 1151100 and it hasn't happened again yet
-
moneromooo
What was the first height it did this ?
-
hyc
first time it happened at block 282112, next at 535040
-
hyc
again at 544768
-
hyc
then 661504
-
moneromooo
Is pruning involved on any of these daemons ?
-
hyc
nope
-
hyc
hasn't happened again since 661504
-
hyc
seemed to run into a problem at 130572 but it disconnected and reconnected automatically and continued
-
hyc
1303572
-
moneromooo
Known, assuming no other error.
-
Lyza
well. looks like my node is frozen again, even though I am no longer updating the ban list with the ban command or with the dns list
-
selsta
ok, what does frozen mean
-
moneromooo
gdb, thread apply all bt
-
Lyza
same as before. Systemd says it's running but it won't respond to RPC. Even doing a simple 'monerod status' just results in the system printing the name of the release version, then hanging pretty much forever until I hit ctrl+C
-
hyc
(yeah there was no error message on that disconnect/reconnect)
-
selsta
Lyza: ok, did not happen to any of my nodes yet, so you will have to run it in gdb so that we can find out where it freezes
-
hyc
if it's still hung, attach gdb to the process and get the backtrace like mooo said
-
Lyza
this time it seems to have happened while I was interacting with the node with a wallet. but yeah I'm installing gdb rn, if y'all could give me a specific command to run it with that would be helpful
-
hyc
if the process is still there, find out its PID
-
moneromooo
gdb /path/to/monerod `pidof monerod`
-
Lyza
hyc unfortunately already restarted, didn't even know that was a thing you could do
-
hyc
yeah that ^
-
hyc
oh well
-
hyc
usually also "gdb -p <pid>" works, gdb can figure out the pathname when it reads the process memory
-
Lyza
so I don't need to do anything special when launching, I can just do that if it hangs again?
-
selsta
-
selsta
otherwise I did not read any reports about freezing yet
-
Lyza
alright well I'll keep an eye out and try to upload some useful info if it happens again
-
hyc
Lyza: yes, then after it attaches, "thread apply all bt"
-
Lyza
got it ty
-
selsta
if monero-project/monero #7254 gets a review we can include it this release, else next one
-
sech1
reviewing it
-
sech1
reviewed
-
selsta
thanks
-
moneromooo
hyc: I can't repro. If I paste you a log patch and you can repro it, can you paste the logs ?
-
moneromooo
-
moneromooo
Actually...
-
moneromooo
-
moneromooo
The 1500 bit was to try and exercise it more.
-
hyc
I've restarted the sync
-
hyc
may have hardware issues here, this laptop hasn't been used in a while
-
selsta
.merge+ 7254 7255
-
xmr-pr
Added
-
selsta
-
moneromooo
done
-
selsta
.merge+ 7264 7263 7248 7261 7262
-
xmr-pr
Added
-
selsta
thanks
-
selsta
ok ^ would be release
-
selsta
we still have an issue that daemon sync bans false positives, but we had this issue for a couple releases now
-
moneromooo
Which case exactly ?
-
selsta
just starting a chain sync from height 0 and I get a lot of false positive bans, currently at 40% synced and 24 peers banned
-
selsta
some of those 24 are in the block list but not all
-
moneromooo
Did I ask for logs ?
-
selsta
I don’t remember
-
moneromooo
Well, send them again, assuming you had, because I don't remember either :)
-
moneromooo
Though it seems common enough I should be able to repro when I get to it
-
moneromooo
Didn't repro hyc's though so send logs if you have them.
-
selsta
restarted sync with log 1
-
selsta
hmm is there a different log level I can use? 1 is super spammy during sync
-
moneromooo
You can always rm ~/.bitmonero/bitmonero.log-* from time to time if it fills up too much.
-
moneromooo
100 MB each by default.
-
moneromooo
1,blockchain:ERROR might do. I don't *think* a ban would have an interesting message in that category, and it's what adds the "block added" lines
-
selsta
First block hash is unknown, dropping connection
-
selsta
can send you full logs
-
moneromooo
Oh. Right, I know what that is. It's when you start downloading a lot higher than the current height.
-
moneromooo
I've seen that today, I'd forgotten already. Adding to the list.
-
selsta
but I assume it's harmless apart from blocking some false positives during sync?
-
moneromooo
If you're asking if it can corrupt your db or the like, it can't.
-
moneromooo
paste.debian.net/hidden/5c71d1a8 is a likely fix, seems to be running good here.
-
selsta
ok will test
-
selsta
.merge- 7254 7255
-
xmr-pr
Removed
-
selsta
19:57 <+moneromooo> paste.debian.net/hidden/5c71d1a8 is a likely fix, seems to be running good here. <-- seems better with this patch
-
selsta
after 10% synced only 1 banned and the one banned is in the block list
-
selsta
still 0 false positives after 20%
-
moneromooo
hyc: I have reproed it once. The logs show the two values are the same, so it looks like a race. But you had a single peer, so it doesn't make sense...
-
moneromooo
My logs timings do show a race though.
-
moneromooo
Maybe two bugs.
-
moneromooo
-
moneromooo
Hmm, not enough. Something else locking too.
-
moneromooo
Has to be the main lock: paste.debian.net/1179474 (not run yet).
-
selsta
"protocol: handle receiving a block hash we've not added yet" <-- do you want to PR this?
-
moneromooo
done
-
selsta
thanks, after 60% synced I have 2 false positives, but I assume they are banned for other reasons
-
selsta
idle or so
-
moneromooo
grep "dropping connection$", it'll tell you why.
-
selsta
restarted with log 1 now to see what the remaining false positives are
-
selsta
is it normal that I get so many of these? paste.debian.net/hidden/ad96263d
-
moneromooo
Is your node pruned ?
-
selsta
I’m not syncing a pruned node
-
moneromooo
Then it's not expected.
-
moneromooo
Oh, wait, not quite the same message I thought.
-
moneromooo
Then it'll depend why it ends up there, more logs needed.
-
moneromooo
net.cn:DEBUG
-
moneromooo
And block_queue:DEBUG
-
moneromooo
And cn.block_queue:DEBUG
-
selsta
together with log 0 or log 1?
-
moneromooo
Either.
-
hyc
ok glad you repro'd it. I've been running self tests on my ssd cuz I thought it was flaking out on me
-
moneromooo
Well, as I said, the one I got was a race between two threads getting data. You would have had just one.
-
hyc
hmmm
-
moneromooo
It'd also race with some other code, but that code only reads, so would not get you an assert there.
-
hyc
in the log file, some were e.g. P2P3 vs P2P1 so maybe there's more than one active at once
-
moneromooo
You'd need two sets of hashes at once from the exclusive peer. I don't think that can happen.
-
hyc
ok. then I'll keep looking for flaky hardware here
-
moneromooo
But merge the lock and see if it happens again I guess :) If you can repro fairly easily, we'll know.
-
moneromooo
could*
-
hyc
I was also getting some DB PAGE_NOT_FOUND errors, which is why I'm testing my ssd now
-
hyc
but I suppose a concurrency issue could have corrupted a write too
-
hyc
ok I'll rebuild release branch with the patches
-
moneromooo
That was not doing anything with the db but reads.
-
moneromooo
But someone reported monero-project/monero #7259, might be a new db related bug.
-
hyc
I saw that, one of the backtraces may have been in the DB, but the other was not