00:46:57 hyc, sech1, tevador, (and the likes of gingeropolous who helped test), thank you so much for RandomX. In my view this is groundbreaking work. There have been a lot of nonsense naysayers and you've all shown restraint, which is admirable too. Great idea, great code, great delivery, and genuinely nice people. Thank you.
00:53:45 "Great idea, great code, great delivery, and genuinely nice people." This is why I'm here too.
02:12:12 I'm still trying to load the randomx library from a memory buffer into mmap and call randomx_alloc_cache without any errors.
02:47:38 thanks jtgrassie. and yes, sech1 and tevador are first class.
03:31:45 Finally, randomx_alloc_cache works
03:53:33 thanks cohcho, thought I had
03:54:47 it works now. yay
03:59:53 is this a Vega farm, you think? https://xmr.nanopool.org/account/463tWEBn5XZJSxLU6uLQnQ2iY9xuNcDbjLSjkn3XAXHCbLrTTErJrBWYgHJQyrCwkNgYvyV3z8zctJLPCZy24jvb3NiTcTJ.c3ac2c09bbf24949b17d195caefa8d5225e44f8d39454d1d906e7b916929df7d
05:44:27 The last problem is that large constants are referenced by absolute address within the executable image (mov $0x5fb760,%rcx) instead of by an address relative to $pc.
07:02:32 cohcho I think that GCC has an option to compile everything as PIC (position independent code)
07:03:16 `gcc -static -fPIC ...` produces absolute offsets to rodata but everything else looks ok
07:04:01 `gcc -static-pie ...` is likely the option I need, but it relies on some functions that initialize the executable before the actual main()
07:04:30 I'm trying to figure out what I should call externally in order to initialize the executable but not run the actual main()
07:04:41 for that -static-pie
07:05:44 You have to call the entry point of the executable; C/C++ runs static variable initialization there
07:05:52 then you can call other functions
07:06:57 I can't compile your two sentences into actual code, too ambiguous.
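To make the "initialization happens at the entry point" remark concrete, here is a minimal sketch in C. It assumes GCC or Clang on an ELF platform, where functions marked `__attribute__((constructor))` are run by the startup code before main(), through the same mechanism that runs C++ dynamic initializers. The names `table`, `init_table` and `lookup` are hypothetical and are not taken from RandomX.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical table that needs runtime initialization, standing in
   for statically initialized objects in a real library. */
static int table[4];
static int table_ready = 0;

/* GCC/Clang extension: called by the startup code before main().
   If the image is mapped by hand and the startup code never runs,
   this function is never called and the table stays zeroed. */
__attribute__((constructor))
static void init_table(void)
{
    for (size_t i = 0; i < 4; i++)
        table[i] = (int)(i * i) + 1;
    table_ready = 1;
}

/* Returns table[i], or -1 if initialization never happened. */
int lookup(size_t i)
{
    assert(i < 4);
    return table_ready ? table[i] : -1;
}
```

In a normally linked program `lookup(2)` returns 5; calling `lookup` from a hand-mapped image whose startup code was skipped would return -1 instead, which is the failure mode being discussed.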
07:07:03 :)
07:07:46 And I think you're wrong, because the entry point will continue into the actual main() and on to the exit syscall
07:08:00 If you're trying to load an executable binary into memory and do some stuff, you have to follow the normal procedure that the OS does
07:08:40 some functions may still work fine if they don't use static variables
07:09:41 I need a fully functional randomx library; some functions are already tested with the `gcc -static` binary.
07:15:38 if you check superscalar.cpp, you'll see a lot of constants in the "MacroOp" class which are statically initialized. If you just load the code into memory, these won't be initialized and dataset initialization won't work.
07:16:05 so you have to either link the library properly, or make a .so/.dll file and load it properly, calling the entry point
09:32:34 sech1: can you PR your optimization to RandomX repo?
09:37:07 yes, I'm trying to make my branch even with upstream before applying changes
09:41:30 .tr imi bag pula in masa de muist
09:45:06 miningpoolstats seems to be broken for monero: https://miningpoolstats.stream/monero
09:45:44 they decided that all pools have "invalid block height"
09:50:05 minexmr shows Blockchain Height: 1979121
09:50:12 my node shows 1979122
09:50:40 or maybe minexmr just shows the last mined block in the network, not the current height
09:51:06 https://xmrchain.net/ is stuck
09:52:25 yes, minexmr changed to 122 and my node to 123
09:52:28 so it's ok
10:17:15 tevador tried to port my optimization to RandomX repo, but there are already too many differences, hashes don't match. I'll look into it later today.
10:17:46 Maybe it's some stupid one line bugfix somewhere
10:18:49 OK
10:44:54 tevador done https://github.com/tevador/RandomX/pull/166
10:47:09 cool, so this actually reduces the overall overhead
10:47:36 optimizations in the inner loop would increase the overhead
10:48:49 yes, combined loop is ~70% faster than doing hash -> fill separately
10:49:16 and I added prefetch to it to hide L3 latency
10:49:34 because Intel CPUs benefit from turning off HW prefetcher, so I have to add it there explicitly
10:50:03 let's see the speed up on Ryzens with disabled prefetch
10:50:37 let's figure out how to disable it first
10:58:06 got 4% speed up on 3700X
11:01:45 disabling the prefetcher is a global setting? most other s/w will be worse
11:03:00 sech1: 2% faster on 1700 with disabled prefetch
11:03:08 hyc: yes, global
11:03:36 where do you disable it?
11:03:41 BIOS
11:03:49 wtf, what's it called in BIOS?
11:04:07 something like L1 Stream Prefetcher
11:04:32 all my ASRock boards have it
11:04:39 hmm
11:05:09 I don't see any prefetch options on my board
12:56:43 could this help? --> http://32ipi028l5q82yhj72224m8j.wpengine.netdna-cdn.com/wp-content/uploads/2017/03/GDC2017-Optimizing-For-AMD-Ryzen.pdf
12:58:19 "Avoid software prefetch. Improve Instruction Cache (IC) & Op Cache (OC) hit rate by using
12:58:20 efficient hardware prefetch."
12:58:25 bad advice for RandomX
13:00:09 OK, was an idea ;)
13:05:09 if you guys need an alternative block explorer
13:05:10 https://explorer.xmr.pt/
13:15:55 For RandomX the advice would be "avoid hardware prefetch, improve hit rate by using efficient software prefetch"
13:16:19 HW prefetch knows nothing about where the next access would be, but RandomX code knows
13:16:50 is this valid for GPU rigs also?
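The software prefetch point can be sketched in C: when the code can compute its next address ahead of time, it can request that cache line early and work on the current element while the line is in flight. This is a minimal sketch, assuming GCC or Clang (`__builtin_prefetch`). The function `sum_random_walk`, its LCG index generator and the table layout are hypothetical and are not RandomX code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical index generator: a simple LCG. The resulting access
   pattern looks random to a hardware prefetcher, but the code itself
   always knows the next index one step ahead. */
static inline uint32_t next_index(uint32_t state)
{
    return state * 1664525u + 1013904223u;
}

/* Sums `steps` pseudo-random entries of `data`; `size` must be a power
   of two. The address for the next iteration is prefetched while the
   current element is being processed. */
uint64_t sum_random_walk(const uint64_t *data, size_t size,
                         size_t steps, uint32_t seed)
{
    assert(size != 0 && (size & (size - 1)) == 0);
    const size_t mask = size - 1;
    uint32_t state = seed;
    size_t cur = state & mask;
    uint64_t sum = 0;

    for (size_t i = 0; i < steps; i++) {
        state = next_index(state);
        size_t next = state & mask;
        /* GCC/Clang builtin, a hint only: read access (0),
           low temporal locality (0). */
        __builtin_prefetch(&data[next], 0, 0);
        sum += data[cur];   /* work on the current element */
        cur = next;
    }
    return sum;
}
```

On a table this small the prefetch changes nothing measurable; the pattern only pays off when the table is far larger than the cache and there is enough work per element to cover the memory latency.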
13:19:41 hardware prefetch is dumb, but works well in most cases because of spare bandwidth and the sequential nature of most workloads
13:20:45 I would assume GPUs don't have hardware prefetch
13:21:06 They do actually
13:21:26 Vega GPUs have prefetch from memory to L2 cache
13:33:35 interesting, I guess there is no way to disable it?
14:51:57 I think this temphash should be typedef'd before merging https://github.com/tevador/RandomX/pull/166#pullrequestreview-324935989
14:52:27 there's no good reason to leave it up to the API user to screw this up
14:55:10 hyc: maybe we should change void* output to char output[RANDOMX_HASH_SIZE] also? https://github.com/tevador/RandomX/blob/5c0486bd33068f59104e68dcba0465708e2087e0/src/randomx.h#L239
14:56:44 or are there use cases when you WANT to pass a pointer?
15:02:53 typedef or just make it a struct containing uint64_t data[8]?
15:03:28 I prefer typedefs, because then you don't have to think about what (foo *) means
15:03:31 I think tevador can handle this refactoring, no need to mix it with this PR
15:03:56 as for the output[SIZE], good question.
15:03:58 yeah, I was originally thinking to do it after merging
15:04:06 ok, fair enough
15:04:36 the only thing I can think of about output[] is if you wanted to splat the output into the middle of a larger buffer
15:05:54 but in general, when you can take advantage of typechecking, you should.
15:06:59 that's why I prefer C++ more
15:07:31 stricter type checks + templates, but you can still write C-style code
15:09:54 yeah, unfortunately this change spilled into the API, which must support C
15:10:32 heh. my statement "take advantage of typechecking" was only in the context of the limits of C. I think C++ is overboard.
16:16:38 those new optimizations on master?
16:16:46 of xmrig?
16:17:24 saw an improvement with my 1920x, but not the 7402
16:55:42 yes
16:56:07 Crashes on early 1st gen Ryzens have something to do with opcache: https://github.com/xmrig/xmrig/pull/1348#issuecomment-560122919
16:56:41 that guy managed to run it on his Ryzen without performance drop in the end
17:08:35 yeah that's me, working great now
17:09:26 I also have another 1700 that I don't have a mobo for atm, but changed it with the one I got working and opcache made it stable too
17:20:14 hyc sech1 https://github.com/tevador/RandomX/pull/170
17:20:54 nice
17:21:05 didn't think about this solution
17:32:49 tevador check your DM
19:12:05 16.4 KH/s on 3950X: https://www.reddit.com/r/MoneroMining/comments/e4ia1d/hashrate_optimization_amd_3950x/f9csosz/
19:14:02 is the result credible?
19:14:15 this guy doesn't seem to be all that certain
19:14:50 3600 MHz 15-14-14-14-32 RAM can do this
19:15:00 dual rank RAM, even better
19:15:28 560 H/s per thread is a normal number for Ryzen @ 4 GHz, I get the same
19:16:13 ok, so that means DDR4 bandwidth is not a limiting factor then
19:16:39 16.4 KH/s = 16.4 GB/s bandwidth, far from the DDR4 limit
19:16:56 transactions per second is the bottleneck here
19:17:14 sure. bandwidth assumes sequential access, not random
19:31:51 does someone here have FreeBSD?
19:32:12 on a VM
19:32:21 I want to fix: https://github.com/tevador/RandomX/issues/149
19:32:25 virtualbox
19:33:11 first solution would be to remove CPU affinity support on FreeBSD
19:33:18 ah I can't access my VM at the moment. it's on the external SSD back home.
19:33:19 unless that struct has a different name there
19:35:14 https://www.unix.com/man-page/freebsd/2/cpuset/
19:36:16 https://www.unix.com/man-page/freebsd/2/cpuset_setaffinity/
19:37:15 first I will disable it and then when we have some test env, we can fix it
19:37:21 I always just comment that out, yeah. it's missing on android too
19:38:09 how to detect Android in the preprocessor?
19:38:17 __android__
19:39:31 oh, wait
19:39:34 it's all upper case
19:39:38 __ANDROID__
19:42:24 https://github.com/tevador/RandomX/pull/171
19:44:11 looks fine
19:47:11 looks like FreeBSD is just missing #include
19:48:12 or maybe instead, https://www.unix.com/man-page/freebsd/3/pthread_affinity_np/
19:48:53 so add #include and it should be fine
19:51:04 can be added later, I don't want to do blind changes without being able to test
19:51:41 right
19:52:02 well, I'll be back home in a few days, can follow up on it then if nobody else does a FreeBSD build before then
19:53:15 Damn, what a busy day - my gmail github folder has 50+ notifications today and counting
19:53:24 plus countless reddit/other forums notifications
19:54:10 I don't get reddit notifications. keeps life simpler ;)
19:55:14 https://github.com/tevador/RandomX/issues/153
19:55:18 this still needs solving
19:55:30 probably by disabling JIT altogether
19:57:55 or writing your own clear_cache() function in asm
19:58:10 that should only be a couple of instructions
19:58:35 the main issue is that iOS will kill the process if it tries to set a page to be executable
19:58:38 AFAIK
19:58:52 ah, he never got far enough to even see if that's true
19:59:53 Technically, just calling a big enough function (64 KB of NOPs) will clear code cache
20:00:02 No need to use fancy instructions
20:01:47 they have this: https://developer.apple.com/documentation/bundleresources/entitlements/com_apple_security_cs_allow-jit
20:02:11 but AFAIK applications with this entitlement will never pass the App Store review
20:04:05 Apple... What's even the point of supporting it?
Proprietary closed ecosystem vs open-source, permissionless crypto
20:04:37 because sheeple use their phones
20:04:57 the market share is pretty low
20:04:59 I don't see any iPhone users mining RandomX, lol
20:05:12 Android is winning market share btw
20:05:37 this was for a mobile wallet
20:05:52 Wallets can get away with the VM interpreter
20:06:23 iOS ecosystem sucks, the days of releasing on iOS first are long gone. Android is way dominant now
20:08:52 We totally forgot to celebrate one thing: RandomX on Monero is the largest CPU mining operation in history. Bitcoin was too small during CPU days, other CPU coins were much smaller too.
20:09:05 25% market share in Europe, 50% in US
20:09:16 but yea no one will mine on iOS lol
20:09:37 sadly, iPhones still have the fastest mobile CPUs
20:10:08 That's going to change too
20:10:17 I'm surprised the US has such a large %, EU needs to pick it up
20:10:20 Cortex A77 is promising
20:12:39 A77 is only really good if you're on 7nm. Apple has been in the lead even without being ahead on process.
20:20:39 2020 Macbooks will use ARM, Apple is ditching x86
20:21:01 haven't actually seen a usable ARM PC yet
20:29:33 yeah haven't really seen it yet
20:30:19 the current ARM SOCs all have next to no on-chip cache, and anemic DRAM support
20:38:54 so should we update monero source tree to use RandomX 1.1.7 as well?
20:40:08 probably, but it will need some changes in miner code
20:40:25 because from this version, mining and verification use different API calls
20:40:42 ah.
that sounds a bit annoying
20:41:01 if the API has changed like that, perhaps a larger version bump is warranted
20:41:15 it hasn't changed
20:41:27 there are 2 new API functions that can be used to mine faster
20:41:37 ok
20:42:01 but it would be a waste not to use it in monerod
20:42:10 if moo finds a bug in -dev we might do a CLI point release anyway
20:56:37 Honestly, adding these 2 new calls to rx_slow_hash will create a big mess there
20:58:12 no, it should be added to the miner directly
20:58:26 miner.cpp?
20:58:30 yes
20:58:39 if it's there only, it should be fine
20:59:14 it's safer, fewer things can break that way
21:00:13 miner.cpp still calls rx_slow_hash, so it looks like a bigger refactoring now.
21:01:34 and rx_slow_hash prepares data for mining threads, I don't see how to untangle it properly
21:03:24 in retrospect, it should have been a v1.2.0 tag :P
21:04:12 there is no point in updating from 1.1.6 then, I think
21:04:26 if we don't use the new API
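Why mining and verification would end up on different calls can be illustrated with a toy sketch in C. It assumes the two new calls follow a "start the first input" / "finish the previous input while starting the next" split, which is an assumption about their shape and not something stated in the log. Everything here (`toy_vm`, `toy_hash`, `toy_hash_first`, `toy_hash_next`, `toy_mine`) is hypothetical and only mimics the call pattern; it is not the RandomX API and the hash is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a hashing VM: holds the half-finished
   state of the input that was started most recently. */
typedef struct { uint64_t pending; } toy_vm;

/* Stage 1 (toy): FNV-1a over the input bytes. */
static uint64_t toy_begin(const void *input, size_t size)
{
    const unsigned char *p = input;
    uint64_t h = 14695981039346656037ull;
    for (size_t i = 0; i < size; i++) { h ^= p[i]; h *= 1099511628211ull; }
    return h;
}

/* Stage 2 (toy): final mixing step. */
static uint64_t toy_finish(uint64_t h)
{
    h ^= h >> 33; h *= 0xff51afd7ed558ccdull; h ^= h >> 33;
    return h;
}

/* One-shot call: the shape a verifier needs, one input, one hash. */
uint64_t toy_hash(const void *input, size_t size)
{
    return toy_finish(toy_begin(input, size));
}

/* Split calls: the shape a miner can use on a stream of inputs. */
void toy_hash_first(toy_vm *vm, const void *input, size_t size)
{
    vm->pending = toy_begin(input, size);
}

void toy_hash_next(toy_vm *vm, const void *next, size_t size, uint64_t *out)
{
    *out = toy_finish(vm->pending);       /* result for the PREVIOUS input */
    vm->pending = toy_begin(next, size);  /* start the next input */
}

/* Mining-style loop over nonces [0, count): fills out[0..count-1].
   The last call also starts nonce `count`, which is never finished. */
void toy_mine(toy_vm *vm, uint32_t count, uint64_t *out)
{
    assert(count > 0);
    uint32_t nonce = 0;
    toy_hash_first(vm, &nonce, sizeof nonce);
    for (uint32_t n = 1; n <= count; n++)
        toy_hash_next(vm, &n, sizeof n, &out[n - 1]);
}
```

The toy gains nothing from the split; in a real implementation the benefit would come from overlapping the tail of one hash with the start of the next. The sketch does show why a one-shot wrapper like rx_slow_hash is an awkward home for such calls: the split form needs the caller to own the loop over inputs.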