-
jtgrassie
hyc, sech1, tevador (and the likes of gingeropolous, who helped test), thank you so much for RandomX. In my view this is groundbreaking work. There have been a lot of nonsense naysayers, and you've all shown restraint, which is admirable too. Great idea, great code, great delivery, and genuinely nice people. Thank you.
-
cohcho
"Great idea, great code, great delivery, and genuinely nice people." This is why I'm here too.
-
cohcho
I'm still trying to load the randomx library from a memory buffer into an mmap'd region and call randomx_alloc_cache without any errors.
-
hyc
thanks jtgrassie. and yes, sech1 and tevador are first class.
-
cohcho
Finally, randomx_alloc_cache works
-
gingeropolous
thanks cohcho, thought I had
-
gingeropolous
it works now. yay
-
cohcho
The last problem is that large constants are referenced by absolute address within the executable image (mov $0x5fb760,%rcx) instead of by an address relative to the program counter.
-
sech1
cohcho I think that GCC has an option to compile everything as PIC (position independent code)
-
cohcho
`gcc -static -fPIC ...` produces absolute offsets to rodata but everything else looks ok
-
cohcho
`gcc -static-pie ...` is likely the option I need, but it relies on some functions that initialize the executable before the actual main()
-
cohcho
I'm trying to figure out what I should call externally to initialize the executable without running the actual main()
-
cohcho
for that -static-pie
-
sech1
You have to call the executable's entry point; the C/C++ runtime runs static variable initialization there
-
sech1
then you can call other functions
-
cohcho
I can't compile your two sentences into actual code, too ambiguous.
-
cohcho
:)
-
cohcho
And I think you're wrong, because the entry point will go on to call the actual main() and then make the exit syscall
-
sech1
If you're trying to load an executable binary into memory and do some stuff with it, you have to follow the normal procedure that the OS does
-
sech1
some functions may still work fine if they don't use static variables
-
cohcho
I need a fully functional randomx library; some functions are already tested with a `gcc -static` binary.
-
sech1
if you check superscalar.cpp, you'll see a lot of constants in the "MacroOp" class which are statically initialized. If you just load the code into memory, these won't be initialized and dataset initialization won't work.
-
sech1
so you have to either link the library properly or build a .so/.dll file and load it properly, which calls its entry point
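A minimal C sketch of sech1's point, assuming GCC or Clang: `__attribute__((constructor))` is a compiler extension used here as a C stand-in for C++ static initialization. Such initializers are registered in the binary's init array and run by the startup code before `main()` (for an executable) or by the dynamic loader (for a .so opened with `dlopen`), so loading a library properly runs them without running any `main()`. The table and function names are hypothetical, not taken from superscalar.cpp.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical table standing in for statically initialized constants
   (NOT the actual MacroOp data from RandomX). */
static uint32_t latency_table[4];
static int table_ready = 0;

/* GCC/Clang extension: registered in the init array and executed by the
   C runtime startup code or the dynamic loader, never by main(). */
__attribute__((constructor))
static void init_latency_table(void) {
    for (int i = 0; i < 4; i++)
        latency_table[i] = (uint32_t)(i + 1) * 10u;
    table_ready = 1;
}

/* Code that silently depends on the initializer having run. If the code
   were copied into memory and called while bypassing the loader, the
   table would never be filled and this would return 0. */
uint32_t lookup_latency(int op) {
    return latency_table[op & 3];
}
```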
-
tevador
sech1: can you PR your optimization to RandomX repo?
-
sech1
yes, I'm trying to make my branch even with upstream before applying the changes
-
tevador
miningpoolstats seems to be broken for monero:
miningpoolstats.stream/monero
-
tevador
they decided that all pools have "invalid block height"
-
sech1
minexmr shows Blockchain Height: 1979121
-
sech1
my node shows 1979122
-
sech1
or maybe minexmr just shows last mined block in the network, not current height
-
sech1
yes, minexmr changed to 122 and my node to 123
-
sech1
so it's ok
-
sech1
tevador, I tried to port my optimization to the RandomX repo, but there are already too many differences and the hashes don't match. I'll look into it later today.
-
sech1
Maybe it's some stupid one line bugfix somewhere
-
tevador
OK
-
tevador
cool, so this actually reduces the overall overhead
-
tevador
optimizations in the inner loop would increase the overhead
-
sech1
yes, the combined loop is ~70% faster than doing hash -> fill separately
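A toy C sketch of why fusing the two passes can help, assuming a simple FNV-style mix and an LCG in place of RandomX's AES-based hash and fill (all names are hypothetical). Both versions produce the same hash and the same refilled buffer, but the fused loop reads and rewrites each element in the same iteration, while it is still in cache, instead of streaming the whole buffer through the cache twice.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins, NOT RandomX's AES hash/fill. */
static uint64_t mix(uint64_t h, uint64_t v) {
    h ^= v;
    return h * 0x100000001b3ULL;
}
static uint64_t next_fill(uint64_t *state) {
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return *state;
}

/* Two passes: read everything to hash it, then overwrite everything. */
uint64_t hash_then_fill(uint64_t *buf, size_t n, uint64_t seed) {
    uint64_t h = 0xcbf29ce484222325ULL, st = seed;
    for (size_t i = 0; i < n; i++) h = mix(h, buf[i]);
    for (size_t i = 0; i < n; i++) buf[i] = next_fill(&st);
    return h;
}

/* One pass: each element is hashed, then immediately replaced with the
   fill value for the next round. */
uint64_t hash_and_fill(uint64_t *buf, size_t n, uint64_t seed) {
    uint64_t h = 0xcbf29ce484222325ULL, st = seed;
    for (size_t i = 0; i < n; i++) {
        h = mix(h, buf[i]);      /* consume the old value first */
        buf[i] = next_fill(&st); /* then overwrite it in place */
    }
    return h;
}
```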
-
sech1
and I added prefetch to it to hide L3 latency
-
sech1
because Intel CPUs benefit from turning off the HW prefetcher, so I have to add prefetching there explicitly
-
tevador
let's see the speed up on ryzens with disabled prefetch
-
sech1
let's figure out how to disable it first
-
tevador
got 4% speed up on 3700X
-
hyc
disabling the prefetcher is a global setting? most other s/w will be worse
-
tevador
sech1: 2% faster on 1700 with disabled prefetch
-
tevador
hyc: yes, global
-
sech1
where do you disable it?
-
tevador
BIOS
-
sech1
wtf, what's it called in BIOS?
-
tevador
something like L1 Stream Prefetcher
-
tevador
all my ASRock boards have it
-
sech1
hmm
-
sech1
I don't see any prefetch options on my board
-
tevador
"Avoid software prefetch. Improve Instruction Cache (IC) & Op Cache (OC) hit rate by using efficient hardware prefetch."
-
tevador
bad advice for RandomX
-
SR71-miner
OK, was an idea ;)
-
kico
if you guys need an alternative block explorer
-
sech1
For RandomX the advice would be "avoid hardware prefetch, improve hit rate by using efficient software prefetch"
-
sech1
HW prefetch knows nothing about where the next access would be, but RandomX code knows
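A minimal C sketch of software prefetching in the situation sech1 describes: the next address is known one step ahead because it depends only on state the program already has, not on the value being loaded. It uses the GCC/Clang builtin `__builtin_prefetch`; the LCG and `sum_random` are hypothetical and only mimic a random access pattern, they are not RandomX's dataset code. The table size is assumed to be a power of two so that `mask` keeps every index in bounds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint32_t lcg(uint32_t x) { return x * 1664525u + 1013904223u; }

/* Sum `count` pseudo-randomly chosen elements of `table` (size mask + 1). */
uint64_t sum_random(const uint64_t *table, size_t mask, uint32_t seed,
                    size_t count) {
    uint64_t sum = 0;
    uint32_t state = lcg(seed);
    size_t idx = state & mask;
    for (size_t i = 0; i < count; i++) {
        /* The next index is computable now, before the current load. */
        state = lcg(state);
        size_t next = state & mask;
        /* Hint: read access (0), low temporal locality (0). */
        __builtin_prefetch(&table[next], 0, 0);
        sum += table[idx];
        idx = next;
    }
    return sum;
}
```

A hardware prefetcher sees only the sequence of past addresses, which here looks random, so it cannot predict `next`; the program can.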
-
kico
is this valid for GPU rigs too?
-
tevador
hardware prefetch is dumb, but works well in most cases because of spare bandwidth and sequential nature of most workloads
-
tevador
I would assume GPUs don't have hardware prefetch
-
sech1
They do actually
-
sech1
Vega GPUs have prefetch from memory to L2 cache
-
tevador
interesting, I guess there is no way to disable it?
-
hyc
I think this temphash should be typedef'd before merging
tevador/RandomX #166#pullrequestreview-324935989
-
hyc
there's no good reason to leave it up to API user to screw this up
-
tevador
hyc: maybe we should change void* output to char output[RANDOMX_HASH_SIZE] also?
github.com/tevador/RandomX/blob/5c0…ba0465708e2087e0/src/randomx.h#L239
-
tevador
or are there use cases when you WANT to pass a pointer?
-
sech1
typedef or just make it a struct containing uint64_t data[8]?
-
hyc
I prefer typedefs, because then you don't have to think about what (foo *) means
-
sech1
I think tevador can handle this refactoring, no need to mix it with this PR
-
hyc
as for output[SIZE], good question.
-
tevador
yeah, I was originally thinking to do it after merging
-
hyc
ok, fair enough
-
hyc
the only use case I can think of for a pointer instead of output[] is if you wanted to splat the output into the middle of a larger buffer
-
hyc
but in general, when you can take advantage of typechecking, you should.
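A sketch of the trade-offs discussed above, with hypothetical names (`HASH_SIZE` stands in for `RANDOMX_HASH_SIZE`; none of these types are RandomX's actual API). Note that in C an array parameter such as `char output[RANDOMX_HASH_SIZE]` is adjusted to a plain pointer, so it gives no real size check, and the same applies to an array typedef used as a parameter. Wrapping the buffer in a struct does give type checking, while a `void *` output keeps hyc's use case of writing into the middle of a larger buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HASH_SIZE 32 /* hypothetical stand-in for RANDOMX_HASH_SIZE */

/* Option A: array typedef. As a parameter it still decays to uint64_t*. */
typedef uint64_t temp_hash_arr[8];

/* Option B: struct wrapper. Only a temp_hash_t* is accepted. */
typedef struct { uint64_t data[8]; } temp_hash_t;

typedef struct { unsigned char bytes[HASH_SIZE]; } hash_out_t;

/* void* output: flexible (any address, e.g. mid-buffer) but unchecked. */
void write_hash_ptr(void *out) { memset(out, 0xAB, HASH_SIZE); }

/* Typed output: passing a char* or a temp_hash_t* is a compile error. */
void write_hash_typed(hash_out_t *out) {
    memset(out->bytes, 0xAB, sizeof out->bytes);
}

void clear_temp(temp_hash_t *t) { memset(t->data, 0, sizeof t->data); }
```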
-
sech1
that's why I prefer C++ more
-
sech1
stricter type checks + templates, but you can still write C-style code
-
tevador
yeah, unfortunately this change spilled into the API, which must support C
-
hyc
heh. my statement "take advantage of typechecking" was only in the context of the limits of C. I think C++ is overboard.
-
gingeropolous
those new optimizations on master?
-
gingeropolous
of xmrig?
-
gingeropolous
saw an improvement with my 1920x, but not the 7402
-
sech1
yes
-
sech1
Crashes on early 1st gen Ryzens have something to do with opcache:
xmrig/xmrig #1348#issuecomment-560122919
-
sech1
that guy managed to run it on his Ryzen without performance drop in the end
-
Svag
yeah that's me, working great now
-
Svag
I also have another 1700 that I don't have a mobo for atm, but I swapped it with the one I got working, and the opcache change made it stable too
-
sech1
nice
-
sech1
didn't think about this solution
-
darkaleph
tevador check your DM
-
hyc
is the result credible?
-
hyc
this guy doesn't seem all that certain
-
sech1
3600 MHz 15-14-14-14-32 RAM can do this
-
sech1
dual rank RAM, even better
-
sech1
560 H/s per thread is a normal number for Ryzen @ 4 GHz, I get the same
-
hyc
ok, so that means DDR4 bandwidth is not a limiting factor then
-
sech1
16.4 KH/s = 16.4 GB/s bandwidth, far from the DDR4 limit
-
sech1
transactions per second is the bottleneck here
-
hyc
sure. bandwidth assumes sequential access, not random
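A back-of-the-envelope check of the numbers above, assuming the standard RandomX parameters (8 programs per hash, 2048 iterations per program, one 64-byte dataset read per iteration) and, for the peak figure only, dual-channel DDR4-3600 with 64-bit (8-byte) channels:

$$\text{bytes per hash} = 8 \times 2048 \times 64\ \text{B} = 1\,048\,576\ \text{B} = 1\ \text{MiB}$$

$$16.4 \times 10^{3}\ \text{H/s} \times 1\,048\,576\ \text{B} \approx 1.72 \times 10^{10}\ \text{B/s} \approx 17.2\ \text{GB/s}$$

$$\text{dataset reads per second} = 16.4 \times 10^{3} \times 8 \times 2048 \approx 2.69 \times 10^{8}$$

$$\text{peak (dual-channel DDR4-3600)} = 2 \times 8\ \text{B} \times 3600 \times 10^{6}\ \text{s}^{-1} = 57.6\ \text{GB/s}$$

Treating 1 MiB as roughly 1 MB gives sech1's 16.4 GB/s. Either way it is well under the theoretical peak, which is consistent with sech1's point that the rate of independent random 64-byte reads (roughly 269 million per second), not raw bandwidth, is the bottleneck.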
-
tevador
does someone here have FreeBSD?
-
hyc
on a VM
-
hyc
virtualbox
-
tevador
first solution would be to remove CPU affinity support on FreeBSD
-
hyc
ah I can't access my VM at the moment. it's on the external SSD back home.
-
tevador
unless that struct has a different name there
-
tevador
first I will disable it and then when we have some test env, we can fix it
-
hyc
I always just comment that out, yeah. it's missing on android too
-
tevador
how to detect Android in the preprocessor?
-
hyc
__android__
-
hyc
oh, wait
-
hyc
it's all upper case
-
hyc
__ANDROID__
-
hyc
looks fine
-
hyc
looks like FreeBSD is just missing #include <sys/cpuset.h>
-
hyc
so add #include <pthread_np.h> and it should be fine
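A sketch of guarded thread-affinity code along the lines hyc describes, assuming glibc on Linux and the FreeBSD headers he names (`<sys/cpuset.h>`, `<pthread_np.h>`, plus `<sys/param.h>`, which the FreeBSD cpuset headers expect). Android is excluded via `__ANDROID__`, and every other platform reports "unsupported". The helper `pin_to_first_allowed_cpu` and the `RX_*` names are hypothetical, and the FreeBSD branch has not been compiled or tested here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* Linux/glibc: exposes cpu_set_t and pthread_*affinity_np */
#endif
#include <assert.h>
#include <pthread.h>

#if defined(__FreeBSD__)
#include <sys/param.h>
#include <sys/cpuset.h> /* cpuset_t, CPU_* macros */
#include <pthread_np.h> /* pthread_getaffinity_np, pthread_setaffinity_np */
typedef cpuset_t rx_cpu_set;
#define RX_HAVE_AFFINITY 1
#elif defined(__linux__) && !defined(__ANDROID__)
#include <sched.h>      /* cpu_set_t, CPU_* macros */
typedef cpu_set_t rx_cpu_set;
#define RX_HAVE_AFFINITY 1
#else
#define RX_HAVE_AFFINITY 0 /* Android, macOS, ...: not supported here */
#endif

/* Pin the calling thread to the first CPU it is currently allowed to use.
   Returns that CPU index, or -1 if unsupported or on error. */
int pin_to_first_allowed_cpu(void) {
#if RX_HAVE_AFFINITY
    rx_cpu_set set;
    if (pthread_getaffinity_np(pthread_self(), sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &set)) {
            rx_cpu_set one;
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return pthread_setaffinity_np(pthread_self(), sizeof(one), &one) == 0
                       ? cpu
                       : -1;
        }
    }
    return -1;
#else
    return -1;
#endif
}
```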
-
tevador
can be added later, I don't want to do blind changes without being able to test
-
hyc
right
-
hyc
well, I'll be back home in a few days, can followup on it then if nobody else does a freeBSD build before then
-
sech1
Damn, what a busy day - my gmail github folder has 50+ notifications today and counting
-
sech1
plus countless reddit/other forums notifications
-
hyc
I don't get reddit notifications. keeps life simpler ;)
-
tevador
this still needs solving
-
tevador
probably by disabling JIT altogether
-
hyc
or writing your own clear_cache() function in asm
-
hyc
that should only be a couple instructions
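A minimal C sketch of the usual JIT sequence on Linux, assuming GCC or Clang on x86-64 or AArch64. It writes machine code into a read-write mapping, makes it visible to instruction fetch with the compiler builtin `__builtin___clear_cache` (a no-op on x86, which keeps its instruction cache coherent, but needed on ARM), then switches the page to read-execute. That last step, making a page executable, is the part iOS is said to forbid without a special entitlement. `jit_return_42` is a hypothetical name; the encoded instructions simply return 42.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

typedef int (*jit_fn)(void);

/* JIT "return 42", run it, and return the result.
   Returns -1 on unsupported architectures or if mmap/mprotect fails. */
int jit_return_42(void) {
#if defined(__x86_64__) || defined(__aarch64__)
#if defined(__x86_64__)
    static const unsigned char code[] = {
        0xB8, 0x2A, 0x00, 0x00, 0x00, /* mov eax, 42 */
        0xC3                          /* ret */
    };
#else
    static const uint32_t code[] = {
        0x52800540u, /* mov w0, #42 */
        0xD65F03C0u  /* ret */
    };
#endif
    size_t len = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED)
        return -1;
    memcpy(buf, code, sizeof code);
    /* Sync instruction cache with the new bytes (no-op on x86). */
    __builtin___clear_cache((char *)buf, (char *)buf + sizeof code);
    /* W^X: drop write permission before allowing execution. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(buf, len);
        return -1;
    }
    jit_fn fn = (jit_fn)(void *)buf;
    int result = fn();
    munmap(buf, len);
    return result;
#else
    return -1;
#endif
}
```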
-
tevador
the main issue is that iOS will kill the process if it tries to set a page to be executable
-
tevador
AFAIK
-
hyc
ah, he never got far enough to even see if that's true
-
sech1
Technically, just calling a big enough function (64 KB of NOPs) will clear code cache
-
sech1
No need to use fancy instructions
-
tevador
but AFAIK applications with this entitlement will never pass the App store review
-
sech1
Apple... What's even the point of supporting it? Proprietary closed ecosystem vs. open-source, permissionless crypto
-
tevador
because sheeple use their phones
-
gingeropolous
the market share is pretty low
-
sech1
I don't see any of iphone users mining RandomX, lol
-
sech1
Android is winning market share btw
-
tevador
this was for a mobile wallet
-
sech1
Wallets can get away with VM interpreter
-
darkaleph
iOS ecosystem sucks, the days of releasing on iOS first are long gone. Android is way dominant now
-
sech1
We totally forgot to celebrate one thing: RandomX on Monero is the largest CPU mining operation in history. Bitcoin was too small during CPU days, other CPU coins were much smaller too.
-
selsta
25% market share in Europe, 50% in US
-
selsta
but yea no one will mine on iOS lol
-
hyc
sadly, iphones still have the fastest mobile CPUs
-
sech1
That's going to change too
-
darkaleph
I'm surprised the US has such a large %, EU needs to pick it up
-
sech1
Cortex A77 is promising
-
hyc
A77 is only really good if you're on 7nm. Apple has been in the lead even without being ahead on process.
-
tevador
2020 Macbooks will use ARM, Apple is ditching x86
-
tevador
haven't actually seen a usable ARM PC yet
-
hyc
yeah haven't really seen it yet
-
hyc
the current ARM SOCs all have next to no on-chip cache, and anemic DRAM support
-
hyc
so should we update the monero source tree to use RandomX 1.1.7 as well?
-
tevador
probably, but it will need some changes in miner code
-
tevador
because from this version, mining and verification use different API calls
-
hyc
ah. that sounds a bit annoying
-
hyc
if the API has changed like that, perhaps a larger version bump is warranted
-
tevador
it hasn't changed
-
tevador
there are 2 new API functions that can be used to mine faster
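A toy C sketch of the calling pattern for a pipelined mining API, loosely modeled on the idea of a "first/next" pair of calls: the first call starts hashing input 0, and each later call starts the next input and returns the result for the previous one, so results arrive one call late. A one-shot verification call has no "next" input, which is one reason mining and verification end up on different calls. Everything here (`toy_vm`, `hash_first`, `hash_next`, `mine_range`, the FNV hash) is hypothetical and does no real overlapping of work; it only shows the bookkeeping a miner would need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy hash (FNV-1a), NOT RandomX. */
static uint64_t toy_hash(const void *in, size_t len) {
    const unsigned char *p = in;
    uint64_t h = 0xcbf29ce484222325ULL;
    for (size_t i = 0; i < len; i++) { h ^= p[i]; h *= 0x100000001b3ULL; }
    return h;
}

typedef struct { uint64_t pending; } toy_vm; /* hypothetical VM state */

/* Start work on the first input; no result yet. */
void hash_first(toy_vm *vm, const void *in, size_t len) {
    vm->pending = toy_hash(in, len);
}

/* Return the previous input's result and start work on the next input. */
void hash_next(toy_vm *vm, const void *next_in, size_t next_len,
               uint64_t *out) {
    *out = vm->pending;
    vm->pending = toy_hash(next_in, next_len);
}

/* Mining-style loop: results[i] belongs to nonce start + i. */
void mine_range(uint32_t start, uint32_t count, uint64_t *results) {
    toy_vm vm;
    uint32_t nonce = start;
    hash_first(&vm, &nonce, sizeof nonce);
    for (uint32_t i = 0; i < count; i++) {
        uint32_t next = start + i + 1;
        hash_next(&vm, &next, sizeof next, &results[i]);
    }
}
```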
-
hyc
ok
-
tevador
but it would be a waste not to use them in monerod
-
selsta
if moo finds a bug in -dev we might do a CLI point release anyway
-
sech1
Honestly, adding these 2 new calls to rx_slow_hash will create a big mess there
-
tevador
no, it should be added to the miner directly
-
sech1
miner.cpp?
-
tevador
yes
-
sech1
if it's there only, it should be fine
-
tevador
it's safer, fewer things can break that way
-
sech1
miner.cpp still calls rx_slow_hash, so it looks like a bigger refactoring now.
-
sech1
and rx_slow_hash prepares data for mining threads, I don't see how to untangle it properly
-
tevador
in retrospect, it should have been v1.2.0 tag :P
-
tevador
there is no point to update from 1.1.6 then I think
-
tevador
if we don't use the new API