15:08:01 Research meeting today at 17:00 UTC (about 2 hours from now)
15:08:13 The usual agenda: https://github.com/monero-project/meta/issues/470
16:53:54 Meeting begins here in just a few minutes
16:59:20 OK, let's get started!
16:59:26 First, GREETINGS
16:59:28 hello
16:59:35 Hi
17:00:37 hi
17:01:02 * sarang waits for others to join
17:01:10 hey
17:02:02 Heyo
17:03:02 Next up, ROUNDTABLE, where anyone is welcome to share research of general interest
17:03:20 I have a few topics of interest
17:03:41 The recent preprint from CMU student researchers on transaction tracing has been updated to reflect suggestions and corrections: https://eprint.iacr.org/2020/593
17:04:01 hello
17:04:32 The researchers still claim that a small but nonzero number of post-changeover (i.e. the RingCT protocol switch) transactions were traceable, which didn't correspond with other numbers I'd found
17:04:43 So I decided to independently run the same analysis and compare
17:04:56 I ran updated numbers that account for all transactions up to the beginning of this week
17:05:38 If you run a full chain-reaction-type analysis, there are 7303 transactions after the changeover containing at least one deducible input
17:05:39 However
17:05:59 All of those transactions spend pre-changeover outputs
17:06:35 So if you filter out all transactions that aren't CT-in-CT-out, there are still precisely 0 deducible transactions/inputs
17:06:50 But wait, there's more!
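The chain-reaction analysis mentioned above can be sketched roughly as follows. This is a minimal illustration on toy data, not the code used to produce the numbers quoted in the meeting. The function name `chain_reaction`, the input ids, the output ids, and the `ringct_outputs` set are all hypothetical. The idea: a ring with a single remaining member reveals its true spend, and that output can then be eliminated from every other ring, which may expose further spends.

```python
def chain_reaction(rings):
    """Deduce true spends by iterated elimination.

    rings maps a (hypothetical) input id to the set of output ids in its ring.
    Returns a dict mapping each deduced input id to its true spend.
    """
    remaining = {inp: set(members) for inp, members in rings.items()}
    deduced = {}
    changed = True
    while changed:
        changed = False
        for inp, members in remaining.items():
            if inp in deduced or len(members) != 1:
                continue
            (spent,) = members
            deduced[inp] = spent
            changed = True
            # A known-spent output cannot be the true spend of any other ring.
            for other, other_members in remaining.items():
                if other != inp:
                    other_members.discard(spent)
    return deduced


# Toy data (made up for illustration): outputs 1 and 2 are pre-changeover,
# outputs 3 to 6 are RingCT outputs.
rings = {"a": {1}, "b": {1, 2}, "c": {2, 3, 4}, "d": {5, 6}}
ringct_outputs = {3, 4, 5, 6}

deduced = chain_reaction(rings)
# Keep only deductions whose true spend is a RingCT output.
deduced_ct = {inp: out for inp, out in deduced.items() if out in ringct_outputs}
```

In this toy example the deduced inputs all spend pre-changeover outputs, so filtering by output type leaves nothing, which mirrors the shape of the result described above. The quadratic loop is chosen for clarity; an analysis over the full chain would need an index from outputs to the rings that reference them.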
17:07:10 The preprint also tries to determine how effective the guess-newest age heuristic is against modern transactions
17:07:35 Unfortunately, it uses those 7303 (or however many were in their block range) deducible post-changeover transactions as ground truth
17:07:48 and assumes that holds for all post-changeover transactions
17:08:02 huh that's pretty dumb
17:08:15 I wouldn't say it's dumb; it just failed to account for transaction types
17:08:44 So the key here is RingCT
17:08:54 Because there are "full-CT" transactions post-changeover that are deducible, the entirety of their ground-truth data set is based on spends of old funds, for which the modern selection algorithm does not apply
17:09:10 But there is an interesting twist
17:09:27 To be clear, it concerns transactions where non-RingCT outputs are essentially converted to RingCT outputs, right?
17:09:34 Among that ground-truth data set, the researchers find those transactions are _still_ twice as good against guess-newest from Miller et al.!
17:09:44 dEBRUYNE: yes, or that aren't converted to CT at all
17:09:52 > Because there are "full-CT" transactions post-changeover that are deducible,
17:09:52 (very limited cases involving single inputs)
17:09:55 Forgot a no, right?
17:10:04 Correct, thank you
17:10:12 because there are NO full-CT post-changeover that are deducible (typo)
17:10:26 sarang: Thanks, I guess those are dust outputs then that first have to be converted to standardized outputs
17:10:29 dust or non-mixable
17:10:48 So the conclusions presented in the paper about transaction counts aren't wrong, but don't differentiate between type, which I think is very important
17:11:08 The conclusions about guess-newest are only valid for their ground-truth set, and cannot be extrapolated to CT transactions
17:11:17 The wallet warns you when you spend old outputs that privacy is lessened and it's known you should churn in that case.
It should absolutely be mentioned in their paper that these transactions are particular and users are fully informed of the risk
17:11:31 I'm drafting an email to the authors to let them know of this, should they wish to revise again
17:11:50 They were very prompt in responding to my earlier email, and very quickly revised, which was great
17:12:32 Now that I have a complete deduced data set, I'm checking their results on effective anonymity sets of non-deduced transactions
17:12:45 I want to separate those by transaction type as well
17:13:04 Even though the preprint had errors, I'm glad they did the research
17:13:16 We should encourage student researchers
17:13:21 I'm unclear on the guess-newest on this dataset
17:13:31 How so?
17:14:04 Since this dataset is specifically about real output being old, shouldn't that give very specific results when seeing how the heuristic performs, that can't be extended to other transactions?
17:14:27 Yes
17:14:34 That's what I was saying earlier
17:15:09 They assumed their ground-truth post-changeover dataset was representative of all post-changeover transactions, which is entirely false
17:15:26 pre-ringCT inputs could be used in rings just like modern coinbase outputs are, which makes me sad
17:15:31 which I think is very important <= Yes, because uninformed readers may infer that it concerns full RingCT transactions
17:16:40 IIRC the decoy selection algorithm is very different for these transactions, only selecting non-ringct. Since all of them are very old, they are old on the tail of the gamma distribution, very different from recent one
17:16:41 Using pre-ringct outs as fake outs would help shortly after ringct, but would otherwise introduce a number of known spent outs in rings.
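The point about guess-newest and ground truth can be made concrete with a small sketch. It assumes a ground-truth set of deduced inputs is already available; the function names, the ring contents, the block heights, and the `"pre-ct"`/`"ct"` labels are hypothetical and made up for illustration. The heuristic guesses that the newest ring member is the true spend, and the accuracy is reported separately per transaction type so that a result measured on one type is not silently applied to another.

```python
def guess_newest_accuracy(rings, truth):
    """Fraction of ground-truth inputs whose true spend is the newest ring member.

    rings maps an input id to {output id: block height of that output}.
    truth maps an input id to the output id known to be the true spend.
    """
    if not truth:
        raise ValueError("ground-truth set is empty")
    hits = 0
    for inp, true_out in truth.items():
        ring = rings[inp]
        newest = max(ring, key=ring.get)  # ring member with the greatest height
        hits += newest == true_out
    return hits / len(truth)


def accuracy_by_type(rings, truth, tx_type):
    """Split the ground truth by transaction type before measuring accuracy."""
    by_type = {}
    for inp, true_out in truth.items():
        by_type.setdefault(tx_type[inp], {})[inp] = true_out
    return {t: guess_newest_accuracy(rings, sub) for t, sub in by_type.items()}


# Toy data: input "d" is a full-CT spend with no ground truth available.
rings = {
    "a": {1: 100, 2: 200, 3: 300},
    "b": {4: 50, 5: 400},
    "c": {6: 10, 7: 20},
    "d": {8: 500, 9: 600},
}
truth = {"a": 3, "b": 4, "c": 7}
tx_type = {"a": "pre-ct", "b": "pre-ct", "c": "pre-ct", "d": "ct"}

overall = guess_newest_accuracy(rings, truth)
per_type = accuracy_by_type(rings, truth, tx_type)
```

Here `per_type` has no `"ct"` entry at all, because no full-CT input appears in the ground truth. Any accuracy figure therefore describes only the types that are present in the ground-truth set.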
17:16:54 *they are all
17:17:12 When I finish the effective anonymity data, it should give a much more clear picture of the status of modern transactions (before and after ring increases too)
17:17:34 The code still needs some cleanup to make it easier for others to run this analysis, either now or in the future
17:17:40 interesting point moneromooo
17:17:51 thanks to gingeropolous for the use of a fast machine for this analysis
17:18:46 If the researchers choose not to revise, I can always write a new preprint that presents this data
17:19:11 but I strongly suspect the researchers will revise again, since they did a very prompt first revision
17:19:29 super cool that you checked all these results
17:19:31 sounds great
17:19:38 I can also post the raw data for the post-changeover deducible transactions, in case anyone wants to specifically analyze them
17:19:47 super cool that their work is publicly reproducible
17:19:53 Again, I'm really glad they did the analysis
17:20:00 Yeah, I didn't end up using their Java code though
17:20:07 I wanted some extra data they didn't provide, so I rewrote
17:20:16 but kudos to them for making all their code public, for sure
17:20:40 I'd like to see if the code can be adapted to a certain analysis I have in mind, so I'm looking forward to it
17:20:41 Not really. More like negative kudos for those that don't.
17:20:46 What analysis is that UkoeHB_?
17:21:15 moneromooo: yes but that still tends to be the majority these days
17:21:49 Papers without it are really hearsay. Should never be published.
17:21:52 getting data on ring loops, and comparing to a purely randomly generated ring db
17:22:10 Define ring loop
17:22:13 moneromooo: agreed
17:22:31 A ring is a circular construct. A loop is.... a.......
17:22:55 e.g. two outputs are owned by the same person, a loop is when their descendants intersect in the same tx
17:23:07 Oh, output merging
17:23:14 ?
17:23:33 Can happen by chance too (from fake outs).
17:23:37 yes
17:23:55 yeah basically, so I want to see the probability of given loop sizes happening randomly
17:23:59 @UkoeHB_ important research. Results will probably be depressing :- (
17:23:59 I think the question is how likely it is to occur in practice vs not
17:24:03 If it was super fast, it'd be nice for the wallet to try and pick fake outs that generate "false positives".
17:24:24 There's code that will do forms of merge analysis, and it's something I have to add specifically to my check code
17:24:48 The graphs involved are likely quite large, so it isn't clear what the complexity of this is
17:24:49 If it was super fast, it'd be nice for the wallet to try and pick fake outs that generate "false positives" < 👍
17:25:27 FWIW the early analysis on output merging used deducible transactions only
17:25:41 right, I have some ideas about limiting the output range to first estimate exactly how long such analysis would take; can also limit the maximum loop size considered
17:25:47 UkoeHB_: to generate random data set, will have to select guesstimates for parameters like number of transactions per wallet. (really, would be a distribution, not single value)
17:25:48 yes
17:26:26 The interesting thing is that we may be able to establish statistical estimates of these parameters based on the real blockchain data
17:26:39 Isthmus we can just use the transactions that already exist, but make the input rings randomly selected; this provides a direct comparison with the real ring db
17:26:43 Especially if rare "natural" occurrence, i.e. low false positive rate
17:27:11 This sort of analysis was considered as a major part of the churn framework as well
17:27:16 and do the randomly generated analysis multiple times for variance
17:28:10 Makes sense
17:28:52 Anyway, I've taken up a lot of time on that
17:29:04 Was there other research of interest to discuss from anyone?
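The comparison against a randomly generated ring db, repeated several times for variance, might look roughly like the sketch below. It is an illustration under strong simplifying assumptions, not anyone's actual analysis code. Only the smallest loop is counted (two input rings in one transaction referencing the same source transaction), outputs are numbered in chain order, and the random rings are drawn uniformly from earlier outputs, whereas a faithful baseline would draw from the wallet's real decoy selection distribution. All names and the toy data are hypothetical.

```python
import random
from statistics import mean, pstdev


def count_merges(txs, source_tx):
    """Count transactions in which two distinct input rings share a source tx.

    txs is a list of (n_prior, rings): n_prior is the number of outputs that
    existed before the transaction, rings is a list of lists of output indices.
    source_tx[o] is the index of the transaction that created output o.
    """
    count = 0
    for _, rings in txs:
        seen = set()
        merged = False
        for ring in rings:
            sources = {source_tx[o] for o in ring}
            if seen & sources:
                merged = True
                break
            seen |= sources
        count += merged
    return count


def randomize(txs, rng):
    """Keep every transaction's shape but redraw each ring uniformly at random."""
    return [
        (n_prior, [rng.sample(range(n_prior), len(ring)) for ring in rings])
        for n_prior, rings in txs
    ]


def baseline(txs, source_tx, trials=100, seed=0):
    """Mean and standard deviation of the merge count over randomized ring dbs."""
    rng = random.Random(seed)
    counts = [count_merges(randomize(txs, rng), source_tx) for _ in range(trials)]
    return mean(counts), pstdev(counts)


# Toy data: eight outputs created by four transactions, two outputs each.
source_tx = [0, 0, 1, 1, 2, 2, 3, 3]
txs = [
    (8, [[0, 2], [1, 4]]),  # both rings reference tx 0: counted as a merge
    (8, [[2, 5], [6, 7]]),  # no shared source tx
]

observed = count_merges(txs, source_tx)
expected, spread = baseline(txs, source_tx)
```

Comparing `observed` against `expected` and `spread` gives a rough sense of how often such merges would arise from decoys alone. Larger loop sizes would require walking the transaction graph, which is where the complexity concern raised above comes in.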
17:29:05 I'll be interested in seeing plots where x-axis is # of txns made within a given wallet, and y-axis is statistical measures, like precision/accuracy/etc
17:30:16 I have a few quick updates
17:30:19 I’ve been doing some p2p network scalability research, creating some testing suites, etc. Still very early reading/planning, but hopefully will have some actionable insights for Monero.
17:30:20 yes it's a big project and might be worthy of a paper if it goes somewhere, we will see; I also want to see how well the gamma distribution is working by subtracting the theoretical distribution from what we have in reality
17:30:35 "see how well the gamma distribution is working by subtracting the theoretical distribution from what we have in reality" < YES PLEASE
17:31:26 @UkoeHB_ I have some algorithms floating around GitHub to identify and filter txns that use uniform decoy selection instead of correct algo. If you don't strip those out, it will introduce a bias in your results towards older outputs
17:31:54 I'll dig those up and send links
17:31:58 cool thanks
17:32:03 Yeah, trying to exclude old software will be important
17:32:12 since there's no consensus enforcement
17:34:07 I think ring analysis is too scary for anyone to tackle alone, so a collaborative and incremental effort seems reliable
17:34:14 Software can't be any older than the last ring size change, right?
17:34:29 Sorry, I meant software using old/incorrect methods
17:34:35 * Isthmus nods
17:34:39 "nonstandard" is a better term
17:37:49 A big reason why deterministic input sets are intriguing is because they're likely to contain many outputs from the same transactions
17:38:05 and therefore are included as a "standard feature" of all rings
17:38:08 "see how well the gamma distribution is working by subtracting the theoretical distribution from what we have in reality" I was doing stuff in that direction too, exciting!
17:40:08 I think I missed the last meeting.
The Janus proposal was updated a week ago (https://github.com/monero-project/monero/issues/6456), and now the Janus mitigation is to encode the tx private key for recipients. For 2-out tx where there will only be 1 tx pub key, the 'change output' would use a 'hidden tx pub key' derived from the non-change recipient's encoded tx private key. An alternative would be for the
17:40:08 change output to use a unique 'derivation to scalar', however I am concerned that affects too much protocol-level code (could be wrong).
17:43:40 What happens when there is a 2-out tx but none of them is change?
17:44:01 I think you have to make a 3
17:44:08 3-output then?
17:44:45 The proposal is to enforce 1 tx pub key for 2-outs, and 1 key per output for >2-outs. All tx with no change output would have to be >2-out, even if it means adding a dummy output.
17:45:03 A 0 change is automatically added *only* if there's one output otherwise.
17:45:32 Right, and following the proposal there would be a very rare case of 2 non-change outs needing a dummy
17:45:58 It's an edge case, so I perceive this as a reasonable solution
17:46:18 Originally encoding the tx private key was disregarded since current tx share tx pub keys, but since we'd start enforcing more tx pub keys that problem is solved.
17:46:36 i.e. as a solution for Janus*
17:48:51 Well, the hidden tx pub key might be unnecessary now that I think about it.. anyway that's my dusty update.
17:49:25 but does this mean that a 2 out tx always has change real or dummy
17:49:39 yes
17:50:00 Can this then be attacked?
17:50:32 the idea that there is always a change output in 2-out tx?
that assumption can be made today already
17:51:14 * Isthmus quietly mumbles that if we had a 3-output minimum this wouldn't be an issue
17:51:24 Oh no wait, it would just move the issue to n+1
17:51:25 Sorry, connection problems; back now
17:51:40 but not with 100% certainty
17:52:20 well the shenanigans around 2-out tx are mostly to optimize scanning and tx sizes, since 2-out tx are ~95% of tx
17:53:41 that's true ArticMine
17:56:17 one can assume with high certainty that every transaction contains a change output. I don't understand why that's a significant observation.
17:56:53 Most txns. Churn doesn't, for example.
17:57:09 ^
17:57:18 Why does it not?
17:57:36 Doesn't have to
17:57:46 And would affect output merging later
17:57:48 Well, that's circular reasoning then.
17:57:50 Churning with change creates the loops UkoeHB_ mentioned earlier
17:58:08 churn has two change outputs, no?
17:58:23 It can
17:58:26 or one and a fake one
17:58:46 churn has an 'output to yourself' and a 'change output'; change outputs are handled differently in the code
17:59:18 sure but this is from a network perspective
17:59:36 yeah
17:59:40 Or a split 2 separate wallets under the control of one person
18:00:23 It introduces uncertainty
18:00:43 but why does it matter?
18:01:56 because even a small bias can grow.
18:03:26 Given the time, is there other research that needs to be brought up before adjourning?
18:03:47 I've got 2 updates, will keep brief for sake of time:
18:03:52 Our CCS for researching Monero’s post-quantum security is sooooooo close. Only 7% left, less than 40 XMR needed. https://ccs.getmonero.org/proposals/research-post-quantum-monero.html
18:03:55 If that could get topped off today, we’ll dive in immediately and have our first update at next week’s MRL meeting. :- )
18:03:59 Unrelated - I also have one of Insight’s DistSys engineers building “speedup” networks, i.e.
highly-connected peers with high bandwidth to propagate blocks/txns through the ad hoc network faster than organic propagation. Main goal is to eliminate the long tail in block propagation times.
18:04:02 The codebase is very modular with Terraform/ansible deployment and control scripts, so could be configured to spin up a Monero speedup network in the future.
18:04:14 That's all from me.
18:04:31 Nice!
18:05:02 I suppose I should mention that I welcome/request comments/questions/emoji on my funding proposal as well, so a decision can be made whether to open it: https://repo.getmonero.org/monero-project/ccs-proposals/-/merge_requests/148
18:06:21 thanks for opening it well in advance
18:06:32 OK, I suppose we can formally adjourn for the sake of logging
18:06:38 Thanks to everyone for joining in today
18:06:43 Discussion can of course continue!
18:10:48 thanks all
18:10:59 thanks
18:11:14 Ciao
18:27:20 Forgot to ask in the meeting, but has there been an update regarding the CLSAG audit?
18:32:12 sgp_ et al. are drafting the CCS while finalizing the terms for the SoW etc. with Teserakt and OSTIF
18:32:28 Meanwhile, Teserakt chose to begin their review anyway, with the clear understanding that funding would not necessarily be guaranteed
18:32:30 at this point we are waiting on Core
18:32:55 I'm waiting on them to confirm what buffer they feel is appropriate
18:35:09 What buffer was proposed? If I may ask
19:00:06 fyi UkoeHB_ the current iteration of my code does not include output data, since this isn't needed/used for the analysis I was running, and bloats the data set
19:00:13 but it's trivial to add that to the pull scripts
19:38:34 dEBRUYNE: currently 10%, but that's TBD
19:38:46 and that's set by Core
20:01:44 UkoeHB_: I'm going to update some analysis data formats anyway, so I'll pull in output keys if that's helpful (as well as some version data)
20:02:16 I'll toss it in a sqlite db or something, so it's portable (in theory!)
and easy to run
20:48:49 sgp_: Ok, thanks
23:30:21 Awesome thanks sarang
23:58:56 no prob