In the last few days some optimizations came to my mind that I'd like to share, so they don't get forgotten and can be discussed. I'd make each of these a separate issue, but would like to discuss first whether they are worth it:

1. Add support for cached pairs to #14

Currently #14 doesn't check the db whether parts of the price pairs on the route are already downloaded. To make this possible, some rework of the pathing algorithm and the db structure is required.
Bogus CSV:

```
date,Token,operation,amount
1.1.2021 0:00,BTC,BUY,10       # price written to db: BTC/USDT: 10000; assumed path: BTC->USDT->EUR or BTC->EUR
1.1.2021 0:00,USDT,SELL,100000 # path: USDT->EUR
1.1.2021 0:00,BNB,FEE,0.5      # assumed path: BNB->USDT->EUR
```
This scenario would require at worst 5 requests to the exchange (at best 3, with direct BTC->EUR and BNB->EUR pairs), at 1-2 s per request due to the rate limit.
These paths are for demonstration only and may be different in reality.
Currently with #14 the price written to the db from the csv is BTC/USDT, and this price is ignored by #14, which will probably pick BTC->EUR. That is less accurate than BTC/USDT (exact price known via the csv) * USDT/EUR (low volatility, since USDT is a stablecoin).
To fix this we need to:

- prioritize paths with (highest priority first):
  - a price that comes from the csv (we would need a new column that marks those prices)
  - a price pair that is already in the db but not from the csv
  - a price pair with low volatility (fiat2fiat)
- cache all requested pairs in the db (currently we save BTC/EUR for the path BTC->USDT->EUR, and not BTC/EUR, BTC/USDT and USDT/EUR)
All in all this would bring us down to 2 requests (USDT->EUR and BNB->USDT) and would also increase the accuracy. In some edge cases we would need to split our batches into smaller chunks, which could lead to more requests, but given the higher accuracy this would be acceptable.
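To illustrate, a minimal sketch of how the weighting could look (all names and weight values are hypothetical; it assumes the graph maps each coin to its neighbours plus a tag saying where the price would come from):

```python
# Sketch only: hypothetical weights, assuming each edge (price pair) is
# tagged with the source its price would come from.
import heapq

# Lower weight = higher priority, matching the list above.
EDGE_WEIGHTS = {
    "csv": 0,        # exact price known from the account statement
    "db": 1,         # already cached, no request needed
    "stable": 2,     # low-volatility pair (fiat2fiat / stablecoin)
    "request": 10,   # has to be fetched from the exchange
}

def cheapest_path(graph, start, target):
    """Dijkstra over (weight, coin, path); graph maps coin -> [(neighbor, source_tag)]."""
    heap = [(0, start, [start])]
    seen = set()
    while heap:
        weight, coin, path = heapq.heappop(heap)
        if coin == target:
            return path, weight
        if coin in seen:
            continue
        seen.add(coin)
        for neighbor, source in graph.get(coin, []):
            if neighbor not in seen:
                heapq.heappush(heap, (weight + EDGE_WEIGHTS[source], neighbor, path + [neighbor]))
    return None, None

# With the bogus CSV above, BTC->USDT carries weight 0 (csv) and USDT->EUR
# weight 2 (stable), beating a direct BTC->EUR request (weight 10).
graph = {
    "BTC": [("USDT", "csv"), ("EUR", "request")],
    "USDT": [("EUR", "stable")],
}
print(cheapest_path(graph, "BTC", "EUR"))  # (['BTC', 'USDT', 'EUR'], 2)
```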
Since this is also a bit complicated, I would like to separate it from #14 PR-wise, to not delay #14 longer than necessary.
2. Option for using the order history instead of OHLCV
OHLCV is a little inaccurate because it is an average over 60 s. An idea would be to use the order history, which would be more accurate because we can pick almost the exact date when our transaction occurred. Since some exchanges don't support it, the OHLCV method should always remain the fallback. This would also be considerably slower than the OHLCV method, so it should be optional.
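A rough sketch of how this could look with ccxt, reading the public trade history via fetch_trades (one possible reading of "order history") and falling back to 1m OHLCV otherwise; the averaging in the fallback is just a placeholder for whatever the current method does:

```python
# Sketch only: trade-history price with OHLCV fallback.
import ccxt

def price_at(exchange: ccxt.Exchange, symbol: str, timestamp_ms: int) -> float:
    if exchange.has.get("fetchTrades"):
        trades = exchange.fetch_trades(symbol, since=timestamp_ms, limit=10)
        if trades:
            # First trade at/after our transaction time; much closer than a 60 s average.
            return trades[0]["price"]
    # Fallback: the existing OHLCV method (a 1m candle around the timestamp).
    _, open_, high, low, close, _ = exchange.fetch_ohlcv(
        symbol, timeframe="1m", since=timestamp_ms, limit=1
    )[0]
    return (open_ + close) / 2  # placeholder for the current averaging

# Usage (makes a real request):
binance = ccxt.binance()
ts = binance.parse8601("2021-01-01T00:00:00Z")
print(price_at(binance, "BTC/EUR", ts))
```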
3. Add support for resuming tax calculations from file
We only want to calculate taxes for the current year, but if we already have tax reports generated for previous years, we should have an optional setting that allows us to read in the old generated tax reports instead of starting from scratch. This would also be nice for the current year, so that we can resume the calculation where we left off.
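A minimal sketch of what such a savefile could look like (layout and all names are hypothetical):

```python
# Sketch only: seed the FIFO queues with last year's leftover lots instead
# of replaying all old transactions from scratch.
import json
from pathlib import Path

def load_savefile(path: Path) -> dict:
    if path.exists():
        return json.loads(path.read_text())
    # No previous report: start from scratch.
    return {"last_transaction": None, "open_lots": {}}

save = load_savefile(Path("tax_report_2020.json"))
resume_from = save["last_transaction"]  # skip everything up to this point
open_lots = save["open_lots"]           # e.g. {"BTC": [{"amount": 0.5, "price": 9000.0, "bought": "2020-05-01"}]}
```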
4. Read/write cached path data from/to db
The two current bottlenecks for price calculation are the rate limit and the pathfinding through the graph. We can't speed up the rate limit, but there are two ways to speed up the graph.
Currently the graph is generated from scratch on every execution.
There are two bottlenecks here:
For each pair in the graph, the time when it was active on the exchange is queried via 1w OHLCV candles (1 request per pair, so about 0.5-2 s per pair due to the rate limit). This can't really be sped up much, because we need to refresh this information on every startup (caching it would mean we'd probably miss a pair going inactive on the exchange). It is already cached in memory and done on demand, so it only causes a warmup period at the beginning of approx. 30 s-3 min. I have no real idea how to speed this up other than caching.
Each path calculation takes about 1 s. Under normal circumstances this is not a problem, since the rate limit takes significantly longer, but with my first optimization proposal this could become the new limiting factor. Fix: cache all viable paths per pair in the db. The paths should only change when a new exchange is added to the pool, so caching them would be an option. The active period should not be saved to the db, because it can and will change.
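A possible sketch for such a path cache, keyed on the exchange pool so the cache misses automatically when a new exchange is added (table layout and names are hypothetical):

```python
# Sketch only: cached paths in sqlite; the active period is deliberately
# NOT stored, since it can and will change.
import json
import sqlite3

conn = sqlite3.connect("prices.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS path_cache ("
    " base TEXT, quote TEXT, exchange_pool TEXT, path TEXT,"
    " PRIMARY KEY (base, quote, exchange_pool))"
)

def save_paths(base, quote, pool, paths):
    conn.execute(
        "INSERT OR REPLACE INTO path_cache VALUES (?, ?, ?, ?)",
        (base, quote, ",".join(sorted(pool)), json.dumps(paths)),
    )
    conn.commit()

def load_paths(base, quote, pool):
    row = conn.execute(
        "SELECT path FROM path_cache WHERE base=? AND quote=? AND exchange_pool=?",
        (base, quote, ",".join(sorted(pool))),
    ).fetchone()
    return json.loads(row[0]) if row else None  # None -> recalculate (~1 s) and save

save_paths("BTC", "EUR", {"binance"}, [["BTC", "USDT", "EUR"], ["BTC", "EUR"]])
print(load_paths("BTC", "EUR", {"binance"}))
```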
5. Multicore support for the tax calculation
Currently, for very big tax reports (2k+ entries), we have a CPU-level bottleneck at the tax calculation, because it is single-core. A relatively easy fix would be to use a ProcessPool and calculate each coin's tax returns in a separate process. Due to the overhead (1-3 s) of starting the processes, this is only viable for larger tax returns.
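A sketch of how that could look with concurrent.futures, grouping operations per coin (all names and the threshold are made up; the per-coin evaluation is a placeholder for the existing single-core logic):

```python
# Sketch only: one process per coin, assuming FIFO ordering only has to
# hold within a single coin's operations.
from concurrent.futures import ProcessPoolExecutor

def evaluate_coin(item):
    coin, operations = item
    # ... placeholder for the existing single-core FIFO evaluation ...
    return coin, sum(op["gain"] for op in operations)

def evaluate(ops_by_coin: dict, threshold: int = 2000) -> dict:
    total = sum(len(ops) for ops in ops_by_coin.values())
    if total < threshold:
        # Small report: process startup overhead (1-3 s) isn't worth it.
        return dict(map(evaluate_coin, ops_by_coin.items()))
    with ProcessPoolExecutor() as pool:
        return dict(pool.map(evaluate_coin, ops_by_coin.items()))

if __name__ == "__main__":  # guard required for process spawning
    print(evaluate({"BTC": [{"gain": 1.0}], "ETH": [{"gain": 2.0}]}))
```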
Feedback is appreciated
ad 1: Great idea! I am sorry that I am currently unable to give #14 the attention it deserves. It is an awesome improvement and reduces my price fetching by more than 50%, good job! The code is good but complex. I am currently working on my tax declaration and might not be able to fully review and check it in time for this declaration.
ad 2: More accuracy is always good. Disabling this feature might never be a good option, as we should strive for the highest accuracy for the real tax declaration. Keep in mind that "bad" prices (calculated from a long price span) are saved "forever" in the database. Activating this feature in a future run won't fix/update the "bad" prices.
ad 3: I like the idea, and this will definitely come in handy in the next years. Our savefile should remember the read-in account statements (with filepath and MD5 sum) to check which files are already done and which are left. This method should throw an error when an uncached account statement has a transaction prior to the last transaction in the savefile, because in that case we might have to recalculate everything to fulfill FIFO.
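A quick sketch of that bookkeeping (all names hypothetical):

```python
# Sketch only: remember processed statements via filepath + MD5, and refuse
# to resume when an uncached statement reaches back before the savefile's
# last transaction (FIFO would otherwise break).
import hashlib
from pathlib import Path

def md5sum(path: Path) -> str:
    return hashlib.md5(path.read_bytes()).hexdigest()

def needs_processing(path: Path, savefile: dict) -> bool:
    """savefile['statements'] maps filepath -> md5 of already evaluated files."""
    return savefile.get("statements", {}).get(str(path)) != md5sum(path)

def assert_resumable(first_new_tx_time, last_saved_tx_time):
    if last_saved_tx_time is not None and first_new_tx_time < last_saved_tx_time:
        raise ValueError(
            "uncached account statement has a transaction prior to the last "
            "transaction in the savefile; full recalculation needed for FIFO"
        )
```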
The calculation time is currently no problem for me. Most of the time is spent fetching prices. Therefore I would give this issue a low priority, but feel free to tackle it.
ad 4: Good point.
ad 5: Parallelization might not be viable for all kinds of operation types. E.g. two sell operations can't easily be processed in parallel, as we have to determine the sold coins of the first sell operation before we can evaluate the second one. And we cannot evaluate a sell operation if the buy/airdrop/commission operation before it hasn't been processed yet.
Actually, I do not see a good way to parallelize the evaluation.