Excel formula or macro help
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

I have an excel sheet with roughly 200K rows on tab, 3 columns

id,CarMake,CarColor

on tab 2 I have just
ID and two empty columns

our job with the excel is to lookup the ID from tab 2 against tab 1 and aggregate the values. On tab 1 a ID can exist more than one time so we have to match and aggregate. Herein is where I have an issue today. the excel formula I am using now works but it renders the excel basically unusable in terms of performance.

=TEXTJOIN(",",TRUE,IF(master!A:A=A2,master!B:B,""))

I started looking down the path of vlookup and INDEX which are probably better but they are a bit beyond my excel hacking. I was using this for INDEX but it dies if it finds more than 1 match

=INDEX(master!B$1:B$200000,MATCH(A2,master!A$1:A$200000,0))

I also keep getting the sense that if tab1 is pre sorted you might be able to do something like INDEX and OFFSET so you arent doing a full table scan. if we make an assumption that tab 1 has dup IDs to collect but put some upper limit of say 10 on it then you would offset to the first match, scan the next 10 rows and be done. Not sure but its an idea and I dont know if excel can do this.

The other issue as you will notice in my sample is that if it hits an empty value on tab 1 it adds a zero on tab 2. In a perfect world this would dedup lookup values and remove the zeros.

Open to ideas for either a better formula to tackle all of this or if needed a macro.

here is the sheet to work with

https://www.dropbox.com/s/tllh9eehzss1ihr/bounty%20example.xlsx?dl=0

Tags
excel

Crowdsource coding tasks.

1 Solution


200K+ rows are really a lot of rows, I don't think this kind of calculation can be done in realtime in an efficient way.

So I realized a VBA script with two buttons:

1) Load all data from master sheet into a BTree for fast lookup search

2) Lookup and start matching all data loaded in the tree with the values in lookup sheet, it also fills all rows with the desidered values.

Check the solution there: https://1drv.ms/u/s!Av6a1kvwLlQrkMQYK3y6sszi5fM0Gw?e=C2sFYJ

With my tests this procedure can take up to 5 minutes, during process time it freezes excel, after that I prompt a message box and all became responsive again.

If you like this solution I think I can optimize the alghorithm, but since VBA doesn't support multithreading, don't expect big enhancements.

If you want I can prevent UI freeze calling DoEvents routine, but all process will be slower.

whoa - seems to be working. let me check it out a bit but this is great
Qdev 1 month ago
I fixed a little bug, where if you don't clear the Values2 column before starting matching it duplicates values. Now I am trying to see if I can improve performance.
Stefano Balzarotti 1 month ago
New update, check "bounty examplevV2.xlsm", I used buffers to fills cells in excel, this will make Start matching much more faster. On my pc I can process all in less than 1 min. I wait for your feedback :)
Stefano Balzarotti 1 month ago
View Timeline