Create Sankey Diagram from CSV data (3 winners)

I would like a small JavaScript app that takes a CSV file and generates a Sankey diagram from it (ideally with d3.js, but I'm open to others). For context, we have a card game where we record which piles each player sorts the cards into. We call these piles, but they can also be considered categories. We need a Sankey to show how the players generally sorted things into the piles. Each game a player plays has a unique identifier, so you will notice IDs in the data. That ID is how we know what was played in a single game.

Now for the Sankey part. We have rows and columns. Each row is a specific card, so the more cards we have in our study, the more rows we will have in our Excel sheet. The columns are the piles we ask people to sort into.

For a real-world use case, imagine a deck of cards with flavors of ice cream. We ask you to sort your favorite flavors into the first 3 piles, then the flavors you hate into another pile, and the last pile is for flavors you don't mind. So in this case we have 31 flavors and 5 piles. The first 3 piles will have just a single card in them, but the last 2 will likely have more than 1. With this example you can easily see that we might have more or fewer flavors and more or fewer piles to sort into. Our job with the data is to find a meaningful way to see what people tend to sort together (cohorts of flavors, if you will).

With d3 js we have this sort of example https://bl.ocks.org/d3noob/5028304 or this link is better https://bl.ocks.org/d3noob/raw/5028304/?raw=true

In our version of this we will need a way to show the viewer what each column means, so they know whether column 2 is favorites, most hated, etc.

Now for how we see this working. In column 1 we indicate how many people sorted each flavor there (we don't need to show a node if a card was not used). In column 2 we show what those players/games sorted into next, flowing from the previous nodes, and so on until we have mapped all of the games and all of the piles in the CSV.
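A minimal sketch of that column-by-column mapping, assuming each game has already been parsed into an ordered list of piles, where each pile is a list of card names. The function name `buildSankeyData`, the `games` shape, and `pileLabels` are hypothetical and not taken from the sample sheet; the `{ nodes, links }` result with index-based `source`/`target` is meant to resemble the input the d3 Sankey examples use.

```javascript
// Assumed input shape (hypothetical): games[g][p] is the array of card names
// that game g placed in pile p, e.g. [["Vanilla"], ["Mint"], ["Lemon", "Lime"]].
// pileLabels[p] is a human-readable name for column p, e.g. "Favorite".
function buildSankeyData(games, pileLabels) {
  const nodeIndex = new Map();  // "column|card" -> index into nodes
  const nodes = [];
  const linkValues = new Map(); // "source>target" -> number of games

  // Create a node only the first time a card is actually used in a column.
  const nodeFor = (column, card) => {
    const key = column + "|" + card;
    if (!nodeIndex.has(key)) {
      nodeIndex.set(key, nodes.length);
      nodes.push({ name: card, column, label: pileLabels[column] });
    }
    return nodeIndex.get(key);
  };

  for (const piles of games) {
    // Link every card in pile p to every card in pile p + 1 of the same game.
    for (let p = 0; p + 1 < piles.length; p++) {
      for (const from of piles[p]) {
        for (const to of piles[p + 1]) {
          const key = nodeFor(p, from) + ">" + nodeFor(p + 1, to);
          linkValues.set(key, (linkValues.get(key) || 0) + 1);
        }
      }
    }
  }

  const links = [...linkValues].map(([key, value]) => {
    const [source, target] = key.split(">").map(Number);
    return { source, target, value };
  });
  return { nodes, links };
}
```

Keying each node by column as well as card name keeps every link pointing from one column to the next, so a card that shows up in two different piles across games becomes two separate nodes rather than a loop. The `label` on each node is one possible place to carry the column meaning for display.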

As for the data, I have a sample data set posted here on Google Sheets: https://docs.google.com/spreadsheets/d/1ilRxBTXsVMwNS3vLQIHb47yqtJ1mWO13zUU5NhKd6CY/edit?usp=sharing I have prepared the data in 2 formats, and you can choose whichever yields the best outcome for what we are trying to do. One format is pivoted on flavors and the other is pivoted on specific games. I don't mind which one you use. You will see from the test data that it's diversity and inclusion data and not ice cream, but I don't think that changes anything above.

Ideally your submission is posted as a repl so we can easily verify it and keep it running long term. I only really need this working in Chrome. I don't care if the browser takes a little time to crunch through the data set. Obviously better performance and UX will win, but if your solution requires the browser to hang for 10 seconds we don't particularly care. One thing that would be good to include in your final solution is how well this scales. For example, if I have 30 flavors and 5 piles, how well will this work in a browser with 1K games vs. 100K games?
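As a rough way to reason about that scaling question: if the games are first aggregated into one node per card per column and one link per pair of cards in adjacent columns (an assumption about the approach, not a requirement of the bounty), the size of what the browser has to draw is bounded by the number of flavors and piles, not by the number of games. The helper name `sankeyUpperBounds` below is hypothetical.

```javascript
// Loose upper bounds on the aggregated diagram size, assuming one node per
// (column, card) and links only between adjacent columns.
function sankeyUpperBounds(cards, piles) {
  return {
    maxNodes: cards * piles,               // every card used in every column
    maxLinks: cards * cards * (piles - 1), // every card pair between neighbors
  };
}

// 30 flavors and 5 piles, as in the example above.
console.log(sankeyUpperBounds(30, 5)); // { maxNodes: 150, maxLinks: 3600 }
```

Under that assumption, going from 1K to 100K games mainly increases the time spent reading and counting rows, while the number of nodes and links handed to the layout stays within the same bounds.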

Since I think I'm asking for some magic here, I will up the prize to award the winner as well as second and third place: winner = 100, 2nd place = 50, 3rd place = 25.

(Note: if "did not sort" causes conceptual issues with the whole thing, we can ignore it.)

There are several existing D3-based solutions to build Sankey diagrams. Is it ok to choose the best of them and make a solution based on the chosen one?
TheOsch 30 days ago
Yes, totally!
Qdev 30 days ago
Hello, would it be possible to provide a quick example of how you would like the diagram to look? Just a few nodes and the links between them, so it's easier to understand what you are trying to achieve. Thank you!
radosinsky 28 days ago
Hi, either I don't know what I'm doing or your data set is invalid. It keeps raising circles errors (using Google Sankey Charts) whenever I load the Excel file.
evancejaye 25 days ago
It could be the data, I suppose, but the query on our side is pretty straightforward and I don't immediately see an issue visually. Let me know if you find a problem.
Qdev 25 days ago
Okay, here is an image of the first two rows of the Excel sheet, pivoted on game ID, generated with my crude concept of the proposed solution: https://we.tl/t-WVkNdlCDyR . Am I even near the solution? I would love for us to close this challenge even if the time runs out.
evancejaye 25 days ago
awarded to TheOsch
Tags
javascript


2 Solutions


Hi, I am sure this is nowhere near what you expect, but I have to upload it either way. If you can provide an even more detailed explanation, I'm sure we can make this work. Please unzip the contents and upload the first two rows of the Excel sheet (pivoted on GameID) to see the progress made so far.

https://we.tl/t-RodlK9iYPi

Regards!

The xls files I used are also attached in the zip file.
evancejaye 25 days ago
Hey, this is great. I have some detailed comments, so I will link to a Google Doc in a few minutes where I can annotate screenshots, etc.
Qdev 25 days ago
Hey, one note: don't be afraid to make this really tall. Even if we remove the leftmost nodes with the game IDs, I have a feeling this visualization will need to be really tall to make room for all of the nodes in each column.
Qdev 25 days ago
I have understood the key feedback point. Let me work on it and update you accordingly.
evancejaye 24 days ago
hey there, any updates on this? thx
Qdev 18 days ago
hey there, any updates?
Qdev 16 days ago
Hi, sorry, I stopped working on this.
evancejaye 15 days ago
Winning solution

Maybe I didn't understand what has to be done... Can you please look at https://replit.com/@TheOsch/sankey ; does it resemble what you need or not? (It works with CSVs made of the sheet you provided, with any of the tabs)

Can you check the feedback I sent on the other solution? I think it would give some insight into where we were headed.
Qdev 18 days ago
That's what... I see. OK, I'll tell you when it is done (within 24h). One question: the first column in the chart at https://docs.google.com/document/d/192WW05UN8b4cX2Ld1dykKWXSjB14gGl5JD3KKZKWuWQ/edit is game ID, but you've mentioned that there can be hundreds of thousands of games in the source sheet. I wonder what the chart would look like with 100K items in the first column. Is the game ID column really needed, and does it have to contain all the games?
TheOsch 18 days ago
The first column would not show, IMO. We don't see any value in listing all of the game IDs there. It's more about what the first sets of answers were.
Qdev 17 days ago
hey there, any updates?
Qdev 16 days ago
Yes, it is. Can you please look at https://replit.com/@TheOsch/sankey again? I've just updated it.
TheOsch 15 days ago
Is something wrong? Please tell me, I'll fix it - I want to complete this task anyway.
TheOsch 13 days ago
Dude this is cool!
Qdev 7 days ago
Thank you
TheOsch 6 days ago