Select a random value from a pandas list column for each row, ensuring that a value doesn't get picked again

I have the pandas DataFrame below:


import pandas as pd
data = {
    'poc': ["a", "b", "c", "d"],
    'school': ["school1", "school2", "school3", "school4"],
    'volunteers': [
        ["sam", "mat", "ali", "mike", "guy", "john"],
        ["sam", "mat", "ali", "mike"],
        ["rose", "sam", "mike", "jorge"],
        ["susan", "jack", "alex", "mat", "mike"]
    ]
}
df = pd.DataFrame.from_dict(data)

I need to create a new column that holds one randomly picked volunteer from the volunteers column for each school, ensuring that the same volunteer doesn't get picked twice.

So far I have tried:


import random
df["random_match"] = [random.choice(x) for x in df["volunteers"]]

But this just gives me a random pick for each row without ensuring that it is not repeated across rows.

13 days ago
Tags
python
pandas


2 Solutions


Hi

It's not really an optimized Python solution, but this should do the job:

import pandas as pd
data = {
    'poc':["a", "b", "c", "d"],
    'school':["school1", "school2", "school3", "school4"],
    'volunteers':[
        ["sam", "mat", "ali", "mike", "guy", "john"],
        ["sam", "mat", "ali", "mike"], 
        ["rose", "sam", "mike", "jorge"],
        ["susan", "jack", "alex", "mat", "mike"]
    ]
}
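df = pd.DataFrame.from_dict(data)

import random

random_match = []  # volunteers already assigned, in row order
for x in df["volunteers"]:
    v = random.choice(x)
    # Redraw until we get a volunteer who has not been picked yet.
    # Caveat: this loops forever if every name in x is already taken.
    while v in random_match:
        v = random.choice(x)

    random_match.append(v)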


df["random_match"] = random_match

print(df)
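
The redraw loop above can spin forever when every name in a row's list has already been assigned. A minimal sketch of one way around that, assuming it is acceptable to raise an error in that case: filter out the taken names first, then choose from what is left. The helper name `pick_unique` and its `rng` parameter are hypothetical, not part of pandas or of the original question.

```python
import random

import pandas as pd


def pick_unique(volunteer_lists, rng=random):
    """Pick one name per list without reusing a name (hypothetical helper)."""
    taken = set()
    picks = []
    for candidates in volunteer_lists:
        # Only names that nobody has been assigned yet are eligible.
        available = [name for name in candidates if name not in taken]
        if not available:
            # Assumption: failing loudly is preferable to looping forever.
            raise ValueError(f"no unassigned volunteer left in {candidates}")
        choice = rng.choice(available)
        taken.add(choice)
        picks.append(choice)
    return picks


df = pd.DataFrame({
    "poc": ["a", "b", "c", "d"],
    "school": ["school1", "school2", "school3", "school4"],
    "volunteers": [
        ["sam", "mat", "ali", "mike", "guy", "john"],
        ["sam", "mat", "ali", "mike"],
        ["rose", "sam", "mike", "jorge"],
        ["susan", "jack", "alex", "mat", "mike"],
    ],
})

df["random_match"] = pick_unique(df["volunteers"])
print(df[["school", "random_match"]])
```

Each row needs exactly one draw, and passing `rng=random.Random(0)` (or any other seed) should make a run reproducible. This is still a greedy pass in row order, so it can raise `ValueError` on data where a different order of picks would have succeeded.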

Best regards


Hello! I don't know any further details about the project, so I will have to make some inferences. You have given only first names in your data set, and several of those names appear in more than one list. Based on that and on the wording of the problem, I am going to infer that the same first name always refers to the same volunteer. (Since you have four lists and four different "schools" in your data structure, I could be wrong, in which case I have another solution that would work.) Assuming that a name appearing in more than one list refers to the same volunteer, and knowing that you don't want anyone to be selected more than once, I designed the following solution. Please let me know what you think!

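import random

def random_choice_no_repeat(dat: list, out: list) -> None:
    """Append to out a random element of dat that out does not already contain."""
    match = random.choice(dat)
    if match not in out:
        out.append(match)
    else:
        # Already taken: draw again. This recurses until an unused name comes up,
        # so it raises RecursionError if every name in dat is already in out.
        random_choice_no_repeat(dat, out)

# df is the DataFrame from the question
volunteers = []
for x in df["volunteers"]:
    random_choice_no_repeat(x, volunteers)
df["random_match"] = volunteers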

You can then see the results with:

print(df["random_match"])

Running it repeatedly should never show a repeat, because a name is only appended when it is not already in the list. I ran it quite a few times and did not see a single repeat!
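
Rather than checking the output by eye, the two requirements can be tested directly. A minimal sketch, assuming the column names from the question: the helper `is_valid_assignment` is hypothetical, and the `random_match` values below are hand-written examples, not results from any run.

```python
import pandas as pd


def is_valid_assignment(frame, list_col="volunteers", pick_col="random_match"):
    """Return True if every pick is unique and comes from its own row's list."""
    unique = frame[pick_col].is_unique
    from_own_list = all(
        pick in options
        for pick, options in zip(frame[pick_col], frame[list_col])
    )
    return bool(unique and from_own_list)


df = pd.DataFrame({
    "school": ["school1", "school2", "school3", "school4"],
    "volunteers": [
        ["sam", "mat", "ali", "mike", "guy", "john"],
        ["sam", "mat", "ali", "mike"],
        ["rose", "sam", "mike", "jorge"],
        ["susan", "jack", "alex", "mat", "mike"],
    ],
})

# Hand-written example of a valid result: four different names.
df["random_match"] = ["guy", "sam", "rose", "mike"]
print(is_valid_assignment(df))  # True

# Hand-written example of an invalid result: "sam" is used twice.
df["random_match"] = ["sam", "sam", "rose", "mike"]
print(is_valid_assignment(df))  # False
```

Calling this after each run, for example inside a loop of a few hundred runs, gives a stronger check than scanning the printed column.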
