Pandas - Combine two Columns with a comma between them
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

Sample data:
enabledUser,sourcedId,orgSourcedIds,employeeType,role,familyName,givenName,username
TRUE,2420,Admin,"MICE,MIWD",District Teacher SS,Smith,Judy,email.com
TRUE,5224,MICE,,Support LTS,Jones,James,test.com
TRUE,5360,MIWD,,Teacher LTS,Alcorn,Colbye,email.com
TRUE,5361,MICR,MICR,Teacher,McNair,Brandie,emialaddr.com

I have 5000+ rows of data so I am using pandas.
df=pd.read_csv(theabovefile)

What I need is a couple lines added to my script that will merge orgSourcedIds and employeeType, but put a comma between them if there is a value in the employeeType field.

The outcome for the first row, would need to be:
TRUE,2420,"Admin,MICE,MIWD",District Teacher SS,Smith,Judy,email.com

Outcome for the 3rd row, since there is no data in the 3rd row (employeeType) would be no change from what it is:
TRUE,5360,MIWD,,Teacher LTS,Alcorn,Colbye,email.com

awarded to Jana

Crowdsource coding tasks.

1 Solution

Winning solution

Hello mikec513

Here is a method that combine two columns with a separator.

def combine_two_columns(dataframe, column_name_1, column_name_2, separator=","):
    column_names = dataframe.columns.values.tolist()
    list_of_all_columns = []
    choosen_column_1 = []
    choosen_column_2 = []
    choosen_column_1_and_choosen_column_2_merged = []
    raw_data = dict()

    for name in column_names:
        list_of_all_columns.append([name] + dataframe[name].values.tolist())

    for column in list_of_all_columns:
        if (column[0] != column_name_1 and column[0] != column_name_2):
            continue
        elif (column[0] == column_name_1):
            choosen_column_1 = column
        else:
            choosen_column_2 = column

    choosen_column_1_and_choosen_column_2_merged.append(choosen_column_1[0] + "_and_" + choosen_column_2[0])
    for value_from_column_1, value_from_column_2 in zip(choosen_column_1[1:], choosen_column_2[1:]):
        if (value_from_column_1 != None and value_from_column_2 != None):
            choosen_column_1_and_choosen_column_2_merged.append(str(value_from_column_1) + str(separator) + str(value_from_column_2))
        elif (value_from_column_1 != None and value_from_column_2 == None):
            choosen_column_1_and_choosen_column_2_merged.append(str(value_from_column_1) + str(separator))
        elif (value_from_column_1 == None and value_from_column_2 != None):
            choosen_column_1_and_choosen_column_2_merged.append(str(separator) + str(value_from_column_2))
    list_of_all_columns.append(choosen_column_1_and_choosen_column_2_merged)

    for column in list_of_all_columns:
        if (column[0] != column_name_1 and column[0] != column_name_2):
            raw_data[column[0]] = column[1:]

    return pd.DataFrame(raw_data)

The following three lines give the result you want

new_df = combine_two_columns(df, "orgSourcedIds", "employeeType")
column_names = ["enabledUser", "sourcedId", "orgSourcedIds_and_employeeType", "role", "familyName", "givenName", "username"]
new_df = new_df[column_names]

Best,
Jana

Jana, I get this, TypeError: must be str, not float and it references the line: choosencolumn1andchoosencolumn2merged.append(valuefromcolumn1 + separator + valuefromcolumn_2)
mikec513 29 days ago
I made a change to the code. My omission. Try it now and tell me if it's fine.
Jana 28 days ago
Jana, everything worked great, with one exception, it added a nan to every row where it did not have a "employeeType". Example from the 2nd row: "MISG,nan" First row was great! I will see if I can figure this out. Great work on this, you did everything I asked, I just didn't think about that nan value happening, I will figure out how to strip that out.
mikec513 28 days ago