updating pandas dataframes conditionally
New here? Learn about Bountify and follow @bountify to get notified of new bounties! x

i have 2 dataframes: df_Full, and df_Val - they share a common column id_loc

i want to create a new column in df_Full and set each row to the value from df_Val based on matching id_loc (there are multiple times this id will appear in df_Full)

example structure:

i feel more of a lookup functionality to retrieve the values, than a merge per say will provide the end result i am looking for?

awarded to kerncy

Crowdsource coding tasks.

3 Solutions

Winning solution


you can try the following command :

df_Full.merge(df_Val, how='left', left_on='id_loc', right_on='id_loc')

If data id_loc is not found in the right table, it will result in a NaN entry.
You can have more information on this function using this link

edit : it will copy all the column of df_Val to the resulting dataframe

thanks for the reply, as you mention, this seems to result in NaN's, the functionality i am looking for is more a lookup value, than a merge i would say, i will update original post with a diagram to hopefully visualize better my goal - in case you have any suggestions?
weaverk over 2 years ago
i've Seen your sample and it matches my solution. The NaN is only for values that are not in the df_Val. If you don't want to keep thèm, replace the left by inner in second parameter
kerncy over 2 years ago
thank you!
weaverk over 2 years ago

You can simply use a join

as i understand it, .join uses the merge internally so i am not sure if it will arrive at my outcome, i have edited the question to hopefully make it clearer, thanks for suggestion!
weaverk over 2 years ago

The below code works for me.

import pandas as pd
d1 = {'id_loc': [1, 1, 200, 45, 1], 'var1': [1, 2, 3, 4, 5], 'var2': ['a', 'b', 'c', 'd', 'e']}
d2 = {'id_loc': [1, 2, 45, 35, 100, 200, 250], 'var3': ['x', 'v', 'z', 'f', 'j', 'y', 'p']}
df_Full = pd.DataFrame(d1)
df_Var = pd.DataFrame(d2)
df_Full['var3'] = 0
for index, row in df_Full.iterrows():
    temp = (df_Var.loc[df_Var['id_loc'] == row.id_loc]) #row in df_Var matching id_loc from df_Full
    df_Full['var3'].iloc[index] = temp['var3'].values[0]  #get the var3 value
View Timeline