import pandas as pd
df = pd.DataFrame([{'_id': 'Y100', 'paper_title': 'abc', 'reference': ['sdfaqdtsdf', 'sdfklmngkgg', 'fjafefgfkj']},
                   {'_id': 'Y101', 'paper_title': 'efg', 'reference': ['cdfabdtzdi', 'vjedbvjbdjk', 'efhlmnjehg']},
                   {'_id': 'Y102', 'paper_title': 'lmn', 'reference': ['zdfabdtssf', 'boblfbjbsfb', 'qwhfefqwfob']}])  # ........ (rows passed as a list of dicts)
df = df.explode('reference')  # one row per reference, as in the output below
df.set_index(['_id', 'paper_title'], inplace=True)
print(df)
Output:
_id   paper_title  reference
Y100  abc          sdfaqdtsdf
      abc          sdfklmngkgg
      abc          fjafefgfkj
Y101  efg          cdfabdtzdi
      efg          vjedbvjbdjk
      efg          efhlmnjehg
Y102  lmn          zdfabdtssf
      lmn          boblfbjbsfb
      lmn          qwhfefqwfob
Expected result:
_id   paper_title  reference                                      this_paper_presented_in
Y100  abc          ['sdfaqdtsdf', 'sdfklmngkgg', 'fjafefgfkj']    ['Y101', 'Y102']
Y101  efg          ['cdfabdtzdi', 'vjedbvjbdjk', 'efhlmnjehg']    ['Y102']
Y102  lmn          ['zdfabdtssf', 'boblfbjbsfb', 'qwhfefqwfob']   None (no other paper's title appears in this row's references)
Side notes:
- A paper's own `paper_title` never appears in its own reference list (e.g. 'abc', the title of Y100, cannot appear in Y100's references).
- `this_paper_presented_in` is the column that should hold the list of `_id`s whose `paper_title` values are present in that row's `reference` column.
The actual dataframe is here, in case anyone wants to look at the original: https://imgur.com/73t6D26
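A minimal sketch of one way to get the expected result from the sample rows. It assumes a "match" means the title occurs as a plain, case-sensitive substring of a reference string, which is what the sample data suggests (e.g. 'lmn' inside 'sdfklmngkgg'). It first collapses the exploded frame back to one row per paper, then, for each paper, collects the other `_id`s whose title occurs in its references, skipping the paper itself per the side note. `titles` and `presented_in` are illustrative names, not part of pandas.

```python
import pandas as pd

# Sample rows from the question, rebuilt in the exploded
# one-reference-per-row layout that print(df) shows.
rows = [
    {"_id": "Y100", "paper_title": "abc", "reference": ["sdfaqdtsdf", "sdfklmngkgg", "fjafefgfkj"]},
    {"_id": "Y101", "paper_title": "efg", "reference": ["cdfabdtzdi", "vjedbvjbdjk", "efhlmnjehg"]},
    {"_id": "Y102", "paper_title": "lmn", "reference": ["zdfabdtssf", "boblfbjbsfb", "qwhfefqwfob"]},
]
df = pd.DataFrame(rows).explode("reference").set_index(["_id", "paper_title"])

# Step 1: collapse back to one row per paper, gathering its references into a list.
papers = (
    df.groupby(level=["_id", "paper_title"], sort=False)["reference"]
    .agg(list)
    .to_frame()
)

# Step 2: map every _id to its paper_title (illustrative helper dict).
titles = {pid: title for pid, title in papers.index}


def presented_in(row_id, refs):
    """Illustrative helper: the _ids (other than row_id) whose title is a
    substring of at least one string in refs; None when nothing matches."""
    hits = [
        pid
        for pid, title in titles.items()
        if pid != row_id and any(title in ref for ref in refs)
    ]
    return hits or None


# Step 3: build the new column; dtype=object keeps the list/None cells as-is.
papers["this_paper_presented_in"] = pd.Series(
    [presented_in(pid, refs) for (pid, _), refs in zip(papers.index, papers["reference"])],
    index=papers.index,
    dtype=object,
)
print(papers)
```

Each paper is checked against every other title, so the work grows roughly with (number of papers)² × (references per paper); that is fine for modest frames. Because `in` is case-sensitive and also matches inside longer words, real citation strings may need lower-casing or word-boundary matching (e.g. with the `re` module) to avoid false hits.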