Advertisement
Guest User

Untitled

a guest
Sep 21st, 2019
89
0
Never
Not a member of Pastebin yet? Sign Up, it unlocks many cool features!
text 0.94 KB | None | 0 0
  1. def explode_hstore(df, column):
  2. """Explode a column of a dataframe containing PostgreSQL hstore k-v pairs in the format of:
  3.  
  4. "foo"=>"bar", "baz"=>"quux", ...
  5.  
  6. Every key becomes a column. If a given row's pairs doesn't have a key, then the resulting column's value
  7. will contain NaN.
  8. """
  9. # split the tags column out as a new series, and break up each k=>v pair
  10. s = df[column].str.split(', ').apply(pd.Series, 1).stack()
  11.  
  12. # extract the keys and values into columns
  13. extracted = s.str.extract(r'"(?P<key>[\w-]+)"=>"(?P<val>[\w-]+)"', re.IGNORECASE)
  14.  
  15. # toss the unnecessary multi index that is generated in the process
  16. extracted.index = extracted.index.droplevel(1)
  17.  
  18. # pivot the table make the rows in keys become columns
  19. pivoted = extracted.pivot_table(values='val', index=extracted.index, columns='key', aggfunc='first')
  20.  
  21. # join with the original table and return
  22. return df.join(pivoted)
Advertisement
Add Comment
Please, Sign In to add comment
Advertisement