a guest Oct 21st, 2019 68 Never
- Textual Description:
- - Clear and helpful markdown cells to explain what's going on and their reasoning.
- - The code could use more comments explaining what is happening technically. For example:
- `dfa=dfa.set_index([dfa.index,'publication year','paper title','conference']).stack().str.split(',',expand=True).stack().unstack(-2).reset_index(-1,drop=True).reset_index()`.
- is one single line carrying only one comment: "#expanding the rows by the author names"
- - Markdown cells are nicely written in clear English, match the code, and have nice formatting (bold, mono, titles, ...)
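
To illustrate the commenting point above, here is a minimal sketch of the same chain spread over several lines with one comment per step. The function name `expand_author_rows`, the `key_columns` variable, and the sample data are assumptions made for illustration; only the chained calls come from the notebook line quoted above.

```python
import pandas as pd


def expand_author_rows(dfa):
    """Return one row per (paper, author) from a comma-separated 'author names' column."""
    key_columns = ["publication year", "paper title", "conference"]
    return (
        dfa
        # Keep the original row label plus the paper metadata in the index,
        # so that 'author names' is the only remaining data column.
        .set_index([dfa.index] + key_columns)
        # Turn that single remaining column into a Series.
        .stack()
        # Split "A,B,C" into one column per author.
        .str.split(",", expand=True)
        # Move the per-author columns into the index: one row per author.
        .stack()
        # Bring the 'author names' label back as a column header.
        .unstack(-2)
        # Drop the author-position level, then restore the metadata columns.
        .reset_index(-1, drop=True)
        .reset_index()
    )


# Made-up sample data, not taken from the reviewed dataset.
papers = pd.DataFrame({
    "publication year": [2018, 2018],
    "paper title": ["Paper A", "Paper B"],
    "conference": ["aaai", "icml"],
    "author names": ["Ann Lee,Bob Roy", "Cy Kim"],
})
expanded = expand_author_rows(papers)
```

Whether papers with fewer authors leave behind empty rows can depend on the pandas version's `stack` behavior, so a `dropna` on 'author names' afterwards may be a reasonable safeguard.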
- Code Quality:
- - Some lines are simply WAY too long; executed cell 12 is a good example:
- `df.loc[(df['author names'].str.find('Sheila A. McIlraith')>-1) & (df['conference']=='aaai') & (df['publication year'] == 2018)&(df.index.values!=d1), 'author names'] = df.loc[(df['author names'].str.find('Sheila A. McIlraith')>-1) & (df['conference']=='aaai') & (df['publication year'] == 2018)&(df.index.values!=d1), 'author names'].apply(remove_authorS)`
- This is ONE line with no line breaks. Most plots are also written in ONE line. Note: you can break lines between chained method calls to make the code more readable, and matplotlib calls can be spread over several lines.
- - Some code duplication (executed cells 11 and 12). Modularize as much as possible and reuse precomputed values instead of recomputing them.
- - Appending the web-scraped papers to the dataset is done awkwardly: one dict per year, each concatenated one by one, with neither a for loop nor a better call to `from_dict`. This could have been done in one line, more cleanly.
- - Some names are poorly chosen. It is hard to always find meaningful names, but the dataset is suddenly called 'df56'. Why 56?
- - Overall, most cells are well written and easy to read; the lines make sense and chain in a meaningful way.
- - Good use of pandas and regexes: instead of writing slow custom functions to get the job done in a dirty way, they looked into how to make pandas do it efficiently in one line!
- - Impressive, proactive cleaning. For the authors, they did some research and found one convincing explanation for why some authors have such a high number of authors, then cleaned the dataset accordingly (the Satinder P. Singh and Sheila A. McIlraith aaai papers).
- - Some graphs are scrollable for some reason, which makes them impractical to read, as you have to scroll up and down to see the whole thing.
- - The graph of 'Number of papers per author' could have been sorted from the authors with the most papers to those with the fewest, to give a better idea of what the distribution looks like.
- - The results are convincing and the graphs are clear, although some axis labels are missing (most of the time the y-axis is only described in the title, and sometimes even the x-axis label is missing).
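
As a rough sketch of the line-length and duplication points about executed cell 12, the row filter can be computed once, stored under a name, and reused on both sides of the assignment. Several things here are assumptions: `strip_author` is a hypothetical stand-in for the notebook's `remove_authorS` (whose body is not shown in the review), `keep_row` plays the role of `d1`, the sample rows are made up, and `str.contains(..., regex=False)` is swapped in for `str.find(...) > -1`.

```python
import pandas as pd


def strip_author(names, author):
    """Hypothetical stand-in for the notebook's remove_authorS helper."""
    kept = [name for name in names.split(",") if name.strip() != author]
    return ",".join(kept)


# Made-up sample data, not taken from the reviewed dataset.
df = pd.DataFrame({
    "author names": [
        "Sheila A. McIlraith,Ann Lee",
        "Sheila A. McIlraith,Bob Roy",
        "Sheila A. McIlraith,Cy Kim",
        "Sheila A. McIlraith,Dee Poe",
    ],
    "conference": ["aaai", "aaai", "icml", "aaai"],
    "publication year": [2018, 2018, 2018, 2017],
})
keep_row = 0  # row label to leave untouched (assumed to match the role of d1)

# Build the boolean mask once, one condition per line.
mask = (
    df["author names"].str.contains("Sheila A. McIlraith", regex=False)
    & (df["conference"] == "aaai")
    & (df["publication year"] == 2018)
    & (df.index != keep_row)
)

# Reuse the mask on both sides instead of repeating the whole filter.
df.loc[mask, "author names"] = df.loc[mask, "author names"].apply(
    strip_author, author="Sheila A. McIlraith"
)
```

Naming the mask also makes it easy to inspect which rows are about to change (for example with `df[mask]`) before overwriting anything.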