Untitled

a guest
Sep 22nd, 2019
# Holm-Bonferroni ('holm') correction is used for multiple hypothesis testing instead of the
# statsmodels default, Holm-Sidak ('hs'). Holm-Sidak assumes the individual tests performed are
# independent, which may not apply in this situation. Holm-Bonferroni does not make this
# assumption, but offers less statistical power.

from statsmodels.stats.multitest import multipletests


def df_p_value_calc(df):
    """Using the p-values computed for each category, determines the corrected alpha level required for a
    multi-hypothesis test and whether each p-value meets that threshold. ALL test results for a category
    must be True for statistical significance to be established.

    This function uses the Holm-Bonferroni method to address the multiplicity problem.

    Parameters: df is the dataframe containing the publications, article counts, and rates for each category.

    Output: a list of tuples, where the first element of the tuple is the category name and the second is a tuple
    containing: an array of Booleans (True if the null hypothesis is rejected in favor of the alternative, False
    if not), an array of floats (p-values corrected for multiple tests), the corrected alpha under Sidak, and the
    corrected alpha under Bonferroni.
    """

    result_list = []

    for c in df['category'].unique():
        # Select the rows belonging to the current category
        mask = df['category'] == c
        cat_df = df[mask]
        # cat_p_value_calc is defined elsewhere (not included in this paste)
        p_values = cat_p_value_calc(cat_df)
        result_list.append((c, multipletests(p_values, method='holm')))

    return result_list
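
The function above delegates the correction to statsmodels. To make the step-down logic easier to follow, here is a minimal pure-Python sketch of the Holm-Bonferroni adjustment. It is an illustration under stated assumptions, not the statsmodels implementation: the names `holm_bonferroni` and `significant_categories` are hypothetical, the example p-values are made up, and the helper only assumes that each entry of `result_list` has the (category, (reject, ...)) shape described in the docstring.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return (reject, adjusted) lists in the same order as p_values.

    Hypothetical helper for illustration only; not part of the original paste.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (rank = k - 1) is multiplied by (m - k + 1), capped at 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Step-down rule: adjusted p-values may not decrease along the sorted order.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    # A hypothesis is rejected when its adjusted p-value is at most alpha.
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted


def significant_categories(result_list):
    """Return the categories whose tests were ALL rejected (hypothetical helper)."""
    return [category for category, (reject, *_rest) in result_list if all(reject)]


# Made-up p-values, purely to show the call shape.
example_reject, example_adjusted = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
print(example_reject)
print(example_adjusted)
```

Working on adjusted p-values rather than adjusted alpha levels gives the same reject decisions and keeps the output comparable to the second element of the tuple returned by `multipletests`.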