from statsmodels.stats.multitest import multipletests

# The Holm-Bonferroni ('holm') correction is used for multiple hypothesis testing instead of the
# statsmodels default, Holm-Sidak ('hs'). Holm-Sidak assumes the individual tests are independent,
# which may not apply in this situation. Holm-Bonferroni does not make this assumption, but it
# offers less statistical power.
def df_p_value_calc(df):
    """Performs the A/B test for each category with a multiple-testing correction.

    Using the p-values provided by cat_p_value_calc, determines the alpha level required for a
    multi-hypothesis test and whether each p-value meets that threshold. ALL test results for a
    category must be True for statistical significance to be established.

    This function uses the Holm-Bonferroni method to address the multiplicity problem.

    Parameters: df is the dataframe containing the publications, article counts, and rates for
    each category.

    Output: a list of tuples, where the first element of each tuple is the category name and the
    second is a tuple containing: an array of Booleans (True if the null hypothesis is rejected in
    favor of the alternative, False if not), an array of floats (p-values corrected for multiple
    tests), the corrected alpha under Sidak, and the corrected alpha under Bonferroni.
    """
    result_list = []
    for c in df['category'].unique():
        # Select only the rows belonging to the current category.
        mask = df['category'] == c
        cat_df = df[mask]
        # cat_p_value_calc is not shown in this paste.
        p_values = cat_p_value_calc(cat_df)
        result_list.append((c, multipletests(p_values, method='holm')))
    return result_list
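
To make the correction itself concrete, here is a minimal sketch of the Holm-Bonferroni step-down procedure in plain Python. The helper name `holm_bonferroni`, the default `alpha=0.05`, and the sample p-values are assumptions for illustration only; they are not part of the paste above or of statsmodels. It should roughly mirror the first two elements that `multipletests(p_values, method='holm')` returns.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Hypothetical helper: return (reject, adjusted) in the original order of p_values."""
    m = len(p_values)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The smallest p-value is scaled by m, the next by m - 1, and so on, capped at 1.
        scaled = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: an adjusted p-value never drops below an earlier one.
        running_max = max(running_max, scaled)
        adjusted[idx] = running_max
    # A hypothesis is rejected when its adjusted p-value does not exceed alpha.
    reject = [p <= alpha for p in adjusted]
    return reject, adjusted


# Made-up p-values, purely for illustration.
reject, adjusted = holm_bonferroni([0.01, 0.04, 0.03])
# Mirrors the docstring's rule that ALL results must be True for significance.
all_significant = all(reject)
```

With these sample values only the smallest p-value survives the correction, so `all_significant` is False.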