elena1234

violin and scatter plot with stratification, frequency table and FacetGrid

May 12th, 2022 (edited)
Python 3.39 KB
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

plt.style.available  # list of the style names accepted by plt.style.use
plt.style.use("seaborn")  # how to choose the style (named "seaborn-v0_8" on Matplotlib 3.6+)

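#############################################################
# titanic is assumed to be a DataFrame with 'age', 'fare' and 'survived' columns,
# e.g. titanic = sns.load_dataset("titanic")
titanic.plot(kind = 'scatter', figsize = (15,8), x = 'age', y = 'fare', c = 'survived', marker = 'x', s = 20, colormap = 'viridis')
plt.show()
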
#################################
# cars is assumed to be a DataFrame with 'horsepower', 'mpg' and 'cylinders' columns,
# e.g. cars = sns.load_dataset("mpg")
cars.plot(kind = 'scatter', x = 'horsepower', y = 'mpg', figsize = (12,8), c = 'cylinders', marker = 'x', colormap = 'viridis')
plt.title('Horsepower vs MPG', fontsize = 18)
plt.xlabel("horsepower", fontsize = 15)
plt.ylabel("mpg", fontsize = 15)
plt.show()

# df is assumed to be a DataFrame with 'Length' and 'Height' columns
plt.scatter(df["Length"], df["Height"], marker = "D")
plt.title("Relationship between Length and Height")
plt.show()

#################################
da = pd.read_csv(
    "C:/Users/eli/Desktop/YtPruboBEemdqA7UJJ_tgg_63e179e3722f4ef783f58ff6e395feb7_nhanes_2015_2016.csv")

'''
Question 1
Make a scatterplot showing the relationship between the first and second measurements of diastolic blood pressure (BPXDI1 and BPXDI2).
Also obtain the 4x4 matrix of correlation coefficients among the first two systolic and the first two diastolic blood pressure measures.
'''
sns.scatterplot(data=da, x="BPXDI1", y="BPXDI2", alpha=0.3)
# Most of the points fall between 40 and 100 on both BPXDI1 and BPXDI2

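# Use every row and all four blood pressure columns. da.loc[:1, ["BPXDI1", "BPXDI2"]]
# keeps only the first two rows, for which the 2x2 result is trivially all 1.0
df = da[["BPXSY1", "BPXSY2", "BPXDI1", "BPXDI2"]]
df.corr()  # 4x4 matrix of pairwise correlation coefficients

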
'''
Question 2
Construct a grid of scatterplots between the first systolic and the first diastolic blood pressure measurement.
Stratify the plots by gender (rows) and by race/ethnicity groups (columns).
'''
da["RIAGENDRx"] = da.RIAGENDR.replace({1: "Male", 2: "Female"})
# Rows use the labelled gender column; x is the first systolic and y the first diastolic measurement
sns.FacetGrid(da, row="RIAGENDRx", col="RIDRETH1").map(
    plt.scatter, "BPXSY1", "BPXDI1", alpha=0.4).add_legend()


'''
Question 3

Use "violin plots" to compare the distributions of ages within groups defined by gender and educational attainment.
'''
sns.FacetGrid(da, row="RIAGENDR", col="DMDEDUC2").map(
    sns.violinplot, "RIDAGEYR", alpha=0.4).add_legend()


'''
Question 4

Use violin plots to compare the distributions of BMI within a series of 10-year age bands. Also stratify these plots by gender.
'''
da["agegroup"] = pd.cut(da.RIDAGEYR, [10, 20, 30, 40, 50, 60, 70, 80, 90, 100])

sns.FacetGrid(da, row="RIAGENDR", col="agegroup").map(
    sns.violinplot, "BMXBMI", alpha=0.4).add_legend()


'''
Question 5

Construct a frequency table for the joint distribution of ethnicity groups (RIDRETH1) and health-insurance status (HIQ210).
Normalize the results so that the values within each ethnic group are proportions that sum to 1.
'''
x = pd.crosstab(da.RIDRETH1, da.HIQ210)
x.apply(lambda z: z/z.sum(), axis=1)  # row-wise proportions; same as pd.crosstab(..., normalize="index")

#########################################################
# FacetGrid
''' Create a histogram of the ages grouped by cholesterol levels.
The plots show that most people have normal cholesterol levels. '''
# df here is assumed to be a DataFrame with 'age' and 'cholesterol' columns
df['cholesterol'] = df['cholesterol'].replace({1: "normal", 2: "above normal",
                                               3: "well above normal"})
df_age_chol = df[['age', 'cholesterol']]
g = sns.FacetGrid(df_age_chol, row = 'cholesterol', height = 5, aspect = 3)
g = g.map(plt.hist, 'age')
plt.show()
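
Question 4 above bins RIDAGEYR into 10-year bands with pd.cut. The following is a minimal sketch of how such bands behave, run on a small made-up Series rather than the NHANES file (the names `ages`, `bands` and `counts` and all the values are illustrative assumptions). By default each interval is open on the left and closed on the right, so an age of exactly 20 lands in (10, 20], and a value sitting on the lowest edge gets no band at all.

```python
import pandas as pd

# Made-up ages for illustration only (not taken from the NHANES file).
ages = pd.Series([10, 18, 20, 21, 45, 80])

# Same style of bin edges as in Question 4, shortened for the example.
bands = pd.cut(ages, [10, 20, 30, 40, 50, 60, 70, 80])

# Each value is labelled with its interval. The age 10 lies on the open
# left edge of the first bin, so it is not assigned to any band (NaN).
print(bands)

# Count how many ages fall into each band; missing labels are not counted.
counts = bands.value_counts(sort=False)
print(counts)
```

If the lowest edge should be included, pd.cut accepts include_lowest=True, and right=False switches the bands to closed-left intervals such as [10, 20).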
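
Question 5 above normalizes the crosstab by hand with apply. As a hedged sketch, the same row-wise proportions can be compared against the normalize="index" option of pd.crosstab on a tiny invented table (the DataFrame `toy` and its values are assumptions for illustration; only the column names mirror the NHANES variables).

```python
import pandas as pd

# Made-up records for illustration only; the column names mirror the
# NHANES variables used in Question 5, but the values are invented.
toy = pd.DataFrame({
    "RIDRETH1": [1, 1, 1, 2, 2, 3, 3, 3, 3],
    "HIQ210":   [1, 2, 2, 1, 1, 2, 2, 2, 1],
})

# Manual row normalization, as in the paste: divide each row by its total.
counts = pd.crosstab(toy.RIDRETH1, toy.HIQ210)
manual = counts.apply(lambda z: z / z.sum(), axis=1)

# Built-in equivalent: normalize="index" divides each row by its total.
builtin = pd.crosstab(toy.RIDRETH1, toy.HIQ210, normalize="index")

print(builtin)
```

Both tables should agree, with every row summing to 1. Using normalize="columns" instead would give proportions within each insurance status rather than within each ethnic group.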