The scientific method:
•	Results based on objective & systematic observations
•	Purpose: Establish theories based on empirical observations to explain phenomena
•	Empirical validity
•	Can be disproven empirically
•	Non-normative information
•	Accumulated knowledge
•	Explains why and how a phenomenon occurs
•	Causal explanation: X leads to Y.
•	4 criteria for establishing causality: covariance (a change in X goes with a change in Y), removal of false (spurious) relations, establishment of order in time (X comes before Y), development of theory (a mechanism linking X to Y)
•	Prediction
•	Probabilistic
•	Parsimony: Simpler is more likely to be correct
•	Occam’s Razor: among competing hypotheses that are equal in other respects, select the one that makes the fewest new assumptions

Structure:
•	Research proposal (how, empiric, scientists)
•	Theory: Claims that handle information and explain phenomena; self-evident basic terms, assumptions, definition of the main terms, claims that can be disproven; mid-range vs broad-range theories (mid-range: specific empirical phenomena and not broad concepts)
•	Research Proposal (tentative hypothesis linking two phenomena, analyzed through empirical data) > Research (empirical test to verify it: answers the proposal, includes methodology, collecting data, and analysis of data) > Adjustments & broadening (rejection/acceptance)
•	Research proposal: empirical claim, general phenomenon, logical, specific (what category of X will affect what range of Y), related to how it will be measured, can be tested, not tautological (X and Y must not be measured the same way)
Structure II:
•	Causal Theory > Hypothesis > Empirical Test > Evaluation of Hypothesis > Evaluation of Causal Theory > Scientific Knowledge
Techniques:
•	Deductive: Logical premises and universal generalization. All politicians are X, dude is a politician, dude is X
•	Inductive:  Reasoning from empirical observations to support the theory
Problematic as science:
•	Practical: hard to quantify subjects, political behavior is complex, subjective, abstract, hard to gather data
•	Philosophical: political behavior is subjective, facts are constructed
A theory: symbols with a logical connection that represent our beliefs in what happens in the world, provides a connection between two variables or more, theory>assumptions
Variables:
•	Dependent
•	Independent
•	Antecedent (before X)
•	Intervening (between X and Y, depends on X, explains Y)
•	Nominal definition:  dictionary definition, definition through other terms, positive over negative definition, from literature
•	Operational definition: connects the empirical with the theoretical.  An observable phenomenon represents an abstract concept.
Measurements:
•	Nominal (categories with no order)
•	Ordinal (ordered categories)
•	Interval (0 doesn’t mean absence)
•	Ratio (0 means absence, can measure ratio difference)

Conceptualization:
•	Clear, exact, informative concepts
•	Concept traveling: does the concept still work if you change the field (e.g. a different country)
•	Concept stretching: a concept can be stretched to cover new cases, but at the cost of losing meaning

Level of analysis:
•	Individual
•	Groups
•	Institutions
Ecological fallacy:
•	Drawing conclusions at a different level of analysis than the one presented by our data, e.g. falsely attributing traits to individuals based on the group they belong to
Reliability & Precision:
•	Am I measuring my thing consistently?
•	High if the difference between repeated tests is small
•	Test-retest (same results?)
•	Alternative parallel forms:  two different FORMS of measurements to cross-reference the answer (quiz then phone quiz)
•	Split halves method: splitting the same questionnaire into two parts


Validity:
•	Am I measuring what I think I am?
•	High if the distance between the measurements and the true value is small
•	Exclusivity test: don’t involve other subjects beyond the change of the variable. Exhaustiveness test: cover everything about it
•	4 tests: face validity (can it be doubted), content validity (does it cover everything), construct validity (does it match up with other measurements), inter-item (does checking the same term give similar results through different measurements)
Theories:
•	Must be refutable (Falsifiability)
•	No self-contradictions
•	Concreteness (no abstract shit)
•	Generalized as much as possible
•	Parsimony
•	Leverage: explain the most with the least variables
•	Avoid endogeneity, don’t pick cases based on the result variable (Y)

Causality:
•	More X = more/less Y on average
•	4 complications:  bad theory, covariance, timeline, alternative theories
•	Counterfactual (hospital makes you sick example). Solution: take two exact situations except for X
External/Internal Validity:
•	Internal: did my research prove it
•	External: is it applicable to society at large
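•	Internal validity loss: history (things that happened during the research), maturation, specific group loss (experimental mortality), instrumentation (changes in the measuring tool), testing itself, group selection (on purpose/not), pleasers (subjects who try to please the researcher)
•	How to fight group selection: randomization, matching
•	Trade-off: designs that raise internal validity tend to lower external validity, and vice versa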


Studies:
•	Classic study/experiment, observation (questionnaires)
•	Randomized Controlled Experiment: two groups (with/without), randomized, treatment is controlled, environment is controlled, checks before/after (high internal low external)
•	Post-test design (1 test after), repeated measurement (before/after), multi-group design
•	Natural Experiment: just observe, high external low internal
•	Non-experimental/observational: high external low internal
•	Small N designs: comparative case study. Has a small N. Problem: not enough cases, outliers, no generalization. Need a strategic pick of cases.
•	Cross Sectional: quantifying X and Y at the same point in time. Too few variables/observations. Hard to establish the timeline. More external than observable.
•	Time series: the same thing analyzed at different points in time. Easy to link X to Y because it’s the same unit being observed. Easy to organize the order of events.
•	Panel: repeated checks over time on the same sample
Selection Bias:
•	Problem: endogeneity
•	Randomization: with a large N the selection is not related to the parameters; with a small N it could create selection bias
•	Happens when we do not select randomly and the sample pool is not representative. Results from endogeneity and from selecting according to Y. DON’T SELECT ACCORDING TO Y!
Most similar:
•	Similar in the control variables, different in the main variable
•	Most different: same Y, rest is different. Problem: no change in Y, can’t base causation
•	If doing a large N, take a random representative pool
•	If doing a small N, take a strategic pick through most similar
Observations:
•	Direct/indirect, participatory/not, overt/covert, structured/unstructured
•	Physical imprints: Erosion measures (analyze the natural wear on something to determine how much it was used), accretion measures (collection of remains left behind by people which indicate a specific behavior)
•	Archives
•	Participatory: the researcher participates in the group
•	Overt/covert (subjects aware/unaware of the test)
•	Structured/unstructured (specific behavior is noted vs all behavior is noted)
Descriptive Statistics:
•	Central tendency (mode/median/mean), dispersion (range, interquartile range, mean absolute deviation, variance, standard dev)
•	Variance: to explain the change in our Y. More dispersion around the mean > bigger variance. All equal = zero variance.
•	Standard Dev: Square root of variance. For a sample: divide by n-1 instead of n
•	Frequency Distribution: a table that indicates the number of observations for each value of a variable. Can include relative frequency, percentage, and cumulative percentage.
Boxplot:
•	Can see min, max, q1, q3, median, IQR, outliers
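A small Python sketch of the descriptive measures above, using only the standard library. The scores are made-up numbers for illustration, and the quartiles follow Python's default rule, which can differ slightly from other software.

```python
import statistics

# Hypothetical sample of scores (illustration only, not real data).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

# Central tendency
mean = statistics.mean(scores)
median = statistics.median(scores)
mode = statistics.mode(scores)

# Dispersion
value_range = max(scores) - min(scores)

# Sample variance and standard dev divide by n - 1.
sample_var = statistics.variance(scores)
sample_sd = statistics.stdev(scores)

# Population variance and standard dev divide by n.
pop_var = statistics.pvariance(scores)
pop_sd = statistics.pstdev(scores)

# Quartiles (what a boxplot draws); IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(scores, n=4)
iqr = q3 - q1

print(mean, median, mode, value_range, sample_var, pop_var, iqr)
```

If every score were equal, both variances would come out as zero, which matches the "all equal = zero variance" rule.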


Stats:
•	Expected value: average of the statistic from an infinite number of samples
•	Central Limit Theorem: When the sample is big enough, the sampling distribution of the mean will be approximately normal, without being dependent on the distribution in the population. Its average will be μ, its standard dev will be σ/√n.
•	Normal distribution: mean=mode=median, the shape the Central Limit Theorem leads to, 68% within 1 std, 95% within 2, 99.7% within 3
•	Standard Normal Distribution: mean 0, standard dev and variance of 1
•	Z score: the number of stds one observation is removed from the average, z = (x - μ)/σ
•	Statistical inference: either hypothesis testing (confirm/debunk hypothesis through the theory of probability) or point and interval estimates
•	Hypothesis testing: take the assumptions and make them into a statistical declaration, and check them through probability. Compare the hypothesis to the sample.
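A rough simulation of the Central Limit Theorem and the z score, as a sketch only. The uniform population, the sample size of 50 and the 2000 repetitions are all assumptions picked for illustration.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

n = 50              # size of each sample (assumed)
num_samples = 2000  # how many samples we draw (assumed)

# Population: uniform on [0, 1), clearly not normal.
mu = 0.5
sigma = math.sqrt(1 / 12)

# Draw many samples and keep the mean of each one.
sample_means = [
    statistics.mean([random.random() for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.mean(sample_means)  # should be close to mu
sd_of_means = statistics.stdev(sample_means)   # should be close to sigma / sqrt(n)
standard_error = sigma / math.sqrt(n)


def z_score(x, mean, sd):
    """Number of standard devs that x lies from the mean."""
    return (x - mean) / sd


# Share of sample means within 1 standard error of mu (roughly 68% if normal).
share_within_1 = sum(
    abs(z_score(m, mu, standard_error)) <= 1 for m in sample_means
) / num_samples

print(mean_of_means, sd_of_means, standard_error, share_within_1)
```

The population here is flat, yet the means pile up around μ in a roughly normal shape, which is the point of the theorem.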
Hypothesis testing:
•	Test H0 (the null hypothesis) instead of our own assumption
•	Goal: reject H0
•	Identify the statistic that is relevant (Average?)
•	Determine the sampling distribution
•	Decision Rule: significance level, usually 5%
•	Critical Region: the area that is very unlikely under H0, therefore having a value there means rejecting H0. Critical Values: define the rejection areas.
•	Check the observed test statistic against the critical values. Check the P value too: reject H0 if it is below the significance level.
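The steps above can be sketched as a two-sided z test for a mean. All the numbers (H0 mean of 100, known standard dev of 15, sample of 36 with mean 106) are hypothetical, and a known population standard dev is an assumption that real data rarely satisfies.

```python
import math
from statistics import NormalDist

# Hypothetical numbers for illustration only.
mu_0 = 100         # value of the mean claimed by H0
sigma = 15         # population standard dev, assumed known
n = 36             # sample size
sample_mean = 106  # the observed statistic

alpha = 0.05       # decision rule: 5% significance level

# Sampling distribution of the mean under H0: normal, sd = sigma / sqrt(n).
standard_error = sigma / math.sqrt(n)
z_observed = (sample_mean - mu_0) / standard_error

standard_normal = NormalDist()  # mean 0, standard dev 1

# Two-sided test: critical values are +/- this number.
critical_value = standard_normal.inv_cdf(1 - alpha / 2)

# P value: probability of a statistic at least this extreme if H0 is true.
p_value = 2 * (1 - standard_normal.cdf(abs(z_observed)))

reject_h0 = abs(z_observed) > critical_value

print(z_observed, critical_value, p_value, reject_h0)
```

Comparing the observed z to the critical value and comparing the P value to alpha always give the same decision.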
Significance level:
•	The probability of committing a type 1 mistake (rejecting H0 when it is actually true).
•	Confidence interval: if we took infinite samples and built a 95% confidence interval from each, 95% of them would include the parameter.
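A simulation sketch of what the 95% in a confidence interval means. The population (normal, mean 50, standard dev 10), the sample size and the number of repetitions are assumptions for illustration, and the standard dev is treated as known to keep the interval formula simple.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)  # fixed seed so the run is repeatable

mu, sigma = 50, 10  # hypothetical population parameters
n = 30              # sample size (assumed)
num_samples = 2000  # number of repeated samples (assumed)

# 95% interval: sample mean +/- z * sigma / sqrt(n), with z about 1.96.
z = NormalDist().inv_cdf(0.975)
margin = z * sigma / math.sqrt(n)

covered = 0
for _ in range(num_samples):
    sample = [random.gauss(mu, sigma) for _ in range(n)]
    m = mean(sample)
    # Does this sample's interval include the true parameter?
    if m - margin <= mu <= m + margin:
        covered += 1

coverage = covered / num_samples
print(coverage)
```

Each single interval either contains μ or it does not; the 95% describes the procedure over many samples.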
Cross-tabs:
•	Cross-tabs are for nominal/ordinal variables; for ratio + ratio/categorical we use linear regression
•	Marginal probability: the probability of a case having X=A, P(X=A), or of having Y=R, P(Y=R). Joint probability: the probability of a case having both X=A and Y=R
•	Statistically independent if P(X=A and Y=R) = P(X=A)*P(Y=R)
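A minimal sketch of marginal and joint probabilities from a cross-tab, with the independence check. The counts and the category labels A, B, R, S are invented so that the table comes out exactly independent.

```python
import math

# Hypothetical cross-tab counts: keys are (X category, Y category).
counts = {
    ("A", "R"): 30, ("A", "S"): 20,
    ("B", "R"): 30, ("B", "S"): 20,
}
total = sum(counts.values())


def marginal_x(x):
    """P(X = x): share of all cases in that X category."""
    return sum(c for (xi, _), c in counts.items() if xi == x) / total


def marginal_y(y):
    """P(Y = y): share of all cases in that Y category."""
    return sum(c for (_, yi), c in counts.items() if yi == y) / total


def joint(x, y):
    """P(X = x and Y = y): share of all cases in that cell."""
    return counts[(x, y)] / total


def is_independent():
    """True if every cell satisfies joint = product of the marginals."""
    return all(
        math.isclose(joint(x, y), marginal_x(x) * marginal_y(y))
        for (x, y) in counts
    )


print(marginal_x("A"), marginal_y("R"), joint("A", "R"), is_independent())
```

Real samples almost never match the product rule exactly, which is why a significance test is needed instead of an exact comparison.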
Chi Square:
•	A test for statistical independence. H0 = independent. The stronger the dependence, the bigger the statistic and the more significant the result.
•	ALWAYS RIGHT SIDE (right-tailed test)!
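A hand-rolled sketch of the chi square statistic for a 2x2 cross-tab. The observed counts are hypothetical, the critical value 3.841 is the usual table value for 5% and 1 degree of freedom, and the P value shortcut at the end only works for 1 degree of freedom.

```python
from statistics import NormalDist

# Hypothetical observed counts: rows = categories of X, columns = categories of Y.
observed = [
    [30, 10],
    [20, 40],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under H0 (independence): row total * column total / N.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Chi square: sum over all cells of (observed - expected)^2 / expected.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tailed test: reject H0 only when the statistic is large.
CRITICAL_5_PERCENT_DF1 = 3.841
reject_h0 = chi_square > CRITICAL_5_PERCENT_DF1

# Shortcut valid only for df = 1, where chi square is a squared standard normal.
p_value = 2 * (1 - NormalDist().cdf(chi_square ** 0.5))

print(chi_square, df, reject_h0, p_value)
```

The statistic can never be negative because every term is squared, which is why only the right side of the distribution counts.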
Kendall association:
•	Pairs of cases: Concordant (Gus is higher than John on both variables), Discordant (Gus is higher on one variable but John is higher on the other), Tied (Gus = John on a variable)
•	Gamma: (concordant - discordant) / (concordant + discordant), ignoring the tied pairs
•	More concordant is positive, more discordant is negative. Range: -1 to 1. 0 means equal numbers of each.
•	Tau B: also counts the tied cases, which pulls the value toward 0
•	Tau C: intended for tables where the two variables have a different number of categories
•	PRE: Proportional Reduction of Error. For when you have NOMINAL variables that therefore can’t be ordered. Measures how much the error of guessing category B is reduced when knowing category A. 0 to 1 (no correlation/max correlation)
•	PRE: Lambda. First, guess the mode of B for every case and count the errors (least errors possible without A). Then guess the mode of B within each category of A and count the errors again. Lambda = (errors without A - errors with A) / errors without A.
Lambda is for nominal; for ordinal use Tau B, Tau C, Gamma
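A sketch of Gamma (pair counting) and Lambda (error counting) in plain Python. The function names, the six ordinal cases and the North/South by Left/Right table are all made up for illustration.

```python
from itertools import combinations


def goodman_kruskal_gamma(cases):
    """Gamma from a list of (x, y) ordinal scores, one tuple per case."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(cases, 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1   # same order on both variables
        elif product < 0:
            discordant += 1   # opposite order on the two variables
        # product == 0: tied on X or Y, ignored by gamma
    # Undefined (division by zero) if every pair is tied.
    return (concordant - discordant) / (concordant + discordant)


def lambda_pre(table):
    """Lambda for predicting B from A; table maps A category -> {B category: count}."""
    b_totals = {}
    for row in table.values():
        for b, count in row.items():
            b_totals[b] = b_totals.get(b, 0) + count
    n = sum(b_totals.values())
    # Errors when always guessing the overall mode of B.
    errors_without_a = n - max(b_totals.values())
    # Errors when guessing the mode of B inside each category of A.
    errors_with_a = sum(
        sum(row.values()) - max(row.values()) for row in table.values()
    )
    return (errors_without_a - errors_with_a) / errors_without_a


ordinal_cases = [(1, 1), (1, 2), (2, 2), (2, 3), (3, 1), (3, 3)]
gamma = goodman_kruskal_gamma(ordinal_cases)  # 6 concordant, 3 discordant

nominal_table = {
    "North": {"Left": 30, "Right": 10},
    "South": {"Left": 15, "Right": 45},
}
lam = lambda_pre(nominal_table)  # errors drop from 45 to 25

print(gamma, lam)
```

Six of the fifteen pairs in the ordinal example are tied and simply drop out of Gamma, which is the part Tau B handles differently.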

Pearson:
•	Checks for linear correlation between two variables; it is the covariance standardized by the two standard devs, range -1 to 1
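A minimal sketch of Pearson's r computed by hand. The function name and the five data points are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the standard devs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squares around the means.
    # The n - 1 of the covariance and of the standard devs cancels out.
    cross = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return cross / math.sqrt(ss_x * ss_y)


# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = pearson_r(xs, ys)
print(r)
```

A perfectly straight upward line gives 1, a perfectly straight downward line gives -1, and the function fails with a division by zero if either variable has no variance.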
Regression:
•	Least squares principle. The line passes where the sum of the SQUARED vertical distances is the smallest
•	R square: % of variance in Y explained by X. R: Pearson, shows how tightly the points spread around the regression line.
•	Positive error: the observed Y > expected Y
•	Negative error: observed y < expected Y
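A sketch of a one-variable least squares fit, with the errors and R square worked out by hand. The function name and the data are hypothetical.

```python
def least_squares(xs, ys):
    """Return (intercept, slope) of the line minimizing the squared vertical distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

intercept, slope = least_squares(xs, ys)
predicted = [intercept + slope * x for x in xs]

# Error = observed Y - expected Y (positive when the point is above the line).
errors = [y - p for y, p in zip(ys, predicted)]

mean_y = sum(ys) / len(ys)
ss_error = sum(e ** 2 for e in errors)          # unexplained variation
ss_total = sum((y - mean_y) ** 2 for y in ys)   # total variation in Y
r_squared = 1 - ss_error / ss_total

print(intercept, slope, errors, r_squared)
```

The positive and negative errors cancel out to zero around the fitted line, which is why the squared distances are what gets minimized.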
Control Variable:
•	Is there a variable Z related to X and Y that can explain the covariance?
•	In experiments: split into two groups (test and control group). Non-experiment/simulation: statistical methods.
•	Non-interfering, interfering (external false, mediating, conditioning).
•	External false (spurious): Z drives both X and Y, which disproves the X and Y connection
•	Mediating: the connection was true but indirect, better explained through a middle connection
•	Conditional: X and Y are connected causally but only under certain conditions
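One possible statistical method for controlling Z is the first-order partial correlation, sketched below. This is an assumption about which method to use, not something these notes specify, and the correlation values are invented.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between X and Y after removing the linear effect of Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical case 1: Z is strongly related to both X and Y.
# The X-Y correlation vanishes once Z is controlled (external false).
spurious = partial_correlation(r_xy=0.48, r_xz=0.8, r_yz=0.6)

# Hypothetical case 2: Z is only weakly related to X and Y.
# The X-Y correlation barely changes (Z does not explain it).
robust = partial_correlation(r_xy=0.5, r_xz=0.2, r_yz=0.1)

print(spurious, robust)
```

A partial correlation near zero fits both the external false and the mediating pattern; telling them apart depends on whether Z comes before X or between X and Y, which is a theory question rather than a statistical one.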