Designing Samples (excerpts from http://www.mshipke.com/teachers/)
Suppose we want to gather information about a group of people.
- If the group is small we can study each group member directly.
- If, however, the group is very large, studying each member of the group may not be feasible.
As an alternative, we can select a smaller group of people who fairly represent the entire group.
The entire group of individuals that we want information about is called the population.
The part of the population in the study is called the sample.
The method we use to select the sample is called the sample design. The design of the sample is very important. If the design is poor, the sample will not accurately represent the population.
Types of Sample Designs:
Voluntary Response Sample
- People choose themselves to be in the sample by responding to a general appeal
Problem: People with strong opinions (often strong negative opinions) tend to reply, so they are over-represented
Convenience Sample
- Individuals who are easiest to reach are chosen for the sample
Problem: This group may not be diverse enough to accurately represent the population (for example, all students in a school).
Both Voluntary Response Samples and Convenience Samples result in a sample that is not representative of the population. These are biased samples because they favor certain outcomes.
Random selection eliminates bias from sample choice.
Simple Random Sample (SRS)
- Individuals are selected so that all possible combinations of individuals are equally likely to be in the sample
- Example: Generate a list of student ID numbers for all students; then randomly select student ID numbers and choose those students for the sample
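The student ID example above can be sketched in Python. This is only an illustration under stated assumptions: the roster of ID numbers 1001 to 1500, the sample size of 25, and the function name simple_random_sample are all made up. The standard library's random.sample draws without replacement, so every group of n IDs is equally likely to be chosen.

```python
import random

def simple_random_sample(student_ids, n, seed=None):
    """Return n distinct IDs; every group of n IDs is equally likely."""
    rng = random.Random(seed)          # seed only makes the demo repeatable
    return rng.sample(student_ids, n)  # sampling without replacement

# Hypothetical roster: 500 made-up student ID numbers
student_ids = list(range(1001, 1501))
sample = simple_random_sample(student_ids, 25, seed=1)
print(sorted(sample))
```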
Systematic Random Sample
- The first individual is chosen at random; then a system or rule is used to choose all other individuals
Example: Obtain an alphabetized list of all students. Choose one of the first 5 people on the list at random; then choose every 5th person after that.
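A possible Python sketch of a systematic random sample follows. The roster of 100 names and the function name systematic_sample are hypothetical. The key point is that only the starting position is random; every later choice follows the "every k-th person" rule.

```python
import random

def systematic_sample(names, k, seed=None):
    """Pick a random start among the first k names, then take every k-th name."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # random position 0 .. k-1
    return names[start::k]     # the rule: every k-th name from the start

# Hypothetical alphabetized roster of 100 students
roster = sorted(f"Student{i:03d}" for i in range(1, 101))
chosen = systematic_sample(roster, 5, seed=2)
print(chosen)
```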
Stratified Random Sample
- Divide the population into groups of similar individuals (called strata); choose an SRS in each group to form the full sample
Example: Divide all of the students into four groups: freshmen, sophomores, juniors, and seniors; then choose an SRS from each grade level
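The grade-level example might look like this in Python. The class sizes, the student labels, and the function name stratified_sample are assumptions made for illustration. Each grade level is sampled separately with its own SRS, so no grade can be left out by chance.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Take an SRS of n_per_stratum individuals from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

# Hypothetical school: made-up labels and class sizes
grades = {
    "freshmen":   [f"F{i}" for i in range(120)],
    "sophomores": [f"So{i}" for i in range(110)],
    "juniors":    [f"J{i}" for i in range(100)],
    "seniors":    [f"Se{i}" for i in range(90)],
}
by_grade = stratified_sample(grades, 10, seed=3)
for grade, students in by_grade.items():
    print(grade, students)
```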
Multistage Sample
- Select several groups; within each group, select a subgroup; within each subgroup select individuals for the sample.
Example: Select several departments within the school. Within each of those departments, select several teachers. Choose several students within each of those teachers' classes.
Cluster Sample
- Select several groups; within each group, select several subgroups; within each subgroup select ALL individuals for the sample.
Example: Select several departments within the school. Within each of those departments, select several teachers. Choose ALL students in each of those teachers' classes.
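The difference between a multistage sample and a cluster sample shows up only in the last stage. The sketch below uses a made-up school (5 departments, 4 teachers each, 20 students per class) and a hypothetical function name; it is one way to illustrate the idea, not a standard routine.

```python
import random

def staged_sample(school, n_depts, n_classes, n_students=None, seed=None):
    """school maps department -> teacher -> list of students.
    n_students=None keeps ALL students in each chosen class (cluster sample);
    an integer draws that many students per class (multistage sample)."""
    rng = random.Random(seed)
    chosen = []
    for dept in rng.sample(sorted(school), n_depts):
        for teacher in rng.sample(sorted(school[dept]), n_classes):
            students = school[dept][teacher]
            if n_students is None:
                chosen.extend(students)                          # everyone
            else:
                chosen.extend(rng.sample(students, n_students))  # a few
    return chosen

# Hypothetical school: 5 departments x 4 teachers x 20 students
school = {
    f"Dept{d}": {f"Teacher{d}{t}": [f"S{d}{t}{s:02d}" for s in range(20)]
                 for t in range(4)}
    for d in range(5)
}
cluster = staged_sample(school, 2, 2, seed=3)        # all students per class
multistage = staged_sample(school, 2, 2, 5, seed=3)  # 5 students per class
print(len(cluster), len(multistage))
```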
Although random selection eliminates bias from our choice of sample, it does not guarantee that our sample is representative of the population.
Potential problems include:
Under-coverage:
- some groups are left out of the process of choosing the sample
Non-response:
- An individual chosen for the sample cannot be contacted or refuses to cooperate
Response Bias
- The behavior of the individual or interviewer may influence the accuracy of the response
Wording of Questions
- Confusing or leading questions influence responses; poorly worded questions will not yield accurate responses.
Designing Experiments
If we want to examine a cause and effect relationship, we conduct an experiment.
The individuals on which the experiment is done are called experimental units.
If the units are people, they are called subjects.
The experimental condition we apply to the units is called the treatment.
When designing an experiment we want to minimize the effect of lurking variables so that our results are not biased.
Because we may not be able to identify and eliminate all lurking variables, it is essential that we use a control group.
The control group gets no treatment or a fake treatment (a placebo), so that the placebo effect and any other lurking variables affect every group in the same way.
Having a control group allows us to compare the results of the treatments.
Experimental Design
Step 1: Choose treatments
- Identify factors and levels
- Control group
Step 2: Assign the experimental units to the treatments
- Matching (place similar units in each treatment group)
- Randomization (randomly assign units to each treatment group)
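The randomization in Step 2 can be sketched as follows. The 30 subjects, the three treatment names, and the function name randomize are hypothetical. Shuffling the units first and then dealing them out in turn gives groups of equal size whose membership is decided by chance alone.

```python
import random

def randomize(units, treatments, seed=None):
    """Shuffle the units, then deal them out to the treatments in turn."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, unit in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(unit)
    return groups

# Hypothetical experiment: 30 subjects, a control group and two dose levels
subjects = [f"subject{i}" for i in range(1, 31)]
groups = randomize(subjects, ["control", "low dose", "high dose"], seed=4)
for treatment, members in groups.items():
    print(treatment, members)
```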
Remember, if we want to examine a cause and effect relationship, we conduct an experiment.
If an experiment is well-designed, a strong association in the data is good evidence of causation, since comparison and random assignment control the effects of lurking variables.
Principles of Experimental Design:
- Control the effects of lurking variables by comparing several treatments (include a control group if possible).
- Use randomization to assign subjects/units to treatments.
- Replicate the experiment on many subjects/units to reduce chance variation in the results.
Note: An effect is called statistically significant if it is so large that it would rarely occur by chance alone.
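One way to get a feel for "by chance" is to simulate it. The sketch below is an illustration, not a method taken from these notes: it re-deals the observed results to the two groups at random many times and counts how often chance alone produces a difference at least as large as the one observed. The scores are made-up numbers and the function name chance_fraction is hypothetical.

```python
import random

def chance_fraction(treated, control, trials=10000, seed=0):
    """Fraction of random re-assignments whose difference in group means
    is at least as large as the difference actually observed."""
    rng = random.Random(seed)
    n = len(treated)
    observed = sum(treated) / n - sum(control) / len(control)
    pooled = list(treated) + list(control)
    hits = 0
    for _ in range(trials):
        rng.shuffle(pooled)  # pretend the treatment had no effect
        diff = sum(pooled[:n]) / n - sum(pooled[n:]) / (len(pooled) - n)
        if diff >= observed:
            hits += 1
    return hits / trials

# Made-up results for illustration only
treated = [12, 14, 15, 13, 16, 15]
control = [9, 10, 11, 10, 12, 9]
fraction = chance_fraction(treated, control)
print(fraction)  # a small fraction means chance alone rarely does this
```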
Even a well-designed experiment can contain hidden bias, so it is extremely important to handle the subjects/units in exactly the same way.
One way to avoid hidden bias is to conduct a double-blind experiment.
In a double-blind experiment, neither the subjects nor the people who have contact with them know which treatment a subject has received.
Types of Experimental Design:
In a completely randomized design, all subjects are randomly assigned to treatment groups.
In a block design, subjects are first split into groups called blocks. Subjects within each block have some common characteristic (for example: gender, age, education, or ethnicity). Then, within each block, subjects are randomly assigned to treatment groups.
In a matched pairs design, there are only two treatments. In each block, there is either:
- a single subject receiving both treatments, or
- a pair of subjects, each receiving a different treatment
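A block design and a matched pairs design can share one sketch, since matched pairs is a block design with blocks of size two. Everything named below (the subjects, the age groups, the twins, and the function block_design) is hypothetical. Randomization happens separately inside each block.

```python
import random

def block_design(subjects, block_of, treatments, seed=None):
    """Group subjects into blocks, then randomly assign treatments
    within each block."""
    rng = random.Random(seed)
    blocks = {}
    for s in subjects:
        blocks.setdefault(block_of(s), []).append(s)
    assignment = {}
    for members in blocks.values():
        rng.shuffle(members)  # randomize within the block only
        for i, s in enumerate(members):
            assignment[s] = treatments[i % len(treatments)]
    return assignment

# Block design: 20 made-up subjects blocked by age group
people = [(f"s{i}", "under 40" if i < 12 else "40 and over") for i in range(20)]
assignment = block_design(people, lambda s: s[1], ["treatment", "control"], seed=5)

# Matched pairs: 5 made-up pairs of twins, one treatment per twin
twins = [(f"twin{i}", i // 2) for i in range(10)]
pair_assignment = block_design(twins, lambda s: s[1], ["A", "B"], seed=6)
print(assignment)
print(pair_assignment)
```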
Two Variable Data
In order to see if a relationship exists or if there is a possible cause-effect relationship between two variables from an experiment, the following steps should be used:
- Enter the data into the lists of your calculator or spreadsheet.
- If using a TI, turn on the correlation coefficient (r-value): go to the Catalog and select DiagnosticOn.
- Make a scatter-plot of the data.
- Find the least-squares regression line. Using your TI, the commands are as follows: STAT, CALC, linear regression (LinReg), then L1, L2, Y1.
- Graph the line and scatter-plot together and interpret the r-value.
- Use the equation and r-value in the context of the problem. Explain the meaning of a and b values and how they relate to the data set.
- Determine whether the relationship between the explanatory and response variables is cause and effect, correlation only, or no correlation.
- If possible, explain how the equation can be used to make predictions.
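For readers working without a calculator, the steps above can be sketched in Python. The data (hours studied and test scores) are made up for illustration. The sketch assumes the line is written as y = ax + b, so a is the slope and b is the intercept; a TI set to LinReg(a+bx) swaps those roles.

```python
import math

def least_squares(xs, ys):
    """Return slope a, intercept b, and correlation r for the line y = a*x + b."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx                   # slope: change in y per unit of x
    b = mean_y - a * mean_x         # intercept: predicted y when x = 0
    r = sxy / math.sqrt(sxx * syy)  # strength and direction of the linear trend
    return a, b, r

# Made-up data: hours studied (explanatory) and test score (response)
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]
a, b, r = least_squares(hours, scores)
print(f"score = {a:.2f} * hours + {b:.2f}, r = {r:.3f}")

# Using the equation to predict the score for 6 hours of study
predicted = a * 6 + b
print(predicted)
```

Here the slope says each extra hour goes with about 6 more points, and the intercept is the predicted score for 0 hours. Predictions far outside the range of the data (extrapolation) should be treated with caution.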