Monday, May 7, 2012

Moo Cows


            One day I was sitting in a statistics class being pressured to figure out a project to analyze. Then cows came to mind. We discarded this idea, and then thought about doing something involving local pudding plants. After realizing that we don’t have many pudding plants, we came back to the cow idea. Our question was, “Is the number of black spots on baby calves normally distributed?”
We chose to do a population census to show the true number of spots each calf has. On May 7, 2012, at the Perazzo Brothers’ Dairy, we counted the black spots (those at least three centimeters in diameter) on each of the forty-one calves, going pen by pen (each pen holds one calf).
            Our rough data looked like this:
Pen Number    Calf ID Number    Number of Spots
1             2091              4
2             2090              2
3             2089              1
4             2088              1
5             2087              1
6             2086              48
7             2084              6
8             2082              30
9             2081              1
10            2080              1
11            2079              7
12            2078              7
13            2077              8
14            2076              9
15            2074              3
16            2073              4
17            2072              3
18            2071              2
19            2070              1
20            2069              2
21            2068              15
22            2065              1
23            2064              1
24            2063              43
25            2062              6
26            2061              4
27            2060              1
28            18025             5
29            2057              1
30            2056              1
31            2055              6
32            2054              28
33            2053              6
34            2052              3
35            2050              1
36            2049              13
37            Untagged 1        8
38            Untagged 2        2
39            Untagged 3        10
40            Untagged 4        9
41            Untagged 5        14

            Our 5-Number Summary looks as follows:
                        Minimum – 1
                        First Quartile – 1
                        Median – 4
                        Third Quartile – 8.5
                        Maximum – 48
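
For anyone who wants to check our math, here is a minimal sketch of the 5-Number Summary in Python. It assumes the quartiles are the medians of the lower and upper halves of the sorted data (with the middle value left out), which is the textbook rule that matches our numbers. Other software may use a different quartile rule and land on a slightly different third quartile. The names `spots` and `five_number_summary` are made up for this example.

```python
from statistics import median

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # lower half (middle value left out if n is odd)
    upper = s[(n + 1) // 2:]     # upper half (middle value left out if n is odd)
    return s[0], median(lower), median(s), median(upper), s[-1]


print(five_number_summary(spots))  # (1, 1.0, 4, 8.5, 48)
```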
Now, on a side note we must add in a word about our bias. And don’t laugh because if you were collecting data with us, you would understand, but there was bias from us as data collectors. There were some calves that we loved more than the others and we spent more time with. For example, Calf #2077 we had nicknamed ‘The Devil’ and wanted to spend the least amount of time possible with her so we probably weren’t as thorough with her as we were with Calf Untagged 5, also known as Casper, who we loved spending time with. This bias could theoretically be fixed with some sort of blinding but we are confident that whoever went and took data would have the same conflict we did and naturally be drawn to some particular calves.
Then to organize our data, we put it into a histogram where each bar covers a range of five spots (see below).

            According to the graph above, we conclude that the distribution of the number of spots on baby calves is definitely not normal. The graph shows that it is unimodal and skewed to the right. It does not follow a normal curve.
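
If you want to rebuild the histogram without graphing software, a rough text version can be sketched like this. It assumes the bars cover 1-5, 6-10, and so on, and the helper name `bin_counts` is invented for this example.

```python
from collections import Counter

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]


def bin_counts(values, width):
    """Count values in the bins 1..width, width+1..2*width, and so on."""
    counts = Counter((v - 1) // width for v in values)
    top = max(counts)  # index of the highest bin that holds any value
    return [(b * width + 1, (b + 1) * width, counts[b])
            for b in range(top + 1)]


# Print one row per bar, with one '#' per calf.
for low, high, count in bin_counts(spots, 5):
    print(f"{low:>2}-{high:<2} {'#' * count} ({count})")
```

Changing the `5` to another width redraws the bars with different ranges. The long tail of nearly empty bars on the right is the skew.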
Noticing that there were four particular cows that were skewing the data, we took them out of the graph and had steak and milkshakes for dinner. It was delicious. No, but seriously. And we noted them as outliers. We made a new graph and analyzed the new distribution.
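
One way to back up the choice of outliers is the common 1.5 × IQR rule. That rule is an assumption here, since we picked the four calves by eye, but it does flag exactly four counts in our data. A minimal sketch, using the same median-of-halves quartiles as the 5-Number Summary:

```python
from statistics import median

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]

s = sorted(spots)
n = len(s)
q1 = median(s[: n // 2])         # first quartile (median of the lower half)
q3 = median(s[(n + 1) // 2:])    # third quartile (median of the upper half)
iqr = q3 - q1

# Anything beyond 1.5 IQRs from the quartiles counts as an outlier.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [v for v in s if v < lower_fence or v > upper_fence]
trimmed = [v for v in spots if lower_fence <= v <= upper_fence]

print(upper_fence)    # 19.75
print(outliers)       # [28, 30, 43, 48]
print(len(trimmed))   # 37
```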


            As one can see, it is still unimodal and skewed to the right (as we figured it probably would be… the shape shouldn’t change or anything. Although it would have been nice if it had become “normal”… then at least we would have somewhere exciting to go with this project). We also changed the ranges that we used from going 1-5, 6-10, etc. to 1-3, 4-6, and so on. This made the distribution slightly different, but still the same basic idea. The bars on the graph also got wider for some reason. Guess that they enjoyed the steaks and milkshakes, too.
            Then, noticing that we had made a fatal error, we set out to fix that right away.
            “What fatal error?” you might ask. We had forgotten to check the conditions for a normal model. Yeah, we’re pretty ridiculous.
            To use a Normal model, the data have to be independent, random, and come from a large enough number of experimental units. Since none of the calves were siblings, we didn’t need to worry about the calves being independent. And because we were using all of the calves on a huge dairy, we didn’t figure that a big enough number was a problem. But the thing we neglected was that we needed a random sample. Nothing about our all-in-one census was random or a sample.
So in order to meet this condition, we decided to use a random number generator to pick out fifteen random pen numbers and then graph the spot counts for those pens from the first table (the rough data list). We did this ten times, getting ten different sets of random numbers to compare to the Normal model.
            The distributions of each of our ten sets of randomly selected pen numbers were unimodal (except for graph #5, which was bimodal) and skewed to the right, just like the distribution of our graph for all of the data. These graphs can be found on the last page. We didn’t want to go and stick ten graphs right in the middle of all of this writing. It might be kind of distracting.
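
The sampling step can be sketched as below. This is not our actual generator or our actual ten samples. It assumes the pen numbers were drawn from all forty-one pens without repeats, and the seed is an arbitrary choice so the example gives the same draws every time it runs.

```python
import random

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]

rng = random.Random(2012)  # arbitrary seed (an assumption) so reruns match

# Draw ten samples of fifteen distinct pen numbers each.
samples = []
for _ in range(10):
    pens = sorted(rng.sample(range(1, len(spots) + 1), 15))
    samples.append([(pen, spots[pen - 1]) for pen in pens])

# For each sample, count how many calves fall in the 1-2 spot range.
for i, sample in enumerate(samples, start=1):
    low_count = sum(1 for _, count in sample if count <= 2)
    print(f"Sample {i}: {low_count} of 15 calves had 1-2 spots")
```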
            Even though it didn’t qualify (probably because our “n” was not large enough), we designed a normal curve off of our data set (with the four outliers taken out) anyways. Our mean was 4.5946 and our standard deviation was 3.9753. Based on these values, our normal curve ended up looking like this:
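
The mean and standard deviation above match what the rough data gives once the four largest counts (28, 30, 43, and 48) are left out, so this sketch assumes those were the four outliers. It also assumes the sample standard deviation (dividing by n - 1), which is what reproduces our value.

```python
from statistics import mean, stdev

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]

outlier_counts = {28, 30, 43, 48}  # assumed to be the four calves removed
kept = [v for v in spots if v not in outlier_counts]

spot_mean = mean(kept)   # arithmetic mean of the 37 remaining counts
spot_sd = stdev(kept)    # sample standard deviation (n - 1 in the denominator)

print(len(kept))            # 37
print(round(spot_mean, 4))  # 4.5946
print(round(spot_sd, 4))    # 3.9753
```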


By looking at this, one can see that, based on the Empirical rule, about 68% of the calves in our sample should have between .6193 and 8.5699 spots. Since earlier we had taken an interest in how the 1-2 spot range always had the most calves in it in our ten random samples, we conducted a test to find the probability that a calf from our sample had 1 or 2 spots. The normal curve from this test is on the next page, because there was supposedly not enough room for it on this page.

After plugging the numbers into our handy-dandy online calculator (try finding a normal probability when you’ve already turned your calculator in to the teacher, yeah it’s sort of difficult) we discovered that, according to the normal model, the probability of a calf from our sample having one to two spots was .074. This is a small probability, so I guess the calves at the Perazzo Brothers’ Dairy are just boss or something, because all of our graphs had the 1-2 range with the most calves (except for graph number five and its stupid bimodal-ness).
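
If you don’t have an online calculator handy either, the Empirical rule range and the 1-2 spot probability can be checked with Python’s built-in `NormalDist`. This is only a sketch. It treats the spot count as a continuous value between 1 and 2, and it assumes the four largest counts were the outliers we removed, so the “actual” shares below are computed on the other 37 calves.

```python
from statistics import NormalDist

# Spot counts in pen order (pens 1-41), copied from the rough data table.
spots = [4, 2, 1, 1, 1, 48, 6, 30, 1, 1, 7, 7, 8, 9, 3, 4, 3, 2, 1, 2, 15,
         1, 1, 43, 6, 4, 1, 5, 1, 1, 6, 28, 6, 3, 1, 13, 8, 2, 10, 9, 14]
kept = [v for v in spots if v not in {28, 30, 43, 48}]  # assumed outliers

model = NormalDist(mu=4.5946, sigma=3.9753)  # mean and SD from the post

# Empirical rule: about 68% of a normal curve lies within one SD of the mean.
low, high = model.mean - model.stdev, model.mean + model.stdev
share_within = sum(low <= v <= high for v in kept) / len(kept)

# Normal-model probability of 1 to 2 spots versus the share actually counted.
p_model = model.cdf(2) - model.cdf(1)
share_1_2 = sum(v in (1, 2) for v in kept) / len(kept)

print(f"One SD range: {low:.4f} to {high:.4f}")
print(f"Calves actually in that range: {share_within:.3f}")
print(f"Model P(1 to 2 spots): {p_model:.3f}")
print(f"Calves actually with 1-2 spots: {share_1_2:.3f}")
```

The gap between what the model predicts and what we actually counted is another sign that the normal curve is a poor fit for these calves.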
In conclusion, the distribution of spots on calves is not normal, at least not for our sample. All of the samples we took were severely skewed to the right. It might’ve been normal if we had a larger sample size. But, since we didn’t, our conclusion is limited to the calves at the Perazzo Brothers’ Dairy.
If we wanted to expand the reach of our sampling, we could sample more dairies in Fallon or other places to get a larger sample size. If someone were to replicate this project, we suggest doing the same. That way, you could apply this study to a larger region than just the Perazzo Brothers’ Dairy.
All in all, this project was fun to do, but the results were disappointing.
