Jan 28

ANALYSIS

Wilson Stringer

Independent Study

Mr. Bessias

28/01/19

Independent Study Analysis

 

Part 1: Programming

 

Over the past two semesters, I have been working on my independent study. The goal was to see whether socioeconomic status correlates with bad pregnancy outcomes (stillbirth, preterm birth, low birth weight). I started out programming in R, but then I switched to SAS because it was easier for me to work with. I have used SAS for the past few months to write the programs and produce the results described here.

The first thing I had to do in SAS was create a library (named W_ZAPPS) and import my dataset so that I could run statistics on it. I then defined all of the variables that I planned to use. For example, I wrote “mat_age” for maternal age, “maritalcat” for marital status, and so on. In total, I used around 100 different variables, ranging from what type of water was used in the house to how many years of education the mother had. I then sorted the variables into categorical and numerical. I recoded the yes and no answers as 1 and 2 so that the procedures could work with them. After that, I cleaned the data to make sure it was legitimate and free of errors. For example, in some cases there were duplicate records, so I wrote code to remove the extras. I did this for several of the variables so that the data was clean and ready to be analyzed.
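
The setup steps described above could look roughly like the sketch below. It is only an illustration, not the actual program from the study: the library name W_ZAPPS comes from this post, but the folder, the file name, the dataset names (mothers, mothers_coded, mothers_clean), and the variables piped_water_txt and participant_id are made up for the example.

```sas
/* Assign the library. The folder path is hypothetical. */
libname W_ZAPPS "C:\study\data";

/* Import a CSV export of the dataset (hypothetical file name). */
proc import datafile="C:\study\data\zapps_export.csv"
    out=W_ZAPPS.mothers
    dbms=csv
    replace;
    guessingrows=max;   /* scan all rows when guessing variable types */
run;

/* Recode a yes/no text answer as 1/2.
   piped_water_txt is a hypothetical raw text variable. */
data W_ZAPPS.mothers_coded;
    set W_ZAPPS.mothers;
    if upcase(piped_water_txt) = "YES" then piped_water = 1;
    else if upcase(piped_water_txt) = "NO" then piped_water = 2;
    else piped_water = .;   /* anything else is treated as missing */
run;

/* Remove duplicates: keep one row per participant.
   participant_id is a hypothetical ID variable. */
proc sort data=W_ZAPPS.mothers_coded
          out=W_ZAPPS.mothers_clean
          nodupkey;
    by participant_id;
run;
```

Writing the sorted data to a new dataset with out= leaves the imported data untouched, so a cleaning step can be checked or redone without importing again.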

Writing the program was one of the hardest parts because SAS is very strict about how the code is written. Every statement has to end with a semicolon (;), or the program will not run correctly and produce output. Here are some examples of the code I used to answer questions.

Do HIV-positive women have a lower hemoglobin count? (The answer was yes.)

Proc ttest; class HIV; var hemoglobin; run;

You can see that I ran a t-test in which the class (grouping) variable was HIV and the analysis variable was hemoglobin. Another example would be:

Do people with more education have better floors?

Proc freq; tables education*floors /chisq; run;

In this case, I did a chi-square test. The two variables were education and floors, so an * is placed between them to cross-tabulate them.

The answer to this question was no. This finding was very surprising. (I will get into this later.)

One more example would be:

Are women with piped water more likely to have unfinished floors?

Proc freq; tables piped_water*floors /chisq; run;

The answer was Yes.

I ran these tests to see whether there was a relationship between the two variables. In some cases there was, and in others there was not. If the p-value was below .05, the result was statistically significant, and that is why I could draw these conclusions.
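
As a rough sketch of how the tests and the .05 rule fit together, the code below names the dataset explicitly with data= and saves the chi-square results so that the p-value can be checked in code. The variable names HIV, hemoglobin, piped_water, and floors come from the examples above. The dataset name W_ZAPPS.mothers_clean and the work datasets are assumptions made for this illustration, not the program used in the study.

```sas
/* t-test: compare mean hemoglobin between the two HIV groups.
   W_ZAPPS.mothers_clean is a hypothetical dataset name. */
proc ttest data=W_ZAPPS.mothers_clean;
    class HIV;          /* grouping variable with two levels */
    var hemoglobin;     /* numeric variable being compared */
run;

/* Chi-square test: save the results table to a dataset. */
ods output ChiSq=work.chisq_out;
proc freq data=W_ZAPPS.mothers_clean;
    tables piped_water*floors / chisq;
run;

/* Keep the Pearson chi-square row and flag p-values below .05. */
data work.chisq_flag;
    set work.chisq_out;
    where Statistic = "Chi-Square";
    significant = (Prob < 0.05);   /* 1 = significant, 0 = not */
run;

proc print data=work.chisq_flag noobs;
    var Statistic DF Value Prob significant;
run;
```

Without data=, SAS runs a procedure on the most recently created dataset, so naming the dataset makes it clearer which data each test is using.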

To conclude the programming portion, I enjoyed being able to program in SAS. At first I had to get my father to help me, but after doing it several times I got the hang of it, and it became easier to find the answers. I now have a basic understanding of SAS, which will be very helpful someday, especially if I want to be a computer programmer or study anything that requires programming.

 

Part 2: Outcomes

The code that I ran produced many outcomes, and some findings were more interesting than others. Each outcome came from comparing different variables with each other. I found several particularly interesting because the results were not what I was expecting.

For instance, one analysis looked at whether the mother having HIV at enrollment is correlated with bad pregnancy outcomes. Out of the 1194 women with data for this variable, 897 did not have HIV at enrollment and 297 did. This produced a p-value of .1853. This surprised me because I had thought that if the mother had HIV, it might affect the outcome for the baby. In this data, it did not.

Another interesting case was the years of maternal schooling. I would have thought that the more education a mother had, the less likely she would be to have a bad pregnancy outcome. I assumed that a mother with more education would be a little wealthier, and that this might lead to better outcomes. However, the p-value was .94. Since this is above .05, the result is not statistically significant, which means we cannot conclude that there is a correlation between maternal schooling and bad pregnancy outcomes. This was the exact opposite of what I expected.

The final outcome that I would like to mention is that having a prior stillbirth does not correlate with bad pregnancy outcomes. I found this particularly interesting because you would think that having one in a prior pregnancy would affect the current one, but it did not. The p-value was .0655, which is very close to being statistically significant but not quite there. This shocked me the most because I truly thought it would be significant. The same is true for having a prior miscarriage, and I do not know why this is the case. Overall, there were many surprising outcomes, but these are the ones that I wanted to dig deeper into.

 

Part 3: Reflection

To wrap up the final analysis of my independent study, I want to talk about my learning experience through this whole semester. I learned how to program in SAS, which is a very useful skill that I will continue to build on. I was able to study pregnancy outcomes and apply statistics throughout. I thought that this independent study was very interesting and worthwhile. Although I did not take the normal route to complete this study, I still got it done and loved doing it. I am going to study this in the future and hopefully be the one who collects the data, rather than the one doing the analysis. I liked being able to choose my topic and do a semester course on what I found interesting, and I think that these skills will help me in the future. My learning experience was enjoyable because I got to work on my own schedule and on what I wanted to do. I think that doing independent studies is extremely worthwhile because you get to dive into a topic that you find interesting and that DA does not offer. There are many benefits to these types of studies because you are doing something that you genuinely want to do. There are many classes that students do not get or like, and with an independent study you get to pick what you want to study. I enjoyed this very much.

Dec 17

Statistics

I am sorry that I have not been uploading recently. I have been working on my project and simply forgot to do this at times. I will update you on what is happening, though. I have now switched to SAS because it is easier for me to work with and my father can help me with it at home. We have now analyzed a lot of the data, and we have run t-tests and used statistics to get the information that we need. There are 3 variables that we are working with, and we have used these to create certain tests. For example, we looked at whether people with running water have a higher chance of having finished floors, and whether people with more education tend to have fewer pregnancies. We have been programming in SAS and using the data to produce our analysis. I am almost done with this part, and then I can create my presentation or a paper to show to the committee.

Nov 14

Data

Over the weekend and through this week I have been working on the data table that I am going to use. I created it and I have been adding things to it throughout this week. It has the qualitative quantities and the different characteristics that I am going to use. I used Word to create this, and I also got the massive data set from ZAPPS. I am currently putting this into R so I can run the statistical tests. I am finally making headway on the most interesting part.

Nov 06

ZAPPS Data

My mom sent me a ZAPPS database as an example of what mine would eventually look like, with a paper at the end and everything else involved. I printed out this data and annotated what I thought was interesting and necessary to add to mine. This data is part of the Zambian Medical journal data and it includes all of the data that my parents have been using for years. This was an interesting preview of what I need to do to make mine look similar.

Nov 05

Progress so far

Today I would just like to talk about where I am now and where I need to be in the timeline that I gave myself. I have been using R and exploring it throughout the first quarter. I have the dataset and I am cleaning it right now. I have been talking to my mom’s friend, Joni. I talked to her earlier this summer when I was in Zambia working on the website that I created (www.speaknyanja.com). I will finish cleaning my data before I do any tests. I also researched the tests further, and I am going to have to perform Z-tests and other statistical analyses. I have been busy as of late because of the end of the first quarter, but I cannot wait to attack this and hit the ground running in the second quarter.

Nov 02

Data

Today I looked through an old data set that my mom gave me as a sample of what I should be looking for. I was able to go through it and figure out what I need to do. It was the help that I needed because I was a little confused about what exactly I needed to be doing. I figured out that after cleaning the data, I can finish this study, because all I will have to do next is some background information and then the statistical tests. After that comes the paper or presentation that I will need to complete.

Oct 31

Sorting Data/Cleaning

Over the past few days, I have been talking more and more with one of my mom’s friends about how I am going to clean the data. I have a meeting set up with her this weekend, and she is going to help me with the data processing. On top of that, because of all the data I have, I am going to need to set up a point system to try to see whether the data is significant or not. I should be able to meet with Mr. Ross soon and start planning what I am going to do with the statistics involved.

Oct 29

Data tables

This week I am going to meet with one of my dad’s friends who helped gather the information, and she is going to help me program and clean the data. When I went to Zambia this summer I met with her and she agreed to help me, so that is what I will be doing throughout the week. She is going to give me tips, and we should be in good shape. After this I get to do the statistical analysis, which is the better part. Right now I am still cleaning the data because of the different responses, so once this is done I can finally move on and get closer to the final paper.

Oct 25

Cleaning Data

As the quarter comes to an end, I am right where I need to be in order to move along with this independent study. I have become a little more familiar with R, and I think I will soon be able to meet with Mr. Ross on a more frequent basis. I have started cleaning the data and will be able to finish soon. That being said, I do not really know anything about the statistical analysis, so I am going to have to research it and then use it in my work. I think that I am on track to finish this paper by the end of semester 1.

Oct 24

More about data cleaning

“Data Cleansing or data scrubbing is the process of identifying and correcting inaccurate data from a dataset. With reference to customer data, data cleansing the process of maintaining consistent and accurate (clean) customer database through identification & removal of inaccurate (dirty) data.” This is what data cleaning is, and the reason it is so important is that it makes the data reliable, so you can then take what you want from the data.
