My Adventures in Data Science – Week 1

You may have gotten a hint of it from reading a few of my posts here, but if you follow me elsewhere (say, on this blog), you’ll know that I’m passionate about data science.  I decided to enroll in General Assembly’s part-time, semiweekly Data Science course: 11 weeks of programming, statistics, and sifting through data.  If you’re a geek like me, that translates to approximately 11 weeks of fun!

The class is set up to focus on developing ‘Type A’ data scientists (analyst leaning) rather than ‘Type B’ data scientists (programmer leaning), which I am a little disappointed about.  While I do need a strong refresher in statistics, I definitely wanted to ramp up my coding abilities more.  Still, I’m hoping to get a lot from this course, especially when it comes to learning complex machine learning algorithms and better Python programming practices.  Above all, it will serve as a good stepping stone toward finding a new role in data science or preparing myself for further studies in the field.

The first week so far has just been introductions.  My classmates are so diverse in personal, professional, and academic backgrounds.  I’m certain I’m one of the youngest in the class, if not the youngest.  Some have had a lot of experience working with data as analysts with no programming background, and others have done a ton of coding, but lack a strong statistics foundation.  I feel like I’m somewhere in the middle, but we’ll see.

In the Monday class, after introductions and orientation, we learned some command line basics.  Easy stuff for me, as I’d been using the command line extensively at school and at work, but I recognize it’s very new and cryptic to others.  I finished my exercises early and helped the people sitting next to me with theirs.  (I felt really cool about that.)

On Wednesday, some alums from the previous class visited to present their class projects.  One student created a model to predict GDP from data collected from the CIA website.  Another built a recommendation system for meal recipes based on the ingredients you enter.  I felt both excited and scared about my own project.  The instructor told us to expect to put in 200 to 300 hours of work! Still, I’m stoked that in just a few weeks I’ll know enough to start putting together insightful analyses and predictive models of my own.  Can’t wait! I hope I can use the data I collected from the Simple Word Count Tracker I created for NaNoWriMo.  I think it would be awesome to be able to use data to predict whether someone will win NaNoWriMo before NaNoWriMo even begins.  More on that as class goes on, I guess.

Wednesday was also supposed to be the day we went over some Python basics, but we got a little behind during the lesson on Git.  I’m familiar with both Python and Git, so that lesson felt pretty slow for me even though it was good review.  Hopefully things will pick up in the next class.  Looking forward to it.

In the meantime, here’s a quick visualization (created in Tableau) of something I learned this week about the Data Science Workflow:

[Image: Data Science Workflow visualization, created in Tableau]