Why Did I Start a Data Science Blog? The Benefits of Sharing Your Code

I launched my blog in May 2021. In this post, I hope to answer questions such as “Why do you have a blog?” or “What is the purpose of it?”, and who knows, maybe even give you the motivation to start your own blog. In no particular order, here is the vision and purpose of my blog.

1. Paying it Forward

I was first introduced to R and the world of data science by my amazing mentor, Professor Jo Hardin. I became familiar with the RStudio interface and some simple functions by taking classes at my school, Pomona College. After class, I remember looking up how to use more advanced packages and all of R’s cool features. I started looking for resources but was overwhelmed by the vast number of websites and courses out there. After almost an entire year, I started to get the hang of it, but at the beginning I didn’t know where to start. That inspired me to create a technical blog from a student’s perspective. Essentially, I want to document my learning journey.

I have learned everything I know because someone mentored me, published a course or wrote a blog. In the same way, I hope that my work can support an upcoming data scientist. Soon, I will be writing a post with a variety of resources that I have used and continue to use, so be on the lookout for it.

I can’t emphasize enough how important hands-on practice is. No matter how many Coursera, DataCamp or bootcamp courses you’ve taken, you still need experience applying those tools to real problems. This isn’t unique to data science: whatever you currently do professionally, I’m sure you’re better at it now than when you finished taking classes in it. Therefore, I will be showing my code in each of my posts with the hope that you will attempt to reproduce and build upon it.

2. Learning by Writing

I really enjoy learning new stuff in many different domains. Before launching this blog, I believed that I understood a line of code as soon as I was able to teach it to someone. If I was not able to explain it in a clear and understandable way, it meant that I needed to study it more thoroughly because I actually did not fully understand it.

This is often referred to as the Feynman technique. This method of learning is based on the idea that in order to fully master a topic, you need to be able to explain it back to someone in simple terms.

Through writing this blog, I realized that in order to learn and fully understand something new, one must:

  • be able to clearly communicate it and teach it in simple terms,
  • but also be able to write it down in a precise and concise manner.

So although this blog was first launched to share concepts I am most familiar with (hoping that it would be useful to some people), I now also use it to learn by writing. I think that this additional way of learning is actually more powerful because writing allows me to consolidate my understanding.

Obviously, I am learning mainly about statistics and its applications in R as they are the main topics of the blog. However, I never thought that I could also learn so much about:

  • writing (a skill I still need to improve)
  • communicating results (as any data scientist will tell you, results without proper communication are useless, and writing a blog is great practice)
  • web development (an increasingly important skill nowadays)
  • project management (as you build something from scratch and wish to develop it)

Maintaining a blog teaches essential skills that are usually learned in full-time jobs. With a blog, you are responsible for everything from the content to readers’ inquiries, similar to an employee who has to deal with and communicate with end users.

3. Getting Feedback

Imagine applying for your first job as a data scientist. You have gone through a bunch of online courses, read a couple of books, and practiced some analyses, but you still don’t feel ready, or your first couple of interviews did not go well. You decide that you need some more practice. What should you do?

What skills could you improve on? When you’re developing a new set of skills, it’s hard to tell how far along you are and what you should be learning next. This is one of the challenges of self-driven learning as opposed to working with a teacher or mentor. Because readers can question and correct what you publish, a blog can be seen as a powerful form of peer review for your understanding of a concept, your code or your R practices. It is also better to make mistakes on a toy example and correct them than to make mistakes on a real project at your workplace.

Note: The majority of the resources/links I will be including in this blog are for doing data science in R. There are many good software options; however, the resources for getting started in R are outstanding.

As always, if you have a question or a suggestion related to the topic covered in this article, please feel free to contact me!

Ian Krupkin
Statistics Major