Natural Language Processing Roadmap – Part I: Data Science


In this three-part series, we will try to build a complete Natural Language Processing Roadmap. Think of this roadmap as your guide to go from young Padawan to Master Jedi! We will start with the basics – Programming and Data Science.

Building a Natural Language Processing Roadmap

Hello again! I hope your journey Towards NLP is going well. Whether that’s the case or you feel like you are struggling, I thought I would share this really helpful roadmap I created for myself when I first felt like I was trying to move in every direction. Sometimes it is hard to prioritize or to see in which direction you should be going, so hopefully, this will help you. This is a complete Natural Language Processing Roadmap, but it is also so much more! I divided it into three sections: a Data Science Roadmap (the first part, which you are reading now), a Deep Learning Roadmap, and finally, a Natural Language Processing Roadmap. We will talk about each section in a different article.

By clicking on the topic box, you will be redirected to the corresponding part of the article, where you will find links to different courses and resources. I will also link other blog articles where I go into more detail on each topic. At the end of our journey, you will also find more useful tips to improve your NLP and Data Science skills, so stick around, and as always, enjoy!

Step 0 – Find a Mentor

I go into much more detail about where and why you should find a mentor in this article. While this is not a necessary step (that’s why you’ll find it dotted in the roadmap), it is incredibly helpful. Especially if you are starting from scratch, it is nice to have someone to talk to.

Step 1 – Learn Computer Science Fundamentals

Here is the deal. You don’t really need to learn about computer science basics. You can definitely get by and become a very good Data Scientist or NLP Engineer without knowing about transistors, Ada Lovelace, Charles Babbage, Web protocols, binary counting, etc.

However, it is a great base to start from. Having a general idea of how your computer works and how the different pieces fit together really helps when you are trying to understand some obscure bug, or when you are reading documentation and academic papers and need to figure out why some design or architectural choices were made. Of course, you don’t need to go into too much detail if you are not interested, but dipping your toe in the water won’t hurt. And frankly, having at least a high-level idea of how computers work nowadays is absolutely a must.

Here you can find two really nice resources to introduce you to computer science:

The first two resources are a really great place to start learning more about Computer Science. They are easy to follow but really informative at the same time (which isn’t an easy feat to accomplish!). 

A Few Words about My Favorite Courses

Harvard’s CS50 has more of a hands-on, programming-oriented approach. You will cover all the basics as you carry out small projects and exercises. This course is an excellent resource to start programming while understanding what is going on under the hood of your program. The Computer Science concepts discussed in the course are a true must, and mastering them early on will make programming so much easier.

Crash Course Computer Science is a truly amazing resource. I personally believe it should be used in schools to teach small kids about Computer Science. This is a YouTube series of 40 videos of about 10 minutes each, which makes it perfect if you have to pack your learning time into an already jammed schedule. It is fun and easy to follow, and it makes even harder concepts easy to understand. The host, Carrie Anne Philbin, does a remarkable job of explaining the different topics. And since each topic rarely takes up more than one video, you are sure to get a really well-rounded view of the field! This course is perfect if you are starting from zero, if you want to make sure you have covered all your basics, or simply if you are curious about Computer Science.

A little Plus

You’ll learn bits and pieces about how to use Git and GitHub in any of the different courses you will follow, but I believe these are fundamental tools to master even if you plan on programming for small personal projects. They will make your life so much easier! So here are a couple of courses you can have a look at:

Having a basic understanding of what the Command Line is, and how it works (and how it can work for you!) is really important. Here is a nice introduction with some really helpful explanations:

Step 2 – Master a Programming Language

I won’t go into too much detail about the pros and the cons and the whys and the whos of Python. You can read more about them, and about why some people do not believe Python is a good first programming language, in this article I wrote a while back. I personally believe (and I am not the only one!) that Python is a great language to start learning. It is easy to read, intuitive, elegant, sleek, and flexible. It is undoubtedly the programming language of Scientific Research, Data Science, Machine Learning and NLP. And of course, as you undertake different projects you will be faced with different challenges, and you might need to use other programming languages. That is how we learn and grow our skill set.

But for the time being, we just want to start, and Python is the perfect place. So here you will find a list of resources to learn Programming with Python. If you want more information about this topic, don’t forget to check out this article. In the article you will find more details about each course, and, you guessed it, a complete Roadmap to learn Python!

Step 3 – Learn how to Play with Data

As you can see in the roadmap, Data is going to be a pretty big part of your journey. After all, this branch of Computer Science is called Data Science. And as someone once said (sorry, I don’t remember who said it specifically), the Data comes before the Science. And I’ll go even further and say, there is no Data Science without Data. In Machine Learning especially, Data is the starting point. You need to find it, collect it, store it, organize it, clean it, analyze it, test it, augment it, and much more. 

In the roadmap, I put three branches concerning Data:

  1. Data Collecting and Cleaning
  2. Data Visualization
  3. Data Engineering

In the list above, I put them in order of importance. 

No matter what your goal is, if you want to dabble in Data Science, ML and/or NLP, you are going to have to learn how to collect and clean data. You might get on without knowing SQL and database management in the beginning, since there are so many datasets ready to be downloaded with a couple of lines of code. But further down the line, you are (hopefully!) going to want to create your own dataset to start experimenting on personal/professional projects. So the sooner you start playing with SQL, Pandas and NumPy, the better. Here are a few interesting resources for you to develop your Data collecting and cleaning skills:
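
To make the "collect and clean" step a bit more concrete, here is a minimal sketch of what cleaning with SQL can look like. It is only an illustration: the records in `RAW_ROWS`, the `people` table, and the `load_and_clean` function are all made up for this example, and it uses Python's built-in `sqlite3` module instead of Pandas so that it runs without installing anything.

```python
import sqlite3

# Hypothetical raw records (name, age as text, city) with typical problems.
RAW_ROWS = [
    ("Ada", "36", "London"),
    ("ada ", "36", "london"),      # duplicate hidden by whitespace and casing
    ("Grace", "", "New York"),     # missing age
    ("Alan", "41", " Manchester"), # stray leading space
]


def load_and_clean(rows):
    """Load raw rows into an in-memory SQLite table and return cleaned rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, age TEXT, city TEXT)")
    conn.executemany("INSERT INTO people VALUES (?, ?, ?)", rows)
    # Trim whitespace, normalise casing, turn empty ages into NULL, and
    # drop rows that become identical once they are normalised.
    cleaned = conn.execute(
        """
        SELECT DISTINCT
            LOWER(TRIM(name)),
            CAST(NULLIF(TRIM(age), '') AS INTEGER),
            LOWER(TRIM(city))
        FROM people
        ORDER BY 1  -- sort by the first output column (the cleaned name)
        """
    ).fetchall()
    conn.close()
    return cleaned


cleaned_rows = load_and_clean(RAW_ROWS)
for row in cleaned_rows:
    print(row)
```

The two "Ada" rows collapse into one, and the missing age comes back as `None` rather than an empty string, which is usually easier to handle later on. With Pandas, the same ideas would likely show up as string normalisation, `drop_duplicates`, and explicit handling of missing values.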

A Roadmap Beyond Data Science

Data visualization is also an important part of the process. It helps explain your analysis and lets the data tell the story it is meant to tell. I would say that if you only want to be able to carry out small personal projects, it isn’t necessary. But, if you want to present your projects to a larger public, for peer review or for business purposes, being able to show the impact of your work through storytelling and data visualization is going to be an instrumental skill.

Here is how to fine-tune your skills:

Data Engineering would be the last step to truly master the data you are working with. At large data-driven companies, data engineering supports R&D teams by making clean data available to research engineers and scientists. It is a separate field, and if you only want to focus on the statistical algorithm side of the problems, you may want to skip this section (that’s the reason why in the roadmap you have a little “wait” icon).

Building an effective data architecture, simplifying data processing, and sustaining large-scale data systems are all responsibilities of a data engineer.

Engineers develop ETL pipelines, automate file system chores, and optimize database processes to make them high-performance using Shell (CLI), SQL, and Python/Scala.
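
If "ETL pipeline" sounds abstract, the toy sketch below shows the three stages (Extract, Transform, Load) in a few lines of Python. Everything in it is an assumption made for illustration: the `RAW_CSV` text stands in for a real source file, and the `orders` table and function names are invented. Real pipelines add scheduling, logging, and error reporting on top of this basic shape.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export standing in for a real source file.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,not_a_number
3,alice,4.50
"""


def extract(csv_text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(records):
    """Transform: convert types and skip rows whose amount is not numeric."""
    clean = []
    for rec in records:
        try:
            amount = float(rec["amount"])
        except ValueError:
            continue  # a real pipeline would log or quarantine this row
        clean.append((int(rec["order_id"]), rec["customer"], amount))
    return clean


def load(conn, rows):
    """Load: write the cleaned rows into a SQLite table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))

# Query the loaded data: total amount per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)
```

Keeping each stage in its own function is a deliberate choice here: it makes every step easy to test on its own and easy to swap out, for example replacing the in-memory SQLite database with a cloud data warehouse.
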

Another important skill is the ability to deploy these data systems, which requires knowledge of cloud service providers such as Amazon Web Services, Google Cloud Platform, Microsoft Azure, and others. Google Cloud offers a couple of useful resources:

Don’t forget to go to kaggle.com for more resources!

Info Point

Click here to find more info about the roadmap down below.

How to Read the Roadmap

  • in black, you will find broader topics
  • in light blue, you will find skills to master
  • the little tabs with a settings/cog icon represent suggested languages, libraries, tools etc.

Step 4 – Master Essential Statistics

Ok, here comes trouble. I know that math might seem scary or boring to most of us. But, trust me, it is going to help you. A lot. Plus, you don’t really need to do too much math. Just enough so that you understand what is going on under the hood of your model. Not even that. You need to know enough so that you can read and understand explanations about what is going on in your model. At least that is my case. I don’t understand everything directly, but if I start looking around I know I can understand, and it feels like having a safety net. Plus, it is not really the type of math you did in high school – unless you did statistics in high school 😉 . It is really kind of fun once you get the hang of it.
Here is a great little specialization you can follow that will explain all the concepts you need:
The Specialization includes three courses and is fascinating, so definitely give it a try!
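
To give you a taste of how little math you need to get started, here is a tiny sketch using Python's built-in `statistics` module. The accuracy values are invented for the example, and `z_score` is a small helper written here, not a library function. The idea is simply to summarize a handful of model runs and check how unusual a single run is.

```python
import statistics

# Hypothetical validation accuracies from five training runs.
accuracies = [0.81, 0.79, 0.84, 0.80, 0.86]

mean_acc = statistics.mean(accuracies)      # central tendency
median_acc = statistics.median(accuracies)  # robust to outliers
std_acc = statistics.stdev(accuracies)      # sample standard deviation (n - 1)


def z_score(value, mean, std):
    """Return how many standard deviations `value` lies from `mean`."""
    return (value - mean) / std


best_run = max(accuracies)
print(f"mean={mean_acc:.3f} median={median_acc:.3f} std={std_acc:.3f}")
print(f"z-score of best run: {z_score(best_run, mean_acc, std_acc):.2f}")
```

If you can read those few lines and explain what the mean, the standard deviation, and the z-score tell you about the runs, you already have a good part of the safety net described above.
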
Here are another couple of recommended resources:

Step 5 – Master Machine Learning

Here we are. We are finally at the fun part! If you stuck around, congratulations! For this topic, we will go straight into the suggested courses, since you can find all about Machine Learning in this article. Have fun with it!
 

I hope you enjoyed the journey so far! Click on the Deep Learning and Natural Language Processing buttons to access parts two and three of our journey! You will find two more complete roadmaps: a complete Deep Learning roadmap and a complete Natural Language Processing roadmap. Make sure you let me know if there is anything I missed or should add!

As always, have fun, and see you in an 8-bit!