Setting up your Python workspace for data analysis

So you want to learn Python to get into data analysis, but don’t know where to start? Learning Python comes with a lot of new components and concepts.

Some of the common starting questions are:

  • “How do I install Python on my PC and how do I start using it?”
  • “What programs do I need so I can use Python to analyse data?”
  • “What are the best Python-related tools out there?”

Let’s think of working with Python the same way you’d think about a carpentry project. You’ll need a dedicated workspace to build and work on your project. You’ll also need a variety of tools, which you can get individually or as part of a toolbox. You’d start with a piece of wood and slowly shape it into its final form.

Similarly, for a Python data project, you’ll start with some data that you “lay” on a workspace so you can work on it and mould it into its final form. You’ll need a variety of data-related tools to help you shape and transform it along the way.

In the Python data scenario, these tools are known as Python “libraries” or “packages”. Which ones you need will depend on your specific data requirements (just as cutting a log might need a chainsaw, transforming some data might need a specific data library).

You will need to bring these tools into your workspace so you can use them on your data. But where do you get these “tools” from? And how do you install them on your PC, along with Python itself?

The easiest solution for data science is to use a “toolbox” that comes with the tools you need. In this case, Anaconda will be your solution: it comes with a lot of data-specific “tools” (ie. libraries), and its package manager, conda, helps you manage them. By installing Anaconda on your PC, you’ll get Python installed (including built-in modules such as json), plus several more useful libraries too (eg. pandas, NumPy, requests).
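
As a quick sanity check after installing, a short script along these lines can tell you which tools actually made it into your toolbox. It’s only a sketch: the check_libraries helper is a name made up for illustration, and the list simply reuses the libraries mentioned above.

```python
import importlib.util
import sys


def check_libraries(names):
    """Map each top-level library name to True if Python can find it."""
    # find_spec returns None when a top-level module is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Show which Python installation is running this script.
print("Python version:", sys.version.split()[0])

# Report, one per line, whether each library is available.
for library, found in check_libraries(["pandas", "numpy", "json", "requests"]).items():
    print(library, "is available" if found else "is missing")
```

If a library shows up as missing, that is your cue to add it to the toolbox before trying to use it on your data.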

Once you have all the necessary tools, you’ll still need a workspace (eg. a desk) to “lay” your project on. There are several options out there, but one of the best (in my opinion) is the Jupyter Notebook. Within a notebook, you’ll be able to write and run your code, as well as see the results of your code on your data.

Jupyter Notebooks are a particularly useful workspace: on top of letting you interact visually with your data, they let you add formatted notes and images to your notebook, which comes in really handy when you want to document your code and go back through its logic. And the good news: Jupyter Notebook normally comes with the full Anaconda installation, and if it is missing you can use Anaconda to install it easily.
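
To give a feel for what working in a notebook looks like, here is the kind of thing you might type into a single cell. Treat it as a minimal sketch: it assumes pandas is installed, and the sales table is made-up data used purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, invented for this example.
sales = pd.DataFrame(
    {
        "product": ["table", "chair", "shelf", "chair"],
        "units": [2, 4, 1, 3],
    }
)

# Group the rows by product and add up the units sold for each one.
units_per_product = sales.groupby("product")["units"].sum()

print(units_per_product)
```

In a notebook, running the cell shows the result directly underneath it, and you could put a formatted note in the cell above to explain what the calculation is for.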

So in short, if you want to get started with Python for data analysis, you can use:
1. Jupyter Notebooks – to write and run your code
2. Anaconda – to install and manage your Python “tools” (libraries), as well as Python itself