Python is popular in big data and data science projects. This tutorial outlines the basic steps to get started with Python on macOS.
1. Install Xcode
Xcode can be installed from the Mac App Store. Xcode is Apple’s Integrated Development Environment (IDE), a large suite of software development tools and libraries.
2. Install the Apple command line tools
Once Xcode is installed, install the command line tools via “Xcode menu” -> “Preferences” -> “Command Line Tools”, and click the Install button. On recent macOS versions you can also run xcode-select --install from a Terminal window. The installation may take a while. Once it finishes, you can verify it from a Terminal window:
$ xcode-select -h
The Xcode Command Line Tools are part of Xcode. They include a C compiler (Clang on current versions, also available under the gcc command name), which many common Unix-based tools need in order to build.
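To see at a glance whether the tools this tutorial relies on are reachable, you can probe your PATH for each of them. The sketch below uses a hypothetical helper named missing_tools (it is not part of Xcode or any package), and the tool names passed to it are only examples.

```shell
# Hypothetical helper: print the names of the given commands that are NOT on PATH.
missing_tools() {
  missing=""
  for tool in "$@"; do
    # command -v succeeds only if the shell can resolve the name
    command -v "$tool" >/dev/null 2>&1 || missing="${missing:+$missing }$tool"
  done
  printf '%s\n' "$missing"
}

# Example probe; an empty line means every name was found.
missing_tools xcode-select gcc make git
```

On macOS some of these names exist as stubs that only prompt you to install the tools, so treat a clean result as a hint rather than proof.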
3. Install Homebrew
Homebrew is a package manager for macOS. In a Terminal window, type:
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Verify that brew is installed properly by typing the following in a Terminal window:
$ brew doctor
3. Install pyenv
Running “brew install python3” will give you a version of Python, but which version you get is out of your control, because Homebrew manages it for you. Use pyenv instead to control the Python environment.
$ brew install pyenv
Add the line eval "$(pyenv init -)" to your ~/.bash_profile as shown below. If your shell is zsh (the default since macOS Catalina), use ~/.zshrc instead.
$ cd ~
$ echo 'eval "$(pyenv init -)"' >> .bash_profile
$ source ~/.bash_profile
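Because >> appends every time it runs, repeating the step above leaves duplicate lines in your profile. One way around that is sketched below with a hypothetical helper, append_once, which only appends the line when the file does not already contain it.

```shell
# Hypothetical helper: append LINE to FILE only if FILE does not already contain it.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -F: fixed string, -x: match the whole line, -q: no output
  grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Usage: append_once 'eval "$(pyenv init -)"' "$HOME/.bash_profile"
```

Running the usage line twice leaves a single copy of the eval line in the file.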
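The activate and deactivate commands come from the pyenv-virtualenv plugin (brew install pyenv-virtualenv). If you did not configure eval "$(pyenv virtualenv-init -)" to run in your shell, you can manually activate and deactivate your virtual environments like this:
$ pyenv activate <environment-name>
$ pyenv deactivate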
4. Install useful tools
The zlib compression library and the SQLite database are dependencies for building Python with pyenv and often cause build problems when not configured correctly.
$ brew install zlib sqlite
Set these exports in your current terminal window:
$ export LDFLAGS="-L/usr/local/opt/zlib/lib -L/usr/local/opt/sqlite/lib"
$ export CPPFLAGS="-I/usr/local/opt/zlib/include -I/usr/local/opt/sqlite/include"
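The paths above assume Homebrew lives under /usr/local, which is not the case on every Mac (on Apple Silicon it is normally /opt/homebrew). As a sketch, the same flags can be built from whatever prefix brew --prefix reports. The helper brew_flags and the variable brew_prefix are hypothetical names introduced for this example.

```shell
# Hypothetical helper: print -L or -I flags for Homebrew formulae.
# $1 is "lib" or "include", $2 is the Homebrew prefix, the rest are formula names.
brew_flags() {
  kind=$1
  prefix=$2
  shift 2
  case $kind in
    lib) flag=-L ;;
    include) flag=-I ;;
    *) echo "brew_flags: first argument must be lib or include" >&2; return 1 ;;
  esac
  out=""
  for pkg in "$@"; do
    out="${out:+$out }${flag}${prefix}/opt/${pkg}/${kind}"
  done
  printf '%s\n' "$out"
}

# Ask Homebrew for its prefix when available; otherwise assume /usr/local as in the text.
if command -v brew >/dev/null 2>&1; then
  brew_prefix=$(brew --prefix)
else
  brew_prefix=/usr/local
fi

export LDFLAGS="$(brew_flags lib "$brew_prefix" zlib sqlite)"
export CPPFLAGS="$(brew_flags include "$brew_prefix" zlib sqlite)"
```

With a prefix of /usr/local this produces exactly the two export values shown above.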
5. Install Python
Note: If the version below is not found, run “$ brew update && brew upgrade pyenv”. To list the available versions of Python, run “$ pyenv install --list”.
Let’s install Python 3.8.6 here.
$ pyenv install 3.8.6
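Building Python takes several minutes, so it can be worth checking whether the version is already there before installing. The sketch below assumes pyenv keeps each version in its own directory under ~/.pyenv/versions (or under $PYENV_ROOT/versions if you set that variable). The helper name python_installed is hypothetical.

```shell
# Hypothetical helper: succeed if VERSION has a directory under the pyenv versions dir.
# The optional second argument overrides the directory that is searched.
python_installed() {
  version=$1
  versions_dir=${2:-${PYENV_ROOT:-$HOME/.pyenv}/versions}
  [ -d "$versions_dir/$version" ]
}

# Install only when pyenv is available and the version is missing.
if command -v pyenv >/dev/null 2>&1 && ! python_installed 3.8.6; then
  pyenv install 3.8.6
fi
```

pyenv install also accepts a --skip-existing flag that covers the common case; the helper is mainly useful when you want to branch on the result in a setup script.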
6. Install pip
Pip is the standard package manager for Python. Python 3.4 and later bundle pip, so a Python built by pyenv normally already includes it. Check with:
$ python3 -m pip --version
If pip is missing, bootstrap it with the ensurepip module from the standard library:
$ python3 -m ensurepip --upgrade
The distribute_setup.py and easy_install routes found in older guides are obsolete. A Python installed under pyenv lives in your home directory, so sudo should not be needed. If you get permission issues with a system-wide Python, type “sudo” in front of the commands shown above.
With pip3, you can install the required Python packages.
$ pip3 install pytz
$ pip3 install scipy
$ pip3 install numpy
$ pip3 install stringcase
$ pip3 install pymysql
You can list the installed packages with versions:
$ pip3 list
All the packages will be installed in the global environment of that Python (e.g. ~/.pyenv/versions/3.8.6/lib/python3.8/site-packages for a pyenv build). As a result, you can’t use different versions of a package for different projects. This is where you need isolated Python environments, created with tools such as virtualenv, pyenv, conda, etc.
7. Adding a virtual environment
Virtual environments enable you to isolate dependency management on a per-project basis. The global command sets the global Python version. The local command is often used to set an application-specific Python version.
$ pyenv global 3.8.6
$ $(pyenv which python3) -m pip install virtualenvwrapper
$ ls ~/.pyenv/versions/
Outputs: 3.8.6
$ pyenv which python
Outputs: ~/.pyenv/versions/3.8.6/bin/python
$ pyenv versions
Outputs:
system
* 3.8.6 set by user
$ python -V
Outputs: Python 3.8.6
$ pyenv global system
$ python -V
Outputs: the version of the Python that ships with macOS (if any), instead of Python 3.8.6
local – project specific
When you’re working on multiple Python projects, you might want different versions of Python and/or modules installed for each. This keeps each workflow in its own sandbox instead of juggling multiple projects (each with different dependencies) on your system’s version of Python.
Say the project you work on runs only with Python 3.8.6. You can set the version locally and confirm it’s in use:
$ pyenv local 3.8.6
$ python -V
Outputs: Python 3.8.6
Often your project will depend on specific versions of packages, because some dependent libraries move faster than the project you are working on. Hence you need to isolate the packages used by a project.
Let’s create an environment and name it “databricks_project”.
$ pyenv virtualenv 3.8.6 databricks_project
$ pyenv local databricks_project
$ pyenv versions
Outputs:
system
3.8.6
3.8.6/envs/databricks_project
* databricks_project (set by User)
Let’s activate the environment “databricks_project”.
$ pyenv activate databricks_project
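Now you are ready to install the libraries you would like to use. Inside an activated virtual environment, pip installs into that environment and typically rejects the --user flag, so leave it off:
$ pip3 install databricks-connect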
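It is easy to forget the activation step and install a package into the global environment by accident. As a sketch, you could wrap pip in a small guard that refuses to run unless an environment is active. The function name venv_pip_install is hypothetical, and the check assumes that activation exports the VIRTUAL_ENV variable, as virtualenv-style activation scripts do.

```shell
# Hypothetical guard: run pip only when a virtual environment is active.
# Assumes activation exports VIRTUAL_ENV, as virtualenv-style activation does.
venv_pip_install() {
  if [ -z "${VIRTUAL_ENV:-}" ]; then
    echo "no virtual environment active; refusing to install $*" >&2
    return 1
  fi
  python -m pip install "$@"
}

# Usage: venv_pip_install databricks-connect
```

pip also honors the PIP_REQUIRE_VIRTUALENV environment variable, which gives a similar safeguard without a wrapper function.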
shell
The shell command sets a shell-specific Python version. For example, if you wanted to test out the 3.8-dev version of Python, you can run:
$ pyenv shell 3.8-dev
pyenv resolves the Python version in increasing order of precedence: system Python => pyenv global (~/.pyenv/version) => pyenv local (.python-version file) => pyenv shell ($PYENV_VERSION)
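The precedence chain above can be sketched as a small lookup function. This is a simplified illustration, not pyenv’s actual implementation: real pyenv also searches parent directories for a .python-version file, which this version skips. The function name resolve_python_version is hypothetical.

```shell
# Simplified sketch of the lookup order; not pyenv's real implementation.
# Usage: resolve_python_version LOCAL_FILE GLOBAL_FILE
resolve_python_version() {
  local_file=$1
  global_file=$2
  if [ -n "${PYENV_VERSION:-}" ]; then
    printf '%s\n' "$PYENV_VERSION"   # set by "pyenv shell"
  elif [ -f "$local_file" ]; then
    head -n 1 "$local_file"          # written by "pyenv local"
  elif [ -f "$global_file" ]; then
    head -n 1 "$global_file"         # written by "pyenv global"
  else
    echo system                      # nothing configured: fall back to the system Python
  fi
}

# Example: resolve the version for the current directory.
resolve_python_version "$PWD/.python-version" "$HOME/.pyenv/version"
```

The most specific setting wins: a value in $PYENV_VERSION hides the local file, and the local file hides the global one.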
Virtualenv vs. pyenv vs. pipenv vs. conda
Virtualenv was the default way of creating virtual environments for many years, but nowadays people are moving to pipenv or conda.
pyenv is a Python version manager. pipenv is a packaging tool for Python applications that manages package dependencies and their sub-dependencies.
pipenv is preferred over virtualenv.
If you are a data engineer or scientist, it can be frustrating to set up dependencies like NumPy and SciPy. Anaconda is a distribution of Python that makes it very simple to install those packages. Anaconda has its own virtual environment system called conda.
Create a new environment:
$ conda create --name databricks_project python=3.8.6
To recreate an environment elsewhere, first save it:
$ conda env export > databricks_project.yml
Then recreate the environment from databricks_project.yml:
$ conda env create -f databricks_project.yml
8. Install git
Git is a distributed version control system, used here to pull and share code with services such as GitHub.
$ brew install git
For the changes to take effect, restart your shell:
$ exec $SHELL
OR reload your profile:
$ source ~/.bash_profile