Setting up Python for Data Science on M1 Mac

Pareekshith Katti
Oct 8, 2021

I recently bought an M1 MacBook Air and found it quite difficult to get all my libraries working, so I thought I'd explain how I got them working.

There are multiple ways to install Python:

  1. Through Homebrew
  2. Through Miniforge
  3. Anaconda via Rosetta
  4. The installer from python.org

After a couple of tests, I found the Rosetta version to be quite slow compared with both the native version of Python and Python running on my Linux machine (Intel 10th gen). I also found that Miniforge was the easiest way to install.

Step 1: Install Xcode command line tools

xcode-select --install

Step 2: Install Homebrew

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 3: Install Miniforge

You can install Miniforge either through Homebrew or from the Miniforge GitHub repository.

brew install miniforge

GitHub: conda-forge/miniforge (a conda-forge distribution)

Step 4: Install the required libraries

Miniforge comes with Python 3.9. Based on the libraries you need, you might want to downgrade Python for compatibility. This was the case for me, so I downgraded to 3.8.

conda install python=3.8

You might want to create a new environment as well.

conda create -n [env name here]
conda activate [env name here]
conda install python=3.8

For the libraries, you can simply conda install them, or pip install a library if it is not available through conda.

If you have a lot of libraries that you want to install, you can put them in a requirements.txt file and use

cat requirements.txt | xargs -n 1 conda install

Installing them one at a time prevents conda from aborting the whole batch because a single library is unavailable.

After this, you can pip install the libraries that conda could not install.

You can find out which libraries those are by running the command below; conda will fail and list the packages it cannot find.

conda install --file requirements.txt

Most of the popular libraries like TensorFlow, scikit-learn, pandas, etc. work right out of the box. Some more obscure libraries might not; if that happens, you can try installing them from source or find an alternative. One library, PyKrige, did not install, but I found that GaussianProcessRegressor in scikit-learn does the same thing.
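As a quick sanity check after installing, a minimal sketch like the one below confirms that the core libraries import natively (assuming you installed them into the active environment; TensorFlow on Apple silicon typically comes from the Apple/conda-forge builds, but the import name is still tensorflow):

import platform

import numpy as np
import pandas as pd
import sklearn
import tensorflow as tf
from sklearn.gaussian_process import GaussianProcessRegressor  # kriging-style alternative to PyKrige

# "arm64" here means Python is running natively, not under Rosetta
print("machine:", platform.machine())
print("numpy:", np.__version__)
print("pandas:", pd.__version__)
print("scikit-learn:", sklearn.__version__)
print("tensorflow:", tf.__version__)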

However, I had two main problems:

  1. PyCaret did not install, since it has a hard requirement on a SciPy version that doesn't work on the M1.
  2. XGBoost threw a segmentation fault when XGBClassifier().fit() was called.

I did find workarounds, and they might work for you as well.

Workaround 1: Installing PyCaret

Step 1: Install pycaret without dependencies

pip install --no-deps pycaret

Step 2: Install the requirements

The file pycaret/requirements.txt (on the master branch of the pycaret/pycaret repository on GitHub) contains the requirements for PyCaret.

I removed the version requirement for every package pinned with <, <=, or ==, which left the following list:

pandas
scipy
numpy
seaborn
matplotlib
IPython
joblib
scikit-learn
ipywidgets
yellowbrick>=1.0.1
lightgbm>=2.3.1
plotly>=4.4.1
wordcloud
textblob
cufflinks>=0.17.0
umap-learn
pyLDAvis
gensim
spacy
nltk
mlxtend>=0.17.0
pyod
pandas-profiling>=2.8.0
kmodes>=0.10.1
mlflow
imbalanced-learn
scikit-plot #for lift and gain charts
Boruta
numba

I named the file pycaret_requirements.txt and installed the dependencies using

cat pycaret_requirements.txt | xargs -n 1 conda install

For all the packages that did not get installed, I pip installed them.
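Because pycaret itself was installed with dependency resolution disabled, it is worth a quick import check afterwards. A minimal sketch; if a dependency is still missing, the ModuleNotFoundError names the package you still need to install:

from importlib.metadata import version  # Python 3.8+

# Importing the classification module pulls in most of PyCaret's heavy
# dependencies, so a clean import is a good sign the install is complete.
import pycaret  # noqa: F401
from pycaret.classification import setup, compare_models  # noqa: F401

print("pycaret", version("pycaret"), "imported fine")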

Workaround 2: Installing XGBoost

You might have to install cmake and libomp, if they are not already installed.

brew install cmake libomp

I got XGBoost working by using the --no-binary option:

pip install xgboost --no-binary xgboost -v

If you still get a segmentation fault, try removing libomp and reinstalling XGBoost using the above command.
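To check that the rebuilt XGBoost no longer segfaults, a tiny fit on synthetic data is enough. A minimal sketch; the data here is made up purely for the test:

import numpy as np
from xgboost import XGBClassifier

# Small synthetic binary classification problem, just to exercise fit()
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

model = XGBClassifier(n_estimators=50)
model.fit(X, y)  # this is the call that used to crash with a segmentation fault
print("training accuracy:", model.score(X, y))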

Testing

I ran both the classification and regression examples from PyCaret, and they worked well.
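For reference, the classification test was along these lines. This is a rough sketch with the PyCaret 2.x API that was current at the time, using the bundled 'juice' dataset; the regression test is analogous with pycaret.regression and, for example, the 'insurance' dataset:

from pycaret.datasets import get_data
from pycaret.classification import setup, compare_models

data = get_data("juice")  # small demo dataset that ships with PyCaret
clf = setup(data=data, target="Purchase", silent=True)  # silent=True skips the dtype prompt (PyCaret 2.x)
best = compare_models()  # trains and ranks the available models, including XGBoost
print(best)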

I’ve attached a few screenshots.

Classification: [screenshots: PyCaret classification, XGBoost classifier]

Regression: [screenshots: PyCaret regression, XGBoost regression model]
