Reproducible

Getting started with Anaconda

Virtual environment

Let’s open Anaconda Prompt (or Terminal if you’re a Mac user).

The first step is to choose the environment name. Usually, it’s the same as the project name.

conda create --name redcar python=3.7.4

After you have created a new virtual environment, you need to activate it to use it. To do so, use the following command:

conda activate redcar
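To double-check that the activation worked, one option is to ask the Python interpreter itself. The snippet below is a minimal sketch: the helper name describe_active_environment is made up for this example, and it relies on conda setting the CONDA_DEFAULT_ENV variable when an environment is activated.

```python
import os
import sys


def describe_active_environment():
    """Collect a few facts about the interpreter that is currently running."""
    return {
        # Version of the running interpreter, e.g. "3.7.4"
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        # Folder of the environment the interpreter belongs to
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no conda environment is active
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    for key, value in describe_active_environment().items():
        print(f"{key}: {value}")
```

If everything went well, conda_env should read redcar and prefix should point to a folder ending in redcar.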

Now you can install all the packages that you will use in this project. Let’s try to install pandas:

conda install pandas

You can always install a specific version of a package with conda install pandas==0.25 (e.g., when a member of your project group isn’t using the latest one). You can also install a set of packages at once. Just type them separated by spaces:

conda install matplotlib seaborn numpy

Let’s take a look at what packages we have:

conda list

To dump this list of packages into a single file use:

conda env export > environment.yml

🎉 We created a file, conventionally called environment.yml, that lists all the packages that have been installed! Now we can pass this file to another person, and he or she will be able to recreate the same setup. The file is written to the current working directory, which is usually C:/Users/<your_user_name>/ (and for Mac )

Alternatively, if you want to build an identical environment, you can use the following command:

conda list --explicit > spec-file.txt

Note that this specification file only lets you recreate the environment on the same operating system it was created on (e.g., Windows).

Now let’s get back to our base environment. For this simply type:

conda deactivate

Let’s finalize our practice by deleting this virtual environment 😱 :

conda env remove --name redcar

Don’t worry! We’re doing this only to recreate it from the environment.yml file we exported earlier 🦉 . If you somehow forgot to export it, take one here and put it in C:/Users/<your_user_name>/ (and for Mac ). The command for creating an environment from the file is as follows:

# Default option
conda env create --name redcar --file environment.yml

# Or, if you used the second option (explicit specification file)
conda create --name redcar --file spec-file.txt

Alright! We’re back on track. We have a new virtual environment with a base set of packages to continue our work!
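One quick way to confirm that the recreated environment really contains the packages we installed earlier is to try locating them from Python. This is a small sketch rather than an official conda check: missing_packages is a hypothetical helper name, and the package list simply mirrors the ones installed above. Run it with the redcar environment activated.

```python
import importlib.util


def missing_packages(names):
    """Return the names that the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    expected = ["pandas", "matplotlib", "seaborn", "numpy"]
    missing = missing_packages(expected)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All expected packages are available.")
```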

However, to make this environment work inside JupyterLab (or Jupyter Notebook), we need to tell JupyterLab that the environment exists. With the environment activated and the ipykernel package installed in it, we can do this with the following command:

python -m ipykernel install --user --name=redcar
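To check that the kernel was registered, we can list the kernels Jupyter knows about. The sketch below assumes the jupyter_client package is installed (it comes along with ipykernel); registered_kernels is a helper name invented here.

```python
def registered_kernels():
    """Map kernel names to their folders, or return {} if jupyter_client is missing."""
    try:
        from jupyter_client.kernelspec import KernelSpecManager
    except ImportError:
        return {}
    return KernelSpecManager().find_kernel_specs()


if __name__ == "__main__":
    kernels = registered_kernels()
    for name, folder in sorted(kernels.items()):
        print(f"{name}: {folder}")
    print("redcar registered:", "redcar" in kernels)
```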

{% hint style="warning" %} 🧠 Note: There are other ways to manage virtual environments, such as pipenv or venv. We cannot say which one is better; as usual, there are pros and cons. Our advice is: whenever you feel uncomfortable with the tool that you’re using, dive in to find a better option. {% endhint %}

Cookiecutter Data Science

Cookiecutter is a tool that helps to create projects from templates for Python packages, Java and Android applications, etc. Creating a project from a template with a couple of commands saves you manual work (and that’s the end goal, right 🐌?).

The project template that we will use was designed by DrivenData and is called Cookiecutter Data Science. The project website says:

“Cookiecutter Data Science is a logical, reasonably standardized, but flexible project structure for doing and sharing data science work.”

After testing it in numerous ⚔, we concluded that it’s pretty handy. So let’s continue by installing Cookiecutter and Cookiecutter Data Science.

The first step is to open Anaconda Prompt (or Command Prompt) and activate the virtual environment.

conda activate redcar

Now let’s install Cookiecutter with pip (it’s not available with conda 🤷):

pip install cookiecutter

Nice! The next step is to point Cookiecutter to a specific project template:

cookiecutter https://github.com/drivendata/cookiecutter-data-science

You’ll get a set of questions such as:

  • project name and repo name (usually they’re the same),
  • author name (surname and company, if any),
  • short description of your project (a couple of lines for README.md),
  • license (read more about licenses here),
  • S3 bucket and AWS profile (for establishing a pipeline with Makefile).

Let’s fill it up!
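If you’d rather not answer the questions by hand every time, Cookiecutter can also be driven from Python. The sketch below is only an illustration: the keys in ANSWERS are our assumption about the template’s prompts (they may differ between template versions), the values are placeholders, and generate_project is a hypothetical helper.

```python
TEMPLATE_URL = "https://github.com/drivendata/cookiecutter-data-science"

# Assumed prompt names with placeholder values; adjust them to the questions
# that the template version you use actually asks.
ANSWERS = {
    "project_name": "redcar",
    "repo_name": "redcar",
    "author_name": "Your Name",
    "description": "A short description for README.md",
}


def generate_project(output_dir="."):
    """Create the project without interactive prompts and return its path."""
    # Imported here so the module still loads when cookiecutter is not installed.
    from cookiecutter.main import cookiecutter

    return cookiecutter(
        TEMPLATE_URL,
        no_input=True,          # take defaults instead of asking questions
        extra_context=ANSWERS,  # override the defaults with our answers
        output_dir=output_dir,
    )
```

Calling generate_project() downloads the template, so it needs network access; any prompt not listed in ANSWERS falls back to the template’s default.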

{% hint style="success" %} Alright! Great success! Now it’s time to continue this work with Git and GitHub. {% endhint %}