Introduction to Python for Actuaries

John Bogaardt FCAS MAAA

Brian A. Fannin ACAS CSPA

February 6, 2020

What is Python?

What is Python?

Python is a programming language invented by …

… this guy

… and he named it Python because …

… he was into Monty Python when he wrote it

What can it do?

  • Interpreted program, like R or VBA
  • Runs on any OS
  • Comes standard on Mac and Linux
  • General programming
  • Access the file system

How does it do it?

The Zen of Python

  • Beautiful is better than ugly.
  • Explicit is better than implicit.
  • Simple is better than complex.
  • Complex is better than complicated.
  • Flat is better than nested.
  • Sparse is better than dense.

More zen

  • Readability counts.
  • Special cases aren’t special enough to break the rules.
  • Although practicality beats purity.
  • Errors should never pass silently.
  • Unless explicitly silenced.
  • In the face of ambiguity, refuse the temptation to guess.
  • There should be one– and preferably only one –obvious way to do it.

What else does it do?

Analytic data

Visualization

Data from the web

Other cool web stuff

Data science

How can I incorporate Python into my workflow?

Similarities between R and Python

  • FLOSS - Free, Libre, Open Source Software
  • Wide support (take that, Julia!)
  • Interpreted, REPL
  • Rich package ecosystem
  • Not OS dependent
  • Fast to program
  • Execution not always as fast as C, but possible to use C routines when needed
  • great database support - from RDBMS to Spark, Hadoop, etc.

How does it compare with R?

R …

  • loves statistics
  • hasn’t really settled on OOP
  • vector support out of the box
  • does anyone really use try()?
  • has some actuarial packages
  • RStudio!!!
  • lacks consensus around machine learning - H2O, caret, ROCR?

Python …

  • is general purpose
  • has easy and consistent support for OOP
  • vectors are available in numpy
  • great exception handling
  • not much actuarial focus
  • no consensus on a FLOSS IDE
  • scikit-learn!

Compare with Excel?

Excel …

  • Closed source, paid commercial license
  • Data is visible, but logic is hidden
  • Great for camera-ready output

Python …

  • FLOSS
  • Logic is visible, data is abstract
  • A bit more work to document output

Practicalities

  • Integrated Development Environment (IDE)
  • Version 2 or 3?
  • Sharing your work

Jupyter

Intro scikit learn

scikit-learn

  • Built on NumPy, SciPy, and matplotlib
  • Open source, commercially usable - BSD license
  • Common functional interface for:
    • Data splits, cross validations
    • Data transforms
    • Training, test, scoring
    • Pipelines to manage workflow
    • GridSearch for model tuning

But what algorithms does it support?

algorithms cheat sheet

Estimators

  • Estimators are the building block of scikit-learn. Almost everything is an estimator.
  • All estimators have fit() and predict() methods.
  • Supervised techniques generally have a score() method as well.
  • Some special estimators have a transform() method to pre-process data.
    • Generate new features
    • Clean data
    • Other pre-processing

Pipelines

Pipelines facilitate workflow. Pipelines are estimators too. Each step in a pipeline must be a transformer except for the final step.

pipeline image

How about an example?

Wrapping up

Python and you

  • If you already know R, check out Python
  • If you don’t know R, check out Python
  • If you’re an actuary, check out Python
  • Jupyter is a great place to start experimenting
  • scikit-learn is worth exploring

Thank you for your time!

Questions?

References