  1. Apache Spark
  2. Delta Lake
  3. MLflow
    1. answers the data scientist's question: how do I deploy this model?
    2. model registry, deployment
    3. alternative: Kubeflow

Apache Spark

Fault tolerant

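Like Hadoop MapReduce, but usually faster because it keeps intermediate data in memory instead of writing it to disk between steps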

Databricks is a managed platform that makes it faster to set up and run Apache Spark

At Forma, we use Databricks to run commission runs super quickly (orders of magnitude faster)

Databricks notebooks will feel familiar if you have used Jupyter notebooks

Data lakehouse

Data warehouse

  • expensive hardware
  • easily do analysis on your data
  • security

Data lake

  • massive amounts of data
  • cheap hardware


A lakehouse combines both:

  1. from the lake: cheap hardware that stores massive amounts of data
  2. from the warehouse: still being able to easily analyze that data

Databricks vs Snowflake


Last update: 2022-11-04