Lateral joins in Spark

In this short article, I explain lateral joins in Spark. I demonstrate them with a simple example in Scala and also show alternative approaches, such as an inner join combined with window functions, and pure Spark SQL. The def lateralJoin(right: Dataset[_]): DataFrame method on Dataset was introduced in version 4.0.0, but according to the documentation, the lateral subquery feature was first introduced in version 3.2.0. The source code is located here: https://github.com/JurajBurian/spark-playground. ...

October 20, 2025 · 5 min · 874 words · Me

Developing Spark jobs in Scala

In this short article, I would like to cover the entire development cycle of Spark jobs. I will explain the following topics: setting up an sbt multi-module project; how to write modular code; how to write testable code; how to write unit tests; how to write integration tests. The source code is located here: https://github.com/JurajBurian/spark-example. The sbt project. In this sample project, we have defined a single Spark job as a module alongside a common module to demonstrate modular development. This approach is useful when developing several Spark jobs that share models or functionality. ...

August 11, 2025 · 9 min · 1788 words · Me