In the conclusion to this series, learn how resource tuning, parallelism, and data representation affect Spark job performance.

In this post, we’ll finish what we started in “How to Tune Your Apache Spark Jobs (Part 1)”. I’ll try to cover pretty much everything you could care to know about making a Spark program run fast.

The spark-avro library supports writing and reading partitioned data. You pass the partition columns to the writer. For examples, see Writing Partitioned Data and Reading Partitioned Data.
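As a rough illustration of passing partition columns to the writer, the sketch below uses PySpark's `DataFrameWriter.partitionBy`. It assumes Spark 2.4 or later with the Avro module available, so that the format name `"avro"` resolves (older releases used the Databricks package under `com.databricks.spark.avro`). The column names, the output path, and the helper `partition_dir` are hypothetical and only there for illustration.

```python
def partition_dir(base_path, **partition_values):
    """Return the Hive-style directory a partitioned write is expected to
    produce for one combination of partition values (illustrative helper)."""
    parts = [f"{col}={val}" for col, val in partition_values.items()]
    return "/".join([base_path.rstrip("/")] + parts)


def write_and_read_partitioned(spark, base_path="/tmp/events_avro"):
    """Write a small DataFrame partitioned by year and month, then read it back.

    Assumes `spark` is a SparkSession with Avro support on the classpath.
    """
    df = spark.createDataFrame(
        [(2011, 7, "a"), (2011, 8, "b"), (2012, 1, "c")],
        ["year", "month", "title"],
    )
    # The partition columns are handed to the writer; their values become
    # directory names under base_path instead of fields in the data files.
    (df.write
       .partitionBy("year", "month")
       .format("avro")
       .mode("overwrite")
       .save(base_path))
    # Loading the base path lets Spark's partition discovery rebuild the
    # year and month columns from the directory names.
    return spark.read.format("avro").load(base_path)
```

Calling `partition_dir("/tmp/events_avro", year=2011, month=7)` shows the kind of directory layout to expect on disk after the write; `write_and_read_partitioned` needs a live `SparkSession` and is not exercised here.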


How-to: Tune Your Apache Spark Jobs (Part 2) - Cloudera Engineering Blog