Athena bucketing

What is an Athena bucket?
What is the difference between partitioning and bucketing?
What is bucketing in AWS?
What is the purpose of bucketing?

What is an Athena bucket?

To reduce the cost of data scans, Athena lets you bucket your data. Used effectively, this optimization can sharply reduce the amount of data scanned (read: money). If you are familiar with data partitioning, you can think of bucketing as a form of hash partitioning: rows are assigned to one of a fixed number of buckets based on a hash of the bucketing column.
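The idea of hash-based bucketing can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `bucket_for` and `bucketize` helpers, the `NUM_BUCKETS` value, and the sample rows are hypothetical, and CRC32 stands in for whatever hash function the engine actually uses.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical bucket count, fixed when the table is created


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Map a column value to a bucket index using a stable hash.

    CRC32 is an assumption for illustration, not the engine's real hash.
    """
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucketize(rows, key, num_buckets=NUM_BUCKETS):
    """Group rows into buckets by the hash of row[key]."""
    buckets = {i: [] for i in range(num_buckets)}
    for row in rows:
        buckets[bucket_for(row[key], num_buckets)].append(row)
    return buckets


# Hypothetical sample data.
rows = [
    {"user_id": "u1", "amount": 10},
    {"user_id": "u2", "amount": 25},
    {"user_id": "u1", "amount": 7},
    {"user_id": "u3", "amount": 3},
]
buckets = bucketize(rows, "user_id")

# A lookup on user_id only needs to read one bucket, not all of them.
target = bucket_for("u1")
matches = [r for r in buckets[target] if r["user_id"] == "u1"]
print(f"read bucket {target} only, found {len(matches)} rows for u1")
```

Because the same value always hashes to the same bucket, a filter on the bucketing column can skip every other bucket, which is where the reduction in scanned data comes from.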

What is the difference between partitioning and bucketing?

Partitioning helps eliminate data when the partition column is used in the WHERE clause, whereas bucketing organizes the data within each partition into multiple files, so that rows with the same value in the bucketing column are always written to the same bucket.
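A rough way to see how the two techniques combine is to model a table as partitions that each contain bucket files. The sketch below is an assumption-laden illustration: `write_table`, `files_to_scan`, the column names `dt` and `user_id`, and the CRC32 hash are all hypothetical and do not reflect how Athena lays out or plans queries internally.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Stable hash of the bucketing column (CRC32 is a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def write_table(rows, partition_key, bucket_key):
    """Lay rows out as {partition value: {bucket index: [rows]}}."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row[partition_key]][bucket_for(row[bucket_key])].append(row)
    return table


def files_to_scan(table, partition_value=None, bucket_value=None):
    """Return the (partition, bucket) files a query would have to read."""
    # Partition pruning: a filter on the partition column drops whole partitions.
    partitions = [partition_value] if partition_value is not None else list(table)
    files = []
    for p in partitions:
        for b in table.get(p, {}):
            # Bucket pruning: a filter on the bucketing column keeps one bucket.
            if bucket_value is None or b == bucket_for(bucket_value):
                files.append((p, b))
    return files


# Hypothetical sample data.
rows = [
    {"dt": "2024-01-01", "user_id": "u1"},
    {"dt": "2024-01-01", "user_id": "u2"},
    {"dt": "2024-01-02", "user_id": "u1"},
    {"dt": "2024-01-02", "user_id": "u3"},
]
table = write_table(rows, "dt", "user_id")

print(files_to_scan(table))                      # no filter: every file
print(files_to_scan(table, "2024-01-01"))        # one partition only
print(files_to_scan(table, "2024-01-01", "u1"))  # one bucket in one partition
```

Partitioning narrows the search to a directory-like slice of the table, and bucketing narrows it further to a single file inside that slice.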

What is bucketing in AWS?

Elsewhere in AWS, "bucket" usually means an Amazon S3 bucket, which is a different concept from Athena bucketing. An S3 bucket is a container for objects. To store your data in Amazon S3, you first create a bucket and specify a bucket name and AWS Region. Then you upload your data to that bucket as objects. Each object has a key (or key name), which is the unique identifier for the object within the bucket.

What is the purpose of bucketing?

Bucketing in Hive is useful when dealing with large datasets that need to be split into clusters for more efficient management and for join queries with other large datasets. The primary use case is joining two large datasets under resource constraints such as memory limits.
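The join use case can be illustrated with a small sketch. It assumes both datasets are bucketed on the join key with the same bucket count, so bucket i on one side only ever needs to be matched against bucket i on the other. The helper names, the sample data, and the CRC32 hash are hypothetical; this is not Hive's actual join implementation.

```python
import zlib

NUM_BUCKETS = 4  # hypothetical; both sides must use the same count


def bucket_for(value, num_buckets=NUM_BUCKETS):
    """Stable hash of the join key (CRC32 is a stand-in)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_buckets


def bucketize(rows, key):
    """Split rows into NUM_BUCKETS lists by the hash of row[key]."""
    buckets = [[] for _ in range(NUM_BUCKETS)]
    for row in rows:
        buckets[bucket_for(row[key])].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets, key):
    """Join bucket i of the left side with bucket i of the right side only."""
    for left, right in zip(left_buckets, right_buckets):
        # Only one bucket of the right side is indexed in memory at a time.
        index = {}
        for r in right:
            index.setdefault(r[key], []).append(r)
        for l in left:
            for r in index.get(l[key], []):
                yield {**l, **r}


# Hypothetical sample data.
orders = [
    {"user_id": "u1", "amount": 10},
    {"user_id": "u2", "amount": 25},
    {"user_id": "u1", "amount": 7},
    {"user_id": "u9", "amount": 1},  # no matching user
]
users = [
    {"user_id": "u1", "name": "Ana"},
    {"user_id": "u2", "name": "Bo"},
    {"user_id": "u3", "name": "Cy"},
]

joined = list(
    bucketed_join(bucketize(orders, "user_id"), bucketize(users, "user_id"), "user_id")
)
for row in joined:
    print(row)
```

Since matching keys always land in the same bucket index on both sides, the working set per step is one bucket pair rather than the full datasets, which is what makes the approach attractive under memory limits.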