{{< partial "learn_x_header" >}} ## What Will You Learn? - What is Amazon Athena? - What are the basics of Amazon Athena? - What are the use cases for Amazon Athena? - How can I import data into Amazon Athena? - Where can I learn more about Amazon Athena? ## The Basics Learn Amazon Athena basics. Amazon Athena is meant for querying copious amounts of data. It's a straightforward service based on [Presto]. The Presto Foundation calls Amazon Athena and Amazon EMR [Presto Cloud on their website.] It's [not cheap] to use Amazon Athena, but it is convenient. It costs $5.00 per TB of data scanned. It's possible to pay a third of that with compression. Additional savings occur when querying a single column. The golden feature of Athena is its scaled cluster of servers. Setting up your own [Presto] analytics engine would take more work. ### Primary Use Cases - Querying vast amounts of - Log Data - Behavioral Data - Noncritical data ## Importing Data I needed to query a 58G CSV file. I could load the ~1 billion rows into PostgreSQL, but any queries on the data caused my machine to crash. I wrote a Python script to query the data, which works but takes too long to process. So, I brought out the big guns. I tried Athena as a solution. Here's how it went down. I created this table to store my CSV file. This table holds the output of an AWS command-line query. ```sql DROP TABLE learnamazonathena.s3_objects; CREATE EXTERNAL TABLE IF NOT EXISTS learnamazonathena.s3_objects ( time string, bytes bigint, object string ) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' STORED AS TEXTFILE LOCATION 's3://learn-amazon-athena/csv'; ``` Amazon S3 is now the data store for the CSV file. I generated the CSV file with these commands. ```sh # Query all S3 objects recursively aws s3 ls --recursive learn-amazon-athena > s3_objects.csv # Format the output as a CSV file perl -p -i -e ';s/(^.{0,19})(\s+)([0-9]+)(\s)(.+)$/"\1","\3","\5"/g'; applications.csv ``` I ran this query against the newly created table. ```sql SELECT split(object, ';/';), SUM(bytes) FROM learnamazonathena.s3_objects GROUP BY split(object, ';/';) ``` This query took seventeen seconds to complete. I love how fast and straightforward Amazon Athena is. The CSV file I created is small in the grand scheme of things. Facebook built Presto to query obscene amounts of data and later open-sourced it. Go open source! ## Learn Amazon Athena - Beyond the Basics - **Books** - [Getting Started - User Guide] — AWS - [Amazon Athena - User Guide] — AWS - [Athena FAQ] — AWS - **Video** - [Amazon Web Services: Data Analytics] {{< partial "category_footer" >}} [Presto]: https://prestodb.io/ [presto cloud on their website.]: https://prestodb.io/resources.html#cloud [not cheap]: https://aws.amazon.com/athena/pricing/ [getting started - user guide]: https://docs.aws.amazon.com/athena/latest/ug/getting-started.html [amazon athena - user guide]: https://docs.aws.amazon.com/athena/latest/ug/athena-ug.pdf [athena faq]: https://aws.amazon.com/athena/faqs/ [Amazon Web Services: Data Analytics]: https://aws.amazon.com/athena/getting-started/