What Will You Learn?

  • What is Amazon Athena?
  • What are the basics of Amazon Athena?
  • What are the use cases for Amazon Athena?
  • How can I import data into Amazon Athena?
  • Where can I learn more about Amazon Athena?

The Basics

Learn Amazon Athena basics. Amazon Athena is meant for querying copious amounts of data. It’s a straightforward service based on 🔎Presto. The Presto Foundation calls Amazon Athena and Amazon EMR Presto Cloud on their website.

It’s not cheap to use Amazon Athena, but it is convenient. It costs $5.00 per TB of data scanned. It’s possible to pay a third of that with compression. Additional savings occur when querying a single column. The golden feature of Athena is its scaled cluster of servers. Setting up your own 🔎Presto analytics engine would take more work.

Primary Use Cases

  • Querying vast amounts of
    • Log Data
    • Behavioral Data
    • Noncritical data

Importing Data

I needed to query a 58G CSV file. I could load the ~1 billion rows into PostgreSQL, but any queries on the data caused my machine to crash. I wrote a Python script to query the data, which works but takes too long to process. So, I brought out the big guns.

I tried Athena as a solution. Here’s how it went down.

I created this table to store my CSV file. This table holds the output of an AWS command-line query.

DROP TABLE learnamazonathena.s3_objects;
CREATE EXTERNAL TABLE IF NOT EXISTS learnamazonathena.s3_objects (
         time string,
         bytes bigint,
         object string
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
STORED AS TEXTFILE
LOCATION 's3://learn-amazon-athena/csv';

Amazon S3 is now the data store for the CSV file. I generated the CSV file with these commands.

# Query all S3 objects recursively
aws s3 ls --recursive learn-amazon-athena > s3_objects.csv

# Format the output as a CSV file
perl -p -i -e ';s/(^.{0,19})(\s+)([0-9]+)(\s)(.+)$/"\1","\3","\5"/g'; applications.csv

I ran this query against the newly created table.

SELECT split(object,
         ';/';), SUM(bytes)
FROM learnamazonathena.s3_objects
GROUP BY  split(object, ';/';)

This query took seventeen seconds to complete.

I love how fast and straightforward Amazon Athena is. The CSV file I created is small in the grand scheme of things. Facebook built Presto to query obscene amounts of data and later open-sourced it.

Go open source!

Learn Amazon Athena - Beyond the Basics


Related Articles by Category

☁️ Amazon Web Services

Cloud computing services and infrastructure from AWS.

☁️ Cloud

Cloud computing platforms, services, and infrastructure.

📉 Data Analysis

Examining data to discover patterns, trends, and actionable insights.

🗄️ Databases

Storing, organizing, and retrieving data efficiently.