(Photo by Alexander Sinn on Unsplash)
Briefing
Athena helps you analyze unstructured, semi-structured, and structured data stored in Amazon S3. Examples include CSV, JSON, or columnar data formats such as Apache Parquet and Apache ORC. You can use Athena to run ad-hoc queries using ANSI SQL, without the need to aggregate or load the data into Athena.
- Serverless
- Interactive query platform
- Supported formats:
  - CSV
  - JSON
  - Avro
  - Apache Parquet (columnar)
  - Apache ORC (columnar)
- Built on Presto, a distributed SQL query engine for big data
- Integrates with the AWS Glue Data Catalog
- Integrates with Amazon QuickSight for data visualization
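The ad-hoc query flow described above can be sketched with the AWS SDK for Python (boto3). The sketch submits SQL with `start_query_execution`, polls `get_query_execution` until the query reaches a terminal state, and then pages through `get_query_results`. It is a minimal sketch, not an official client. It assumes AWS credentials with Athena and S3 permissions and an S3 location you supply for query results. The function name `run_athena_query` is hypothetical. The client is passed in as a parameter so the polling logic can be exercised without AWS.

```python
import time
from typing import Any, Dict, List

# States after which an Athena query execution no longer changes.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client: Any, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0) -> List[List[str]]:
    """Submit `sql` to Athena, wait for it to finish, and return rows as string lists.

    `client` is assumed to be a boto3 Athena client, e.g. boto3.client("athena").
    Hypothetical helper for illustration only.
    """
    start = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # S3 prefix for results
    )
    query_id = start["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        execution = client.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "")
        raise RuntimeError(f"Athena query {query_id} ended in state {state}: {reason}")

    # Page through the results; for a SELECT, the first row holds the column names.
    rows: List[List[str]] = []
    kwargs: Dict[str, Any] = {"QueryExecutionId": query_id}
    while True:
        page = client.get_query_results(**kwargs)
        for row in page["ResultSet"]["Rows"]:
            # NULL cells have no "VarCharValue" key; they are mapped to "" here.
            rows.append([cell.get("VarCharValue", "") for cell in row["Data"]])
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token
    return rows
```

A call might look like `run_athena_query(boto3.client("athena"), "SELECT * FROM my_table LIMIT 10", "my_database", "s3://my-athena-results-bucket/queries/")`. In that call, `my_table`, `my_database`, and the bucket are placeholders. Polling keeps the sketch simple. For long-running queries, a longer `poll_seconds` reduces API calls.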
History
A timeline of notable Athena releases, oldest first, for context.
- 2023-04-28: Introducing Athena Provisioned Capacity
Reference
Examples
- Access Amazon Athena in your applications using the WebSocket API, 2023-03-02, by Abhi Sodhani and Robin Zimmerman
- How Novo Nordisk built distributed data governance and control at scale, 2023-04-28, by Jonatan Selsing, Alessandro Fior, Anwar Rizal, Moses Arthur, Hassen Riahi, and Kumari Ramar
- Implement a serverless CDC process with Apache Iceberg using Amazon DynamoDB and Amazon Athena, 2023-08-16, by Vijay Velpula, Karthikeyan Ramachandran, and Sriharsh Adari
  - Apache Iceberg is an open table format for very large analytic datasets. Iceberg manages large collections of files as tables, and it supports modern analytical data lake operations such as record-level insert, update, delete, and time travel queries. The Iceberg specification allows seamless table evolution such as schema and partition evolution, and its design is optimized for usage on Amazon Simple Storage Service (Amazon S3). Iceberg also helps guarantee data correctness under concurrent write scenarios.
- Use Amazon Athena to query data stored in Google Cloud Platform, 2023-08-15, by Jonathan Wong
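The CDC article above applies change records to an Iceberg table. In Athena, record-level changes like these are commonly expressed with a `MERGE INTO` statement, which Athena supports for Iceberg tables. The helper below is a hypothetical sketch and is not taken from the article. The function name, the `op` column, and the table names are illustrative assumptions. It builds such a statement from a staging table whose `op` column carries DynamoDB Streams event names (`INSERT`, `MODIFY`, `REMOVE`).

```python
from typing import Sequence


def build_cdc_merge_sql(target: str, staging: str, key: str,
                        columns: Sequence[str], op_column: str = "op") -> str:
    """Build a MERGE INTO statement that applies CDC rows to an Iceberg table.

    Hypothetical helper. Assumes `staging` holds at most one change per key and
    that `op_column` contains 'INSERT', 'MODIFY', or 'REMOVE'. Identifiers are
    interpolated unquoted, so only pass trusted table and column names.
    """
    if key not in columns:
        raise ValueError("key must be one of the columns")
    non_key = [c for c in columns if c != key]
    if not non_key:
        raise ValueError("need at least one non-key column to update")

    update_set = ", ".join(f"{c} = s.{c}" for c in non_key)
    insert_cols = ", ".join(columns)
    insert_vals = ", ".join(f"s.{c}" for c in columns)
    return (
        f"MERGE INTO {target} t\n"
        f"USING {staging} s\n"
        f"ON t.{key} = s.{key}\n"
        # Deleted upstream: remove the row from the Iceberg table.
        f"WHEN MATCHED AND s.{op_column} = 'REMOVE' THEN DELETE\n"
        # Existing key with a new image: overwrite the non-key columns.
        f"WHEN MATCHED THEN UPDATE SET {update_set}\n"
        # New key (ignoring deletes for rows never seen): insert it.
        f"WHEN NOT MATCHED AND s.{op_column} <> 'REMOVE' THEN\n"
        f"  INSERT ({insert_cols}) VALUES ({insert_vals})"
    )
```

For example, `build_cdc_merge_sql("orders_iceberg", "orders_changes", "order_id", ["order_id", "status", "amount"])` produces a statement that could then be submitted to Athena. The table and column names there are placeholders. The "one change per key" assumption matters because `MERGE` generally rejects multiple source rows matching the same target row, so a real pipeline would first keep only the latest change per key.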
Articles & Talks
- How Small and Medium Businesses Can Develop a Modern Data Strategy, 2023-06-14, by John Walker, Dimple Dhar, and Kunle Adeleke
  - In the era of big data, small and medium-sized businesses (SMBs) often find themselves wrestling with a deluge of data from an ever-growing range of sources. According to Gartner, 60 percent of organizations do not measure the costs of poor data quality. A lack of measurement results in reactive responses to data quality issues, missed business growth opportunities, and increased risks.
Comparison
- How is AWS Redshift Spectrum different than AWS Athena?, 2017-09-01, by Thomas Spicer