Fields:
- Log format version: The version number of this query log. If we add fields to the log or change the format of existing fields, we'll increment this value.
- Query timestamp: The date and time that Route 53 responded to the request, in ISO 8601 format and Coordinated Universal Time (UTC), for example, 2017-03-16T19:20:25.177Z.
- Hosted zone ID: The ID of the hosted zone that is associated with all the DNS queries in this log.
- Query name: The domain or subdomain that was specified in the request.
- Query type: Either the DNS record type that was specified in the request, or ANY. For information about the types that Route 53 supports, see Supported DNS record types.
- Response code: The DNS response code that Route 53 returned in response to the DNS query.
- Layer 4 protocol: The protocol that was used to submit the query, either TCP or UDP.
- Route 53 edge location: The Route 53 edge location that responded to the query. Each edge location is identified by a three-letter code and an arbitrary number, for example, DFW3.
- Resolver IP address: The IP address of the DNS resolver that submitted the request to Route 53.
- EDNS client subnet: A partial IP address for the client that the request originated from, if available from the DNS resolver.
-- raw to source
COPY (
SELECT regexp_split_to_array(message, ' ') as parts
FROM read_parquet("data/raw/*")
WHERE message NOT LIKE 'Route 53%' -- Exclude the test event
) TO 'data/source/data.zstd.parq' (FORMAT PARQUET, COMPRESSION ZSTD)
--source to final
COPY (SELECT
strptime(parts[2], '%Y-%m-%dT%H:%M:%SZ') as timestamp,
parts[4] as domain,
parts[5] as query_type,
parts[6] as response_code,
parts[7] as protocol,
parts[8][:3] as location,
parts[9] as ip_address,
parts[10] as additional_info
FROM read_parquet("data/source/*"))
TO 'data/final/data.zstd.parq' (FORMAT PARQUET, COMPRESSION ZSTD);