I have a huge set of object (i.e. traceroutes) hosted in Google BigQuery. Those objects can be considered as arrays of IP addresses (called hops). I am trying to find a query to identify all the traceroutes crossing a few set of prefixes of interest (i.e. prefixes_of_interest). Those prefixes can be found in another dataset hosted in BigQuery as well. I have been able to write a query to discover all the traceroutes crossing a specific set of IP addresses (i.e. ip_of_interests), after start_date via this query:
WITH
traceroutes AS (
SELECT
hops,
start_time
FROM
`traceroute` AS t
WHERE
DATE(start_time) >= start_date )
SELECT
hops,
h.ddr,
start_time
FROM
traceroutes,
UNNEST(hops) AS h
WHERE
h.hop_addr IN (
SELECT
IP
FROM
`ip_addresses`)
Now, in practice, I could translate all of my prefixes to IP addresses and reiterate a similar query but that feels like a very inefficient way to proceed. I would like to know if there is a way to match the h.hop_addr prefixes with the one from my database prefixes_of_interest. In practice, my subnetworks have different lengths (most are /24 but I have a few /20) but a solution that would match /24 would do the trick and allow me to save tons of space and processing time I reckon. Hopefully the question makes sense!
