Project SHADOWSTAR: A Data-Driven Approach to Network Block Enumeration (Part 1)

by Peter Crampton | Jul 9, 2021

Reconnaissance (recon) is a critical yet often underserved area of information security. For most practitioners, recon simply doesn’t have the same allure as its cousins (enumeration, exploitation, escalation, and so on), so it often doesn’t get the attention it rightfully deserves.

TL;DR

In this post, we put recon at center stage and discuss:

An introduction to network block enumeration and why it matters.
Some of the network protocols, Internet history, and enumeration methodologies that are commonly employed.
Some of the pitfalls we’ve encountered with these existing protocols and techniques.

In the follow-up to this post (coming soon), we’ll show you how we’ve leveled up our recon game at SRA by automating this process and taking advantage of these techniques. We’ll also walk through the release of SHADOWSTAR so you can level up your own recon game.

Introduction to Network Blocks

Network blocks are a fundamental Internet resource that many organizations own. Every network block corresponds to an IPv4 or IPv6 address range that is assigned to a particular entity. Blocks range in size from very large ones containing tens of thousands of addresses, such as 108.177.0.0/16 (65,536 addresses), down to a single address, such as 108.177.16.19/32. An entity can own multiple blocks, and sometimes a single block can be held by multiple entities simultaneously through a process called sub-allocation, which we’ll touch on shortly.
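As a quick illustration of block sizes, Python’s standard `ipaddress` module can count the addresses in a CIDR block and check whether one block sits inside another. This is a minimal sketch using the two example blocks from the paragraph above; it says nothing about who actually holds them, and the IPv6 prefix is the reserved documentation range, chosen only for illustration.

```python
import ipaddress

# The two example blocks from the text: a /16 and a single-host /32.
large = ipaddress.ip_network("108.177.0.0/16")
single = ipaddress.ip_network("108.177.16.19/32")

# num_addresses counts every address in the block.
print(large.num_addresses)   # 65536
print(single.num_addresses)  # 1

# subnet_of() checks whether one block lies entirely inside another.
print(single.subnet_of(large))  # True

# The same API handles IPv6 (documentation prefix used here).
v6 = ipaddress.ip_network("2001:db8::/32")
print(v6.num_addresses == 2 ** 96)  # True
```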

With all the network blocks and address space in the world, it would be difficult to have a single organization coordinate all allocations globally. This is where registries come into play.

Registries exist to manage, organize, and allocate the world’s network blocks, and they are federated into a hierarchy-like system. Registries operate at different levels of scope; most people are familiar with the regional variety, such as ARIN and RIPE-NCC. Regional Internet Registries (RIRs) like these manage entire geographic regions themselves; however, some RIRs, such as APNIC and LACNIC, further delegate management to National Internet Registries (NIRs), which in turn can delegate even further to Local Internet Registries (LIRs) and Internet Service Providers (ISPs).

In addition to the registries, several international organizations help coordinate the allocation and assignment of these resources. The major players here are IANA (Internet Assigned Numbers Authority) and the NRO (Number Resource Organization). Their main job is to work with the registries to coordinate allocations and prevent conflicts, as well as to publish statistics that show allocation trends over time.

Sub-allocation occurs when an entity that owns a network block decides to partition the block into several pieces and give those pieces to other entities to operate (possibly autonomously). It is important to note that with sub-allocation we are NOT talking about registries like ARIN and RIPE-NCC, but about entities like ISPs that hold large network blocks allocated to them by a registry such as ARIN; it is ISPs and the like who sub-allocate to their customers or partners.
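To make sub-allocation concrete, here is a minimal sketch of an ISP splitting one block into equal pieces with Python’s `ipaddress` module. The 198.51.100.0/24 block is a reserved documentation range standing in for a hypothetical ISP allocation, and the /26 split and customer address are made up for illustration; real sub-allocations need not be equal in size. If the ISP never reports the split, a registry would still show only the /24.

```python
import ipaddress

# Hypothetical ISP allocation (documentation range, not real data).
isp_block = ipaddress.ip_network("198.51.100.0/24")

# Partition the /24 into four /26 pieces, as an ISP might when
# sub-allocating to customers.
pieces = list(isp_block.subnets(new_prefix=26))
for p in pieces:
    print(p)
# 198.51.100.0/26
# 198.51.100.64/26
# 198.51.100.128/26
# 198.51.100.192/26

# Hypothetical customer host: which sub-allocated piece contains it?
customer_host = ipaddress.ip_address("198.51.100.77")
owner_piece = next(p for p in pieces if customer_host in p)
print(owner_piece)  # 198.51.100.64/26
```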

One final point about sub-allocation: sub-allocating entities are not strictly required to report their sub-allocations back to the NIRs/RIRs. The implication is that Internet registries may not (and in our experience often don’t) have a complete record of all of the network blocks that belong to an organization. Note: This is an interesting point and will motivate some discussion about using BGP and IRR data later.

Network block enumeration refers to the process of identifying network blocks that have been allocated or assigned to a particular entity. It plays a critical role in the reconnaissance phase of a penetration test and provides visibility into IP space that may host on-premises infrastructure or other in-scope systems for testing. Network block enumeration is typically performed via keyword searching, using a variety of methodologies.

For more information about registries, sub-allocation, and other details relating to Internet number management, you should refer to RFC 7020 ^[1].

If there is one thing to take away from this section, it’s that when you sit down to perform network block enumeration, there are many different entities to consider, depending on your client and scope. In the next section, we will cover the prevailing methodologies for network block enumeration.

Network Block Enumeration – Typical Discovery Methods

At SRA, we used to perform network block enumeration for penetration tests and red team engagements by doing keyword searches against the RIRs. We would go through the WHOIS web services exposed on their respective websites and collect any network blocks that matched our queries.

Alternatively, you can collect this information from the RIRs using one of two lookup protocols: WHOIS or RDAP. Both protocols let you query for a resource, such as an IP address or domain name, and get back its registration information, that is, who owns it. Let’s explore these a bit.

WHOIS is still a very widely used protocol: according to the ARIN 42 talk titled “Directory Service Defense” ^[2], 90% of ARIN’s requests still come from WHOIS over port 43. However, plain WHOIS only lets you perform direct lookups, and those lookups on their own are not very useful for enumeration, since you cannot do any kind of keyword-based searching.
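For reference, the port 43 protocol itself is tiny: per RFC 3912, the client sends a single query line terminated by CRLF, and the server replies with free-form text and then closes the connection. The sketch below assumes that behavior. `whois_query` and `parse_fields` are hypothetical helper names, and the key/value parsing assumes ARIN-style `Key: value` output lines such as `NetRange:` and `CIDR:`; other RIRs format their replies differently.

```python
import socket


def whois_query(server: str, query: str, port: int = 43, timeout: float = 10.0) -> str:
    """Send one WHOIS query (RFC 3912) and return the raw text reply."""
    with socket.create_connection((server, port), timeout=timeout) as sock:
        # The request is a single line terminated by CRLF.
        sock.sendall(query.encode("utf-8") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server closes the connection when it is done
                break
            chunks.append(data)
    # WHOIS has no charset negotiation, so decode leniently.
    return b"".join(chunks).decode("utf-8", errors="replace")


def parse_fields(text: str) -> dict:
    """Collect 'Key: value' lines into {key: [values]}, skipping comments and blanks."""
    fields = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith(("#", "%")) or ":" not in line:
            continue
        # Split on the first colon only, so values may contain colons (IPv6, URLs).
        key, _, value = line.partition(":")
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields
```

For example, `parse_fields(whois_query("whois.arin.net", "8.8.8.8")).get("CIDR")` should return the CIDR lines from ARIN’s reply, assuming the server is reachable and still formats its output this way.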

That said, every RIR seems to have developed non-standard extensions to WHOIS, to varying degrees, to make more robust queries possible. ARIN and RIPE-NCC have each developed their own (mutually incompatible) WHOIS RESTful Web Service (WHOIS-RWS), which exposes WHOIS data through a REST interface and makes robust enumeration significantly easier. But what about the other RIRs: LACNIC, AFRINIC, and APNIC? Historically, we always had difficulty operationalizing these RIRs, and to explain why, it helps to discuss the different lookup interfaces that exist for RIRs.

WHOIS vs WHOIS-RWS vs RDAP

There’s a good chance you haven’t heard of RDAP before, so we will explore some fundamental differences between RDAP and WHOIS before proceeding. RDAP is a relatively new protocol, defined in 2015 in RFC 7480 ^[3]. Its primary reasons for existing are standardization and internationalization.

The WHOIS protocol has suffered from its own success. It has become one of the most widely used protocols since its definition in 1985, yet the protocol itself has no mechanism for handling common internationalization concerns, such as text encodings other than ASCII. This, combined with the very minimal protocol definition, has led to inconsistent implementations across the RIRs. RDAP was designed to correct these and other failings of WHOIS.

Note: RDAP is not the same as WHOIS-RWS. WHOIS’s simplicity and ubiquity gave rise to powerful RESTful web services (RWS) that are provided by registries like ARIN ^[4] and RIPE-NCC ^[5].

This is why we had difficulty operationalizing APNIC, LACNIC, and AFRINIC: they do not offer the kind of WHOIS-RWS that ARIN and RIPE-NCC do, and instead merely expose a web interface for direct WHOIS lookups. Recall that plain WHOIS has no concept of “search” or “organization”, just object lookups. RIRs implement their own custom extensions to provide those abstractions at their own discretion, and APNIC, LACNIC, and AFRINIC simply don’t expose the kind of interface we want.

Back to RDAP. You can think of RDAP as “WHOIS over HTTP”: it is essentially a REST API that returns registrant information as structured data in JSON format. Here’s an example from ARIN’s RDAP server:

https://rdap.arin.net/registry/ip/8.8.8.8

This lookup is analogous to a WHOIS lookup on 8.8.8.8. Unlike WHOIS, which natively supports only direct lookups, RDAP also defines a standard search interface that allows keyword-based searching.
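Here is a minimal sketch of both operations using only the Python standard library. It assumes ARIN’s RDAP base URL from the example above. The lookup path (`/ip/<address>`) and the entity search by full name (`/entities?fn=<pattern>`, with a trailing `*` for a partial match) come from RFC 7482, whose base searches cover domains, nameservers, and entities; individual registries may restrict or disable searches, so treat the search URL as an assumption to verify. The helper names are hypothetical, and the fields read in `summarize_network` are from the RDAP JSON response format (RFC 7483).

```python
import ipaddress
import json
import urllib.parse
import urllib.request

# Base URL taken from the ARIN example above; other RIRs use different bases.
ARIN_RDAP = "https://rdap.arin.net/registry"


def ip_lookup_url(ip: str, base: str = ARIN_RDAP) -> str:
    """Build an RDAP IP network lookup URL (RFC 7482 path: /ip/<address>)."""
    addr = ipaddress.ip_address(ip)  # raises ValueError on malformed input
    return f"{base}/ip/{addr}"


def entity_search_url(name_pattern: str, base: str = ARIN_RDAP) -> str:
    """Build an RDAP entity search URL by full name; a trailing '*' requests a partial match."""
    query = urllib.parse.urlencode({"fn": name_pattern}, safe="*")
    return f"{base}/entities?{query}"


def rdap_get(url: str, timeout: float = 10.0) -> dict:
    """Fetch an RDAP URL and decode its JSON body (requires network access)."""
    req = urllib.request.Request(url, headers={"Accept": "application/rdap+json"})
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def summarize_network(obj: dict) -> dict:
    """Pick a few standard fields out of an RDAP IP network object."""
    return {
        "handle": obj.get("handle"),
        "name": obj.get("name"),
        "range": (obj.get("startAddress"), obj.get("endAddress")),
        "ip_version": obj.get("ipVersion"),
    }
```

In use, `summarize_network(rdap_get(ip_lookup_url("8.8.8.8")))` performs the same lookup as the URL above. If a server accepts an entity search, the matches arrive in an `entitySearchResults` array.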

RIR Data Dumps

Most people who do network block enumeration use one or more of the lookup methods described above. However, there is another, less widely known approach.

What we didn’t realize at first is that many RIRs publish daily snapshots of their databases for anyone to download. These exports have personally identifiable information (PII) redacted but still contain useful information for network block enumeration. As of this writing, the dumps we are aware of are:

https://ftp.ripe.net/ripe/dbase/
https://ftp.lacnic.net/lacnic/dbase/
https://ftp.afrinic.net/pub/dbase/
https://ftp.apnic.net/pub/apnic/whois/

Three things are worth mentioning:

These dumps are in a format called the Routing Policy Specification Language (RPSL), which is defined in RFC 2622 ^[6].
LACNIC’s data is heavily redacted. Their dump is produced to support a GeoIP initiative of theirs ^[7], so it lists every allocated IPv4/IPv6 block with almost every field redacted except the geographic location of each block’s registrant.
ARIN does not publish a public database dump of WHOIS registrant information. Instead, they publish a public dataset as part of the Internet Routing Registry (IRR) program. This dataset is not the same as the WHOIS database, though there is some overlap.

ARIN and LACNIC have a formal process for requesting bulk data access. At the time of writing, LACNIC appears not to be taking requests, but that may change in the future.

If you can acquire one or both datasets and use them ethically, you will, in theory, have very good visibility into the global picture of network block allocation.

RIR vs IRR data for enumeration

As mentioned above, ARIN does not publish WHOIS registrant information, but they do publish a dataset of Internet Routing Registry (IRR) data. Essentially, IRR data dumps are compilations of CIDR prefixes which are supposed to correspond to actual routes advertised by ASNs. IRR data is commonly used for network engineering related to Internet routing.

IRR data is not an authoritative source of the routing prefixes advertised by ASNs, nor does it have to correspond to real routes; it is simply an auxiliary data source offered voluntarily by a loose federation of entities that make up the IRR providers. Here’s a short list of the key players:

ARIN
RIPE
APNIC
AFRINIC
LACNIC
LEVEL3 (now CenturyLink)
NTTCOM
RADB

You’ll notice that there is significant overlap between the RIR and IRR data sources; indeed, every RIR is also an IRR data source. However, there are some other players too, such as NTT and CenturyLink.

IRR data dumps generally contain a lot less data than RIR dumps, but more importantly, IRR data must be used with caution. IRR data is known to be less accurate than RIR data in general because it is often not as actively maintained. This means that when you receive results from an IRR data source, you should analyze the reported prefix more thoroughly to determine whether it is still valid.
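What a more thorough analysis looks like depends on the engagement, but a few cheap checks can weed out obviously suspect route objects before manual review. The sketch below is one possible heuristic, not a validation standard. It assumes route objects have already been parsed into `{attribute: [values]}` dicts, assumes an RFC 3339-style `last-modified` attribute (older objects may only carry `changed` lines), and uses an arbitrary three-year staleness threshold. `flag_route` is a hypothetical helper name.

```python
import ipaddress
from datetime import datetime, timezone


def flag_route(route: dict, now: datetime, max_age_days: int = 3 * 365) -> list:
    """Return reasons to distrust an IRR route object; an empty list means no red flags.

    Heuristic only: a clean result does not prove the route is current or
    that it belongs to the target.
    """
    reasons = []
    prefix = (route.get("route") or route.get("route6") or [""])[0]
    try:
        # strict=True rejects prefixes with host bits set, e.g. 192.0.2.1/24
        net = ipaddress.ip_network(prefix, strict=True)
        if not net.is_global:
            reasons.append("prefix is not globally routable")
    except ValueError:
        reasons.append("malformed prefix or host bits set")
    if not route.get("origin"):
        reasons.append("no origin ASN")
    stamp = (route.get("last-modified") or [None])[0]
    if stamp is None:
        reasons.append("no last-modified timestamp")
    else:
        try:
            # Older Pythons' fromisoformat() rejects a trailing 'Z', so normalize it.
            modified = datetime.fromisoformat(stamp.replace("Z", "+00:00"))
        except ValueError:
            reasons.append("unparseable last-modified timestamp")
        else:
            if modified.tzinfo is None:
                modified = modified.replace(tzinfo=timezone.utc)
            if (now - modified).days > max_age_days:
                reasons.append(f"not modified in over {max_age_days} days")
    return reasons
```

Pass a timezone-aware `now`, for example `datetime.now(timezone.utc)`.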

Given that IRR data can be hard to validate, it is natural to wonder why we would bother with it at all. Why not just use RIR data? Recall network block sub-allocation: this is where IRR data comes into play.

Routes are not network blocks per se, but they can often be treated as such, especially when they come from ISPs and correspond to network blocks that have been sub-allocated.

Notice that IRR players like LEVEL3 and NTT are ISPs; they provide Internet services to customers in addition to supporting global routing. We have found many routes in IRR data that tied back to our clients, and provided we can perform some validation, we treat those routes as CIDR blocks sub-allocated to our clients.
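One way to surface candidate sub-allocations is to compare IRR route prefixes against the blocks already found in registry data and keep the routes that no registry block covers. This is a minimal sketch of that comparison under our own assumptions, not SHADOWSTAR’s implementation; `uncovered_routes` is a hypothetical helper. Registry blocks are collapsed first, so a route spanning two adjacent registered blocks still counts as covered.

```python
import ipaddress
from typing import Iterable


def uncovered_routes(irr_routes: Iterable[str], rir_blocks: Iterable[str]) -> list:
    """Return IRR route prefixes not contained in any (collapsed) registry block."""
    by_version = {4: [], 6: []}
    for block in rir_blocks:
        net = ipaddress.ip_network(block)
        by_version[net.version].append(net)
    # Merge adjacent or overlapping registry blocks so that a route spanning
    # two neighboring blocks is still recognized as covered.
    collapsed = {v: list(ipaddress.collapse_addresses(nets)) for v, nets in by_version.items()}
    uncovered = []
    for route in irr_routes:
        net = ipaddress.ip_network(route)
        # subnet_of() requires both networks to be the same IP version.
        if not any(net.subnet_of(b) for b in collapsed[net.version]):
            uncovered.append(net)
    return uncovered
```

Every prefix this returns still needs the kind of validation discussed above before it is treated as in scope.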

In practice, we have had tremendous success using IRR data. Here’s a sample of some of the things we’ve found that we otherwise wouldn’t have:

Exposed network infrastructure: Cisco routers/switches
Exchange and OWA servers
Single-factor VPN portals
Miscellaneous web application administration consoles

In summary, IRR data is plentiful, exposes intra-ASN as well as inter-ASN routes, and is available for bulk download. For these reasons, we chose to use IRR data as a primary data source in the SHADOWSTAR tool.

If you’d like to learn more about IRR, you can read about it on their website ^[10]. You may also want to check out the website of RADB, a popular IRR provider ^[11].

So there we have it: a solid breakdown of the various components that go into network block enumeration. In our next post, we’ll go over the release of SHADOWSTAR and how to get set up with it. We’ll also highlight some of the areas where it can help level up your recon game.

References
