Asset Discovery: Making Sense of the Ocean of OSINT
9 Aug 2019
Comprehensive Talk
Richard Gold
Abstract
When performing OSINT reconnaissance against a target, it’s often very difficult to accurately define the scope. There are so many sources of information and so many diverse types of data. It quickly becomes overwhelming. While there are many excellent OSINT tools already available to the discerning OSINTer, their focus is usually on breadth of collection. Our experience is that asset traceability and narrowly-focused discovery help us to discover the best results. To that end, we’ve developed a tool: the “Offensive Orca” [https://github.com/digitalshadows/orca]. This approach focuses on comprehensive asset discovery coupled with narrow scoping to avoid false positives.
As a brief overview, Orca does the following:
- Domain discovery with Google and Shodan
- Subdomain Enumeration Lookups
- WHOIS Lookups
- Export to Excel spreadsheet
How Orca Works
The end goal of our targeting is to discover vulnerable or misconfigured systems. By vulnerable, we mean that a network service running on a machine is vulnerable to a (public) exploit. By misconfigured, we mean that a network service is revealing sensitive information or allowing public access, for example because industry default settings were left in place. Setting the goal of our OSINT research explicitly upfront helps us to discard unnecessary data sources and ensures our collection is useful.
Our rules of engagement are:
1. No exploitation
2. No authentication bypass
3. No Denial of Service (DoS)
This keeps us on the right side of the law!
In order to achieve our goal, we need to find the systems belonging to our target. Scoping is critical here. The consequences of misidentifying a system are severe. In order to have confidence in our targeting, we must be able to trace a discovered asset back to the initial piece of asset data. No engagement starts in a vacuum: any reconnaissance engagement starts with an initial pool of asset data, typically a few small pieces of information provided to drive the engagement. Examples of these are company names, domain names and IP ranges. Depending on the engagement, you may receive more or fewer of these initial pieces of asset data and they will also be more or less reliable!
For each piece of asset data, a lookup needs to be performed, e.g., from a company name to a set of domains and IP ranges/addresses, from a domain to a list of subdomains and so on. When we look up a piece of asset data, whatever result is generated is stored in the Orca’s database, together with the ID of the piece of data that was used to seed that lookup. This stored ID is what allows any discovered asset to be traced back to the initial asset data.
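The seed-ID idea can be sketched with a single table in which every row points at the row that produced it. This is a minimal illustration using Python's built-in sqlite3 module; the schema, the function names (open_db, add_asset, trace) and the example values are assumptions made for this sketch, not Orca's actual database layout.

```python
import sqlite3


def open_db(path=":memory:"):
    """Create a tiny asset table; the schema is illustrative, not Orca's."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS asset ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"   # e.g. company, domain, hostname
        " value TEXT NOT NULL,"
        " source TEXT,"          # which lookup produced this row
        " parent_id INTEGER REFERENCES asset(id))"  # the seeding asset
    )
    return db


def add_asset(db, kind, value, source=None, parent_id=None):
    """Store a lookup result along with the ID of the asset that seeded it."""
    cur = db.execute(
        "INSERT INTO asset (kind, value, source, parent_id) VALUES (?, ?, ?, ?)",
        (kind, value, source, parent_id),
    )
    return cur.lastrowid


def trace(db, asset_id):
    """Walk parent links from an asset back to the initial seed."""
    chain = []
    while asset_id is not None:
        row = db.execute(
            "SELECT kind, value, parent_id FROM asset WHERE id = ?", (asset_id,)
        ).fetchone()
        if row is None:
            break
        chain.append((row[0], row[1]))
        asset_id = row[2]
    return chain


# Hypothetical engagement: company name -> domain -> hostname.
db = open_db()
seed = add_asset(db, "company", "Example Corp")
dom = add_asset(db, "domain", "example.com", "search", seed)
host = add_asset(db, "hostname", "vpn.example.com", "ct-logs", dom)
```

Calling trace(db, host) walks from the hostname through the domain to the company name, which is the chain an operator would want to inspect before trusting that the host is in scope.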
To discover domains from a company name, we can automatically Google the company name and check which domains are returned in the results. We can also use SHODAN to perform an organizational search to discover domains from a company name.
We immediately hit our first problem. How do we know that the domains returned are associated in any way with our target? The tragic answer is that we don’t, without manual checking. That’s why the Orca prompts the operator after each domain so that the necessary checks can be made. It is a time-consuming and frustrating task but doing that initial work up-front saves a world of pain later. There are a few tips’n’tricks for this, but it’s ultimately target-dependent and requires knowing some context about the target, such as the sector and geography it operates in. The importance of this cannot be overstated. If the initial asset list is not appropriately curated, then it will be hard to have confidence in any future results.
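A per-domain confirmation step of this kind might look like the sketch below. The function name curate and the prompt wording are hypothetical and not taken from Orca; the prompt function is passed in as a parameter so the logic can be exercised without a terminal. Anything other than an explicit yes is dropped, in keeping with narrow scoping.

```python
def curate(candidates, ask=input):
    """Ask the operator to confirm each candidate; keep only confirmed ones."""
    kept = []
    for domain in candidates:
        answer = ask(f"Does {domain} belong to the target? [y/N] ")
        # Default to "no": an unconfirmed domain never enters the asset list.
        if answer.strip().lower() in ("y", "yes"):
            kept.append(domain)
    return kept
```

In interactive use, curate(["example.com", "example.org"]) would prompt once per domain and return only those the operator confirmed.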
Hostnames can be discovered by a process of subdomain enumeration. There are excellent sources of data for which subdomains/hostnames exist for a particular domain. Our two preferred sources are:
- Rapid7’s Forward DNS data set (https://opendata.rapid7.com/sonar.fdns_v2/)
- Certificate Transparency logs (https://crt.sh/)
Combining these two sources is very powerful. The Orca can do this for you. Certificate Transparency is particularly interesting as it is a real-time stream, so there are cases where you can catch a machine having a certificate issued before a comprehensive managed security configuration has been applied to it. The OWASP Amass tool (https://github.com/OWASP/Amass) is our current go-to tool for performing this process when we are not using the Orca. The advantage of enumerating subdomains by using the main domain as an anchor is that if a hostname belongs to a domain, we can have high confidence that the two are related. This dramatically cuts down on false positives.
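As a rough illustration of combining sources around an anchor domain, the sketch below merges hostname lists that are assumed to have already been pulled from Forward DNS data and Certificate Transparency logs. It performs no network access, and the function names and sample hostnames are made up for the example. Only names equal to the anchor, or ending in a dot followed by the anchor, are kept.

```python
def normalize(name):
    """Lowercase and strip whitespace, a trailing dot and a wildcard label."""
    name = name.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def merge_hostnames(anchor, *sources):
    """Union several hostname lists, keeping only names under the anchor."""
    anchor = normalize(anchor)
    suffix = "." + anchor
    found = set()
    for source in sources:
        for raw in source:
            name = normalize(raw)
            # Match on a label boundary so "notexample.com" is rejected.
            if name == anchor or name.endswith(suffix):
                found.add(name)
    return sorted(found)


# Hypothetical results from the two data sources.
fdns = ["www.example.com.", "MAIL.example.com", "example.com.evil.test"]
ct_logs = ["*.dev.example.com", "www.example.com", "notexample.com"]
hosts = merge_hostnames("example.com", fdns, ct_logs)
```

Matching on the label boundary rather than a plain substring is what rejects look-alike names such as notexample.com.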
IP ranges can be found via free text searches of WHOIS data, especially the organization name or net name. This is also an error-prone process. As with the previous section, the operator must curate the results from this search. This WHOIS data can be collected manually from the responsible Regional Internet Registries (RIRs), provided a convincing use case can be made, or access can be purchased from one of several WHOIS data providers. A note of caution around cloud providers: addresses in cloud ranges are shared between many customers and can be reassigned, so unless the operator is extremely sure about the provenance of the IP range, it is recommended to exclude cloud provider ranges from your asset discovery process.
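One way to apply the cloud-provider caution is to hold back any candidate range that overlaps a known cloud range for manual review. The sketch below uses Python's standard ipaddress module. The CLOUD_RANGES list is a placeholder filled with an RFC 5737 documentation block, not a real provider range; in practice it would have to be populated from the ranges the cloud providers publish.

```python
import ipaddress

# Placeholder only: an RFC 5737 documentation block, NOT a real cloud range.
CLOUD_RANGES = [ipaddress.ip_network("198.51.100.0/24")]


def split_cloud(candidates, cloud_ranges=CLOUD_RANGES):
    """Separate candidate CIDRs into (keep, needs_review) lists."""
    keep, review = [], []
    for cidr in candidates:
        net = ipaddress.ip_network(cidr, strict=False)
        # Only compare networks of the same IP version.
        in_cloud = any(
            net.version == cloud.version and net.overlaps(cloud)
            for cloud in cloud_ranges
        )
        (review if in_cloud else keep).append(str(net))
    return keep, review


# Hypothetical WHOIS search results.
keep, review = split_cloud(
    ["192.0.2.0/24", "198.51.100.128/25", "203.0.113.7/32"]
)
```

Ranges that land in review are not discarded outright; they are simply kept out of the asset list until the operator has established their provenance.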
Once a curated set of hostnames and IP addresses has been discovered, the next step is to figure out what services are running on these hosts. If we are conducting a passive reconnaissance exercise, we need to use a third-party service such as SHODAN. If there is no such requirement, we can use an active scanning tool such as masscan or nmap. SHODAN now returns CVE information, which greatly assists the lookup process. Previously, an operator would have to take the CPE (Common Platform Enumeration) information, e.g., “cpe:/a:microsoft:internet_explorer:8.0.6001:beta”, and look it up in a CVE database such as the one maintained by MITRE.
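For the manual lookup path, a CPE string first has to be broken into its components. The sketch below handles only the older CPE 2.2 URI form shown in the example above, whose fields are part, vendor, product, version, update, edition and language, in that order. The function name parse_cpe22 is an assumption for illustration, and the newer CPE 2.3 formatted strings are not handled.

```python
from urllib.parse import unquote

# Field order of a CPE 2.2 URI: cpe:/part:vendor:product:version:update:edition:language
CPE_FIELDS = ("part", "vendor", "product", "version", "update", "edition", "language")


def parse_cpe22(uri):
    """Split a CPE 2.2 URI into named fields; missing fields become ''."""
    if not uri.startswith("cpe:/"):
        raise ValueError(f"not a CPE 2.2 URI: {uri!r}")
    values = [unquote(v) for v in uri[len("cpe:/"):].split(":")]
    # Pad trailing fields that the URI omitted.
    values += [""] * (len(CPE_FIELDS) - len(values))
    return dict(zip(CPE_FIELDS, values))


cpe = parse_cpe22("cpe:/a:microsoft:internet_explorer:8.0.6001:beta")
```

Here cpe["vendor"], cpe["product"] and cpe["version"] give the values an operator would search for in a CVE database, and the part "a" marks the entry as an application rather than an operating system or hardware.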
When a list of CVEs has been created for our validated set of hosts, it is worth considering how we assess this list. Not all CVEs are remotely exploitable and, in this case, we are performing OSINT against a remote target, so local-only vulnerabilities and exploits are not directly useful. We typically use third-party services such as ExploitDB or the Metasploit exploit collection to see which of the vulnerabilities we have discovered have public exploits available. Most don’t. By restricting ourselves to only remote services which are directly exploitable in practice, we can avoid the alert fatigue associated with a high number of theoretical vulnerabilities.
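The triage rule described here (remote, and with a public exploit) amounts to a simple filter. In the sketch below, the Finding structure, its field names and the CVE identifiers are all placeholders invented for illustration. How the remote and public_exploit flags are determined, for example by checking ExploitDB or Metasploit, is left outside the example.

```python
from dataclasses import dataclass


@dataclass
class Finding:
    host: str
    cve: str
    remote: bool          # exploitable over the network
    public_exploit: bool  # a public exploit is known to exist


def actionable(findings):
    """Keep only remotely exploitable findings with a public exploit."""
    return [f for f in findings if f.remote and f.public_exploit]


# Placeholder data: these CVE identifiers are not real.
findings = [
    Finding("vpn.example.com", "CVE-0000-0001", remote=True, public_exploit=True),
    Finding("vpn.example.com", "CVE-0000-0002", remote=True, public_exploit=False),
    Finding("www.example.com", "CVE-0000-0003", remote=False, public_exploit=True),
]
shortlist = actionable(findings)
```

With this placeholder data only the first finding survives, mirroring the observation that most discovered CVEs have no public exploit.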
Given the ubiquity of Excel, Orca can generate a spreadsheet which contains the findings, that is, hosts with exploitable remote vulnerabilities and the set of all discovered assets. Exporting to a spreadsheet means that it is straightforward to get an overview of the important findings whilst also making it straightforward to export a target list, for offensive operations, or a list for remediation, for defensive operations.
In summary, our OSINT approach focuses on comprehensive asset discovery coupled with narrow scoping to avoid false positives. By setting an explicit and clear goal upfront about the results we want, namely exploitable or misconfigured systems, we can avoid a lot of the noise generated by a typical OSINT discovery process. We use a standard reporting format, an Excel spreadsheet (other formats such as raw JSON and CSV are forthcoming!), to give consumers of our OSINT maximum flexibility when integrating the results of our work into their existing workflows.