OSINT notes

Intro

This is a small post about my experience with a couple of projects I recently did: “OSINT flyovers”. The projects are not finished yet, but once they are, I’ll update the post to say how the clients received their reports.

Goals & Targets

The target was a company; let’s call them EvilCorp. A medium-sized business with international branches, providing services (not products). I got a URL and an organization ID to go on, together with the headquarters location.

My goal: provide as much information as possible from OSINT only. That means no nmap scanning, no DoS attacks, no excessive touching of any kind; all I can work with is data someone else has already gone through and collected. This also serves to simulate a very covert attack, since the target will not be able to tie any non-standard actions back to me. The most I’ll do is visit the websites and collect screenshots, but we’ll get to that later.

How deep does this rabbit hole go?

Okay, so we have a domain, let’s say evilcorp.tld. I had several targets in mind, based on priority:

  • Domains (all domains owned by the company)
  • Subdomains (all xyz.evilcorp.tld or similar)
  • Services (every open port on those subdomains that I can find without touching the devices myself)
  • Miscellaneous
    • Breached user accounts
    • E-mail schema
    • Wireless access points
    • Physical layouts

Domains

Let’s start from the top. We cannot afford a depth-first search: time is of the essence, and we might miss an important part of the network.

A hypothetical example: let’s say I find git.evilcorp.tld, running an old GitLab instance with user information and code left open (basically security by obscurity). I spend two days documenting my findings there, and they turn out to be low-severity threats that would only matter against a very motivated attacker. Had I taken my time with a breadth-first search instead, I would have seen rdp.evilcorp.tld, which runs Windows Server 2008 and is extremely vulnerable to unsophisticated attacks. My report would be pretty useless, full of things the client may already be aware of! So breadth-first, all the way.

Starting out, we need to find the domains associated with the company. Some country-code registries don’t redact registrant details, so if you’re investigating one of those, you can look up the domain in WHOIS, find the business ID, and query that country’s registry for every domain registered under the same ID. You may get all of them served up on a silver platter!

If you’re not that lucky, you can put the company name into ViewDNS and get back a list of domains possibly registered by the same company. Sure, with a name like EvilCorp we’re bound to get a boatload of hits, so it is imperative to scope them out so we don’t hand the client misinformation. For this, we use whois. It’s quite easy if you script it: a loop walks the WHOIS record for each domain, and afterwards you get a simple table of domains and owners, one per line, as sketched below.
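A minimal sketch of such a loop (domains.txt, the output file name and the grep pattern are my assumptions; registrars label the owner field differently, so tune the pattern to what you actually get back):

while read -r domain; do
  # pull the first registrant/organization line out of the WHOIS record
  owner=$(whois "$domain" | grep -iE -m1 'registrant|organi[sz]ation')
  echo "$domain,$owner"
  sleep 2  # WHOIS servers rate-limit aggressively, so pace yourself
done < domains.txt > owners.csv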

Subdomains

For subdomains, it is better to use several tools and get duplicate results than to rely on one tool that may give you incomplete ones. I usually use the following tools to make a more educated guess:

  • Sublist3r
  • SecurityTrails API
  • Amass
  • DNSDumpster.com
  • others (didn’t use them often enough to care)

Starting from the top: Sublist3r gives a list of subdomains based on search-engine findings. It can take a domain, a list of engines, and an output file to spit your results into. All in all, a simple-enough device.
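A typical run looks something like this (the engine list and output file name are just my picks):

sublist3r -d evilcorp.tld -e google,bing,virustotal -o sublist3r_out.txt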

SecurityTrails is a login-required site that can give you a lot of info. All sorts of records, MX, NS, SOA, you name it, it’s there, including historical records. A very valuable resource, since it can also do what dig will do for us later.
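If you’d rather script it, the same data is reachable over their REST API; a hedged sketch based on their documented v1 subdomains endpoint ($ST_KEY being your API key):

curl -s -H "APIKEY: $ST_KEY" "https://api.securitytrails.com/v1/domain/evilcorp.tld/subdomains"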

Last but not least on the list is Amass, a tool by OWASP. The framework comes with a heavy list of potential APIs it can use, but I can’t give you my keys; I got them fair and square, so go find your own. The tool actually maintains a database you can search, so if you run amass once and it unexpectedly closes out, it’s fine! Just run amass db and everything is still there (provided you set things up properly).
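The basic flow looks roughly like this (flags as I remember them from the version I used; double-check with amass -h on yours):

amass enum -d evilcorp.tld -o amass_out.txt  # the long enumeration run
amass db -d evilcorp.tld -names              # replay the found names from the database later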

With all these tools, you should have a nice, almost complete list of subdomains you can search through.

Going from subdomain to IP address

We have our list of domains and subdomains, which can range from a few to a few hundred. Yes, hundred. This should be most of the stuff logged and cached in search engines and dumpsters. What now?

Well, now we need to know how many of these actually work. I found there are two common ways companies handle their DNS space:

They either

  • register every domain they can (evilcorp.com, evilcorp.eu, evilcorp.net, etc.) and never use 90% of them, or
  • register just a couple of domains and then pile up more subdomains than they ever use.

We can filter out both cases with a text file containing all your domains and subdomains, and – drumroll – dig.

Digging for gold

Why do stuff manually when you can script for 3 hours and save 10 minutes of your time, then lose the script forever?

– a colleague of mine

That is the mindset! Let’s see how dig normally works.

dig google.com

; <<>> DiG 9.16.11-Debian <<>> google.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16102
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;google.com. IN A

;; ANSWER SECTION:
google.com. 300 IN A 172.217.XXX.XXX # This is our output, google.com points to this IP!

;; Query time: 28 msec
;; SERVER: X.X.X.X#53(X.X.X.X)
;; WHEN: Mon Mar 08 19:12:27 CET 2021
;; MSG SIZE rcvd: 55

This does not look very fun to script through, right? Every query gives you dozens of lines of output, with only a few being meaningful. I don’t need to see which DNS server I contacted every time! I don’t need the whole OPT pseudosection, and I don’t need the question section either. So let’s start from scratch.

The first option we have in the dig help is +noall. It takes the whole result… and deletes it. What remains is an empty output. Nada. To get anything back, we add just the sections we want to see. We only need the answer section, so we’ll pop in a little +answer, and what do we get?

dig +noall +answer google.com                   
google.com. 134 IN A 172.217.XXX.XXX

That’s more like it! One line, nice and easy. That’s the next step on our way to running hundreds of domains and getting ALL THE IPs in a matter of minutes.

To do so, we can sed the output of this command and replace runs of whitespace with commas, so that the output looks more like CSV and becomes cuttable. While we’re at it, we add +nottlid to drop the TTL column:

dig +noall +answer +nottlid google.com | sed 's/\s\+/,/g'
google.com.,IN,A,172.217.XXX.XXX

Nice! We have CSV output, but there’s still one field left to cut. Honestly, if you feel the need to leave the IN in, by all means keep it; I prefer to cut it out. To select individual fields of a CSV in a shell script, we can use cut:

dig +noall +answer +nottlid google.com | sed 's/\s\+/,/g' | cut -d',' -f1,3,4
google.com.,A,172.217.XXX.XXX

Now I’m happy! Now just to do the same command… with a hundred domains and subdomains.

In one fell swoop

Before I knew how dig properly worked, I wrote a for loop that read my domains.txt file line by line and ran dig on each entry. Well, there’s an easier way!

dig +noall +answer +nottlid -f "$YOURFILE" | sed 's/\s\+/,/g' | cut -d',' -f1,3,4

If $YOURFILE contains domains and subdomains one per line, this command gives you a flurry of output which you can pipe straight into a file, say… ips.txt. Now, this is full-blown CSV, so you can easily plot it in Excel, filter through it, do all the magic; but we’ll just go through a few hoops to get unique IPs.
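Capturing the batch run is just a redirect away (domains.txt here is whatever $YOURFILE was, ips.txt the name we just picked):

dig +noall +answer +nottlid -f domains.txt | sed 's/\s\+/,/g' | cut -d',' -f1,3,4 > ips.txt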

cat ips.txt | cut -d',' -f3 | sort -u

This command:

  • reads the list from ips.txt
  • selects the 3rd field (the last column, the IP)
  • sorts the list and prints only unique records

This goes into another file, again; unique_ip.txt, let’s say.

Now we have three lists: subdomains, unique IP addresses, and the whole dig output in CSV format.

Trust your pal Dan. You can always ‘show Dan’.

Okay, starting to look nice, huh? Now we need actionable data: something to pass on to the pentesting team so they can go straight from the CSV to running exploits.

What do we want? Ports. Who can give them to us? Shodan!

If you don’t have an API key for Shodan, I absolutely recommend getting a subscription. I got mine on Black Friday (or Cyber Monday), when a lifetime subscription went for $5.

There is a CLI script for this, to make our lives easier.

shodan host 172.217.XXX.XXX
172.217.XXX.XXX
Hostnames: #################################
Country: United States
Organization: Google
Updated: 2021-03-08T02:29:39.172331
Number of open ports: 2

Ports:
80/tcp
443/tcp
|-- SSL Versions: -SSLv2, -SSLv3, TLSv1, TLSv1.1, TLSv1.2, TLSv1.3

This is the usual output. If Shodan lists vulnerabilities for the host, you get those, too. Now, if you want all this output, by all means keep it. I like my output a little more… CSV-y.

Again, a lot of sed, grep and cut, and we’ll get there!

for i in $(cat unique_ip.txt); do echo $i && shodan host $i | grep -E 'tcp|udp|Vulnerabilities' | sed -e "s/^\s\s\+/$i,/" -e "s/\/\(tcp\|udp\)\s\?/,/"; done > shodan_all

The output now looks something like this!

[SNIBBEDY SNIB :DDDDD]

172.217.XXX.XXX
172.217.XXX.XXX,80,MyCoolServer
172.217.XXX.XXX,443,AnotherCoolServer

[SNIBBEDY SNIB :DDDDD]

Another great win for us! We get another CSV file to put in our recon spreadsheets. This script runs through each IP in turn and gives you a list of open ports in CSV format, which is easy to filter and gives a one-glance overview of how many ports are open, since it’s one port per line, not one IP per line.

Collecting screenshots

All the work you did up to this point is nice and all, but unless the client gets a report where they can see everything you are seeing, you may as well have been outputting everything into /dev/null. We need to give our report a nice look, and nothing beats pictures for that. Screenshots are what we need. There are way too many tools for this purpose, so I will only talk about the two I used: Aquatone and webscreenshot.

Aquatone

Aquatone is an interesting tool: very quick at its job, customizable (you can set a huge timeout for those pesky Tor-routed requests), and it gets the job done. The one thing I can see presenting a problem is that you need Chrome installed. Chrome or Chromium, but you need it. Not the lightest way to do things.

To use it, you cat the URL list and pipe it into aquatone. The added benefit is a nice HTML report that even tells you which frameworks each site uses. A very big bonus in case you don’t get that info from Shodan.
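A sketch of the invocation (flag names as I remember them from the version I used; check aquatone -h on yours):

cat subdomains.txt | aquatone -out aquatone_report -http-timeout 30000  # timeout in milliseconds, generous for slow routes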

Webscreenshot

This one is more of a “choose your own adventure” type of deal. You can choose the engine that actually takes the screenshots, plus everything but the kitchen sink. This is the first tool I used for my own reports, and it still holds up nicely.
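A hedged example of a run (renderer and flags per its README at the time; urls.txt and the output directory are my picks):

python webscreenshot.py -i urls.txt -r chromium -o screenshots/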

Miscellaneous findings

You can get a lot more than just domains, subdomains and ports: wireless networks, physical layouts, anything and everything you can think of. Ask yourself the question: what would an attacker do?

Wireless access points

WiGLE.net has a beautiful map, and if you’re logged in, you can run an advanced search. Find the company’s main site first (easy enough to do), then search WiGLE for any networks matching the company name (or with -guest appended, to catch guest networks). This simulates an attacker near the site trying to get in; the client should know that an attacker can find this information through OSINT without ever taking a step outside their home.
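WiGLE also exposes a REST API if you want to script the search; a sketch based on their documented v2 search endpoint (the credentials are the API name/token pair from your account page):

curl -s -u "$WIGLE_API_NAME:$WIGLE_API_TOKEN" "https://api.wigle.net/api/v2/network/search?ssid=EvilCorp-guest"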

Layouts & Access routes

If you can find the company site on Google Street View, look around for any technology that hints at the security in place. Is it an empty lobby or a reception desk? Are there security guards around? Cameras? What access system is in place? Are there garages? These and more may be interesting to a client.

Security badges

One often overlooked, but very surprising finding for a client is if you can, through OSINT only, get their employee badge template. Sure, folks get trained not to take photos of their badges; that’s exactly why badges are going up in value as a finding. Catching one from the intern who couldn’t be happier about starting at the company is one thing. But often, the best shots you’ll ever get are on the company’s own social media: the YouTube channel, Twitter, all those photos. From just two blurry shots, you can guess pretty accurately what the badge looks like.

Writing a report

Now we get to the juiciest – and possibly longest – part of the job: the report. The report is how your work gets presented, and your reputation stands and falls with that document. Once it is sent, it’s over.

Now that we have the drama out of the way: writing a report is not really that difficult; it mostly depends on the amount of information you have and the filtering you do. I can’t tell what you’ll find, so I can’t say how to order your report.

You have to follow the usual report structure, as follows:

  • Cover sheet
  • Table of Contents
  • Executive Summary
    • This provides a “WOW” effect and gives C-levels an easy way to get the general idea and decide how to proceed further.
  • Techniques & methods
  • Domain-by-domain report part
    • This will be the flesh of your document. It can easily run to dozens of pages even with a pretty small target (my target used only a third of their domains for anything, and the domain-by-domain part alone still came to around 30 pages).
  • Tools used, sources

Now, it may seem very off-hand to give opinions on findings without testing them, but the point of OSINT is to stay hands-off while still providing value. If you find an out-of-date system, please tell the client about it. If you find an open RDP server, you can imagine how that might work out. By all means, don’t go on crazy rants, but use your common sense: if you find something you KNOW could be exploited, say so.

Final thoughts

This is what I did for my (now two) OSINT projects. I’ll keep this article updated to tell you how it flew with the clients. So far, it’s looking good.