Facebook leak - what is it good for?

So, we have another leak on our hands. The lucky winner of this year’s lottery is Facebook! And the unlucky losers are 533 million people. A few days ago the dump was published for free, and this time I actually made the effort to get a copy. This post is not about what it means for users, because, as Facebook put it, it’s their fault for putting all that shit out there. I will talk about the version of the database I found and what we can do with it.

Format of the database

Since this is the first database of this size I’ve seen (with this many columns), we should first establish what each column means. Not all records are fully populated, so we need to find a record that has every column filled.

If your copy of the DB is colon-separated (like mine is), we can run

grep -v "::" source.txt

and this will give us records which have all the columns populated.

Example output:

Phone number:Facebook ID:First name:Last name:gender:City:District/wider area:Relationship status:Employment status:Scrape-date 00:00:00 AM:e-mail:birthdate
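One caveat about the `grep -v "::"` filter: it only catches empty fields in the middle of a line. An empty first or last field leaves no double colon behind. A slightly stricter sketch, tried on a made-up three-line sample (the `sample` file, its contents, and the `full_records` helper are my own invention for illustration, not real records):

```shell
# Made-up sample: line 1 is complete, line 2 has an empty city (::),
# line 3 has an empty last field (trailing colon, no "::" anywhere).
sample=$(mktemp)
cat > "$sample" <<'EOF'
111:1:Ann:Lee:female:Town:Area:Single:Acme:1/1/2019 12:00:00 AM:ann@example.com:01/01/1990
222:2:Bob:Ray:male::Area:Single:Acme:1/1/2019 12:00:00 AM:bob@example.com:01/01/1990
333:3:Cy:Fox:male:Town:Area:Single:Acme:1/1/2019 12:00:00 AM:cy@example.com:
EOF

# Drop lines with an empty field in the middle (::), at the start (^:),
# or at the end (:$).
full_records() {
    grep -v -e '::' -e '^:' -e ':$' "$1"
}

full_records "$sample"
```

On this sample only the first line should survive.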

All this data at our fingertips! But the columns are long AF (especially with those wOnKy QuTiEPyE PrOFilEs UwU). Armed with the column knowledge, we can cut and sed and awk… and shit. I intentionally left in the scrape time, since I forgot to take it into account when I first started cutting for names and e-mails: its 00:00:00 contains colons of its own, so every field after it shifts by two when you split on colons.
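To see which number `cut` will assign to each field, one option is to split a single record on colons and number the pieces. A minimal sketch on a made-up record (the values and the `number_fields` helper are invented, only the layout follows the example above):

```shell
# One made-up record in the layout shown above.
record='12345678901:100000000000001:John:Doe:male:Snailville:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:john.doe@example.com:01/01/1990'

# Put every colon-separated piece on its own line and number it.
# The timestamp is split into three pieces (10, 11, 12), so the
# e-mail ends up as field 13 and the birthdate as field 14.
number_fields() {
    printf '%s\n' "$1" | tr ':' '\n' | awk '{ print NR ": " $0 }'
}

number_fields "$record"
```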

How do we get what we want?

As I said, grep and cut and sed and awk and shit.

To check a record based on some info we have, we can use grep

grep "$PHONENUMBER" file.txt

If the record for a phone number exists, it will show up. To get names, we need to remember the separating colon

grep FIRSTNAME:LASTNAME file.txt

This gives us the line. But I really don’t need the scrape date, and most of the time I don’t need the city. Let’s say I just want a name, number, and e-mail.

To select columns we want, we use cut.

cut -d":" -fX,Y,Z

where X,Y,Z are the column numbers we want. For example, if we simply want phone number, first name, and last name, we can run

grep John:Doe file.txt | cut -d":" -f1,3,4

OUTPUT EXAMPLE
12345678901:John:Doe

You get the idea. Any column can be filtered out this way.
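For the name, number, and e-mail combination mentioned earlier, the e-mail has to be addressed as field 13 rather than 11, because the scrape time contributes two extra colons. A rough sketch using awk, which also compares the name fields exactly instead of matching anywhere on the line (a plain `grep John:Doe` would also hit a "John:Doemann"). The sample file and the `name_phone_mail` helper are made up for illustration, and the field numbers assume no other field contains a colon:

```shell
# Made-up sample: one exact match and one near miss.
sample=$(mktemp)
cat > "$sample" <<'EOF'
12345678901:100000000000001:John:Doe:male:Snailville:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:john.doe@example.com:01/01/1990
12345678902:100000000000002:John:Doemann:male:Elsewhere:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:jd@example.com:02/02/1991
EOF

# Print phone, first name, last name, e-mail for an exact name match.
# $1 = first name, $2 = last name, $3 = file
name_phone_mail() {
    awk -F: -v first="$1" -v last="$2" \
        '$3 == first && $4 == last { print $1 ":" $3 ":" $4 ":" $13 }' "$3"
}

name_phone_mail John Doe "$sample"
```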

Clean-up and packaging

Got some nice-as-heck data. But I still don’t like those phone numbers. I want them in the beautiful +1 234 5678901 format I’m used to. SED TO THE RESCUE!

grep John:Doe file.txt | cut -d":" -f1,3,4 | sed 's/^\(.\)\(...\)\(.\{7\}\)/+\1 \2 \3/'

OUTPUT EXAMPLE
+1 234 5678901:John:Doe

This is better. But we can do one more trick. Sed can also swap the annoying colons for nicer separators.

grep John:Doe file.txt | cut -d":" -f1,3,4 | sed 's/^\(.\)\(...\)\(.\{7\}\)/+\1 \2 \3/ ; s/:/ - / ; s/:/ /'

OUTPUT EXAMPLE
+1 234 5678901 - John Doe

This looks nice. We can export this! Instead of grepping, you can just cat file.txt and boom, you’ve got a list of phone numbers and names, line by line. The next time you get a call from a number you don’t know, you can check who it belongs to.
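The whole-file export could be sketched like this. One thing I tightened compared to the one-liner above: the phone pattern only fires on exactly 11 digits followed by a colon, so numbers with longer country codes are left as they are instead of being chopped up wrongly. The sample data and the `export_contacts` helper are made up for illustration:

```shell
# Made-up sample: one 11-digit number, one 12-digit number.
sample=$(mktemp)
cat > "$sample" <<'EOF'
12345678901:100000000000001:John:Doe:male:Snailville:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:john.doe@example.com:01/01/1990
441234567890:100000000000002:Jane:Roe:female:Elsewhere:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:jane@example.com:02/02/1991
EOF

# Keep phone, first name, last name; pretty-print 11-digit numbers;
# then turn the first colon into " - " and the second into a space.
export_contacts() {
    cut -d: -f1,3,4 "$1" |
        sed -e 's/^\([0-9]\)\([0-9]\{3\}\)\([0-9]\{7\}\):/+\1 \2 \3:/' \
            -e 's/:/ - /' \
            -e 's/:/ /'
}

export_contacts "$sample"
```

Redirect the output into a file (for example `export_contacts file.txt > contacts.txt`) to keep the list around.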

Privacy impact - offensive

I have so far found several uses for the database. I did not use any of them for anything illegal, but I did try to find all my friends in the DB, just to let them know as a PSA that they’ve been found there. I will present several uses for OSINT.

Contact association

If you get a call from an unknown number, it is simply a matter of looking it up in the database to get the name of your caller. This may be extremely useful against shady marketing companies that rely on BYOD, or lazy spammers/scammers who use the same number for Facebook as they do for their business. It only helps when someone you don’t know calls you and you want to mess with them.
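A caller lookup might look like the sketch below. It assumes the dump stores numbers as bare digits including the country code (as in the example output earlier), so the number from your call log is stripped down to digits before comparing. The `who_called` helper and the sample record are hypothetical:

```shell
# Made-up sample record.
sample=$(mktemp)
cat > "$sample" <<'EOF'
12345678901:100000000000001:John:Doe:male:Snailville:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:john.doe@example.com:01/01/1990
EOF

# $1 = number as shown by your phone, $2 = file
who_called() {
    # Strip everything that is not a digit: "+", spaces, brackets, dashes.
    digits=$(printf '%s' "$1" | tr -cd '0-9')
    # Appending "" forces a string comparison, so leading zeros matter.
    awk -F: -v n="$digits" '($1 "") == (n "") { print $3 " " $4 }' "$2"
}

who_called '+1 (234) 567-8901' "$sample"
```

An empty result simply means the number is not in your copy of the dump.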

Profile lookup

But hey, a name is useless if that name is listed as “420 L33T h4xx0r” and the scrape date is sometime in 2017. This is where the Facebook ID comes in. It sits in field 2, and if you go to facebook.com/ID, you can find that person’s up-to-date information. This may show you that someone changed their name, the name they are using now on their profile, and much more. See some Facebook OSINT tutorials for that.
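Turning a phone number into a profile link is then just a matter of printing field 2 behind the facebook.com/ prefix. A small sketch, with a made-up record and a hypothetical `profile_url` helper:

```shell
# Made-up sample record.
sample=$(mktemp)
cat > "$sample" <<'EOF'
12345678901:100000000000001:John:Doe:male:Snailville:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:john.doe@example.com:01/01/1990
EOF

# $1 = phone number as stored in the dump (digits only), $2 = file
profile_url() {
    awk -F: -v n="$1" \
        '($1 "") == (n "") { print "https://facebook.com/" $2 }' "$2"
}

profile_url 12345678901 "$sample"
```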

I know this does not go for everyone, but I actually had a chance to use this for a legitimate OSINT investigation.

Let us assume you are looking for Mr. Anderson and you know he lives in a small town with quite a unique name (let’s say Snailville-on-the-Rhine, I dunno). Looking up all Andersons is useless, there are way too many to go through. However, it’s all about correlation of data. If Mr. Anderson put his number, full name and town of residence in his Facebook bio, we can first grep for Anderson, and after we get all the Andersons, grep for Snailville. If this returns a lot of results (maybe it’s an Anderson colony), we can further grep for first names or whatever we can find from the bio. This is an extremely useful pivot, especially since every record has a Facebook ID. If that ID is still active, you can quickly confirm it’s your target and go on from there.
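The Anderson pivot can be sketched as two chained greps plus a cut to keep only the useful columns. The records and the `find_in_town` helper are invented for illustration. Note that `:Anderson:` would also match someone whose first name is Anderson, which is usually acceptable for a first pass:

```shell
# Made-up sample: two Andersons and one other Snailville resident.
sample=$(mktemp)
cat > "$sample" <<'EOF'
12345678901:100000000000001:Thomas:Anderson:male:Snailville-on-the-Rhine:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:neo@example.com:03/11/1962
12345678902:100000000000002:Mary:Anderson:female:Springfield:Somewhere:Single:Acme:1/1/2019 12:00:00 AM:mary@example.com:02/02/1991
12345678903:100000000000003:Bob:Smith:male:Snailville-on-the-Rhine:Rhineland:Single:Acme:1/1/2019 12:00:00 AM:bob@example.com:04/04/1985
EOF

# $1 = surname, $2 = town (or part of it), $3 = file
# Prints phone, Facebook ID, first name, last name, city.
find_in_town() {
    grep -i ":$1:" "$3" | grep -i "$2" | cut -d: -f1-4,6
}

find_in_town Anderson Snailville "$sample"
```

If the list is still too long, another `grep` for a first name or employer can be appended to the pipeline in the same way.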

Privacy impact - defense

Sure, I keep spouting the great uses for OSINT and hacking, but this is because I’m not in the database. I left Facebook years ago, back when it was still cool to actually be on Facebook. You may face a different situation, with your personal information listed as part of this breach. What then?

It’s not the end of the world, even if it may feel that way. But I was asked the question “What can we do about this?” The answer depends on your threat level, exposure level and/or simply on dedication.

Threat level: low

If you feel this is very low-risk and you’re willing to live with what I’ve described above and more, you don’t need to do anything. Just get used to the two or three calls from telemarketers, and hang on. If you find that telemarketers are onto you, just tell them you want your number removed from their sources. Do that enough times and you may regain a bit of calm (this goes more for Europe, since we have the GDPR and companies are obligated to remove a number and never call it again, or else they face legal liability).

Lock down your profile, too, since anyone can find your profile pic and bio if you don’t do so.

Threat level: I need my phone number safe now!

Change your phone number and lock down your profile (if it isn’t locked down already). The number you have now is not private anymore; it is associated with your profile on Facebook. If you change your number, the threat avenue of “just call that person” is closed. Your profile may still be associated with your old phone number, but if the profile is private enough, you may decide to accept that risk.

Threat level: I don’t want people knowing about my profile either!

Now we’re getting to the higher risk levels. This means that you (understandably) don’t want anyone to be able to look at your Facebook page and associate your face and name with a phone number. This is harder, since it requires you to burn the Facebook bridge and build a new one.

  1. Set your profile to completely private. If need be, remove friends, though that is not strictly necessary.
  2. Delete your phone number from Facebook and change your password to something extremely long and random.
  3. Change the profile info (generate a profile on fakenamegenerator.com, grab a profile pic from thispersondoesnotexist.com, all the good stuff).
  4. Delete your Facebook profile. Yes, completely delete it.
  5. Get a new phone number, one not associated with the leak. Burn the old one. If you feel it necessary, burn the e-mail address too (if it is associated with the profile).
  6. Create a new profile, taking care not to leak any of the information that got out previously, using another e-mail, no phone number, not your real name, etc.

Steps 1-5 will render the data in the leak pretty much useless. Step 6 will enable you to still use Facebook if you choose to do so.

Threat level: I am being stalked/extorted and now my stalkers know my contact information!

As a great man once said: Burn it all down (thanks, Jerry!)

Follow steps 1-5 from the previous threat level to get rid of all the exposed information, but don’t do step 6. You don’t need Facebook. It’s simply a way to put your face and life in the hands of one site that will eventually dump its secrets to either the highest bidder or simply someone online who will sell that data for a fraction of a bitcoin. As I reasoned when I deleted my Facebook all those years ago, if someone wants to stay in touch, they will. If they don’t want to, that’s that. I found most people went along with my new contact methods, and it helped my mental health, too. If you want to know why, there are reports of Facebook manipulating what people see and checking if they get depressed or not. To me, that’s toying around with my head and only the voices in my head are allowed to do that!

In closing

Do not consider this complete. I’m only giving my $.02, a little piece of my mind and what I have found useful in the huge dataset so far. There may be other uses, but I needed to put my thoughts down on paper. If you found any of this useful, let me know! (No, I won’t tell you where to find it; the whole 61GB package is one good search away.)