What is bitsquatting?
Bitsquatting is the name given to "DNS hijacking without exploitation", as described by Artem Dinaburg. It refers to registering a domain that is one bit off from another domain, the target. The idea is to exploit the fact that memory can become corrupted, causing a bit to flip. If the memory on a client that holds the target domain is corrupted in this way, the name may turn into the one-bit-off domain, and the client then connects to that domain instead of the original, with no direct exploitation and no mistake on the client's part.
This attack cannot be aimed at specific users, because the owner of the one-bit-off domain cannot (within the scope of this attack) trigger the memory corruption on client machines. It therefore works best against a popular domain, where the number of connecting clients makes it most likely that one of them experiences such corruption.
For further information I recommend Artem Dinaburg's blog post on the topic: http://dinaburg.org/bitsquatting.html
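To make "one bit off" concrete, here is a minimal sketch that counts the differing bits between two names. The function names bit_distance and is_bitsquat are my own for illustration and are not taken from any tool.

```python
def bit_distance(a: str, b: str) -> int:
    """Count the bits that differ between two equal-length ASCII strings."""
    if len(a) != len(b):
        raise ValueError("strings must be the same length")
    pairs = zip(a.encode("ascii"), b.encode("ascii"))
    # XOR leaves a 1 in every position where the two bytes disagree.
    return sum(bin(x ^ y).count("1") for x, y in pairs)


def is_bitsquat(candidate: str, target: str) -> bool:
    """True if candidate differs from target by exactly one flipped bit."""
    return len(candidate) == len(target) and bit_distance(candidate, target) == 1


print(is_bitsquat("eog.com", "dog.com"))  # 'd' is 0x64 and 'e' is 0x65
print(is_bitsquat("cat.com", "dog.com"))  # several bits differ
```

A single flipped bit never changes the length of the name, which is why the length check comes first.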
Bitsquat Detector
Core Tool
The full source code can be found at https://github.com/Jack-Barradell/bitsquat-detector
Checking one byte
I found the idea of bitsquatting interesting, so I set out to build a tool that takes a domain and calculates every valid domain that bitsquats on it. I fired up my text editor and started with a function that takes a single byte, given as a string of binary digits representing one character, e.g.
'01101110' == 'n'
By converting the binary string to a list of bits, I could flip each bit in turn, so that only one bit changed at a time, and check whether the resulting byte represented a character that is valid in a domain. For example, changing the last bit of 'n' from 0 to 1 turns it into 'o'. Each time a valid character was found I added the changed byte to a list; once all eight bits had been tried, this list contained every valid domain character that is one bit off from the original.
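The flipping step on its own might look something like the sketch below. It works on the same binary-string representation described above; flip_each_bit is a name chosen for this example, and no validity filtering is done yet.

```python
def flip_each_bit(byte_bits: str) -> list[str]:
    """Return the strings obtained by flipping exactly one bit of byte_bits."""
    bits = list(byte_bits)
    flipped = []
    for i in range(len(bits)):
        candidate = bits.copy()
        candidate[i] = "1" if candidate[i] == "0" else "0"
        flipped.append("".join(candidate))
    return flipped


original = format(ord("n"), "08b")  # the binary string for 'n'
for bits in flip_each_bit(original):
    # ascii() keeps non-printable and non-ASCII results readable
    print(bits, ascii(chr(int(bits, 2))))
```

Most of the eight candidates are punctuation or non-ASCII bytes, which is why a validity check is needed before a candidate can be used in a domain.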
One issue here was working out which characters are valid in a domain. After some research I found that a-z and 0-9 are valid, and that dashes ("-") and underscores ("_") are also accepted, but not as the first or last character. (Strictly, underscores are allowed in DNS names in general but not in hostnames, so a registrable domain is limited to letters, digits and dashes; this does not change the results, since no lowercase letter, digit or dash is one bit away from an underscore.) I also found the little-known fact that multiple dashes can follow each other, e.g. www.valid---url.com. Armed with this information, I set up a regex to check whether each one-bit-flipped byte represented a valid domain character.
Another issue was that lowercase characters are one bit off from their own uppercase counterparts, which in a domain are effectively the same character. My first thought was to reject all uppercase characters as invalid, but that would also throw away a valid bitsquatting domain whenever a character was one bit off from an uppercase character representing a different letter. The solution I implemented was to convert the generated character to lowercase and compare it with the original: if they matched, I excluded the character; otherwise it was kept as valid.
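Putting the character check and the case handling together, a single-character version could be sketched as follows. This is not the code from the repository: it works on characters rather than binary strings, the regex is my reading of the character set described above, and it assumes the input is already lowercase. The rule about dashes and underscores at the start or end of a name depends on position, so it is left out here.

```python
import re

# Assumed character set, based on the description in the text.
VALID_CHAR = re.compile(r"[a-z0-9_-]")


def valid_replacements(char: str) -> list[str]:
    """Return domain-safe characters exactly one bit away from char."""
    code = ord(char)
    results = []
    for bit in range(7, -1, -1):  # most significant bit first
        candidate = chr(code ^ (1 << bit)).lower()
        if candidate == char:
            continue  # only the case changed, e.g. 'd' became 'D'
        if VALID_CHAR.fullmatch(candidate) and candidate not in results:
            results.append(candidate)
    return results


print(valid_replacements("d"))
print(valid_replacements("m"))
```

Lowercasing before the comparison means a candidate such as 'D' is dropped when the original was 'd', while an uppercase candidate for a different letter would still be kept in its lowercase form.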
Checking the whole domain
With a function to check individual bytes in place, I implemented the check for the whole string. This was relatively simple: the domain is first converted to a list of its characters in binary form, and each one is passed to the single-byte function, which returns a list of valid replacements for that byte. For each replacement, a copy of the domain is made with the byte in question swapped for the replacement, and the copy is added to a list of bitsquatting domains. So if a byte has 4 valid replacements, 4 copies of the full domain are created, one per replacement. For example, if the whole domain was dog.com and the byte being checked was "d", the following domains would be generated and added to the list:
tog.com
log.com
fog.com
eog.com
This process is repeated for each byte until the list holds every bitsquatting domain. For the same example, dog.com would result in:
tog.com
log.com
fog.com
eog.com
dgg.com
dkg.com
dmg.com
dng.com
dow.com
doo.com
doc.com
doe.com
dof.com
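The list above can be reproduced with a short sketch along these lines. Two things here are assumptions on my part rather than facts about the tool: only the part before the last dot is mutated (the example output leaves ".com" untouched), and the first/last-character rule for dashes and underscores is applied per label. The helper names are hypothetical.

```python
import re

VALID_CHAR = re.compile(r"[a-z0-9_-]")  # assumed character set from the text


def valid_replacements(char: str) -> list[str]:
    """Return domain-safe characters exactly one bit away from char."""
    results = []
    for bit in range(7, -1, -1):
        candidate = chr(ord(char) ^ (1 << bit)).lower()
        if candidate != char and VALID_CHAR.fullmatch(candidate):
            if candidate not in results:
                results.append(candidate)
    return results


def label_ok(label: str) -> bool:
    """A label must be non-empty and must not start or end with '-' or '_'."""
    return bool(label) and label[0] not in "-_" and label[-1] not in "-_"


def bitsquats(domain: str) -> list[str]:
    """Generate one-bit-off variants of everything before the last dot."""
    name, dot, tld = domain.lower().rpartition(".")
    results = []
    for i, char in enumerate(name):
        for replacement in valid_replacements(char):
            mutated = name[:i] + replacement + name[i + 1:]
            if all(label_ok(label) for label in mutated.split(".")):
                results.append(mutated + dot + tld)
    return results


for candidate in bitsquats("dog.com"):
    print(candidate)
```

A name with no dot at all yields an empty list in this sketch, since there is then nothing before the last dot to mutate.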
Checking if domains are registered
Next I wanted the tool to be able to tell whether the discovered bitsquatting domains are registered. To do this I decided to use a whois library, settling on pythonwhois for its easy API, packaged whois tool and simple installation using pip:
> pip install pythonwhois
After some experimentation with the library, I found that the "status" field only existed if the domain was already registered and therefore not available. This made checking registration relatively easy: when the flag to check domain status is used, the tool performs a whois lookup on each domain and checks whether a status field is returned. If it is, the domain is registered; otherwise it is not.
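The status-field check can be sketched roughly as below. To keep the example runnable offline, the whois lookup is passed in as a function; with the real library, something like pythonwhois.get_whois could be supplied instead, though that should be checked against the installed version. The fake_lookup function and its canned responses are invented purely for the demonstration.

```python
from typing import Callable, Mapping


def is_registered(domain: str, lookup: Callable[[str], Mapping]) -> bool:
    """Treat a domain as registered when the whois result has a 'status' field."""
    record = lookup(domain)
    return "status" in record


def fake_lookup(domain: str) -> Mapping:
    """Offline stand-in for a whois client; the responses are made up."""
    canned = {"taken.example": {"status": ["ok"]}}
    return canned.get(domain, {})


print(is_registered("taken.example", fake_lookup))
print(is_registered("free.example", fake_lookup))
```

Keeping the lookup as a parameter also makes it easy to swap in a throttled or cached client later.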
Whois rate limiting
Although it was simple to implement, the domain availability check had a flaw: when the tool checked a large number of domains in a short space of time, the whois requests quickly became rate limited. The fix was obvious (limit the number of requests sent in a given period), but I also needed a way of knowing whether the tool had been rate limited, since otherwise the availability checker could give false results. Luckily, I noticed that a specific string was sent in the "status" part of the whois response when the rate limit was hit. I added a check for this string and, if it was present, switched the availability checker off and alerted the user. To avoid hitting the limit in the first place, I experimented and found that one request every 5 seconds did not trigger it (at least in any of my tests). To implement this I added a call to time.sleep() after each whois check, pausing the program for 5 seconds.
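A rough sketch of the throttled loop with rate-limit detection follows. The marker string "rate limit" is a placeholder, since the actual string is not given here, and the lookup function, the fake responses and the names used are all hypothetical. The sleep function is a parameter only so that the example can run without real delays.

```python
import time


def check_domains(domains, lookup, marker="rate limit", delay=5.0, sleep=time.sleep):
    """Map each domain to True (registered), False (not), or None (not checked)."""
    results = {}
    checking = True
    for domain in domains:
        if not checking:
            results[domain] = None  # checker was switched off earlier
            continue
        record = lookup(domain)
        status = record.get("status", [])
        if isinstance(status, str):
            status = [status]
        if any(marker.lower() in str(entry).lower() for entry in status):
            print("whois rate limit hit; availability checks switched off")
            checking = False
            results[domain] = None
            continue
        results[domain] = "status" in record
        sleep(delay)  # space requests out to stay under the limit
    return results


def fake_lookup(domain):
    """Offline stand-in for a whois client; the responses are made up."""
    canned = {
        "taken.example": {"status": ["ok"]},
        "limited.example": {"status": ["rate limit exceeded"]},
    }
    return canned.get(domain, {})


names = ["taken.example", "free.example", "limited.example", "later.example"]
print(check_domains(names, fake_lookup, delay=0))
```

Recording None for unchecked domains keeps "not registered" distinct from "could not be checked", which is the false result the rate-limit detection is meant to prevent.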
Final Thoughts
Overall I found the topic of bitsquatting very interesting. The idea of exploitation through hardware-level issues with no direct interaction made it highly intriguing and inspired me to develop this tool for finding bitsquatting addresses. I will be keeping an eye out for any reference to this attack being used in real life, and may at some point set up further tools to test it, such as registering a bitsquatting domain and recording statistics on the number of hits it gets in a given time frame.
Thank you for reading about my time spent with bitsquatting and check back soon for more developments!