How to verify if a UUID follows the IETF specification
I was playing around with creating UUIDs to add uniqueness to a project, so I started reading the spec for creating one in RFC 4122. As a side note, the RFCs are really interesting and detailed enough for almost anyone to follow. I don’t know why I spent so much time on this; I was just having fun and the time flew by.
Understanding the spec
The one I was after was a version 4 (type 4) UUID, which is built from randomly generated hex values and is described in section 4.4. Luckily, of everything in the spec, this was the smallest algorithm to implement:
4.4. Algorithms for Creating a UUID from Truly Random or
Pseudo-Random Numbers
The version 4 UUID is meant for generating UUIDs from truly-random or
pseudo-random numbers.
The algorithm is as follows:
o Set the two most significant bits (bits 6 and 7) of the
clock_seq_hi_and_reserved to zero and one, respectively.
o Set the four most significant bits (bits 12 through 15) of the
time_hi_and_version field to the 4-bit version number from
Section 4.1.3.
o Set all the other bits to randomly (or pseudo-randomly) chosen
values.
That may be easier to follow if you read the spec from the top, but what it essentially says is that almost all of the 32 hex digits are random. The exceptions are the 13th hex digit, which must be 4 and identifies what type of UUID it is (i.e. type 4), and the 17th hex digit, which must be 8, 9, a or b because its top two bits are fixed to 1 and 0. That leaves 122 of the 128 bits random. Here’s an example:
a0424604-03c6-4468-963b-002e5fbe2812
              ^    ^
              |    |
       always 4    |
                   |
             either 8, 9, a or b
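To see where those two fixed digits come from, here is a small Python sketch (not part of the original project) that applies the bit masks from the quoted algorithm to every possible byte value and collects the leading hex digit that results. The variable names are made up for illustration.

```python
# Clear the top two bits, then set bit 7: the byte now starts with binary 10.
variant_digits = {format((byte & 0x3F) | 0x80, "02x")[0] for byte in range(256)}

# Overwrite the top four bits with the version number 4.
version_digits = {format((byte & 0x0F) | 0x40, "02x")[0] for byte in range(256)}

print(sorted(variant_digits))  # ['8', '9', 'a', 'b']
print(sorted(version_digits))  # ['4']
```

Only the top two bits of the variant byte are fixed, so its first hex digit still carries two random bits, which is why four values are possible there and only one for the version digit.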
The code for writing this was fairly simple and can be found on GitHub, but I wanted a way to verify it was created correctly.
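For readers who want to try the algorithm without the Objective-C project, the following is a minimal Python sketch of section 4.4. It is not the code from the GitHub repository. The function name `make_uuid4` and its `uppercase` and `hyphenated` parameters are assumptions chosen to resemble the options used in the test further down.

```python
import secrets


def make_uuid4(uppercase: bool = False, hyphenated: bool = True) -> str:
    """Build a version 4 UUID string by following RFC 4122 section 4.4."""
    raw = bytearray(secrets.token_bytes(16))  # 128 random bits

    # time_hi_and_version is bytes 6-7; its top four bits hold the version.
    raw[6] = (raw[6] & 0x0F) | 0x40
    # clock_seq_hi_and_reserved is byte 8; its top two bits become 1 and 0.
    raw[8] = (raw[8] & 0x3F) | 0x80

    digits = raw.hex()
    if hyphenated:
        # Standard 8-4-4-4-12 grouping.
        digits = "-".join(
            (digits[0:8], digits[8:12], digits[12:16], digits[16:20], digits[20:32])
        )
    return digits.upper() if uppercase else digits


print(make_uuid4())
print(make_uuid4(uppercase=True, hyphenated=False))
```

In real Python code the standard library's `uuid.uuid4()` already does this; the sketch only spells out the two masking steps so they can be compared with the spec text.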
Verifying the correct form
After asking around and a few Stack Overflow questions later, it seemed easiest to use a regular expression to solve this. I came up with a regex that tests for all four formats a UUID could be written in. They look like this:
Lower case without hyphens: 7185f40e722c4cfa8de5daedf048ea12
Upper case without hyphens: 21A338B30A57462780450D4B6AF7A3EE
Lower case with hyphens: 7e0b2da6-38c3-4873-83f7-aab0cacb7603
Upper case with hyphens: FDFC7265-BA5E-4A63-9A51-AC661107EB37
This is the regex I finally ended up using:
[0-9a-fA-F]{8}-?[0-9a-fA-F]{4}-?4[0-9a-fA-F]{3}-?[89abAB][0-9a-fA-F]{3}-?[0-9a-fA-F]{12}
To break down what the regex says:
- First 8 characters can be anything from 0-9, a-f, A-F:
[0-9a-fA-F]{8}
- A single hyphen may follow:
-?
- Next 4 characters can be anything from 0-9, a-f, A-F:
[0-9a-fA-F]{4}
- A single hyphen may follow:
-?
- A single ‘4’ must follow:
4
- Next 3 characters can be anything from 0-9, a-f, A-F:
[0-9a-fA-F]{3}
- A single hyphen may follow:
-?
- The next character must be either 8, 9, a or b (in either case):
[89abAB]
- Next 3 characters can be anything from 0-9, a-f, A-F:
[0-9a-fA-F]{3}
- A single hyphen may follow:
-?
- Next 12 characters can be anything from 0-9, a-f, A-F:
[0-9a-fA-F]{12}
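The breakdown above can be tried out with a short Python sketch. The pattern is copied unchanged from the article; the helper name `looks_like_uuid4` is hypothetical. The sketch uses `fullmatch` so that the whole string has to match, since the pattern itself carries no `^` or `$` anchors.

```python
import re

# The pattern from the article, unchanged (split over two lines for width).
UUID4_PATTERN = re.compile(
    r"[0-9a-fA-F]{8}-?[0-9a-fA-F]{4}-?4[0-9a-fA-F]{3}"
    r"-?[89abAB][0-9a-fA-F]{3}-?[0-9a-fA-F]{12}"
)


def looks_like_uuid4(text: str) -> bool:
    """Return True when the whole string matches the pattern."""
    return UUID4_PATTERN.fullmatch(text) is not None


samples = [
    "7185f40e722c4cfa8de5daedf048ea12",
    "21A338B30A57462780450D4B6AF7A3EE",
    "7e0b2da6-38c3-4873-83f7-aab0cacb7603",
    "FDFC7265-BA5E-4A63-9A51-AC661107EB37",
]
for sample in samples:
    print(sample, looks_like_uuid4(sample))  # True for all four

print(looks_like_uuid4("7e0b2da6-38c3-1873-83f7-aab0cacb7603"))  # False: version digit is 1
print(looks_like_uuid4("7e0b2da6-38c3-4873-c3f7-aab0cacb7603"))  # False: variant digit is c
print(looks_like_uuid4("id=7e0b2da6-38c3-4873-83f7-aab0cacb7603"))  # False: extra leading text
```

Two caveats follow from how the pattern is written. Because every hyphen is optional on its own, a partially hyphenated string such as 7e0b2da6-38c34873-83f7aab0cacb7603 is also accepted. And when the pattern is used with a search rather than a full match, it will find a UUID embedded in a longer string. Whether either of those matters depends on where the input comes from.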
Conclusion
Specs explain everything you need to know about the thing they describe, even though they look ugly and seem too monotonous to read. You can find a couple of tests I wrote on GitHub as well, in an Objective-C implementation. This is what one of the tests looks like:
- (void)testCorrectUUIDFormat
{
    // Generate a lower-case, hyphenated version 4 UUID.
    UUIDGenerator *u_generator = [[UUIDGenerator alloc] init];
    NSString *sample_uuid = [u_generator uuid4WithCaps:NO hypenated:YES];

    // Anchored with ^ and $ so the whole string has to match, not just part of it.
    NSString *pattern = @"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-4[0-9a-fA-F]{3}-[89abAB][0-9a-fA-F]{3}-[0-9a-fA-F]{12}$";
    NSRange searchRange = NSMakeRange(0, [sample_uuid length]);
    NSError *error = nil;
    NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&error];
    XCTAssertNotNil(regex, @"Pattern failed to compile: %@", error);

    NSArray *matches = [regex matchesInString:sample_uuid options:0 range:searchRange];
    for (NSTextCheckingResult *match in matches) {
        // Range index 0 is the whole match; the pattern has no capture groups.
        NSLog(@"match: %@", [sample_uuid substringWithRange:[match rangeAtIndex:0]]);
    }
    NSLog(@"Our UUID, %@", sample_uuid);
    NSLog(@"Our UUID length, %lu", (unsigned long)[sample_uuid length]);

    // Cast the expected value so both sides of the comparison are NSUInteger.
    XCTAssertEqual([matches count], (NSUInteger)1, @"UUID generated doesn't match the type 4 UUID RFC");
}
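As a cross-check on the regex approach, the same two fields can be read back from a parsed value instead of from the text. The sketch below uses Python's standard `uuid` module; the function name `is_rfc4122_v4` is hypothetical and this is not one of the tests from the repository.

```python
import uuid


def is_rfc4122_v4(text: str) -> bool:
    """Parse the string and check the version and variant fields directly."""
    try:
        parsed = uuid.UUID(text)
    except ValueError:
        # Not 32 hex digits once hyphens are removed.
        return False
    return parsed.version == 4 and parsed.variant == uuid.RFC_4122


print(is_rfc4122_v4("7e0b2da6-38c3-4873-83f7-aab0cacb7603"))  # True
print(is_rfc4122_v4("21A338B30A57462780450D4B6AF7A3EE"))      # True
print(is_rfc4122_v4("7e0b2da6-38c3-1873-83f7-aab0cacb7603"))  # False: version 1
print(is_rfc4122_v4("7e0b2da6-38c3-4873-c3f7-aab0cacb7603"))  # False: wrong variant
```

Note that `uuid.UUID` is more lenient about the text form than the regex: it also accepts surrounding braces and a urn:uuid: prefix, so it checks the bit fields rather than the exact formatting.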