I have been wondering about a way to identify a computer uniquely by an id, a long integer. This is a bit like using MAC addresses to identify NICs.
One thing I am uncertain about is whether to allow swap-outs, complete replacements for failed hardware, to take the same id as their dead predecessor in a role. There is also the issue of how to identify ‘roles’ or ‘postholders’ generally, as in ‘the machine that performs function x’. I propose to duck these issues for now. You can have maximal uniqueness if you wish, where every physical box gets its own id, or the converse, where a ‘machine’ is a possible destination for communication, so that swap-outs are treated as the same machine. Use whichever scheme you prefer.
I cannot simply use MAC addresses. Machines can change them and can have several for whatever reason, and of course they can have multiple NICs, each with multiple MAC addresses. In some cases the physical NICs are removable.
Domain names are out for a large variety of reasons.
I could use MAC addresses if (i) everyone agreed to keep one MAC address forever associated with a particular machine, and never to move a hardware NIC that owned that MAC address to another machine, and (ii) one of a machine's MAC addresses were picked to be the canonical one, with links associating all the other MAC addresses that the machine has with that one canonical address. For this, you could simply pick the lowest MAC address, which would presumably be as good a method as any.
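As a rough sketch of the "pick the lowest MAC address" idea, the snippet below parses each address into a 48-bit integer, takes the minimum, and links every other address to it. The function names (parse_mac, format_mac, canonical_mac, link_to_canonical) and the example addresses are made up for illustration, and it assumes plain 48-bit addresses written as six hex pairs.

```python
def parse_mac(text: str) -> int:
    """Convert 'aa:bb:cc:dd:ee:ff' (or '-' separated) to a 48-bit integer."""
    digits = text.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {text!r}")
    return int(digits, 16)


def format_mac(value: int) -> str:
    """Render a 48-bit integer as lower-case colon-separated hex pairs."""
    raw = f"{value:012x}"
    return ":".join(raw[i:i + 2] for i in range(0, 12, 2))


def canonical_mac(macs) -> str:
    """Pick the numerically lowest address as the canonical one."""
    return format_mac(min(parse_mac(m) for m in macs))


def link_to_canonical(macs) -> dict:
    """Map every address of a machine (normalised) to its canonical address."""
    canon = canonical_mac(macs)
    return {format_mac(parse_mac(m)): canon for m in macs}


# Hypothetical machine with three addresses in mixed notation.
machine_macs = ["02:00:00:00:00:10", "00-1A-2B-3C-4D-5E", "00:1a:2b:3c:4d:5f"]
links = link_to_canonical(machine_macs)
```

Comparing the addresses as integers rather than as strings avoids surprises from mixed case or mixed separators.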
Any thoughts?
The minimum requirement is that ids allow comparison for equality and inequality, and that the assignments make sense when those comparisons are made. If two ids are different then they are not the same machine, and one id means that you are talking to only one machine, not a group of boxes at one place hiding behind one IP address or a widely separated group of machines hiding behind anycast. If you need to deal with those situations then a different kind of id space is needed.
Where could these numbers be recorded? What would you associate with them? I should think associations with domain names and links to sets of MAC addresses would be one thing, and a machine could possibly have associations with IP addresses. All of these would be subject to change and have lifetimes, and ideally would need change-notification events to give maximum safety in their usage.
There would be a lot of problems straight away with bogus claims of identity. Because the ids are meaningless, there is no harm in anyone just asking for a new id and getting the next integer in sequence. It is essential, though, to avoid a screw-up where the same machine is identified twice, and the question of who knows which machines are which is not completely straightforward. Having to give a canonical MAC address as a minimum amount of information guards against duplicates straight away at a basic level, but it does not stop someone from registering a machine that belongs to someone else, not that this is even a problem to begin with.
If you recorded an ‘administered-by’ field value at the same time, then straight away you have created a problem, because that person is effectively ‘the owner’, or might as well be. Someone else could complain that a bogus person has recorded their machine and now has the ability to start adding further data to that object which is also bogus. If someone could prove they ‘owned’ a box that had a particular MAC address associated with it then all would be well, but I have no idea how to do that. Even if a trusted entity could be brought in to check, it is always possible to lie about a MAC address to such an assessor system. A lot of thinking would need to be done.
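To make the registration idea a bit more concrete, here is a minimal sketch of a registry that hands out the next integer in sequence and uses the canonical MAC address as the only guard against registering the same machine twice. Everything in it (MachineRegistry, MachineRecord, the field names) is hypothetical, and it does nothing at all about bogus claims or ownership, which remain unsolved as described above.

```python
from dataclasses import dataclass, field


@dataclass
class MachineRecord:
    """Per-machine associations; all of these may change over time."""
    canonical_mac: str
    other_macs: set = field(default_factory=set)
    domain_names: set = field(default_factory=set)
    ip_addresses: set = field(default_factory=set)


class MachineRegistry:
    """Hands out sequential ids, one per canonical MAC address."""

    def __init__(self):
        self._records = []      # list position is the machine id
        self._id_by_mac = {}    # canonical MAC -> machine id

    def register(self, canonical_mac: str) -> int:
        key = canonical_mac.lower()
        # Basic duplicate guard: the same canonical MAC gets the same id back.
        if key in self._id_by_mac:
            return self._id_by_mac[key]
        machine_id = len(self._records)
        self._records.append(MachineRecord(canonical_mac=key))
        self._id_by_mac[key] = machine_id
        return machine_id

    def lookup(self, machine_id: int) -> MachineRecord:
        return self._records[machine_id]

    def __len__(self):
        return len(self._records)


registry = MachineRegistry()
first = registry.register("00:1A:2B:3C:4D:5E")
second = registry.register("02:00:00:00:00:10")
again = registry.register("00:1a:2b:3c:4d:5e")   # same machine, same id
registry.lookup(first).ip_addresses.add("192.0.2.10")
```

Returning the existing id on a repeat registration, rather than raising an error, is just one possible choice; a real system would also need lifetimes and change notification for the associated data.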
Of course, for some applications, you don’t have to worry about this, not if there is a closed community of machines, such as inside one organisation.
I started thinking about this because I was wondering what to do about identifying cases where two IP addresses are the same machine, either because it has two NICs, or an IPv4 and an IPv6 address, or simply has multiple addresses for the same NIC, or it moves from one network to another and so on. For example, there is no point in making two connections to one machine to send it the same load of stuff twice. If you could reach the same machine by two different methods then it might be that you could use that machine as a relay, if it is willing, to get to some destination and get round some break in connectivity. But you would need to know that two IP addresses refer to the same box, so that this box is a worthwhile candidate for recruitment as such a relay.
Of course, routing systems already solve problems like this, and the space of ASes is a very similar kind of thing. Someone has already solved the business of dealing with AS assignments, so perhaps the answer lies there - the same thing but just far finer granularity, down to the individual machine.
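A small sketch of how a machine id could be used for this: given some table that maps IP addresses to machine ids (the table, the ids and the function names here are all invented for illustration), group the addresses by machine, connect to only one address per machine, and treat machines reachable by more than one address as possible relay candidates.

```python
from collections import defaultdict


def group_by_machine(machine_id_by_address: dict) -> dict:
    """Collect all known addresses under the machine id they belong to."""
    groups = defaultdict(list)
    for address, machine_id in machine_id_by_address.items():
        groups[machine_id].append(address)
    return dict(groups)


def one_address_per_machine(groups: dict) -> dict:
    """Choose a single address per machine so nothing is sent twice."""
    return {machine_id: addresses[0] for machine_id, addresses in groups.items()}


def relay_candidates(groups: dict) -> list:
    """Machines reachable by more than one address might serve as relays."""
    return sorted(mid for mid, addresses in groups.items() if len(addresses) > 1)


# Hypothetical lookup table using documentation-range addresses.
known = {
    "192.0.2.10": 17,       # machine 17, IPv4
    "2001:db8::10": 17,     # machine 17 again, IPv6
    "198.51.100.7": 42,     # machine 42, single address
}
groups = group_by_machine(known)
targets = one_address_per_machine(groups)
candidates = relay_candidates(groups)
```

Having more than one address is only a hint that a machine sits on more than one path; whether it is actually willing and able to relay is a separate question.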
Looking at it that way, a representation would have n high bits of AS number (I think it is 32 bits now) and m low bits of machine number within that AS; maybe m=32 bits again would suffice. That would make 32+32=64 bits in total, which would be very convenient. So the globally unique sparse two-part representation would be asn:in-as-num, and each machine could also be mapped to an alternate form, a dense 64-bit index. This just increases by one with each allocation, so any time anyone registers a machine, regardless of which AS it belongs to, they get a number that just goes up by one and is global within a single space. That way there would be no need for even a hashed lookup: if you use the single integer index form of machine id, you could just have a directly indexed table of per-machine info records, so one global id gets you straight to a record containing stuff like associated domain names, associated IPs, all the MAC addresses and the ASN.
Perhaps more bits are needed for the asn:in-as-num 2-tuple, as there are probably too many bits for the ASN and not enough for the per-AS machine id, just in case one giant AS has more than four billion machines in it. If that can definitely be ruled out then that would be very good; otherwise it would be better to play safe and have, say, 64+64 bits. A different split, one other than 32:32, would have enough room, something like 24:40, but it creates a whole world of trouble due to its incompatibility with the AS space, which is already set at 32 bits now, I think.
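The two forms can be sketched roughly as follows, assuming the 32:32 split suggested above. pack_sparse, unpack_sparse and DenseIndex are hypothetical names, the record layout is invented, and the ASN values are just examples.

```python
ASN_BITS = 32      # assumed width of the AS number field
LOCAL_BITS = 32    # assumed width of the in-AS machine number
LOCAL_MASK = (1 << LOCAL_BITS) - 1


def pack_sparse(asn: int, in_as_num: int) -> int:
    """Pack asn:in-as-num into one 64-bit integer (sparse form)."""
    if not 0 <= asn < (1 << ASN_BITS):
        raise ValueError("ASN out of range")
    if not 0 <= in_as_num < (1 << LOCAL_BITS):
        raise ValueError("in-AS machine number out of range")
    return (asn << LOCAL_BITS) | in_as_num


def unpack_sparse(packed: int) -> tuple:
    """Split a packed 64-bit value back into (asn, in_as_num)."""
    return packed >> LOCAL_BITS, packed & LOCAL_MASK


class DenseIndex:
    """Global counter: each registration gets the next integer."""

    def __init__(self):
        self._records = []           # directly indexed by dense id
        self._dense_by_sparse = {}   # sparse form -> dense id

    def allocate(self, asn: int, in_as_num: int) -> int:
        key = pack_sparse(asn, in_as_num)
        if key in self._dense_by_sparse:
            raise ValueError("machine already registered")
        dense_id = len(self._records)
        self._records.append({"asn": asn, "in_as_num": in_as_num})
        self._dense_by_sparse[key] = dense_id
        return dense_id

    def record(self, dense_id: int) -> dict:
        # Plain list indexing: no hashing needed for the dense form.
        return self._records[dense_id]


index = DenseIndex()
a = index.allocate(64496, 1)   # example ASN values
b = index.allocate(65536, 1)   # different AS, same global counter
packed = pack_sparse(64496, 7)
```

Note that in this sketch only the dense id gets the direct table lookup; going from the sparse asn:in-as-num form to the dense id still needs some kind of map.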
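For a feel of the numbers, the arithmetic for the splits mentioned above can be checked quickly. This assumes, as the text does, that the existing AS number space is 32 bits wide; the helper names are made up.

```python
EXISTING_ASN_BITS = 32   # assumption taken from the text above


def capacity(asn_bits: int, machine_bits: int) -> tuple:
    """Return (number of ASes, machines per AS) for a given bit split."""
    return 1 << asn_bits, 1 << machine_bits


def fits_existing_asns(asn_bits: int) -> bool:
    """True if the ASN field is wide enough for every 32-bit AS number."""
    return asn_bits >= EXISTING_ASN_BITS


for asn_bits, machine_bits in [(32, 32), (24, 40), (64, 64)]:
    ases, machines = capacity(asn_bits, machine_bits)
    print(f"{asn_bits}:{machine_bits}  ASes={ases}  machines per AS={machines}  "
          f"holds all 32-bit ASNs={fits_existing_asns(asn_bits)}")
```

With 32:32 each AS tops out at 4,294,967,296 machines; 24:40 raises that to 1,099,511,627,776 per AS but leaves room for only 16,777,216 AS numbers, which is where the incompatibility comes from.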