The following account is less mathematical.
Cathy is a billionaire. Bob is a stock broker. Every Saturday, Cathy and Bob privately meet for the purpose of flipping a coin. The two sides of this coin are shown below:
The result of the coin toss is known only to Cathy and Bob. This secret will be used on the following Wednesday when Cathy will call Bob on the phone to give him an order - either to buy or to sell. Any person who may be listening to Cathy on the phone cannot know whether Cathy is saying the opposite of what she means or not. In the long run, each of the two possible outcomes occurs about half of the time; but the choice for each particular call is both uncertain and independent of any other Saturday coin toss. It is hoped that what the words mean to Bob in any given call cannot be known to anyone who does not know how the coin toss came out on the previous Saturday.
The process described in the previous paragraph is meant to make Cathy's Wednesday phone calls cryptic. Such a process is called encryption.
Encryption of communication does not work if the secrets required of the method are not kept. Disloyal employees, untrustworthy relatives, and defective information-flow procedures can defeat the best encryption. Some security workers are primarily concerned with this aspect of company security. It would be a mistake to think that this is the whole problem. Without perfect encryption, any leak can be blamed on hackers.
The coin toss produces one of only two outcomes. It doesn't matter whether the coin is marked with yes or no, or true or false, or 1 or 0, or A or B, or marked with two different colors. Cathy and Bob know that one side means "Say the opposite" and the other side means "Say what you mean."
This method can be disclosed to the public, including all competitors and phone-tapping investigators who might have an interest in knowing whether Cathy is buying or selling on Wednesdays. Knowing the method does not reveal what a particular message means unless one knows the single bit of information that determines the meaning of that phone message.
If Cathy and Bob wish to meet less often, they might meet every fifth Saturday to flip the coin five times. A written record of the coin tosses would then be held by each of them. This record would be their secret data for five weeks of Wednesday calls.
If each coin toss is sufficiently uncertain, and each coin toss is truly independent of the other coin tosses, then there will be no pattern in the succession of coin tosses regardless of the length of that succession. If this is the case, no statistical method can be applied to discern what outcome might come next. If there is some pattern to a long history of coin tosses, statistics can be applied to discern at least something about the history of Cathy's buy and sell orders. If the pattern is simple and clear, perhaps the entire history of buy and sell orders can be surmised.
Perfect security requires that each coin toss have no effect on any other. They must be completely independent. Also, each coin toss must have zero predictability. Actually, these two requirements are automatically satisfied if each coin toss has zero predictability. It could not have zero predictability if there were any dependence on other coin tosses.
A portable flash drive could contain bits (each either 1 or 0) to determine the meaning of Cathy's words in her Wednesday phone calls. This would require Bob to have an exact copy of this data in a drive of his own. It would also require that the two flash drives be loaded with data in total secrecy. The contents of the drives must be known only to Cathy and Bob. As with the coin tosses, each bit would need to be determined with such uncertainty that no pattern can exists in the data.
With such data being secretly shared between the sender and the receiver, email messages could be used instead of phone calls to make the Wednesday orders to either buy or sell. In such a case, each Wednesday email message from Cathy would contain either a single 1 or a single 0. The secretly shared data would be combined with the message bit according to the following truth table:
Secret Message Bit Data Bit Transmitted Bit 0 0 0 1 0 1 0 1 1 1 1 0
The truth table above shows that the message bit is not changed when the secret data bit is 0. It also shows that the message bit is always changed whenever the secret data bit is 1.
Hackers attempt to gain access to the transmitted bit. Those who succeed in this are confronted with either a 1 or a 0. There are two ways that a 1 could be transmitted. If the secret data bit associated with that transmitted bit is truly unpredictable, there is no way to decide whether the message bit was a 1 or a 0. Likewise, there are two ways that a 0 can be transmitted. If the secret data bit associated with that transmitted bit is truly unpredictable, there is no way to decide whether the message bit was a 1 or a 0.
Now that Cathy and Bob are using flash drives, they can share enormous amounts of data secretly. This could be used to encrypt much more than one bit of information per week.
Email encoding provides that each character of the text composed by the sender is represented by a string of eight bits. This is called a byte. Each letter such as a or punctuation such as , has a byte code eight bits long that is different from any other code. There is also a separate code for the space character and another for line breaks (paragraph endings). This encoding is not encryption. The representation of each character by eight binary bits is universal, open, and not hidden. There is a reason for it. The hardware of a computer is comprised of many parts that either are on or off at any given time. There is no part that is in one of 26 or 256 states at any given time. The computer must represent every meaning with a code that is a string of on's and off's. Each choice of on or off represents a bit, which we interpret as either a 1 or a 0. Your email messages are not encrypted unless you encrypt them.
During off-line encryption, a string of bits obtained from the shared secret data is combined with the bits of the byte codes of a text file. Each transmitted bit is the result of combining a message bit with a data bit in the way specified by the truth table shown above. This encrypted message is copied into an email program to be sent to the receiver. When decrypted by the receiver off line, it can be read as a text file containing any number of instructions, calculations, or questions.
At one time, this method was criticized for its inconvenience. Now with high-volume data storage, years worth of encryption data can be mailed or carried in one small package. There exist terabyte flash drives that can be used to store the secret data for a million messages, each containing a million bytes. Such small packages are not the slightest inconvenience for embassies, military organizations, and banks because they routinely employ point-to-point delivery by means of diplomatic pouches, curriers, and armored cars. It is so convenient that businesses and individuals only need to be assured that this method provides superior message security.
Thus far, I have avoided the terms random and randomness. The concept has been widely confused with complexity, chaos, and various imprecise formulations. Randomness has been called the absence of pattern. That, at least, tells us what it is not. You might have some difficulty writing down a collection of bits that has no pattern. Possibly the kind of source that comes closest to having no pattern that statisticians can discern is a geiger counter sending pulses to a computer and getting these pulses form the decay of a radioactive substance. This would be an example of a hardware random bit generator.
Hardware random bit generators are famously called true random bit generators or true random number generators. They are commonly distinguished from pseudo random bit generators that are computer programs intended to approximate randomness. (Pseudorandom is usually written as one word.) These programs are often called pseudorandom number generators.
Buying a true random number generator that actually does what it is advertised to do is not always easy. A good one will provide stronger encryption than is available by any other means. You will need to know how to ask for specifications. Here is an explanation of one way to evaluate the randomness of a file of data:
If every 1 in a file is immediately followed by a 0, and every 0 is immediately followed by a 1, there is an obvious pattern (1 0 1 0 1 0 . . . ). On the other hand, if in a long file, a 1 is followed just as often by a 1 as by a 0, and a 0 is followed just as often by a 0 as by a 1, we need to look further if a pattern is to be found. In such a case, a pattern (at a distance of a single bit place along the succession of bits) has not been found. If we find this to be true or false by using arithmetic, we call the calculation a correlation. If this correlation is equal to zero, the pattern we have tested for has not been found. Correlations greater than zero indicate some degree of pattern.
The arithmetic needs to be as simple as possible because it needs to be repeated for different distances along the string of bits and for different combinations of distances along that string. These add up to a great many calculations. The best way to do this is to change all of the zeros to minus ones so that pairs of each distance can be multiplied together to obtain either a 1 or zero. The sum of each such multiplication is a correlation. Here are two examples where N is the length of the string in bits:
The sum of (xi)(xi+1) as i goes from 1 to N-1. This correlates pairs at a distance of 1.
The sum of (xi)(xi+2) as i goes from 1 to N-2. This correlates pairs at a distance of 2.
To correlate pairs at every possible distance, take the sum of (xi)(xi+j) as i goes from 1 to N-j for every j from 1 to N-1. A separate correlation is calculated for each j. If each of these N-1 correlations are very near zero and N is large, we might be encouraged to believe that the string is random, but then all the triples and larger combinations of fixed distances must be addressed, such as the sum of (xi)(xi+3)(xi+25) as i goes from 1 to N-25.
For correlations such as these, where a degree of correlation is assigned to a string without relating it to another string, the term becomes autocorrelation. So what we have here is a separate autocorrelation being calculated on each of very many subsets of the file of bits. Only if every such calculation is very near zero do we say that the file is random.
The number of calculations is truly enormous. For this reason, there are many estimates of randomness available in the form of shorter but nonetheless imperfect formulations. It is often easier to trust a kind of hardware that you understand than to become a statistician.
Hacking steals secrets form the smallest of businesses and thereby reduces upward mobility. So many systems can be hacked because the strongest methods are not used. Encryption can be based on complexity or little-known math or shared secrets. The latter (the method of this article) is rarely used. There are political motives for this. Authorities fear terrorism. This needs to be addressed. I suggest that honest, law-abiding people who have a legitimate need for email security (such as technology entrepreneurs and many others) be allowed to invite vetting by the authorities and perhaps even registration to distinguish themselves from groups having nefarious motives.
Email does not have the privacy characteristic of postal mail. Postal mail does not proceed at the speed of business. A solution that protects the rights of individuals and groups needs to be found.