Future Beacon
Encryption for Privacy

by James Adrian


      Many individuals, businesses and other organizations have a legitimate and lawful need to keep their intellectual property and many other sorts of information confidential. Because email has become vital to our work and because email is not secure, encryption is often necessary.

      Apparently, crime-fighting organizations and officials often attempt to limit the use of encryption. Citing terrorism and other justifications, lawmakers in the United States have reinterpreted the Fourth Amendment to the U.S. Constitution and, in effect, modified it. The email messages sent and received by American citizens can now be copied and saved indefinitely in case they might someday help justify a criminal indictment. Perhaps many people are afraid enough of terrorism to give up some of their constitutional rights, but many are not.

      According to the report Cryptography and Liberty 1999, there are no laws restricting the private use of cryptography in Canada, and there are no domestic use or import controls on cryptography within the United States, although there are restrictions on export to certain countries.

      Restrictions on domestic use and export were not always as fair as they are today. In Bernstein v. Department of Justice, decided in May 1999, a federal appeals court affirmed the judgment of a lower court, holding that the Export Administration Regulations unconstitutionally limited the freedom to distribute encryption software. The court said the following:

      The government defendants appeal the grant of summary judgment to the plaintiff, Professor Daniel J. Bernstein ("Bernstein"), enjoining the enforcement of certain Export Administration Regulations ("EAR") that limit Bernstein's ability to distribute encryption software. We find that the EAR regulations operate as a prepublication licensing scheme that burdens scientific expression, vest boundless discretion in government officials, and lack adequate procedural safeguards. Consequently, we hold that the challenged regulations constitute a prior restraint on speech that offends the First Amendment. Although we employ a somewhat narrower rationale than did the district court, its judgment is accordingly affirmed.

      The lifting of unreasonable restrictions might not have taken place were it not for the work of many organizations advocating privacy. See this article.

      The opponents of encryption seem to have taken another tack. I am continually amazed by the propaganda and manipulation about encryption that have been introduced into our culture. In addition to the many government-created, government-promoted encryption standards that can be decrypted by hackers, there is a discouraging narrative that has been repeated by several news sources, commentators, movies and sitcom episodes. The theory presented is that no encrypted message can truly guarantee privacy. They say, "What one man can assemble, another can disassemble." This is provably, absolutely false. I will describe the unbreakable One-Time Pad and its proper use.

The One-Time Pad

      Information conveyed over the Internet is represented by binary numbers, whose digits are only 0 and 1. Each digit is called a bit, and a group of 8 bits is called a byte. In the ASCII code used for plain English text, each character we use in our email messages has its own unique byte code; the capital letter "E", for example, is the 8-bit string 01000101.
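Python's built-in functions make it easy to look up a character's byte code. This is a minimal sketch assuming plain ASCII text; the helper name `byte_code` is hypothetical, and characters outside ASCII (accented letters, emoji) need more than one byte in common encodings such as UTF-8.

```python
def byte_code(ch: str) -> str:
    """Return the 8-bit ASCII byte code of a single character as 0s and 1s."""
    code = ch.encode("ascii")  # raises UnicodeEncodeError for non-ASCII characters
    if len(code) != 1:
        raise ValueError("expected exactly one character")
    return format(code[0], "08b")  # binary, zero-padded to 8 digits


print(byte_code("E"))  # 01000101
```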

      The purpose of encryption is to create as much uncertainty as possible about which character was present in the message before it was encrypted. The one-time pad does this by adding (without carry) a separate, randomly chosen bit to each bit in the message. Like ordinary addition and subtraction, add-without-carry is an arithmetic operation on two inputs. Adding bits A and B gives the following sums:

	A	B	Sum
	0	0	 0
	0	1	 1
	1	0	 1
	1	1	 0

      Notice that the sum of 1 and 1 in binary is 10, but the carry bit is not included in the sum by the add-without-carry operation. Only the least significant bit of the sum is used (in this case, 0).
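The add-without-carry table can be checked mechanically. This sketch (the function name `add_without_carry` is hypothetical) adds two bits, keeps only the least significant bit of the sum, and confirms that the result matches Python's built-in bitwise XOR operator `^`, which is what programming languages call this operation.

```python
def add_without_carry(a: int, b: int) -> int:
    """Add two bits and keep only the least significant bit of the sum."""
    return (a + b) % 2


# Print the table above and check each row against Python's XOR operator.
print("A B Sum")
for a in (0, 1):
    for b in (0, 1):
        s = add_without_carry(a, b)
        assert s == a ^ b  # add-without-carry is the same as XOR
        print(a, b, s)
```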

      The message is called the plaintext. The encrypted message, which is what is actually sent and received, is called the ciphertext. The bits used to encrypt the message come from a store of random bits called the pad, in which 1's and 0's are equally likely (a uniform distribution). Here is the way an encryption is usually written:

Plaintext 	 0 1 0 0 0 1 0 1  = E
Bits from Pad 	 0 0 1 0 1 1 1 1
Ciphertext 	 0 1 1 0 1 0 1 0

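      In this case, the message is the capital letter E. The random bits of the pad, together with the add-without-carry operation, produce an entirely arbitrary character code. Here the result, 01101010, happens to be the code for a lowercase "j", but with different pad bits it could have been any character at all.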
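The encryption of "E" above can be reproduced in a few lines. This is an illustrative sketch: the pad bits are copied from the example rather than drawn at random, and the helper `xor_bits` is hypothetical. It works on strings of "0" and "1" characters so that the output matches the written layout.

```python
def xor_bits(x: str, y: str) -> str:
    """Add two equal-length bit strings without carry, one bit place at a time."""
    if len(x) != len(y):
        raise ValueError("bit strings must have the same length")
    return "".join("1" if a != b else "0" for a, b in zip(x, y))


plaintext = format(ord("E"), "08b")  # 01000101
pad_bits = "00101111"                # pad bits taken from the example above
ciphertext = xor_bits(plaintext, pad_bits)

print(ciphertext)                    # 01101010
print(chr(int(ciphertext, 2)))       # j
```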

      The receiver of the message decrypts it by using the same string from the secret pad shared with the sender:

Ciphertext 	 0 1 1 0 1 0 1 0
Bits from Pad 	 0 0 1 0 1 1 1 1
Plaintext 	 0 1 0 0 0 1 0 1  = E

      Notice that the same bits used to encrypt the message are used to decrypt it. This is the characteristic shared by symmetric-key algorithms, also called symmetric ciphers. In the case of the one-time pad, a string from the pad is used for this purpose; for other algorithms, and in general, such a string is called the key.
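Decryption works because adding the same pad bit twice without carry cancels it: 0 + 0 and 1 + 1 both leave 0, so every plaintext bit comes back unchanged. In Python, add-without-carry on whole bytes is the `^` operator. Here is a minimal sketch using the example byte and pad bits:

```python
pad_byte = 0b00101111        # pad bits from the example
plaintext_byte = ord("E")    # 0b01000101

ciphertext_byte = plaintext_byte ^ pad_byte  # sender encrypts
recovered_byte = ciphertext_byte ^ pad_byte  # receiver decrypts with the same key

print(format(ciphertext_byte, "08b"))  # 01101010
print(chr(recovered_byte))             # E
```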

      The add-without-carry operation is usually called the Exclusive-Or operation. The sum of A and B is 1 if either A or B is 1, but not both. If A and B are both 1, the sum is 0 (the case where both are 1 is excluded). The exclusive-or operation is written XOR: A XOR B = C, where C is a ciphertext bit.

      Claude Shannon proved that the one-time pad is unbreakable (it has what he called perfect secrecy: the ciphertext reveals nothing about the plaintext except its length), provided that the key is truly random, it is shared only by the sender and receiver, and no string from the pad is ever used twice. Hence the name, One-Time Pad.
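Putting the pieces together, a whole message can be encrypted and decrypted byte by byte. This is a sketch under stated assumptions: the helper names are hypothetical, and the pad comes from Python's `secrets` module, which draws on the operating system's cryptographically secure generator. That is strong pseudorandom data, not the truly random bits Shannon's proof assumes, so a strict one-time pad would take its pad from a physical random source instead.

```python
import secrets


def make_pad(length: int) -> bytes:
    """Generate `length` pad bytes (cryptographically strong, but pseudorandom)."""
    return secrets.token_bytes(length)


def xor_bytes(data: bytes, pad: bytes) -> bytes:
    """Add each data byte to the matching pad byte without carry (XOR)."""
    if len(pad) < len(data):
        raise ValueError("the pad must be at least as long as the message")
    return bytes(d ^ p for d, p in zip(data, pad))


message = "Meet me at noon.".encode("ascii")
pad = make_pad(len(message))            # shared in advance, used for this message only
ciphertext = xor_bytes(message, pad)    # sender
plaintext = xor_bytes(ciphertext, pad)  # receiver, with the same pad string
print(plaintext.decode("ascii"))        # Meet me at noon.
```

After use, both parties must discard that part of the pad so it can never encrypt a second message.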

      I recommend the book "Claude Elwood Shannon - Collected Papers" edited by N. J. A. Sloane and Aaron D. Wyner.

So Why Are We Not Secure In Our Papers?

      There are articles all over the Internet about the one-time pad. They quickly praise its security, and then virtually all of them go on to bemoan the inconvenience of sharing a secret pad. Imagine the indignity of handing off a big box of duplicated data in a restaurant or on a quiet street. It had better be a big box. I'm not in Casablanca very often. This reminds me of a famous quip by Yogi Berra:

Nobody goes there anymore. It's too crowded.

      What does the military or the State Department do? This is the age of the Internet and of DVDs, and an age in which 10% or more of the population is smart enough to invent things they have never seen or heard of. (That is not to say they are all trying hard and cooperating.) A large fraction of the population has access to a computer both at home and at work. Pseudorandom numbers are getting less pseudo every day. UPS and armored car services are very affordable. The same secret pseudorandom numbers can be generated in two places at once. This is not the 1600s.

      The bias against the one-time pad has succeeded in reducing the number of customers looking for it. This is despite the fact that variations on the one-time pad, which might arguably provide somewhat less than absolute security, can be far more secure than the government-promoted encryption programs.
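One way two offices could generate the same secret pseudorandom pad independently is to expand a shared secret seed with a deterministic function. The sketch below uses SHAKE-256 from Python's `hashlib` purely as an illustrative choice; the function name `derive_pad` and the seed value are hypothetical. This is one of the weaker variations described above: it behaves like a stream cipher, so Shannon's proof no longer applies, and security rests on the seed staying secret and the generator being unpredictable.

```python
import hashlib


def derive_pad(shared_seed: bytes, length: int) -> bytes:
    """Expand a shared secret seed into `length` pseudorandom pad bytes.

    Both parties holding the same seed get identical output. NOT a true
    one-time pad: an attacker who learns the seed can regenerate the pad.
    """
    return hashlib.shake_256(shared_seed).digest(length)


seed = b"hypothetical shared secret"  # illustrative only; a real seed must be random
pad_at_office_a = derive_pad(seed, 32)
pad_at_office_b = derive_pad(seed, 32)
print(pad_at_office_a == pad_at_office_b)  # True
```

As with a real pad, each message would have to use a fresh, never-repeated stretch of the generated stream.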

      Besides, shipping DVDs works. According to About.com, the average size of an email message is 75 KB. That includes all of the unnecessary quoting of the entire thread, the advertising and newsletters sent to you, and large attached pictures and videos: everything. If you encrypted and sent that much ten times per day to your Philadelphia office, you would use up less than 274 megabytes of pad per year on that secure connection (75 KB × 10 × 365 = 273,750 KB). A single-layer DVD holds about 4.7 gigabytes, roughly 17 times that amount. Twenty DVDs might last you until you retire.
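The storage estimate can be checked with a little arithmetic. This sketch assumes decimal units (1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes) and a single-layer DVD capacity of 4.7 GB; only the 75 KB average comes from the paragraph above.

```python
# Assumed decimal units: 1 KB = 1,000 bytes, 1 MB = 1,000,000 bytes.
avg_message_bytes = 75 * 1000   # average email size quoted above
messages_per_day = 10
days_per_year = 365
dvd_bytes = 4.7e9               # assumed single-layer DVD capacity, about 4.7 GB

pad_bytes_per_year = avg_message_bytes * messages_per_day * days_per_year
print(pad_bytes_per_year / 1e6)                  # 273.75 (MB of pad per year)
print(round(dvd_bytes / pad_bytes_per_year, 1))  # 17.2 (years covered by one DVD)
```

Using binary units instead (1 KB = 1,024 bytes) gives about 16.8, so the ratio is roughly 17 either way.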

Is It Random?

      If every 1 in a file is immediately followed by a 0, and every 0 is immediately followed by a 1, there is an obvious pattern (1 0 1 0 1 0 . . . ). On the other hand, if in a long file a 1 is followed just as often by a 1 as by a 0, and a 0 is followed just as often by a 0 as by a 1, then no pattern at a distance of one bit place has been found, and we need to look further if a pattern is to be found. A calculation that measures this with arithmetic is called a correlation. If the correlation is near zero, the pattern we tested for has not been found. Correlations far from zero, whether positive or negative, indicate some degree of pattern; the alternating string above gives the most negative value possible at a distance of one.

      The arithmetic needs to be as simple as possible because it must be repeated for different distances along the string of bits and for different combinations of distances along that string. These add up to a great many calculations. The simplest approach is to change every 0 to -1, so that when two bits a given distance apart are multiplied together, the product is 1 if they are the same and -1 if they differ. The sum of all such products for a given distance is a correlation. Here are two examples, where x_i is the i-th bit (after the change) and N is the length of the string in bits:

$$\sum_{i=1}^{N-1} x_i\,x_{i+1} \qquad \text{(pairs at a distance of 1)}$$

$$\sum_{i=1}^{N-2} x_i\,x_{i+2} \qquad \text{(pairs at a distance of 2)}$$
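The two sums above translate directly into code. This is a minimal sketch, assuming the bits arrive as a string of "0" and "1" characters; the function names are hypothetical, and Python's indices start at 0 rather than 1, so the loop runs over the N - j valid starting positions.

```python
import random


def to_plus_minus(bits: str) -> list[int]:
    """Change each '0' to -1 and each '1' to +1."""
    return [1 if b == "1" else -1 for b in bits]


def autocorrelation(bits: str, j: int) -> int:
    """Sum of x[i] * x[i + j] over every valid i (0-based indices)."""
    x = to_plus_minus(bits)
    return sum(x[i] * x[i + j] for i in range(len(x) - j))


alternating = "10" * 500                # 1 0 1 0 ... with N = 1000
print(autocorrelation(alternating, 1))  # -999: every neighbouring pair differs
print(autocorrelation(alternating, 2))  # 998: every pair two apart agrees

rng = random.Random(1)                  # fixed seed so the run is repeatable
noise = "".join(rng.choice("01") for _ in range(1000))
print(autocorrelation(noise, 1))        # value depends on the generated bits
```

For random bits, each sum has N - j terms that are +1 or -1 with equal odds, so it typically lands within a few multiples of the square root of N - j of zero; the alternating string is as far from that as a distance-1 sum can get.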

      To correlate pairs at every possible distance, compute $\sum_{i=1}^{N-j} x_i\,x_{i+j}$ for every $j$ from 1 to $N-1$, a separate correlation for each $j$. If each of these $N-1$ correlations is very near zero (small compared with the number of terms, $N-j$) and $N$ is large, we might be encouraged to believe that the string is random, but then all the triples and larger combinations of fixed distances must be addressed, such as $\sum_{i=1}^{N-25} x_i\,x_{i+3}\,x_{i+25}$.
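The same idea extends to triples and larger combinations: multiply together the bits at each chosen offset and sum over every starting position. This sketch uses hypothetical names and 0-based indices, and takes the offsets as a tuple, so (0, 1) is the distance-1 pair sum and (0, 3, 25) is the triple from the text.

```python
import math


def product_correlation(bits: str, offsets: tuple[int, ...]) -> int:
    """Sum over i of the product of x[i + d] for every d in offsets.

    Zeros become -1 and ones become +1 before multiplying.
    """
    x = [1 if b == "1" else -1 for b in bits]
    span = max(offsets)
    return sum(math.prod(x[i + d] for d in offsets) for i in range(len(x) - span))


def all_pair_correlations(bits: str) -> list[int]:
    """Pair correlation at every distance j from 1 to N-1."""
    return [product_correlation(bits, (0, j)) for j in range(1, len(bits))]


alternating = "10" * 500
print(product_correlation(alternating, (0, 1)))      # -999
print(product_correlation(alternating, (0, 3, 25)))  # 1
```

On the alternating string the triple (0, 3, 25) sums to only 1, even though the pattern is obvious at distance 1, so one passing test says little on its own. Counting every set of offsets that includes 0, an N-bit string has 2^(N-1) - 1 such sums, which is why the number of calculations grows so quickly.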

      For correlations such as these, where a string is compared with a shifted copy of itself rather than with another string, the term is autocorrelation. So what we have here is a separate autocorrelation calculated on each of very many subsets of the file of bits. Only if every such calculation is very near zero do we say that the file is random.

      The number of calculations is truly enormous. For this reason, many of the available estimates of randomness take the form of formulas that may be complex but are nonetheless inadequate. Trusting those that do not examine all of the distance combinations is a big mistake.

Two-Time Pads Don't Work

      Suppose your whole algorithm is published, and it uses the same part of a pad string twice to encrypt two different pieces of plaintext. The hacker now has an advantage. Suppose a hacker knows that, in certain known instances, two of your correctly spelled and grammatically correct messages were encrypted with the same pad string, such as these two:

I will meet you at the usual place tomorrow at the usual time.

I cannot believe you were not there when I was!

      Each character in these messages is an 8-bit string (its byte code). Both messages start with the capital letter I followed by a space. That's 16 bits in a row that are the same in the two plaintext messages. Where both the plaintext and the pad string are the same, the ciphertext will be the same, and the hacker is looking at the ciphertext. Where whole bytes of the two ciphertexts match, the same character is being spelled out in both plaintext messages. Apart from the opening "I ", this happens in only one other place in these two messages (the "a" of "at" lines up with the "a" of "was"), but there are plenty of places within each byte where the bits can be known to be either the same or different: adding the two ciphertexts without carry cancels the pad and leaves the sum of the two plaintexts. This narrows the possibilities and gives the hacker leads for an investigation.
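The leak described above can be demonstrated directly. This sketch encrypts both example messages with one randomly generated pad string (deliberately misused, for illustration only) and shows that XORing the two ciphertexts removes the pad entirely, leaving only the XOR of the two plaintexts. Zero bytes in that result mark positions where the two messages share a character.

```python
import secrets

m1 = b"I will meet you at the usual place tomorrow at the usual time."
m2 = b"I cannot believe you were not there when I was!"

pad = secrets.token_bytes(len(m1))          # ONE pad string, wrongly used twice
c1 = bytes(p ^ k for p, k in zip(m1, pad))
c2 = bytes(p ^ k for p, k in zip(m2, pad))  # zip stops at the shorter message

# The attacker sees only c1 and c2. XORing them cancels the pad:
# (m1 ^ k) ^ (m2 ^ k) == m1 ^ m2
leak = bytes(a ^ b for a, b in zip(c1, c2))
assert leak == bytes(a ^ b for a, b in zip(m1, m2))

# A zero byte marks a position where both plaintexts hold the same character.
same_positions = [i for i, v in enumerate(leak) if v == 0]
print(same_positions)  # [0, 1, 44]
```

The result does not depend on which random pad was drawn, which is exactly the problem: the pad contributes nothing once it has been used twice.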

      Since I am on the side of privacy and not on the side of the hackers, I will say no more about code breaking. Just don't use the same pad string for two pieces of plaintext.


      Please feel free to write to me directly for more information or to make suggestions or comments. My email address is jim@futurebeacon.com. You can also go to my contact page to get my full contact information. Suggestions, questions, additional information and critiques are very welcome.