Research / Projects


Cryptology with small message spaces

Wayne Patterson Howard University

Cybersecurity Education and Research Center

Small Message Spaces

Consider the standard cryptographic environment with a message space M, a cipher text space C, and a family of transformations parameterized by the elements of a key space K. Consider further the extraordinary restriction that |M| is small, that is that there are a very limited number of messages to pass.

Although this is somewhat of a pathological example, it certainly has its analog both historically and in contemporary practice. For example, on a battlefield, in a critical situation, only a limited number of messages are required to be transmitted among the troops: “advance”, “retreat”, “flank left”, “flank right”, and so on. We will give another example later that is perhaps more common in contemporary society.

Current cryptosystems in widespread use are designed to be flexible, secure, and efficient with very large message spaces. For example, in the Advanced Encryption Standard (AES), the simplest configuration allows for messages drawn from a message space of cardinality 2128, which is approximately 3 x 1038. In the Rivest-Shamir-Adleman Public Key Cryptosystem (RSA), the designers’ recommendation is that the message space have cardinality approximately 10400.

In a cryptosystem where the number of possible messages is small --- in the previous example and in one to follow |M| < 100 --- either AES or RSA can be broken.

Consequently, if we were to use, for example, an RSA cryptosystem for this restricted case, with the small set of potential messages m1, m2, …, mk, then the cryptanalyst monitors and stores c1 = m1e (mod n), … and assuming he or she has a known plaintext, then the ci’s can be easily mapped back to the mi.

RSA for Small Message Spaces

There are several methods that will allow us to modify standard cryptosystems to enable them to be used in the situation with small message spaces. For example: Key generation: p, q are large primes as in the Rivest-Shamir-Adleman public key cryptosystem. They are kept secret. n = p q, and e, a number pseudo-randomly chosen that is relatively prime to f(n) are made public, as is a hash function H with the property that it is easy to generate integers k in the range [2,n-1] that will hash to any given value. A simple example of such a function could be the sum of the individual digits. The final element of the cryptosystem is also secret: it is d, the inverse of e mod (f(n)), that is e d = 1 (mod (f(n)).

Preprocessing: For each element  in the small message space, a number m’ in the range [2,n-1] is pseudo-randomly generated by Alice and sent to Bob. This initial phase is not public-key, as the correspondence between the element of the message space m and the corresponding m’ must be communicated to Bob using a common key.

Encryption: Alice chooses the element m to be communicated, and computes its hash value H(m); then she finds another m’ such that H(m) = H(m’), encrypts m’ (computes c’ = m’e(mod n) and sends to Bob.

Decryption: Bob receives c’, decrypts using his private key d, and obtains c’d = m’ (mod n). Then he computes H(m’), which is also H(m), and so he can compare H(m) to each of the hash values of the original messages he had received in the preprocessing phase to obtain the required message.

Cryptanalysis: If Trudy intercepts c’, she cannot compute the hash function H because the value of c’ yields no relationship to the value of m’ or of H(m’). Thus Trudy is unable to intercept the signal.

Although this is a simple enough modification to RSA, it could be extended to other common cryptosystems. In addition, it has been suggested  that either El Gamal approaches or probabilistic encryption could also be used instead of the hash function approach described above.


A Second Example

In June of 2010, I was invited to give a presentation at the Baseball Hall of Fame in Cooperstown, New York, for the 22nd Annual Symposium on Baseball and American Culture, on the subject of “The Cryptology of Baseball.” I had given numerous talks at the Hall of Fame in the past, on various aspects of baseball and how it impacts and is impacted by cultural factors in society. My research and writing on baseball is really only an avocation for me.

As a professor of computer science over the past thirty years, most of my scholarly work in the field has been with respect to crypto-related issues. So although I thought at first in speaking to an audience of persons who study baseball that I would have to address the issue in a very non-technical fashion.

One might question the value of encryption in baseball given the limited amount of information being communicated. However, a baseball game can be decided by one pitch, and consequently one pitch can determine the winner or loser of a series. It should perhaps be noted that the share for the World Series winning team in 2009 was $21,266,321.79, and for each player earning a full share, $365,052.73.

The use of cryptography and cryptanalysis in the baseball world is normally referred to as “signals” and “sign stealing”. Sending signals to batters and runners provides the greatest requirement for encryption, and as a consequence, the greatest opportunity for cryptanalysis. These are signals that must be conveyed both to a batter and any baserunner for the team at bat. The signals originate with the manager, and are relayed to the third-base coach (always the third-base coach and not the first-base coach, although there is no valid reason for this).

The batter and any baserunners receive their signals from the third-base coach, and so this is the transaction we have studied in the greatest detail.

The Third-Base Coach Cryptosystem

As indicated, the most critical set of signals with the greatest complexity and highest risk involved in the interception of the signals are those that are given by the third-base coach to the batter and any base runners.

In the third-base coach cryptosystem (henceforth, TBC), the messages that need to be conveyed to the batter and baserunner(s) largely define the following message space:

MTBC =  { Steal, Hold, StealOnOverthrow, Take, Bunt, Swing, HitAndRun, RunAndHit, SqueezeBunt, Null}

When the signal is to be given, the third-base coach selects a number of body movements, such as touching his belt (Belt), clapping his hands (Clap), touching the brim of his cap (Cap), and so on. The collection of potential body movements we will call BODY. An example might be:

BODY = {Belt, Clap, Cap, Leg, Nose, Ear, Shoulder, Thigh, Cheek, Wipe }

Prior to the game, a mapping from the MTBC to BODY is defined, say s: MTBC BODY. Touch the cap might mean bunt, for example, so than s(Bunt) = Cap.

However, the ciphertext consists of a sequence selected from the product set HS BODY, such as (1, Belt), (2, Leg), (3, Nose), (4, Ear), (5, Shoulder), (6, Clap). In order to decrypt the ciphertext, the batter and runner, knowing the hot sign, look for the body movement associated with the hot sign and perform the inverse map that originally associated the plaintext message to the body movement.

As an example, if x = 4 HS; and the ciphertext sequence is (1, Belt), (2, Leg), (3, Nose), (4, Ear), (5, Shoulder), (6, Clap) as indicated above, the signal is the preimage of the value Ear in MTBC. So if s-1(Ear) = Take, then the correct decryption by the batter and runner would be for the batter to take (that is, not swing) at the next pitch.

Thus in more standard crypto terminology, the “hot sign” and the mapping s define the key.

A full description of the TBC cryptosystem and its cryptanalysis will be found in the forthcoming paper “The Cryptology of Baseball”, by this author, in the journal Cryptologia, appearing in 2011.


Wayne Patterson

Howard University Cybersecurity Education and Research Center

December 31, 2010