Correction to 47 bit to 10 letter code
CASTalk.com Forum Index CASTalk.com
Discussion of DSP, FPGA, storage and embedded system.
 
 FAQFAQ   MemberlistMemberlist     RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 
 
Google
 
Web castalk.com
Correction to 47 bit to 10 letter code

 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture
Author Message
John Savard
Guest





Posted: Sun Mar 06, 2005 6:25 am    Post subject: Correction to 47 bit to 10 letter code Reply with quote

I have recieved a private communication in which I was informed of an
error in the code which represents 47 binary bits as 10 letters from
the English 26-letter alphabet.

I have made appropriate corrections to the code as depicted on the
page

http://home.ecn.ab.ca/~jsavard/crypto/mi060301.htm

to deal with this error.

While I was there, I also changed the code to make it more consistent
in the ordering of its subcodes.

And I added information on the page about IBM's successor to Chen-Ho
encoding, Densely Packed Decimal.

Not currently up on my site, but perhaps some may remember it from
2004, I had an elaborate description of a computer architecture there
- it used ten's complement packed decimal as an auxilliary data
format, and I had added a mode for use of Chen-Ho encoding. This
motivated me to see if Densely Packed Decimal could be modified to
work with ten's complement or nine's complement decimal numbers. It
can.

But my computer architecture, being primarily binary, needs to be able
to truncate numbers at other places than whole decimal digits (in the
compressed decimal mode, odd bits left over after the next lower
multiple of 10 allow the range of the number to be extended as much as
possible), and so I finally determined that despite this improvement
being possible, my architecture could still not effectively derive any
benefits from the use of this particular IBM innovation. Ah, well, at
least it saves having to license it.

John Savard
Back to top
David A. Scott
Guest





Posted: Sun Mar 06, 2005 6:43 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

jsavard@ecn.ab.ca (John Savard) wrote in
news:a5b2b542.0503051725.36accc26@posting.google.com:

Quote:
I have recieved a private communication in which I was informed of an
error in the code which represents 47 binary bits as 10 letters from
the English 26-letter alphabet.

I have made appropriate corrections to the code as depicted on the
page

http://home.ecn.ab.ca/~jsavard/crypto/mi060301.htm

to deal with this error.

While I was there, I also changed the code to make it more consistent
in the ordering of its subcodes.

And I added information on the page about IBM's successor to Chen-Ho
encoding, Densely Packed Decimal.

Not currently up on my site, but perhaps some may remember it from
2004, I had an elaborate description of a computer architecture there
- it used ten's complement packed decimal as an auxilliary data
format, and I had added a mode for use of Chen-Ho encoding. This
motivated me to see if Densely Packed Decimal could be modified to
work with ten's complement or nine's complement decimal numbers. It
can.

But my computer architecture, being primarily binary, needs to be able
to truncate numbers at other places than whole decimal digits (in the
compressed decimal mode, odd bits left over after the next lower
multiple of 10 allow the range of the number to be extended as much as
possible), and so I finally determined that despite this improvement
being possible, my architecture could still not effectively derive any
benefits from the use of this particular IBM innovation. Ah, well, at
least it saves having to license it.

John Savard


Actually one can easily go from any binary file to a file of 26 symbols
of ones choice or visa versa in a bijective manner using code from my site.
The code is already there for the huffman version I called it conditioned
huffman. Its been there for years. The condtion file would just need a
file that contained the 26 desired symbols. 6 symbols would be based
on 4 bits and 20 symbols would be based on 5 bits.

Know one may not like it in that a group of 10 letters would translate to
betweem 40 and 50 bits depending on letters. on the average a group of
10 letter would go to 47.6923076923076923076923076923077 bits

If one would mod the code to arb255.exe one could use arithmetic instead
of huffman it would also be bijective to any file and every 10 letters
would decode to 47.0043971814109216039681265425669 bits a slight
improvement over long files. In this case 10 letters would translate almost
entirely to 47 bit strings with the excess being 48 bit strings. If you
do the bijective transform correctly this would be 6 byte file. And going
the other way each 6 byte file would tranlate to a 10 character file with
a very small subset going to 11 character files all in a bijective way no
waste.




David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
Back to top
Terje Mathisen
Guest





Posted: Mon Mar 07, 2005 2:57 am    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

David A. Scott wrote:
Quote:
jsavard@ecn.ab.ca (John Savard) wrote in
news:a5b2b542.0503051725.36accc26@posting.google.com:


I have recieved a private communication in which I was informed of an
error in the code which represents 47 binary bits as 10 letters from
the English 26-letter alphabet.

Actually one can easily go from any binary file to a file of 26 symbols
of ones choice or visa versa in a bijective manner using code from my site.
The code is already there for the huffman version I called it conditioned
huffman. Its been there for years. The condtion file would just need a
file that contained the 26 desired symbols. 6 symbols would be based
on 4 bits and 20 symbols would be based on 5 bits.

Know one may not like it in that a group of 10 letters would translate to
betweem 40 and 50 bits depending on letters. on the average a group of
10 letter would go to 47.6923076923076923076923076923077 bits

If one would mod the code to arb255.exe one could use arithmetic instead
of huffman it would also be bijective to any file and every 10 letters
would decode to 47.0043971814109216039681265425669 bits a slight
improvement over long files. In this case 10 letters would translate almost
entirely to 47 bit strings with the excess being 48 bit strings. If you
do the bijective transform correctly this would be 6 byte file. And going
the other way each 6 byte file would tranlate to a 10 character file with
a very small subset going to 11 character files all in a bijective way no
waste.

I'm sorry, but I really don't see the need for this kind of mapping!

What kind of storage/transfer medium is limited to the 26 english
uppercase letters only?

I have written several binary<->text codecs, mostly in the interest of
shipping binary data over email, but I have never seen any need to go
below MIME's Base64 encoding, and you can most often get at least 70+
symbols that are guaranteed to survive all sorts of gateways and
translations.

If the number of symbols isn't a power of two, then it mostly becomes a
choice between perfect efficency (which is quite easy to do by treating
the binary data as input to an arbitrary precision div/mod operation),
and something which is cheaper to code/decode in either space or time.

Personally I'm fond of an approach that uses 6.5 bits/char, encoding
13*8=104 bits of binary data with 16 chars. Since 2^6.5 is the same as
sqrt(2^13) = sqrt(8192) ~= 90.5, this is doable with 7-bit printable
ascii, without using any spaces

The order of bits when encoding is shuffled in such a way that the
decoder can blindly write a byte each time at least 8 bits have become
available, making the decoder _very_ small:

; CL is bits needed for extra byte, initially 8
; CH is extra bit buffer
top: ; Combine pairs of input chars
lodsb ; into 13 bit of data.
sub al,34
jb top ; Anything below 34 is filler
jz save_file ; 34 is EOF
dec ax ; 35 -> zero
mov ah,91
mul ah

xchg ax,dx ; Save first 6.5 bits
xor ax,ax
next:
lodsb
sub al,35
jb next ; Skip all below 35

add ax,dx ; Add together to 13 bits
stosb ; Save low 8
xor al,al
shr ax,cl ; Shift top 5 bits down
or ch,al ; and combine with buffer
sub cl,5 ; Have we got another full byte?
jg top ; No, so get another 13-bit block

mov [di],ch ; Save extra byte
inc di
mov ch,ah
inc_cl:
add cl,8 ; Need 8 more bits for next extra byte
jmp top
save_file:

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
David A. Scott
Guest





Posted: Mon Mar 07, 2005 7:56 am    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in
news:d0fuc7$mo0$1@osl016lin.hda.hydro.com:

Quote:

I'm sorry, but I really don't see the need for this kind of mapping!


Then don't use it.

Quote:
What kind of storage/transfer medium is limited to the 26 english
uppercase letters only?


I give up which one.

Quote:
I have written several binary<->text codecs, mostly in the interest of
shipping binary data over email, but I have never seen any need to go
below MIME's Base64 encoding, and you can most often get at least 70+
symbols that are guaranteed to survive all sorts of gateways and
translations.


26 is what the previous guy mentioned. However with the condition file
and huffman you could 2 to 256 symbols if your heart desired. You could
even use your 70+ symbols.




David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
Back to top
Terje Mathisen
Guest





Posted: Mon Mar 07, 2005 2:00 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

David A. Scott wrote:
Quote:
Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in
I have written several binary<->text codecs, mostly in the interest of
shipping binary data over email, but I have never seen any need to go
below MIME's Base64 encoding, and you can most often get at least 70+
symbols that are guaranteed to survive all sorts of gateways and
translations.

26 is what the previous guy mentioned. However with the condition file
and huffman you could 2 to 256 symbols if your heart desired. You could
even use your 70+ symbols.

OK, sorry!

David, my post was directed to the OP, not you!

I agree that using huffman coding is an elegant way to do alphabet
conversions, I remember it as a 'light bulb moment' when I figured it
out after first reading about huffman coding. :-)

If we presuppose that the input stream is (pseudo-)random, most probably
because it has already been compressed, then all input symbols will be
equally probable, and a simple arbitrary precision arithmetic
coder/decoder will be optimal.

Terje
PS. I'm reading this in comp.arch, but I just noticed that it is also
cross-posted to sci.crypt, where my sample asm code would be less
interesting. Sorry about that!
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
John Savard
Guest





Posted: Tue Mar 08, 2005 7:57 am    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in message news:<d0fuc7$mo0$1@osl016lin.hda.hydro.com>...

Quote:
I'm sorry, but I really don't see the need for this kind of mapping!

What kind of storage/transfer medium is limited to the 26 english
uppercase letters only?

Since it was only in a later post that you noticed it was crossposted
to sci.crypt, your query is understandable.

Actually, I only crossposted it to comp.arch because the revised page
mentioned the Densely Packed Decimal encoding, and I thought it might
be of some interest there as an understandable explanation of the
principles.

An efficient encoding of binary data to letters that adds very little
redundancy, but which can be carried out at high speed, allows a
message, first compressed using techniques like Huffman coding, and
then encrypted using modern binary block ciphers like DES (or later
ones, such as Rijndael, which became the new Advanced Encryption
Standard) to then, subsequently, be encrypted as well using classical
methods, for example, by rotor cipher machine simulators.

If the added redundancy is so slight that the two layers of encryption
cannot be analyzed separately, then, the fact that they work in two
very different number bases highly complicates analysis.

On my web page, I discuss ways of performing fractionation, taking
advantage of the fact that 26 * 26 = (25 * 27) + 1 on the one hand,
and on the other hand, 2^5 = (3*3*3)+5 and 2^7 = (5*5*5)+3, to convert
a binary message, half of whose 47-bit blocks are converted to
letters, into a mixture of symbols from a 3-alphabet and a 5-alphabet,
then using substitution tables, say with 81 or 25 elements, or other
ciphers, to further make analysis of the enciphering method highly
difficult.

Incidentally, elsewhere on my web page, I describe an efficient
encoding of binary bits into 26 letters that gives them different
frequencies - to maximize the speed at which binary data can be
transmitted by Morse Code. This would sort of qualify as having the
sort of usefulness of which you spoke, unlike this, which is useful
for another purpose.

John Savard
Back to top
Terje Mathisen
Guest





Posted: Tue Mar 08, 2005 1:57 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

John Savard wrote:
Quote:
Actually, I only crossposted it to comp.arch because the revised page
mentioned the Densely Packed Decimal encoding, and I thought it might
be of some interest there as an understandable explanation of the
principles.

I didn't study your explanation, but I assumed this was the same as, or
similar to, IBM's suggestion for decimal fp, which afaik stores three
decimal digits per 10 bits.
Quote:

An efficient encoding of binary data to letters that adds very little
redundancy, but which can be carried out at high speed, allows a
message, first compressed using techniques like Huffman coding, and
then encrypted using modern binary block ciphers like DES (or later
ones, such as Rijndael, which became the new Advanced Encryption
Standard) to then, subsequently, be encrypted as well using classical
methods, for example, by rotor cipher machine simulators.

If the added redundancy is so slight that the two layers of encryption
cannot be analyzed separately, then, the fact that they work in two
very different number bases highly complicates analysis.

OK. Even if I did some work on one of the also-rans in the AES process,
I really don't understand enough about cryptanalysis to evaluate that.
It sounds reasonable, but a _lot_ of crypto stuff is quite non-obvious,
at least to me. :-(

Quote:
Incidentally, elsewhere on my web page, I describe an efficient
encoding of binary bits into 26 letters that gives them different
frequencies - to maximize the speed at which binary data can be
transmitted by Morse Code. This would sort of qualify as having the
sort of usefulness of which you spoke, unlike this, which is useful
for another purpose.

I've been a ham since 1978 (la8nw), so I've looked into the opposite
problem: Efficient binary encoding of morse symbols.

Morse is of course one of the very first attempts to Huffman encode
english text, making common letters shorter than the others.

A reversible, minimally rendundant setup would seem to solve both
problems at the same time:

You want the transmission time to be minimized, not the number of morse
letters, which means that E and T will carry less information than a P
or C, right?

Terje
--
- <Terje.Mathisen@hda.hydro.com>
"almost all programming can be viewed as an exercise in caching"
Back to top
David A. Scott
Guest





Posted: Tue Mar 08, 2005 6:59 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in
news:d0jpdq$j0u$1@osl016lin.hda.hydro.com:

Quote:

I've been a ham since 1978 (la8nw), so I've looked into the opposite
problem: Efficient binary encoding of morse symbols.

Morse is of course one of the very first attempts to Huffman encode
english text, making common letters shorter than the others.

A reversible, minimally rendundant setup would seem to solve both
problems at the same time:

You want the transmission time to be minimized, not the number of morse
letters, which means that E and T will carry less information than a P
or C, right?

Terje


It seems most of us have been a ham of one kind or another, Some where I
wrote code that woudl be suitable for the kind of use mentioned. You could
compress a message to some binary string and then optimally send
the entire message as a binary string where you account for the length
of time of the two binary symbols. The messages though random would have
more dots than dashes. Though for a given transmission time it would
decompress to a message of the same length. I have given it much thought
years ago. However as a ham you are not allowed to encrypt and send over
the air it would piss off big brother. But I suspect when spies transmit
short message they would use similar methods. Check out

http://bijective.dogma.net/compres12.htm

I worte this for
David Cary and reference his site. It has interested him years ago
but have not heard from fin since.



David A. Scott
--
My Crypto code
http://bijective.dogma.net/crypto/scott19u.zip
http://www.jim.com/jamesd/Kong/scott19u.zip old version
My Compression code http://bijective.dogma.net/
**TO EMAIL ME drop the roman "five" **
Disclaimer:I am in no way responsible for any of the statements
made in the above text. For all I know I might be drugged.
As a famous person once said "any cryptograhic
system is only as strong as its weakest link"
Back to top
Grumble
Guest





Posted: Tue Mar 08, 2005 8:10 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

David A. Scott wrote:

Quote:
It seems most of us have been a ham of one kind or another.

I don't think I've ever been a ham of any sort :-)
Back to top
John Savard
Guest





Posted: Tue Mar 08, 2005 10:22 pm    Post subject: Re: Correction to 47 bit to 10 letter code Reply with quote

Terje Mathisen <terje.mathisen@hda.hydro.com> wrote in message news:<d0jpdq$j0u$1@osl016lin.hda.hydro.com>...

Quote:
I didn't study your explanation, but I assumed this was the same as, or
similar to, IBM's suggestion for decimal fp, which afaik stores three
decimal digits per 10 bits.

It was a reference to that suggestion. My representation is slightly
different, to make the explanation easier to follow, as I noted on the
page, so the IBM documents will still need to be referenced to get the
bits right for compatibility.

But, as I say, the point was that was the only reason I mentioned the
page on comp.arch as well.

Quote:
You want the transmission time to be minimized, not the number of morse
letters, which means that E and T will carry less information than a P
or C, right?

Yes in that case - the 47 bit to 10 letter code treats all letters as
equal, minimizing the number of letters, and serves a different
application.

The "Optimized Morse Armor" is described at:

http://home.ecn.ab.ca/~jsavard/crypto/mi060307.htm

and assumes standard timings for Morse symbols (dot = 1, dash = 3,
etc.) and a space between groups of five letters for the weightings
assigned to each letter.

For computer transmission, elsewhere in my site I note that one can
take information converted to letters and use base-78 armor to put 4
letters in 3 characters. Better than base-64, if not as good as
base-85. This was intended as a sketch of the elements of a PGP-like
program which used a classical layer of encryption following a binary
layer of encryption.

John Savard
Back to top
 
Post new topic   Reply to topic    CASTalk.com Forum Index -> Computer Architecture All times are GMT
Page 1 of 1

 
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum




VoIP Electronics Powered by phpBB