
Java Forum / General / August 2007


Explanation needed of binary operators

NoNeYa - 26 Jul 2007 06:03 GMT
Howdy folks,
    I am trying to read an int from a file written in binary.  After
researching that archaic file structure, I have found that it is stored as
little-endian, least significant byte first.  I have seen code to read this as an
int in Java but don't understand what is happening.  I guess that I just
don't understand what is going on with the shifting of bits and "anding"
with other values.  Could someone please explain this in detail? Or lead me
to a good source for in-depth reading?  I don't just want code... I want to
understand.  If this seems trivial, please excuse me... I'm only a second
semester Computer Science major who likes to keep his brain busy during the
summer off and would like to make a new front-end for an old program that I
use at work.

Thanks!
Twisted - 26 Jul 2007 09:22 GMT
> Howdy folks,
>      I am trying to read an int from a file written in binary.  After
[quoted text clipped - 10 lines]
>
> Thanks!

Endianness is a tricky matter. A 16-bit int provides a simple example
because there are only two ways it would normally be stored:

highbyte lowbyte

or

lowbyte highbyte

To reconstruct it you only need to read into byte variables named
"highByte" and "lowByte", the correct one into each, and then

short result = (short) (((highByte & 0xFF) << 8) | (lowByte & 0xFF));

and Bob's your uncle. The high byte, times 256 (i.e. shifted left 8),
bitwise-OR'd with the low byte is correct. (The & 0xFF masks matter
because the byte type is signed in Java, though not in C with e.g.
"typedef unsigned char byte;". A negative byte is sign-extended when
it is widened, so without the mask both adding and ORing the low byte
go wrong. The sign bit of the high byte does end up as the sign bit of
the short, which is what you want; the outer (short) cast is needed
because Java does the arithmetic in int.)
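A minimal Java sketch of the 16-bit case, assuming the two bytes have already been read into byte variables. The class and method names (ShortFromBytes, fromBytes, and so on) are made up for illustration. It also shows what the unmasked version does with a low byte of 0xFF.

```java
class ShortFromBytes {
    // Combine two bytes into a 16-bit value. Masking with 0xFF stops Java
    // from sign-extending a negative byte when it is widened to int.
    static short fromBytes(byte highByte, byte lowByte) {
        return (short) (((highByte & 0xFF) << 8) | (lowByte & 0xFF));
    }

    // Little-endian stream: low byte first.
    static short fromLittleEndian(byte first, byte second) {
        return fromBytes(second, first);
    }

    // Big-endian stream: high byte first.
    static short fromBigEndian(byte first, byte second) {
        return fromBytes(first, second);
    }

    public static void main(String[] args) {
        byte hi = 0x01;
        byte lo = (byte) 0xFF;
        // Unmasked: lo widens to -1 (0xFFFFFFFF), so the OR wipes out the high byte.
        int naive = (hi * 256) | lo;
        System.out.println(Integer.toHexString(naive));                 // ffffffff
        System.out.println(fromBytes(hi, lo));                          // 511
        System.out.println(fromLittleEndian((byte) 0x34, (byte) 0x12)); // 4660, i.e. 0x1234
    }
}
```

A negative high byte works out too: fromBytes((byte) 0xFF, (byte) 0xFE) builds 0xFFFE, which the (short) cast turns into -2.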

With Java ints and C longs (32 bits) you've got 24 possible orderings
of the four bytes, because the first can be in any of four places, the
second in any of the remaining three, and the third in either of the
remaining two, before the fourth is forced into the only remaining
place -- 4*3*2 is 24. In practise, the orders you usually see are two
little-endian shorts or two big-endian shorts, with the high short
either first or last (so two independent endian choices and at most
four common byte-orderings).
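For the two strict orders the standard library can do the work: java.nio.ByteBuffer reads an int in whichever ByteOrder you set. This is only a sketch (the class and method names are hypothetical), and the mixed orders still have to be assembled by hand.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Interpret four bytes as an int using the given byte order.
    static int readInt(byte[] four, ByteOrder order) {
        return ByteBuffer.wrap(four).order(order).getInt();
    }

    public static void main(String[] args) {
        byte[] data = {0x11, 0x22, 0x33, 0x44};
        System.out.printf("%08x%n", readInt(data, ByteOrder.LITTLE_ENDIAN)); // 44332211
        System.out.printf("%08x%n", readInt(data, ByteOrder.BIG_ENDIAN));    // 11223344
    }
}
```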

Any order whatsoever can be dealt with by getting byte1, byte2, byte3,
and byte4 to refer to the LSB, the next least significant, and so forth,
reading them in whichever order they occur in the data stream (so you
might read byte3 first, depending on the byte order in the stream).
Then left shift and OR:

int result = ((byte4 & 0xFF) << 24) | ((byte3 & 0xFF) << 16) |
             ((byte2 & 0xFF) << 8) | (byte1 & 0xFF);

Note the single | (bitwise OR); || is logical OR and won't compile on
ints in Java. The & 0xFF masks stop sign extension: Java bytes are
signed, so a byte like 0x80 would otherwise widen to 0xFFFFFF80 and
smear ones over the other bytes. (Java's >>> is an unsigned right
shift; there is no <<< operator, and none is needed here.)
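The same idea written once for any of the 24 orderings. This is a sketch with a hypothetical helper, assuming the four bytes are already in an array: significance[i] says which byte of the result the i-th stream byte is (0 for byte1, the LSB; 3 for byte4, the MSB).

```java
class AnyOrder {
    // Assemble an int from four bytes given in stream order.
    // significance[i] says where stream byte i belongs in the result:
    // 0 = least significant (byte1 in the post) ... 3 = most significant (byte4).
    static int assemble(byte[] stream, int[] significance) {
        int result = 0;
        for (int i = 0; i < 4; i++) {
            // Mask first so a negative byte cannot sign-extend,
            // then shift it into its slot and OR it in.
            result |= (stream[i] & 0xFF) << (8 * significance[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        // The post's example: the file holds Z W X Y, i.e. byte3, byte4, byte1, byte2.
        byte[] file = {0x33, 0x44, 0x11, 0x22};
        int[] zwxy = {2, 3, 0, 1};
        System.out.printf("%08x%n", assemble(file, zwxy)); // 44332211
    }
}
```

Strict little-endian is then {0, 1, 2, 3} and strict big-endian {3, 2, 1, 0}.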

The basic explanation is that you have 32 bits in a line. The first
eight are the high byte, and the last eight are the low byte of the
int. (Java int here; C/C++ users must use long to ensure having 32
bits. Java long is always 64 bits, more than you need here.)

The high byte is masked and widened to an int, which leaves the eight
bits we're interested in as the lowest eight (bits 0 to 7). We need them
as the highest eight, and <<24 shifts them left 24 places, so bit 7 (the
top bit of the byte) becomes bit 7+24=31, the leftmost bit of the int
(bits are numbered 0 to 31 from the right). So the shift moves the eight
interesting bits into the top eight. The shifts on byte3 and byte2
move their bits into the middle positions. The last one isn't
shifted and stays in the lowest position. So after the shifts but
before the ORs, we have changed, say,

file1:  ZWXY

into

byte1:  ...X (. = eight zero bits)
byte2:  ...Y
byte3:  ...Z
byte4:  ...W

(by reading byte3, byte4, byte1, and byte2 in that order)

into

temp1:  ...X
temp2:  ..Y.
temp3:  .Z..
temp4:  W...

and now the bitwise OR operations just combine them by copying the
nonzero bits into the result at the same place, so Z or . is Z, . or W
is W, etc., and we get:

result: WZYX

with the correct byte order unscrambled from the file's ZWXY order.
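The walk-through above can be traced in Java. This sketch assumes the ZWXY layout and picks concrete values (Z=0x33, W=0x44, X=0x11, Y=0x22) purely for illustration; the class and method names are hypothetical. It reads with InputStream.read(), which already returns 0..255 (or -1 at end of stream), so no masking is needed.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

class ZwxyTrace {
    // One unsigned byte; InputStream.read() gives 0..255, or -1 at end of stream.
    static int next(InputStream in) throws IOException {
        int b = in.read();
        if (b < 0) throw new EOFException();
        return b;
    }

    // Reads an int stored as Z W X Y (byte3, byte4, byte1, byte2).
    static int readZwxy(InputStream in) throws IOException {
        int byte3 = next(in);
        int byte4 = next(in);
        int byte1 = next(in);
        int byte2 = next(in);

        int temp1 = byte1;       // ...X
        int temp2 = byte2 << 8;  // ..Y.
        int temp3 = byte3 << 16; // .Z..
        int temp4 = byte4 << 24; // W...
        System.out.printf("temp1: %08x%n", temp1); // 00000011
        System.out.printf("temp2: %08x%n", temp2); // 00002200
        System.out.printf("temp3: %08x%n", temp3); // 00330000
        System.out.printf("temp4: %08x%n", temp4); // 44000000
        return temp4 | temp3 | temp2 | temp1;      // WZYX
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {0x33, 0x44, 0x11, 0x22}; // Z W X Y
        int result = readZwxy(new ByteArrayInputStream(file));
        System.out.printf("result: %08x%n", result); // result: 44332211
    }
}
```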
Mike Schilling - 18 Aug 2007 01:32 GMT
> With Java ints and C longs (32 bits) you've got 24 possible orderings
> of the four bytes, because the first can be in any of four places, the
[quoted text clipped - 4 lines]
> either first or last (so two independent endian choices and at most
> four common byte-orderings).

In fact, you're very unlikely to see anything other than strict
little-endian (LSB-B2-B3-MSB) or strict big-endian (MSB-B3-B2-LSB).  The only
exception I've ever seen was the ordering used by the PDP-11
floating-point processor [1], which was B3-MSB-LSB-B2.

1. Which could process both floats and 32-bit integers.
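A sketch of decoding that PDP-11 layout, assuming it is exactly B3-MSB-LSB-B2 as described, which amounts to two little-endian 16-bit words with the high word first. (It is also the same arrangement as the ZWXY example earlier in the thread.) The names are hypothetical.

```java
class Pdp11Order {
    // Stream order B3, MSB, LSB, B2: two little-endian words, high word first.
    static int readMiddleEndian(byte[] s) {
        int highWord = (s[0] & 0xFF) | ((s[1] & 0xFF) << 8); // B3 | MSB << 8
        int lowWord  = (s[2] & 0xFF) | ((s[3] & 0xFF) << 8); // LSB | B2 << 8
        return (highWord << 16) | lowWord;
    }

    public static void main(String[] args) {
        // 0x44332211 laid out as B3=33, MSB=44, LSB=11, B2=22
        byte[] s = {0x33, 0x44, 0x11, 0x22};
        System.out.printf("%08x%n", readMiddleEndian(s)); // 44332211
    }
}
```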
Ben Phillips - 18 Aug 2007 02:29 GMT
>>With Java ints and C longs (32 bits) you've got 24 possible orderings
>>of the four bytes, because the first can be in any of four places, the
[quoted text clipped - 9 lines]
> exception I've ever seen was the ordering used by the PDP-11
> floating-point-processor [1], which was B3-MSB-LSB-B2.

That's the third of the four Twisted mentioned, the other being
B2-LSB-MSB-B3.

I can't recall ever seeing a byte sex other than one of those three
myself, and only the two strict-endian ones seem to be used in any
modern PC or server hardware architectures.

OTOH I can recall a proliferation of very incompatible systems back in
the good old days -- 9- and 10-bit bytes, 7-bit bytes, even 6-bit bytes
and binary-coded decimal (yuck!!), and character orderings for the basic
A-Z stuff other than ASCII (EBCDIC, notably) or various bastardized
forms of almost-ASCII. (Pop quiz -- which popular system's pseudo-ASCII
had no {}, rearranged !@#$%^&*(), had an actual up-arrow symbol for ^,
had a £ symbol in the low 127, and had control characters that
represented colours? It actually let you type these in, mostly with
shift-number or other-modifier-key-number.)

These days we have it *easy*, with big-endian and little-endian and
cr/lf/crlf as the only two spots of low level data conversion
awkwardness. At least a byte is a byte is a byte is eight bits long and
character 65 (0x41; 0101; 01000001) is always 'A'! :)

(Imagine trying to write, edit, or otherwise work with C source on a
system with no {} characters! It's probably no coincidence the systems
with {} missing were mainly programmed in assembly, or sometimes in
something icky like BASIC, and absolutely never in anything portable.)
Mike Schilling - 18 Aug 2007 02:55 GMT
> (Imagine trying to write, edit, or otherwise work with C source on a
> system with no {} characters! It's probably no coincidence the systems
> with {} missing were mainly programmed in assembly, or sometimes in
> something icky like BASIC, and absolutely never in anything portable.)

Many European keyboards lacked both square and curly brackets, leading to
the use of digraphs.  See
http://david.tribble.com/text/cdiffs.htm#C90-digraph.  And yes, that's
another horror we no longer contend with.
Real Gagnon - 18 Aug 2007 03:03 GMT
Ben Phillips <b.phillips@a5723mailhost.net> wrote in news:fa5i4e$mpk$1@aioe.org:

> (Pop quiz -- which popular system's pseudo-ASCII
> had no {}, rearranged !@#$%^&*(), had an actual up-arrow symbol for ^,
> had a £ symbol in the low 127, and had control characters that
> represented colours? It actually let you type these in, mostly with
> shift-number or other-modifier-key-number.)

Looks like the Sinclair ZX Spectrum to me!

Bye!
Signature

Real Gagnon  from  Quebec, Canada
* Java, Javascript, VBScript and PowerBuilder code snippets
* http://www.rgagnon.com/howto.html
* http://www.rgagnon.com/bigindex.html

Arne Vajhøj - 18 Aug 2007 03:07 GMT
> These days we have it *easy*, with big-endian and little-endian and
> cr/lf/crlf as the only two spots of low level data conversion
> awkwardness. At least a byte is a byte is a byte is eight bits long and
> character 65 (0x41; 0101; 01000001) is always 'A'! :)

In the blue world EBCDIC is still used.

Arne
Ben Phillips - 18 Aug 2007 13:00 GMT
>> These days we have it *easy*, with big-endian and little-endian and
>> cr/lf/crlf as the only two spots of low level data conversion
>> awkwardness. At least a byte is a byte is a byte is eight bits long
>> and character 65 (0x41; 0101; 01000001) is always 'A'! :)
>
> In the blue world EBCDIC is still used.

Meanwhile, on Earth ...

:)
Martin Gregorie - 18 Aug 2007 16:11 GMT
>> These days we have it *easy*, with big-endian and little-endian and
>> cr/lf/crlf as the only two spots of low level data conversion
>> awkwardness. At least a byte is a byte is a byte is eight bits long
>> and character 65 (0x41; 0101; 01000001) is always 'A'! :)
>
> In the blue world EBCDIC is still used.

which, on the AS/400 at least, lacked {} and used trigraphs instead.

Trivia: the reason the character ordering in EBCDIC is such a mess is
that the encodings are binary representations of the IBM 029 card
punch's hole patterns. That's why you get the odd gaps between I and J
and between R and S. This was extended to allow for lower case. And no,
I have no idea why 0-9 are F0-F9 rather than 00-09.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Roedy Green - 18 Aug 2007 23:25 GMT
On Sat, 18 Aug 2007 16:11:45 +0100, Martin Gregorie
<martin@see.sig.for.address> wrote, quoted or indirectly quoted
someone who said :

>I have no idea why 0-9 are F0-F9 rather than 00-09.

ASCII has the same strangeness.
0-9 are 30-39.

I suspect the reason was pedantry.  It would be even harder to get
students to understand the difference between the number 0 and the
character 0 if they had the same binary representation.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Patricia Shanahan - 18 Aug 2007 23:39 GMT
> On Sat, 18 Aug 2007 16:11:45 +0100, Martin Gregorie
> <martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 8 lines]
> students to understand the difference between the number 0 and the
> character 0 if they had the same binary representation.

For ASCII, I think there was some deference to paper tape mechanics.
Treating no holes as NUL allows records to be separated by blocks of
unpunched tape. Treating all holes as DEL allows anything to be
overpunched into being a DEL.
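Punching extra holes can only turn 0 bits into 1 bits, so an overpunch behaves like a bitwise OR, and OR with all ones always gives all ones. A toy sketch, assuming a 7-bit code where each hole is one 1 bit (ignoring parity and sprocket holes); the names are made up.

```java
class PaperTape {
    static final int NUL = 0x00; // no holes punched: blank tape
    static final int DEL = 0x7F; // all seven holes punched

    // Assumption: punching over an existing frame adds holes, i.e. ORs in 1 bits.
    static int overpunch(int existing, int punched) {
        return (existing | punched) & 0x7F;
    }

    public static void main(String[] args) {
        System.out.println(overpunch('A', DEL) == DEL); // true: any code becomes DEL
        System.out.println(overpunch(NUL, 'A') == 'A'); // true: blank tape takes any code
    }
}
```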

Patricia
Martin Gregorie - 19 Aug 2007 17:31 GMT
>> On Sat, 18 Aug 2007 16:11:45 +0100, Martin Gregorie
>> <martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 13 lines]
> unpunched tape. Treating all holes as DEL allows anything to be
> overpunched into being a DEL.

As a long-lapsed user of the Flexowriter and the ASR-33 teletype, not to
mention the manual 8-hole paper tape punch, this is exactly right.

FWIW my original wonderment at non use of 00-09 was because AFAICR
everybody and everything except IBM's EBCDIC sorts numerics before
alphabetics and, unless I've confused what little history I ever knew,
always has done it that way since Adam were a lad.

Given that EBCDIC puts capitals in zones 1-3 and lower case in zones
4-6, the only sensible place to put numerics is zone 0 because that
would preserve a natural sort order. I don't agree that this would cause
confusion over 00. A completely blank card column meant 'space' and a
zero was a single hole punched in the zero row.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

John W. Kennedy - 21 Aug 2007 03:00 GMT
> FWIW my original wonderment at non use of 00-09 was because AFAICR
> everybody and everything except IBM's EBCDIC sorts numerics before
> alphabetics and, unless I've confused what little history I ever knew,
> always has done it that way since Adam were a lad.

No, IBM equipment generally collated numerics after alphabetics long
before EBCDIC, even on machines, such as the 1401, where the binary
representation was the other way. EBCDIC was designed specifically so as
to continue this behavior.

There are many ways of collating. For one dramatic example, US telephone
directories traditionally collate space after alphanumerics, so that AAA
comes before AA.
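A sketch of that kind of collation as a Java Comparator, assuming a space (and the end of a shorter name, as if it were padded with spaces) ranks after every other character. The class and method names are hypothetical, and real directory filing rules are more involved than this.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class DirectoryOrder {
    // Space, or running off the end of the name, ranks above any real character.
    static int rank(String s, int i) {
        if (i >= s.length() || s.charAt(i) == ' ') return Character.MAX_VALUE + 1;
        return s.charAt(i);
    }

    static final Comparator<String> SPACE_LAST = (a, b) -> {
        int n = Math.max(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(rank(a, i), rank(b, i));
            if (cmp != 0) return cmp;
        }
        return 0;
    };

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("AA", "AAA", "AB"));
        names.sort(SPACE_LAST);
        System.out.println(names); // [AAA, AA, AB]
    }
}
```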
Signature

John W. Kennedy
"Information is light. Information, in itself, about anything, is light."
  -- Tom Stoppard. "Night and Day"

Martin Gregorie - 21 Aug 2007 11:51 GMT
> No, IBM equipment generally collated numerics after alphabetics long
> before EBCDIC, even on machines, such as the 1401, where the binary
> representation was the other way. EBCDIC was designed specifically so as
> to continue this behavior.

Thanks for that correction. I came in via ICL kit in the late 60s, when
S/360 had largely replaced the 1400, so I never understood EBCDIC until
the ICL 2900 (which used EBCDIC) replaced the 1900 around 1980. ICL 1900
mainframes used the 6 bit ISO alternate character set. This sorted in
the order space, numeric, alphabetic.

> There are many ways of collating. For one dramatic example, US telephone
> directories traditionally collate space after alphanumerics, so that AAA
> comes before AA.

6 bit ISO, as the 1900 used it, had two shifts (IIRC you used the SI and
SO characters to switch between them), so a sort had to really jump
through hoops if you were using mixed case keys and mixed case lookups
were a real horror which, thankfully, I managed to avoid.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Stefan Ram - 18 Aug 2007 23:41 GMT
>I suspect the reason was pedantry.  It would be even harder to
>get students to understand the difference between the number 0
>and the character 0 if they had the same binary representation.

 On teletypes, one could get certain effects with the bit
 patterns »NUL« (»0000000«) and »DEL« (»1111111«).

 DEL will punch all-holes, so it will erase any information.
 
 When the motor starts, the first characters sent might be
 lost, so sending some NULs at the start of a transmission will
 give the motor time to start.

 This dictated that the blocks containing those bit patterns
 had to be control blocks in X3.4-1963.

 I am working on a German language page about X3.4-1963:

http://www.purl.org/stefan_ram/pub/ascii_1963_de
Roedy Green - 19 Aug 2007 00:18 GMT
>  I am working on a German language page about X3.4-1963:

So that's when ASCII came out.  I recall talking with Vern Detwiler (who
later founded MacDonald Detwiler) about what character set we should
use for the new IBM 7044. Back then each university devised its own
character set. I remember him talking about some newfangled 7-bit
code called ASCII. He was devising our 6-bit code to be as compatible
as possible with it.

Back then I was using 4 and 6 bit paper tape. Punch cards were mostly
1 or 2 holes of a possible 12.  Later I used TTYs. I forget how many
holes wide their tape was, though I certainly remember editing
programs with paper tape, where you would copy up the error, type the
correction, manually space over the error, and resume the copy at
perhaps the blinding speed of 15 cps.  It seems amazing what I was
able to accomplish with such primitive editing tools.

Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Martin Gregorie - 19 Aug 2007 17:38 GMT
> Back then I was using 4 and 6 bit paper tape. Punch cards were mostly
> 1 or 2 holes of a possible 12.  Later I used TTYs. I forget how many
[quoted text clipped - 3 lines]
> perhaps the blinding speed of 15 cps.  It seems amazing what I was
> able to accomplish with such primitive editing tools.

I always liked paper tape. It was less bulky than cards and you didn't
need to find a card sorter or spend hours rebuilding the deck if you
dropped it. Tangles? Just throw the tape out a top floor window or down
the stair well (remembering to keep a grip on one end) and rewind it.

The only advantages of cards were that they were great for shopping
lists and you could make a neat glider from two cards and a pencil.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Mike Schilling - 19 Aug 2007 18:13 GMT
> I always liked paper tape. It was less bulky than cards and you didn't
> need to find a card sorter or spend hours rebuilding the deck if you
[quoted text clipped - 3 lines]
> The only advantages of cards were that they were great for shopping
> lists and you could make a neat glider from two cards and a pencil.

If you throw a card deck from a high window, it becomes nice (if oversized)
confetti.
Martin Gregorie - 19 Aug 2007 21:17 GMT
> If you throw a card deck from a high window, it becomes nice (if oversized)
> confetti.

Quite.

And the sorter was no use unless you put sequence numbers on the deck and
maintained it as well. That's why the original COBOL spec had a 6 digit
sequence number at the start of every line.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Patricia Shanahan - 19 Aug 2007 18:19 GMT
>> Back then I was using 4 and 6 bit paper tape. Punch cards were mostly
>> 1 or 2 holes of a possible 12.  Later I used TTYs. I forget how many
[quoted text clipped - 11 lines]
> The only advantages of cards were that they were great for shopping
> lists and you could make a neat glider from two cards and a pencil.

There were some other advantages:

1. Content printing on each card. I could never, even when I was
handling paper tape a lot, read ASCII codes as fast as I could read
printed text.

2. Ease of changes in the middle of a file. The two procedures for tape
were the one Roedy described above, and physical cut-and-splice.
Splicing increased the risk of mechanical problems. Contrast that with
inserting and removing cards in the middle of a card deck.

Patricia
Martin Gregorie - 19 Aug 2007 22:00 GMT
>> The only advantages of cards were that they were great for shopping
>> lists and you could make a neat glider from two cards and a pencil.

I forgot a third: the chads made great, if itchy, confetti.

And a fourth: card correction by pushing chad(s) into holes before
punching new ones with a 12 key hand punch. This only worked if, like
us, you used optical card readers that didn't flex the cards.

> There were some other advantages:
>
> 1. Content printing on each card. I could never, even when I was
> handling paper tape a lot, read ASCII codes as fast as I could read
> printed text.

I used to be able to read enough (newline, tab, space, numbers) to find
the right place on a tape.

As regards cards: our programmer's standby, the 12 key hand punch,
didn't print, so I learnt to read card codes at a good rate. Later we
were given printing hand punches but they were like a Dymo tape punch:
you had to dial the character and then hit the PUNCH bar to punch a
column. They were slow as hell: we hated them and used the old 12 key
punches by preference. I wish I'd had the sense to liberate one of the
12 key punches when they were phased out. They were marvelous Victorian
engineering: the best ones had cast iron bodies with a riveted-on brass
name plate saying "British Tabulating Machine Company". Their punches
never got blunt or jammed and they never wore out.

> 2. Ease of changes in the middle of a file. The two procedures for tape
> were the one Roedy described above, and physical cut-and-splice.
> Splicing increased the risk of mechanical problems.

Not if done right. I only used tape in anger at University to write
Algol 60 for an Elliott 503, the only machine I know that was faster at
floating point than integer arithmetic. Very appropriate seeing that it
was a scientific machine. But I digress....

We used to leave a foot or so of runout between procedure declarations
and in other suitable places, so we never had to copy & edit more than a
few feet of tape and splices never overlapped punched tape. IIRC we used
thin plastic heat-seal splicing tape. I don't remember having failed
splices or tape wrecks due to splices.

> Contrast that with
> inserting and removing cards in the middle of a card deck.

Actually, we only used a large program pack once and then slung them
because, even in 1968, we kept all program source on tape. Once a source
had been loaded we used small decks to edit the source on tape. The
programmer's overnight run started with a batch edit run that did
everybody's edits. This was followed by a batch compile. After that
individual test shots were run from the tape holding the compiled
programs. That was on an ICL 1900. By 1970 we'd moved our sources to
disk and the card decks had become individual edit/compile/test jobs for
George 1. A typical job pack would be no more than 50-100 cards. You
kept and reshuffled the commands, replacing the edits and test data as
needed.

I may be misremembering, but I have the impression that IBM mainframe
shops retained source as card decks a lot longer than we did. Certainly,
when I did a job in an IBM System/3 shop in NYC in 1976 all program
sources and, indeed, the master files as well were still on cards: those
nasty little 96 column jobbies.

Eee, lad. Tell that to the young people of today and they'll not believe
you.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Roedy Green - 20 Aug 2007 11:50 GMT
On Sun, 19 Aug 2007 22:00:54 +0100, Martin Gregorie
<martin@see.sig.for.address> wrote, quoted or indirectly quoted
someone who said :

>I may be misremembering, but I have the impression that IBM mainframe
>shops retained source as card decks a lot longer than we did

Univac required mainframes to be sold with a card reader at least as
late as 1976. Card readers were perfected shortly after they went
obsolete. Air fanned the cards and sucked the top card off the deck.
Early ones used a knife-edge picker that shredded any card with a
tiny burr on the edge.  You had to keep reproducing entire decks to
keep the edges clean.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 20 Aug 2007 14:37 GMT
>2. Ease of changes in the middle of a file. The two procedures for tape
>were the one Roedy described above, and physical cut-and-splice.
>Splicing increased the risk of mechanical problems. Contrast that with
>inserting and removing cards in the middle of a card deck.

The old mechanical equipment was much more impressive than today's
pizza boxes. An optical paper tape reader shot tape out so fast it
formed a 12 foot stream in the air.  A 300 LPM printer thundered with
the majesty of a Robocop. I was shocked, never having seen printing
faster than about 45 CPS before.  Unit record equipment made all
manner of whirring and kachunking noises that would shake the
building.  I remember writing a device driver for a Univac OCR device.
You had X milliseconds to decide what to do with the document after
you read it, which pocket to direct it to. It was a strange thing made
of rubber belts. On a 16K machine we did multithread lookahead i/o --
something modern Java programs still do NOT do.

Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Martin Gregorie - 20 Aug 2007 17:54 GMT
> The old mechanical equipment was much more impressive than today's
> pizza boxes. An optical paper tape reader shot tape out so fast it
> formed a 12 foot stream in the air.

Yep. ICL used Elliott 1200 cps paper tape readers, so it moved at 120
ins/sec. Big arcs of tape. The most impressive jam I ever saw was when
a bit of sticky tape got left on the end of a reel, which caught on the
drive roller. The reader pulled tape out of the bin at 120 ins/sec until
the space between roller and its guard was jammed solid and the reader
stalled. Even the engineers were impressed - and took forever to clear
the reader.

> A 300 LPM printer thundered with
> the majesty of a Robocop. I was shocked, never having seen printing
> faster than about 45 CPS before.

We had a 1250 lpm drum printer. It was generally noisy but when you
printed a line of asterisks it made the most godawful KLANG as all 132
print hammers hit the drum simultaneously. It was a sufficiently fast
printer to need a power stacker which pulled in the paper to stack it:
the machine could page throw at about 3 feet a second and the stacker
had to keep up.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

Roedy Green - 21 Aug 2007 11:31 GMT
On Mon, 20 Aug 2007 17:54:00 +0100, Martin Gregorie
<martin@see.sig.for.address> wrote, quoted or indirectly quoted
someone who said :

>We had a 1250 lpm drum printer. It was generally noisy but when you
>printed a line of asterisks it made the most godawful KLANG as all 132
>print hammers hit the drum simultaneously. It was a sufficiently fast
>printer to need a power stacker which pulled in the paper to stack it:
>the machine could page throw at about 3 feet a second and the stacker
>had to keep up.

I presume you were an "operator" at some point in your career and had
a faulty mylar tape loop that controlled the vertical tab stops on the
printer, causing the paper to slew endlessly at full rate.  If it
happened when the covers were up you had a great arc in the air.  If
closed, it packed the printer cover tight as a mummy case. To stop it
you stomped your foot on the input paper box to break the paper.

Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Martin Gregorie - 22 Aug 2007 00:47 GMT
> I presume you were an "operator" at some point in your career and had
> a faulty mylar tape loop that controlled the vertical tab stops on the
> printer, causing the paper to slew endlessly at full rate.  If it
> happened when the covers were up you had an great arc in the air.  If
> closed, it packed the printer cover tight as a mummy case. To stop it
> you stomped your foot on the input paper box to break the paper.

We were a small service bureau with a 1903S to keep busy. Among the
systems staff we did everything - analyzed, designed, coded and, when
necessary, operated too. I was never good enough to know what George 3
wanted by listening to the control teletype, but I could tell "LP 3 FIX"
when I was lining up paper from requests to, e.g. load a magnetic tape.
I knew operators who could drive the system entirely off sound for an
hour or so when the teletype's print head failed.

I don't remember our fast printer ever turning into a paper fountain -
or the paper tape loop breaking, but we did tend to use tougher material
than plain paper tape for production loops. I seem to remember that the
1900 printer would only throw about 3 feet of paper (i.e. about two
pages) before timing out and stopping.  I know for sure that I never
broke the feed paper to stop the printer.

The 2900 printers were a nice improvement: they used a software
implementation of the paper loop and as well as telling the spooler what
sort of paper the job needed, you also told it what control loop to load.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

John W. Kennedy - 21 Aug 2007 02:50 GMT
> On Sat, 18 Aug 2007 16:11:45 +0100, Martin Gregorie
> <martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 8 lines]
> students to understand the difference between the number 0 and the
> character 0 if they had the same binary representation.

ASCII was designed more for telegraphy and interchange media than for
internal use.

Signature

John W. Kennedy
"Information is light. Information, in itself, about anything, is light."
  -- Tom Stoppard. "Night and Day"

Mike Schilling - 21 Aug 2007 06:06 GMT
>> On Sat, 18 Aug 2007 16:11:45 +0100, Martin Gregorie
>> <martin@see.sig.for.address> wrote, quoted or indirectly quoted
[quoted text clipped - 11 lines]
> ASCII was designed more for telegraphy and interchange media than for
> internal use.

You mean it was invented for one purpose, pressed into use for another, and
is still being used for the one it's not well suited for, long after the one
it was designed for has more or less disappeared?  Geez, how often does that
happen? :-)
Lew - 21 Aug 2007 06:13 GMT
Roedy Green wrote:
>>> ASCII has the same strangeness.

John W. Kennedy wrote:
>> ASCII was designed more for telegraphy and interchange media than for
>> internal use.

> You mean it was invented for one purpose, pressed into use for another, and
> is still being used for the one it's not well suited for, long after the one
> it was designed for has more or less disappeared?  Geez, how often does that
> happen? :-)

Set to music, it's a vital tool for corporate advancement:

You gotta do some ASCII sing.

Signature

Lew

John W. Kennedy - 21 Aug 2007 02:39 GMT
> And no,
> I have no idea why 0-9 are F0-F9 rather than 00-09.

To match the existing collating sequences. (Many pre-360 machines
implemented, in their hardware, collating sequences that did not
correspond to the binary values of their character encodings; EBCDIC was
designed so that the 360 would not have that anomaly.)

Signature

John W. Kennedy
"Never try to take over the international economy based on a radical
feminist agenda if you're not sure your leader isn't a transvestite."
  -- David Misch:  "She-Spies", "While You Were Out"

Roedy Green - 21 Aug 2007 11:34 GMT
On Mon, 20 Aug 2007 21:39:59 -0400, "John W. Kennedy"
<jwkenne@attglobal.net> wrote, quoted or indirectly quoted someone who
said :

>To match the existing collating sequences. (Many pre-360 machines
>implemented, in their hardware, collating sequences that did not
>correspond to the binary values of their character encodings; EBCDIC was
>designed so that the 360 would not have that anomaly.)

I don't follow.  EBCDIC '0' is not binary 0. Further, IIRC, the
letters A-Z and a-z are not contiguous blocks of binary assignments.
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Martin Gregorie - 22 Aug 2007 01:04 GMT
> On Mon, 20 Aug 2007 21:39:59 -0400, "John W. Kennedy"
> <jwkenne@attglobal.net> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 7 lines]
> I don't follow.  EBCDIC '0' is not binary 0. Further, IIRC, the
> letters A-Z and a-z are not contiguous blocks of binary assignments.

I think the approach is fairly clear: you adjust the binary code values
so that sorting on ascending code value gives you the collation sequence
you want. In the case of EBCDIC that's pretty weird because the gaps
between I and J and between R and S are not empty: they contain a wild
assortment of punctuation and other symbols.
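To see what sorting on ascending code value does, here is a sketch that compares strings by their EBCDIC values. It hardcodes only space, A-Z and 0-9, using the code page 037 assignments for those characters (the rest of the code page is not reproduced), and the names are hypothetical. With digits at F0-F9, numerics sort after letters.

```java
import java.util.ArrayList;
import java.util.List;

class EbcdicCollation {
    // Partial table: space, A-Z and 0-9 only. Note the jumps after I (C9) and R (D9).
    static int ebcdic(char c) {
        if (c == ' ') return 0x40;
        if (c >= 'A' && c <= 'I') return 0xC1 + (c - 'A');
        if (c >= 'J' && c <= 'R') return 0xD1 + (c - 'J');
        if (c >= 'S' && c <= 'Z') return 0xE2 + (c - 'S');
        if (c >= '0' && c <= '9') return 0xF0 + (c - '0');
        throw new IllegalArgumentException("not in this sketch's table: " + c);
    }

    // Straight compare of the encoded values, character by character.
    static int compareEbcdic(String a, String b) {
        int n = Math.min(a.length(), b.length());
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(ebcdic(a.charAt(i)), ebcdic(b.charAt(i)));
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length(), b.length());
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>(List.of("1A", "A1", "Z9"));
        keys.sort(String::compareTo);               // ASCII/Unicode code order
        System.out.println(keys);                   // [1A, A1, Z9]
        keys.sort(EbcdicCollation::compareEbcdic);  // EBCDIC code order
        System.out.println(keys);                   // [A1, Z9, 1A]
    }
}
```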

John says that the collation sequence predates EBCDIC. I'll go further
and guess that it predates computers as well. It was most likely defined
by IBM's original card sorters: businesses were running card-based
accounting systems in the '30s if not earlier using a room full of
sorters, collators and other electro-mechanical monsters.

FWIW the Manhattan Project calculations for the plutonium bomb design
were run using IBM card handling kit under the direction of Richard
Feynman. It was a faster replacement for the armies of girls with
hand-cranked Monroe calculators who had been doing the job. IIRC Feynman
thought up the idea of using punched cards.

Signature

martin@   | Martin Gregorie
gregorie. | Essex, UK
org       |

John W. Kennedy - 22 Aug 2007 22:03 GMT
>> On Mon, 20 Aug 2007 21:39:59 -0400, "John W. Kennedy"
>> <jwkenne@attglobal.net> wrote, quoted or indirectly quoted someone who
[quoted text clipped - 13 lines]
> between I and J and between R and S are not empty: they contain a wild
> assortment of punctuation and other symbols.

They do /now/, in EBCDIC-version-of-ISO-8859-1 and the like. But all
those spaces were empty in 1964. Apart from control characters and
lower-case letters, the original EBCDIC had only about 64 characters. So
the 64 characters of traditional BCD collated in EBCDIC more or less as
they always had, but with a straight binary compare, instead of special
collating-sequence hardware.

(That special hardware is why the basic-model 1401 Compare instruction
could only compare equal/not-equal; numerics could be high/low/equal
compared with a subtraction, but if you wanted to high/low/equal compare
alphameric data, you had to both buy a hardware add-on and accept
slightly reduced CPU performance.)

Mainframes are slowly moving away from EBCDIC, of course. The newest
System Z machines include full support of Unicode, including opcodes to
translate among UTF-8, UTF-16, and UTF-32.

> John says that the collation sequence predates EBCDIC. I'll go further
> and guess that it predates computers as well. It was most likely defined
> by IBM's original card sorters: businesses were running card-based
> accounting systems in the '30s if not earlier using a room full of
> sorters, collators and other electro-mechanical monsters.

Pretty much, yes.

Signature

John W. Kennedy
If Bill Gates believes in "intelligent design", why can't he apply it to
Windows?

Andreas Leitgeb - 26 Jul 2007 10:52 GMT
>      I am trying to read an int from a file written in binary.  After
> researching that archaic file structure, I have found that it is stored as
> little endian/ least significant first.

You've got two ways from here:
1.) read it in as an integer, and then do mask&shift-magic
on the integer to obtain an endian-swapped version of it.
2.) read four bytes separately, and compose them into an integer.

Either way, you need to be aware of how the separate bits
constitute the final result:
 in the stream you have four bytes: b1 b2 b3 b4.
The integer value you want is (treating each byte as an
unsigned value 0..255):
 0x1*b1 + 0x100*b2 + 0x10000*b3 + 0x1000000*b4
by nature of little ends :-)
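Approach 2 can be sketched in Java as follows, using the b1..b4 naming above (the class and method names are hypothetical). One detail the formula hides is that Java's byte is signed (-128..127). The `& 0xFF` turns each byte into the unsigned 0..255 value the formula assumes before it is shifted into place.

```java
public class LittleEndianCompose {
    // Combines four bytes read from a little-endian stream (b1 first) into
    // b1 + 0x100*b2 + 0x10000*b3 + 0x1000000*b4, using shifts instead of multiplies.
    // Without "& 0xFF", a byte such as (byte) 0x80 would be sign-extended to
    // 0xFFFFFF80 and smear 1-bits over the higher bytes of the result.
    static int fromLittleEndian(byte b1, byte b2, byte b3, byte b4) {
        return (b1 & 0xFF)
             | ((b2 & 0xFF) << 8)
             | ((b3 & 0xFF) << 16)
             | ((b4 & 0xFF) << 24);
    }

    public static void main(String[] args) {
        byte[] stream = {0x44, 0x33, 0x22, 0x11};   // b1 b2 b3 b4 as they appear in the file
        int value = fromLittleEndian(stream[0], stream[1], stream[2], stream[3]);
        System.out.printf("%08X%n", value);         // prints 11223344
    }
}
```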

Multiplication by these constants is equivalent to
*left*-shifting by 0,8,16,24 bits respectively.
(division would be *right*-shifting)
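A small Java check of that equivalence (a sketch; the values are arbitrary). Multiplying by 0x100, 0x10000 or 0x1000000 matches a left shift by 8, 16 or 24. For division the match holds for non-negative numbers only: Java's integer division rounds toward zero, while `>>` rounds toward negative infinity.

```java
public class ShiftVsMultiply {
    public static void main(String[] args) {
        int b2 = 0x33;
        // Multiplying by 0x100, 0x10000 or 0x1000000 is a left shift by 8, 16 or 24 bits.
        System.out.println((b2 * 0x100) == (b2 << 8));        // true
        System.out.println((b2 * 0x10000) == (b2 << 16));     // true
        System.out.println((b2 * 0x1000000) == (b2 << 24));   // true

        // Dividing a non-negative int by 0x100 is a right shift by 8 bits...
        int x = 0x2211;
        System.out.println((x / 0x100) == (x >> 8));          // true

        // ...but not for negative ints: division rounds toward zero,
        // while >> rounds toward negative infinity.
        int n = -1;
        System.out.println(n / 0x100);                        // 0
        System.out.println(n >> 8);                           // -1
    }
}
```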

If you read the integer in from the stream the canonical way
(DataInputStream.readInt() is big-endian), you actually get this number:
 0x1000000*b1 + 0x10000*b2 + 0x100*b3 + 0x1*b4
by nature of big ends.

So you'd have to do shifting, masking and finally adding
to re-arrange the bit-patterns of the integer.

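Sometimes, shifting does the masking for you: if you
shift the whole number right by 24 bits (dividing it by 0x1000000;
in Java use the unsigned >>> 24, because the signed >> 24 copies
b1's top bit into the upper bytes whenever b1 is 0x80 or more),
it's obvious that only b1 remains.
If you right-shift by 8 bits, then
  0x10000*b1 + 0x100*b2 + 0x1*b3  remains, and after
masking with 0xff00 (which also clears any sign bits >> copied in),
only 0x100*b2 remains, another one of the parts your desired
result consists of.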
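The difference between `>>` and `>>>` only shows up when b1 has its top bit set. This Java sketch (hypothetical names) uses the stream bytes 0x91 0x33 0x22 0x11, so the big-endian int read from them is negative.

```java
public class ExtractB1 {
    // b1 sits in the top byte of the wrong-endian int.
    static int b1WithSignedShift(int x)   { return x >> 24; }   // copies the sign bit in
    static int b1WithUnsignedShift(int x) { return x >>> 24; }  // fills with zeros

    // b2 moves from factor 0x10000 to 0x100; the mask also clears copied sign bits.
    static int b2AtTargetPosition(int x)  { return (x >> 8) & 0xFF00; }

    public static void main(String[] args) {
        // Big-endian read of the stream bytes 0x91 0x33 0x22 0x11:
        // b1 = 0x91 has its top bit set, so the int is negative.
        int wrongEndian = 0x91332211;
        System.out.printf("%08X%n", b1WithSignedShift(wrongEndian));   // FFFFFF91
        System.out.printf("%08X%n", b1WithUnsignedShift(wrongEndian)); // 00000091
        System.out.printf("%08X%n", b2AtTargetPosition(wrongEndian));  // 00003300
    }
}
```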

That is: to extract b1 from that wrong-endian int,
 you just *right*-shift it by 24 bits (with >>>); the
 other bits vanish by themselves, so no masking is necessary
 here. For b2, you shift the original number only 8 bits
 to the *right* (to change its factor from 0x10000 to 0x100),
 and then mask it with 0xff00 (adapted to b2's target
 position).
 For b3 you first mask (again the original value) with 0xff00
 and then *left*-shift it 8 bits, and for b4 you only need to
 *left*-shift the original number by 24 bits; like b1, no
 masking is necessary.
All these separately shifted octets are then re-assembled,
either with or-operator "|", or (in this case equivalently) by adding.
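Put together, the four steps above (approach 1) could look like this in Java. It is a sketch with a made-up method name. For b1 it uses `>>>` rather than `>>` so that no sign bits leak in, and it compares the result against the library's Integer.reverseBytes.

```java
public class SwapEndian {
    // Turns an int whose bytes are b1 b2 b3 b4 (as read big-endian)
    // into b4 b3 b2 b1, following the four steps described above.
    static int swap(int x) {
        int part1 = x >>> 24;            // b1 -> lowest byte; other bits shifted out
        int part2 = (x >> 8) & 0xFF00;   // b2 -> second byte; mask drops everything else
        int part3 = (x & 0xFF00) << 8;   // b3 -> third byte
        int part4 = x << 24;             // b4 -> top byte; other bits shifted out
        return part1 | part2 | part3 | part4;   // "+" would give the same result here
    }

    public static void main(String[] args) {
        int wrongEndian = 0x44332211;
        System.out.printf("%08X%n", swap(wrongEndian));                         // 11223344
        System.out.println(swap(wrongEndian) == Integer.reverseBytes(wrongEndian)); // true
    }
}
```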

I hope it helped and wasn't itself more complicated than the
original problem ;-)
Lew - 26 Jul 2007 14:30 GMT
>>      I am trying to read an int from a file written in binary.  After
>> researching that archaic file structure, I have found that it is stored as
[quoted text clipped - 48 lines]
> I hope, it helped and wasn't itself more complicated than the
> original problem ;-)

You can also use a java.nio.ByteBuffer (or an IntBuffer view of one), which
"knows" about endianness through its java.nio.ByteOrder.
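A minimal Java sketch of the java.nio route, with the byte array standing in for data read from a file (method names are hypothetical). The byte order is set on the ByteBuffer. An IntBuffer view takes its order from that ByteBuffer at the moment asIntBuffer() is called, because IntBuffer itself has no setter for it.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.IntBuffer;

public class BufferOrderDemo {
    // One int, decoded directly from the ByteBuffer.
    static int readLittleEndianInt(byte[] bytes) {
        return ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).getInt();
    }

    // Several ints via an IntBuffer view; the view inherits LITTLE_ENDIAN.
    static int[] readLittleEndianInts(byte[] bytes) {
        IntBuffer ints = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN).asIntBuffer();
        int[] result = new int[ints.remaining()];
        ints.get(result);
        return result;
    }

    public static void main(String[] args) {
        byte[] data = {0x44, 0x33, 0x22, 0x11, 0x01, 0x00, 0x00, 0x00};
        System.out.printf("%08X%n", readLittleEndianInt(data));   // 11223344
        for (int v : readLittleEndianInts(data)) {
            System.out.printf("%08X%n", v);                       // 11223344, then 00000001
        }
    }
}
```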

Signature

Lew

John W. Kennedy - 02 Aug 2007 02:26 GMT
>>      I am trying to read an int from a file written in binary.  After
>> researching that archaic file structure, I have found that it is stored as
[quoted text clipped - 4 lines]
> on the integer to obtain an endian-swapped version of it.
> 2.) read four bytes separately, and compose them to an integer.

3.) in Java 1.5 and up, read it as an integer and then use
Short.reverseBytes(), Integer.reverseBytes(), or Long.reverseBytes(), as
appropriate.
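A sketch of option 3 in Java, with an in-memory array standing in for the little-endian file (the method name is hypothetical). DataInputStream reads big-endian, and the reverseBytes call then swaps the result into the intended value.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class ReverseBytesDemo {
    // Reads a little-endian 16-bit value followed by a little-endian 32-bit value.
    static int[] readShortThenInt(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            short s = Short.reverseBytes(in.readShort()); // readShort() alone gives 0x2211
            int i = Integer.reverseBytes(in.readInt());   // readInt() alone gives 0x44332211
            return new int[] {s, i};
        } catch (IOException e) {
            throw new IllegalStateException("input too short", e);
        }
    }

    public static void main(String[] args) {
        // Little-endian bytes of the short 0x1122 followed by the int 0x11223344.
        byte[] file = {0x22, 0x11, 0x44, 0x33, 0x22, 0x11};
        int[] v = readShortThenInt(file);
        System.out.printf("%04X %08X%n", v[0], v[1]);     // 1122 11223344
    }
}
```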

Signature

John W. Kennedy
"The first effect of not believing in God is to believe in anything...."
  -- Emile Cammaerts, "The Laughing Prophet"

Andreas Leitgeb - 05 Aug 2007 00:24 GMT
> 3.) in Java 1.5 and up, read it as an integer and then use
> Short.reverseBytes(), Integer.reverseBytes(), or Long.reverseBytes(), as
> appropriate.

You're of course right, but I (perhaps mis-)understood the
original poster that he wanted to understand the details of
bit-shifting used for byte-reversing an int.
Roedy Green - 26 Jul 2007 14:21 GMT
> have seen code to read this as an
>int in Java but don't understand what is happening.  I guess that I just
>don't understand what is going on with the shifting of bits and "anding"
>with other values.
see http://mindprod.com/jgloss/endian.html
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Roedy Green - 26 Jul 2007 14:35 GMT
> shifting of bits and "anding"
For general background on bit fiddling, see
http://mindprod.com/jgloss/binary.html
Signature

Roedy Green Canadian Mind Products
The Java Glossary
http://mindprod.com

Nigel Wade - 26 Jul 2007 14:48 GMT
> Howdy folks,
>      I am trying to read an int from a file written in binary.  After
> researching that archaic file structure, I have found that it is stored as
> little endian/ least significant first.  

An interesting viewpoint, binary data as an "archaic" file structure. I wonder
how you would store your data in a non-binary form. Don't forget that a "text"
file is simply binary interpreted in a very specific way, and one person's
(ASCII) "text" file may be another person's (EBCDIC) binary garbage.

> I have seen code to read this as an  
> int in Java but don't understand what is happening.  I guess that I just
[quoted text clipped - 7 lines]
>
> Thanks!

I will do my best to explain - without using any code.

Endian-ness is fun. It adds excitement and joy to the otherwise tedious task of
developing portable code to read arbitrary binary data formats. Java has made
the life of the data processor much less interesting by taking this task and
wrapping it up in the ByteBuffer class. However, for the purposes of learning
it is a good thing to understand what is going on behind the scenes.

There are [essentially] two types of endianness, big-endian and little-endian.
Big-endian hardware stores bytes in memory in their "natural" format, with the
"big" end on the "left" (lower memory address). Little-endian hardware was
designed to do the opposite, just to be awkward.

Let's assume we have 3 variables, containing an 8-bit char, a 16-bit int ("short")
and a 32-bit int ("long"). We'll assign the hex values 0x11, 0x1122 and 0x11223344
to these variables respectively. If these variables occupied consecutive memory
addresses (or were output to a binary file in sequence) on big-endian hardware,
the contents of memory would be 0x11, 0x11, 0x22, 0x11, 0x22, 0x33, 0x44. On
little-endian hardware the values would be 0x11, 0x22, 0x11, 0x44, 0x33, 0x22,
0x11. As you can see, little-endian hardware has reversed the bytes of each
value. (NOTE: If you write binary data from Java with DataOutputStream it is
*always* output in big-endian order; a ByteBuffer is also big-endian unless you
change its order.)
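The memory layout described above can be reproduced in Java with a ByteBuffer. This is a sketch with hypothetical names. Java's char is 16 bits, so it uses a byte for the 8-bit "char"; it writes the three values back to back in each byte order and prints the bytes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class MemoryLayoutDemo {
    // Lays out a byte 0x11, a short 0x1122 and an int 0x11223344 back to back.
    static byte[] layout(ByteOrder order) {
        return ByteBuffer.allocate(7)
                .order(order)
                .put((byte) 0x11)
                .putShort((short) 0x1122)
                .putInt(0x11223344)
                .array();
    }

    // Formats bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("big-endian:    " + hex(layout(ByteOrder.BIG_ENDIAN)));    // 11 11 22 11 22 33 44
        System.out.println("little-endian: " + hex(layout(ByteOrder.LITTLE_ENDIAN))); // 11 22 11 44 33 22 11
    }
}
```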

If you write the data to a file and read it back on the same hardware using the
same variable types [and the same language] then there is no problem. The bytes
will be stored in the correct locations and the variables will have the correct
contents. The fun comes when you read the data as a byte array, or attempt to
read it on the other type of hardware or use a language which makes different
assumptions about the type of data.

To see how it all goes horribly wrong, let's try to read the little-endian data
file (written by some language other than Java) into Java. Remember, the order
of the bytes in the little-endian binary file is 11 22 11 44 33 22 11. So we read
the first byte and treat it as a byte, and this is OK. Next we read the two bytes
0x22 and 0x11 and get the short integer 0x2211, not what we wanted at all. The
situation is the same for the "long" integer, which will contain 0x44332211.
This is where byte shifting and masking becomes necessary (if you don't use
Java, or don't use ByteBuffer in Java): the contents of the "short" and "long"
integers have to be reversed.
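The failure described above is easy to reproduce. This Java sketch (hypothetical names) feeds the seven little-endian bytes to a DataInputStream, which always reads big-endian. A single byte comes out right, but the short and the int come out byte-reversed.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class WrongEndianRead {
    // Reads a byte, a short and an int the way DataInputStream always does: big-endian.
    static int[] readBigEndian(byte[] file) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(file))) {
            // Array initializer expressions are evaluated left to right.
            return new int[] {in.readByte(), in.readShort(), in.readInt()};
        } catch (IOException e) {
            throw new IllegalStateException("input too short", e);
        }
    }

    public static void main(String[] args) {
        // The little-endian file: 0x11, then 0x1122, then 0x11223344.
        byte[] file = {0x11, 0x22, 0x11, 0x44, 0x33, 0x22, 0x11};
        int[] v = readBigEndian(file);
        System.out.printf("%02X %04X %08X%n", v[0], v[1], v[2]); // 11 2211 44332211
    }
}
```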

You can do this more easily by reading into a byte array and extracting the
correct bytes. For example, for the 4-byte "long" integer, reading the bytes
into a byte array you will get array[0]=0x44, array[1]=0x33, array[2]=0x22 and
array[3]=0x11. To construct the correct integer (0x11223344) you need to shift
array[3] left 24 places so it becomes 0x11000000, combine that with array[2]
left-shifted 16 bits (0x220000), and so on. How you write the code to do this is up to
you; I said I wouldn't use any code.
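For readers who do want code, here is a minimal Java sketch of the byte-array approach (the method name is hypothetical). It generalizes the shift-and-combine to any little-endian field of 1 to 8 bytes, so the same loop handles the 2-byte short and the 4-byte int. It walks from the last (most significant) byte down to the first. It returns a long; cast to int or short to get Java's signed value of that width.

```java
public class LittleEndianArray {
    // Builds a value from bytes[offset .. offset+length-1], stored least
    // significant byte first. Each step shifts the partial result left by
    // 8 bits and ORs in the next lower byte.
    static long fromLittleEndian(byte[] bytes, int offset, int length) {
        if (length < 1 || length > 8) {
            throw new IllegalArgumentException("length must be 1..8");
        }
        long value = 0;
        for (int i = offset + length - 1; i >= offset; i--) {
            value = (value << 8) | (bytes[i] & 0xFF);   // & 0xFF: treat the byte as 0..255
        }
        return value;
    }

    public static void main(String[] args) {
        // The little-endian file: 0x11, then 0x1122, then 0x11223344.
        byte[] file = {0x11, 0x22, 0x11, 0x44, 0x33, 0x22, 0x11};
        System.out.printf("%04X%n", (short) fromLittleEndian(file, 1, 2)); // 1122
        System.out.printf("%08X%n", (int) fromLittleEndian(file, 3, 4));   // 11223344
    }
}
```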

Signature

Nigel Wade, System Administrator, Space Plasma Physics Group,
           University of Leicester, Leicester, LE1 7RH, UK
E-mail :    nmw@ion.le.ac.uk
Phone :     +44 (0)116 2523548, Fax : +44 (0)116 2523555

NoNeYa - 28 Jul 2007 04:40 GMT
>> Howdy folks,
>>      I am trying to read an int from a file written in binary.  After
[quoted text clipped - 8 lines]
> file is simply binary interpreted in a very specific way, and one person's
> (ASCII) "text" file may be another person's (EBCDIC) binary garbage.

I think I may have worded that a little oddly. I meant to say "The *.DBF file
structure itself is archaic, but it does use binary storage in its
header".

>> I have seen code to read this as an
>> int in Java but don't understand what is happening.  I guess that I just
[quoted text clipped - 91 lines]
> up to
> you, I said I wouldn't use any code.
Mark Space - 30 Jul 2007 20:33 GMT
> I think I may have worded that a little odd.  I meant to say "The *.DBF file

So did any of these explanations help you out?
NoNeYa - 30 Jul 2007 21:37 GMT
>> I think I may have worded that a little odd.  I meant to say "The *.DBF
>> file
>
> So did any of these explanations help you out?

Some have clarified the situation "somewhat". I do realize that I am in
"way over my head" for my level of education in programming. I am
continuing to learn more elsewhere and have asked one of my professors for
additional sources of reading. I thank all who responded. To directly
answer your question, all of the replies have educated me somewhat, but I am
still underwater looking up at the top. I am using other sources to learn
more and have purchased another book to read. I just wish I could find a
book that deals with reading binary and using binary operators in a "baby
step" process, with great explanations and code examples. The problem isn't
resolved.... but I ain't givin' up yet!

Thanks.



©2008 Advenet LLC   Privacy Policy - Terms of Use
This website includes both content owned or controlled by Advenet as well as content owned or controlled by third parties.