                          68000 Assembler

                           by Paul McKee

                           User's Manual


                         Table of Contents

     1.  Introduction
     2.  Source Code Format
         2.1  Source Line Format
              2.1.1  Label Field
              2.1.2  Operation Field
              2.1.3  Operand Field
              2.1.4  Comment Field
         2.2  Symbols
         2.3  Expressions
              2.3.1  Operands in Expressions
                     2.3.1.1  Decimal Numbers
                     2.3.1.2  Hexadecimal Numbers
                     2.3.1.3  Binary Numbers
                     2.3.1.4  Octal Numbers
                     2.3.1.5  ASCII Constants
              2.3.2  Operators in Expressions
         2.4  Addressing Mode Specifications
     3.  Assembly Details
         3.1  Branch Instructions
         3.2  MOVEM Instruction
         3.3  Quick Instructions (MOVEQ, ADDQ, SUBQ)
     4.  Assembler Directives
         4.1  ORG - Set Origin
         4.2  Symbol Definition Directives
              4.2.1  EQU - Equate Symbol
              4.2.2  SET - Set Symbol
              4.2.3  REG - Register List Symbol
         4.3  Data Storage Directives
              4.3.1  DC - Define Constant
              4.3.2  DCB - Define Constant Block
              4.3.3  DS - Define Storage
         4.4  END - End of Source File
         4.5  INCLUDE - directive
     5.  Usage
         5.1  Command Line
         5.2  Listing File Format
         5.3  Object Code File Format


1. Introduction

The program described here, 68000 Assembler, is a basic two-pass
assembler for the 68000 and 68010 microprocessors.
It supports the complete instruction set of both processors as well
as a modest but capable set of assembler directives. The program
produces formatted listing files as well as object code files in
S-record format.

The program was written in VAX-11 C by Paul McKee during the fall
semester, 1986. The program should be portable (with some changes)
to any C language implementation that supports 32-bit integers.


2. Source Code Format

2.1 Source Line Format

The input to the assembler is a file containing instructions,
assembler directives, and comments. Each line of the file may be up
to 256 characters long. It is recommended, however, that the source
lines be no longer than 80 characters, as this will guarantee that
the lines of the listing file do not exceed 132 characters in
length. The assembler treats uppercase and lowercase identically.

Each line of the source code consists of the following fields:

     LABEL     OPERATION     OPERAND,OPERAND,...     COMMENT

For example,

     LOOP      MOVE.L        (A0)+,(A1)+             Sample source line

The fields may be separated by any combination of spaces and tabs.
Except for the comment field and quoted strings, there must be no
spaces or tabs within a field.

2.1.1 Label Field

Legal labels follow the rules for forming symbol names described in
section 2.2. Labels may be distinguished in one of two ways: (1)
they may begin in column 1, or (2) they may end in a colon, which
does not become part of the label but simply serves to mark its end.
A line may consist of a label alone. When a label is encountered in
the source code, it is defined to have a value equal to the current
location counter. This symbol may be used elsewhere in the program
to refer to that location.

2.1.2 Operation Field

The operation field specifies the instruction that is to be
assembled or the assembler directive that is to be performed. A size
code (.B, .W, .L, or .S) may be appended to the operation code if
allowed, to specify Byte, Word, Long, or Short operations,
respectively.
The operation field must not begin in column 1, because the
operation would then be confused with a label.

2.1.3 Operand Field

The operand field may or may not be required, depending on the
instruction or directive being used. If present, the field consists
of one or more comma-separated items with no intervening spaces or
tabs. (There may be spaces or tabs within an item, but only within
quoted strings.)

2.1.4 Comment Field

The comment field usually consists of everything on a source line
after the operand field. No special character is needed to introduce
the comment, and it may contain any characters desired. A comment
may also be inserted in the source file in another way: an asterisk
("*") at the beginning of the line or after the label field will
cause the rest of the line to be ignored, i.e., treated as a
comment.

2.2 Symbols

Symbols appear in the source code as labels, constants, and
operands. The first character of a symbol must be either a letter
(A-Z) or a period ("."). The remaining characters may be letters,
dollar signs ("$"), periods ("."), or underscores ("_"). A symbol
may be of any length, but only the first 8 characters are
significant. Remember that capitalization is ignored, so symbols
which are capitalized differently are really the same.

2.3 Expressions

An expression may be used in the source program anywhere a number is
called for. An expression consists of one or more operands (numbers
or symbols), combined with unary or binary operators. These
components are described below. The value of the expression and
intermediate values are always computed to 32 bits, with no account
being made of any overflow that may occur. (Division by zero,
however, will cause an error.)

2.3.1 Operands in Expressions

An operand in an expression is either a symbol or one of the
following sorts of constants.

2.3.1.1 Decimal Numbers

A decimal number consists of a sequence of decimal digits (0-9) of
any length.
A warning will be generated if the value of the number cannot be
represented in 32 bits.

2.3.1.2 Hexadecimal Numbers

A hexadecimal number consists of a dollar sign ("$") followed by a
sequence of hexadecimal digits (0-9 and A-F) of any length. A
warning will be generated if the value of the number cannot be
represented in 32 bits.

2.3.1.3 Binary Numbers

A binary number consists of a percent sign ("%") followed by a
sequence of binary digits (0 and 1) of any length. A warning will be
generated if the number consists of more than 32 digits.

2.3.1.4 Octal Numbers

An octal number consists of a commercial at sign ("@") followed by a
sequence of octal digits (0-7) of any length. A warning will be
generated if the value of the number cannot be represented in 32
bits.

2.3.1.5 ASCII Constants

An ASCII constant consists of one to four ASCII characters enclosed
in single quote marks. To put a single quote mark inside an ASCII
constant, use two consecutive single quotes to represent one such
character.

If the ASCII constant consists of one character, then it will be
placed in the bottom byte of the 32-bit value; two characters will
be placed in the bottom word, with the first character in the
higher-order position. If four characters are used, then all four
bytes will contain characters, with the first in the highest-order
location. However, if three characters are used, then they will be
placed in the three highest-order bytes of the 32-bit value, with 0
in the low byte (this is to accommodate the high-byte-first
addressing used on the 68000).

Note that ASCII constants in expressions are different from strings
in DC directives, as the latter may be of any length.

2.3.2 Operators in Expressions

The operators allowed in expressions are shown in the following
table, in order of decreasing precedence. Within each group, the
operators are evaluated in left-to-right order (except for group 2,
which is evaluated right-to-left).

                     Operators in Expressions

     1.   ()   Parenthesized subexpressions

     2.   -    Unary minus (two's complement)
          ~    Bitwise not (one's complement)

     3.   <<   Shift left
          >>   Shift right

     4.   &    Bitwise and
          !    Bitwise or

     5.   *    Multiplication
          /    Integer division
          \    Modulus (x\y produces the remainder of x divided by y)

     6.   +    Addition
          -    Subtraction

2.4 Addressing Mode Specifications

The 68000 and 68010 provide 14 general addressing modes. The formats
used to specify these modes in assembly language programs are listed
in the table below. The following symbols are used to describe the
operand formats:

     Dn     = Data Register
     An     = Address Register (SP may be used instead of A7)
     Xn     = Data or Address register
     .s     = Index register size code (either .W or .L, .W will be
              assumed if omitted)
     <ex8>  = Expression that evaluates to an 8-bit value (may be
              empty, in which case 0 will be used)
     <ex16> = Expression that evaluates to a 16-bit value (may be
              empty, in which case 0 will be used)
     <ex>   = Any expression
     PC     = Program Counter

                  Addressing Mode Specifications

     Mode                                           Assembler Format
     ---------------------------------------------  ----------------
     Data Register Direct                           Dn
     Address Register Direct                        An
     Address Register Indirect                      (An)
     Address Register Indirect with Predecrement    -(An)
     Address Register Indirect with Postincrement   (An)+
     Address Register Indirect with Displacement    <ex16>(An)
     Address Register Indirect with Index           <ex8>(An,Xn.s)
     Absolute Short or Long (chosen by assembler)   <ex>
     Program Counter with Displacement              <ex16>(PC)
     Program Counter with Index                     <ex8>(PC,Xn.s)
     Immediate                                      #<ex>

In addition to the general addressing modes, the following register
names may be used as operands in certain instructions (e.g., MOVEC
or EORI to CCR):

     SR  = Status Register
     CCR = Condition Code Register
     USP = User Stack Pointer
     VBR = Vector Base Register (68010)
     SFC = Source Function Code Register (68010)
     DFC = Destination Function Code Register (68010)


3. Assembly Details

3.1 Branch Instructions

The branch instructions (Bcc, BRA, and BSR) are unique in that they
can take a ".S" size code.
This suffix directs the assembler to assemble these as short branch
instructions, i.e., one-word instructions with a range of -128 to
+127 bytes. If the ".S" size code is used, and the destination is
actually outside this range, then the assembler will print an error
message. If the ".L" size code is used, the assembler will use a
long branch, which is a two-word instruction with a range of -32768
to +32767 bytes. If neither size code is specified, then the
assembler will use a short branch if possible (the branch
destination must be known on the first pass to be within the short
branch range); otherwise it will use a long branch.

3.2 MOVEM Instruction

The MOVEM instruction, which is used for saving and restoring sets
of registers, has one of the following two forms:

     MOVEM   <reg_list>,<ea>
     MOVEM   <ea>,<reg_list>

The register list may be an explicit register list of the form
described in Section 4.2.3. On the other hand, if a particular set
of registers is to be saved and restored repeatedly, the REG
directive (Section 4.2.3) can be used to define a register list
symbol that specifies the registers. For example, if the register
list symbol WORKSET is defined as follows:

     WORKSET   REG     A0-A4/D1/D2

then the following instructions will perform the same function:

               MOVEM.L WORKSET,-(SP)
               MOVEM.L A0-A4/D1/D2,-(SP)

If a register list symbol is used, it must be defined before it
appears in any MOVEM instructions.

3.3 Quick Instructions (MOVEQ, ADDQ, SUBQ)

The MOVE, ADD, and SUB instructions have one-word "quick" variations
which can be used with certain addressing modes and operand values.
The assembler will use these faster variations automatically when
possible, or they may be specified explicitly by writing the
mnemonic as MOVEQ, ADDQ, or SUBQ.

The MOVEQ instruction may be used for moving an immediate value in
the range -128 to +127 into a data register. The assembler will
assemble a MOVE.L #<value>,Dn as a MOVEQ if the value is known on
the first pass.
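
The MOVEQ substitution rule just described can be sketched as
follows. This is only an illustration in Python, not the assembler's
own code (which was written in C); the function name
choose_move_form and its parameters are hypothetical, and the sketch
assumes that the -128 to +127 range test is applied together with
the first-pass condition.

```python
def choose_move_form(value, known_on_first_pass=True, size="L",
                     dest_is_data_reg=True):
    """Pick the mnemonic for MOVE #value,<dest> (hypothetical helper).

    Mirrors the rule in the text: only a MOVE.L of an immediate into
    a data register can become MOVEQ, and only when the value is
    already known on the first pass and fits in -128..+127.
    """
    if not (size == "L" and dest_is_data_reg and known_on_first_pass):
        return "MOVE"
    # value is only inspected once it is known on the first pass
    return "MOVEQ" if -128 <= value <= 127 else "MOVE"


if __name__ == "__main__":
    print(choose_move_form(100))    # small immediate, known on pass 1
    print(choose_move_form(1000))   # out of the MOVEQ range
    print(choose_move_form(None, known_on_first_pass=False))
```

Under these assumptions, an immediate that refers to a symbol defined
later in the source is not known on the first pass, so the ordinary
MOVE form is kept even if the value later turns out to be small.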

The ADDQ (SUBQ) instruction adds (subtracts) an immediate value from
1 to 8 to (from) any alterable destination. The assembler will use
the quick form if the value is known on the first pass to be in the
range 1 to 8.


4. Assembler Directives

4.1 ORG - Set Origin

The assembler maintains a 32-bit location counter, whose value is
initially zero and which is incremented by some amount whenever an
instruction is assembled or a data storage directive is carried out.
The value of this location counter may be set with the ORG
directive. This is typically done at the start of a program and at
appropriate places within it. The format of the ORG directive is