ABI for 16-bit real mode segmented code in ELF
----------------------------------------------

H. Peter Anvin
Version: 2019-01-10

16-bit segmented code in ELF is implemented with a combination of
three new relocations and a set of software conventions. This document
describes both.

The extensions are designed so that mixed-mode programming is also
possible: the binary format explicitly exposes both segment-relative
and absolute relocations.


Requirements
------------

16-bit code relies on a combination of segment types:

1. NEAR segments are addressed from a common segment base, and the
   segment registers are generally kept at a fixed value. All NEAR
   segments combined may not exceed 64K.

2. FAR segments are addressed from a segment base specific to that
   segment. Any one FAR segment may not exceed 64K.

3. HUGE segments are addressed from a segment base specific to each
   data item in the segment. HUGE segments have no size limit other
   than the global address space limit of 1088K-16 bytes.

4. A PUBLIC segment can be combined with other segments of the same
   name using the same segment base.

5. A PRIVATE segment has a separate segment base for each translation
   unit.

6. Multiple PUBLIC segments can be grouped together with a common
   segment base. This is mainly used for NEAR segments; in particular,
   the standard _DATA, _BSS and _STACK segments (and, in the "tiny"
   memory model, the _TEXT segment) are usually combined in a group
   called "DGROUP".

Mixed-mode programming furthermore requires a way to reference any
data item by flat linear address.
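
The size limits above and the notion of a flat linear address both
follow from real-mode address arithmetic. The following is a minimal
Python sketch of that arithmetic, assuming addresses do not wrap at
1 MiB; the helper name "linear" is illustrative only and not part of
this ABI.

```python
# Real-mode address arithmetic: linear = segment * 16 + offset.
# The helper name is illustrative only.

def linear(segment: int, offset: int) -> int:
    """Linear address of segment:offset (both 16-bit values)."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return (segment << 4) + offset

# Highest byte reachable with 16-bit segment and offset values.
top = linear(0xFFFF, 0xFFFF)

# Number of addressable bytes: addresses 0 through top inclusive.
address_space = top + 1

print(hex(top))                            # 0x10ffef
print(address_space == 1088 * 1024 - 16)   # True
```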


New ELF relocations
-------------------

The following new relocations are added to the ELF i386 psABI:

Name		Value	Field		Calculation
R_386_SEG16	45	word16		A + (S >> 4)
R_386_SUB16	46	word16		A - S
R_386_SUB32	47	word32		A - S

As in the psABI, A is the addend and S is the value of the symbol.

In accordance with the ELF gABI specification, multiple relocations at
the same address are cumulative. This is essential for the SUB
relocations to work.
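
To illustrate why cumulative processing matters, here is a sketch in
Python of how a linker might apply these relocations to a single
field. The function names and the sample addresses are hypothetical;
only the relocation numbers and formulas come from the psABI and the
table above.

```python
# Sketch of cumulative relocation processing on one field.
# A = value already in the field, S = symbol value.
# Function names and sample addresses are hypothetical.

R_386_32, R_386_16 = 1, 20                      # existing i386 relocations
R_386_SEG16, R_386_SUB16, R_386_SUB32 = 45, 46, 47

def apply_reloc(rtype: int, field: int, s: int) -> int:
    """Return the field value after one relocation against symbol value s."""
    if rtype == R_386_16:
        return (field + s) & 0xFFFF
    if rtype == R_386_32:
        return (field + s) & 0xFFFFFFFF
    if rtype == R_386_SEG16:
        return (field + (s >> 4)) & 0xFFFF
    if rtype == R_386_SUB16:
        return (field - s) & 0xFFFF
    if rtype == R_386_SUB32:
        return (field - s) & 0xFFFFFFFF
    raise ValueError(f"unsupported relocation type {rtype}")

def apply_all(field: int, relocs) -> int:
    """Apply several relocations at the same address, in order."""
    for rtype, s in relocs:
        field = apply_reloc(rtype, field, s)
    return field

# A data item at linear 0x2A4C6 whose segment base is at linear 0x2A000.
item, base = 0x2A4C6, 0x2A000
offset = apply_all(0, [(R_386_16, item), (R_386_SUB16, base)])
segment = apply_all(0, [(R_386_SEG16, base)])
print(hex(segment), hex(offset))   # 0x2a00 0x4c6
```

Masking after every step is harmless in this sketch because the
arithmetic is modular: the final 16-bit result is the same as if the
intermediate value had been kept at full width.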

These are the only extensions to the ELF format proper.


Software conventions
--------------------

1. Sections
-----------

A PRIVATE or HUGE segment is represented by a section without any
special attributes. A PRIVATE or HUGE segment section must have an
alignment of 16 or higher.

NOTE: using PRIVATE segments means subsections cannot be used.

A PUBLIC segment is represented as a pair of sections:

	section!
	section$

"section!" will contain symbols but no data (see below). "section$"
carries the actual contents of the section. The "!" section must
have an alignment of 16 or higher, but the "$" sections MAY have any
alignment.

Segment groups are handled by bundling ! sections in the linker script.

These sections are named such that sorting the sections by name will
put all the ! sections immediately before all the $ sections for
the same segment.

When using subsection variants intended to be merged into the same
segment, e.g. for merged strings, the compiler/assembler needs to EITHER:

a. Combine all symbols into a single ! section, without a suffix.

	_DATA!
	_DATA$
	_DATA$.strings

b. Add any suffix *after* the ! character.

	_DATA!
	_DATA$
	_DATA!.strings
	_DATA$.strings

In the interest of robustness, compilers/assemblers should emit each !
section before the first associated $ section, preferably immediately
before it.
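
The naming rule works because "!" (0x21) sorts before "$" (0x24) in
ASCII. A small Python sketch, assuming the linker compares section
names byte by byte, using the section names from option b above:

```python
# '!' is 0x21 and '$' is 0x24, so a plain sort by name puts every "!"
# section ahead of every "$" section of the same segment.
sections = ["_DATA$.strings", "_DATA$", "_DATA!.strings", "_DATA!"]
ordered = sorted(sections)
print(ordered)
# ['_DATA!', '_DATA!.strings', '_DATA$', '_DATA$.strings']

# Check that no "!" section appears after the first "$" section.
first_dollar = next(i for i, name in enumerate(ordered) if "$" in name)
all_bang_first = all("!" in name for name in ordered[:first_dollar])
print(all_bang_first)   # True
```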


2. Symbols
----------

Symbols contain, as is normal in ELF, linear addresses, including the
value of the segment base. Thus, a symbol located at 0x1234:0x5678
will have a value of (0x1234 << 4) + 0x5678 = 0x179b8. This also means
that flat 32-bit code can make direct use of this symbol in normal
fashion.

Each symbol is matched with an auxiliary symbol containing the
preferred segment base as a linear address. The name of the auxiliary
symbol associated with the symbol "foo" is "foo!". Accordingly, for
the example above, with foo at 0x1234:0x5678, we would have:

	foo   = 0x179b8
	foo!  = 0x12340
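
The relationship between the two symbols can be checked with a short
Python sketch. The function name "split_symbol" is illustrative only;
the values are those of the example above.

```python
# Recover segment:offset from a symbol and its "!" auxiliary symbol.
# The function name is illustrative only.

def split_symbol(sym: int, sym_base: int):
    """Return (segment, offset) given linear values of foo and foo!."""
    return sym_base >> 4, sym - sym_base

foo = (0x1234 << 4) + 0x5678    # linear address of foo
foo_base = 0x1234 << 4          # value of foo!

print(hex(foo), hex(foo_base))                         # 0x179b8 0x12340
print([hex(v) for v in split_symbol(foo, foo_base)])   # ['0x1234', '0x5678']
```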

For a PRIVATE segment, these auxiliary symbols are simply placed at
the beginning of the section by the compiler/assembler.

For a PUBLIC segment, they are placed in the ! section corresponding
to the segment; the primary symbol, however, is placed in the $
section.

For a HUGE segment, the compiler/assembler should generate the !
symbols so that:

	symbol! = symbol & ~0xf
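
As a sketch of what this rule implies, the following Python snippet
computes the base for a hypothetical HUGE data item at linear address
0x179b8. The function name is illustrative only.

```python
# Per-item segment base for a HUGE segment: round the linear address
# down to the nearest 16-byte paragraph.

def huge_base(symbol: int) -> int:
    """Value of symbol! for a data item in a HUGE segment."""
    return symbol & ~0xF

sym = 0x179B8                   # hypothetical linear address
base = huge_base(sym)
print(hex(base), hex(base >> 4), hex(sym - base))   # 0x179b0 0x179b 0x8
```

With this rule the offset of the start of each data item is always in
the range 0 to 15.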

Undefined (external) references to these auxiliary symbols should be
marked WEAK, so that an unresolved reference evaluates to 0. If the
auxiliary symbol would contain the absolute value 0, it therefore does
not need to be emitted. This, again, simplifies mixed-mode programming.



3. Use of relocations
---------------------

To access a symbol by its preferred segment base:

	mov ax,SEG symbol
	mov es,ax
	mov ax,[es:symbol]

	SEG symbol generates:

	R_386_SEG16	symbol!

	[symbol] generates:

	R_386_16	symbol
	R_386_SUB16	symbol!


To access a symbol relative to a different segment base:

	mov ax,[symbol wrt DGROUP]

	R_386_16	symbol
	R_386_SUB16	section DGROUP! + 0

To access a symbol relative to the segment base of a different symbol:

	mov ax,[symbol wrt seg othersymbol]

	R_386_16	symbol
	R_386_SUB16	othersymbol!
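
The "wrt" forms above only produce a usable offset when the symbol
lies within 64K above the chosen base. The Python sketch below models
the R_386_16 / R_386_SUB16 pair with an explicit range check. The
addresses are hypothetical, and whether a particular linker reports
such an overflow is an assumption of this sketch, not something this
document specifies.

```python
# Offset of a symbol relative to an arbitrary segment base.
# Both arguments are linear addresses; all values are hypothetical.

def offset_wrt(symbol: int, base: int) -> int:
    """Model R_386_16 symbol followed by R_386_SUB16 base."""
    offset = symbol - base
    if not 0 <= offset <= 0xFFFF:
        raise OverflowError("symbol not reachable from this segment base")
    return offset

symbol = 0x179B8
dgroup_base = 0x10000           # hypothetical value of DGROUP!
other_base = 0x17000            # hypothetical value of othersymbol!

print(hex(offset_wrt(symbol, dgroup_base)))   # 0x79b8
print(hex(offset_wrt(symbol, other_base)))    # 0x9b8
```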

To access the absolute linear address of a symbol:

	mov eax,symbol wrt 0

	R_386_32	symbol


To access a symbol relative to a fixed segment base:

	mov ax,[video_rows wrt 40h]

	R_386_16	video_rows-0x400
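
Here the segment base is known at assembly time, so it is folded into
the addend instead of requiring a second relocation. A Python sketch
of the arithmetic follows; the linear address 0x484 for video_rows is
an assumption made for illustration (it corresponds to 0040:0084).

```python
# Fixed segment base: the base is subtracted via the addend.

def fixed_segment_addend(segment: int) -> int:
    """Addend that turns a linear symbol value into an offset from segment."""
    return -(segment << 4)

addend = fixed_segment_addend(0x40)     # matches "video_rows-0x400"
video_rows = 0x484                      # assumed linear address of the symbol

# R_386_16: field = A + S, truncated to 16 bits.
field = (addend + video_rows) & 0xFFFF
print(addend, hex(field))   # -1024 0x84
```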



4. Sample linker script
-----------------------

This linker script is applicable to the conventional DOS memory models
except the tiny model.

SECTIONS
{
	. = 0;

	far_TEXT : {
		*(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_TEXT*)))
	}
	far_DATA : {
		*(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_DATA*)))
	}

	_TEXT ALIGN(16) : {
		*(_START*!* _TEXT*!*)
		*(SORT_NONE(_START*))
		*(SORT_BY_ALIGNMENT(_TEXT*))
	}

	DGROUP (NOLOAD) ALIGN(16) : {
		*(DGROUP*!* _DATA*!* _BSS*!* _STACK*!*)
		PROVIDE(___bss_start! = .);
		PROVIDE(___bss_end! = .);
		PROVIDE(___stack_base! = .);
		PROVIDE(___stack_top!  = .);
	}

	_DATA : {
		*(SORT_BY_ALIGNMENT(_DATA*))
	}

	PROVIDE(___filesize = .);

	_BSS : {
		PROVIDE(___bss_start = .);
		*(SORT_BY_ALIGNMENT(_BSS*))
		*(COMMON)
		PROVIDE(___bss_end = .);
	}

	. = ALIGN(16);
	/* Default near stack/heap segment size, can be overridden */
	PROVIDE(___stack_size = 65536 + ADDR(DGROUP) - .);
	_STACK (NOLOAD) : {
		PROVIDE(___stack_base = .);
		. = . + ___stack_size;
		PROVIDE(___stack_top = .);
	}

	far_BSS ALIGN(16) : {
		PROVIDE(___farbss_start = .);
		*(SORT_BY_NAME(SORT_BY_ALIGNMENT(?*_BSS*)))
		. = ALIGN(16);
		PROVIDE(___farbss_end = .);
	}

	PROVIDE(___end = .);
}
ENTRY(__start)
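
The default ___stack_size expression in the script above makes the
stack take up whatever remains of the 64K reachable from the DGROUP
base. The Python sketch below mirrors that expression; the addresses
are hypothetical and the function names are illustrative only.

```python
# Mirror of: PROVIDE(___stack_size = 65536 + ADDR(DGROUP) - .);

def align16(x: int) -> int:
    """Equivalent of ". = ALIGN(16);" for the location counter."""
    return (x + 15) & ~15

def default_stack_size(dgroup_addr: int, dot: int) -> int:
    """Bytes left between the location counter and DGROUP + 64K."""
    return 65536 + dgroup_addr - dot

dgroup = 0x2000             # hypothetical ADDR(DGROUP)
bss_end = 0x3447            # hypothetical end of _BSS
dot = align16(bss_end)      # 0x3450

size = default_stack_size(dgroup, dot)
stack_top = dot + size
print(hex(size), hex(stack_top - dgroup))   # 0xebb0 0x10000
```

With the default size, ___stack_top always lands exactly 64K above the
DGROUP base, regardless of how large _DATA and _BSS are.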


This linker script is applicable to the tiny DOS memory model.

SECTIONS
{
	. = 0;

	DGROUP (NOLOAD) : {
		*(*!*)
		PROVIDE(___bss_start! = .);
		PROVIDE(___bss_end! = .);
		PROVIDE(___end! = .);
	}

	. = 0x100;

	_TEXT : {
		*(SORT_NONE(_START*))
		*(SORT_BY_ALIGNMENT(_TEXT*))
	}
	_DATA : {
		*(SORT_BY_ALIGNMENT(_DATA*))
	}

	PROVIDE(___filesize = .);

	_BSS : {
		PROVIDE(___bss_start = .);
		*(SORT_BY_ALIGNMENT(_BSS*))
		*(COMMON)
		PROVIDE(___bss_end = .);
	}

	PROVIDE(___end = .);
}
ENTRY(__start)