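A Simple ELF File Parser

Goal

Parse an ELF file and print the mapping between its sections and segments.

Background

ELF stands for Executable and Linking Format. It is part of the ABI (Application Binary Interface) specification for Unix/Linux systems.
On Unix/Linux, executable binaries, object files, shared libraries, and core dumps are all ELF files.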
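The kinds of ELF file listed above are distinguished by the e_type field of the ELF header. As a small sketch, the table below uses the ET_* values from elf.h; the helper name describe_e_type is made up for this illustration.

```python
# ET_* values as defined in elf.h. The descriptions are informal.
ET_NAMES = {
    0: "ET_NONE (no file type)",
    1: "ET_REL (relocatable object file)",
    2: "ET_EXEC (executable file)",
    3: "ET_DYN (shared object, also used for PIE executables)",
    4: "ET_CORE (core dump)",
}


def describe_e_type(e_type: int) -> str:
    """Hypothetical helper: turn an e_type value into a readable label."""
    return ET_NAMES.get(e_type, f"unknown ({e_type:#x})")


for value in (1, 2, 3, 4):
    print(value, describe_e_type(value))
```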

https://cdn.nlark.com/yuque/0/2022/png/368236/1663507368645-ac066d92-c419-44cd-b5d9-9010e83969f8.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&errorMessage=unknown%20error&from=paste&id=uaa8a3108&margin=%5Bobject%20Object%5D&originHeight=324&originWidth=516&originalType=url&ratio=1&rotation=0&showTitle=true&status=error&style=none&taskId=u65d78883-7f0b-4869-a6a1-d099b5acb14&title=ELF%E6%96%87%E4%BB%B6%E7%9A%84%E5%A4%A7%E8%87%B4%E5%B8%83%E5%B1%80
Rough layout of an ELF file

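ELF Header

The ELF header is defined in /usr/include/elf.h, which provides two structures:

  • Elf32_Ehdr is the structure for a 32-bit ELF header.
  • Elf64_Ehdr is the structure for a 64-bit ELF header.

Which of the two applies is determined by the class byte (EI_CLASS) in e_ident.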
https://cdn.nlark.com/yuque/0/2022/png/368236/1663507489592-49703d8f-d94b-4e20-ab3f-8d144dec9374.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&errorMessage=unknown%20error&from=paste&id=u81172a12&margin=%5Bobject%20Object%5D&originHeight=328&originWidth=646&originalType=url&ratio=1&rotation=0&showTitle=false&status=error&style=none&taskId=u5d5fd72b-ca9d-49ea-8d24-6908085e036&title=
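To make the class check concrete, here is a minimal sketch. It assumes the standard e_ident layout (magic number in bytes 0 to 3, EI_CLASS in byte 4). The function name elf_class and the hand-built sample bytes are only for illustration.

```python
import io

ELFCLASS32 = 1  # e_ident[EI_CLASS] value for 32-bit objects
ELFCLASS64 = 2  # e_ident[EI_CLASS] value for 64-bit objects


def elf_class(fp) -> int:
    """Return 32 or 64 according to e_ident[EI_CLASS]."""
    fp.seek(0)
    ident = fp.read(16)  # e_ident is EI_NIDENT (16) bytes long
    if len(ident) < 16 or ident[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if ident[4] == ELFCLASS32:
        return 32
    if ident[4] == ELFCLASS64:
        return 64
    raise ValueError(f"unknown EI_CLASS {ident[4]}")


# Hand-built e_ident of a 64-bit little-endian file, for demonstration only.
sample = io.BytesIO(b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8))
print(elf_class(sample))
```

With a real file, the same function can be called on an object returned by open(path, 'rb').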

The header structure is as follows:
https://cdn.nlark.com/yuque/0/2022/png/368236/1663507524737-4a52cf67-d605-4dfe-916b-3ae72995f815.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&errorMessage=unknown%20error&from=paste&id=u073e11d9&margin=%5Bobject%20Object%5D&originHeight=542&originWidth=365&originalType=url&ratio=1&rotation=0&showTitle=false&status=error&style=none&taskId=u5855a3db-8f55-4f77-88a4-24d8c13239e&title=
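As an alternative to reading field by field, the whole 64-bit header can be unpacked in one call with the standard struct module. This is a sketch that assumes a little-endian Elf64_Ehdr. The names ELF64_EHDR, Ehdr and parse_ehdr64 are my own, and the sample header is synthetic rather than taken from a real binary.

```python
import struct
from collections import namedtuple

# Elf64_Ehdr, little-endian: e_ident[16] followed by the remaining fields.
ELF64_EHDR = struct.Struct("<16sHHIQQQIHHHHHH")

Ehdr = namedtuple(
    "Ehdr",
    "e_ident e_type e_machine e_version e_entry e_phoff e_shoff e_flags "
    "e_ehsize e_phentsize e_phnum e_shentsize e_shnum e_shstrndx")


def parse_ehdr64(data: bytes) -> Ehdr:
    """Unpack the first 64 bytes of a 64-bit little-endian ELF file."""
    return Ehdr._make(ELF64_EHDR.unpack_from(data, 0))


# Synthetic header: ET_EXEC (2), EM_X86_64 (62), one program header at offset 64.
sample = ELF64_EHDR.pack(b"\x7fELF\x02\x01\x01" + bytes(9),
                         2, 62, 1, 0x401000, 64, 0, 0, 64, 56, 1, 64, 0, 0)
hdr = parse_ehdr64(sample)
print(ELF64_EHDR.size, hex(hdr.e_entry), hdr.e_phoff, hdr.e_phnum)
```

The format string mirrors the field widths in elf.h, so ELF64_EHDR.size should equal sizeof(Elf64_Ehdr), which is 64.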

Section

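Every section is described by a section header, but a section header may have no corresponding section data in the file, because some sections occupy no file space. Each section occupies one contiguous sequence of bytes in the file, and sections do not overlap.
An object file may contain space that is not covered by any header or section. The contents of those bytes are unspecified and carry no meaning.
The structure looks like this: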
https://cdn.nlark.com/yuque/0/2022/png/368236/1663507632984-96e968b1-5a2a-41e5-ac5d-5f663e9fbb5d.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&errorMessage=unknown%20error&from=paste&id=u8e8825b5&margin=%5Bobject%20Object%5D&originHeight=480&originWidth=434&originalType=url&ratio=1&rotation=0&showTitle=false&status=error&style=none&taskId=ub13f8c5a-ec51-41fc-9361-070ad78d86e&title=
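The following sketch shows two points from the paragraph above: how a section name is looked up in the section name string table, and why an SHT_NOBITS section (such as .bss) takes no space in the file. It assumes a little-endian Elf64_Shdr. The helper names and the sample data are hypothetical.

```python
import struct

# Elf64_Shdr, little-endian: name, type, flags, addr, offset, size,
# link, info, addralign, entsize.
ELF64_SHDR = struct.Struct("<IIQQQQIIQQ")
SHT_NOBITS = 8


def section_name(shstrtab: bytes, sh_name: int) -> str:
    """sh_name is an offset into the string table; names end with NUL."""
    end = shstrtab.find(b"\x00", sh_name)
    if end == -1:
        end = len(shstrtab)
    return shstrtab[sh_name:end].decode("ascii", errors="replace")


def size_in_file(sh_type: int, sh_size: int) -> int:
    """SHT_NOBITS sections have a size in memory but none in the file."""
    return 0 if sh_type == SHT_NOBITS else sh_size


# Synthetic string table and a synthetic header describing .bss.
shstrtab = b"\x00.text\x00.bss\x00"
raw = ELF64_SHDR.pack(7, SHT_NOBITS, 0x3, 0x4000, 0x3000, 0x100, 0, 0, 32, 0)
sh_name, sh_type, _, _, _, sh_size, *_ = ELF64_SHDR.unpack(raw)
print(section_name(shstrtab, sh_name), size_in_file(sh_type, sh_size))
```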

Program Header

Program headers work much like section headers: look up the structure definition in elf.h, then read it from the file using its offset and length.
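One detail in elf.h is easy to miss: in Elf64_Phdr, p_flags comes right after p_type, while in Elf32_Phdr it comes after p_memsz. The sketch below unpacks both layouts with the struct module, assuming little-endian data. The names ELF32_PHDR, ELF64_PHDR and parse_phdr are illustrative, and the sample headers are synthetic.

```python
import struct

# Elf32_Phdr: type, offset, vaddr, paddr, filesz, memsz, flags, align
ELF32_PHDR = struct.Struct("<IIIIIIII")
# Elf64_Phdr: type, flags, offset, vaddr, paddr, filesz, memsz, align
ELF64_PHDR = struct.Struct("<IIQQQQQQ")


def parse_phdr(data: bytes, is_64: bool) -> dict:
    """Unpack one program header, honouring the class-specific field order."""
    if is_64:
        (p_type, p_flags, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_align) = ELF64_PHDR.unpack_from(data)
    else:
        (p_type, p_offset, p_vaddr, p_paddr,
         p_filesz, p_memsz, p_flags, p_align) = ELF32_PHDR.unpack_from(data)
    return {"p_type": p_type, "p_flags": p_flags, "p_offset": p_offset,
            "p_vaddr": p_vaddr, "p_paddr": p_paddr, "p_filesz": p_filesz,
            "p_memsz": p_memsz, "p_align": p_align}


# Synthetic PT_LOAD (1) headers with flags R+X (5) in both layouts.
raw64 = ELF64_PHDR.pack(1, 5, 0x1000, 0x401000, 0x401000, 0x200, 0x200, 0x1000)
raw32 = ELF32_PHDR.pack(1, 0x1000, 0x8049000, 0x8049000, 0x200, 0x200, 5, 0x1000)
print(parse_phdr(raw64, True)["p_flags"], parse_phdr(raw32, False)["p_flags"])
```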

Approach

The approach is simple: read as many bytes as each structure field occupies, convert them to ordinary types, and print them.
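The read-and-convert step can be sketched as a tiny helper. The name read_uint is hypothetical, and the byte order defaults to little-endian here, which is an assumption: it matches files whose EI_DATA byte is 1, while big-endian files would need 'big'.

```python
import io


def read_uint(fp, size: int, byteorder: str = "little") -> int:
    """Read `size` bytes from a binary stream and convert them to an int."""
    data = fp.read(size)
    if len(data) != size:
        raise EOFError(f"wanted {size} bytes, got {len(data)}")
    return int.from_bytes(data, byteorder)


# Six sample bytes: a 2-byte field followed by a 4-byte field.
buf = io.BytesIO(bytes([0x3E, 0x00, 0x00, 0x10, 0x40, 0x00]))
machine = read_uint(buf, 2)
entry = read_uint(buf, 4)
print(machine, hex(entry))
```

Checking the length of the returned data catches truncated files early, instead of silently decoding a shorter value.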

Experiment

Test environment

Host: Windows 10 with WSL2
Distro: Pengwin (Debian 11)
Python interpreter: 3.8.2
pip packages: no third-party libraries required

Code

# -*- coding: UTF-8 -*-

import binascii
from io import TextIOWrapper
from dataclasses import dataclass
import sys
from typing import List

# mapping definition start
"""
These definitions come from /usr/include/elf.h
"""
SH_TYPE_MAP_LIST = {
    0: 'SHT_NULL',
    1: 'SHT_PROGBITS',
    2: 'SHT_SYMTAB',
    3: 'SHT_STRTAB',
    4: 'SHT_RELA',
    5: 'SHT_HASH',
    6: 'SHT_DYNAMIC',
    7: 'SHT_NOTE',
    8: 'SHT_NOBITS',
    9: 'SHT_REL',
    10: 'SHT_SHLIB',
    11: 'SHT_DYNSYM',
    14: 'SHT_INIT_ARRAY',
    15: 'SHT_FINI_ARRAY',
    16: 'SHT_PREINIT_ARRAY',
    17: 'SHT_GROUP',
    18: 'SHT_SYMTAB_SHNDX',
    19: 'SHT_NUM',
    0x60000000: 'SHT_LOOS',
    0x6ffffff5: 'SHT_GNU_ATTRIBUTES',
    0x6ffffff6: 'SHT_GNU_HASH',
    0x6ffffff7: 'SHT_GNU_LIBLIST',
    0x6ffffff8: 'SHT_CHECKSUM',
    0x6ffffffa: 'SHT_SUNW_move',
    0x6ffffffb: 'SHT_SUNW_COMDAT',
    0x6ffffffc: 'SHT_SUNW_syminfo',
    0x6ffffffd: 'SHT_GNU_verdef',
    0x6ffffffe: 'SHT_GNU_verneed',
    0x6fffffff: 'SHT_GNU_versym',
    0x70000000: 'SHT_LOPROC',
    0x7fffffff: 'SHT_HIPROC',
    0x80000000: 'SHT_LOUSER',
    0x8fffffff: 'SHT_HIUSER'
}

PT_TYPE_MAP_LIST = {
    0: 'NULL',
    1: 'LOAD',
    2: 'DYNAMIC',
    3: 'INTERP',
    4: 'NOTE',
    5: 'SHLIB',
    6: 'PHDR',
    7: 'TLS',
    0x6474E550: 'GNU_EH_FRAME',
    0x6474E551: 'GNU_STACK',
    0x6474E552: 'GNU_RELRO',
    0x6474E553: 'GNU_PROPERTY',
    0x70000000: 'LOPROC',
    0x7fffffff: 'HIPROC',
}
# mapping definition end


@dataclass
class ELFIdentification():
    file_identification: str
    ei_class: int
    ei_data: int
    ei_version: int
    ei_osabi: int
    ei_abiversion: int
    ei_pad: str
    ei_nident: int

    def __str__(self) -> str:
        builder = "\n"
        for k, v in self.__dict__.items():
            builder += f"\t{k}: {v}\n"
        return builder


@dataclass
class ELFSectionHeader():
    sh_name: int
    sh_type: int
    sh_flags: int
    sh_addr: int
    sh_offset: int
    sh_size: int
    sh_link: int
    sh_info: int
    sh_addralign: int
    sh_entsize: int
    section_name: str


@dataclass
class ELFProgramHeader():
    p_type: int
    p_offset: int
    p_vaddr: int
    p_paddr: int
    p_filesz: int
    p_memsz: int
    p_flags: int
    p_align: int


class ELFHeader():
    """
    Parse the ELF header.
    """

    def __init__(self, fp: TextIOWrapper) -> None:
        fp.seek(0, 0)
        file_identification = fp.read(4)
        if file_identification != b'\x7fELF':
            raise ValueError("not an ELF file: bad magic number")
        ei_class = int.from_bytes(fp.read(1), 'little')
        ei_data = int.from_bytes(fp.read(1), 'little')
        ei_version = int.from_bytes(fp.read(1), 'little')
        ei_osabi = int.from_bytes(fp.read(1), 'little')
        ei_abiversion = int.from_bytes(fp.read(1), 'little')
        # The remaining 7 bytes of e_ident are padding.
        ei_pad = binascii.b2a_hex(fp.read(7))
        # EI_NIDENT is the size of e_ident (16), not a value stored in the file.
        ei_nident = 16
        self.elf_ident = ELFIdentification(file_identification, ei_class,
                                           ei_data, ei_version,
                                           ei_osabi, ei_abiversion,
                                           ei_pad.decode('ascii'), ei_nident)
        # e_entry, e_phoff and e_shoff are 8 bytes wide in 64-bit files and
        # 4 bytes wide in 32-bit files; all other fields have the same width.
        # Every multi-byte field is read as little-endian (ei_data == 1).
        if self.is_64bit():
            addr_size = 8
        elif self.is_32bit():
            addr_size = 4
        else:
            raise ValueError(f"unknown EI_CLASS {ei_class}")
        self.e_type = int.from_bytes(fp.read(2), 'little')
        self.e_machine = int.from_bytes(fp.read(2), 'little')
        self.e_version = int.from_bytes(fp.read(4), 'little')
        self.e_entry = int.from_bytes(fp.read(addr_size), 'little')
        self.e_phoff = int.from_bytes(fp.read(addr_size), 'little')
        self.e_shoff = int.from_bytes(fp.read(addr_size), 'little')
        self.e_flags = int.from_bytes(fp.read(4), 'little')
        self.e_ehsize = int.from_bytes(fp.read(2), 'little')
        self.e_phentsize = int.from_bytes(fp.read(2), 'little')
        self.e_phnum = int.from_bytes(fp.read(2), 'little')
        self.e_shentsize = int.from_bytes(fp.read(2), 'little')
        self.e_shnum = int.from_bytes(fp.read(2), 'little')
        self.e_shstrndx = int.from_bytes(fp.read(2), 'little')

    def is_32bit(self) -> bool:
        return self.elf_ident.ei_class == 1

    def is_64bit(self) -> bool:
        return self.elf_ident.ei_class == 2

    def __str__(self):
        builder = "\nHeader Info:\n"
        for k, v in self.__dict__.items():
            builder += f"{k}: {v}\n"
        return builder


class ELFSectionHeaderTable():
    """
    Parse the section header table.
    """

    def __init__(self, fp: TextIOWrapper, header: ELFHeader) -> None:
        self.section: List[ELFSectionHeader] = []
        # Default to an empty name table so files without sections still print.
        self.sectionNameTable = b""
        for i in range(header.e_shnum):
            self.section.append(
                ELFSectionHeaderTable._parse_section_header(
                    fp, header.e_shoff + i * header.e_shentsize,
                    header.is_64bit()))
        if header.e_shnum == 0:
            return

        # e_shstrndx is the index of the section that stores section names.
        fp.seek(self.section[header.e_shstrndx].sh_offset)
        size = self.section[header.e_shstrndx].sh_size
        self.sectionNameTable = fp.read(size)

        for elf32_Shdr in self.section:
            elf32_Shdr.section_name = self._getSectionName(elf32_Shdr)

    def _getSectionName(self, elf32_Shdr: ELFSectionHeader) -> str:
        # Names are NUL-terminated strings that start at offset sh_name.
        idx = self.sectionNameTable.find(0, elf32_Shdr.sh_name)
        if idx == -1:
            idx = len(self.sectionNameTable)
        return self.sectionNameTable[elf32_Shdr.sh_name:idx].decode()

    def __str__(self) -> str:
        builder = "\nSection Header Table:\n  #      %-32s%-16s%-16s%-16s%-8s%-8s%-8s%-8s%-8s%-8s \n" % (
            'Name', 'Type', 'Addr', 'Offset', 'Size', 'ES', 'Flg', 'Lk', 'Inf',
            'Al')
        for index, elf32_Shdr in enumerate(self.section):
            # Fall back to the raw hex value for types missing from the map
            # instead of aborting the whole listing.
            type_name = SH_TYPE_MAP_LIST.get(elf32_Shdr.sh_type,
                                             hex(elf32_Shdr.sh_type))
            builder += '  [%4d] %-32s%-16s%-16s%-16s%-8s%-8d%-8d%-8d%-8d%-8d\n' % \
            (index,
             self._getSectionName(elf32_Shdr),
             type_name,
             hex(elf32_Shdr.sh_addr),
             hex(elf32_Shdr.sh_offset),
             hex(elf32_Shdr.sh_size),
             elf32_Shdr.sh_entsize,
             elf32_Shdr.sh_flags,
             elf32_Shdr.sh_link,
             elf32_Shdr.sh_info,
             elf32_Shdr.sh_addralign,
             )
        return builder

    @staticmethod
    def _parse_section_header(fp: TextIOWrapper, offset: int,
                              is_64: bool) -> ELFSectionHeader:
        fp.seek(offset, 0)
        if is_64:
            sh_name = int.from_bytes(fp.read(4), 'little')
            sh_type = int.from_bytes(fp.read(4), 'little')
            sh_flags = int.from_bytes(fp.read(8), 'little')
            sh_addr = int.from_bytes(fp.read(8), 'little')
            sh_offset = int.from_bytes(fp.read(8), 'little')
            sh_size = int.from_bytes(fp.read(8), 'little')
            sh_link = int.from_bytes(fp.read(4), 'little')
            sh_info = int.from_bytes(fp.read(4), 'little')
            sh_addralign = int.from_bytes(fp.read(8), 'little')
            sh_entsize = int.from_bytes(fp.read(8), 'little')
        else:
            sh_name = int.from_bytes(fp.read(4), 'little')
            sh_type = int.from_bytes(fp.read(4), 'little')
            sh_flags = int.from_bytes(fp.read(4), 'little')
            sh_addr = int.from_bytes(fp.read(4), 'little')
            sh_offset = int.from_bytes(fp.read(4), 'little')
            sh_size = int.from_bytes(fp.read(4), 'little')
            sh_link = int.from_bytes(fp.read(4), 'little')
            sh_info = int.from_bytes(fp.read(4), 'little')
            sh_addralign = int.from_bytes(fp.read(4), 'little')
            # Every field of Elf32_Shdr, including sh_entsize, is 4 bytes wide.
            sh_entsize = int.from_bytes(fp.read(4), 'little')
        return ELFSectionHeader(sh_name, sh_type, sh_flags, sh_addr, sh_offset,
                                sh_size, sh_link, sh_info, sh_addralign,
                                sh_entsize, "")


class ELFProgramHeaderTable():
    """
    Parse the program header table.
    """
    def __init__(self, fp: TextIOWrapper, header: ELFHeader) -> None:
        self.programHeaderTable: List[ELFProgramHeader] = []
        for i in range(header.e_phnum):
            self.programHeaderTable.append(
                self._parseProgramHeader(
                    fp, header.e_phoff + i * header.e_phentsize,
                    header.is_64bit()))

    @staticmethod
    def _parseProgramHeader(fp: TextIOWrapper, offset: int, is_64: bool):
        fp.seek(offset, 0)
        if is_64:
            # Elf64_Phdr places the 4-byte p_flags right after p_type;
            # the remaining fields are 8 bytes wide.
            p_type = int.from_bytes(fp.read(4), 'little')
            p_flags = int.from_bytes(fp.read(4), 'little')
            p_offset = int.from_bytes(fp.read(8), 'little')
            p_vaddr = int.from_bytes(fp.read(8), 'little')
            p_paddr = int.from_bytes(fp.read(8), 'little')
            p_filesz = int.from_bytes(fp.read(8), 'little')
            p_memsz = int.from_bytes(fp.read(8), 'little')
            p_align = int.from_bytes(fp.read(8), 'little')
        else:
            p_type = int.from_bytes(fp.read(4), 'little')
            p_offset = int.from_bytes(fp.read(4), 'little')
            p_vaddr = int.from_bytes(fp.read(4), 'little')
            p_paddr = int.from_bytes(fp.read(4), 'little')
            p_filesz = int.from_bytes(fp.read(4), 'little')
            p_memsz = int.from_bytes(fp.read(4), 'little')
            p_flags = int.from_bytes(fp.read(4), 'little')
            p_align = int.from_bytes(fp.read(4), 'little')
        return ELFProgramHeader(p_type, p_offset, p_vaddr, p_paddr, p_filesz,
                                p_memsz, p_flags, p_align)

    def __str__(self) -> str:
        builder = "\nProgram Header Table:\n  #      %-16s%-16s%-16s%-16s%-10s%-10s%-9s%-8s \n" % (
            'Type', 'offset', 'VirtAddr', 'PhysAddr', 'FileSiz', 'MemSiz',
            'Flg', 'Align')
        for index, elf32_Phdr in enumerate(self.programHeaderTable):
            # Show unknown types as hex instead of silently dropping the row.
            type_name = PT_TYPE_MAP_LIST.get(elf32_Phdr.p_type,
                                             hex(elf32_Phdr.p_type))
            # Column widths match the header line above.
            builder += '  [%4d] %-16s%-16s%-16s%-16s%-10s%-10s%-9d%-8s\n' % (
                index,
                type_name,
                hex(elf32_Phdr.p_offset),
                hex(elf32_Phdr.p_vaddr),
                hex(elf32_Phdr.p_paddr),
                hex(elf32_Phdr.p_filesz),
                hex(elf32_Phdr.p_memsz),
                elf32_Phdr.p_flags,
                hex(elf32_Phdr.p_align),
            )
        return builder


class ELF():
    """
    Parse part of an ELF file.
    """

    def __init__(self, path) -> None:
        with open(path, 'rb') as fp:
            self.header = ELFHeader(fp)
            self.section = ELFSectionHeaderTable(fp, self.header)
            self.program = ELFProgramHeaderTable(fp, self.header)

    def _getSegmentSections(self, elf32_Phdr: ELFProgramHeader):
        # A section is assigned to a segment when its bytes in the file lie
        # entirely inside the segment's file range.
        start = elf32_Phdr.p_offset
        end = elf32_Phdr.p_offset + elf32_Phdr.p_filesz

        sections = []
        for elf32_Shdr in self.section.section:
            if elf32_Shdr.sh_type == 0:  # SHT_NULL: placeholder entry
                continue
            section_start = elf32_Shdr.sh_offset
            # SHT_NOBITS sections (e.g. .bss) occupy no bytes in the file.
            file_size = 0 if elf32_Shdr.sh_type == 8 else elf32_Shdr.sh_size
            section_end = section_start + file_size
            if section_start >= start and section_end <= end:
                sections.append(elf32_Shdr)

        return sections

    def __str__(self) -> str:
        builder = f"{self.header}\n{self.section}\n{self.program}\nSection to segment mapping:\n"
        for index, elf32_Phdr in enumerate(self.program.programHeaderTable):
            sections: List[ELFSectionHeader] = self._getSegmentSections(
                elf32_Phdr)

            # Keep section-table order and drop duplicate names.
            names: List[str] = []
            for elf32_Shdr in sections:
                name = self.section._getSectionName(elf32_Shdr)
                if name not in names:
                    names.append(name)
            builder += ' SEG[%d] %s\n' % (index, " ".join(names))
        return builder


if __name__ == '__main__':
    print(ELF(sys.argv[1]))

Run

python ./new.py /bin/bash

Results

https://cdn.nlark.com/yuque/0/2022/png/368236/1663507842498-fbe2ef8c-2743-4e61-9935-88b4d73b16b5.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=474&id=ud6cb4a60&margin=%5Bobject%20Object%5D&name=image.png&originHeight=474&originWidth=799&originalType=binary&ratio=1&rotation=0&showTitle=false&size=30721&status=done&style=none&taskId=u9b3edd04-ca05-47db-a688-5dd398125ac&title=&width=799
https://cdn.nlark.com/yuque/0/2022/png/368236/1663507850841-fac5ca65-0a1f-4788-b1ce-830350d41cbc.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=602&id=u6b533c82&margin=%5Bobject%20Object%5D&name=image.png&originHeight=602&originWidth=1208&originalType=binary&ratio=1&rotation=0&showTitle=false&size=98635&status=done&style=none&taskId=uae106160-59a9-4395-9cca-e9fbf275417&title=&width=1208
https://cdn.nlark.com/yuque/0/2022/png/368236/1663507862125-cb51a431-ddfa-49c9-81d4-501bb5fd1407.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=530&id=u6e91e2ed&margin=%5Bobject%20Object%5D&name=image.png&originHeight=530&originWidth=1904&originalType=binary&ratio=1&rotation=0&showTitle=false&size=86573&status=done&style=none&taskId=u2794a5d2-2e4e-4762-bf26-10614d0dcd2&title=&width=1904

Comparison with readelf

https://cdn.nlark.com/yuque/0/2022/png/368236/1663507891876-db6cdc78-9060-41ad-807b-d235474684fb.png#clientId=uaa546326-2dde-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=894&id=u44798f72&margin=%5Bobject%20Object%5D&name=image.png&originHeight=894&originWidth=1141&originalType=binary&ratio=1&rotation=0&showTitle=false&size=705477&status=done&style=none&taskId=uc04a79c3-51ed-4db6-adac-4f833545db1&title=&width=1141

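Thoughts

The section-to-segment mapping printed by readelf lists fewer sections than mine, probably because readelf applies some extra filtering when deciding which sections belong to a segment.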
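I have not checked what readelf actually does, so the following is only a guess at one plausible filter: consider only sections that carry the SHF_ALLOC flag, and match them by virtual address range instead of by file offset. The Sec and Seg classes and the sample values are invented for this sketch.

```python
from dataclasses import dataclass
from typing import List

SHT_NULL = 0
SHF_ALLOC = 0x2  # the section occupies memory at run time


@dataclass
class Sec:
    name: str
    sh_type: int
    sh_flags: int
    sh_addr: int
    sh_size: int


@dataclass
class Seg:
    p_vaddr: int
    p_memsz: int


def sections_in_segment(seg: Seg, secs: List[Sec]) -> List[str]:
    """Names of allocated sections whose address range lies inside seg."""
    lo, hi = seg.p_vaddr, seg.p_vaddr + seg.p_memsz
    names = []
    for s in secs:
        if s.sh_type == SHT_NULL or not (s.sh_flags & SHF_ALLOC):
            continue  # never loaded, so it belongs to no segment here
        if lo <= s.sh_addr and s.sh_addr + s.sh_size <= hi:
            names.append(s.name)
    return names


secs = [
    Sec("", SHT_NULL, 0, 0, 0),
    Sec(".text", 1, 0x6, 0x1000, 0x200),
    Sec(".bss", 8, 0x3, 0x4000, 0x100),
    Sec(".comment", 1, 0x30, 0, 0x2B),  # not allocated
]
print(sections_in_segment(Seg(0x1000, 0x1000), secs))
print(sections_in_segment(Seg(0x3000, 0x1100), secs))
```

Under this rule, non-allocated sections such as .comment never appear in the mapping, which would shorten the output compared with a purely offset-based match.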

References:

ELF文件解析(一):Segment和Section (ELF File Parsing, Part 1: Segment and Section) - JollyWing - 博客园 (cnblogs)