
Linux Physical Memory Mapping

Contents

  • Preface
  • 1. Physical memory mapping
    • 1.1 Overview of the x86_64 virtual address space
    • 1.2 kernel text mapping
    • 1.3 direct mapping of all phys memory
  • 2. The __pa(x) and __va(x) functions
    • 2.1 direct mapping
    • 2.2 kernel text mapping
  • 3. API demo
  • References

Preface

Test platform:

Intel x86_64

CentOS 7: 3.10.0

1. Physical memory mapping

1.1 Overview of the x86_64 virtual address space

// linux-3.10.1/Documentation/x86/x86_64/mm.txt

<previous description obsolete, deleted>

Virtual memory map with 4 level page tables:

0000000000000000 - 00007fffffffffff (=47 bits) user space, different per mm
hole caused by [48:63] sign extension
ffff800000000000 - ffff80ffffffffff (=40 bits) guard hole
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
ffffc80000000000 - ffffc8ffffffffff (=40 bits) hole
ffffc90000000000 - ffffe8ffffffffff (=45 bits) vmalloc/ioremap space
ffffe90000000000 - ffffe9ffffffffff (=40 bits) hole
ffffea0000000000 - ffffeaffffffffff (=40 bits) virtual memory map (1TB)
... unused hole ...
ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
ffffffffa0000000 - ffffffffff5fffff (=1525 MB) module mapping space
ffffffffff600000 - ffffffffffdfffff (=8 MB) vsyscalls
ffffffffffe00000 - ffffffffffffffff (=2 MB) unused hole

The direct mapping covers all memory in the system up to the highest
memory address (this means in some cases it can also include PCI memory
holes).

vmalloc space is lazily synchronized into the different PML4 pages of
the processes using the page fault handler, with init_level4_pgt as
reference.

Current X86-64 implementations only support 40 bits of address space,
but we support up to 46 bits. This expands into MBZ space in the page tables.
           

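The listing above is the x86_64 virtual address space layout (on x86_64, the physical memory zones do not include a high-memory zone).

This article focuses on the two regions listed below. Both are linearly mapped regions of the kernel virtual address space: their addresses are contiguous and relate to physical addresses by a simple linear mapping. Even though both regions are linearly mapped, page tables are still built for them.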

......
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
......
ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
......
           

(On 32-bit systems these two regions are merged into a single, contiguous direct-mapping region.)
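To make the two regions concrete, here is a minimal user-space sketch that classifies a 64-bit address using the boundaries from the mm.txt listing quoted above. The names classify_vaddr and enum region are hypothetical helpers for illustration, not kernel APIs, and the constants are valid only for the 3.10 layout with 4-level page tables.

```c
#include <stdint.h>

/* Boundaries copied from the Linux 3.10 mm.txt listing (4-level page tables). */
#define DIRECT_MAP_START  0xffff880000000000ULL
#define DIRECT_MAP_END    0xffffc7ffffffffffULL /* inclusive */
#define KERNEL_TEXT_START 0xffffffff80000000ULL
#define KERNEL_TEXT_END   0xffffffffa0000000ULL /* exclusive: module space begins here */

/* Hypothetical classification, for illustration only. */
enum region {
	REGION_OTHER,
	REGION_DIRECT_MAP,
	REGION_KERNEL_TEXT
};

static enum region classify_vaddr(uint64_t vaddr)
{
	if (vaddr >= DIRECT_MAP_START && vaddr <= DIRECT_MAP_END)
		return REGION_DIRECT_MAP;
	if (vaddr >= KERNEL_TEXT_START && vaddr < KERNEL_TEXT_END)
		return REGION_KERNEL_TEXT;
	return REGION_OTHER; /* vmalloc, vmemmap, modules, user space, holes ... */
}
```

Only addresses that fall into REGION_DIRECT_MAP or REGION_KERNEL_TEXT can be translated with a fixed offset; everything else requires a page-table walk.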

1.2 kernel text mapping

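The 512 MB starting at __START_KERNEL_map (0xffffffff80000000) holds the kernel code segment, global variables, BSS and so on. This region maps the beginning of physical memory, so subtracting __START_KERNEL_map from a virtual address in it gives the physical address.

On my CentOS 7.6 system (3.10.0), the kernel is normally loaded at physical address 0x1000000, that is, starting at the 16 MB mark, so the kernel code segment starts at virtual address 0xffffffff81000000.

The start symbol of the kernel code segment, _text, is therefore at virtual address 0xffffffff81000000.
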
// linux-3.10.1/arch/x86/include/asm/page_64_types.h

#define __PHYSICAL_START	((CONFIG_PHYSICAL_START +	 	\
				  (CONFIG_PHYSICAL_ALIGN - 1)) &	\
				 ~(CONFIG_PHYSICAL_ALIGN - 1))

#define __START_KERNEL		(__START_KERNEL_map + __PHYSICAL_START)

#define __START_KERNEL_map	_AC(0xffffffff80000000, UL)
           
// linux-3.10.1/Documentation/x86/x86_64/mm.txt

ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
           

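__START_KERNEL_map is the starting virtual address of the kernel mapping: physical address phys_base is mapped to virtual address __START_KERNEL_map.

phys_base: on 64-bit systems, to support KASLR (kernel address space layout randomization), the kernel image can be placed at a random physical address, phys_base. CentOS 7.6 does not enable KASLR by default, so phys_base defaults to 0.

32-bit systems also build the mapping starting from physical address 0 by default.

(3.10.0 does not include KASLR; see https://www.phoronix.com/news/KASLR-Default-Linux-4.12)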

// linux-3.10.1/arch/x86/kernel/head_64.S

	.text
	__HEAD
	.code64
	.globl startup_64
startup_64:
	......
	/*
	 * Compute the delta between the address I am compiled to run at and the
	 * address I am actually running at.
	 */
	leaq	_text(%rip), %rbp
	subq	$_text - __START_KERNEL_map, %rbp
	......
		/* Fixup phys_base */
	addq	%rbp, phys_base(%rip)

ENTRY(phys_base)
	/* This must match the first entry in level2_kernel_pgt */
	.quad   0x0000000000000000
           

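KASLR loads the kernel at a random physical address. After the self-boot stage and decompression, the kernel checks whether the kaslr command-line parameter is enabled to decide whether to randomize the physical load address and the virtual address the kernel runs at.

The __PHYSICAL_START macro is the starting physical address of the kernel code segment, i.e. 0x1000000.

The __START_KERNEL macro is the starting virtual address at which the kernel code segment is mapped, i.e. the kernel virtual address of _text, 0xffffffff81000000.

Physical address of _text (the __PHYSICAL_START macro) = 0xffffffff81000000 - 0xffffffff80000000 = 0x1000000

So the starting physical address of the kernel code segment (_text) is 0x1000000.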
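The arithmetic behind __PHYSICAL_START and __START_KERNEL can be reproduced in user space. The sketch below mirrors the round-up expression from page_64_types.h. The function names are hypothetical, and the alignment value used in the usage note is an assumption (the text above only confirms the 0x1000000 start address), so check CONFIG_PHYSICAL_START and CONFIG_PHYSICAL_ALIGN in your own kernel configuration.

```c
#include <assert.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL /* __START_KERNEL_map */

/*
 * Same expression as __PHYSICAL_START: round `start` up to the next
 * multiple of `align`. `align` must be a non-zero power of two.
 */
static uint64_t physical_start(uint64_t start, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return (start + (align - 1)) & ~(align - 1);
}

/* Same expression as __START_KERNEL: link-time virtual address of _text. */
static uint64_t start_kernel(uint64_t start, uint64_t align)
{
	return START_KERNEL_MAP + physical_start(start, align);
}
```

Assuming a start of 0x1000000 and an alignment of 0x1000000, physical_start() returns 0x1000000 unchanged and start_kernel() returns 0xffffffff81000000, which matches the address of _text quoted above.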

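The starting physical address of the kernel code segment (_text) can also be checked in the kernel configuration file, under CONFIG_PHYSICAL_START: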

vim /boot/config-3.10.0-693.el7.x86_64
           

/proc/iomem shows the system's physical address space. There the kernel code segment starts at physical address 0x1000000, which matches the value above.

// linux-3.10.1/arch/x86/kernel/vmlinux.lds.S

#define LOAD_OFFSET __START_KERNEL_map

OUTPUT_ARCH(i386:x86-64)
ENTRY(phys_startup_64)
jiffies_64 = jiffies;

SECTIONS
{
	......
    . = __START_KERNEL;
    phys_startup_64 = startup_64 - LOAD_OFFSET;
    ......
}
           

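phys_startup_64 is the physical start address of the kernel code segment, and startup_64 is its virtual start address (the same address as _text, because startup_64 is the first function in the kernel code segment).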

// linux-3.10.1/arch/x86/kernel/head_64.S

	.text
	__HEAD
	.code64
	.globl startup_64
startup_64:
	/*
	 * At this point the CPU runs in 64bit mode CS.L = 1 CS.D = 0,
	 * and someone has loaded an identity mapped page table
	 * for us.  These identity mapped page tables map all of the
	 * kernel pages and possibly all of memory.
	 *
	 * %rsi holds a physical pointer to real_mode_data.
	 *
	 * We come here either directly from a 64bit bootloader, or from
	 * arch/x86_64/boot/compressed/head.S.
	 *
	 * We only come here initially at boot nothing else comes here.
	 *
	 * Since we may be loaded at an address different from what we were
	 * compiled to run at we first fixup the physical addresses in our page
	 * tables and then reload them.
	 */

	/*
	 * Compute the delta between the address I am compiled to run at and the
	 * address I am actually running at.
	 */
	leaq	_text(%rip), %rbp
	subq	$_text - __START_KERNEL_map, %rbp
	......
           

So, for the kernel code segment (with phys_base equal to 0): physical address + __START_KERNEL_map = virtual address.
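The two assembly instructions in startup_64 compute how far the kernel was actually loaded from the physical address it was linked for, and that delta is then added to phys_base. The following user-space sketch restates that arithmetic in C. The function names are hypothetical, and the 0x3000000 load address in the usage note is an invented illustration, not a value from the test system.

```c
#include <assert.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL /* __START_KERNEL_map */

/*
 * leaq _text(%rip), %rbp                  -> text_runtime_phys
 *   (page tables are identity-mapped at this point, so the RIP-relative
 *    address of _text is the physical address it was loaded at)
 * subq $_text - __START_KERNEL_map, %rbp  -> subtract the link-time
 *   physical address of _text
 */
static uint64_t load_delta(uint64_t text_runtime_phys, uint64_t text_link_virt)
{
	return text_runtime_phys - (text_link_virt - START_KERNEL_MAP);
}

/* Kernel text mapping: virtual -> physical, once phys_base has been fixed up. */
static uint64_t kernel_text_virt_to_phys(uint64_t vaddr, uint64_t phys_base)
{
	assert(vaddr >= START_KERNEL_MAP);
	return vaddr - START_KERNEL_MAP + phys_base;
}
```

When the kernel runs where it was linked (loaded at 0x1000000, _text linked at 0xffffffff81000000), load_delta() is 0 and phys_base stays 0. If it were loaded at 0x3000000 instead, the delta would be 0x2000000, and kernel_text_virt_to_phys(0xffffffff81000000, 0x2000000) would give 0x3000000.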

1.3 direct mapping of all phys memory

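The kernel part of the address space begins at 0xffff800000000000 and starts with an 8 TB guard hole. The 64 TB of virtual address space starting at __PAGE_OFFSET (0xffff880000000000; named __PAGE_OFFSET_BASE in later kernels) is the direct-mapping region: subtracting PAGE_OFFSET from a virtual address in it gives the physical address.

This region linearly maps all physical memory starting at virtual address PAGE_OFFSET. PAGE_OFFSET is either the fixed value 0xffff880000000000 or, in later kernels with KASLR enabled, the randomized page_offset_base.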

// linux-3.10.1/arch/x86/include/asm/page_64_types.h

#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)
           

For example, the struct sock of a packet socket is at kernel virtual address 0xffff88025ff24800.

// linux-3.10.1/Documentation/x86/x86_64/mm.txt

ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
           

Physical address of the packet socket's struct sock = 0xffff88025ff24800 - 0xffff880000000000 = 0x25ff24800

This physical address lies within the range System RAM: 100000000-26dffffff.
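The calculation above can be checked with a small user-space sketch that subtracts PAGE_OFFSET and then tests the result against an inclusive range as printed by /proc/iomem. The helper names are hypothetical, and the PAGE_OFFSET value is the fixed 3.10 constant, so this does not apply to kernels that randomize the direct-mapping base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_OFFSET 0xffff880000000000ULL /* fixed direct-mapping base in 3.10 */

/* Direct mapping: virtual -> physical is a plain subtraction. */
static uint64_t direct_map_to_phys(uint64_t vaddr)
{
	assert(vaddr >= PAGE_OFFSET);
	return vaddr - PAGE_OFFSET;
}

/* /proc/iomem prints ranges as "start-end" with both ends inclusive. */
static bool in_iomem_range(uint64_t start, uint64_t end, uint64_t paddr)
{
	assert(start <= end);
	return paddr >= start && paddr <= end;
}
```

For the struct sock above, direct_map_to_phys(0xffff88025ff24800) gives 0x25ff24800, and in_iomem_range(0x100000000, 0x26dffffff, 0x25ff24800) is true.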

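
Summary: /dev/mem exposes the physical address space, whereas every memory access the operating system performs goes through virtual addresses.

(1) x86_64 can directly map 64 TB of physical memory (direct mapping of all phys. memory), which is enough to map any commonly encountered amount of physical memory one-to-one.

(2) The Linux kernel builds a one-to-one mapping for all physical memory (the direct mapping) and maps the kernel image a second time through the kernel text mapping. In both regions, physical and virtual addresses differ by a fixed offset.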

2. The __pa(x) and __va(x) functions

For the two mapping regions discussed above:

......
ffff880000000000 - ffffc7ffffffffff (=64 TB) direct mapping of all phys. memory
......
ffffffff80000000 - ffffffffa0000000 (=512 MB)  kernel text mapping, from phys 0
......
           

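Within these two regions, a kernel virtual address can be converted to a physical address directly with __pa(x), without walking the page tables.

Within the direct mapping, a physical address can be converted to a kernel virtual address directly with __va(x).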

// linux-3.10.1/arch/x86/include/asm/page.h
#define __pa(x)		__phys_addr((unsigned long)(x))

// linux-3.10.1/arch/x86/include/asm/page_64.h
#define __phys_addr(x)		__phys_addr_nodebug(x)

extern unsigned long phys_base;

static inline unsigned long __phys_addr_nodebug(unsigned long x)
{
	unsigned long y = x - __START_KERNEL_map;

	/* use the carry flag to determine if x was < __START_KERNEL_map */
	x = y + ((x > y) ? phys_base : (__START_KERNEL_map - PAGE_OFFSET));

	return x;
}
           

2.1 direct mapping

// linux-3.10.1/arch/x86/include/asm/page_types.h
#define PAGE_OFFSET		((unsigned long)__PAGE_OFFSET)

// linux-3.10.1/arch/x86/include/asm/page_64_types.h
#define __PAGE_OFFSET           _AC(0xffff880000000000, UL)

__pa(x) = x -  PAGE_OFFSET
           

The corresponding __va(x) macro performs the reverse conversion: __va(x) = x + PAGE_OFFSET

2.2 kernel text mapping

// linux-3.10.1/arch/x86/include/asm/page_64_types.h
#define __START_KERNEL_map	_AC(0xffffffff80000000, UL)

__pa(x) = x - __START_KERNEL_map + phys_base
           

If KASLR is not enabled (phys_base is 0):

__pa(x) = x - __START_KERNEL_map
           

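__pa(x) applies only to the direct mapping and the kernel text mapping, and __va(x) only to the direct mapping. Kernel virtual addresses in other regions cannot be converted with these two functions.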
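The single expression in __phys_addr_nodebug handles both regions by relying on unsigned wrap-around. The user-space replica below is a sketch for experimenting with that trick outside the kernel. The names phys_addr_sketch and va_sketch are hypothetical, phys_base is passed as a parameter instead of being a global, and the constants are the fixed 3.10 values.

```c
#include <assert.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL /* __START_KERNEL_map */
#define PAGE_OFFSET      0xffff880000000000ULL /* __PAGE_OFFSET in 3.10 */
#define DIRECT_MAP_SIZE  (64ULL << 40)         /* 64 TB direct mapping */

/* Replica of __phys_addr_nodebug() with phys_base as an explicit argument. */
static uint64_t phys_addr_sketch(uint64_t x, uint64_t phys_base)
{
	/* Wraps around (becomes larger than x) when x < START_KERNEL_MAP. */
	uint64_t y = x - START_KERNEL_MAP;

	/*
	 * x > y : no wrap-around, x is in the kernel text mapping,
	 *         result is x - START_KERNEL_MAP + phys_base.
	 * else  : x is below START_KERNEL_MAP and is treated as a
	 *         direct-mapping address; adding START_KERNEL_MAP - PAGE_OFFSET
	 *         undoes the first subtraction, giving x - PAGE_OFFSET.
	 */
	return y + ((x > y) ? phys_base : (START_KERNEL_MAP - PAGE_OFFSET));
}

/* Replica of __va(): valid only for physical addresses in the direct mapping. */
static uint64_t va_sketch(uint64_t paddr)
{
	assert(paddr < DIRECT_MAP_SIZE);
	return paddr + PAGE_OFFSET;
}
```

With phys_base equal to 0, phys_addr_sketch(0xffffffff81000000, 0) yields 0x1000000 and phys_addr_sketch(0xffff88006cadc000, 0) yields 0x6cadc000. Note that the function does not validate its input: an address from any other region (vmalloc, for instance) still produces a number, just not a meaningful physical address.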

3. API demo

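#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <asm/page.h>
#include <asm/pgtable_types.h>

static int __init pa_va_init(void)
{
	unsigned long kernel_phys_address;
	unsigned long direct_phys_address;

	/* Kernel text mapping: virtual start address of the kernel code segment (_text). */
	kernel_phys_address = __pa(0xffffffff81000000UL);
	printk(KERN_INFO "kernel_text_start_phys_address = 0x%lx\n", kernel_phys_address);

	/* Direct mapping: a sample address inside the direct-mapping region. */
	direct_phys_address = __pa(0xffff88006cadc000UL);
	printk(KERN_INFO "direct_phys_address = 0x%lx\n", direct_phys_address);

	/*
	 * Return an error so the module does not stay loaded: insmod reports
	 * a failure, but the messages above have already been logged.
	 */
	return -1;
}

static void __exit pa_va_exit(void)
{
}

module_init(pa_va_init);
module_exit(pa_va_exit);
MODULE_LICENSE("GPL");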

           

References

Linux 3.10.1

Geek Time (極客時間): 趣談Linux作業系統 (a course on the Linux operating system)

https://blog.csdn.net/pwl999/article/details/112055498

https://zhuanlan.zhihu.com/p/99557658

https://fanlv.wiki/2021/07/25/linux-mem/

https://blog.csdn.net/richardysteven/article/details/52629731

https://blog.csdn.net/dog250/article/details/102745181

https://mp.weixin.qq.com/s/TJ8ttDAZfZeUK-fSfRsJ8g
