Description
I followed https://github.com/DC-DeepComputing/Framework/blob/main/FML13V03/DC-ROMA%20RISC-V%20AI%20PC%2C%20RISC-V%20Mainboard%20II%20NPU%20Memory%20Adjustment%20Instructions.md for the 32 GB RAM version (which I have) to expand the reserved memory. Here is the dmesg output:
[ 0.000000] Linux version 6.6.92-eic7x-2025.07 (root@b67a2314ece6) (riscv64-unknown-linux-gnu-gcc () 13.2.0, GNU ld (GNU Binutils) 2.42) #2025.09.26.03.45+ SMP Fri Sep 26 03:53:01 UTC 2025
[ 0.000000] Machine model: DeepComputing FML13V03
[ 0.000000] SBI specification v1.0 detected
[ 0.000000] SBI implementation ID=0x1 Version=0x10003
[ 0.000000] SBI TIME extension detected
[ 0.000000] SBI IPI extension detected
[ 0.000000] SBI RFENCE extension detected
[ 0.000000] SBI SRST extension detected
[ 0.000000] earlycon: sbi0 at I/O port 0x0 (options '')
[ 0.000000] printk: bootconsole [sbi0] enabled
[ 0.000000] efi: EFI v2.10 by Das U-Boot
[ 0.000000] efi: RTPROP=0xe8cc8040 SMBIOS=0xe8cf5000 INITRD=0xe33bc040 MEMRESERVE=0xe33bb040
[ 0.000000] OF: reserved mem: OVERLAP DETECTED!
mmz_nid_0_part_0@1,c0000000 (0x00000001c0000000--0x0000000480000000) overlaps with g2d_8GB_boundary_reserved_4k (0x00000001fe000000--0x0000000200000000)
[ 0.000000] OF: reserved mem: OVERLAP DETECTED!
mmz_nid_1_part_0@21,40000000 (0x0000002140000000--0x0000002400000000) overlaps with d1_g2d_8GB_boundary_reserved_4k (0x00000021fe000000--0x0000002200000000)
[ 0.000000] Reserved memory: created CMA memory pool at 0x0000002120000000, size 512 MiB
[ 0.000000] OF: reserved mem: initialized node linux,cma, compatible id shared-dma-pool
[ 0.000000] OF: reserved mem: 0x0000002120000000..0x000000213fffffff (524288 KiB) map reusable linux,cma
[ 0.000000] OF: reserved mem: 0x0000000059000000..0x00000000593fffff (4096 KiB) nomap non-reusable sprammemory@59000000
[ 0.000000] OF: reserved mem: 0x0000000079000000..0x00000000793fffff (4096 KiB) nomap non-reusable sprammemory@79000000
[ 0.000000] OF: reserved mem: 0x0000000080000000..0x000000008007ffff (512 KiB) nomap non-reusable mmode_resv0@80000000
[ 0.000000] OF: reserved mem: 0x00000000dffe0000..0x00000000dfffffff (128 KiB) nomap non-reusable lpcpures@dffe0000
[ 0.000000] OF: reserved mem: 0x00000000e0000000..0x00000000e1ffffff (32768 KiB) nomap non-reusable region@e0000000
[ 0.000000] OF: reserved mem: 0x00000000fff00000..0x00000000ffffffff (1024 KiB) nomap non-reusable ramoops@fff00000
[ 0.000000] Reserved memory: created mmz_nid_0_part_0@1,c0000000 eswin reserve memory at 0x00000001c0000000, size 11264 MiB
[ 0.000000] OF: reserved mem: initialized node mmz_nid_0_part_0@1,c0000000, compatible id eswin-reserve-memory
[ 0.000000] OF: reserved mem: 0x00000001c0000000..0x000000047fffffff (11534336 KiB) nomap non-reusable mmz_nid_0_part_0@1,c0000000
[ 0.000000] OF: reserved mem: 0x00000001fe000000..0x00000001ffffffff (32768 KiB) nomap non-reusable g2d_8GB_boundary_reserved_4k
[ 0.000000] OF: reserved mem: 0x00000002fffff000..0x00000002ffffffff (4 KiB) nomap non-reusable g2d_12GB_boundary_reserved_4k
[ 0.000000] OF: reserved mem: 0x0000002040000000..0x00000020403fffff (4096 KiB) nomap non-reusable nid_1_zero_device_simu@2040000000
[ 0.000000] OF: reserved mem: 0x00000020e0000000..0x00000020e1ffffff (32768 KiB) nomap non-reusable region@20,e0000000
[ 0.000000] OF: reserved mem: 0x00000020fffff000..0x00000020ffffffff (4 KiB) nomap non-reusable d1_g2d_4GB_boundary_reserved_4k
[ 0.000000] Reserved memory: created mmz_nid_1_part_0@21,40000000 eswin reserve memory at 0x0000002140000000, size 11264 MiB
[ 0.000000] OF: reserved mem: initialized node mmz_nid_1_part_0@21,40000000, compatible id eswin-reserve-memory
[ 0.000000] OF: reserved mem: 0x0000002140000000..0x00000023ffffffff (11534336 KiB) nomap non-reusable mmz_nid_1_part_0@21,40000000
[ 0.000000] OF: reserved mem: 0x00000021fe000000..0x00000021ffffffff (32768 KiB) nomap non-reusable d1_g2d_8GB_boundary_reserved_4k
[ 0.000000] OF: NUMA: parsing numa-distance-map-v1
[ 0.000000] NUMA: NODE_DATA [mem 0x1bfffe1c0-0x1bfffffff]
[ 0.000000] NUMA: NODE_DATA [mem 0x211fcb71c0-0x211fcb8fff]
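As a sanity check on the two "OVERLAP DETECTED" warnings above, the printed ranges can be compared with a few lines of Python. This is only a sketch: the helper names (`overlaps`, `size_mib`) are made up for illustration, and the addresses are copied from the dmesg lines and treated as half-open [start, end) ranges. It says nothing about whether the overlap is related to the hang described below.

```python
# Ranges copied from the "OVERLAP DETECTED" dmesg lines, as half-open [start, end).
REGIONS = {
    "mmz_nid_0_part_0": (0x1C0000000, 0x480000000),
    "g2d_8GB_boundary_reserved_4k": (0x1FE000000, 0x200000000),
    "mmz_nid_1_part_0": (0x2140000000, 0x2400000000),
    "d1_g2d_8GB_boundary_reserved_4k": (0x21FE000000, 0x2200000000),
}


def overlaps(a, b):
    """Return True if two half-open ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def size_mib(r):
    """Size of a half-open range in MiB."""
    return (r[1] - r[0]) // (1024 * 1024)


# Pairs reported by the kernel: (large mmz region, small g2d boundary region).
PAIRS = [
    ("mmz_nid_0_part_0", "g2d_8GB_boundary_reserved_4k"),
    ("mmz_nid_1_part_0", "d1_g2d_8GB_boundary_reserved_4k"),
]

for big, small in PAIRS:
    print(
        f"{big} ({size_mib(REGIONS[big])} MiB) vs "
        f"{small} ({size_mib(REGIONS[small])} MiB): "
        f"overlap={overlaps(REGIONS[big], REGIONS[small])}"
    )
```

In both pairs the small g2d boundary region lies entirely inside the enlarged mmz region, which matches what the kernel reports.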
Then I followed https://github.com/DC-DeepComputing/Framework/blob/main/FML13V03/DC-ROMA%20RISC-V%20AI%20PC%20Install%20AI%20Models(Deepseek-7B)%20Guide.md
The single-die model works. Unfortunately, the two-die model does not. It just hangs after I enter the question:
root@roma:/home/roma# /opt/eswin/sample-code/npu_sample/qwen_sample/bin/es_qwen2 /opt/eswin/sample-code/npu_sample/qwen_sample/src/deepseek_7b_1k_int8_peer/config.json
Loading models: [==================================================] 100.00% ( 70.834682 seconds )
----------------------------------------------------------------------------------
0: Role setting: 你是一个智能助理.
----------------------------------------------------------------------------------
1: 介绍一下大语言模型
2: The quantum computers
3: Humans and robots coexist
4: Customized prompts
----------------------------------------------------------------------------------
[YOU]: 4
[YOU]: Who are you?
On the serial console I get messages that point toward a bug in some driver blocking the system:
[ 1344.354910] rcu: INFO: rcu_sched detected stalls on CPUs/tasks:
[ 1344.360906] rcu: 0-...0: (5 ticks this GP) idle=f28c/1/0x4000000000000002 softirq=26168/26170 fqs=7031
[ 1344.370318] rcu: hardirqs softirqs csw/system
[ 1344.375895] rcu: number: 45651993 0 0
[ 1344.381473] rcu: cputime: 0 0 0 ==> 30028(ms)
[ 1344.388441] rcu: (detected by 6, t=15010 jiffies, g=39517, q=632 ncpus=8)
[ 1451.314391] INFO: task kworker/u21:2:332 blocked for more than 120 seconds.
[ 1451.321379] Not tainted 6.6.92-eic7x-2025.07 #2025.09.26.03.45+
[ 1451.327832] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1451.335841] INFO: task login:834 blocked for more than 120 seconds.
[ 1451.342117] Not tainted 6.6.92-eic7x-2025.07 #2025.09.26.03.45+
[ 1451.348571] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
If I press Ctrl-C I get:
[YOU]: Who are you?
^C^Cterminate called after throwing an instance of 'std::system_error'
what(): Resource deadlock avoided
^C^C^C^C
After that, the system is deadlocked for good. I am only following the documentation linked from another issue, so I don't really know what I am doing, but I would like to see the demo working before trying anything more complex.
Thanks!