Channel: Raspberry Pi Forums

Graphics programming • Freeze/memory corruption after 35-40 minutes of using camera+isp+encoder

I have a very experimental C project that turns a Raspberry Pi into a USB webcam. It is very bespoke, i.e. no libcamera and no pre-made uvc-gadget. Generally it works rather well, but it has one very serious problem: it can't run longer than about 40 minutes.

At roughly 36 minutes in, the board can freeze completely, to the point that power needs to be cut to reset it.

Most of the time it just freezes. The cursor stops blinking on the HDMI console (although the display still shows output), and there are no UART messages, no kernel panics, no network connectivity, nothing.

But in roughly one in 3-5 cases there is some other reaction, for example:
  • A one- or two-line kernel panic excerpt on UART, e.g.

    Code:

    [ 2576.525916] 8<--- cut here ---
    [ 2576.529057] Unable to handle kernel paging request at virtual address 9ee34000 when write
  • Semi-frozen state: the HDMI console shows a column of clearly garbage characters, but the cursor is blinking, and UART echoes characters back until <enter> is pressed. The board can be pinged, but no ports seem to be open. No panic indication anywhere.
  • Upon freezing, the HDMI output slowly (within 5-10 seconds) filled the screen with green-ish patterned lines, line by line from top to bottom. After a few minutes the screen was quickly filled by a blue pattern.
This suggests ongoing memory corruption: while the pipeline is active it slowly progresses from somewhere until, 36-ish minutes later, it starts to overwrite sensitive kernel memory, at which point the board crashes in peculiar ways.
The screen filling with weird colors after the freeze suggests that this is not an ARM issue, but something in VC/ISP/encoder/...

The setup:
  • The board is a Pi Zero 2 W. I have two different physical boards, and they behave the same.
  • The camera is an HQ camera, 1332x990 (cropped to x976) @ 120fps (it really gets around 75). Two different physical cameras, same results.
  • Three different SD cards with different OS installs. All behave the same.
  • Tried the latest bookworm 6.6.20 kernel, and the previous 6.1 bullseye. It's roughly the same: 6.1 tends to freeze around 36:32+/-2sec, 6.6 -- around 42:00+/-30sec. But there are only a handful of data points so far.
  • Both 32 bit and 64 bit versions behave the same.
  • The pipeline is as follows: it opens the /dev/video0 sensor, feeds its frames into the /dev/video13+14 ISP to debayer, then feeds the YUV frames into /dev/video31, and then feeds the result into a uvc-gadget v4l2 device. It's rather straightforward and doesn't do anything clever, it just pumps DMABUF v4l2 buffers from one place to another.
  • The process runs unprivileged as a regular user.
Observations so far:
  • CPU temperature remains reasonable, about 50-52C, doesn't grow.
  • CPU usage is below 25% of a single core: 3% for the process itself, plus 4x5% for kernel uvcgadget threads.
  • Memory usage doesn't grow.
  • Various vcgencmd metrics also remain stable, although I don't really understand most of them.
  • Disabling the uvc-gadget part and just discarding frames doesn't seem to improve the situation, it still freezes.
  • A 32-bit 6.6.20 bookworm kernel with a Global Shutter camera but without uvc-gadget once survived for almost 4 hours.
  • A custom-built 6.6.25 kernel with the HQ camera and UVC gadget twice survived for almost 60 minutes.
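Since the board hard-freezes, any metric that is only shown on screen is lost at the moment it matters most. One possible way to keep the last samples is to append each reading to a file and force it to disk immediately. The following is a minimal sketch of that idea, not something from the project: `log_sample` is a hypothetical helper, and it records only the first line of the command's output.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: run `cmd` through the shell, append
 * "<monotonic seconds> <cmd>: <first line of output>" to `log`, and flush
 * the line to disk so it survives a hard freeze.
 * Returns 0 on success, -1 if the command or any write step failed. */
int log_sample(FILE *log, const char *cmd)
{
    char line[256] = "";
    FILE *p = popen(cmd, "r");
    if (!p)
        return -1;
    if (!fgets(line, sizeof line, p))
        line[0] = '\0';                 /* command printed nothing */
    int status = pclose(p);
    line[strcspn(line, "\n")] = '\0';   /* strip the trailing newline */

    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    if (fprintf(log, "%ld.%03ld %s: %s\n", (long)ts.tv_sec,
                ts.tv_nsec / 1000000, cmd, line) < 0)
        return -1;
    if (fflush(log) != 0)               /* stdio buffer -> kernel */
        return -1;
    if (fsync(fileno(log)) != 0)        /* kernel page cache -> storage */
        return -1;
    return status == 0 ? 0 : -1;
}
```

A test run could call this once per second with commands such as `vcgencmd measure_temp` and `vcgencmd get_throttled`, writing to a log file opened in append mode. The monotonic timestamps can then be compared against the uptime stamps in any kernel messages captured on UART.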
What I haven't tried yet:
  • Restarting camera or various components and checking whether any of it resets the "timer".
  • Isolating V4L2 devices, trying to pinpoint which one, or which combination is the trigger.
  • Running KASAN kernel.
  • Using libcamera -- unfortunately, on Pi Zero 2 W libcamera will peg all 4 cores to 100%, resulting in a very corrupted UVC gadget experience that I can't trust.
Has anyone else seen anything like this?

Are there any other GPU/VC metrics or logs I can monitor or collect during test runs to get an idea of what's going on?

How do I proceed from here? Is there anything else I could do to investigate?

Statistics: Posted by provod — Sun Apr 14, 2024 4:41 am


