Investigating a Firefox crash

Recently, I noticed that Firefox crashed when I visited certain websites. Visiting homedepot.com and clicking on a link reliably reproduced the crash, and it also happened occasionally, and seemingly at random, on other websites. Having looked at the Firefox code before, I decided to figure out what was happening.

The first thing I did was get the Firefox source and build it, to make sure this bug hadn't been resolved already. Although Firefox is such a complex piece of software, building it on Arch Linux is pretty straightforward. You download a Python script that sets up and downloads the mozilla-unified Mercurial repository. Mozilla has a tool called Mach, which serves as the interface for building the project. One ./mach build call and about an hour later, I had a copy of Firefox built from source. The crash was indeed still present, so it was time to investigate.

I decided to attach to the process with gdb to see what was going on. The crash turned out to be in the CanvasRenderer thread:

Thread 51 "CanvasRenderer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x70bb218006c0 (LWP 9495)]
0x000070bb1f9aa33d in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02

I got the following corresponding backtrace:

#0  0x000077667a1aa33d in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02
#1  0x000077667a12eb94 in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02
#2  0x000077668837ea56 in mozilla::gl::GLContext::InitImpl (this=this@entry=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContext.cpp:908
#3  0x000077668838a3d7 in mozilla::gl::GLContext::Init (this=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContext.cpp:333
#4  mozilla::gl::GLContextEGL::Init (this=this@entry=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:409
#5  0x000077668838909a in mozilla::gl::GLContextEGL::CreateGLContext (egl=std::shared_ptr (empty) = {...}, desc=..., surfaceConfig=surfaceConfig@entry=0x0, surface=surface@entry=0x0, useGles=, contextConfig=0x0, out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:765
#6  0x000077668838b8e1 in mozilla::gl::GLContextEGL::CreateWithoutSurface(std::shared_ptr, mozilla::gl::GLContextCreateDesc const&, nsTSubstring*)::$_0::operator()(bool) const (this=this@entry=0x77667b9ff2d0, useGles=false) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1148
#7  0x000077668838c1c6 in mozilla::gl::GLContextEGL::CreateWithoutSurface (egl=std::shared_ptr (use count 10, weak count 2) = {...}, desc=..., out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1216
#8  mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1249
#9  0x0000776689953d9d in mozilla::WebGLContext::CreateAndInitGL(bool, std::vector >*)::$_1::operator()(already_AddRefed (*)(mozilla::gl::GLContextCreateDesc const&, nsTSubstring*), char const*) const (this=, pfnCreate=, info=) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:389
#10 mozilla::WebGLContext::CreateAndInitGL(bool, std::vector >*)::$_0::operator()() const (this=) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:402
#11 mozilla::WebGLContext::CreateAndInitGL (this=this@entry=0x776668f1e800, forceEnabled=, out_failReasons=out_failReasons@entry=0x77667b9ff420) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:396
#12 0x0000776689954592 in mozilla::WebGLContext::Create(mozilla::HostWebGLContext*, mozilla::webgl::InitContextDesc const&, mozilla::webgl::InitContextResult*)::$_0::operator()[abi:cxx11]() const (this=) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:554
#13 mozilla::WebGLContext::Create (host=0x77665dfb4000, desc=..., out=0x77667b9ff5a0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:528
#14 0x0000776689909e34 in mozilla::HostWebGLContext::Create (ownerData=..., desc=..., out=0x77667b9ff5a0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/HostWebGLContext.cpp:58
#15 0x000077668998828a in mozilla::dom::WebGLParent::RecvInitialize (this=0x776671f2dac0, desc=..., out=0x0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLParent.cpp:18
#16 0x00007766899f9ae4 in mozilla::dom::PWebGLParent::OnMessageReceived (this=0x776671f2dac0, msg__=..., reply__=...) at /home/nihal/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/ipc/ipdl/PWebGLParent.cpp:503
#17 0x000077668877b267 in mozilla::gfx::PCanvasManagerParent::OnMessageReceived (this=, msg__=..., reply__=...) at /home/nihal/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/ipc/ipdl/PCanvasManagerParent.cpp:470
#18 0x0000776687eae95a in mozilla::ipc::MessageChannel::DispatchSyncMessage (this=this@entry=0x776663341858, aProxy=, aProxy@entry=0x7766652f64c0, aMsg=..., aReply=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1787
#19 0x0000776687eadc09 in mozilla::ipc::MessageChannel::DispatchMessage (this=this@entry=0x776663341858, aProxy=aProxy@entry=0x7766652f64c0, aMsg=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1737
#20 0x0000776687eadfa8 in mozilla::ipc::MessageChannel::RunMessage (this=0x776663341858, aProxy=0x7766652f64c0, aTask=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1530
#21 0x0000776687eae604 in mozilla::ipc::MessageChannel::MessageTask::Run (this=0x776653a3ee00) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1630
#22 0x0000776687683534 in nsThread::ProcessNextEvent (this=0x776696651980, aMayWait=, aResult=0x77667b9ffabf) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThread.cpp:1198
#23 0x0000776687686d1c in NS_ProcessNextEvent (aThread=0x776653b19000, aThread@entry=0x776696651980, aMayWait=false) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThreadUtils.cpp:480
#24 0x0000776687eb19f7 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x776680d9d100, aDelegate=0x77667b9ffb70) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessagePump.cpp:300
#25 0x0000776687e25205 in MessageLoop::RunInternal (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:370
#26 MessageLoop::RunHandler (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:363
#27 MessageLoop::Run (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:345
#28 0x0000776687680914 in nsThread::ThreadFunc (aArg=0x776680d9ab60) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThread.cpp:370
#29 0x00007766965f8874 in _pt_root (arg=arg@entry=0x776680d883a0) at /home/nihal/mozilla/mozilla-unified/nsprpub/pr/src/pthreads/ptthread.c:201
#30 0x000059ecb413d824 in set_alt_signal_stack_and_start (params=) at /home/nihal/mozilla/mozilla-unified/mozglue/interposers/pthread_create_interposer.cpp:81
#31 0x00007766968a6ded in ?? () from /usr/lib/libc.so.6
#32 0x000077669692a0dc in ?? () from /usr/lib/libc.so.6

Using the gdb command

x/i 0x000070bb1f9aa33d

we can see the instruction that it crashed on:

mov %ecx,0xc(%rsi)

Checking the value of %rsi using

i r rsi

we get the following result:

rsi 0x0 0

The value is 0, so the instruction is attempting to write to address 0xc, which is not mapped in the process's address space.
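To make the address arithmetic explicit, here is a minimal Python sketch of how the faulting address follows from the operand and the register value. The helper name effective_address is made up for illustration, and it only handles the simple disp(%base) form seen in this instruction, not the full AT&T operand syntax.

```python
import re

def effective_address(operand: str, registers: dict[str, int]) -> int:
    """Compute the address referenced by an AT&T operand of the form disp(%base).

    Hypothetical helper for illustration; index and scale are not supported.
    """
    match = re.fullmatch(r"(-?(?:0x[0-9a-fA-F]+|\d+))?\(%(\w+)\)", operand.strip())
    if match is None:
        raise ValueError(f"unsupported operand: {operand!r}")
    disp_text, base = match.groups()
    disp = int(disp_text, 0) if disp_text else 0
    # Addresses wrap around at 64 bits on x86-64.
    return (registers[base] + disp) & 0xFFFFFFFFFFFFFFFF

# Register value as reported by "i r rsi" above.
registers = {"rsi": 0x0}

# Destination operand of: mov %ecx,0xc(%rsi)
addr = effective_address("0xc(%rsi)", registers)
print(hex(addr))  # 0xc
```

An address this small falls inside the first page of memory, which Linux normally leaves unmapped, so the write faults immediately.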

One frame up, it appears that this null pointer is loaded from a structure reached through thread-local storage.

   0x70bb1f92eb54:	data16 lea 0x129f174(%rip),%rdi        # 0x70bb20bcdcd0
   0x70bb1f92eb5c:	data16 data16 rex.W call 0x70bb1f029af0 <__tls_get_addr@plt>
   0x70bb1f92eb64:	mov    (%rax),%rbx ; %rax points to thread local storage
   0x70bb1f92eb67:	mov    0x4ab70(%rbx),%rdi ; %rdi is loaded from the object that the thread-local pointer refers to
   0x70bb1f92eb6e:	cmp    $0xf,%ebp
   0x70bb1f92eb71:	ja     0x70bb1f066dd8
   0x70bb1f92eb77:	sub    $0x8,%rsp
   0x70bb1f92eb7b:	mov    0x80(%rdi),%rsi ; where %rsi gets assigned to 0
   0x70bb1f92eb82:	mov    %r15d,%ecx
   0x70bb1f92eb85:	mov    %ebp,%edx
   0x70bb1f92eb87:	push   %r12
   0x70bb1f92eb89:	mov    %r13d,%r9d
   0x70bb1f92eb8c:	mov    %r14d,%r8d
   0x70bb1f92eb8f:	call   0x70bb1f9aa330 ; the call to the function where the segmentation fault happens
=> 0x70bb1f92eb94:	mov    0x4ab70(%rbx),%rdx
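The addresses in the backtrace (0x7766...) do not match the ones in the signal message and the disassembly (0x70bb...). My assumption, which the listings themselves do not confirm, is that they were captured in two separate runs, with the driver library loaded at a different base address each time. A small Python sketch can check that both sets of addresses are consistent with the same code locations; the variable names and the 4 KiB page size are assumptions for illustration.

```python
PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, the usual x86-64 Linux default

# Addresses copied from the listings in this post.
session_a = {  # signal message and disassembly
    "crash_pc": 0x70BB1F9AA33D,
    "return_addr": 0x70BB1F92EB94,  # the "=>" line, right after the call
}
session_b = {  # backtrace frames #0 and #1
    "crash_pc": 0x77667A1AA33D,
    "return_addr": 0x77667A12EB94,
}
call_site = 0x70BB1F92EB8F    # address of the call instruction
call_target = 0x70BB1F9AA330  # operand of the call instruction

# If both sessions show the same code, every address moves by the same amount.
shift_pc = session_b["crash_pc"] - session_a["crash_pc"]
shift_ret = session_b["return_addr"] - session_a["return_addr"]
same_shift = shift_pc == shift_ret

# Shared libraries are mapped at page granularity, so the shift should be page aligned.
page_aligned = shift_pc % PAGE_SIZE == 0

# How far into the called function the faulting instruction sits.
offset_into_callee = session_a["crash_pc"] - call_target

# The return address should directly follow the call instruction.
call_length = session_a["return_addr"] - call_site

print("same shift for both frames:", same_shift)
print("shift is page aligned:", page_aligned)
print("crash offset into callee:", hex(offset_into_callee))
print("call instruction length:", call_length)
```

Both frames move by the same page-aligned amount, and the faulting instruction sits only 0xd bytes into the called function, so the crash happens almost immediately after the call shown in the disassembly.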

In frame 2, we can see that the call that leads to the crash is to an OpenGL function:

mSymbols.fVertexAttrib4f(i, 0, 0, 0, 1);

which is apparently called in that spot to compensate for a bug in the NVIDIA OpenGL driver.

If we remove this call, or just go to about:config and set gfx.work-around-driver-bugs to false (which disables this call), the crash goes away.

So it appears that a different bug in the NVIDIA drivers caused this workaround for a preexisting NVIDIA driver bug to crash the browser. Getting this properly fixed seemed quite complicated, so I just disabled the workarounds and called it a day.