Recently, I noticed that Firefox crashes when I visit specific websites.
I found that visiting homedepot.com and clicking on a link would reliably
reproduce the crash. It would also happen occasionally, and seemingly at
random, on other websites. Having looked at the Firefox code before, I
decided to figure out what was happening.
The first thing I did was get the Firefox source and build it, to make
sure this bug hadn't been resolved already. In spite of Firefox being
such a complex piece of software, building it on Arch Linux is pretty
straightforward. You download a Python script which will set up and
download the mozilla-unified Mercurial repository. Mozilla has a tool
called Mach, which is used as an interface for building the project.
One ./mach build call, and about an hour later, I had a copy of
Firefox built from source. The crash was indeed still present, so it
was time to investigate.
I decided to attach to the process with gdb to see what was going
on. The crash happens on the CanvasRenderer thread:
Thread 51 "CanvasRenderer" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x70bb218006c0 (LWP 9495)]
0x000070bb1f9aa33d in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02
I got the following corresponding backtrace:
#0 0x000077667a1aa33d in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02
#1 0x000077667a12eb94 in ?? () from /usr/lib/libnvidia-eglcore.so.555.58.02
#2 0x000077668837ea56 in mozilla::gl::GLContext::InitImpl (this=this@entry=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContext.cpp:908
#3 0x000077668838a3d7 in mozilla::gl::GLContext::Init (this=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContext.cpp:333
#4 mozilla::gl::GLContextEGL::Init (this=this@entry=0x77667ef8b400) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:409
#5 0x000077668838909a in mozilla::gl::GLContextEGL::CreateGLContext (egl=std::shared_ptr (empty) = {...}, desc=..., surfaceConfig=surfaceConfig@entry=0x0, surface=surface@entry=0x0, useGles=<optimized out>, contextConfig=0x0, out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:765
#6 0x000077668838b8e1 in mozilla::gl::GLContextEGL::CreateWithoutSurface(std::shared_ptr, mozilla::gl::GLContextCreateDesc const&, nsTSubstring*)::$_0::operator()(bool) const (this=this@entry=0x77667b9ff2d0, useGles=false) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1148
#7 0x000077668838c1c6 in mozilla::gl::GLContextEGL::CreateWithoutSurface (egl=std::shared_ptr (use count 10, weak count 2) = {...}, desc=..., out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1216
#8 mozilla::gl::GLContextProviderEGL::CreateHeadless (desc=..., out_failureId=0x77667b9ff358) at /home/nihal/mozilla/mozilla-unified/gfx/gl/GLContextProviderEGL.cpp:1249
#9 0x0000776689953d9d in mozilla::WebGLContext::CreateAndInitGL(bool, std::vector >*)::$_1::operator()(already_AddRefed (*)(mozilla::gl::GLContextCreateDesc const&, nsTSubstring*), char const*) const (this=<optimized out>, pfnCreate=<optimized out>, info=<optimized out>) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:389
#10 mozilla::WebGLContext::CreateAndInitGL(bool, std::vector >*)::$_0::operator()() const (this=<optimized out>) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:402
#11 mozilla::WebGLContext::CreateAndInitGL (this=this@entry=0x776668f1e800, forceEnabled=<optimized out>, out_failReasons=out_failReasons@entry=0x77667b9ff420) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:396
#12 0x0000776689954592 in mozilla::WebGLContext::Create(mozilla::HostWebGLContext*, mozilla::webgl::InitContextDesc const&, mozilla::webgl::InitContextResult*)::$_0::operator()[abi:cxx11]() const (this=<optimized out>) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:554
#13 mozilla::WebGLContext::Create (host=0x77665dfb4000, desc=..., out=0x77667b9ff5a0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLContext.cpp:528
#14 0x0000776689909e34 in mozilla::HostWebGLContext::Create (ownerData=..., desc=..., out=0x77667b9ff5a0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/HostWebGLContext.cpp:58
#15 0x000077668998828a in mozilla::dom::WebGLParent::RecvInitialize (this=0x776671f2dac0, desc=..., out=0x0) at /home/nihal/mozilla/mozilla-unified/dom/canvas/WebGLParent.cpp:18
#16 0x00007766899f9ae4 in mozilla::dom::PWebGLParent::OnMessageReceived (this=0x776671f2dac0, msg__=..., reply__=...) at /home/nihal/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/ipc/ipdl/PWebGLParent.cpp:503
#17 0x000077668877b267 in mozilla::gfx::PCanvasManagerParent::OnMessageReceived (this=<optimized out>, msg__=..., reply__=...) at /home/nihal/mozilla/mozilla-unified/obj-x86_64-pc-linux-gnu/ipc/ipdl/PCanvasManagerParent.cpp:470
#18 0x0000776687eae95a in mozilla::ipc::MessageChannel::DispatchSyncMessage (this=this@entry=0x776663341858, aProxy=<optimized out>, aProxy@entry=0x7766652f64c0, aMsg=..., aReply=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1787
#19 0x0000776687eadc09 in mozilla::ipc::MessageChannel::DispatchMessage (this=this@entry=0x776663341858, aProxy=aProxy@entry=0x7766652f64c0, aMsg=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1737
#20 0x0000776687eadfa8 in mozilla::ipc::MessageChannel::RunMessage (this=0x776663341858, aProxy=0x7766652f64c0, aTask=...) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1530
#21 0x0000776687eae604 in mozilla::ipc::MessageChannel::MessageTask::Run (this=0x776653a3ee00) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessageChannel.cpp:1630
#22 0x0000776687683534 in nsThread::ProcessNextEvent (this=0x776696651980, aMayWait=<optimized out>, aResult=0x77667b9ffabf) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThread.cpp:1198
#23 0x0000776687686d1c in NS_ProcessNextEvent (aThread=0x776653b19000, aThread@entry=0x776696651980, aMayWait=false) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThreadUtils.cpp:480
#24 0x0000776687eb19f7 in mozilla::ipc::MessagePumpForNonMainThreads::Run (this=0x776680d9d100, aDelegate=0x77667b9ffb70) at /home/nihal/mozilla/mozilla-unified/ipc/glue/MessagePump.cpp:300
#25 0x0000776687e25205 in MessageLoop::RunInternal (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:370
#26 MessageLoop::RunHandler (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:363
#27 MessageLoop::Run (this=0x0) at /home/nihal/mozilla/mozilla-unified/ipc/chromium/src/base/message_loop.cc:345
#28 0x0000776687680914 in nsThread::ThreadFunc (aArg=0x776680d9ab60) at /home/nihal/mozilla/mozilla-unified/xpcom/threads/nsThread.cpp:370
#29 0x00007766965f8874 in _pt_root (arg=arg@entry=0x776680d883a0) at /home/nihal/mozilla/mozilla-unified/nsprpub/pr/src/pthreads/ptthread.c:201
#30 0x000059ecb413d824 in set_alt_signal_stack_and_start (params=<optimized out>) at /home/nihal/mozilla/mozilla-unified/mozglue/interposers/pthread_create_interposer.cpp:81
#31 0x00007766968a6ded in ?? () from /usr/lib/libc.so.6
#32 0x000077669692a0dc in ?? () from /usr/lib/libc.so.6
Using the gdb command
x/i 0x000070bb1f9aa33d
we can see the instruction that it crashed on:
mov %ecx,0xc(%rsi)
Checking the value of %rsi using
i r rsi
we get the following result:
rsi 0x0 0
The value is 0, so the instruction is attempting to write to address 0xc, which is not mapped in the process's address space.
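To make the arithmetic concrete: `mov %ecx,0xc(%rsi)` stores a 32-bit value at the address `%rsi + 0xc`. The C++ sketch below uses a made-up struct whose fourth 32-bit field sits at offset 0xc and computes the address a store to that field would hit for a given base pointer. `DriverState`, its field names, and `StoreAddress` are hypothetical and are not taken from the NVIDIA driver, whose real layout is unknown. The code only computes the address; it never performs the faulting write.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical layout: four 32-bit fields, so `d` lives at offset 0xc.
// This only mirrors the offset seen in the faulting instruction.
struct DriverState {
  std::uint32_t a;
  std::uint32_t b;
  std::uint32_t c;
  std::uint32_t d;
};

static_assert(offsetof(DriverState, d) == 0xc, "d must sit at offset 0xc");

// Address that a store to `base->d` would target, as plain integer math.
std::uintptr_t StoreAddress(std::uintptr_t base) {
  return base + offsetof(DriverState, d);
}
```

With a null base, `StoreAddress(0)` is 0xc, which matches the address the crashing instruction tried to write to.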
One frame up, the disassembly suggests that this null pointer is loaded from a structure reached through thread-local storage:
0x70bb1f92eb54: data16 lea 0x129f174(%rip),%rdi # 0x70bb20bcdcd0
0x70bb1f92eb5c: data16 data16 rex.W call 0x70bb1f029af0 <__tls_get_addr@plt>
0x70bb1f92eb64: mov (%rax),%rbx ; %rax points to thread local storage
0x70bb1f92eb67: mov 0x4ab70(%rbx),%rdi ; %rdi is a value in thread local storage
0x70bb1f92eb6e: cmp $0xf,%ebp
0x70bb1f92eb71: ja 0x70bb1f066dd8
0x70bb1f92eb77: sub $0x8,%rsp
0x70bb1f92eb7b: mov 0x80(%rdi),%rsi ; where %rsi gets assigned to 0
0x70bb1f92eb82: mov %r15d,%ecx
0x70bb1f92eb85: mov %ebp,%edx
0x70bb1f92eb87: push %r12
0x70bb1f92eb89: mov %r13d,%r9d
0x70bb1f92eb8c: mov %r14d,%r8d
0x70bb1f92eb8f: call 0x70bb1f9aa330 ; the call to the function where the segmentation fault happens
=> 0x70bb1f92eb94: mov 0x4ab70(%rbx),%rdx
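As a reminder of why state reached through thread-local storage can be null on one thread while being valid on another, here is a minimal C++ sketch. It is not a reconstruction of what the driver does, since the driver is closed source; `State`, `g_current_state`, `InitOnThisThread`, and `FreshThreadSeesNull` are all hypothetical names. Each thread gets its own copy of a `thread_local` variable, so a pointer set up on one thread is still at its initial null value on a thread that never ran the setup.

```cpp
#include <thread>

// Hypothetical per-thread state, standing in for whatever the driver
// keeps behind __tls_get_addr.
struct State {
  int value;
};

// Every thread sees its own copy, initialized to nullptr.
thread_local State* g_current_state = nullptr;

// Stand-in for some per-thread initialization step.
void InitOnThisThread(State* s) { g_current_state = s; }

// Returns true if a freshly started thread sees a null pointer even
// though the calling thread has already initialized its own copy.
bool FreshThreadSeesNull() {
  static State main_state{42};
  InitOnThisThread(&main_state);

  bool worker_saw_null = false;
  std::thread worker([&] { worker_saw_null = (g_current_state == nullptr); });
  worker.join();

  // The calling thread's copy is unaffected by the worker.
  return worker_saw_null && g_current_state == &main_state;
}
```

Code that dereferences such a pointer without checking it would fault in the same way as the store above, but whether that is what happens inside the driver is only a guess.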
In frame 2, we can see that the call that leads to the crash is an OpenGL function:
mSymbols.fVertexAttrib4f(i, 0, 0, 0, 1);
which is apparently called in that spot to compensate for a bug in the NVIDIA OpenGL driver.
If we remove this call, or just go to about:config and set gfx.work-around-driver-bugs to false (which disables this call), the crash goes away.
So it appears that a different bug in the NVIDIA drivers causes this workaround for a preexisting NVIDIA driver bug to crash the browser. This seems like quite a complicated issue to actually get fixed, so I just decided to disable the workarounds and call it a day.