Closed Bug 918234 Opened 11 years ago Closed 11 years ago

XPCOM objects created/destroyed from static ctor/dtor: caused by AsyncLatencyLogger static nsRefPtr

Categories

(Core :: Audio/Video, defect)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla27

People

(Reporter: ishikawa, Assigned: ishikawa)

References

Details

Attachments

(3 files)

With recent source files (fetched today), 
runnig |make mozmill| test suite of thunderbird,
I notice the following errors (stacktrace is
converted into symbolic values)

WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/xpcom/base/nsTraceRefcntImpl.cpp, line 141

Program ../../.././mozilla/dist/bin/thunderbird (pid = 20788) received signal 11.
Stack:
AsmJSFaultHandler(int, siginfo_t*, void*) (AsmJSSignalHandlers.cpp:0)
Warning: File "linux-gate.so.1" does not exist.
__kernel_rt_sigreturn+0x00000000 [linux-gate.so.1 +0x0000040C]
mozilla::OffTheBooksMutex::Lock() (/new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/xpcom/glue/BlockingResourceBase.cpp:225)
AsyncLatencyLogger::~AsyncLatencyLogger() (/new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/content/media/Latency.cpp:96)
nsRefPtr<AsyncLatencyLogger>::~nsRefPtr() (/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/content/media/../../dist/include/mozilla/mozalloc.h:225)
Sleeping for 300 seconds.
Type 'gdb ../../.././mozilla/dist/bin/thunderbird 20788' to attach your debugger to this thread.


Since this crashes the program albeit it is final shutdown phase, it is annoying and should be fixed.
[I wonder if this is related to some past crashes of TB at shutdown time.]

I have not seen this before (or very very infrequently.)
I forgot to add that this is with DEBUG BUILD of thunderbird.

Implicated function in the stacktrace:

Latency.cpp:96
AsyncLatencyLogger::~AsyncLatencyLogger()
{
  MutexAutoLock lock(mMutex);
  if (mThread) {                 <----- ??? -----
    mThread->Shutdown();
  }

It looks to me that this is some kind of thread-race issues that manifest as a strange problem (although
I will defer the judgement to people in the know.)


cf. A comment in mozilla/xpcom/base/nsTraceRefcntImpl.cpp (at line 920+)

namespace mozilla {
void
LogTerm()
{
  NS_ASSERTION(gInitCount > 0,
               "NS_LogTerm without matching NS_LogInit");

  if (--gInitCount == 0) {
#ifdef DEBUG
    /* FIXME bug 491977: This is only going to operate on the
     * BlockingResourceBase which is compiled into
     * libxul/libxpcom_core.so. Anyone using external linkage will
     * have their own copy of BlockingResourceBase statics which will
     * not be freed by this method.
     *
     * It sounds like what we really want is to be able to register a
     * callback function to call at XPCOM shutdown.  Note that with
     * this solution, however, we need to guarantee that
     * BlockingResourceBase::Shutdown() runs after all other shutdown
     * functions.
     */
    BlockingResourceBase::Shutdown();
#endif
    
  
  mStart = TimeStamp();
}

Inquiring mind wants to know which task/thread is executing this function and
will that be the same as MmThread? (If it is the same, we will have a problem, of course, if Shutdown() does  what I think it does.)


BlockingResourceBase.cpp:225

// Debug implementation of (OffTheBooks)Mutex
void
OffTheBooksMutex::Lock()
{
    CallStack callContext = CallStack();

    CheckAcquire(callContext);
    PR_Lock(mLock);          <-----------------------------------
    Acquire(callContext);       // protected by mLock
}

I notice that there are are few entries related to this error message:
Bug 583056  XPCOM-enabled child processes spew "WARNING: XPCOM objects created/destroyed from static ctor/dtor" on shutdown 
Bug 583059 - Make it easier to debug static-XPCOM-ctor/dtor warnings 

Bug 583056 seems very much related to this.

TIA
See Also: → 583056, 583059
You need to break at the point of the error and list the backtrace for all threads. The mini-backtrace you've provided is not enough information. Normally this means that somebody is leaking an object or not cleaning it up properly at shutdown, but we really can't know without a better stack.

This is not an XPCOM bug, nor is it related to the two other bugs you mention, except that they added the asserts for static dtor checking. General for now, although this is probably an error in the AsyncLatencyLogger/media code.
Component: XPCOM → General
See Also: 583056, 583059
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #2)
> You need to break at the point of the error and list the backtrace for all
> threads. The mini-backtrace you've provided is not enough information.
> Normally this means that somebody is leaking an object or not cleaning it up
> properly at shutdown, but we really can't know without a better stack.
> 

I will try to see what I can do: Since this is part of the automated |make mozmill| test suite,
it may be a little difficult/awkward to start gdb and place proper gdb breakpoint.
(Well maybe if a particular test file reproduces the error rather repeatedly, then
I can probably hookup a wrapper that I created to run TB under valgrind and tell valgrind to halt
immediately before proceeding to run the program (i.e., TB) before I can interact its vgdb feature.)

Thank you for the suggestion. 

> This is not an XPCOM bug, nor is it related to the two other bugs you
> mention, except that they added the asserts for static dtor checking.
> General for now, although this is probably an error in the
> AsyncLatencyLogger/media code.

Please wait for my success to capture the stacks for threads.

TIA
Here is the gdb session when |make mozmill| offered me to
attach to the crashed process.

Note that I use -gsplit-dwarf option to GCC 4.8 and specify -Wl,--gdb-index 
to GNU ld.gold linker to speed up the linking process.
Linking process runs fast now, but it seems that gdb can not grok the
produced binary very well :-(

Please let me know where people would like to focus on which thread, etc.
But I may have to switch back to slooow ordinary linker and re-compile everything without -gsplit-dwarf. Ugh.
Is thread 0 the thread that's causing the warning/assert? If so can you just provide the complete backtrace of thread 0?
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #5)
> Is thread 0 the thread that's causing the warning/assert? If so can you just
> provide the complete backtrace of thread 0?

You mean thread 1 (not zero).
I am afraid that gdb failed to read the symbol table of libxul.so compiled with "-gsplit-dwarf" and linked with --gdbindex by ld.gold and could not produce a meaningful stack.
(It is possible that TB is almost exited and thus there is not much on the stack except for XRE_main() entry. The error is printed really at the end of TB execution.
The following is how the error is printed in the real session: I am excerpting the lines preceding it.

OK, it is lengthy, so I am attaching it as an upload.

Anyway, I will recompile the whole thing without using -gsplit-dwarf and see what will happen.

TIA
Here is the |make mozmill| session and the crash report that is printed.
The gdb session that follows shows a rather shallow stack for thread 1.
It may be the real case, or the failure to read symbol properly for libxul.so
that is linked by -gsplit-dwarf may cut the backtrace short.

TIA
That stack is an indication of a bug in this code:

http://hg.mozilla.org/mozilla-central/annotate/59beb1868522/content/media/Latency.cpp#l27

* Static nsRefPtr is forbidden in general because of static ctors: it should be using StaticRefPtr. One feature of StaticRefPtr is that it doesn't have a destructor either, so it will not cause releases after shutdown would make them unsafe
* For some reason we are shutting down without having called AsyncLatencyLogger::Shutdown. This is supposed to be called from nsLayoutStatics::Shutdown, so I suppose that something in the tests you're running is leaking documents/windows and preventing clean shutdown of the layout module (the layout module intentionally does not shut down if somebody leaks documents/windows).
Component: General → WebRTC: Audio/Video
Summary: WARNING: XPCOM objects created/destroyed from static ctor/dtor: file .../mozilla/xpcom/base/nsTraceRefcntImpl.cpp, line 141 → XPCOM objects created/destroyed from static ctor/dtor: caused by AsyncLatencyLogger static nsRefPtr
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #8)
> That stack is an indication of a bug in this code:
> 
> http://hg.mozilla.org/mozilla-central/annotate/59beb1868522/content/media/
> Latency.cpp#l27

Thank you for your analysis and explanation.

> 
> * Static nsRefPtr is forbidden in general because of static ctors: it should
> be using StaticRefPtr. One feature of StaticRefPtr is that it doesn't have a
> destructor either, so it will not cause releases after shutdown would make
> them unsafe


I am uploading a change to Latency.cpp which seems to cure the crash issue for now.
(I can continue testing thunderbird |make mozmill|.)


> * For some reason we are shutting down without having called
> AsyncLatencyLogger::Shutdown. This is supposed to be called from
> nsLayoutStatics::Shutdown, so I suppose that something in the tests you're
> running is leaking documents/windows and preventing clean shutdown of the
> layout module (the layout module intentionally does not shut down if
> somebody leaks documents/windows).

This is a tough nut to crack.
Since  I have a log that caused the gdb crashes (about a dozen during |make mozmill| run.
Maybe someone can focus on why the Shutdown() routine was not called? (And specifically, why
documents/windows were leaked? )
To me why part is obvious.
The log has lines like the following preceding the error.
So I do suspect that proper ordering of shutdown was not performed and some services were
already dead and thus failure to release resources properly : my pure conjecture, but
|make mozmill| has many such lines indicating that the shutdown code is a mess.
(I have seen other messages also. Maybe only DEBUG BUILD shows these
smoking gun warnings/errors?)

cf. remote != ullptr is seen often when the disorderly shutdown is seen.
> WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 279



--DOMWINDOW == 12 (0xb1d2918) [serial = 55] [outer = (nil)] [url = mailbox:///new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/_tests/mozmill/mozmillprofile/Mail/Local%20Folders/Inbox.sbd/SmartFoldersA?number=0]
--DOMWINDOW == 11 (0xb858d18) [serial = 56] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 10 (0xb8fcfb0) [serial = 57] [outer = (nil)] [url = mailbox:///new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/_tests/mozmill/mozmillprofile/Mail/tinderbox/Inbox?number=0]
--DOMWINDOW == 9 (0xb3732d8) [serial = 44] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 8 (0xb2f81f0) [serial = 46] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 7 (0xb051b18) [serial = 31] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 6 (0xa2c8960) [serial = 18] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 5 (0xa465998) [serial = 20] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 4 (0xb09f5f8) [serial = 34] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 3 (0xb0257f8) [serial = 30] [outer = (nil)] [url = about:blank]
--DOMWINDOW == 2 (0x93b0da0) [serial = 8] [outer = (nil)] [url = about:blank]
WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 279
WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 279
WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 279
WARNING: not an nsIRDFRemoteDataSource: 'remote != nullptr', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/datasource/src/nsLocalStore.cpp, line 279
WARNING: NS_ENSURE_TRUE(ioService) failed: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mailnews/base/util/nsMsgMailNewsUrl.cpp, line 537
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mailnews/local/src/nsMailboxUrl.cpp, line 137
WARNING: NS_ENSURE_TRUE(ioService) failed: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mailnews/base/util/nsMsgMailNewsUrl.cpp, line 537
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x8000FFFF: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mailnews/local/src/nsMailboxUrl.cpp, line 137
WARNING: Leaking the RDF Service.: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/rdf/build/nsRDFModule.cpp, line 165
WARNING: some msg dbs left open: '!m_dbCache->Length()', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mailnews/db/msgdb/src/nsMsgDatabase.cpp, line 916
db left open /new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/_tests/mozmill/mozmillprofile/Mail/tinderbox/Inbox.msf
WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
###!!! ASSERTION: Component Manager being held past XPCOM shutdown.: 'cnt == 0', file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/xpcom/build/nsXPComInit.cpp, line 788
UNKNOWN [/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/dist/bin/libxul.so +0x02EF76CF]
NS_ShutdownXPCOM+0x00000011 [/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/dist/bin/libxul.so +0x02EF7C46]
UNKNOWN [/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/dist/bin/libxul.so +0x00351CA2]
UNKNOWN [/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/dist/bin/libxul.so +0x0035EB2F]
XRE_main+0x0000011A [/new-hd1/extra/ishikawa/TB-3HG/objdir-tb3/mozilla/dist/bin/libxul.so +0x0035EEA7]
UNKNOWN [../../.././mozilla/dist/bin/thunderbird +0x00002F07]
WARNING: NS_ENSURE_TRUE(compMgr) failed: file /new-hd1/extra/ishikawa/TB-3HG/NEW-COMMSRC/mozilla/xpcom/glue/nsComponentManagerUtils.cpp, line 58
Leaked URLs:
  chrome://gloda/content/thunderbirdOverlay.xul
  chrome://messenger-smime/content/msgHdrViewSMIMEOverlay.xul
   [... omissin ...]

TIA
This is the patch to use proper data type.

Regarding the logger function to be invoked in the first place,
I am a little confused.

I have disabled webtrc since this is a local DEBUG BUILD of thunderbird.
I don't need webrtc features for a mailer during testing.

Is it possible that somehow the code to invoke logging function (and the function body itself) is left in even if I disable webrtc feature, and
the calling of shutdown is somehow commented out by #if/#endif processing?

TIA
Attachment #807624 - Flags: review?(benjamin)
(In reply to ISHIKAWA, Chiaki from comment #11)
> Created attachment 807624 [details] [diff] [review]
> Using proper StaticRefPtr type.
> 
> This is the patch to use proper data type.
> 
> Regarding the logger function to be invoked in the first place,
> I am a little confused.
> 
> I have disabled webtrc since this is a local DEBUG BUILD of thunderbird.
> I don't need webrtc features for a mailer during testing.
> 
> Is it possible that somehow the code to invoke logging function (and the
> function body itself) is left in even if I disable webrtc feature, and
> the calling of shutdown is somehow commented out by #if/#endif processing?
> 
> TIA

The answer to the last question I myself posed is "Unlikely." after reading
https://bugzilla.mozilla.org/show_bug.cgi?id=918621#c1

Chirs Jones seems to have experienced the same issue with FF (presumably webrtc is compiled in.)
So the issue of failing to call shutdown is due to the leaks of documents and windows somewhere and it does not seem to be confined to TB alone. FF also has it.
Comment on attachment 807624 [details] [diff] [review]
Using proper StaticRefPtr type.

This needs review from a webrtc peer.
Attachment #807624 - Flags: review?(paul)
Attachment #807624 - Flags: review?(benjamin)
Attachment #807624 - Flags: review+
Attachment #807624 - Flags: review?(paul) → review+
Comment on attachment 807624 [details] [diff] [review]
Using proper StaticRefPtr type.

Review of attachment 807624 [details] [diff] [review]:
-----------------------------------------------------------------

I'll note this is only a partial fix for the general problem; it turns out several paths for exiting avoid the ::Shutdown() and so will leak the thread (etc) even with this fix.  The one I found was bug 917491.  But we need this small change anyways.  (I knew this, but it slipped my mind.)

Also, this is Media-related (log audio stream and output latency), not webrtc-specific (though it's used with WebRTC).  So it shouldn't be #ifdef MOZ_WEBRTC
(In reply to Randell Jesup [:jesup] from comment #14)
> Comment on attachment 807624 [details] [diff] [review]
> Using proper StaticRefPtr type.
> 
> Review of attachment 807624 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I'll note this is only a partial fix for the general problem; it turns out
> several paths for exiting avoid the ::Shutdown() and so will leak the thread
> (etc) even with this fix.  The one I found was bug 917491.  But we need this
> small change anyways.  (I knew this, but it slipped my mind.)
> 

Thank you for your quick review. Also it is a good thing to know that someone is aware of the
problematic paths where ::Shutdown is not called.

This patch is sorely needed. Without this, THUNDERBIRD (yes NOT firefox) bombs out
occasionally and makes systematic testing, running its |make mozmill| test suite,
very difficult (!).

> Also, this is Media-related (log audio stream and output latency), not
> webrtc-specific (though it's used with WebRTC).  So it shouldn't be #ifdef
> MOZ_WEBRTC

I hope whoever is knowledgeable enough to fix the large issue take not of this.

TIA
Assignee: nobody → ishikawa
Keywords: checkin-needed
Chiaki: you may need to find out why nsLayoutStatics::Shutdown isn't being called; if it isn't then a ton of stuff is leaking.  See bug 917491
Blocks: 917491
Component: WebRTC: Audio/Video → Video/Audio
OS: Linux → All
Hardware: x86 → All
Target Milestone: --- → mozilla27
(In reply to Randell Jesup [:jesup] from comment #17)
> Chiaki: you may need to find out why nsLayoutStatics::Shutdown isn't being
> called; if it isn't then a ton of stuff is leaking.  See bug 917491

OK. It was suggested that either a window or document is leaking in which case
nsLayoutStatics::Shutdown is not called. Anyway, since I know which tests crashed due to the improper pointer type and I may be able to try testing such targets to see why. It may take time, though.
So please don't hold your breath.

TIA
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: