Program occasionally crashes on startup due to GNUstepDefaults.lck error

Lobron, David
Hello GNUstep people,

I'm debugging an occasional crash that we've observed on machines that run 20 instances of a GNUstep-based program.  The crash does not occur often, and I can't reproduce it on demand, but when it does occur, we see a message like this:

mapmaker_7.exe: Uncaught exception NSGenericException, reason: Unable to get attributes of lock file we made at /root/GNUstep/Defaults/.lck/.GNUstepDefaults.lck

Since the lock file path is absolute, I suspect the problem here is contention among the various processes for the same file when two or more of them start up at exactly the same time.  However, I would expect a lock file to be impervious to multiple readers.  Maybe there is a non-atomic operation happening here.  I searched the code for "GNUstepDefaults.lck", but I did not find it there.  The /root/GNUstep/Defaults/.lck directory is normally empty, so it seems that this is a transient file.

Does anyone know if there's a recommended way to handle this?  Is it a known issue?  I checked the bug list and mailing lists, but did not see anything recent.

Thanks,

David
_______________________________________________
Discuss-gnustep mailing list
[hidden email]
https://lists.gnu.org/mailman/listinfo/discuss-gnustep

Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Wolfgang Lux

> On 14.06.2017 at 22:29, Lobron, David <[hidden email]> wrote:
>
> Does anyone know if there's a recommended way to handle this?  Is it a known issue?

Sort of. I've seen many issues revolving around GNUstepDefaults.lck in the past here.
However, looking at the code, I'm afraid the reason is simply that the NSDistributedLock implementation is broken (and apparently has been for quite some time). :-(
The problem is that the code of NSDistributedLock was changed to use -[NSFileManager createDirectoryAtPath:withIntermediateDirectories:attributes:error:] with the second argument set to YES so that it would create intermediate directories. However, that means that this method returns YES regardless of whether the new directory already exists or not. I think changing the parameter to NO should fix your (and everyone else's) issue.
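
The difference between the two flag values can be sketched outside GNUstep. The example below is only an analogy, not the NSFileManager code: it assumes Python's os.makedirs(..., exist_ok=True) as a stand-in for creating with intermediate directories and os.mkdir as a stand-in for plain creation. The helper names and the "demo.lck" path are hypothetical.

```python
import os
import tempfile


def create_permissive(path):
    """Stand-in for creation with intermediate directories:
    an already existing directory still counts as success."""
    os.makedirs(path, exist_ok=True)
    return True


def create_strict(path):
    """Stand-in for plain creation: reports failure if the
    final directory already exists."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


base = tempfile.mkdtemp()
lock_dir = os.path.join(base, "demo.lck")  # hypothetical lock path

first_strict = create_strict(lock_dir)           # directory is new
second_strict = create_strict(lock_dir)          # directory now exists
second_permissive = create_permissive(lock_dir)  # exists, yet "succeeds"

print(first_strict, second_strict, second_permissive)
```

A lock that treats "creation succeeded" as "lock acquired" can only rely on the strict variant, because the permissive one cannot tell a fresh directory from one that another process already made.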

Wolfgang



Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Richard Frith-Macdonald-9

> On 16 Jun 2017, at 10:48, Wolfgang Lux <[hidden email]> wrote:
>
> Sort of. I've seen many issues revolving around GNUstepDefaults.lck in the past here.
> However, looking at the code, I'm afraid the reason is simply that the NSDistributedLock implementation is broken (and apparently has been for quite some time). :-(
> The problem is that the code of NSDistributedLock was changed to use -[NSFileManager createDirectoryAtPath:withIntermediateDirectories:attributes:error:] with the second argument set to YES so that it would create intermediate directories. However, that means that this method returns YES regardless of whether the new directory already exists or not. I think changing the parameter to NO should fix your (and everyone else's) issue.

Thanks a lot for that, Wolfgang. I've spent days trying to figure out why the user defaults locks were occasionally failing, looking in completely the wrong area (threading, mostly) and never considering it might be something as basic as that.  It's great that you spotted the cause of the problem.
I have changed the NSDistributedLock code (on GitHub) to use O/S-level functions to create the lock directory, to be sure that if a directory already exists we see the creation attempt as failing rather than succeeding.
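
The idea of a lock built on directory creation that must fail when the directory exists can be sketched as follows. This is not the NSDistributedLock source; it is a minimal Python illustration that assumes os.mkdir (a thin wrapper over the operating system's mkdir call) behaves atomically on the filesystem in use, which is normally true for local POSIX filesystems. The class name and lock path are hypothetical.

```python
import os
import tempfile


class DirectoryLock:
    """Minimal directory-based lock: whoever creates the directory holds it."""

    def __init__(self, path):
        self.path = path

    def try_lock(self):
        # A failed creation means someone else already holds the lock.
        try:
            os.mkdir(self.path)
            return True
        except FileExistsError:
            return False

    def unlock(self):
        # Releasing the lock removes the directory again.
        os.rmdir(self.path)


base = tempfile.mkdtemp()
path = os.path.join(base, "demo.lck")  # hypothetical lock path

a = DirectoryLock(path)
b = DirectoryLock(path)

got_a = a.try_lock()        # first contender wins
got_b = b.try_lock()        # second contender is refused
a.unlock()
got_b_retry = b.try_lock()  # free again, so this succeeds
b.unlock()

print(got_a, got_b, got_b_retry)
```

The design choice is that acquisition and the existence check are a single operation, so there is no window between "check whether the lock exists" and "create it".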

Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Wolfgang Lux

> On 17.06.2017 at 08:04, Richard Frith-Macdonald <[hidden email]> wrote:
>
> Thanks a lot for that, Wolfgang. I've spent days trying to figure out why the user defaults locks were occasionally failing, looking in completely the wrong area (threading, mostly) and never considering it might be something as basic as that.  It's great that you spotted the cause of the problem.
> I have changed the NSDistributedLock code (on GitHub) to use O/S-level functions to create the lock directory, to be sure that if a directory already exists we see the creation attempt as failing rather than succeeding.

Thanks for committing the change for me, I was still in the process of switching my local checkout to git.
However, it looks like you got my comment slightly wrong. I didn't mean to say that using the createDirectoryAtPath:withIntermediateDirectories:attributes:error: method is wrong or that the method is broken in any way. It's just that calling it with the withIntermediateDirectories: argument set to YES is wrong, because in that case the method returns YES even if the directory already exists. Calling the method with NO for the second parameter would have been perfectly okay (unless the create method did indeed contain a bug).
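
One plausible way this return value leads to the reported "Unable to get attributes of lock file we made" exception is sketched below. The exact interleaving is an assumption, not something established in this thread, and the code is a single-process Python walk-through in which os.makedirs(..., exist_ok=True) stands in for the YES variant. The "process A" and "process B" labels and the lock path are hypothetical.

```python
import os
import tempfile

base = tempfile.mkdtemp()
lock_dir = os.path.join(base, "demo.lck")  # hypothetical lock path

# Process A genuinely creates the lock directory and holds the lock.
os.mkdir(lock_dir)

# Process B "creates" the same directory with the permissive call.
# The call reports success, so B wrongly believes it made the lock.
os.makedirs(lock_dir, exist_ok=True)

# Process A finishes its work and releases the lock.
os.rmdir(lock_dir)

# Process B now inspects the lock it thinks it owns, and the
# attribute lookup fails because the directory is gone.
try:
    os.stat(lock_dir)
    stat_failed = False
except FileNotFoundError:
    stat_failed = True

print(stat_failed)
```

With the strict variant, process B's creation attempt would have failed at the second step and it would have waited or retried instead.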

Wolfgang




Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Richard Frith-Macdonald-9

> On 17 Jun 2017, at 15:40, Wolfgang Lux <[hidden email]> wrote:
>
> Thanks for committing the change for me, I was still in the process of switching my local checkout to git.
> However, it looks like you got my comment slightly wrong. I didn't mean to say that using the createDirectoryAtPath:withIntermediateDirectories:attributes:error: method is wrong or that the method is broken in any way. It's just that calling it with the withIntermediateDirectories: argument set to YES is wrong, because in that case the method returns YES even if the directory already exists. Calling the method with NO for the second parameter would have been perfectly okay (unless the create method did indeed contain a bug).

I had looked at the code and seen that it counted creation as having succeeded irrespective of the flag (which is why I opted to use the O/S function directly).
The documented behavior of the methods was not 100% clear on the point, so I wrote a couple of testcases to check the actual behavior on OSX, and confirm it is as you had thought.
I've now altered the GNUstep behavior to match the OSX behavior and altered the documentation to explicitly state what happens if the directory already exists.
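
In the same spirit as those test cases, the property a lock depends on can be checked directly: when many contenders race to create the same directory, exactly one should be told it succeeded. The sketch below is not the GNUstep test suite. It assumes Python threads as a stand-in for the 20 processes from the original report, and os.mkdir / os.makedirs as stand-ins for the strict and permissive creation calls; all function names are hypothetical.

```python
import os
import tempfile
import threading


def strict(path):
    """Creation that fails if the directory already exists."""
    try:
        os.mkdir(path)
        return True
    except FileExistsError:
        return False


def permissive(path):
    """Creation that treats an existing directory as success."""
    os.makedirs(path, exist_ok=True)
    return True


def count_winners(create, contenders=20):
    """Race `contenders` threads on one path; count reported successes."""
    base = tempfile.mkdtemp()
    lock_dir = os.path.join(base, "demo.lck")  # hypothetical lock path
    barrier = threading.Barrier(contenders)    # release all threads together
    results = []
    results_guard = threading.Lock()

    def worker():
        barrier.wait()
        won = create(lock_dir)
        with results_guard:
            results.append(won)

    threads = [threading.Thread(target=worker) for _ in range(contenders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results)


strict_winners = count_winners(strict)
permissive_winners = count_winners(permissive)
print(strict_winners, permissive_winners)
```

A count of one for the strict variant and of every contender for the permissive variant is the behavior the thread describes: only the former is usable as a lock primitive.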

Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Lobron, David
In reply to this post by Richard Frith-Macdonald-9
Many thanks for this fix, Wolfgang and Richard!

--David

> On Jun 17, 2017, at 2:04 AM, Richard Frith-Macdonald <[hidden email]> wrote:
>
> Thanks a lot for that, Wolfgang. I've spent days trying to figure out why the user defaults locks were occasionally failing, looking in completely the wrong area (threading, mostly) and never considering it might be something as basic as that.  It's great that you spotted the cause of the problem.
> I have changed the NSDistributedLock code (on GitHub) to use O/S-level functions to create the lock directory, to be sure that if a directory already exists we see the creation attempt as failing rather than succeeding.



Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Fred Kiefer
In reply to this post by Richard Frith-Macdonald-9

> On 18.06.2017 at 10:20, Richard Frith-Macdonald <[hidden email]> wrote:
>
> I had looked at the code and seen that it counted creation as having succeeded irrespective of the flag (which is why I opted to use the O/S function directly).
> The documented behavior of the methods was not 100% clear on the point, so I wrote a couple of testcases to check the actual behavior on OSX, and confirm it is as you had thought.
> I've now altered the GNUstep behavior to match the OSX behavior and altered the documentation to explicitly state what happens if the directory already exists.

With that change in place, couldn’t we switch back to the old implementation for NSDistributedLock? I hate to see OS specific code scattered around in different places :-)



Re: Program occasionally crashes on startup due to GNUstepDefaults.lck error

Richard Frith-Macdonald-9

> On 19 Jun 2017, at 17:56, Fred Kiefer <[hidden email]> wrote:
>
> With that change in place, couldn’t we switch back to the old implementation for NSDistributedLock? I hate to see OS specific code scattered around in different places :-)

Good point.  I just committed a change to do that.