CMS_UAF_USERS Archives

June 2016, Week 1

CMS_UAF_USERS@LISTSERV.FNAL.GOV

Subject:
From:
Lisa Giacchetti <[log in to unmask]>
Reply To:
Lisa Giacchetti <[log in to unmask]>
Date:
Tue, 7 Jun 2016 09:52:42 -0500
Content-Type:
text/plain
Parts/Attachments:
text/plain (83 lines)
Hi all,

  FYI, several users are reporting this type of error on crab submit.
  Stefano has been helping us try to figure out what the problem is and 
how to fix it.

  Mauro, can you send some information on both types of failures, i.e. the 
ones where the checkwrite fails and the ones that fail in other ways?

Lisa



On 6/7/16 9:43 AM, Mauro Verzetti wrote:
> I’ve got a mix of both.
>
> A few tasks won’t even start due to "Error message: Operation not allowed. Can't submit task because write check at destination site fails", while others fail at transfer time.
>
> Thanks
>
> Mauro
>
>> On 07 Jun 2016, at 16:32, Stefano Belforte <[log in to unmask]> wrote:
>>
>> The fact that crab checkwrite may require --checksum=no has nothing to do
>> with jobs failing. Task submission and job stage out do not request the checksum.
>> If you have failing jobs, the failure reason needs to be investigated before
>> you resubmit.
>> Stefano
>>
>> On 06/07/2016 04:14 PM, Mauro Verzetti wrote:
>>> Hi,
>>>
>>> What’s the status of the issue? If it is solved, what is the best action to take?
>>> 1) Resubmit the failed jobs.
>>> 2) Kill the remnants of the old jobs and start a new production from scratch.
>>>
>>> Thank you!
>>>
>>> Mauro
>>>
>>>> On 07 Jun 2016, at 00:32, Lisa Giacchetti <[log in to unmask]> wrote:
>>>>
>>>> Yet another update. This time better news!
>>>>
>>>> We received more information from the EOS developer and I believe I have fixed xrdcp access to /store/user/<cernusername>. Jesus has tested xrdcp access and it works.
>>>>
>>>> Tomorrow I have to work out whether the physics group areas need this fix as well. If you have input for me on that (i.e. whether it is still broken or not), please let us know.
>>>>
>>>> I had someone test access via crab and they are still experiencing the problem that requires adding --checksum=no to the crab checkwrite call. I am not sure whether crab runs the checkwrite automatically, or whether there is a way to disable it or pass the extra option. I am trying to gather more information on this now.
>>>>
>>>> If you try the xrdcp access and it does NOT work, please let us know. It would be preferable if you opened a snow ticket so we can keep track.
>>>>
>>>> Again, thank you all for your patience!
>>>>
>>>> lisa
>>>>
>>>>
>>>> On 6/6/16 4:33 PM, Lisa Giacchetti wrote:
>>>>> Hello everyone,
>>>>>
>>>>> We have discovered a bug in EOS connected to the original link function that was there. The developers are working on a permanent fix and have given me a workaround that I am trying to test (so far it has not been successful).
>>>>> The affected users are those whose CERN user name is different from their Fermi one and who access files via crab or xrdcp (as that uses the path with your CERN username).
>>>>>
>>>>> I do NOT expect this to be fixed tonight, as I need more input from the CERN EOS developer.
>>>>>
>>>>> Also, intermittently through the day I was breaking the eosls functionality while trying to fix this other issue. I have restored things to a state that will allow eosls to work.
>>>>>
>>>>> We apologize for any inconvenience this is causing you and will update these lists with more information as we have it.
>>>>>
>>>>> lisa
>>>>>
>>>>>
>>>>> On 6/6/16 12:19 PM, David A Mason wrote:
>>>>>> Greetings,
>>>>>>
>>>>>> Several users have observed problems with CRAB3 failing to stage back to EOS with the new version. We think this is related to a change in the behavior of links in the new EOS version and are now working with the CERN developers on this. In the meantime we are also working on a shorter-term fix to get the stageouts working again for everyone. Let us know if you are seeing these problems (it is also useful to know if you have been unaffected and are staging data back successfully).
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> —Dave
