SCIENTIFIC-LINUX-USERS Archives

December 2007

SCIENTIFIC-LINUX-USERS@LISTSERV.FNAL.GOV

Subject:
RE: Problem with Mirroring
From:
"Bly, MJ (Martin)" <[log in to unmask]>
Reply To:
Bly, MJ (Martin)
Date:
Fri, 7 Dec 2007 22:10:51 +0000
Content-Type:
text/plain
Hello Stephan, Troy,

I've instrumented our mirror stuff so that I can tell when each
individual mirror is started - should help pin down a time window when
we hit problems.
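
For illustration, here is a minimal sketch of that kind of instrumentation. It is not the actual RAL mirror script: the function name `run_mirrors`, the job names, and the log format are all assumptions made for the example. It runs each mirror job in alphabetic order of its name and writes a UTC timestamp to a log when the job starts and when it ends.

```python
import subprocess
import sys
from datetime import datetime, timezone


def run_mirrors(jobs, log):
    """Run (name, argv) jobs in alphabetic order of name.

    Writes one START and one END line per job to `log`, each prefixed
    with a UTC timestamp, and returns (name, start, end, returncode)
    tuples. `jobs` and the log format are hypothetical.
    """
    results = []
    for name, argv in sorted(jobs, key=lambda job: job[0]):
        start = datetime.now(timezone.utc)
        log.write(f"{start.isoformat()} START {name}\n")
        log.flush()  # make the start time visible even if the job hangs
        rc = subprocess.run(argv).returncode
        end = datetime.now(timezone.utc)
        log.write(f"{end.isoformat()} END   {name} rc={rc}\n")
        log.flush()
        results.append((name, start, end, rc))
    return results


if __name__ == "__main__":
    # Placeholder jobs; a real run would call the site's mirror command.
    demo_jobs = [("sl-demo", [sys.executable, "-c", "pass"])]
    run_mirrors(demo_jobs, sys.stdout)
```

With start and end times per repository in the log, a report of missing files can be matched to the window in which that repository was being fetched.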

Apropos mirror volumes, we have 13 SL sets:

3.0.3(32bit), 3.0.5(32bit), 3.0.8(32/64bit), 3.0.9(32/64bit),
4.2(32bit), 4.4(32/64bit), 4.5(32/64bit),
5.0(32/64bit)

In addition we have some of the slc stuff from CERN and a number of
glite, dcache and lcg software repositories as well as a couple of
Centos and Fedora sets.  The mirror run takes the repositories in
alphabetic order of definition file names and takes around 2 hours for a
lightly changed payload - mostly scan time for the source. 

Maybe we have been lucky in not hitting the problem window.  The files
missing on Sunday would have been in 44/i386/SL/RPMS - the base
repository.  Now we are looking for it, we may see a more regular
pattern here too.
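
The check Stephan suggested further down the thread (take a listing of the RPMS directory, then compare later listings with the first result) can be sketched roughly as follows. This is only an illustration under assumptions: the helper names `snapshot`, `missing_files` and `missing_by_initial` are made up for the example, and the grouping by first character is there because the reported losses clustered on one part of the alphabet.

```python
import os
from collections import Counter


def snapshot(directory):
    """Return the set of file names currently visible in `directory`."""
    return set(os.listdir(directory))


def missing_files(baseline, current):
    """Names present in the baseline listing but absent from the current one."""
    return sorted(set(baseline) - set(current))


def missing_by_initial(missing):
    """Count missing names by lower-cased first character."""
    return Counter(name[0].lower() for name in missing if name)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    first = {"bash.rpm", "openssh.rpm", "openssl.rpm", "zlib.rpm"}
    later = {"bash.rpm", "zlib.rpm"}
    gone = missing_files(first, later)
    print(gone)
    print(dict(missing_by_initial(gone)))
```

Run from cron against the local mirror or a mounted copy of the source, an empty result means nothing has vanished since the baseline; a cluster on one initial letter would match the pattern reported here.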

Cheers,
	Martin.
-- 
   -----------------------------------
      Martin Bly  +44|0 1235 446981
      RAL Tier1 Fabric Team Manager
   ----------------------------------- 

> -----Original Message-----
> From: [log in to unmask] [mailto:[log in to unmask]] 
> Sent: 07 December 2007 17:58
> To: Bly, MJ (Martin)
> Cc: Troy Dawson; [log in to unmask]
> Subject: RE: Problem with Mirroring
> 
> Hi Martin,
> 
> On Thu, 6 Dec 2007, Bly, MJ (Martin) wrote:
> 
> > Chaps,
> >
> > I've only bounced against this the one time:  we do twice 
> daily mirror
> 
> our observations could still be compatible: I figure we are 
> mirroring more of scientificlinux.org than you are. Our 
> current list is: 308, 309, 41, 44, 45, 50. With the exception 
> of 41, we mirror both i386 and x86_64. And whenever something 
> is going on there, we mirror the rolling directories as well. 
> Lately, something was going on in 30rolling, 40rolling and 5rolling...
> 
> Since missing files are always (at least I can't remember a different
> pattern) from the same subdirectory (either 45/i386/SL/RPMS, 
> or 50/x86_64/SL, or...), we simply may detect the problem 
> more reliably than you. NB most of the time, the problem 
> would have gone unnoticed here if we didn't check what rsync 
> is actually doing every single time, unless it happened to 
> affect our current mainstream release.
> 
> Cheers,
>  	Stephan
> 
> 
> > runs starting 2am and noon (UK time, currently GMT, thus 
> 8pm and 6am 
> > FNAL time) and this was the only time in almost daily use 
> of our own 
> > mirrors to install/update systems.
> >
> > If you think it might help diagnose the problem, I can try and 
> > instrument our mirror runs to pin the times of each mirror.
> >
> > 	Martin.
> > --
> >   -----------------------------------
> >      Martin Bly  +44|0 1235 446981
> >      RAL Tier1 Fabric Team Manager
> >   -----------------------------------
> >
> >> -----Original Message-----
> >> From: [log in to unmask]
> >> [mailto:[log in to unmask]] On 
> Behalf Of 
> >> Troy Dawson
> >> Sent: 06 December 2007 20:18
> >> To: [log in to unmask]
> >> Cc: Bly, MJ (Martin); [log in to unmask]
> >> Subject: Re: Problem with Mirroring
> >>
> >> Hi Stephan,
> >> Thank you for the explanation, especially about how 
> frequent it is.  
> >> To be honest, your reports were the only ones we'd gotten 
> about this 
> >> problem, so we weren't aware of how bad it is.  I'm also glad that 
> >> Martin spoke up, because if all the reports come from the 
> same place 
> >> it sort of seems like it's a possible communication between places 
> >> problem.
> >>
> >> And if he isn't reading this, please send my apologies to Kai.  I 
> >> guess I did snap at him, and I'm sorry.
> >> Here at Fermilab I've had to explain to people at least 6 times in 
> >> the past week that we haven't yet released SLF 5.1 and explain to 
> >> them what that means.
> >>   So I guess my temper has gotten a bit thin on that subject.
> >>
> >> Troy
> >>
> >> [log in to unmask] wrote:
> >>> Hi Troy,
> >>>
> >>> this problem has become so frequent that I stopped
> >> reporting it, since
> >>> I was sure your storage experts are chasing it.
> >>>
> >>> I usually run a dryrun rsync whenever I expect some change
> >> (it's not a
> >>> cron job). Lately, at least one out of four times the
> >> dryrun told me
> >>> it would delete some files it clearly shouldn't, 
> typically with the 
> >>> first character of the filename from some range in the
> >> alphabet, as in
> >>> the reports you received.
> >>>
> >>> Every time I checked (maybe the first five times), using an
> >> ftp client
> >>> confirmed what rsync said: the server didn't know about the files.
> >>>
> >>> Given that an rsync dryrun takes about a minute, the
> >> problem must be
> >>> present a pretty significant fraction of the day. I 
> usually have to 
> >>> wait for O(30m) until the files "are back". Now we
> >> typically run this
> >>> in the morning. I've observed it around noon as well (all times
> >>> GMT+1). Hence it may be due to some maintenance task you or
> >>> your storage guys run at night.
> >>>
> >>> It should be fairly simple to gain more statistics: just 
> running an 
> >>> "ls <release>/SL/RPMS/" (or /SL/ for SL5) every 15 minutes on the 
> >>> server, for all stable releases, and comparing with the
> >> first result,
> >>> should provide you with evidence that the problem is real
> >> and an idea
> >>> of how serious it is, without any risk of false positives or 
> >>> significant load on the server.
> >>>
> >>> And please cut Kai a little slack. He's in charge of
> >> mirroring the SL
> >>> tree here (not trivial lately due to the problem this is
> >> all about),
> >>> and deciding which updates make it onto the systems and 
> which don't 
> >>> yet, while I'm on leave. For example, right now he has to
> >> put the xen
> >>> updates for 5.0 on hold. As he said, he's new to the SL
> >> business, but
> >>> he's not a fool. 5.1 beta is integrated into our installation and 
> >>> configuration management under the 5.1 label, which is
> >> probably why he
> >>> confused 5rolling for 5.1.
> >>>
> >>> Cheers,
> >>>         Stephan
> >>>
> >>> On Thu, 6 Dec 2007, Troy Dawson wrote:
> >>>
> >>>> Hi Martin,
> >>>> Could you let me know what time this happened, and what 
> machine is 
> >>>> doing the transfer.
> >>>> It's actually much easier for us to trace ftp transactions than 
> >>>> rsync, so maybe this will give us a glimpse into what is 
> happening 
> >>>> better than tracking down DESY's rsync.
> >>>>
> >>>> Troy
> >>>>
> >>>> Bly, MJ (Martin) wrote:
> >>>>> I am seeing this too, on a long-standing mirror using 
> lftp direct 
> >>>>> from ftp.scientificlinux.org.  Clobbered some 
> installations I was 
> >>>>> doing of our copy on Sunday - we lost some of the SL4.4 
> i386 base 
> >>>>> stuff, mostly files starting 'o'  (openssl, openoffice,
> >> openssh, ... list on request).
> >>>>> A re-sync of the mirror brought the files back.
> >>>>>
> >>>>>         Martin.
> >>>>> --
> >>>>> Martin Bly
> >>>>> RAL Tier1 Fabric Team
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: [log in to unmask]
> >>>>>> [mailto:[log in to unmask]]
> >> On Behalf
> >>>>>> Of Kai Leffhalm
> >>>>>> Sent: 06 December 2007 11:38
> >>>>>> To: [log in to unmask]
> >>>>>> Subject: Problem with Mirroring
> >>>>>>
> >>>>>> Hi,
> >>>>>> I am new here and have one problem when mirroring the
> >> repositories:
> >>>>>>   sometimes the packages disappear and reappear after
> >> some time. I
> >>>>>> was told that this is a problem with the fileserver. Now
> >> there are
> >>>>>> some old packages which seem to be deleted on purpose, and some
> >>>>>> which seem to be reorganized.
> >>>>>>
> >>>>>> Is there any possibility to see, which files would be 
> deleted on 
> >>>>>> purpose and which are just temporarily deleted?
> >>>>>>
> >>>>>> Cheers
> >>>>>> Kai Leffhalm
> >>>>>>
> >>>>>> Examples:
> >>>>>> SL5.1: (Files are moved to different directory) 
> >>>>>> x86_64/SL/firefox-1.5.0.12-7.el5.i386.rpm
> >>>>>> x86_64/SL/firefox-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> x86_64/SL/firefox-devel-1.5.0.12-7.el5.i386.rpm
> >>>>>> x86_64/SL/firefox-devel-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting
> >> x86_64/updates/security/firefox-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting 
> x86_64/updates/security/firefox-1.5.0.12-7.el5.i386.rpm
> >>>>>> deleting
> >>>>>> x86_64/updates/security/firefox-devel-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting
> >>>>>> x86_64/updates/security/firefox-devel-1.5.0.12-7.el5.i386.rpm
> >>>>>>
> >>>>>>
> >>>>>> SL4.4: (Old and current packages are deleted) deleting 
> >>>>>> i386/errata/SL/RPMS/postgresql-libs-7.4.17-1.RHEL4.1.i386.rpm
> >>>>>> deleting
> >>>>>> i386/errata/SL/RPMS/postgresql-jdbc-7.4.17-1.RHEL4.1.i386.rpm
> >>>>>> deleting
> >>>>>> i386/errata/SL/RPMS/openssh-server-3.9p1-8.RHEL4.20.i386.rpm
> >>>>>> deleting
> >>>>>> i386/errata/SL/RPMS/openssh-clients-3.9p1-8.RHEL4.20.i386.rpm
> >>>>>>
> >>>>
> >>>>
> >>>
> >>> --
> >>> Stephan Wiesand
> >>>    DESY - DV -
> >>>    Platanenallee 6
> >>>    15738 Zeuthen, Germany
> >>
> >>
> >> --
> >> __________________________________________________
> >> Troy Dawson  [log in to unmask]  (630)840-6468 Fermilab 
> >> ComputingDivision/LCSI/CSI DSS Group 
> >> __________________________________________________
> >>
> >
> 
> --
> Stephan Wiesand
>    DESY - DV -                     Phone:   +49 33762 7 7370
>    Platanenallee 6                 Fax:     +49 33762 7 7216
>    15738 Zeuthen, Germany          e-mail:  [log in to unmask]
> 
