Hello Stephan, Troy,
I've instrumented our mirror stuff so that I can tell when each
individual mirror is started - should help pin down a time window when
we hit problems.
Apropos mirror volumes, we have 13 SL sets:
3.0.3(32bit), 3.0.5(32bit), 3.0.8(32/64bit), 3.0.9(32/64),
4.2(32), 4.4(32/64), 4.5(32/64),
5.0(32/64)
In addition we have some of the slc stuff from CERN and a number of
glite, dcache and lcg software repositories as well as a couple of
Centos and Fedora sets. The mirror run takes the repositories in
alphabetic order of definition file names and takes around 2 hours for a
lightly changed payload - mostly scan time for the source.
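
A minimal sketch of the kind of per-mirror instrumentation described above, not the actual RAL tooling. The `*.conf` pattern and the `run_one` callback are assumptions for illustration; `run_one` stands in for whatever really performs the transfer.

```python
import time
from pathlib import Path
from typing import Callable, List, Tuple


def run_mirrors(def_dir: str,
                run_one: Callable[[Path], None],
                pattern: str = "*.conf") -> List[Tuple[str, float, float]]:
    """Run one mirror per definition file, in alphabetic order of file name.

    Returns (definition name, start time, end time) per mirror, so that a
    later failure can be matched against a time window.
    """
    records = []
    # Alphabetic order of definition file names, as in the mirror run.
    for definition in sorted(Path(def_dir).glob(pattern), key=lambda p: p.name):
        started = time.time()
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(started))
        print(stamp, "start", definition.name, flush=True)
        run_one(definition)  # hypothetical hook: the real mirror tool goes here
        finished = time.time()
        records.append((definition.name, started, finished))
    return records
```

Logging in UTC keeps the timestamps directly comparable between sites in different time zones.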
Maybe we have been lucky in not hitting the problem window. The files
missing on Sunday would have been in 44/i386/SL/RPMS - the base
repository. Now we are looking for it, we may see a more regular
pattern here too.
Cheers,
Martin.
--
-----------------------------------
Martin Bly +44|0 1235 446981
RAL Tier1 Fabric Team Manager
-----------------------------------
> -----Original Message-----
> From: [log in to unmask] [mailto:[log in to unmask]]
> Sent: 07 December 2007 17:58
> To: Bly, MJ (Martin)
> Cc: Troy Dawson; [log in to unmask]
> Subject: RE: Problem with Mirroring
>
> Hi Martin,
>
> On Thu, 6 Dec 2007, Bly, MJ (Martin) wrote:
>
> > Chaps,
> >
> > I've only bounced against this the one time: we do twice daily mirror
>
> our observations could still be compatible: I figure we are
> mirroring more of scientificlinux.org than you are. Our
> current list is: 308, 309, 41, 44, 45, 50. With the exception
> of 41, we mirror both i386 and x86_64. And whenever something
> is going on there, we mirror the rolling directories as well.
> Lately, something was going on in 30rolling, 40rolling and 5rolling...
>
> Since missing files are always (at least I can't remember a different
> pattern) from the same subdirectory (either 45/i386/SL/RPMS,
> or 50/x86_64/SL, or...), we simply may detect the problem
> more reliably than you. NB most of the time, the problem
> would have gone unnoticed here if we didn't check what rsync
> is actually doing every single time, unless it happened to
> affect our current mainstream release.
>
> Cheers,
> Stephan
>
>
> > runs starting 2am and noon (UK time, currently GMT, thus 8pm and 6am
> > FNAL time) and this was the only time in almost daily use of our own
> > mirrors to install/update systems.
> >
> > If you think it might help diagnose the problem, I can try and
> > instrument our mirror runs to pin the times of each mirror.
> >
> > Martin.
> > --
> > -----------------------------------
> > Martin Bly +44|0 1235 446981
> > RAL Tier1 Fabric Team Manager
> > -----------------------------------
> >
> >> -----Original Message-----
> >> From: [log in to unmask]
> >> [mailto:[log in to unmask]] On Behalf Of Troy Dawson
> >> Sent: 06 December 2007 20:18
> >> To: [log in to unmask]
> >> Cc: Bly, MJ (Martin); [log in to unmask]
> >> Subject: Re: Problem with Mirroring
> >>
> >> Hi Stephan,
> >> Thank you for the explanation, especially about how frequent it is.
> >> To be honest, your reports were the only ones we'd gotten about this
> >> problem, so we weren't aware of how bad it is. I'm also glad that
> >> Martin spoke up, because if all the reports come from the same place
> >> it sort of seems like it's a possible communication between places
> >> problem.
> >>
> >> And if he isn't reading this, please send my apologies to Kai. I
> >> guess I did snap at him, and I'm sorry.
> >> Here at Fermilab I've had to explain to people at least 6 times in
> >> the past week that we haven't yet released SLF 5.1 and explain to
> >> them what that means.
> >> So I guess my temper has gotten a bit thin on that subject.
> >>
> >> Troy
> >>
> >> [log in to unmask] wrote:
> >>> Hi Troy,
> >>>
> >>> this problem has become so frequent that I stopped reporting it, since
> >>> I was sure your storage experts are chasing it.
> >>>
> >>> I usually run a dryrun rsync whenever I expect some change (it's not a
> >>> cron job). Lately, at least one out of four times the dryrun told me
> >>> it would delete some files it clearly shouldn't, typically with the
> >>> first character of the filename from some range in the alphabet, as in
> >>> the reports you received.
> >>>
> >>> Every time I checked (maybe the first five times), using an ftp client
> >>> confirmed what rsync said: the server didn't know about the files.
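
The "range in the alphabet" pattern can be made visible by tallying the pending deletions by first letter. A minimal sketch, assuming the dry-run output contains lines of the form `deleting <path>` as in the examples further down this thread; the function name is made up for illustration.

```python
import posixpath
from collections import Counter
from typing import Iterable


def deletions_by_initial(rsync_output: Iterable[str]) -> Counter:
    """Count pending deletions by the first character of the file name."""
    counts: Counter = Counter()
    for raw in rsync_output:
        line = raw.strip()
        if not line.startswith("deleting "):
            continue  # ignore transfers and any other output lines
        path = line[len("deleting "):].rstrip("/")
        name = posixpath.basename(path)
        if name:
            counts[name[0].lower()] += 1
    return counts
```

A tally concentrated on one or two neighbouring letters would match the pattern described above, while an intentional cleanup would more likely be spread across the alphabet.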
> >>>
> >>> Given that an rsync dryrun takes about a minute, the problem must be
> >>> present a pretty significant fraction of the day. I usually have to
> >>> wait for O(30m) until the files "are back". Now we typically run this
> >>> in the morning. I've observed this around noon as well (all times
> >>> GMT+1). Hence it may be due to some maintenance task you or your
> >>> storage guys run at night.
> >>>
> >>> It should be fairly simple to gain more statistics: just running an
> >>> "ls <release>/SL/RPMS/" (or /SL/ for SL5) every 15 minutes on the
> >>> server, for all stable releases, and comparing with the first result,
> >>> should provide you with evidence that the problem is real and an idea
> >>> of how serious it is, without any risk of false positives or
> >>> significant load on the server.
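
The periodic listing check suggested above could be sketched roughly as follows. This assumes the release directories are reachable as local paths on the server; the function names, the 15 minute interval default and the 24 hour run length are illustrative choices, not anything the server actually runs.

```python
import os
import time
from typing import Dict, Iterable, Set


def snapshot(directory: str) -> Set[str]:
    """Return the set of entry names currently visible in a directory."""
    return set(os.listdir(directory))


def missing_entries(baseline: Set[str], current: Set[str]) -> Set[str]:
    """Entries present in the first listing but absent from the current one."""
    return baseline - current


def watch(directories: Iterable[str], interval_s: float = 900.0,
          rounds: int = 96) -> None:
    """Compare each directory with its first listing every interval_s seconds."""
    baselines: Dict[str, Set[str]] = {d: snapshot(d) for d in directories}
    for _ in range(rounds):  # 96 rounds of 15 minutes cover 24 hours
        time.sleep(interval_s)
        for d, baseline in baselines.items():
            gone = missing_entries(baseline, snapshot(d))
            if gone:
                stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime())
                print(stamp, d, "missing", len(gone), "e.g.", sorted(gone)[:5])
```

Because only names that were in the first listing are reported, newly added packages never trigger a report, which is what keeps false positives out on a stable release.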
> >>>
> >>> And please cut Kai a little slack. He's in charge of mirroring the SL
> >>> tree here (not trivial lately due to the problem this is all about),
> >>> and deciding which updates make it onto the systems and which don't
> >>> yet, while I'm on leave. For example, right now he has to put the xen
> >>> updates for 5.0 on hold. As he said, he's new to the SL business, but
> >>> he's not a fool. 5.1 beta is integrated into our installation and
> >>> configuration management under the 5.1 label, which is probably why he
> >>> confused 5rolling for 5.1.
> >>>
> >>> Cheers,
> >>> Stephan
> >>>
> >>> On Thu, 6 Dec 2007, Troy Dawson wrote:
> >>>
> >>>> Hi Martin,
> >>>> Could you let me know what time this happened, and what machine is
> >>>> doing the transfer.
> >>>> It's actually much easier for us to trace ftp transactions than
> >>>> rsync, so maybe this will give us a glimpse into what is happening
> >>>> better than tracking down DESY's rsync.
> >>>>
> >>>> Troy
> >>>>
> >>>> Bly, MJ (Martin) wrote:
> >>>>> I am seeing this too, on a long-standing mirror using lftp direct
> >>>>> from ftp.scientificlinux.org. Clobbered some installations I was
> >>>>> doing of our copy on Sunday - we lost some of the SL4.4 i386 base
> >>>>> stuff, mostly files starting 'o' (openssl, openoffice, openssh, ...
> >>>>> list on request).
> >>>>> A re-sync of the mirror brought the files back.
> >>>>>
> >>>>> Martin.
> >>>>> --
> >>>>> Martin Bly
> >>>>> RAL Tier1 Fabric Team
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: [log in to unmask]
> >>>>>> [mailto:[log in to unmask]] On Behalf Of Kai Leffhalm
> >>>>>> Sent: 06 December 2007 11:38
> >>>>>> To: [log in to unmask]
> >>>>>> Subject: Problem with Mirroring
> >>>>>>
> >>>>>> Hi,
> >>>>>> I am new here and have one problem when mirroring the repositories:
> >>>>>> sometimes the packages disappear and reappear after some time. I
> >>>>>> was told that this is a problem with the fileserver. Now there are
> >>>>>> some old packages which seem to be deleted on purpose, and some
> >>>>>> seem to be reorganized.
> >>>>>>
> >>>>>> Is there any possibility to see which files would be deleted on
> >>>>>> purpose and which are just temporarily deleted?
> >>>>>>
> >>>>>> Cheers
> >>>>>> Kai Leffhalm
> >>>>>>
> >>>>>> Examples:
> >>>>>> SL5.1: (Files are moved to different directory)
> >>>>>> x86_64/SL/firefox-1.5.0.12-7.el5.i386.rpm
> >>>>>> x86_64/SL/firefox-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> x86_64/SL/firefox-devel-1.5.0.12-7.el5.i386.rpm
> >>>>>> x86_64/SL/firefox-devel-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting x86_64/updates/security/firefox-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting x86_64/updates/security/firefox-1.5.0.12-7.el5.i386.rpm
> >>>>>> deleting x86_64/updates/security/firefox-devel-1.5.0.12-7.el5.x86_64.rpm
> >>>>>> deleting x86_64/updates/security/firefox-devel-1.5.0.12-7.el5.i386.rpm
> >>>>>>
> >>>>>>
> >>>>>> SL4.4: (Old and current packages are deleted)
> >>>>>> deleting i386/errata/SL/RPMS/postgresql-libs-7.4.17-1.RHEL4.1.i386.rpm
> >>>>>> deleting i386/errata/SL/RPMS/postgresql-jdbc-7.4.17-1.RHEL4.1.i386.rpm
> >>>>>> deleting i386/errata/SL/RPMS/openssh-server-3.9p1-8.RHEL4.20.i386.rpm
> >>>>>> deleting i386/errata/SL/RPMS/openssh-clients-3.9p1-8.RHEL4.20.i386.rpm
> >>>>>>
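
One partial answer to the question above, as a sketch only: in a single dry-run, a deletion whose file name also shows up as an incoming file elsewhere is probably a move, as in the SL5.1 example. The sketch assumes the output has one path per line with deletions prefixed by `deleting `; the function name is made up. It cannot tell an intentional removal from a temporary disappearance, which would still need a second run some time later.

```python
import posixpath
from typing import Iterable, List, Tuple


def classify_deletions(rsync_output: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split pending deletions into (probably moved, unexplained)."""
    incoming_names = set()
    deleted: List[str] = []
    for raw in rsync_output:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("deleting "):
            deleted.append(line[len("deleting "):])
        else:
            # Any other line is treated as an incoming path. Header or
            # summary lines end up here too, but will not match a file name.
            incoming_names.add(posixpath.basename(line))
    moved = [p for p in deleted if posixpath.basename(p) in incoming_names]
    unexplained = [p for p in deleted
                   if posixpath.basename(p) not in incoming_names]
    return moved, unexplained
```

Applied to the two examples above, all four firefox deletions would count as moved, and all four SL4.4 deletions would remain unexplained.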
> >>>>
> >>>>
> >>>
> >>> --
> >>> Stephan Wiesand
> >>> DESY - DV -
> >>> Platanenallee 6
> >>> 15738 Zeuthen, Germany
> >>
> >>
> >> --
> >> __________________________________________________
> >> Troy Dawson [log in to unmask] (630)840-6468 Fermilab
> >> ComputingDivision/LCSI/CSI DSS Group
> >> __________________________________________________
> >>
> >
>
> --
> Stephan Wiesand
> DESY - DV - Phone: +49 33762 7 7370
> Platanenallee 6 Fax: +49 33762 7 7216
> 15738 Zeuthen, Germany e-mail: [log in to unmask]
>