SCIENTIFIC-LINUX-DEVEL Archives

August 2012

SCIENTIFIC-LINUX-DEVEL@LISTSERV.FNAL.GOV

Subject:
From: Stephan Wiesand <[log in to unmask]>
Reply-To: Stephan Wiesand <[log in to unmask]>
Date: Thu, 30 Aug 2012 18:23:22 +0200
Content-Type: text/plain
Parts/Attachments: text/plain (85 lines)

On Aug 30, 2012, at 13:13 , Stephan Wiesand <[log in to unmask]> wrote:

> Hi All,
> 
> still trying to find out what's going on. Pat told me he's unable to reproduce the problem. There are no further reports. But it clearly was happening at TU Chemnitz and on my test VM.
> 
> To make things even weirder: I can't reproduce it any more either. After reformatting the cache as ext4, all combinations of modules and kernels (old/new, new/old) work for me. Thus it seems the history of the cache filesystem is important. And I'm beginning to wonder whether it's simply a bug in earlier ext4, e2fsprogs, openafs, whatever.
> 
> My test VM was last installed on February 16th, with the then brand new SL6 and the openafs-1.6.0 coming with it, and the cache fs was created by the installer. OpenAFS was updated to 1.6.1 on April 1st. I'll now try to recreate this history as accurately as possible.

I reinstalled 6.2, and the problem immediately struck. It affects openafs-1.6.0 as well as 1.6.1.

Then I installed 6.3 - no problem.

So, something is special about ext4 filesystems created by the 6.2 installer (and possibly earlier ones), causing problems with the 6.3 kernels. A module built against a 6.3 kernel can deal with it, but fails horribly when inserted into a 6.2 kernel - at least on such a filesystem.

Next tests:
- use new module with old kernel on a filesystem where old/new works
- repeat all tests with ext3

It may just be some dirty secret about the history of the ext4 implementation on 32-bit EL6. In that case, it could be sane to continue using kmods for openafs. But I'm not convinced yet.
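
A minimal sketch of one way to look for a difference between a cache filesystem created by the 6.2 installer and a freshly created one: compare the feature lists that `dumpe2fs -h` (from e2fsprogs) prints on its "Filesystem features:" line. The function names and the device names in the usage comment are placeholders, not anything from this thread, and the comparison may well show no difference if the cause lies elsewhere.

```shell
#!/bin/sh
# Reduce `dumpe2fs -h` output (read on stdin) to its feature list,
# one feature per line, sorted so that two lists can be compared.
features_from_header() {
    sed -n 's/^Filesystem features:[[:space:]]*//p' | tr -s ' ' '\n' | sed '/^$/d' | sort
}

# Print the features that appear in only one of two saved lists.
# comm -3 suppresses the common lines; entries unique to the second
# file are indented with a tab.
feature_diff() {
    comm -3 "$1" "$2"
}

# Example (as root; both device names are placeholders):
#   dumpe2fs -h /dev/OLD_CACHE_DEV 2>/dev/null | features_from_header > old.txt
#   dumpe2fs -h /dev/NEW_CACHE_DEV 2>/dev/null | features_from_header > new.txt
#   feature_diff old.txt new.txt
```

Reading the header from stdin keeps the parsing step usable on saved `dumpe2fs` output from another machine.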

NB I started a discussion on openafs-devel: https://lists.openafs.org/pipermail/openafs-devel/2012-August/018936.html

Best regards
	Stephan

> Thomas & Thomas, you reported that some of your systems didn't fail either (those with few RPMs installed, as opposed to those with a richer installation). Is it possible that those have a different history as well? Cache fs created manually instead of with anaconda? Installed with openafs-1.6.1 in the first place, not updated from 1.6.0? Anything else you can think of?
> 
> Thanks a lot for any input.
> 
> Best regards,
> 	Stephan
> 
> On Aug 16, 2012, at 17:31 , Thomas Müller <[log in to unmask]> wrote:
> 
>> Hi Stephan,
>> 
>> yes, we have the cache on ext4 - so I guess you are right.
>> 
>> regards,
>> Thomas.
>> 
>> On 16.08.2012 17:20, Stephan Wiesand wrote:
>>> Hi Thomas,
>>> 
>>> On Aug 15, 2012, at 13:51 , Thomas L. Koppe <[log in to unmask]> wrote:
>>> 
>>>> in the meantime we built the kmod-openafs RPM against kernel 2.6.32-279.2.1.el6 and now we have no problems anymore. We'll keep testing it for a while.
>>> 
>>> is your cache on ext4? The old module seems to work for me when I use ext3 for the cache.
>>> 
>>> I wonder whether the problem is related to this: http://joejulian.name/blog/glusterfs-bit-by-ext4-structure-change/
>>> 
>>> All the same, it still means that those handy kmods cannot be used for packaging the openafs client. I guess it's back to kernel-module-openafs-`uname -r` .
>>> 
>>> Best regards,
>>> 	Stephan
>>> 
>>>> On 15.08.2012 11:38, Stephan Wiesand wrote:
>>>>> On Aug 15, 2012, at 09:41 , "Thomas L. Koppe" <[log in to unmask]> wrote:
>>>>> 
>>>>>> we have some problems with the new kernels (2.6.32-279, 2.6.32-279.1.1, 2.6.32-279.2.1) and openafs-1.6.1-112.sl6 on 32-bit systems. It's the same problem on SL_6.2_X86 and SL6.3_X86. On servers with just a few installed RPMs everything works fine. On systems with a full package installation we can't read any files in AFS that are bigger than a few kilobytes. On x86_64 systems we have no problems.
>>>>> 
>>>>> 
>>>>> ouch. I can reproduce this on my test VM. Reading processes just hang in D+ state, but are interruptible, right?
>>>>> 
>>>>> I rebuilt the module against 2.6.32-279.1.1, and that build works for me. Could you please try
>>>>> 
>>>>> http://www-zeuthen.desy.de/~wiesand/SL6/i686/kmod-openafs-1.6.1-112.sl6.279.1.1.i686.rpm ?
>>>>> 
>>>>> Installing this will probably cause the same problem with older kernels you still have installed. If you want to keep both functional, rm /lib/modules/.../weak-updates/openafs/openafs.ko , extract the new module with rpm2cpio|cpio -iv, copy it to an appropriate place (say /lib/modules/.../kernel/fs), and run depmod -a 2.6.32-279.1.1.el6 . Then cross fingers and reboot.
> 
> -- 
> Stephan Wiesand
> DESY - DV -
> Platanenallee 6
> 15732 Zeuthen, Germany

-- 
Stephan Wiesand
DESY - DV -
Platanenallee 6
15732 Zeuthen, Germany
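
The manual workaround quoted at the end of the thread (remove the weak-updates link, extract the rebuilt module, copy it into place, run depmod) can be sketched as a dry run that only prints the commands. This is an illustration under assumptions, not the author's script: the function name and the scratch directory are made up, the elided "..." in both paths is assumed to mean the module directory of the kernel the rebuilt module targets, `cpio` is given `-d` in addition to `-iv` so that it can create the directories inside the archive, and the module is located with `find` because its path inside the RPM is not stated in the thread.

```shell
#!/bin/sh
# Print (do not execute) the commands for the manual module swap.
#   $1: kernel release whose module directory is modified
#       (assumption: this is what the elided "..." refers to)
#   $2: absolute path to the downloaded kmod-openafs RPM
plan_module_swap() {
    kver=$1
    rpm=$2
    moddir="/lib/modules/$kver"
    # Unquoted here-document: the variables are expanded, "\;" and "{}"
    # are passed through literally.
    cat <<EOF
rm $moddir/weak-updates/openafs/openafs.ko
mkdir -p /tmp/kmod-openafs-extract
cd /tmp/kmod-openafs-extract
rpm2cpio $rpm | cpio -idv
find . -name openafs.ko -exec cp {} $moddir/kernel/fs/ \;
depmod -a $kver
EOF
}

# Example: review the plan before running anything as root.
#   plan_module_swap KERNEL_RELEASE /absolute/path/to/kmod-openafs.rpm
```

Printing the plan first makes it easy to check the paths against the installed kernels before anything under /lib/modules is changed.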
