Please find attached an updated version of the patch and test program.
The patch makes clock_gettime(CLOCK_MONOTONIC_RAW,...) calls have about
the same latency as (or slightly less than)
clock_gettime(CLOCK_MONOTONIC,...) calls.

Before, CLOCK_MONOTONIC_RAW was implemented with a syscall,
resulting in latencies of 200-1000ns on a 3.4GHz Haswell.
Now, clock_gettime(CLOCK_MONOTONIC_RAW,...) calls on the
same machine have a latency of about 20-40ns, roughly the
same as clock_gettime(CLOCK_MONOTONIC,...) calls.

The updated patch is more similar to the latest
upstream linux-4.16-rc6 patch than the previous
patch, but otherwise should behave the same.

NOTE: see kernel bug:
   https://bugzilla.kernel.org/show_bug.cgi?id=199129
H.J. Lu, the developer of the GCC & Binutils
RETPOLINE support, believes that the Linux kernel vDSO
SHOULD NOT be compiled with -DRETPOLINE
( -mindirect-branch=thunk-extern -mindirect-branch-register ),
and has submitted a patch to disable it in the upstream kernel.

The changes relative to earlier versions of the patch enable
compilation with these flags, which is now the default in
kernel 4.16 and evidently also in the RHEL-7.x 3.10 kernels.

In the kernels I build, I have disabled vDSO -DRETPOLINE compilation
using the patch from kernel bug #199129, and I would advise
users to do likewise. The kernel developers do not as
yet share this belief, however, so this patch leaves the default
intact and compiles the vDSO with -DRETPOLINE.
The patch should build & work identically with or without -DRETPOLINE
compilation, but using -DRETPOLINE for the vDSO will slightly slow
down all system calls and calls of vDSO functions.

If anyone has any issues with this patch, please let me know.
I had to port the patch to SL / RHEL-7 anyway, so I might as
well share it - testing + feedback would also be useful.

Thanks & Best Regards,
Jason