Commits · 88826276dcaf4cef9cc7c2695ff15c6d20d4a74d · nexedi / linux

28 Jul, 2010 40 commits

fanotify: infrastructure to add and remove marks on vfsmounts · 88826276

Eric Paris authored Dec 17, 2009

Infrastructure work to add and remove marks on vfsmounts.  This should get
everything set up except wiring the functions to the syscalls.
Signed-off-by: Eric Paris <eparis@redhat.com>

88826276

fanotify: should_send_event needs to handle vfsmounts · 1c529063

Eric Paris authored Dec 17, 2009

currently should_send_event in fanotify only cares about marks on inodes.
This patch extends that interface to indicate that it cares about events
that happened on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>

1c529063

fsnotify: Infrastructure for per-mount watches · ca9c726e

Andreas Gruenbacher authored Dec 17, 2009

Per-mount watches allow groups to listen to fsnotify events on an entire
mount.  This patch simply adds and initializes the fields needed in the
vfsmount struct to make this happen.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

ca9c726e

fsnotify: vfsmount marks generic functions · 0d48b7f0

Eric Paris authored Dec 17, 2009

Much like inode-mark.c has all of the code dealing with marks on inodes
this patch adds a vfsmount-mark.c which has similar code but is intended
for marks on vfsmounts.
Signed-off-by: Eric Paris <eparis@redhat.com>

0d48b7f0

fsnotify/vfsmount: add fsnotify fields to struct vfsmount · 2504c5d6

Andreas Gruenbacher authored Dec 17, 2009

This patch adds the list and mask fields needed to support vfsmount marks.
These are the same fields fsnotify needs on an inode.  They are not used,
just declared and we note where the cleanup hook should be (the function is
not yet defined)
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

2504c5d6

fsnotify: clear marks to 0 in fsnotify_init_mark · ba643f04

Eric Paris authored Dec 17, 2009

Currently fsnotify_init_mark sets some fields to 0/NULL. Some users
already used some sorts of zalloc, some didn't. This patch uses memset to
explicitly zero everything in the fsnotify_mark when it is initialized so we
don't have to be careful if fields are later added to marks.
Signed-off-by: Eric Paris <eparis@redhat.com>

ba643f04
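
A minimal sketch of the initialization pattern this commit describes, assuming a made-up `struct demo_mark` (the real `fsnotify_mark` has different fields, and the starting reference count of 1 is an assumption for illustration): zero the whole structure with `memset`, then set only the non-zero defaults.

```c
#include <string.h>

/* Hypothetical stand-in for struct fsnotify_mark; field names are illustrative. */
struct demo_mark {
	unsigned int mask;
	unsigned int flags;
	void *group;
	int refcnt;
};

/* Zero everything first so later-added fields start out as 0/NULL,
 * then set the one default that is assumed to be non-zero. */
void demo_init_mark(struct demo_mark *mark)
{
	memset(mark, 0, sizeof(*mark));
	mark->refcnt = 1;
}
```

With this shape, a caller that allocated the mark with plain `malloc()` gets the same result as one that used a zeroing allocator.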

fsnotify: split generic and inode specific mark code · 5444e298

Eric Paris authored Dec 17, 2009

Currently all marking is done by functions in inode-mark.c.  Some of this
is pretty generic and should instead be done in a generic function; only
the inode specific code should live in inode-mark.c.
Signed-off-by: Eric Paris <eparis@redhat.com>

5444e298

fanotify: Add pids to events · 32c32632

Andreas Gruenbacher authored Dec 17, 2009

Pass the process identifiers of the triggering processes to fanotify
listeners: this information is useful for event filtering and logging.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

32c32632

fanotify: create_fd cleanup · 22aa425d

Andreas Gruenbacher authored Dec 17, 2009

Code cleanup which does the fd creation work separately from the userspace
metadata creation.  It fits better with the other code.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

22aa425d

fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark · 9bbfc964

Heiko Carstens authored Dec 17, 2009

Please note that you need the patch below in addition, otherwise the
syscall wrapper stuff won't work on those 32 bit architectures which enable
the wrappers.

When enabled the syscall wrapper defines always take long parameters and then
cast them to whatever is needed. This approach doesn't work for the 32 bit
case where the original syscall takes a long long parameter, since we would
lose the upper 32 bits.
So syscalls with 64 bit arguments are special cases wrt syscall wrappers
and end up in the ugliness below (see also sys_fallocate). In addition these
special cased syscall wrappers have the drawback that ftrace syscall tracing
doesn't work on them, since they don't get defined by using the usual macros.
Signed-off-by: Eric Paris <eparis@redhat.com>

9bbfc964
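
The truncation problem described above can be illustrated in plain userspace C. This is only a sketch: the helper names are hypothetical, and passing the value as two 32-bit halves is shown as one generic workaround, not necessarily what the patch itself does.

```c
#include <stdint.h>

/* What survives if a 64-bit mask is funneled through a 32-bit long. */
uint32_t demo_truncate_to_long32(uint64_t mask)
{
	return (uint32_t)mask;	/* the upper 32 bits are discarded */
}

/* Rebuild a 64-bit argument that was passed as two 32-bit halves. */
uint64_t demo_join_halves(uint32_t hi, uint32_t lo)
{
	return ((uint64_t)hi << 32) | lo;
}
```

Any event bit above bit 31 of the mask would silently vanish in the first helper, which is why the generic wrapper macros cannot be used for this syscall on 32 bit.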

fanotify: select ANON_INODES. · ef601a9c

Paul Mundt authored Dec 17, 2009

fanotify references anon_inode_getfd(), which is only available with
ANON_INODES enabled. Presently this bails out with the following:

  LD      vmlinux
fs/built-in.o: In function `sys_fanotify_init':
(.text+0x26d1c): undefined reference to `anon_inode_getfd'
make: *** [vmlinux] Error 1

which is trivially corrected by adding an ANON_INODES select.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Eric Paris <eparis@redhat.com>

ef601a9c

fanotify: send events using read · a1014f10

Eric Paris authored Dec 17, 2009

Send events to userspace by reading the file descriptor from fanotify_init().
One will get blocks of data which look like:

struct fanotify_event_metadata {
	__u32 event_len;
	__u32 vers;
	__s32 fd;
	__u64 mask;
	__s64 pid;
	__u64 cookie;
} __attribute__ ((packed));

Simple code to retrieve and deal with events is below

	while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
		struct fanotify_event_metadata *metadata;

		metadata = (void *)buf;
		while (FAN_EVENT_OK(metadata, len)) {
			/* process the event here */
			if (metadata->fd >= 0 && close(metadata->fd) != 0)
				goto fail;
			metadata = FAN_EVENT_NEXT(metadata, len);
		}
	}
Signed-off-by: Eric Paris <eparis@redhat.com>

a1014f10
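
The buffer walk in the loop above can be tried without an fanotify descriptor. The sketch below uses a local copy of the quoted layout, renamed `demo_event_metadata` so it cannot clash with the real header, and hypothetical helpers `demo_append()` and `demo_count_events()`. The bounds check is an assumption about what a macro like `FAN_EVENT_OK` has to verify, not a copy of it.

```c
#include <stdint.h>
#include <string.h>

/* Local copy of the layout quoted in the commit message (renamed). */
struct demo_event_metadata {
	uint32_t event_len;
	uint32_t vers;
	int32_t fd;
	uint64_t mask;
	int64_t pid;
	uint64_t cookie;
} __attribute__((packed));

/* Append one event record at offset off; returns the new offset. */
size_t demo_append(char *buf, size_t off, uint64_t mask, int32_t fd)
{
	struct demo_event_metadata ev;

	memset(&ev, 0, sizeof(ev));
	ev.event_len = sizeof(ev);
	ev.fd = fd;
	ev.mask = mask;
	memcpy(buf + off, &ev, sizeof(ev));
	return off + sizeof(ev);
}

/* Walk the buffer, stopping at the first record that does not fit.
 * ORs every event mask seen into *seen_mask and returns the count. */
int demo_count_events(const char *buf, long len, uint64_t *seen_mask)
{
	int count = 0;

	*seen_mask = 0;
	while (len >= (long)sizeof(struct demo_event_metadata)) {
		struct demo_event_metadata ev;

		memcpy(&ev, buf, sizeof(ev));	/* avoid unaligned access */
		if (ev.event_len < sizeof(ev) || (long)ev.event_len > len)
			break;
		*seen_mask |= ev.mask;
		count++;
		buf += ev.event_len;
		len -= ev.event_len;
	}
	return count;
}
```

Advancing by `event_len` rather than by `sizeof` the structure is what lets a reader keep working if records ever grow.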

fanotify: fanotify_mark syscall implementation · 2a3edf86

Eric Paris authored Dec 17, 2009

NAME
	fanotify_mark - add, remove, or modify an fanotify mark on a
filesystem object

SYNOPSIS
	int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
			  int dfd, const char *pathname)

DESCRIPTION
	fanotify_mark() is used to add, remove, or modify a mark on a filesystem
	object.  Marks are used to indicate that the fanotify group is
	interested in events which occur on that object.  At this point in
	time marks may only be added to files and directories.

	fanotify_fd must be a file descriptor returned by fanotify_init()

	The flags field must contain exactly one of the following:

	FAN_MARK_ADD - OR the bits in mask and ignored mask into the mark
	FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mask
		from the mark

	The following values can be OR'd into the flags field:

	FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
	FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)

	dfd may be any of the following:
	AT_FDCWD: the object will be looked up based on pathname similar
		to open(2)

	file descriptor of a directory: if pathname is not NULL the
		object to modify will be looked up similar to openat(2)

	file descriptor of the final object: if pathname is NULL the
		object to modify will be the object referenced by dfd

	The mask is the bitwise OR of the set of events of interest such as:
	FAN_ACCESS		- object was accessed (read)
	FAN_MODIFY		- object was modified (write)
	FAN_CLOSE_WRITE		- object was writable and was closed
	FAN_CLOSE_NOWRITE	- object was read only and was closed
	FAN_OPEN		- object was opened
	FAN_EVENT_ON_CHILD	- interested in events that happen to
				  children.  Only relevant when the object
				  is a directory
	FAN_Q_OVERFLOW		- event queue overflowed (not implemented)

RETURN VALUE
	On success, this system call returns 0. On error, -1 is
	returned, and errno is set to indicate the error.

ERRORS
	EINVAL An invalid value was specified in flags.

	EINVAL An invalid value was specified in mask.

	EINVAL An invalid value was specified in ignored_mask.

	EINVAL fanotify_fd is not a file descriptor as returned by
	fanotify_init()

	EBADF fanotify_fd is not a valid file descriptor

	EBADF dfd is not a valid file descriptor and pathname is NULL.

	ENOTDIR dfd is not a directory and pathname is not NULL

	EACCES no search permissions on some part of the path

	ENOENT file not found

	ENOMEM Insufficient kernel memory is available.

CONFORMING TO
	These system calls are Linux-specific.
Signed-off-by: Eric Paris <eparis@redhat.com>

2a3edf86
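
The "exactly one of FAN_MARK_ADD or FAN_MARK_REMOVE" rule and the two mask operations can be sketched as follows. The `DEMO_*` constants are placeholder bit values and `demo_update_mark()` is a hypothetical helper; neither is taken from the kernel source.

```c
#include <errno.h>
#include <stdint.h>

/* Placeholder bit values for illustration; not the kernel's definitions. */
#define DEMO_MARK_ADD    0x1u
#define DEMO_MARK_REMOVE 0x2u

/* Apply mask to *mark according to flags; returns 0 or -EINVAL. */
int demo_update_mark(uint64_t *mark, unsigned int flags, uint64_t mask)
{
	unsigned int op = flags & (DEMO_MARK_ADD | DEMO_MARK_REMOVE);

	if (op == DEMO_MARK_ADD)
		*mark |= mask;		/* OR the requested bits in */
	else if (op == DEMO_MARK_REMOVE)
		*mark &= ~mask;		/* clear the requested bits */
	else
		return -EINVAL;		/* neither or both operations given */
	return 0;
}
```

Rejecting the call before touching `*mark` keeps the mark unchanged on error, which matches the usual expectation for a failed syscall.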

fanotify: sys_fanotify_mark declaration · bbaa4168

Eric Paris authored Dec 17, 2009

This patch simply declares the new sys_fanotify_mark syscall

int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
		  int dfd, const char *pathname)
Signed-off-by: Eric Paris <eparis@redhat.com>

bbaa4168

fanotify: fanotify_init syscall implementation · 52c923dd

Eric Paris authored Dec 17, 2009

NAME
	fanotify_init - initialize an fanotify group

SYNOPSIS
	int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);

DESCRIPTION
	fanotify_init() initializes a new fanotify instance and returns a file
	descriptor associated with the new fanotify event queue.

	The following values can be OR'd into the flags field:

	FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
		Using this flag saves extra calls to fcntl(2) to achieve the same
		result.

	FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
		See the description of the O_CLOEXEC flag in open(2) for reasons why
		this may be useful.

	The event_f_flags argument is unused and must be set to 0

	The priority argument is unused and must be set to 0

RETURN VALUE
	On success, this system call returns a new file descriptor. On error, -1 is
	returned, and errno is set to indicate the error.

ERRORS
	EINVAL An invalid value was specified in flags.

	EINVAL A non-zero value was passed in event_f_flags or in priority

	ENFILE The system limit on the total number of file descriptors has been reached.

	ENOMEM Insufficient kernel memory is available.

CONFORMING TO
	These system calls are Linux-specific.
Signed-off-by: Eric Paris <eparis@redhat.com>

52c923dd
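
The argument rules in the ERRORS section above amount to a short validation step. A sketch under stated assumptions: the `DEMO_*` flag values are placeholders and `demo_check_init_args()` is a hypothetical helper, not the kernel's implementation.

```c
#include <errno.h>

/* Placeholder bit values for illustration; not the kernel's definitions. */
#define DEMO_CLOEXEC  0x1u
#define DEMO_NONBLOCK 0x2u
#define DEMO_ALL_INIT_FLAGS (DEMO_CLOEXEC | DEMO_NONBLOCK)

/* Returns 0 if the arguments are acceptable, -EINVAL otherwise. */
int demo_check_init_args(unsigned int flags, unsigned int event_f_flags,
			 int priority)
{
	if (flags & ~DEMO_ALL_INIT_FLAGS)
		return -EINVAL;		/* unknown flag bit */
	if (event_f_flags != 0 || priority != 0)
		return -EINVAL;		/* unused arguments must be 0 */
	return 0;
}
```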

fanotify: fanotify_init syscall declaration · 11637e4b

Eric Paris authored Dec 17, 2009

This patch defines a new syscall fanotify_init() of the form:

int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
		      unsigned int priority)

This syscall is used to create an fanotify group.  This is very similar to
the inotify_init() syscall.
Signed-off-by: Eric Paris <eparis@redhat.com>

11637e4b

fanotify: do not clone on merge unless needed · 9dced01a

Eric Paris authored Dec 17, 2009

Currently if 2 events are going to be merged on the notification queue with
different masks the second event will be cloned and will replace the first
event. However if this notification queue is the only place referencing
the event in question there is no reason not to just update the event in
place. We can tell this if the event->refcnt == 1. Since we hold a
reference for each queue this event is on we know that when refcnt == 1
this is the only queue. The other concern is that it might be about to be
added to a new queue, but this can't be the case since fsnotify holds a
reference on the event until it is finished adding it to queues.
Signed-off-by: Eric Paris <eparis@redhat.com>

9dced01a

fanotify: merge notification events with different masks · a12a7dd3

Eric Paris authored Dec 17, 2009

Instead of just merging fanotify events if they are exactly the same, merge
notification events with different masks. To do this we have to clone the
old event, update the mask in the new event with the new merged mask, and
put the new event in place of the old event.
Signed-off-by: Eric Paris <eparis@redhat.com>

a12a7dd3
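
The two merge commits above (clone and replace, then update in place when `refcnt == 1`) can be sketched together. `struct demo_event` and `demo_merge()` are hypothetical names, and plain `malloc()` stands in for whatever allocator the kernel uses.

```c
#include <stdint.h>
#include <stdlib.h>

struct demo_event {
	int refcnt;	/* one reference per queue holding the event */
	uint64_t mask;
};

/* Merge new_mask into old. Returns the event that should now sit on this
 * queue, or NULL if allocating a clone failed. */
struct demo_event *demo_merge(struct demo_event *old, uint64_t new_mask)
{
	struct demo_event *clone;

	if (old->refcnt == 1) {
		/* Only this queue references the event: update in place. */
		old->mask |= new_mask;
		return old;
	}
	clone = malloc(sizeof(*clone));
	if (!clone)
		return NULL;
	clone->refcnt = 1;
	clone->mask = old->mask | new_mask;
	old->refcnt--;	/* this queue drops its reference to the original */
	return clone;
}
```

The shared event keeps its original mask, so other queues holding it are unaffected by the merge.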

fanotify: drop notifications if they exist in the outgoing queue · 767cd46c

Eric Paris authored Dec 17, 2009

fanotify listeners get an open file descriptor to the object in question so
the ordering of operations is not as important as in other notification
systems. inotify will drop events if the last event in the event FIFO is
the same as the current event. This patch will drop fanotify events if
they are the same as another event anywhere in the event FIFO.
Signed-off-by: Eric Paris <eparis@redhat.com>

767cd46c
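
A small sketch of the "drop if the same event is anywhere in the FIFO" rule, using a fixed-size array as the queue. All names here are hypothetical, and an integer `object_id` stands in for the file the event refers to.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_QUEUE_MAX 16

struct demo_queued {
	int object_id;	/* stands in for the file the event refers to */
	uint64_t mask;
};

struct demo_queue {
	struct demo_queued ev[DEMO_QUEUE_MAX];
	size_t len;
};

/* Returns 1 if queued, 0 if dropped as a duplicate, -1 if the queue is full. */
int demo_queue_add(struct demo_queue *q, int object_id, uint64_t mask)
{
	size_t i;

	/* Scan the whole queue, not just the tail as inotify does. */
	for (i = 0; i < q->len; i++)
		if (q->ev[i].object_id == object_id && q->ev[i].mask == mask)
			return 0;
	if (q->len == DEMO_QUEUE_MAX)
		return -1;
	q->ev[q->len].object_id = object_id;
	q->ev[q->len].mask = mask;
	q->len++;
	return 1;
}
```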

fanotify: fscking all notification system · ff0b16a9

Eric Paris authored Dec 17, 2009

fanotify is a novel file notification system which bases notification on
giving userspace both an event type (open, close, read, write) and an open
file descriptor to the object in question. This should address a number of
races and problems with other notification systems like inotify and dnotify
and should allow the future implementation of blocking or access controlled
notification. These are useful for on access scanners or hierarchical storage
management schemes.

This patch just implements the basics of the fsnotify functions.
Signed-off-by: Eric Paris <eparis@redhat.com>

ff0b16a9

fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict · 12ed2e36

Wu Fengguang authored Feb 08, 2010

sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
something unique.
Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Eric Paris <eparis@redhat.com>

12ed2e36

vfs: introduce FMODE_NONOTIFY · ecf081d1

Eric Paris authored Dec 17, 2009

This is a new f_mode which can only be set by the kernel.  It indicates
that the fd was opened by fanotify and should not cause future fanotify
events.  This is needed to prevent fanotify livelock.  An example of
obvious livelock is from fanotify close events.

Process A closes file1
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
This creates a close event for file1.
fanotify opens file1 for Listener X
Listener X deals with the event and closes its fd for file1.
notice a pattern?

The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
for fanotify.  Thus when that file is used it will not generate future
events.

This patch simply defines the bit.
Signed-off-by: Eric Paris <eparis@redhat.com>

ecf081d1
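
The livelock above, and how a no-notify bit breaks it, can be simulated in a few lines. This is a toy model: `DEMO_FMODE_NONOTIFY` is a placeholder value and both helpers are hypothetical.

```c
#define DEMO_FMODE_NONOTIFY 0x1u	/* placeholder value */

struct demo_file {
	unsigned int f_mode;
};

/* Returns 1 if closing this file should queue a close event, else 0. */
int demo_close_generates_event(const struct demo_file *f)
{
	return !(f->f_mode & DEMO_FMODE_NONOTIFY);
}

/* Count the events produced when a process closes a file and the listener
 * then closes the descriptor it was handed for each event. The loop is
 * bounded by max_rounds so the livelock case still terminates. */
int demo_simulate(unsigned int listener_f_mode, int max_rounds)
{
	struct demo_file process_fd = { 0 };
	struct demo_file listener_fd = { listener_f_mode };
	int events = demo_close_generates_event(&process_fd);
	int round;

	/* Each pending event is handled once; handling it closes an fd. */
	for (round = 0; round < max_rounds && events > round; round++)
		events += demo_close_generates_event(&listener_fd);
	return events;
}
```

With the bit set on the listener's descriptor the count stays at one; without it the count grows with every round until the bound is hit.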

fsnotify: take inode->i_lock inside fsnotify_find_mark_entry() · 35566087

Andreas Gruenbacher authored Dec 17, 2009

All callers to fsnotify_find_mark_entry() except one take and
release inode->i_lock around the call.  Take the lock inside
fsnotify_find_mark_entry() instead.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

35566087

dnotify: rename mark_entry to mark · ef5e2b78

Eric Paris authored Dec 17, 2009

nomenclature change.  Used to call things 'entries' but now we just call
them 'marks.'  Do those changes for dnotify.
Signed-off-by: Eric Paris <eparis@redhat.com>

ef5e2b78

inotify: rename mark_entry to just mark · 000285de

Eric Paris authored Dec 17, 2009

rename anything in inotify that deals with mark_entry to just be mark.  It
makes a lot more sense.
Signed-off-by: Eric Paris <eparis@redhat.com>

000285de

fsnotify: rename mark_entry to just mark · 841bdc10

Eric Paris authored Dec 17, 2009

previously I used mark_entry when talking about marks on inodes.  The
_entry is pretty useless.  Just use "mark" instead.
Signed-off-by: Eric Paris <eparis@redhat.com>

841bdc10

fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark · d0775441

Eric Paris authored Dec 17, 2009

the _entry portion of fsnotify functions is useless.  Drop it.
Signed-off-by: Eric Paris <eparis@redhat.com>

d0775441

fsnotify: rename fsnotify_mark_entry to just fsnotify_mark · e61ce867

Eric Paris authored Dec 17, 2009

The name is long and it serves no real purpose.  So rename
fsnotify_mark_entry to just fsnotify_mark.
Signed-off-by: Eric Paris <eparis@redhat.com>

e61ce867

fsnotify: kill FSNOTIFY_EVENT_FILE · 72acc854

Andreas Gruenbacher authored Dec 17, 2009

Some fsnotify operations send a struct file.  This is more information than
we technically need.  We now send a struct path in all cases instead of
sometimes a path and sometimes a file.
Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
Signed-off-by: Eric Paris <eparis@redhat.com>

72acc854

fsnotify: add flags to fsnotify_mark_entries · 098cf2fc

Eric Paris authored Dec 17, 2009

To differentiate between inode and vfsmount (or other future) types of
marks we add a flags field and set the inode bit on inode marks (the only
currently supported type of mark)
Signed-off-by: Eric Paris <eparis@redhat.com>

098cf2fc

fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union · 4136510d

Eric Paris authored Dec 17, 2009

vfsmount marks need mostly the same data as inode specific fields, but for
consistency and understandability we put that data in a vfsmount specific
struct inside a union with inode specific data.
Signed-off-by: Eric Paris <eparis@redhat.com>

4136510d

fsnotify: put inode specific fields in an fsnotify_mark in a union · 2823e04d

Eric Paris authored Dec 17, 2009

The addition of marks on vfs mounts will be simplified if the inode
specific parts of a mark and the vfsmnt specific parts of a mark are
actually in a union so naming can be easy.  This patch just implements the
inode struct and the union.
Signed-off-by: Eric Paris <eparis@redhat.com>

2823e04d
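
The three commits above (a flags field, vfsmount specific fields, and a union holding the inode specific fields) together describe a tagged union. A minimal sketch, assuming made-up type and flag names rather than the kernel's:

```c
#include <stddef.h>

/* Placeholder flag bits; not the kernel's definitions. */
#define DEMO_MARK_FLAG_INODE    0x1u
#define DEMO_MARK_FLAG_VFSMOUNT 0x2u

struct demo_inode_mark { void *inode; };
struct demo_vfsmount_mark { void *mnt; };

struct demo_tagged_mark {
	unsigned int flags;	/* says which union member is valid */
	union {
		struct demo_inode_mark i;
		struct demo_vfsmount_mark m;
	} u;
};

/* Return the watched object, or NULL if the mark type is unknown. */
void *demo_mark_object(const struct demo_tagged_mark *mark)
{
	if (mark->flags & DEMO_MARK_FLAG_INODE)
		return mark->u.i.inode;
	if (mark->flags & DEMO_MARK_FLAG_VFSMOUNT)
		return mark->u.m.mnt;
	return NULL;
}
```

Giving each mark type its own struct inside the union keeps the field names readable even though both variants carry much the same data.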

fsnotify: include vfsmount in should_send_event when appropriate · 3a9fb89f

Eric Paris authored Dec 17, 2009

To ensure that a group does not receive duplicate events when an event
matches on both the vfsmount and the inode, the should_send_event test should
distinguish those two cases.  We pass a vfsmount to this function so groups
can make their own determinations.
Signed-off-by: Eric Paris <eparis@redhat.com>

3a9fb89f

fsnotify: mount point listeners list and global mask · 7131485a

Eric Paris authored Dec 17, 2009

currently all of the notification systems implemented select which inodes
they care about and receive messages only about those inodes (or the
children of those inodes.) This patch begins to flesh out fsnotify support
for the concept of listeners that want to hear notification for an inode
accessed below a given mount point. This patch implements a second list
of fsnotify groups to hold these types of groups and a second global mask
to hold the events of interest for this type of group.

The reason we want a second group list and mask is that the inode based
notification should_send_event support makes each group look for a mark
on the given inode. With one vfsmount listener that would mean that every group
has to take the inode->i_lock, look for its mark, not find one, and return
for every operation. By separating vfsmount from inode listeners, the inode
groups only have to look for their mark and take the inode lock when there is
an inode listener. vfsmount listeners will have to grab the lock and
look for a mark but there should be fewer of them, and one vfsmount listener
won't cause the i_lock to be grabbed and released for every fsnotify group
on every io operation.
Signed-off-by: Eric Paris <eparis@redhat.com>

7131485a
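
The effect of keeping two global masks can be sketched as a cheap pre-check that decides which group list is worth walking for a given event. The structure and names below are hypothetical and leave out the locking the message discusses.

```c
#include <stdint.h>

struct demo_globals {
	uint64_t inode_mask;	/* events wanted by any inode based group */
	uint64_t vfsmount_mask;	/* events wanted by any mount based group */
};

#define DEMO_WALK_INODE_GROUPS    0x1
#define DEMO_WALK_VFSMOUNT_GROUPS 0x2

/* Decide which group lists need walking for this event. */
int demo_lists_to_walk(const struct demo_globals *g, uint64_t event)
{
	int walk = 0;

	if (event & g->inode_mask)
		walk |= DEMO_WALK_INODE_GROUPS;
	if (event & g->vfsmount_mask)
		walk |= DEMO_WALK_VFSMOUNT_GROUPS;
	return walk;
}
```

When only a mount listener is registered, the inode list is skipped entirely, so no inode group has to take the inode lock just to find that it has no mark.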

fsnotify: add groups to fsnotify_inode_groups when registering inode watch · 4ca76352

Eric Paris authored Dec 17, 2009

Currently all fsnotify groups are added immediately to the
fsnotify_inode_groups list upon creation. This means, even groups with no
watches (common for audit) will be on the global tracking list and will
get checked for every event. This patch adds groups to the global list only
when the first inode mark is added to the group.
Signed-off-by: Eric Paris <eparis@redhat.com>

4ca76352

fsnotify: initialize the group->num_marks in a better place · 36fddeba

Eric Paris authored Dec 17, 2009

Currently the comments say that group->num_marks is held because the group
is on the fsnotify_group list.  This isn't strictly the case; we really
just hold the num_marks for the life of the group (any time group->refcnt
is != 0).  This patch moves the initialization stuff and makes it clear when
it is really being held.
Signed-off-by: Eric Paris <eparis@redhat.com>

36fddeba

fsnotify: rename fsnotify_groups to fsnotify_inode_groups · 19c2a0e1

Eric Paris authored Dec 17, 2009

Simple renaming patch. fsnotify is about to support mount point listeners
so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
used only for groups which have watches on inodes.
Signed-off-by: Eric Paris <eparis@redhat.com>

19c2a0e1

fsnotify: drop mask argument from fsnotify_alloc_group · 0d2e2a1d

Eric Paris authored Dec 17, 2009

Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
that argument.
Signed-off-by: Eric Paris <eparis@redhat.com>

0d2e2a1d

Audit: only set group mask when something is being watched · 220d14df

Eric Paris authored Dec 17, 2009

Currently the audit watch group always sets a mask equal to all events it
might care about.  We instead should only set the group mask if we are
actually watching inodes.  This should be a perf win when audit watches are
compiled in.
Signed-off-by: Eric Paris <eparis@redhat.com>

220d14df

fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group · ffab8340

Eric Paris authored Dec 17, 2009

fsnotify_obtain_group was intended to be able to find an already existing
group.  Nothing uses that functionality.  This just renames it to
fsnotify_alloc_group so it is clear what it is doing.
Signed-off-by: Eric Paris <eparis@redhat.com>

ffab8340