1. 28 Jul, 2010 40 commits
    • fanotify: create_fd cleanup · 22aa425d
      Andreas Gruenbacher authored
      Code cleanup which does the fd creation work separately from the userspace
      metadata creation.  It fits better with the other code.
      Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      22aa425d
    • fanotify: CONFIG_HAVE_SYSCALL_WRAPPERS for sys_fanotify_mark · 9bbfc964
      Heiko Carstens authored
      Please note that you need the patch below in addition, otherwise the
      syscall wrapper stuff won't work on those 32 bit architectures which enable
      the wrappers.
      
      When enabled the syscall wrapper defines always take long parameters and then
      cast them to whatever is needed. This approach doesn't work for the 32 bit
      case where the original syscall takes a long long parameter, since we would
      lose the upper 32 bits.
      So syscalls with 64 bit arguments are special cases wrt syscall wrappers
      and end up in the ugliness below (see also sys_fallocate). In addition these
      special cased syscall wrappers have the drawback that ftrace syscall tracing
      doesn't work on them, since they don't get defined by using the usual macros.
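
The truncation described above can be illustrated in plain C. This is only a sketch, not the kernel's wrapper code: `slot32_t` stands in for a 32 bit `long` argument slot, and `through_32bit_slot`, `split_u64` and `join_u64` are hypothetical helpers made up for this illustration.

```c
#include <stdint.h>

/* Stand-in for a 32 bit "long" syscall argument slot (assumption: ILP32). */
typedef uint32_t slot32_t;

/* Passing a 64 bit mask through a single 32 bit slot drops the upper half. */
uint64_t through_32bit_slot(uint64_t mask)
{
	slot32_t slot = (slot32_t)mask;	/* upper 32 bits are lost here */

	return (uint64_t)slot;
}

/* Split a 64 bit value into two 32 bit halves, one per argument slot. */
void split_u64(uint64_t v, slot32_t *lo, slot32_t *hi)
{
	*lo = (slot32_t)(v & 0xffffffffu);
	*hi = (slot32_t)(v >> 32);
}

/* Rebuild the 64 bit value on the receiving side. */
uint64_t join_u64(slot32_t lo, slot32_t hi)
{
	return ((uint64_t)hi << 32) | (uint64_t)lo;
}
```

Carrying the value as two halves is one way to avoid the loss; which halves go in which slot is a per-architecture detail this sketch does not model.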
      Signed-off-by: Eric Paris <eparis@redhat.com>
      9bbfc964
    • fanotify: select ANON_INODES. · ef601a9c
      Paul Mundt authored
      fanotify references anon_inode_getfd(), which is only available with
      ANON_INODES enabled. Presently this bails out with the following:
      
        LD      vmlinux
      fs/built-in.o: In function `sys_fanotify_init':
      (.text+0x26d1c): undefined reference to `anon_inode_getfd'
      make: *** [vmlinux] Error 1
      
      which is trivially corrected by adding an ANON_INODES select.
      Signed-off-by: Paul Mundt <lethal@linux-sh.org>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ef601a9c
    • fanotify: send events using read · a1014f10
      Eric Paris authored
      Send events to userspace by reading the file descriptor from fanotify_init().
      One will get blocks of data which look like:
      
      struct fanotify_event_metadata {
      	__u32 event_len;
      	__u32 vers;
      	__s32 fd;
      	__u64 mask;
      	__s64 pid;
      	__u64 cookie;
      } __attribute__ ((packed));
      
      Simple code to retrieve and deal with events is below:
      
      	while ((len = read(fan_fd, buf, sizeof(buf))) > 0) {
      		struct fanotify_event_metadata *metadata;
      
      		metadata = (void *)buf;
      		while (FAN_EVENT_OK(metadata, len)) {
      			/* process the event here */
      			if (metadata->fd >= 0 && close(metadata->fd) != 0)
      				goto fail;
      			metadata = FAN_EVENT_NEXT(metadata, len);
      		}
      	}
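
The buffer walk can also be sketched without a live fanotify fd. The struct below is a local mirror of the layout quoted in this message, and `demo_event_ok` and `demo_walk_events` are hypothetical stand-ins for what FAN_EVENT_OK and FAN_EVENT_NEXT are assumed to do (check that a whole record fits, then advance by event_len); they are not the real macros.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of the layout quoted in the commit message (36 bytes packed). */
struct demo_event_metadata {
	uint32_t event_len;
	uint32_t vers;
	int32_t  fd;
	uint64_t mask;
	int64_t  pid;
	uint64_t cookie;
} __attribute__((packed));

/*
 * Assumed behaviour of FAN_EVENT_OK: a record is usable only if its
 * header fits in the remaining bytes and event_len is sane.
 */
int demo_event_ok(const unsigned char *p, size_t remaining)
{
	struct demo_event_metadata m;

	if (remaining < sizeof(m))
		return 0;
	memcpy(&m, p, sizeof(m));	/* copy out to avoid unaligned access */
	return m.event_len >= sizeof(m) && m.event_len <= remaining;
}

/* Walk one read() buffer and OR together the masks of all complete events. */
size_t demo_walk_events(const unsigned char *buf, size_t len, uint64_t *masks)
{
	size_t count = 0;

	*masks = 0;
	while (demo_event_ok(buf, len)) {
		struct demo_event_metadata m;

		memcpy(&m, buf, sizeof(m));
		*masks |= m.mask;
		count++;
		/* Assumed behaviour of FAN_EVENT_NEXT: advance by event_len. */
		buf += m.event_len;
		len -= m.event_len;
	}
	return count;
}
```

A trailing partial record is simply not counted, which is why the length check happens before every step.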
      Signed-off-by: Eric Paris <eparis@redhat.com>
      a1014f10
    • fanotify: fanotify_mark syscall implementation · 2a3edf86
      Eric Paris authored
      NAME
      	fanotify_mark - add, remove, or modify an fanotify mark on a
      filesystem object
      
      SYNOPSIS
      	int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
      			  int dfd, const char *pathname)
      
      DESCRIPTION
      	fanotify_mark() is used to add, remove, or modify a mark on a filesystem
      	object.  Marks are used to indicate that the fanotify group is
      	interested in events which occur on that object.  At this point in
      	time marks may only be added to files and directories.
      
      	fanotify_fd must be a file descriptor returned by fanotify_init()
      
      	The flags field must contain exactly one of the following:
      
      	FAN_MARK_ADD - OR the bits in mask and ignored mask into the mark
      	FAN_MARK_REMOVE - bitwise remove the bits in mask and ignored mask
      		from the mark
      
      	The following values can be OR'd into the flags field:
      
      	FAN_MARK_DONT_FOLLOW - same meaning as O_NOFOLLOW as described in open(2)
      	FAN_MARK_ONLYDIR - same meaning as O_DIRECTORY as described in open(2)
      
      	dfd may be any of the following:
      	AT_FDCWD: the object will be looked up based on pathname similar
      		to open(2)
      
      	file descriptor of a directory: if pathname is not NULL the
      		object to modify will be looked up similar to openat(2)
      
      	file descriptor of the final object: if pathname is NULL the
      		object to modify will be the object referenced by dfd
      
      	The mask is the bitwise OR of the set of events of interest such as:
      	FAN_ACCESS		- object was accessed (read)
      	FAN_MODIFY		- object was modified (write)
      	FAN_CLOSE_WRITE		- object was writable and was closed
      	FAN_CLOSE_NOWRITE	- object was read only and was closed
      	FAN_OPEN		- object was opened
      	FAN_EVENT_ON_CHILD	- interested in events that happen to
      				  children.  Only relevant when the object
      				  is a directory
      	FAN_Q_OVERFLOW		- event queue overflowed (not implemented)
      
      RETURN VALUE
      	On success, this system call returns 0. On error, -1 is
      	returned, and errno is set to indicate the error.
      
      ERRORS
      	EINVAL An invalid value was specified in flags.
      
      	EINVAL An invalid value was specified in mask.
      
      	EINVAL An invalid value was specified in ignored_mask.
      
      	EINVAL fanotify_fd is not a file descriptor as returned by
      	fanotify_init()
      
      	EBADF fanotify_fd is not a valid file descriptor
      
      	EBADF dfd is not a valid file descriptor and pathname is NULL.
      
      	ENOTDIR dfd is not a directory and pathname is not NULL
      
      	EACCES no search permissions on some part of the path
      
      	ENOENT file not found
      
      	ENOMEM Insufficient kernel memory is available.
      
      CONFORMING TO
      	These system calls are Linux-specific.
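
The add/remove semantics of the flags field can be sketched as follows. This is an illustration under assumptions, not the syscall implementation: the `DEMO_` flag values are made up, `demo_update_mark` is a hypothetical helper, and the ignored mask is left out. The helper returns a negative errno value the way in-kernel code usually does; userspace would see -1 with errno set.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical flag values for illustration; not the kernel's definitions. */
#define DEMO_MARK_ADD     0x1u
#define DEMO_MARK_REMOVE  0x2u

/*
 * Apply a mark update to *mark_mask.
 * Returns 0 on success, or -EINVAL unless exactly one of ADD/REMOVE is set.
 */
int demo_update_mark(uint64_t *mark_mask, unsigned int flags, uint64_t mask)
{
	unsigned int op = flags & (DEMO_MARK_ADD | DEMO_MARK_REMOVE);

	switch (op) {
	case DEMO_MARK_ADD:
		*mark_mask |= mask;	/* OR the requested bits in */
		return 0;
	case DEMO_MARK_REMOVE:
		*mark_mask &= ~mask;	/* clear the requested bits */
		return 0;
	default:
		return -EINVAL;		/* neither or both were given */
	}
}
```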
      Signed-off-by: Eric Paris <eparis@redhat.com>
      2a3edf86
    • fanotify: sys_fanotify_mark declaration · bbaa4168
      Eric Paris authored
      This patch simply declares the new sys_fanotify_mark syscall
      
      int fanotify_mark(int fanotify_fd, unsigned int flags, u64 mask,
      		  int dfd, const char *pathname)
      Signed-off-by: Eric Paris <eparis@redhat.com>
      bbaa4168
    • fanotify: fanotify_init syscall implementation · 52c923dd
      Eric Paris authored
      NAME
      	fanotify_init - initialize an fanotify group
      
      SYNOPSIS
      	int fanotify_init(unsigned int flags, unsigned int event_f_flags, int priority);
      
      DESCRIPTION
      	fanotify_init() initializes a new fanotify instance and returns a file
      	descriptor associated with the new fanotify event queue.
      
      	The following values can be OR'd into the flags field:
      
      	FAN_NONBLOCK Set the O_NONBLOCK file status flag on the new open file description.
      		Using this flag saves extra calls to fcntl(2) to achieve the same
      		result.
      
      	FAN_CLOEXEC Set the close-on-exec (FD_CLOEXEC) flag on the new file descriptor.
      		See the description of the O_CLOEXEC flag in open(2) for reasons why
      		this may be useful.
      
      	The event_f_flags argument is unused and must be set to 0
      
      	The priority argument is unused and must be set to 0
      
      RETURN VALUE
      	On success, this system call returns a new file descriptor. On error, -1 is
      	returned, and errno is set to indicate the error.
      
      ERRORS
      	EINVAL An invalid value was specified in flags.
      
      	EINVAL A non-zero value was passed in event_f_flags or in priority
      
      	ENFILE The system limit on the total number of file descriptors has been reached.
      
      	ENOMEM Insufficient kernel memory is available.
      
      CONFORMING TO
      	These system calls are Linux-specific.
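
The argument rules in the ERRORS section amount to a small validation step, sketched below. The `DEMO_` flag values and `demo_check_init_args` are hypothetical and only model the checks described in this text; the real syscall goes on to create the group and the file descriptor.

```c
#include <errno.h>

/* Hypothetical flag values for illustration only. */
#define DEMO_CLOEXEC        0x1u
#define DEMO_NONBLOCK       0x2u
#define DEMO_ALL_INIT_FLAGS (DEMO_CLOEXEC | DEMO_NONBLOCK)

/* Returns 0 if the arguments are acceptable, -EINVAL otherwise. */
int demo_check_init_args(unsigned int flags, unsigned int event_f_flags,
			 int priority)
{
	if (flags & ~DEMO_ALL_INIT_FLAGS)
		return -EINVAL;	/* unknown bit in flags */
	if (event_f_flags != 0 || priority != 0)
		return -EINVAL;	/* both are unused and must be zero */
	return 0;
}
```

Rejecting non-zero values in the unused arguments keeps them available for later extension without breaking existing callers.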
      Signed-off-by: Eric Paris <eparis@redhat.com>
      52c923dd
    • fanotify: fanotify_init syscall declaration · 11637e4b
      Eric Paris authored
      This patch defines a new syscall fanotify_init() of the form:
      
      int sys_fanotify_init(unsigned int flags, unsigned int event_f_flags,
      		      unsigned int priority)
      
      This syscall is used to create an fanotify group.  This is very similar to
      the inotify_init() syscall.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      11637e4b
    • fanotify: do not clone on merge unless needed · 9dced01a
      Eric Paris authored
      Currently if 2 events are going to be merged on the notification queue with
      different masks the second event will be cloned and will replace the first
      event.  However if this notification queue is the only place referencing
      the event in question there is no reason not to just update the event in
      place.  We can tell this if the event->refcnt == 1.  Since we hold a
      reference for each queue this event is on we know that when refcnt == 1
      this is the only queue.  The other concern is that it might be about to be
      added to a new queue, but this can't be the case since fsnotify holds a
      reference on the event until it is finished adding it to queues.
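
The refcnt == 1 shortcut can be sketched in userspace C. `struct demo_event` and `demo_merge` are hypothetical and omit all locking; the sketch only shows the decision between updating in place and cloning.

```c
#include <stdint.h>
#include <stdlib.h>

struct demo_event {
	uint64_t mask;
	int refcnt;	/* one reference per queue holding the event */
};

/*
 * Merge new_mask into the event stored in *slot.
 * Returns 1 if the event was updated in place, 0 if a clone replaced it,
 * or -1 if the clone could not be allocated.
 */
int demo_merge(struct demo_event **slot, uint64_t new_mask)
{
	struct demo_event *old = *slot;
	struct demo_event *clone;

	if (old->refcnt == 1) {
		/* Only this queue references the event: safe to edit it. */
		old->mask |= new_mask;
		return 1;
	}

	/* Shared with other queues: leave the original untouched. */
	clone = malloc(sizeof(*clone));
	if (!clone)
		return -1;
	clone->mask = old->mask | new_mask;
	clone->refcnt = 1;
	old->refcnt--;		/* this queue drops its reference */
	*slot = clone;
	return 0;
}
```

The in-place path avoids an allocation and a list replacement, which is the saving this patch is after.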
      Signed-off-by: Eric Paris <eparis@redhat.com>
      9dced01a
    • fanotify: merge notification events with different masks · a12a7dd3
      Eric Paris authored
      Instead of just merging fanotify events if they are exactly the same, merge
      notification events with different masks.  To do this we have to clone the
      old event, update the mask in the new event with the new merged mask, and
      put the new event in place of the old event.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      a12a7dd3
    • fanotify: drop notifications if they exist in the outgoing queue · 767cd46c
      Eric Paris authored
      fanotify listeners get an open file descriptor to the object in question so
      the ordering of operations is not as important as in other notification
      systems.  inotify will drop events if the last event in the event FIFO is
      the same as the current event.  This patch will drop fanotify events if
      they are the same as another event anywhere in the event FIFO.
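
The difference between the two duplicate checks can be sketched with a plain array standing in for the event FIFO. `struct demo_queued`, `demo_dup_of_tail` and `demo_dup_anywhere` are hypothetical names, and "same event" is reduced here to the same object and the same mask.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_queued {
	int object_id;	/* hypothetical stand-in for the object in question */
	uint64_t mask;
};

/* inotify-style check: only the most recently queued event is compared. */
int demo_dup_of_tail(const struct demo_queued *q, size_t n,
		     struct demo_queued ev)
{
	return n > 0 && q[n - 1].object_id == ev.object_id &&
	       q[n - 1].mask == ev.mask;
}

/* fanotify-style check: a matching event anywhere in the queue counts. */
int demo_dup_anywhere(const struct demo_queued *q, size_t n,
		      struct demo_queued ev)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (q[i].object_id == ev.object_id && q[i].mask == ev.mask)
			return 1;
	return 0;
}
```

The whole-queue scan costs more per event, a cost this sketch does not try to measure.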
      Signed-off-by: Eric Paris <eparis@redhat.com>
      767cd46c
    • fanotify: fscking all notification system · ff0b16a9
      Eric Paris authored
      fanotify is a novel file notification system which bases notification on
      giving userspace both an event type (open, close, read, write) and an open
      file descriptor to the object in question.  This should address a number of
      races and problems with other notification systems like inotify and dnotify
      and should allow the future implementation of blocking or access controlled
      notification.  These are useful for on-access scanners or hierarchical storage
      management schemes.
      
      This patch just implements the basics of the fsnotify functions.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ff0b16a9
    • fanotify: FMODE_NONOTIFY and __O_SYNC in sparc conflict · 12ed2e36
      Wu Fengguang authored
      sparc used the same value as FMODE_NONOTIFY so change FMODE_NONOTIFY to be
      something unique.
      Signed-off-by: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      12ed2e36
    • vfs: introduce FMODE_NONOTIFY · ecf081d1
      Eric Paris authored
      This is a new f_mode which can only be set by the kernel.  It indicates
      that the fd was opened by fanotify and should not cause future fanotify
      events.  This is needed to prevent fanotify livelock.  An example of
      obvious livelock is from fanotify close events.
      
      Process A closes file1
      This creates a close event for file1.
      fanotify opens file1 for Listener X
      Listener X deals with the event and closes its fd for file1.
      This creates a close event for file1.
      fanotify opens file1 for Listener X
      Listener X deals with the event and closes its fd for file1.
      This creates a close event for file1.
      fanotify opens file1 for Listener X
      Listener X deals with the event and closes its fd for file1.
      Notice a pattern?
      
      The fix is to add the FMODE_NONOTIFY bit to the open filp done by the kernel
      for fanotify.  Thus when that file is used it will not generate future
      events.
      
      This patch simply defines the bit.
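
The loop above can be modelled with a toy simulation. `DEMO_NONOTIFY` and `demo_close_events` are hypothetical; the model only counts how many close events a single application close() leads to, with a cap so that the livelock case terminates.

```c
#define DEMO_NONOTIFY 0x1u	/* hypothetical stand-in for FMODE_NONOTIFY */

/*
 * Count the close events caused by one application close().
 * Each event hands the listener an fd which it later closes; that close
 * raises another event unless the fd carries DEMO_NONOTIFY.
 */
unsigned int demo_close_events(unsigned int listener_fd_mode,
			       unsigned int max_events)
{
	unsigned int events = 0;
	int pending = 1;	/* Process A closed file1 */

	while (pending && events < max_events) {
		events++;	/* listener receives an event and an fd */
		/* the listener closes that fd when it is done with it */
		pending = !(listener_fd_mode & DEMO_NONOTIFY);
	}
	return events;
}
```

With the bit set the count stays at one; without it the count only stops at the artificial cap.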
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ecf081d1
    • fsnotify: take inode->i_lock inside fsnotify_find_mark_entry() · 35566087
      Andreas Gruenbacher authored
      All callers to fsnotify_find_mark_entry() except one take and
      release inode->i_lock around the call.  Take the lock inside
      fsnotify_find_mark_entry() instead.
      Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      35566087
    • dnotify: rename mark_entry to mark · ef5e2b78
      Eric Paris authored
      Nomenclature change.  We used to call things 'entries' but now we just call
      them 'marks.'  Do those changes for dnotify.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ef5e2b78
    • inotify: rename mark_entry to just mark · 000285de
      Eric Paris authored
      Rename anything in inotify that deals with mark_entry to just be mark.  It
      makes a lot more sense.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      000285de
    • fsnotify: rename mark_entry to just mark · 841bdc10
      Eric Paris authored
      Previously I used mark_entry when talking about marks on inodes.  The
      _entry is pretty useless.  Just use "mark" instead.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      841bdc10
    • fsnotify: rename fsnotify_find_mark_entry to fsnotify_find_mark · d0775441
      Eric Paris authored
      The _entry portion of fsnotify functions is useless.  Drop it.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      d0775441
    • fsnotify: rename fsnotify_mark_entry to just fsnotify_mark · e61ce867
      Eric Paris authored
      The name is long and it serves no real purpose.  So rename
      fsnotify_mark_entry to just fsnotify_mark.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      e61ce867
    • fsnotify: kill FSNOTIFY_EVENT_FILE · 72acc854
      Andreas Gruenbacher authored
      Some fsnotify operations send a struct file.  This is more information than
      we technically need.  Send a struct path in all cases instead of
      sometimes a path and sometimes a file.
      Signed-off-by: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Eric Paris <eparis@redhat.com>
      72acc854
    • fsnotify: add flags to fsnotify_mark_entries · 098cf2fc
      Eric Paris authored
      To differentiate between inode and vfsmount (or other future) types of
      marks we add a flags field and set the inode bit on inode marks (the only
      currently supported type of mark)
      Signed-off-by: Eric Paris <eparis@redhat.com>
      098cf2fc
    • fsnotify: add vfsmount specific fields to the fsnotify_mark_entry union · 4136510d
      Eric Paris authored
      vfsmount marks need mostly the same data as inode specific fields, but for
      consistency and understandability we put that data in a vfsmount specific
      struct inside a union with inode specific data.
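
A mark with per-type data in a union, selected by a flags field, could look roughly like the sketch below. All names (`demo_mark`, `DEMO_MARK_FLAG_*`, the member names) and the contents of each struct are assumptions made for illustration; the real fsnotify_mark layout is not reproduced here.

```c
#include <stddef.h>

#define DEMO_MARK_FLAG_INODE     0x01u	/* hypothetical values */
#define DEMO_MARK_FLAG_VFSMOUNT  0x02u

/* Minimal stand-ins for the kernel objects a mark can be attached to. */
struct demo_inode { unsigned long ino; };
struct demo_vfsmount { int mnt_id; };

struct demo_mark {
	unsigned int flags;	/* says which union member is valid */
	union {
		struct { struct demo_inode *inode; } i;		/* inode marks */
		struct { struct demo_vfsmount *mnt; } m;	/* vfsmount marks */
	} u;
};

void demo_init_inode_mark(struct demo_mark *mark, struct demo_inode *inode)
{
	mark->flags = DEMO_MARK_FLAG_INODE;
	mark->u.i.inode = inode;
}

/* Return the inode for an inode mark, or NULL for any other mark type. */
struct demo_inode *demo_mark_inode(const struct demo_mark *mark)
{
	return (mark->flags & DEMO_MARK_FLAG_INODE) ? mark->u.i.inode : NULL;
}
```

Checking the flag before touching a union member is what keeps the two kinds of mark from being confused.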
      Signed-off-by: Eric Paris <eparis@redhat.com>
      4136510d
    • fsnotify: put inode specific fields in an fsnotify_mark in a union · 2823e04d
      Eric Paris authored
      The addition of marks on vfs mounts will be simplified if the inode
      specific parts of a mark and the vfsmnt specific parts of a mark are
      actually in a union so naming can be easy.  This patch just implements the
      inode struct and the union.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      2823e04d
    • fsnotify: include vfsmount in should_send_event when appropriate · 3a9fb89f
      Eric Paris authored
      To ensure that a group does not duplicate an event it receives via both
      the vfsmount and the inode should_send_event tests, we should distinguish
      those two cases.  We pass a vfsmount to this function so groups can make
      their own determinations.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      3a9fb89f
    • fsnotify: mount point listeners list and global mask · 7131485a
      Eric Paris authored
      Currently all of the notification systems implemented select which inodes
      they care about and receive messages only about those inodes (or the
      children of those inodes.)  This patch begins to flesh out fsnotify support
      for the concept of listeners that want to hear notification for an inode
      accessed below a given mount point.  This patch implements a second list
      of fsnotify groups to hold these types of groups and a second global mask
      to hold the events of interest for this type of group.
      
      The reason we want a second group list and mask is that the inode based
      notification should_send_event support makes each group look for a mark
      on the given inode.  With one vfsmount listener that means that every group
      would have to take the inode->i_lock, look for their mark, not find one, and
      return for every operation.  By separating vfsmount from inode listeners, the
      inode groups only have to look for their mark and take the inode lock when
      there is an inode listener.  vfsmount listeners will have to grab the lock and
      look for a mark but there should be fewer of them, and one vfsmount listener
      won't cause the i_lock to be grabbed and released for every fsnotify group
      on every io operation.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      7131485a
    • fsnotify: add groups to fsnotify_inode_groups when registering inode watch · 4ca76352
      Eric Paris authored
      Currently all fsnotify groups are added immediately to the
      fsnotify_inode_groups list upon creation.  This means, even groups with no
      watches (common for audit) will be on the global tracking list and will
      get checked for every event.  This patch adds groups to the global list only
      when the first inode mark is added to the group.
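
The deferred registration can be sketched with a fixed-size array in place of the kernel's list. `demo_group`, `demo_inode_groups` and `demo_add_inode_mark` are hypothetical names, and locking is left out entirely.

```c
#include <stddef.h>

#define DEMO_MAX_GROUPS 8

struct demo_group {
	unsigned int num_inode_marks;
	int on_inode_list;	/* non-zero once the group is globally tracked */
};

/* Stand-in for the global list that is walked for every event. */
struct demo_group *demo_inode_groups[DEMO_MAX_GROUPS];
size_t demo_inode_group_count;

/* Add an inode mark; register the group globally on its first mark only. */
int demo_add_inode_mark(struct demo_group *g)
{
	if (!g->on_inode_list) {
		if (demo_inode_group_count == DEMO_MAX_GROUPS)
			return -1;	/* demo list is full */
		demo_inode_groups[demo_inode_group_count++] = g;
		g->on_inode_list = 1;
	}
	g->num_inode_marks++;
	return 0;
}
```

A group that never adds a mark never appears on the list, so the per-event walk skips it at no cost.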
      Signed-off-by: Eric Paris <eparis@redhat.com>
      4ca76352
    • fsnotify: initialize the group->num_marks in a better place · 36fddeba
      Eric Paris authored
      Currently the comments say that group->num_marks is held because the group
      is on the fsnotify_group list.  This isn't strictly the case, we really
      just hold the num_marks for the life of the group (any time group->refcnt
      is != 0)  This patch moves the initialization stuff and makes it clear when
      it is really being held.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      36fddeba
    • fsnotify: rename fsnotify_groups to fsnotify_inode_groups · 19c2a0e1
      Eric Paris authored
      Simple renaming patch.  fsnotify is about to support mount point listeners
      so I am renaming fsnotify_groups and fsnotify_mask to indicate these are lists
      used only for groups which have watches on inodes.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      19c2a0e1
    • fsnotify: drop mask argument from fsnotify_alloc_group · 0d2e2a1d
      Eric Paris authored
      Nothing uses the mask argument to fsnotify_alloc_group.  This patch drops
      that argument.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      0d2e2a1d
    • Audit: only set group mask when something is being watched · 220d14df
      Eric Paris authored
      Currently the audit watch group always sets a mask equal to all events it
      might care about.  We instead should only set the group mask if we are
      actually watching inodes.  This should be a perf win when audit watches are
      compiled in.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      220d14df
    • fsnotify: fsnotify_obtain_group should be fsnotify_alloc_group · ffab8340
      Eric Paris authored
      fsnotify_obtain_group was intended to be able to find an already existing
      group.  Nothing uses that functionality.  This just renames it to
      fsnotify_alloc_group so it is clear what it is doing.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      ffab8340
    • fsnotify: fsnotify_obtain_group kzalloc cleanup · cd7752ce
      Eric Paris authored
      fsnotify_obtain_group uses kzalloc but then proceeds to set things to 0.
      This patch just deletes those useless lines.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      cd7752ce
    • fsnotify: remove group_num altogether · 74be0cc8
      Eric Paris authored
      The original fsnotify interface had a group_num which was intended to be
      able to find a group after it was added.  I no longer think this is a
      necessary thing to do and so we remove the group_num.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      74be0cc8
    • fsnotify: lock annotation for event replacement · cac69dad
      Eric Paris authored
      fsnotify_replace_event needs to lock both the old and the new event.  This
      causes lockdep to get all pissed off since it doesn't know this is safe.
      It's safe in this case since the new event cannot be reached from
      other places in the kernel.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      cac69dad
    • fsnotify: replace an event on a list · 1201a536
      Eric Paris authored
      fanotify would like to clone events already on its notification list, make
      changes to the new event, and then replace the old event on the list with
      the new event.  This patch implements the replace functionality of that
      process.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      1201a536
    • fsnotify: clone existing events · b4e4e140
      Eric Paris authored
      fsnotify_clone_event will take an event, clone it, and return the cloned
      event to the caller.  Since events may be in use by multiple fsnotify
      groups simultaneously certain event entries (such as the mask) cannot be
      changed after the event was created.  Since fanotify would like to merge
      events happening on the same file it needs a new clean event to work with
      so it can change any fields it wishes.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      b4e4e140
    • fsnotify: per group notification queue merge types · 74766bbf
      Eric Paris authored
      inotify only wishes to merge a new event with the last event on the
      notification fifo.  fanotify is willing to merge any events including by
      means of bitwise OR masks of multiple events together.  This patch moves
      the inotify event merging logic out of the generic fsnotify notification.c
      and into the inotify code.  This allows each use of fsnotify to provide
      their own merge functionality.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      74766bbf
    • fsnotify: send struct file when sending events to parents when possible · 28c60e37
      Eric Paris authored
      fanotify needs a path in order to open an fd to the object which changed.
      Currently notifications to inode's parents are done using only the inode.
      For some parental notification we have the entire file, send that so
      fanotify can use it.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      28c60e37
    • fsnotify: pass a file instead of an inode to open, read, and write · 2a12a9d7
      Eric Paris authored
      fanotify, the upcoming notification system, actually needs a struct path so it can
      do opens in the context of listeners, and it needs a file so it can get f_flags
      from the original process.  Close was the only operation that already was passing
      a struct file to the notification hook.  This patch passes a file for access,
      modify, and open as well, since they are easily available to these hooks.
      Signed-off-by: Eric Paris <eparis@redhat.com>
      2a12a9d7