commit 0814c1e1, authored Jan 15, 2018 by Kirill Smelkov

    go/zodb/fs1: My notes on I/O

parent d232237e
1 changed file: go/zodb/storage/fs1/notes_io.txt (new file, +135 -0)

Notes on Input/Output
---------------------
Several options are available here:
pread
~~~~~
The kernel handles both disk I/O and caching (in pagecache).
For the hot-cache case:

    Cost = C(pread(n)) = α + β⋅n

    α - syscall cost
    β - cost to copy 1 byte (both src and dst are in cache)

α is quite big ≈ (200 - 300) ns
α/β ≈ (2 - 3.5) · 10^4
see details here: https://github.com/golang/go/issues/19563
thus:

- the cost to pread 1 page (n ≈ 4·10^3) is ~ (1.1 - 1.2)·α
- the cost to copy 1 page is ~ (0.1 - 0.2)·α

If there are many small reads and a syscall is made for each read, it works
slowly because α is big.
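
The α vs β⋅n split can be sanity-checked with a small Go micro-benchmark; a
minimal sketch, assuming a pre-existing test file (os.File.ReadAt issues
pread(2) on Linux; file path and names are placeholders):

    package fs1 // sketch; benchmarks would normally live in a _test.go file

    import (
        "fmt"
        "os"
        "testing"
    )

    // BenchmarkPread compares pread of 1 byte vs 1 page: with a hot cache
    // the 1-byte case measures ~α (syscall cost), and the difference to the
    // 1-page case measures ~β⋅n (copy cost).
    func BenchmarkPread(b *testing.B) {
        f, err := os.Open("testdata/data.fs") // hypothetical test file
        if err != nil {
            b.Fatal(err)
        }
        defer f.Close()
        for _, n := range []int{1, 4096} {
            buf := make([]byte, n)
            b.Run(fmt.Sprintf("n=%d", n), func(b *testing.B) {
                for i := 0; i < b.N; i++ {
                    if _, err := f.ReadAt(buf, 0); err != nil {
                        b.Fatal(err)
                    }
                }
            })
        }
    }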
pread + user-buffer
~~~~~~~~~~~~~~~~~~~
It is possible to mitigate the high α by buffering data from bigger reads in
user space and serving smaller client reads by copying data from that buffer.
Math to get optimal parameters:

( note: here S is what α is above - syscall time, and C is what β is above -
  1-byte copy time )

- we are reading N bytes sequentially
- consider 2 variants:

  a) 1 syscall + 1 big copy + use the copy for smaller reads

        cost: S + C⋅N + C⋅N

  b) direct access in x-byte chunks

        cost: S⋅(N/x) + C⋅N

Q: when is direct access cheaper?

   b ≤ a  ⇔  S⋅(N/x) ≤ S + C⋅N, thus

        x ≥ N⋅S / (C⋅N + S) ,  or

        x ≥ α⋅N / (α + N) ,  with α = S/C

Q: when reading in x-byte chunks, at what N does buffering (variant a) become
   cheaper?

        N ≥ α⋅x / (α - x)    (for x < α; if x ≥ α, direct access is always cheaper)

   e.g. with α ≈ 3·10^4 and x = 1 page ≈ 4·10^3, buffering already pays off
   for N ≳ 4.7·10^3 bytes.
----
Performance depends on the buffer hit/miss ratio and will be evaluated for a
simple 1-page buffer, as sketched below.
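
A minimal sketch of such a 1-page buffer on top of io.ReaderAt (the type and
function names and the fixed 4 KiB page are assumptions for illustration;
reads that are big or cross a page boundary fall back to direct pread, per
the math above):

    package fs1 // sketch; package name assumed

    import "io"

    const page = 4096 // buffer size: 1 page, as assumed above

    type bufReaderAt struct {
        r   io.ReaderAt
        pos int64 // file offset of buf[0]; -1 when the buffer is empty
        n   int   // number of valid bytes in buf
        buf [page]byte
    }

    func newBufReaderAt(r io.ReaderAt) *bufReaderAt {
        return &bufReaderAt{r: r, pos: -1}
    }

    func (b *bufReaderAt) ReadAt(p []byte, off int64) (int, error) {
        // reads that are big or cross a page boundary go direct -
        // per the math above buffering does not pay off for them
        if int64(len(p)) > page-off%page {
            return b.r.ReadAt(p, off)
        }
        // buffer miss -> refill with the page containing off
        if b.pos < 0 || off < b.pos || off+int64(len(p)) > b.pos+int64(b.n) {
            b.pos = off &^ (page - 1) // align down to page boundary
            n, err := b.r.ReadAt(b.buf[:], b.pos)
            if err != nil && err != io.EOF {
                b.pos, b.n = -1, 0
                return 0, err
            }
            b.n = n
        }
        start := off - b.pos
        if start >= int64(b.n) {
            return 0, io.EOF // request lies fully past EOF
        }
        n := copy(p, b.buf[start:b.n])
        if n < len(p) {
            return n, io.EOF // partial read at EOF
        }
        return n, nil
    }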
mmap
~~~~
The kernel handles both disk I/O and caching (in pagecache).
XXX the cost of minor pagefault is ~ 5.5·α http://marc.info/?l=linux-kernel&m=149002565506733&w=2
Cost ~ α (FIXME see ^^^) is spent on first-time access.
Future accesses to the page, given it is still in the page-cache, do not incur the α cost.
However, I/O errors are reported as SIGBUS on memory access. Thus, if a
pointer into mmapped memory is returned for a read request, clients could get
I/O errors as exceptions potentially anywhere.

To get & check I/O errors at actual read-request time, the read service will
thus need to access and copy data from the mmapped memory into another
buffer, incurring the β⋅n cost in the hot-cache case.
Not doing the copy can lead to a situation where data is first read/checked
OK by the read service, then evicted from the page-cache by the kernel, then
accessed by the client, which causes real disk I/O; if this I/O fails, the
client gets SIGBUS.

Another potential disadvantage: if a memory access causes disk I/O, the whole
OS thread is blocked, not only the goroutine which issued the access.
Note: madvise should be used to guide kernel cache read-ahead/read-behind, or
to hint where we plan to access data next. madvise is a syscall, so this can
add α back.
Link on the subject - how to copy/catch SIGBUS & not block the calling thread:
https://groups.google.com/d/msg/golang-nuts/11rdExWP6ac/226CPanVBAAJ
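
A minimal sketch of that technique in Go, assuming the src slice comes from
e.g. unix.Mmap (golang.org/x/sys/unix); the function name is hypothetical.
debug.SetPanicOnFault (which applies to the current goroutine) turns the
SIGBUS fault into a recoverable panic instead of a process crash:

    package fs1 // sketch; package name assumed

    import (
        "fmt"
        "runtime/debug"
    )

    // copyFromMmap copies data from an mmapped region into dst, converting
    // a SIGBUS on page-in (e.g. a real disk I/O error) into an ordinary
    // error instead of crashing the process.
    func copyFromMmap(dst, src []byte) (err error) {
        old := debug.SetPanicOnFault(true)
        defer debug.SetPanicOnFault(old)
        defer func() {
            if r := recover(); r != nil {
                err = fmt.Errorf("mmap read: I/O error: %v", r)
            }
        }()
        copy(dst, src) // may fault if a page cannot be read in
        return nil
    }

Note that the faulting access itself can still block the whole OS thread on
disk I/O, as noted above.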
...
Direct I/O
~~~~~~~~~~
Kernel handles disk I/O directly to user-space memory.
The kernel does not handle caching.
Cache must be implemented in user-space.
pros:
- the kernel is accessed only when there is a real need for disk I/O.
- memory can be managed completely by "us" in userspace.
- what to cache and preload can be more integrated with client workload.
- various copy disciplines for reads are possible,
  including providing a pointer to in-cache data to clients (though this
  requires implementing ref-counting and such)
cons:
- harder to implement
- Linus dislikes Direct I/O very much
- probably more kernel bugs, as this is a more exotic area
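
A minimal sketch of a Direct I/O read on Linux (O_DIRECT requires the buffer,
file offset and length to be aligned, typically to the logical block size;
the alignment constant and helper names here are assumptions):

    package fs1 // sketch; package name assumed

    import (
        "os"
        "syscall"
        "unsafe"
    )

    const align = 4096 // assumed alignment; query the device for the real value

    // alignedBuf returns a size-byte slice whose base address is align-byte
    // aligned, as required for O_DIRECT reads.
    func alignedBuf(size int) []byte {
        buf := make([]byte, size+align)
        off := int(uintptr(unsafe.Pointer(&buf[0])) & (align - 1))
        if off != 0 {
            off = align - off
        }
        return buf[off : off+size]
    }

    // readDirect reads size bytes at fileOff, bypassing the page cache.
    // fileOff and size must themselves be multiples of align.
    func readDirect(path string, fileOff int64, size int) ([]byte, error) {
        f, err := os.OpenFile(path, os.O_RDONLY|syscall.O_DIRECT, 0)
        if err != nil {
            return nil, err
        }
        defer f.Close()
        buf := alignedBuf(size)
        n, err := f.ReadAt(buf, fileOff)
        return buf[:n], err
    }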