block: Adding ROW scheduling algorithm

This patch adds the implementation of a new scheduling algorithm - ROW. The policy of this algorithm is to prioritize READ requests over WRITE as much as possible without starving the WRITE requests. Change-Id: I4ed52ea21d43b0e7c0769b2599779a3d3869c519 Signed-off-by: Tatyana Brokhman <[email protected]>
gtel3g · Nov 13, 2018 · 541f280 · 541f280
1 parent 0b284ee
commit 541f280
Show file tree

Hide file tree

Showing 4 changed files with 823 additions and 0 deletions.
diff --git a/Documentation/block/row-iosched.txt b/Documentation/block/row-iosched.txt
@@ -0,0 +1,117 @@
+Introduction
+============
+
+The ROW scheduling algorithm will be used in mobile devices as default
+block layer IO scheduling algorithm. ROW stands for "READ Over WRITE"
+which is the main requests dispatch policy of this algorithm.
+
+The ROW IO scheduler was developed with the mobile devices needs in
+mind. In mobile devices we favor user experience upon everything else,
+thus we want to give READ IO requests as much priority as possible.
+The main idea of the ROW scheduling policy is:
+If there are READ requests in pipe - dispatch them but don't starve
+the WRITE requests too much.
+
+Software description
+====================
+The requests are kept in queues according to their priority. The
+dispatching of requests is done in a Round Robin manner with a
+different slice for each queue. The dispatch quantum for a specific
+queue is defined according to the queues priority. READ queues are
+given bigger dispatch quantum than the WRITE queues, within a dispatch
+cycle.
+
+At the moment there are 6 types of queues the requests are
+distributed to:
+-	High priority READ queue
+-	High priority Synchronous WRITE queue
+-	Regular priority READ queue
+-	Regular priority Synchronous WRITE queue
+-	Regular priority WRITE queue
+-	Low priority READ queue
+
+If in a certain dispatch cycle one of the queues was empty and didn't
+use its quantum that queue will be marked as "un-served". If we're in a
+middle of a dispatch cycle dispatching from queue Y and a request
+arrives for queue X that was un-served in the previous cycle, if X's
+priority is higher than Y's, queue X will be preempted in the favor of
+queue Y. This won't mean that cycle is restarted. The "dispatched"
+counter of queue X will remain unchanged. Once queue Y uses up it's quantum
+(or there will be no more requests left on it) we'll switch back to queue X
+and allow it to finish it's quantum.
+
+For READ requests queues we allow idling in within a dispatch quantum in
+order to give the application a chance to insert more requests. Idling
+means adding some extra time for serving a certain queue even if the
+queue is empty. The idling is enabled if we identify the application is
+inserting requests in a high frequency.
+
+For idling on READ queues we use timer mechanism. When the timer expires,
+if there are requests in the scheduler we will signal the underlying driver
+(for example the MMC driver) to fetch another request for dispatch.
+
+The ROW algorithm takes the scheduling policy one step further, making
+it a bit more "user-needs oriented", by allowing the application to
+hint on the urgency of its requests. For example: even among the READ
+requests several requests may be more urgent for completion then others.
+The former will go to the High priority READ queue, that is given the
+bigger dispatch quantum than any other queue.
+
+ROW scheduler will support special services for block devices that
+supports High Priority Requests. That is, the scheduler may inform the
+device upon urgent requests using new callback make_urgent_request.
+In addition it will support rescheduling of requests that were
+interrupted. For example, if the device issues a long write request and
+a sudden high priority read interrupt pops in, the scheduler will
+inform the device about the urgent request, so the device can stop the
+current write request and serve the high priority read request. In such
+a case the device may also send back to the scheduler the reminder of
+the interrupted write request, such that the scheduler may continue
+sending high priority requests without the need to interrupt the
+ongoing write again and again. The write remainder will be sent later on
+according to the scheduler policy.
+
+Design
+======
+Existing algorithms (cfq, deadline) sort the io requests according LBA.
+When deciding on the next request to dispatch they choose the closest
+request to the current disk head position (from handling last
+dispatched request). This is done in order to reduce the disk head
+movement to a minimum.
+We feel that this functionality isn't really needed in mobile devices.
+Usually applications that write/read large chunks of data insert the
+requests in already sorted LBA order. Thus dealing with sort trees adds
+unnecessary complexity.
+
+We're planing to try this enhancement in the future to check if the
+performance is influenced by it.
+
+SMP/multi-core
+==============
+At the moment the code is acceded from 2 contexts:
+- Application context (from block/elevator layer): adding the requests.
+- Underlying driver context (for example the mmc driver thread): dispatching
+  the requests and notifying on completion.
+
+One lock is used to synchronize between the two. This lock is provided
+by the underlying driver along with the dispatch queue.
+
+Config options
+==============
+1. hp_read_quantum: dispatch quantum for the high priority READ queue
+2. rp_read_quantum: dispatch quantum for the regular priority READ queue
+3. hp_swrite_quantum: dispatch quantum for the high priority Synchronous
+   WRITE queue
+4. rp_swrite_quantum: dispatch quantum for the regular priority
+   Synchronous WRITE queue
+5. rp_write_quantum: dispatch quantum for the regular priority WRITE
+   queue
+6. lp_read_quantum: dispatch quantum for the low priority READ queue
+7. lp_swrite_quantum: dispatch quantum for the low priority Synchronous
+   WRITE queue
+8. read_idle: how long to idle on read queue in Msec (in case idling
+   is enabled on that queue).
+9. read_idle_freq: frequency of inserting READ requests that will
+   trigger idling. This is the time in Msec between inserting two READ
+   requests
+
diff --git a/block/Kconfig.iosched b/block/Kconfig.iosched
@@ -21,6 +21,17 @@ config IOSCHED_DEADLINE
 	  a new point in the service tree and doing a batch of IO from there
 	  in case of expiry.
 
+config IOSCHED_ROW
+	tristate "ROW I/O scheduler"
+	default y
+	---help---
+	  The ROW I/O scheduler gives priority to READ requests over the
+	  WRITE requests when dispatching, without starving WRITE requests.
+	  Requests are kept in priority queues. Dispatching is done in a RR
+	  manner when the dispatch quantum for each queue is calculated
+	  according to queue priority.
+	  Most suitable for mobile devices.
+
 config IOSCHED_CFQ
 	tristate "CFQ I/O scheduler"
 	default y
@@ -70,6 +81,16 @@ choice
 	config DEFAULT_DEADLINE
 		bool "Deadline" if IOSCHED_DEADLINE=y
 
+	config DEFAULT_ROW
+		bool "ROW" if IOSCHED_ROW=y
+		help
+		  The ROW I/O scheduler gives priority to READ requests
+		  over the WRITE requests when dispatching, without starving
+		  WRITE requests. Requests are kept in priority queues.
+		  Dispatching is done in a RR manner when the dispatch quantum
+		  for each queue is defined according to queue priority.
+		  Most suitable for mobile devices.
+
 	config DEFAULT_CFQ
 		bool "CFQ" if IOSCHED_CFQ=y
 
@@ -91,6 +112,7 @@ endchoice
 config DEFAULT_IOSCHED
 	string
 	default "deadline" if DEFAULT_DEADLINE
+	default "row" if DEFAULT_ROW
 	default "cfq" if DEFAULT_CFQ
 	default "noop" if DEFAULT_NOOP
 	default "bfq" if DEFAULT_BFQ

diff --git a/block/Makefile b/block/Makefile
@@ -14,6 +14,7 @@ obj-$(CONFIG_BLK_CGROUP)	+= blk-cgroup.o
 obj-$(CONFIG_BLK_DEV_THROTTLING)	+= blk-throttle.o
 obj-$(CONFIG_IOSCHED_NOOP)	+= noop-iosched.o
 obj-$(CONFIG_IOSCHED_DEADLINE)	+= deadline-iosched.o
+obj-$(CONFIG_IOSCHED_ROW)	+= row-iosched.o
 obj-$(CONFIG_IOSCHED_CFQ)	+= cfq-iosched.o
 obj-$(CONFIG_IOSCHED_BFQ)	+= bfq-iosched.o