Consistent Riak (riak_kv_ensemble_backend) fails to start if AAE is disabled #959

Closed
sargun opened this issue May 28, 2014 · 7 comments

Comments


sargun commented May 28, 2014

I noticed that if anti-entropy is off and I enable strongly consistent Riak, it doesn't work. Specifically, riak_kv_ensembles sets up the ensembles, but the riak_ensemble_peer never gets past the all_sync state. This appears to be because riak_kv_ensemble_backend relies on anti-entropy to perform an exchange before it comes up. See here:


sync(Replies, State=#state{ensemble=_Ensemble, id=Id}) ->
    Peers0 = [{Idx, PeerId} || {PeerId={{kv,_PL,_N,Idx},_Node},_Reply} <- Replies],
    Peers = orddict:from_list(Peers0),
    {{kv, PL, N, Idx}, _} = Id,
    IndexN = {PL,N},
    %% Sort to remove duplicates when changing ownership / forwarded response
    Siblings0 = lists:usort([I || {{{kv,_PL,_N,I},_Node},_Reply} <- Replies]),
    %% Just in case, remove self from list
    Siblings = Siblings0 -- [Idx],

    case local_partition(Idx) of
        true ->
            T0 = erlang:now(),
            Pid = self(),
            spawn_link(fun() ->
                               wait_for_sync(Idx, IndexN, Pid, T0, Siblings, Peers)
                       end),
            {async, State};
        false ->
            {ok, State}
    end.

wait_for_sync(Idx, IndexN, Pid, T0, Siblings, Peers) ->
    Exchanges = riak_kv_entropy_info:exchanges(Idx, IndexN),
    Recent = [OtherIdx || {OtherIdx, T1, _} <- Exchanges,
                          T1 > T0],
    lager:info("~p/~p: Exchanges: ~p~nT0: ~p~nRecent: ~p~nSibs: ~p",
                [Idx, IndexN, Exchanges, T0, Recent, Siblings]),
    Need = length(Siblings),
    Finished = length(Recent),
    Local = local_partition(Idx),
    Complete = ((Siblings -- Recent) =:= []),
    if not Local ->
            lager:info("Partition ownership changed. No need to sync."),
            riak_ensemble_backend:sync_complete(Pid, []);
       Complete ->
            lager:info("Complete ~b/~b :: ~p -> ~p~n", [Finished, Need, Idx, Pid]),
            SyncPeers = [orddict:fetch(PeerIdx, Peers) || PeerIdx <- Siblings],
            riak_ensemble_backend:sync_complete(Pid, SyncPeers);
       true ->
            lager:info("Not yet ~b/~b :: ~p", [Finished, Need, Idx]),
            timer:sleep(10000),
            wait_for_sync(Idx, IndexN, Pid, T0, Siblings, Peers)
    end.

(I uncommented the debugging.) If riak_kv_entropy_manager is not enabled, riak_kv_entropy_info:exchanges will always return an empty list, so wait_for_sync never completes. Can we either (1) manually trigger an AAE exchange upon noticing that strong consistency is enabled (I imagine you can do this by setting the mode to manual and then queueing up the AAE jobs), or (2) throw a warning telling the user to enable AAE?
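To make the failure mode concrete, here is a minimal sketch of the completion check performed by wait_for_sync in the code quoted above. The module name sync_sketch and the function sync_status/3 are hypothetical and exist only for illustration. The sketch keeps just the list arithmetic and reports how many siblings are still missing, rather than the raw count of recent exchanges that the original logs.

```erlang
-module(sync_sketch).
-export([sync_status/3]).

%% Hypothetical, simplified model of the decision made in wait_for_sync/6.
%% Exchanges: [{OtherIdx, Timestamp, Info}], the shape matched in the quoted code.
%% T0:        the time at which the sync was requested.
%% Siblings:  the partition indexes this peer must have exchanged with.
-spec sync_status([{term(), term(), term()}], term(), [term()]) ->
          complete | {waiting, non_neg_integer(), non_neg_integer()}.
sync_status(Exchanges, T0, Siblings) ->
    %% Only exchanges that finished after the sync was requested count.
    Recent = [OtherIdx || {OtherIdx, T1, _} <- Exchanges, T1 > T0],
    case Siblings -- Recent of
        [] ->
            complete;
        Missing ->
            %% {waiting, SiblingsDone, SiblingsNeeded}
            {waiting, length(Siblings) - length(Missing), length(Siblings)}
    end.
```

With AAE disabled the exchange list stays empty, so sync_sketch:sync_status([], {1,0,0}, [10, 20]) returns {waiting, 0, 2} on every poll, and the real loop sleeps and retries indefinitely.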

@kellymclaughlin kellymclaughlin added this to the 2.0-RC milestone May 30, 2014
@jburwell jburwell self-assigned this Jun 2, 2014

jburwell commented Jun 2, 2014

Working up a test case to reproduce.

@jburwell jburwell added the Bug label Jun 2, 2014

sargun commented Jun 2, 2014

Turn off AAE and try to start riak_kv with strong consistency on. Then look at the Riak ensembles: they will never reach quorum.


jburwell commented Jun 4, 2014

@sargun you are correct -- this behavior is as designed. When AAE is disabled and strong consistency is enabled, strongly consistent operations will not work because ensembles will never gain quorum. However, non-consistent operations will continue to work. We plan to add a log message, as well as a riak-admin warning, about the inconsistent configuration.
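For reference, a configuration that avoids the mismatch might look like the fragment below. This is a sketch, not taken from the thread: it assumes the Riak 2.0 riak.conf setting names strong_consistency and anti_entropy, so check the configuration reference for your release before relying on it.

```
## riak.conf (assumed Riak 2.0 setting names)
## Strongly consistent operations need AAE exchanges to complete.
strong_consistency = on
anti_entropy = active
```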

cc @jtuple


sargun commented Jun 4, 2014

Why not do the AAE sync once, when riak_ensemble first starts up? riak_kv_entropy_manager supports being run manually. You don't need continual synchronization, just synchronization during ensemble member start-up.

jburwell pushed a commit to basho/riaknostic that referenced this issue Jun 10, 2014
  strong consistency is enabled and AAE is disabled
  (defect basho/riak_kv#959)
jburwell pushed a commit to basho/riak_test that referenced this issue Jun 10, 2014
  AAE is disabled.  (defect basho/riak_kv#959)
- Adds additional console output to reset-current-env to explain
  configuration and steps being executed
- Adds the -n option to the reset-current-env script to specify the
  number of nodes to build.  By default, 5 will be created.
@jburwell

@sargun AAE tree construction is a long-running operation (often measured in hours to days, depending on dataset size). Thus, Riak doesn't build an AAE tree on demand for a given exchange. Riak builds AAE trees once a week and keeps them up to date in real time as K/V operations come in. If AAE is disabled, these trees aren't being maintained in real time, so all the necessary trees would need to be built from scratch. Since that could take days, and therefore leave ensembles unavailable for days, this is operationally infeasible.

/cc @jtuple

@jburwell

Opened the following PRs to verify proper Riak startup under the conditions described, log a warning on startup, and check the strong consistency/AAE configuration using Riaknostic:

/cc @kellymclaughlin @jtuple @bsparrow435

@jburwell

All PRs have been approved and merged.

jaydoane pushed a commit to apache/couchdb that referenced this issue Dec 25, 2020
  strong consistency is enabled and AAE is disabled
  (defect basho/riak_kv#959)
jaydoane pushed a commit to apache/couchdb that referenced this issue Apr 20, 2021
I think I'm done with shell scripts for now.

Port to Perl.

Some cleaning up of the Perl code.

Add check for ring_creation_size != num_partitions.

Check if the Riak node is running.

Prefer riak over riaksearch when looking for commands.

Make it a proper hash.

Fetch number of nodes in the ring.

Look for crash dumps and emfile errors.

Add checks for number of partitions vs. nodes and for not being part of the ring.

Align status output for easier parsing.

Check number of connected nodes vs. ring members.

Use a better ratio for partitions vs. nodes.

Add some checks if the node is running to prevent errors in checks.

Start porting Riaknostic to Erlang.

Remove io:format that's broken anyway.

Remove shell script version.

Fetch and start printing data from the Riak instance.

Add Riak installation detection.

Find Riak logs.

Less code is good. Just tail-recurse it.

Change run code to use a config dict.

Add module to test for ring membership.

Add more check modules.

Add module to check for connected nodes.

Fixed ping riak to work more consistently.

General code cleanup.

Moved log directories into the riaknostic.app.

Removed unnecessary filter code.

Updated rebar to the current version

Scriptized. Name and cookie can be passed in as parameters.

Added a README

Added type specs.

Added more output to nodes connected.

Added node to Config dict.

Added initial version of disk check.

Added noatime check for all mounted disks.

Removed flag to specify vm name because it's not needed.

Exrcised riaknostic node from connected node list.

Removed perl script.

Added dizzy's bitcask large value check.

Better output from nodes connected

Got rid of unnecessary sup.

More readable output.

Fixed incorrect application start callback return.

Memory use stats

Improved organization
Added util library

Added ability to output warnings and errors from riaknostic modules.

Added a gen_server for logging.

Improved logging with more logical structure

Using lists:keyfind for OTP release per Sean's comment

Added conversion from binary to float
Fixed issue with higher memory usage check

vm.args can now be parsed in
A bit of cleanup

Integrated lager

Improved code organization
Moved from dicts to basic prop lists
Added all ebin directories in riak lib to path
Riaknostics are discovered via their run/1 methods

Added lots of command line configuration

Added sibling and vclock options to large value check

Added a guard to bitcask_threshold_check function

Key vals are binary_to_termed before printing.

Inserted tabs in readme - usage

Fixed broken lager:warning call.

Add license headers to all source files. Closes #3.

Upgrade rebar.

Add a Makefile, copied from lager.

Add check-module behavior according to plan.

Starting refactor of check modules.

Refactor memory use, add TODOs.

Refactor ring membership.

Added noatime check.

Getting the DataDir from riaknostic_config won't work, but if DataDir is set correctly, it is a valid noatime check.

Refactor nodes connected and fix some compilation bugs.

Refactor ring size check.

Fix typo/syntax error.

Rename disk check module.

Add ability to identify modules that are checks.

Update TODOs.

Refactor Joe's disk check module.

Add a little documentation to the private functions.

Add getopt, cleanup unused or antiquated modules.

Switching to use a global notion of the config, probably app env but TBD.

Do a little line-wrapping.

WIP riaknostic_config accessors.

Add top-level script with getopt and check descriptions.

Remove high-impact bitcask check.

Implement a huge swath of the runner, disk check works!

application:get_env/2 returns {ok, Value}.

Absolutize data directories.

Expose base_dir/0 and etc_dir/0.

Adjust crash dump detector to use base_dir().

Recognize -sname switch and distinguish between short and long names.

Added riaknostic_node module for interacting with the local/cluster nodes.

Fix a few bugs and enhance debugging information.

* Messages are properly sorted now.
* Match output of ps command properly.
* Add debug logging of node-connection logic.
* Improve detection of node connectivity.

Fix docs target, ignore generated docs.

Add edoc overview, initial stylesheet.

Finish up some styles and documentation, more detail on behaviour needed.

Add a more verbose description of the behaviour.

Make clear that this stylesheet is for edoc.

Initial version of the landing page.

Don't need to link to edoc, reflow some of those paragraphs.

Ignore parts of the gh-pages branch.

WIP make pages.

Add forkme ribbon.

Make sure to ignore root PNG files and add the new image to the pages.

Remove useless memsup info.

Added Dr. Basho. Thanks @jgnewman! Closes #13.

Build package tarball. Closes #12.

Update the README.

Minor wording correction, add missing docs to riaknostic_node.

Add a word of caution.

Solaris ps doesn't understand -o command and we don't use it anyway.

Forgot to stage this line.

Use -nocookie to prevent usage of the .erlang.cookie file. Closes #16.

Fedora installs Riak libraries to /usr/lib64. Closes #18

Setup dialyzer.

Fix dialyzer warnings.

Check for:
* Ring sizes not a multiple of 2
* Deployments where vnodes/node < 3% of ring size
* Deployments where vnodes/node > 70% of ring size

Change ring size inappropriate check from multiple of 2 to power of 2.

Leave out the ring size/vnode messages until we have a better
understanding of the relationship and can give better advice.

Fix a few mistakes.

Fix cluster_command

  1 Added check for ring preflists satisfying n_val

Check whether search is enabled on all nodes

v1.0.1

Update lager dependency

Changed regex split to string tokens. Fixes failure in Ubuntu

Fix xargs argument for Linux

eaccess -> eacces to catch the error correctly

Add can_connect_all to check if all nodes are available.

The reason for this, is that for search we're checking if search is enabled
or disabled on all nodes. If a node is down, this is not a valid test, and
errors out otherwise.

Check if connected first before running all connected

Travis CI config

Ignore .eunit folder

Add meck as a dependency

Initial eunit test for riaknostic_check_ring using meck

v1.0.2

Add Travis CI Build Status to README.md

some work on the docs re: 26 & 29

add lines for the autosaves of the one true editor

Added some reassuring output.

Just a few lines so that the runner of the command knows that riaknostic
is running and that it exited without error.

Update README.md

SmartOS has different paths from standard Solaris, using pkg_add package with current version 1.2

add basic sysctl checking

Removed freeBSD stuff

end of the day temp commit, code still kind of broken

move stuff to zip rather than os:cmd
added multiple platform support.
a couple of bugs/features:
 - we also need to be able to just grab
   a copy of a file
 - we need a list of tests for each platform
 - need cases for sunos and freebsd
 - fold in regular diagnostic messages (once I
   land the fix for #14).
 - there is a bug in shelling out, only some of the
   output is actually recorded.

another broken checkin, so I can work on something else

added the ability to copy out named files
changed the where the files were stored before
cleanup to CWD.

update to flesh out the export command a bit more.
still needs much testing, especially on smartos

clean up, fix some bugs, add directory-grabbing

added a (bad) first pass at machine-readable output

midstream checking to get back to work on export

added a (bad) first pass at machine-readable output

midstream checking to get back to work on export

Changed getopt version, one other fix

Fix default output

Added some comments and TODO's

Fix lager dependency version now that lager was updated

Fixate lager dependency on 1.2.1

Change dep on lager to 1.2.2 to match the rest of riak

Roll version riaknostic 1.1.0

Clarify that Riak 1.3 already has Riaknostic installed

removed misplaced parenthesis

Add OpenBSD bits

Update lager dep to 2.0.0rc2

Lager to 2.0.0 final

Un-escriptize riaknostic and modify for lager 2.0 compatibility

Add an extra log line for clarity when running non-existent checks

newline fix

Restore riaknostic output to console

When riaknostic became part of Riak instead of a separate app, its
output (through lager) ended up in the node's console.log instead of
being output by 'riak-admin diag'. Among other things, this broke the
riaknostic_rt riak test. This adds a layer on top of lager, so messages
can be directed to the console again, simply by using io:format. This
way, messages are sent to the group_leader instead of the user process,
which is what the lager backend does. When riaknostic is invoked through
RPC by riak-admin, the caller becomes the group leader and picks up
those messages. I wish there was a cleaner way to do this leveraging
something in lager, but I couldn't find any.

Roll riaknostic version 1.2.0

Pin meck dependency to a specific tag

Remove sysctl checks

Sysctl checks are now handled by the riak_kv_env module.

Standardize meck dep

Standardize on a rebar.config dep format to reduce conflicts

pull app.config and vm.args from init:get_arguments

added extra -vm_args to CONFIG_ARGS for easy access by erlang vm

Roll riaknostic 1.2.1 to pull in lager 2.0.1

Fix rebar.config url to stay consistent

Bump lager dep to 2.0.2

Bump lager dep to 2.0.3

- Adds a check for strong consistency configuration -- warning when
  strong consistency is enabled and AAE is disabled
  (defect basho/riak_kv#959)