Securing Hive with Ranger

GOAL - Demonstrate some cornerstone capabilities of configuring Hive access via Ranger

PREREQUISITE - Risk Analysis with Pig

SEE ALSO - This demo is based on these two publicly-available Hortonworks tutorials:

RECORDED DEMO

Streaming into HDFS

PRIOR DEMO CLEANUP - Cleanup

Use Cases

The Sandbox is provisioned with a user group called Marketing, which is made up of three individual user accounts: mktg1, mktg2 and mktg3. This demo explores the following restrictive use cases, each of which prevents Marketing users from accessing the specified data.

| Type   | Specifics                                      |
|--------|------------------------------------------------|
| Table  | geolocation_stage and trucks_stage             |
| Row    | Where geolocation.event is not equal to normal |
| Column | events and totmiles from risk_factor           |

Environment Preparation

Create Hive View

As we will see later in this demo, a view defined in Hive is needed to fully implement the row-level security requirement. Log into Ambari as maria_dev and run the following DDL.

CREATE VIEW geo_normal_event AS
    SELECT * FROM geolocation
     WHERE event = 'normal';

Ensure results are returned when SELECT * FROM geo_normal_event LIMIT 100; is executed.

Create HDFS Home Directory

Unfortunately, the HDFS home directories for the Marketing users have not been created. Using a simple user provisioning process, run the following steps to create a home directory for mktg1 (we will not be using the other accounts in this demo).

HW13005:~ lmartin$ ssh [email protected] -p 2222
[email protected]'s password: 
Last login: Fri Sep 22 14:59:22 2017 from 10.0.2.2
[root@sandbox ~]# su - hdfs
[hdfs@sandbox ~]$ hdfs dfs -mkdir /user/mktg1
[hdfs@sandbox ~]$ hdfs dfs -chown mktg1 /user/mktg1
[hdfs@sandbox ~]$ exit
logout
[root@sandbox ~]# 
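The session above provisions only mktg1. If the other two accounts are ever needed, the same pair of commands can be generated per user. The following is a minimal sketch, assuming the commands are later run as the hdfs superuser on a host with the hdfs client. The helper name home_dir_commands is hypothetical, and -p is added to the mkdir step so that re-running is harmless. The script only prints the commands; it does not execute them.

```python
import shlex

MARKETING_USERS = ["mktg1", "mktg2", "mktg3"]


def home_dir_commands(user, base="/user"):
    """Return the mkdir/chown pair from the session above (with -p added)."""
    home = f"{base}/{user}"
    return [
        ["hdfs", "dfs", "-mkdir", "-p", home],  # create the home directory
        ["hdfs", "dfs", "-chown", user, home],  # hand ownership to the user
    ]


if __name__ == "__main__":
    # Dry run: print each command instead of executing it
    for user in MARKETING_USERS:
        for command in home_dir_commands(user):
            print(shlex.join(command))
```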

Allow Marketing to use Ambari

The Marketing users are already created on the underlying Linux system, but they need to be allowed to log into Ambari so they can use the Hive View. Log into Ambari as admin, open the admin pulldown in the upper-right corner, and select Manage Ambari > Users > Create Local User. Then create accounts for mktg1, mktg2 and mktg3 (set the password to password for all three), which should look like the following.

[Screenshot: Create User]

After creating the third user, click on the Views link in the left-side Views UI widget to see the list of views. Then toggle the Hive view and click on Hive View as highlighted below.

[Screenshot: Hive View]

Scroll down to the Permissions widget and add the three mktgN users as seen in the next screenshot.

[Screenshot: Set Users]

Disable Hive Global Access

As identified in the Sandbox splash page, open a browser on the Ranger UI at http://127.0.0.1:6080 and log in with raj_ops / raj_ops credentials. Select the Sandbox_hive link and then click on the Policy ID link associated with the "Hive Global Tables Allow" policy in the list.

Move the enabled slider next to the Policy Name to disabled and click on the Save button at the bottom.

[Screenshot]

Verify Preparation

At this point, the Marketing users should not be able to see any tables. Log into Ambari as mktg1 and, within the Hive View, validate that no tables are listed under the default database in the Databases list and that show tables; returns an empty list. Additionally, a security error should be displayed when attempting to run a query.

[Screenshot]

Table-Level Restriction

GOAL: Prevent Marketing users from accessing geolocation_stage and trucks_stage tables

Since Marketing is starting with zero access, a single security policy is enough: it allows the team to view all tables in the default database except the staging ones. Log into Ranger as admin, navigate to Sandbox_hive, and click Add New Policy to create a policy named "Default Mktg Access to Trucking" with that configuration.

[Screenshot]
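For readers who prefer to see the policy as data rather than as a form, the settings can be written down as a JSON document. The following is a sketch only: the field names follow Ranger's public v2 policy model as I understand it and should be checked against your Ranger version, and the select-only access type is an assumption, since the source shows the actual choices only in the screenshot. The function name marketing_policy is hypothetical.

```python
import json


def marketing_policy(excluded_tables):
    """Build a policy body shaped like Ranger's v2 policy JSON (assumed shape)."""
    return {
        "service": "Sandbox_hive",
        "name": "Default Mktg Access to Trucking",
        "isEnabled": True,
        "resources": {
            "database": {"values": ["default"], "isExcludes": False},
            # isExcludes=True reads as "every table except the ones listed"
            "table": {"values": sorted(excluded_tables), "isExcludes": True},
            "column": {"values": ["*"], "isExcludes": False},
        },
        "policyItems": [
            {
                "groups": ["Marketing"],
                # Assumption: read-only access is all this demo needs
                "accesses": [{"type": "select", "isAllowed": True}],
            }
        ],
    }


policy = marketing_policy(["geolocation_stage", "trucks_stage"])
print(json.dumps(policy, indent=2))
```

The key design point is the exclude flag on the table resource: the policy names the tables to leave out instead of enumerating every table Marketing may read, so new tables in the default database are covered automatically.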

NOTE: Policy changes are picked up on a configurable refresh interval (30 seconds here), so keep retrying until the new rules take effect in Hive.
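The "keep trying" advice can be captured in a small retry helper. This is a generic sketch, not part of Ranger or Hive: wait_until is a hypothetical name, and check stands for any callable you supply that returns True once the new rules are visible (for example, one that runs show tables; as mktg1 and tests for a non-empty result).

```python
import time


def wait_until(check, timeout_s=90.0, interval_s=5.0, sleep=time.sleep):
    """Call check() until it returns True or timeout_s elapses.

    Returns the number of attempts made; raises TimeoutError on failure.
    """
    deadline = time.monotonic() + timeout_s
    attempts = 0
    while True:
        attempts += 1
        if check():
            return attempts
        if time.monotonic() >= deadline:
            raise TimeoutError(f"no success after {attempts} attempts")
        sleep(interval_s)  # wait before the next attempt
```

A timeout of a few refresh intervals is a reasonable starting point, since a change saved just after one refresh is not seen until the next.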

Now, back in Ambari's Hive View, the mktg1 user can retrieve results for show tables; as well as see the list of tables in the Databases UI widget. More importantly, this user can now run queries on the tables shown in the list (for example, SELECT * FROM driver_mileage LIMIT 100;).

To validate the table-level restrictions are in place, attempt to run a query on one of the staging tables to verify a security error is presented.

[Screenshot]

Row-Level Restriction

GOAL: Prevent Marketing users from accessing rows from geolocation table when the event column has a value other than normal

Row-level security is not an intrinsic feature of Hive yet, but Ranger does offer row-level filtering. As this HCC discussion indicates, the common approach to this need is to create a Hive View that selects only the data that you want to grant/restrict access to and then secure the View, and the underlying table, accordingly.

At the beginning of the demo we created a geo_normal_event view that the Marketing team can access. Obviously, we did this ahead of time, but in a model where the requirement comes along after the structures are in place, one simply needs to create the view and then lock down access to the backing table.

To satisfy this requirement with our current system, simply edit the "Default Mktg Access to Trucking" policy in Ranger to include geolocation along with the staging tables being restricted.

[Screenshot]
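The interplay between the view and the locked-down base table can be modeled in a few lines. This is a toy model, not Ranger's matching code: it treats the policy's exclude list as simple name patterns and mimics the view with a list filter. The sample rows, including the id key and the overspeed value, are made up for illustration.

```python
from fnmatch import fnmatchcase

# Exclude list after the edit described above
EXCLUDED_TABLES = ["geolocation_stage", "trucks_stage", "geolocation"]


def table_allowed(table, excluded=EXCLUDED_TABLES):
    """True when the table matches no pattern in the exclude list."""
    return not any(fnmatchcase(table, pattern) for pattern in excluded)


def geo_normal_event(rows):
    """Mimic the view: keep only rows whose event is 'normal'."""
    return [row for row in rows if row["event"] == "normal"]


# Made-up sample rows, for illustration only
geolocation = [
    {"id": 1, "event": "normal"},
    {"id": 2, "event": "overspeed"},
    {"id": 3, "event": "normal"},
]

print(table_allowed("geolocation"))       # False
print(table_allowed("geo_normal_event"))  # True
print(geo_normal_event(geolocation))      # only the rows with event 'normal'
```

Because the exclude list names geolocation exactly, the view keeps its own access: Marketing reaches the data only through the filter.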

To verify, ensure mktg1 can select from geo_normal_event, but not from geolocation.

[Screenshot]

Column-Level Restriction

GOAL: Prevent Marketing users from accessing the events and totmiles columns of the risk_factor table

To implement this requirement, we can modify the same Ranger policy to exclude access to these columns. Note that with this configuration, columns with these names are restricted in every table the policy covers, not only in risk_factor.

[Screenshot]
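To see why a SELECT * fails under this policy while an explicit column list succeeds, it helps to expand the query's column list and compare it with the excluded names. The sketch below is a simplified model of that check, not Ranger's code. It assumes risk_factor has exactly the four columns named in this demo, and blocked_columns is a hypothetical helper.

```python
# Assumption: risk_factor has exactly these four columns
RISK_FACTOR_COLUMNS = ["driverid", "events", "totmiles", "riskfactor"]
EXCLUDED_COLUMNS = {"events", "totmiles"}


def blocked_columns(selected, table_columns=RISK_FACTOR_COLUMNS,
                    excluded=EXCLUDED_COLUMNS):
    """Return the requested columns that the exclude list would block."""
    # SELECT * expands to every column of the table
    requested = table_columns if selected == ["*"] else selected
    return [column for column in requested if column in excluded]


print(blocked_columns(["*"]))                       # ['events', 'totmiles']
print(blocked_columns(["driverid", "riskfactor"]))  # []
```

A query is rejected as soon as one requested column is blocked, which is why the wildcard query errors out even though two of its four columns are permitted.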

To verify, log into Ambari as mktg1 and confirm that the following message surfaces when attempting to execute SELECT * FROM risk_factor LIMIT 100;.

[Screenshot]

Verify all works correctly when selecting only allowed columns.

SELECT driverid, riskfactor FROM risk_factor LIMIT 100;

Environment Reset

Return to the Hive policy list in Ranger and perform the following actions.

  • Disable the "Default Mktg Access to Trucking" policy
  • Enable the "Hive Global Tables Allow" policy
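The two reset actions could also be scripted against Ranger's REST interface instead of the UI. The sketch below only builds the requests and does not send them. The endpoint path is my understanding of Ranger's public v2 API and should be verified for your version, the policy ids 101 and 102 are hypothetical placeholders, and toggle_request is a made-up helper name. A real update would also need credentials and the full policy body as fetched from Ranger, not just the fields shown here.

```python
import json
import urllib.request

RANGER_URL = "http://127.0.0.1:6080"


def toggle_request(policy, enabled):
    """Build (but do not send) a PUT request that sets isEnabled on a policy."""
    body = dict(policy, isEnabled=enabled)
    return urllib.request.Request(
        url=f"{RANGER_URL}/service/public/v2/api/policy/{policy['id']}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical ids; look up the real ones in the Ranger policy list
reset_steps = [
    toggle_request({"id": 101, "name": "Default Mktg Access to Trucking"}, False),
    toggle_request({"id": 102, "name": "Hive Global Tables Allow"}, True),
]

for request in reset_steps:
    print(request.get_method(), request.full_url)
```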