Security Pillars

  • Perimeter
    • Strong authentication
    • Network isolation, edge nodes
    • Firewalls, iptables
  • Access
    • Authorization controls
    • Granular access to HDFS files, Hive/Impala objects
  • Data
    • Encryption-at-rest
    • Encryption-in-transit
      • Transport Layer Security (TLS)
  • Visibility
    • Auditing data practices without exposing content
    • Separation of concerns: storage management vs. data stewardship

  • "Hadoop in Secure Mode" lists four areas of authentication concern, all of which depend on Kerberos, directly or indirectly:

    • Users
    • Hadoop services
    • Web consoles
    • Data confidentiality
  • Linux supports MIT Kerberos

  • "Hadoop in Secure Mode" relies on Kerberos

    • Data encryption services available out of the box
      • RPC (SASL QOP "quality-of-protection")
    • Browser authentication supported by HTTP SPNEGO
  • LDAP/Active Directory integration

    • Applying an existing user database to a Hadoop cluster is a common requirement
  • ELI5: Kerberos: a great introduction to (or refresher on) Kerberos concepts
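
To make the Kerberos dependency concrete, a minimal /etc/krb5.conf pointing clients at an AD domain controller might look like the sketch below. The realm EXAMPLE.COM and the host names are placeholders, not values from this course.

```ini
# Minimal client configuration; realm and hosts are hypothetical.
[libdefaults]
    default_realm = EXAMPLE.COM
    dns_lookup_kdc = false

[realms]
    EXAMPLE.COM = {
        kdc = ad-dc1.example.com
        admin_server = ad-dc1.example.com
    }

[domain_realm]
    .example.com = EXAMPLE.COM
```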


Active Directory Integration

  • Cloudera's preferred practice is direct-to-AD integration
  • The alternative is a one-way cross-realm trust to AD
    • Requires MIT Kerberos realm in Hadoop cluster
    • Avoids adding service principals to AD
  • Common sticking points
    • Admin reluctance(!)
    • Version / feature incompatibility
    • Misremembered details
    • Other settings that "shouldn't be a problem"

Common Direct-to-AD Issues

  • /etc/krb5.conf doesn't authenticate to the KDC

    • Test with kinit AD_user
  • Required encryption type isn't allowed by JDK

    • Install the JCE Unlimited Strength Jurisdiction Policy files
  • Supported encryption types are disjoint

    • Check AD "functional level"
  • To trace Kerberos & Hadoop

    • export KRB5_TRACE=/dev/stderr
    • Include -Dsun.security.krb5.debug=true in HADOOP_OPTS (and export it)
    • export HADOOP_ROOT_LOGGER="DEBUG,console"
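
Taken together, the tracing switches above can be set in one shell session before re-running the failing command. The realm and user below are placeholders.

```shell
# Trace libkrb5's conversation with the KDC on stderr
export KRB5_TRACE=/dev/stderr

# JVM-level Kerberos debugging plus verbose Hadoop client logging
export HADOOP_OPTS="${HADOOP_OPTS:-} -Dsun.security.krb5.debug=true"
export HADOOP_ROOT_LOGGER="DEBUG,console"

# With these in place, re-run the failing steps, e.g.:
#   kinit ad_user@EXAMPLE.COM   # placeholders; watch each KDC exchange
#   hadoop fs -ls /             # look for GSSException details in the output
```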

  • HDFS permissions & ACLs
    • Need principal definitions beyond user-group-world
    • Relieve the edge cases that arise when hierarchical data outgrows user-group-world permissions
    • Can provide permissions for a restricted list of users and groups
  • Apache Sentry (incubating)
    • Database servers need files for storage, managed by admins
    • The authorizations needed for database objects may not map onto those file permissions

  • Plain HDFS permissions are largely POSIX-ish
    • The execute bit is ignored for files (though the sticky bit is supported on directories)
    • Applied to simple or Kerberos credentials
      • The NameNode process owner is the HDFS superuser
  • POSIX-style ACLs also supported
    • Disabled by default (dfs.namenode.acls.enabled)
    • Additional permissions for named users, groups, other, and the mask
      • chmod on the group bits changes the mask, which filters named entries into their effective permissions
    • Best used to refine, not replace, file permissions
      • Some overhead to store/process them
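
The mask rule can be illustrated with plain octal arithmetic: a named entry's effective permission is its granted bits ANDed with the mask. The group and path in the comments are hypothetical; on a real cluster the entries are managed with hdfs dfs -setfacl and inspected with hdfs dfs -getfacl.

```shell
# Suppose an ACL grants a named group rwx (7) on a directory,
# but the mask is r-x (5), e.g. after a chmod g-w on the path.
entry=7                        # group:analysts:rwx (hypothetical entry)
mask=5                         # mask::r-x
effective=$(( entry & mask ))  # POSIX rule: effective = entry AND mask
echo "$effective"              # 5 -> the group effectively has r-x
```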

  • Originally a Cloudera project, now Apache incubating
    • Some useful docs are not yet migrated to ASF
  • Supports authorization for database objects
    • Objects: server, database, table, view, URI
    • Authorizations: SELECT, INSERT, ALL
  • A Sentry policy is defined by two mappings
    • Local/LDAP groups -> Sentry roles
    • Sentry roles -> database objects and privileges
    • Users can be added to or removed from a group as needed
  • Supports Hive (through HiveServer2), Impala, and Search (Solr) out of the box
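
With the file-based provider, those mappings live in a policy .ini file. A minimal sketch, with hypothetical group, role, and object names:

```ini
[groups]
# local/LDAP group -> Sentry role(s)
analysts = analyst_role

[roles]
# Sentry role -> database object + privilege
analyst_role = server=server1->db=sales->table=orders->action=select
```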

Sentry Design

[Graphic overview of the Sentry architecture]


Sentry Design Notes

  • Each service has to bind to a policy engine
    • Currently impalad and HiveServer2 have hooks
    • Cloudera Search integration is a workaround
  • Service Provider interfaces for persisting policies to a store
    • Support for file storage to HDFS or local filesystem
  • The policy engine grants/revokes access
    • Rules are applied to the user, the objects requested, and the privilege required
  • Sentry / HDFS Synchronization
    • Automatically adds ACLs to match permission grants in Sentry
  • A fully-formed config example is here
  • You can watch a short video overview here

Sentry and HiveServer2


Sentry Service

  • Relational model and storage
  • Introduced in CDH 5.1
  • Uses a database to store policies
  • CDH supports migrating file-based authorization
    • sentry --command config-tool --policyIni policy_file --import
  • Impala & Hive must use the same provider (db or file)
  • Cloudera Search can only use the file provider
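
With the database-backed provider, the same group-to-role and role-to-privilege mappings are managed as SQL statements through HiveServer2 or Impala instead of being edited in a file. A hypothetical session (the role, group, and table names are invented for illustration):

```sql
CREATE ROLE analyst_role;
USE sales;
GRANT SELECT ON TABLE orders TO ROLE analyst_role;
GRANT ROLE analyst_role TO GROUP analysts;
SHOW ROLE GRANT GROUP analysts;  -- verify the mapping
```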

Network ("in-flight") encryption

  • For communication between web services (HTTPS)
    • Digital certificates, private key stores
  • HDFS Block data transfer
    • dfs.encrypt.data.transfer (very slow - not recommended for now)
  • RPC support already in place
  • Support includes the MapReduce shuffle, web UIs, and HDFS data and fsimage transfers
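
The RPC support is the SASL quality-of-protection mentioned earlier; it is configured in core-site.xml, with block data transfer encryption in hdfs-site.xml. A sketch of both settings (values shown are examples, not recommendations):

```xml
<!-- core-site.xml (fragment): SASL QOP for Hadoop RPC -->
<property>
  <name>hadoop.rpc.protection</name>
  <value>privacy</value>  <!-- authentication | integrity | privacy -->
</property>

<!-- hdfs-site.xml (fragment): block data transfer encryption,
     noted above as slow -->
<property>
  <name>dfs.encrypt.data.transfer</name>
  <value>true</value>
</property>
```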

At-rest encryption

  • Encryption/decryption that is transparent to Hadoop applications
  • Need: Key-based protection
  • Need: Minimal performance cost
    • AES-NI on recent Intel CPUs.
  • Navigator Encrypt
    • Block device encryption at OS level
  • HDFS Transparent Data Encryption
    • Encryption Zones
    • Key Management Server (KMS)
  • Key Trustee
    • Cloudera's enterprise-grade keystore
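
HDFS transparent encryption in outline: create a key in the KMS, then mark an empty directory as an encryption zone; files written there are encrypted with per-file keys wrapped by the zone key. The names below are hypothetical, and the commands need a running KMS and superuser rights, so they are shown as a transcript:

```shell
# Hypothetical names; requires a live KMS, so not runnable as-is:
#   hadoop key create sales_key                      # key stored in the KMS
#   hdfs dfs -mkdir /secure/sales                    # zone dir must be empty
#   hdfs crypto -createZone -keyName sales_key -path /secure/sales
#   hdfs crypto -listZones                           # verify the new zone
```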

Other requirements

  • Tokenization
  • Data masking
    • Leverage partners for this (Protegrity, Dataguise, etc.)

Auditing

  • Provided by Cloudera Navigator
  • See who has accessed resources (filesystem, databases, log of queries run)
  • Custom reports
    • e.g. show all failed access attempts
  • Redaction of sensitive information
    • Separation of duties


Security Lab

Integrating Kerberos with Cloudera Manager

  • Plan one: follow the documentation here
  • Plan two: Launch the Kerberos wizard and complete the checklist.
  • Set up an MIT KDC
  • Create a Linux account with your GitHub name
  • Once your integration succeeds, add these files to your security/ folder:
    • /etc/krb5.conf
    • /var/kerberos/krb5kdc/kdc.conf
    • /var/kerberos/krb5kdc/kadm5.acl
  • Create a file kinit.md that includes:
    • The kinit command you use to authenticate your user
    • The output from klist showing your credentials
  • Create a file cm_creds.png that shows the principals CM generated

Optional challenge - Test-driven setup

  • There's a lot of work in this lab. If you choose to do it, be sure to:
  • Ignore the steps to set up CDH 5 (already done)
  • Test client connectivity with JDBC
  • Set up and integrate an Active Directory instance
  • Test with a secured client connection
  • Enable Kerberos
  • Add a Sentry configuration to the mix
  • Test client connection again

If you're comfortable with AD, this may take an hour. If not, maybe 2-3 hours. Let your instructors know if you want to attempt this lab.


Security Lab (Choose A or B)

Complete one of the following labs: