consolidated commit of column selective changes #25

Open: wants to merge 3 commits into base: addComments

@svemuri svemuri commented Sep 26, 2012

This is a consolidated commit of the column-selective deserialization changes.
In case you have looked at the code changes before, here is a brief description of the differences from that version:

  1. A unit test that exercises the optimization (TestColSelectiveSerde.java).

  2. A Hive-level functional test for query variants, including JOINs (colselectivetest.q). I have tested it on Hive8.

  3. A .gitignore file so that git does not treat directories such as target and dist as candidates for check-in.

In terms of the actual code changes:

  1. A table property (haivvreo.colselective) can be used to turn off the optimization by setting it to FALSE; it defaults to TRUE. This property can be configured using the TBLPROPERTIES feature of Hive, and I have tested that as well.

  2. An existing property, "hive.io.file.readcolumn.ids", is used to drive the optimization, i.e. to identify the set of columns requested.

  3. The code that walks the columns in AvroDeserializer requires the columns to be sorted and free of duplicates, so the generateColArray method in AvroSerDe.java ensures this condition is satisfied even if the source property supplied by Hive does not conform to it.

  4. Timers track deserialization time and the number of records optimized, which helps verify that the optimized code path was taken. These messages are printed using LOG.info(). I have kept them because I found them useful during testing and when running performance experiments.
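
For readers who have not opened the diff, the property handling described above can be sketched roughly as follows. This is an illustration only, not the code in this pull request: the class name `ColSelectiveSketch`, the helper `isColSelectiveEnabled`, and the signature given to `generateColArray` here are assumptions, as is the comma-separated format of the column id list. Only the two property names and the sorted, duplicate-free requirement come from the description.

```java
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical sketch; the real logic lives in AvroSerDe.java and may differ.
final class ColSelectiveSketch {
    // Property names as given in the pull request description.
    static final String COL_SELECTIVE_PROP = "haivvreo.colselective";
    static final String READ_COLUMN_IDS_PROP = "hive.io.file.readcolumn.ids";

    // The optimization stays on unless the table property is explicitly FALSE.
    static boolean isColSelectiveEnabled(Properties tableProps) {
        String value = tableProps.getProperty(COL_SELECTIVE_PROP, "TRUE");
        return !"false".equalsIgnoreCase(value.trim());
    }

    // Turns a column id list such as "3,1,3,0" into a sorted array without
    // duplicates. Assumes a comma-separated list of non-negative integers.
    static int[] generateColArray(String readColumnIds) {
        // TreeSet both sorts the ids and drops repeated ones.
        TreeSet<Integer> ids = new TreeSet<>();
        if (readColumnIds != null) {
            for (String token : readColumnIds.split(",")) {
                String trimmed = token.trim();
                if (!trimmed.isEmpty()) {
                    ids.add(Integer.parseInt(trimmed));
                }
            }
        }
        int[] result = new int[ids.size()];
        int i = 0;
        for (int id : ids) {
            result[i++] = id;
        }
        return result;
    }
}
```

Normalizing the ids once, up front, means the column walk in the deserializer can advance through the requested ids in a single pass without re-checking order or repeats.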

Srinivas Vemuri added 3 commits September 27, 2012 19:44
consolidated commit of column selective deserialization changes

incorporate comments

incorporate comments

remove separate test file for column selective optimization