hadoop - Hive (Big Data) - difference between bucketing and indexing


What is the main difference between bucketing and indexing a table in Hive?

The main difference is the goal:

  • indexing

The goal of Hive indexing is to improve the speed of query lookups on certain columns of a table. Without an index, queries with predicates like 'WHERE tab1.col1 = 10' load the entire table or partition and process all the rows. If an index exists for col1, only a portion of the file needs to be loaded and processed.

Indexes become even more essential when tables grow extremely large, and as you undoubtedly know, Hive thrives on large tables.
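To make the idea concrete, here is a small Python sketch of what an index buys you. It is not how Hive implements indexes; the `blocks` list, the row layout, and the function names are all made up for illustration, with each block standing in for a chunk of a file on disk.

```python
from collections import defaultdict

# Hypothetical table: each inner list stands in for one block of a file.
blocks = [
    [{"col1": 3, "val": "a"}, {"col1": 10, "val": "b"}],
    [{"col1": 7, "val": "c"}, {"col1": 8, "val": "d"}],
    [{"col1": 10, "val": "e"}, {"col1": 1, "val": "f"}],
]


def full_scan(blocks, value):
    """Without an index: read every block and test every row."""
    blocks_read = 0
    matches = []
    for block in blocks:
        blocks_read += 1
        matches.extend(row for row in block if row["col1"] == value)
    return matches, blocks_read


def build_index(blocks):
    """Map each col1 value to the block numbers that contain it."""
    index = defaultdict(set)
    for block_no, block in enumerate(blocks):
        for row in block:
            index[row["col1"]].add(block_no)
    return index


def indexed_lookup(blocks, index, value):
    """With an index: read only the blocks known to hold the value."""
    blocks_read = 0
    matches = []
    for block_no in sorted(index.get(value, ())):
        blocks_read += 1
        matches.extend(row for row in blocks[block_no] if row["col1"] == value)
    return matches, blocks_read
```

Both lookups return the same rows for `col1 = 10`, but the indexed one skips the block that has no matching value. The price is building and maintaining the index itself.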

  • bucketing

Bucketing is usually used for join operations, because you can optimize joins by bucketing records by a specific 'key' or 'id'. That way, when you run a join, records with the same 'key' are in the same bucket and the join is faster. You can see it as a technique for decomposing data sets into more manageable parts. This link gives 5 tips for efficient Hive queries, and one of them is bucketing.
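A rough Python sketch of why this helps a join. In Hive itself you declare buckets on the table with `CLUSTERED BY (...) INTO n BUCKETS`; the code below only imitates the idea. The modulo in `bucket_of` is a stand-in for Hive's real hash function, and the `users` / `orders` data and all function names are invented for the example.

```python
NUM_BUCKETS = 4  # illustrative; both tables use the same bucket count


def bucket_of(key, num_buckets=NUM_BUCKETS):
    # Assumption: plain modulo on an integer key stands in for Hive's hash.
    return key % num_buckets


def bucketize(rows, key_name, num_buckets=NUM_BUCKETS):
    """Split rows into buckets so equal keys always land in the same bucket."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[key_name], num_buckets)].append(row)
    return buckets


def bucketed_join(left_buckets, right_buckets, key_name):
    """Join bucket i of the left table only against bucket i of the right."""
    joined = []
    for left, right in zip(left_buckets, right_buckets):
        # Small per-bucket hash table instead of one over the whole table.
        by_key = {}
        for row in right:
            by_key.setdefault(row[key_name], []).append(row)
        for lrow in left:
            for rrow in by_key.get(lrow[key_name], []):
                joined.append({**lrow, **rrow})
    return joined


# Hypothetical sample data.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}, {"id": 5, "name": "cy"}]
orders = [{"id": 5, "item": "pen"}, {"id": 1, "item": "ink"}, {"id": 1, "item": "pad"}]

result = bucketed_join(bucketize(users, "id"), bucketize(orders, "id"), "id")
```

Because matching keys can only be in buckets with the same number, each pair of buckets can be joined on its own, and pairs where one side is empty cost almost nothing.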

