hadoop - Hive (Big Data): difference between bucketing and indexing
What is the main difference between bucketing and indexing of a table in Hive?
The main difference is the goal:
- indexing
The goal of Hive indexing is to improve the speed of query lookups on certain columns of a table. Without an index, queries with predicates like 'where tab1.col1 = 10' load the entire table or partition and process all of its rows. If an index exists on col1, only a portion of the file needs to be loaded and processed.
Indexes become even more essential when tables grow extremely large and, as you undoubtedly know, Hive thrives on large tables.
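The difference between a full scan and an indexed lookup can be sketched with a toy model in Python. This is only an illustration of the idea, not how Hive stores or reads its indexes: the `rows` list, the `full_scan`, `build_index`, and `indexed_lookup` helpers, and the sample data are all hypothetical names made up for this sketch.

```python
from collections import defaultdict

# Hypothetical table: each row is (col1, payload).
rows = [(10, "a"), (7, "b"), (10, "c"), (3, "d"), (7, "e")]


def full_scan(rows, value):
    """No index: every row is read and tested against the predicate."""
    examined = 0
    hits = []
    for row in rows:
        examined += 1
        if row[0] == value:
            hits.append(row)
    return hits, examined


def build_index(rows):
    """Map each col1 value to the positions of the rows that contain it."""
    index = defaultdict(list)
    for pos, row in enumerate(rows):
        index[row[0]].append(pos)
    return index


def indexed_lookup(rows, index, value):
    """With an index: only the rows at the recorded positions are read."""
    positions = index.get(value, [])
    return [rows[p] for p in positions], len(positions)


index = build_index(rows)
scan_hits, scan_examined = full_scan(rows, 10)
idx_hits, idx_examined = indexed_lookup(rows, index, 10)

# Same answer, but the indexed path touches 2 rows instead of 5.
print(scan_hits == idx_hits, scan_examined, idx_examined)
```

Both paths return the same rows; the indexed one simply reads fewer of them, which is the effect the answer describes for a predicate on col1.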
- bucketing
It is usually used for join operations, because you can optimize joins by bucketing records on a specific 'key' or 'id'. That way, when you run a join, records with the same 'key' are in the same bucket, and the join is faster. You can see it as a technique for decomposing data sets into more manageable parts. This link gives 5 tips for efficient Hive queries, and one of them is bucketing.
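Why bucketing helps a join can also be sketched with a toy model in Python. It assumes integer keys and uses a plain modulo as the bucket function; the bucket count, the `bucket_of`, `bucketize`, and `bucketed_join` helpers, and the sample `customers` and `orders` data are hypothetical and only stand in for what Hive does on real files.

```python
NUM_BUCKETS = 4  # hypothetical bucket count, chosen for the example


def bucket_of(key, num_buckets=NUM_BUCKETS):
    """Assign an integer key to a bucket (toy hash: key modulo bucket count)."""
    return key % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    """Split rows into buckets according to their first field, the join key."""
    buckets = [[] for _ in range(num_buckets)]
    for row in rows:
        buckets[bucket_of(row[0], num_buckets)].append(row)
    return buckets


def bucketed_join(left, right, num_buckets=NUM_BUCKETS):
    """Join two tables on their first field, comparing matching buckets only."""
    left_buckets = bucketize(left, num_buckets)
    right_buckets = bucketize(right, num_buckets)
    joined = []
    comparisons = 0
    # Rows with equal keys always land in the same bucket number, so
    # bucket i of one table only has to be compared with bucket i of the other.
    for left_rows, right_rows in zip(left_buckets, right_buckets):
        for l in left_rows:
            for r in right_rows:
                comparisons += 1
                if l[0] == r[0]:
                    joined.append((l[0], l[1], r[1]))
    return joined, comparisons


customers = [(1, "Ann"), (2, "Bob"), (3, "Cy"), (4, "Di")]
orders = [(1, 9.5), (3, 20.0), (3, 4.25), (4, 7.0), (6, 1.0)]

joined, comparisons = bucketed_join(customers, orders)

# 5 pairwise comparisons instead of the 4 * 5 = 20 of a naive nested loop.
print(sorted(joined), comparisons)
```

The sketch only works because both tables are bucketed on the join key with the same bucket count, which is also the situation in which the answer says the join gets faster.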