google app engine - Designing an API on top of BigQuery
I have an App Engine app that tracks various sorts of user impression data across several websites. We're currently gathering about 40 million records a month, and the main BigQuery table is closing in on 15 GB in size after 6 weeks of gathering data. Our estimates show that within 6 more weeks we will be gathering over 100 million records a month. This is a relatively small dataset in terms of big data, but it has the potential to grow quite a bit, quite fast.
Now, faced with a successful trial, we need to work on an API that sits on top of BigQuery and allows us to analyze the data and deliver results to a dashboard we provide.
My concern here is that the data being analyzed by a customer spans only a few days at a time (per request), and since BigQuery queries are in fact full table scans, the API may in time become slower to respond as the table grows in size and BigQuery needs to process more data in order to return results.
My question is therefore this: should we shard the BigQuery log tables, for instance by month or by week, in order to reduce the data that needs processing, or would it be "wiser" to pre-process the data and store the results in the NDB datastore? The latter would result in a blazingly fast API, but requires us to pre-process everything, including things customers may never need.
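To illustrate the pre-processing option: the rollup itself is just a group-by over the raw impressions, with each aggregate row then stored as one small datastore entity keyed by (site, day). A minimal sketch, where the `site`/`day`/`count` record fields are hypothetical stand-ins for the real schema:

```python
from collections import defaultdict
from datetime import date

def aggregate_impressions(records):
    """Roll up raw impression records into per-site, per-day counts.

    `records` is an iterable of dicts with hypothetical 'site',
    'day' (datetime.date) and optional 'count' keys -- the real
    schema will differ. Each resulting (site, day) -> total pair
    could be stored as one NDB entity, so the dashboard API reads
    a handful of tiny entities instead of scanning BigQuery.
    """
    totals = defaultdict(int)
    for rec in records:
        totals[(rec["site"], rec["day"])] += rec.get("count", 1)
    return dict(totals)

raw = [
    {"site": "example.com", "day": date(2015, 3, 1)},
    {"site": "example.com", "day": date(2015, 3, 1)},
    {"site": "other.com", "day": date(2015, 3, 2), "count": 5},
]
print(aggregate_impressions(raw))
```

The trade-off is exactly the one in the question: every metric the dashboard might show has to be rolled up ahead of time, whether or not a customer ever asks for it.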
Or are we perhaps optimizing prematurely?
This is based on my experience analyzing the performance of similar projects in BigQuery. If you are concerned with performance only, you don't have to change anything: BigQuery's optimizer can figure out many things, and if a query runs against only a few days of data, performance will be good. From a billing point of view, however, you pay for more and more data as the table grows, so in order to save money it is wise to shard the data by month or week. With TABLE_DATE_RANGE you are still able to query across all the shards if you need to, so you don't lose any functionality.
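For concreteness, `TABLE_DATE_RANGE` is BigQuery's legacy-SQL wildcard function over daily shard tables named `prefix` + `YYYYMMDD`. A minimal sketch of building such a query, where the `logs` dataset and `impressions_` prefix are made up for illustration:

```python
from datetime import date

def shard_query(dataset, prefix, start, end):
    """Build a legacy-SQL query over daily shard tables.

    TABLE_DATE_RANGE expands to every table named <prefix>YYYYMMDD
    in the dataset whose date falls inside [start, end], so BigQuery
    only scans (and bills for) the days the request actually covers.
    Dataset and table-prefix names here are illustrative.
    """
    return (
        "SELECT site, COUNT(*) AS impressions\n"
        "FROM TABLE_DATE_RANGE([{ds}.{p}],\n"
        "                      TIMESTAMP('{s}'),\n"
        "                      TIMESTAMP('{e}'))\n"
        "GROUP BY site"
    ).format(ds=dataset, p=prefix, s=start.isoformat(), e=end.isoformat())

print(shard_query("logs", "impressions_", date(2015, 3, 1), date(2015, 3, 7)))
```

Since a typical dashboard request spans only a few days, the query touches only those few daily tables instead of the whole 15 GB (and growing) log.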