Home Ask Login Register

Developers Planet

Your answer is one click away!

Rams February 2016

MySql - Index optimization

We are having a Analytics product. For each of our customer we give one JavaScript code, they put that in their web sites. If a user visit our customer site the java script code hit our server so that we store this page visit on behalf of our customer. Each of our customer contains unique domain name that means customer determined by domain nam

Database server : MySql 5.6 Table rows : 400 million

Following is our table schema.

+---------------+------------------+------+-----+---------+----------------+
| Field         | Type             | Null | Key | Default | Extra          |
+---------------+------------------+------+-----+---------+----------------+
| id            | int(10) unsigned | NO   | PRI | NULL    | auto_increment |
| domain        | varchar(50)      | NO   | MUL | NULL    |                |
| guid          | binary(16)       | YES  |     | NULL    |                |
| sid           | binary(16)       | YES  |     | NULL    |                |
| url           | varchar(2500)    | YES  |     | NULL    |                |
| ip            | varbinary(16)    | YES  |     | NULL    |                |
| is_new        | tinyint(1)       | YES  |     | NULL    |                |
| ref           | varchar(2500)    | YES  |     | NULL    |                |
| user_agent    | varchar(255)     | YES  |     | NULL    |                |
| stats_time    | datetime         | YES  |     | NULL    |                |
| country       | char(2)          | YES  |     | NULL    |                |
| region        | char(3)          | YES  |     | NULL    |                |
| city          | varchar(80)      | YES  |     | NULL    |                |
| city_lat_long | varchar(50)      | YES  |     | NULL    |                |
| email         | varchar(100)     | YES  |     | NULL    |                |
+---------------+------------------+------+-----+---------+----------------+

In above table guid represents visitor of our custome

Answers


DRapp February 2016

Just for grins, run this to see what type of spread you have...

select 
      country, region, city, 
      DATE_FORMAT(colName, '%Y-%m-%d') DATEONLY, count(*)
   from 
      yourTable 
   group by
      country, region, city, 
      DATE_FORMAT(colName, '%Y-%m-%d')
   order by
      count(*) desc

and then see how many rows it returns. Also, what sort of range does the COUNT column generate. Instead of just an index, does it make sense to create a separate aggregation table on the key elements you are trying to provide with data mining.

If so, I would recommend looking at a similar post also on the stack here. This shows a SAMPLE on how, but I would first look at the counts before suggesting further. But if you have it broken down on a daily basis, what MIGHT this be reduced to.

Additionally, you might want to create pre-aggregate tables ONCE to get started, then have a nightly procedure that builds any new records based on a day just completed. This way it is never running through all 400M records.

If your pre-aggregate tables store based on just the date (y,m,d only), your queries rolled-up per day would shorten querying requirements. The COUNT(*) is just an example basis, but your could add count( distinct whateverColumn ) as needed. Then, you could query the SUM( aggregateColumn ) based on domain, date range, etc. If your 400M records gets reduced down to 7M records, I would also have a minimum index on the (domain, dateOnlyField, and maybe country) to optimize your domain, date-range queries. Once you get something narrowed down at whatever level make sense, you could always drill into the raw data for the granular level.

Post Status

Asked in February 2016
Viewed 3,633 times
Voted 11
Answered 1 times

Search




Leave an answer


Quote of the day: live life