Overview
- create JSON formatted log from Rails
- send log using fluent-agent-lite
- receive log using Fluentd
- save on local file & MongoDB
- map & reduce on MongoDB
@さくらのVPS 2G with Debian GNU/Linux testing (wheezy)
install Ruby & Rails
rbenv使って入れてみる
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 |
$ git clone http://gitweb.hidepiy.com/git/dotfiles.git $ cd ~ $ /bin/bash ~/dotfiles/init.sh $ vim hogehoge :BundleInstall しばし待つ。。。 $ zsh $ sudo aptitude install curl g++ make $ sudo aptitude install zlib1g-dev libssl-dev libreadline-dev libyaml-dev libxml2-dev libxslt-dev libsqlite3-dev $ sudo aptitude install rbenv $ echo 'eval "$(rbenv init -)"' >> ~/.zshrc $ git clone git://github.com/sstephenson/ruby-build.git $ cd ruby-build $ sudo ./install.sh ここでターミナルに入りなおす $ rbenv install 1.9.3-p194 $ rbenv rehash $ rbenv global 1.9.3-p194 $ rbenv version 1.9.3-p194 (set by /home/hidepiy/.rbenv/version) $ ruby -v ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-linux] $ gem install rails -v 3.2.6 $ rbenv rehash $ rails -v Rails 3.2.6 |
RailsにJSON形式のログ吐かせる
ちょこっとアプリ
1 2 3 |
$ rails new LOG_APP --skip-active-record --skip-test-unit $ cd LOG_APP $ rails generate controller Log lgr |
1 2 3 4 5 6 |
require 'pv_logger' class LogController < ApplicationController def lgr PV_LOGGER.write_pv_log(rand(1000),rand(100)) end end |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
class PvLogger < Logger def write_pv_log(coordinate_id, shop_user_id) pvLog = {'coordinateId' => coordinate_id, 'shopUserId' => shop_user_id, 'timeStamp' => Time.now.to_s(:db)} PV_LOGGER.info(pvLog.to_json) end def format_message(severity, timestamp, progname, msg) "#{msg}\n" end end logfile = File.open(File.join(Rails.root, "log", "pv.log"), "a") logfile.sync = true PV_LOGGER = PvLogger.new(logfile, 'daily') |
Unicorn(と何か)を入れる
1 2 3 4 5 6 |
$ cat << GEM >> Gemfile $ gem 'execjs' $ gem 'therubyracer' $ gem 'unicorn' $ GEM $ bundle install |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 |
worker_processes 2 working_directory "/home/hidepiy/workfield/LOG_APP/" listen '/tmp/LOG_APP_unicorn.sock' pid '/tmp/LOG_APP_unicorn.pid' log = '/var/log/rails/LOG_APP_unicorn.log' stdout_path /home/hidepiy/workfield/LOG_APP/log/unicorn.stdout.log stderr_path /home/hidepiy/workfield/LOG_APP/log/unicorn.stderr.log preload_app true before_fork do |server, worker| defined?(ActiveRecord::Base) and ActiveRecord::Base.connection.disconnect! old_pid = "#{ server.config[:pid] }.oldbin" unless old_pid == server.pid begin Process.kill :QUIT, File.read(old_pid).to_i rescue Errno::ENOENT, Errno::ESRCH end end end after_fork do |server, worker| defined?(ActiveRecord::Base) and ActiveRecord::Base.establish_connection end |
起動してたたく host:5000/log/lgr
1 |
$ unicorn_rails -c config/unicorn.rb -p 5000 |
ログが吐かれてる
1 2 3 4 5 |
{"coordinateId":347,"shopUserId":18,"timeStamp":"2012-10-26 21:59:13"} {"coordinateId":882,"shopUserId":96,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":412,"shopUserId":20,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":185,"shopUserId":9,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":549,"shopUserId":93,"timeStamp":"2012-10-26 21:59:14"} |
Fluent達を勃てる
まずは受け取る側(Fluentd)勃てる
1 2 3 4 5 6 |
$ cd $ gem install fluentd $ rbenv rehash $ mkdir -vp ~/fluent $ fluentd --setup ~/fluent Installed /home/hidepiy/fluent/fluent.conf. |
てきとに設定
1 2 3 4 5 6 7 8 9 10 11 12 |
<source> type forward port 24224 </source> ... <match pv> type file path /home/hidepiy/fluent/log/pv_log time_slice_format %Y%m%d time_slice_wait 5M time_format %Y%m%d </match> |
起動
1 |
$ fluentd -d ~/fluent/fluentd.pid -l ~/fluent/log/fluentd.log -c ~/fluent/fluent.conf |
続いて送る側(fluent-agent-lite)勃てる
Railsが吐いてるのがJSONなんで、おとなしく送る側もFluentdでいい気もするけど、
とりあえず使ってみようということで
1 2 3 4 |
$ cd $ git clone http://github.com/tagomoris/fluent-agent-lite $ sudo ./fluent-agent-lite/bin/install.sh $ sudo cp -vip /etc/fluent-agent-lite.conf /usr/local/fluent-agent-lite/fluent-agent-lite.conf.ori |
1 2 3 4 5 6 7 8 9 10 |
TAG_PREFIX="" LOGS=$(cat <<"EOF" pv /home/hidepiy/workfield/LOG_APP/log/pv.log EOF ) PRIMARY_SERVER="localhost:24224" LOG_PATH=/var/log/fluent-agent/fluent-agent.log |
おりゃっ
1 |
$ sudo /etc/init.d/fluent-agent-lite start |
で、送信先のログ(Fluentdが吐くヤツ)を確認
1 2 3 4 5 |
20121026>-pv>-{"message":"{\"coordinateId\":347,\"shopUserId\":18,\"timeStamp\":\"2012-10-26 21:59:13\"}"} 20121026>-pv>-{"message":"{\"coordinateId\":882,\"shopUserId\":96,\"timeStamp\":\"2012-10-26 21:59:14\"}"} 20121026>-pv>-{"message":"{\"coordinateId\":412,\"shopUserId\":20,\"timeStamp\":\"2012-10-26 21:59:14\"}"} 20121026>-pv>-{"message":"{\"coordinateId\":185,\"shopUserId\":9,\"timeStamp\":\"2012-10-26 21:59:14\"}"} 20121026>-pv>-{"message":"{\"coordinateId\":549,\"shopUserId\":93,\"timeStamp\":\"2012-10-26 21:59:14\"}"} |
う~む、いちおう来てるけど、なんか惜しいゲロ
ので、ググッてでてきたplugin挿れる
fluent-plugin-file-alternative : https://github.com/tagomoris/fluent-plugin-file-alternative
スコ
1 2 |
$ fluent-gem install fluent-plugin-file-alternative $ fluent-gem list |
設定しなおし
1 2 3 4 5 6 7 8 9 10 11 12 |
... <match pv> type file_alternative path /var/log/fluent/pv_log time_slice_format %Y%m%d time_slice_wait 5M time_format %Y%m%d output_include_time false output_include_tag false output_data_type attr:message add_newline true </match> |
イイ感じ
1 2 3 4 5 |
{"coordinateId":347,"shopUserId":18,"timeStamp":"2012-10-26 21:59:13"} {"coordinateId":882,"shopUserId":96,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":412,"shopUserId":20,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":185,"shopUserId":9,"timeStamp":"2012-10-26 21:59:14"} {"coordinateId":549,"shopUserId":93,"timeStamp":"2012-10-26 21:59:14"} |
MongoDBに挿れる
Mongo用意
1 2 3 4 |
$ cd /usr/local/src $ sudo wget http://fastdl.mongodb.org/linux/mongodb-linux-x86_64-2.2.0.tgz $ sudo tar xzvf mongodb-linux-x86_64-2.2.0.tgz $ sudo ln -s mongodb-linux-x86_64-2.2.0 mongodb |
1 2 |
logpath=/usr/local/src/mongodb/log/mongod.log dbpath=/usr/local/src/mongodb/data/db |
起動
1 2 |
$ sudo mkdir -vp mongodb/{data/db,log} $ sudo /usr/local/src/mongodb/bin/mongod --config /usr/local/src/mongodb/mongo.conf |
Fluentdにpluginいれる
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 |
こいつで探して。。。 $ fluent-gem search -rd fluent-plugin スコスコスコ $ fluent-gem install fluent-plugin-mongo $ fluent-gem install bson_ext $ fluent-gem install fluent-plugin-parser $ fluent-gem list *** LOCAL GEMS *** bson (1.7.0, 1.6.4) bson_ext (1.7.0) fluent-mixin-plaintextformatter (0.1.0) fluent-plugin-file-alternative (0.1.3) fluent-plugin-mongo (0.6.10) fluent-plugin-parser (0.1.2) fluentd (0.10.27) mongo (1.6.4) msgpack (0.4.7) unicorn (4.4.0) ... |
設定しなおし
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 |
... <match pv> type copy <store> type file_alternative path /home/hidepiy/fluent/log/pv_log time_slice_format %Y%m%d time_slice_wait 5M time_format %Y%m%d output_include_time false output_include_tag false output_data_type attr:message add_newline true </store> <store> type parser add_prefix parsed format json key_name message </store> </match> <match parsed.pv> type mongo host localhost database fluent collection pv </match> |
Fluentd再起動後Mongo確認
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
$ /usr/local/src/mongodb/bin/mongo MongoDB shell version: 2.2.0 connecting to: test > show dbs admin (empty) fluent 0.203125GB local (empty) > use fluent switched to db fluent > show collections pv system.indexes > db.pv.find() { "_id" : ObjectId("5093c3d9e4712732d3000479"), "coordinateId" : 347, "shopUserId" : 18, "timeStamp" : "2012-11-02 21:59:13", "time" : ISODate("2012-11-02T12:59:13Z") } { "_id" : ObjectId("5093c3d9e4712732d300047a"), "coordinateId" : 882, "shopUserId" : 96, "timeStamp" : "2012-11-02 21:59:14", "time" : ISODate("2012-11-02T12:59:14Z") } { "_id" : ObjectId("5093c3d9e4712732d300047b"), "coordinateId" : 412, "shopUserId" : 20, "timeStamp" : "2012-11-02 21:59:14", "time" : ISODate("2012-11-02T12:59:14Z") } { "_id" : ObjectId("5093c3d9e4712732d300047c"), "coordinateId" : 185, "shopUserId" : 9, "timeStamp" : "2012-11-02 21:59:14", "time" : ISODate("2012-11-02T12:59:14Z") } { "_id" : ObjectId("5093c3d9e4712732d300047d"), "coordinateId" : 549, "shopUserId" : 93, "timeStamp" : "2012-11-02 21:59:14", "time" : ISODate("2012-11-02T12:59:14Z") } |
Mongoでmap&reduceして集計
ログたまるの待ってらんないので、mongoimportで水増しする
mongoimport : http://docs.mongodb.org/manual/reference/mongoimport/
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 |
ログファイル作って $ wc -l /home/hidepiy/workfield/pv_log.20121030.huge 4385869 /home/hidepiy/workfield/pv_log.20121030.huge 起動中のMongoDBにimport $ sudo /usr/local/src/mongodb/bin/mongoimport --db pvimport --collection huge --file /home/hidepiy/workfield/pv_log.20121030.huge connected to: 127.0.0.1 Thu Nov 1 23:16:42 4092727/310556318 1% Thu Nov 1 23:16:42 57800 19266/second ... Thu Nov 1 23:17:06 32727643/310556318 10% Thu Nov 1 23:17:06 462200 17118/second ... Thu Nov 1 23:17:30 63260219/310556318 20% Thu Nov 1 23:17:30 893400 17517/second ... Thu Nov 1 23:17:57 96249843/310556318 30% Thu Nov 1 23:17:57 1359300 17426/second ... Thu Nov 1 23:19:39 220412351/310556318 70% Thu Nov 1 23:19:39 3112800 17293/second ... Thu Nov 1 23:20:03 251348540/310556318 80% Thu Nov 1 23:20:03 3549700 17400/second ... Thu Nov 1 23:20:27 282348446/310556318 90% Thu Nov 1 23:20:27 3987500 17489/second ... Thu Nov 1 23:20:48 309213155/310556318 99% Thu Nov 1 23:20:48 4366900 17537/second Thu Nov 1 23:20:49 imported 4385869 objects |
けっこかかるな。。。(1M records/min)
なんとなくMySQLのloaddataのほうが早そう
さくらのVPS 2Gでload aveが2、CPU50%ぐらい
mongoimportのドキュメント見ると、
-> “…the –dbpath option enables mongoimport to attach directly to local data files and insert the data without the mongod”
ってことなんで試してみる
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 |
さっきの消す(上書きされるから不要ってウワサもあるけど。。。) > db.huge.count() 4385869 > db.huge.drop() true > db.huge.count() 0 > exit bye えいやっ $ sudo /usr/local/src/mongodb/bin/mongoimport --dbpath /usr/local/src/mongodb/data/db/ --db pvimport --collection huge --file ~/workfield/pv_log.20121030.huge If you are running a mongod on the same path you should connect to that instead of direct data file access Thu Nov 1 23:01:23 dbexit: Thu Nov 1 23:01:23 [tools] shutdown: going to close listening sockets... Thu Nov 1 23:01:23 [tools] shutdown: going to flush diaglog... Thu Nov 1 23:01:23 [tools] shutdown: going to close sockets... Thu Nov 1 23:01:23 [tools] shutdown: waiting for fs preallocator... Thu Nov 1 23:01:23 [tools] shutdown: closing all files... Thu Nov 1 23:01:23 [tools] closeAllFiles() finished Thu Nov 1 23:01:23 dbexit: really exiting now |
おぉ、ちょっぱや~っと思ったけど、失敗しただけか
指定したdbpathが、既に起動中のMongoDBと一緒だかららしぃので、いったんMongoDB落としてから再実行
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 |
$ sudo /usr/local/src/mongodb/bin/mongoimport --dbpath /usr/local/src/mongodb/data/db/ --db pvimport --collection huge --file ~/workfield/pv_log.20121030.huge Thu Nov 1 23:29:06 [tools] build index pvimport.huge { _id: 1 } Thu Nov 1 23:29:07 [tools] build index done. scanned 0 total records. 0.024 secs Thu Nov 1 23:29:09 [tools] 2676553/310556318 0% Thu Nov 1 23:29:09 [tools] 37800 12600/second ... Thu Nov 1 23:32:33 [tools] 307216353/310556318 98% Thu Nov 1 23:32:33 [tools] 4338700 20959/second Thu Nov 1 23:32:35 [tools] imported 4385869 objects Thu Nov 1 23:32:35 dbexit: Thu Nov 1 23:32:35 [tools] shutdown: going to close listening sockets... Thu Nov 1 23:32:35 [tools] shutdown: going to flush diaglog... Thu Nov 1 23:32:35 [tools] shutdown: going to close sockets... Thu Nov 1 23:32:35 [tools] shutdown: waiting for fs preallocator... Thu Nov 1 23:32:35 [tools] shutdown: closing all files... Thu Nov 1 23:32:35 [tools] closeAllFiles() finished Thu Nov 1 23:32:35 [tools] shutdown: removing fs lock... Thu Nov 1 23:32:35 dbexit: really exiting now |
ちょっぴり早くなったのかも
load aveとCPUはさっきの半分ぐらい
いよいよ集計
↓を参考にMapとReduce定義する
MapReduce : http://www.mongodb.org/display/DOCS/MapReduce
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
var map = function() { emit(this.coordinateId + ":" + this.shopUserId, 1) } var reduce = function(key, values) { var count = 0; values.forEach(function(value) { count += value; }); return count; } var outputCollection = 'pvSummary' var mongo = new Mongo('localhost'); var db = mongo.getDB('pvimport'); var res = db.huge.mapReduce(map, reduce, {out: outputCollection}); shellPrint(res); |
実行
1 2 3 4 5 6 7 8 9 10 11 12 13 14 |
$ sudo /usr/local/src/mongodb/bin/mongo /usr/local/src/mongodb/tool/pv.mr.js MongoDB shell version: 2.2.0 connecting to: test { "result" : "pvSummary", "timeMillis" : 128223, "counts" : { "input" : 4385869, "emit" : 4385869, "reduce" : 448920, "output" : 1044 }, "ok" : 1, } |
これまたけっこかかるな。。。
まぁなんも設定せずただ生で挿入しただけだしな
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 |
Fri Nov 2 00:10:35 [conn9] CMD: drop pvimport.tmp.mr.huge_2_inc Fri Nov 2 00:10:35 [conn9] build index pvimport.tmp.mr.huge_2_inc { 0: 1 } Fri Nov 2 00:10:35 [conn9] build index done. scanned 0 total records. 0 secs Fri Nov 2 00:10:35 [conn9] CMD: drop pvimport.tmp.mr.huge_2 Fri Nov 2 00:10:35 [conn9] build index pvimport.tmp.mr.huge_2 { _id: 1 } Fri Nov 2 00:10:35 [conn9] build index done. scanned 0 total records. 0 secs Fri Nov 2 00:10:38 [conn9] >->-65200/4385869>1% Fri Nov 2 00:10:41 [conn9] >->-154000/4385869>-3% Fri Nov 2 00:12:41 [conn9] 4275700/4385869 97% Fri Nov 2 00:12:43 [conn9] CMD: drop pvimport.pvSummary Fri Nov 2 00:12:43 [conn9] CMD: drop pvimport.tmp.mr.huge_2 Fri Nov 2 00:12:43 [conn9] CMD: drop pvimport.tmp.mr.huge_2 Fri Nov 2 00:12:43 [conn9] CMD: drop pvimport.tmp.mr.huge_2_inc Fri Nov 2 00:12:43 [conn9] command pvimport.$cmd command: { mapreduce: "huge", map: function () { emit(this.coordinateId + ":" + this.shopUserId, 1); }, reduce: function (key, values) { var count = 0; values.forEach(functio..., out: "pvSummary" } ntoreturn:1 keyUpdates:0 numYields: 44903 locks(micros) W:2925 r:225397660 w:48631 reslen:133 128225ms |
確認
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 |
/usr/local/src/mongodb/bin/mongo MongoDB shell version: 2.2.0 connecting to: test > use pvimport switched to db pvimport > db.pvSummary.count() 1044 > db.pvSummary.find() { "_id" : "101:11", "value" : 4181 } { "_id" : "101:63", "value" : 4383 } { "_id" : "103:81", "value" : 3986 } { "_id" : "104:53", "value" : 4254 } { "_id" : "105:67", "value" : 4327 } { "_id" : "105:82", "value" : 3425 } { "_id" : "107:97", "value" : 2527 } ... |
ふぅ