I have a spark-streaming service, where I am processing and detecting anomalies on the basis of some offline generated model. I feed data into this service from a log file, which is streamed using the following command
tail -f <logfile>| nc -lk 9999
Here the spark streaming service is taking data from port 9999. However, I observe that the last few lines are being dropped, i.e. spark streaming does not receive those log lines or they are not processed.
However, I also observed that if I simply take the logfile as standard input instead of tailing it, no lines are dropped:
nc -q 10 -lk 9999 < logfile
Can anyone explain why this behavior is happening? And what could be a better resolution to the problem of streaming log data to spark streaming instance?