(The question is stale, but the issue is not)
Personally, I think your intuition makes some sense. That is, if you don't need the mathematical tidiness of conjugacy, then whatever distribution you would use for a location parameter, use the same one for the log of the scale parameter. So what you're suggesting amounts to: use the equivalent of a normal prior.
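To make that equivalence concrete, here is a quick change-of-variables sketch (the hyperparameters $\mu_0$ and $\tau_0$ are just placeholders of my own, not anything from the question): a normal prior on $\log\sigma$ is the same thing as a lognormal prior on $\sigma$,
$$
\log\sigma \sim \mathcal{N}(\mu_0, \tau_0^2)
\quad\Longleftrightarrow\quad
p(\sigma) = \frac{1}{\sigma}\,\mathcal{N}\!\left(\log\sigma \mid \mu_0, \tau_0^2\right), \qquad \sigma > 0,
$$
where the extra $1/\sigma$ is the Jacobian of the log transform; the same adjustment shows up in the Stan code below.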
Would you actually use a normal prior for a location parameter? Most people would say that, unless you make the variance huge, that's probably a bit "too dogmatic," for reasons explained in the other answers here (unbounded influence). An exception would be if you're doing empirical Bayes, that is, using your data to estimate the parameters of your prior.
If you want to be "weakly informative," you'd probably choose a distribution with fatter tails; the obvious candidates are t distributions. Gelman's latest advice seems to be to use a t with 3-7 degrees of freedom. (Note that the link also supports my suggestion that you should do the same thing for the log of the scale that you would do for the location.) So instead of a lognormal, you could use a log-Student-t. To accomplish this in Stan, you might do something like:
real log_sigma_y;                      // declare at the top of your model block
// ...some more code for your model
log_sigma_y = log(sigma_y);
target += -log_sigma_y;                // Jacobian adjustment for putting the prior on log(sigma_y)
log_sigma_y ~ student_t(3, 1, 3);      // this is a 'weakly informative' prior
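In case it helps to see where those lines live, here is a minimal self-contained sketch; the normal likelihood, the prior on mu, and everything except the sigma_y / log_sigma_y lines are my own illustrative assumptions, not part of the answer above:

data {
  int<lower=1> N;
  vector[N] y;
}
parameters {
  real mu;
  real<lower=0> sigma_y;
}
model {
  // prior on the log of the scale, with the Jacobian adjustment
  real log_sigma_y = log(sigma_y);
  target += -log_sigma_y;              // Jacobian of the log transform
  log_sigma_y ~ student_t(3, 1, 3);    // weakly informative log-Student-t prior

  mu ~ student_t(3, 0, 10);            // illustrative prior for the location
  y ~ normal(mu, sigma_y);             // illustrative likelihood
}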
However, I think that if the code above is too complex for you, you could probably get away with a lognormal prior, with two caveats. First, make the scale (standard deviation) of that prior a few times larger than your rough guess of how unsure you are; you want a weakly informative prior, not a strongly informative one. And second, once you've fit your model, check the posterior median of the parameter and make sure its log is not too far from the center (the log-scale location) of the lognormal. "Not too far" probably means less than two standard deviations, and preferably not much more than one SD.
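In symbols (using my own notation $\mu_0$ and $\tau_0$ for the lognormal's log-scale location and scale, and $\tilde\sigma$ for the posterior median), that check is roughly
$$
\frac{\lvert \log\tilde\sigma - \mu_0 \rvert}{\tau_0} \lesssim 2,
$$
and if that ratio is much above 1, consider widening the prior and refitting.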