$$\min_{w,b} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i f_{w,b}(x_i)\right)\right) + \lambda \lVert w \rVert^2$$
where $x_i$ is the feature vector and $y_i \in \{-1, +1\}$ the target label for example $i$ from your training set. This function originates from the joint likelihood over all training examples, which explains its probabilistic nature even though we use it for classification. In the equation, $w$ is your weight vector and $b$ your bias. I trust that you know what $f_{w,b}(x_i)$ is (for plain logistic regression it is the linear score $w^\top x_i + b$). The last term in the minimization problem is the regularization term, which, among other things, controls the generalization of the model.
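If it helps to see the objective in code, here is a minimal NumPy sketch of it, assuming the usual linear score $f_{w,b}(x) = w^\top x + b$ and labels $y_i \in \{-1, +1\}$ (the function name and arguments are mine):

```python
import numpy as np

def regularized_logistic_loss(w, b, X, y, lam):
    """L2-regularized logistic loss for labels y in {-1, +1}.

    X: (n, d) feature matrix, y: (n,) labels,
    w: (d,) weights, b: scalar bias, lam: regularization strength.
    """
    margins = y * (X @ w + b)              # y_i * f_{w,b}(x_i)
    # np.logaddexp(0, -m) computes log(1 + exp(-m)) without overflow
    data_term = np.logaddexp(0.0, -margins).sum()
    return data_term + lam * np.dot(w, w)  # + lambda * ||w||^2
```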
Assuming all your $x$ are normalized, for example by dividing by the magnitude of $x$, it is quite easy to see which variables are more important: those whose weights are large in magnitude compared to the others, whether on the positive or the negative side. They influence the loss the most.
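As a sketch of that recipe, using made-up toy data where only two of five variables actually matter, one could standardize the features and rank the fitted weights by magnitude (note that scikit-learn parametrizes regularization as $C = 1/\lambda$):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 200 examples, 5 features, labels in {-1, +1};
# only features 0 and 2 drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200))

X_scaled = StandardScaler().fit_transform(X)  # put features on a common scale
clf = LogisticRegression(penalty="l2", C=1.0).fit(X_scaled, y)

# With comparable feature scales, |w_j| ranks the variables by influence.
importance_order = np.argsort(-np.abs(clf.coef_.ravel()))
print(importance_order)  # features 0 and 2 should come first
```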
If you are keen on finding the variables which really are important, and in the process don't mind kicking a few out, you can $\ell_1$-regularize your loss function:
$$\min_{w,b} \sum_{i=1}^{n} \log\left(1 + \exp\left(-y_i f_{w,b}(x_i)\right)\right) + \lambda \lVert w \rVert_1$$
The derivatives of the regularizer (strictly, its subgradients, since $\lvert w_j \rvert$ is not differentiable at zero) are quite straightforward, so I will not spell them out here. Using this form of regularization with an appropriate $\lambda$ will drive the less important elements of $w$ to exactly zero while leaving the others nonzero.
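A small sketch of this effect on the same kind of toy data as above; with the `liblinear` solver and a small `C` (i.e. a large $\lambda$), several coefficients come out exactly zero:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Same toy setup as before: only features 0 and 2 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] - 2 * X[:, 2] + 0.1 * rng.normal(size=200))
X_scaled = StandardScaler().fit_transform(X)

# liblinear supports the l1 penalty; a small C means strong
# regularization and hence more exact zeros in w.
l1_clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.1)
l1_clf.fit(X_scaled, y)

print(l1_clf.coef_)  # the unimportant coefficients are exactly 0
print("surviving variables:", np.flatnonzero(l1_clf.coef_.ravel()))
```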
I hope this helps. Ask if you have any further questions.