To be a physicist, you need a great deal of mathematical intuition. It's more useful to be able to intuit whether something is true (and the conditions under which it holds) from a rough sketch of a method or a toy model than to know the formal proof, which often attacks the problem from a direction that isn't very helpful for making constructive use of the theorem. The reason is that the primary skill of a scientist is knowing quickly which hunches are worth turning into research projects; if you need a theorem later, the proof can be found in a textbook or a journal, or farmed out to the math department. To build this intuition it's sometimes useful to put time into learning the mathematics, since if you know nothing your educated guesses will obviously be very bad, but knowing the proofs is in no way essential. I'd shy away from heavy mathematical treatments in favor of model-based ones, like those found in mathematical methods texts for the physical sciences. To be clear, though, this is my own opinion; there are plenty of physicists who think knowing the proofs is really important to them, though I'd posit that when they're using the proofs they're being mathematicians as well as physicists.
The example I always give of what I mean is the idea of completeness of a basis in spectral theory. Almost all analytical physics rests on Fourier analysis: any function can be broken into a sum of sines and cosines of different wavelengths with varying amplitudes (the Fourier coefficients). Proving that sines and cosines are all you need (and rigorously accounting for the conditions) takes at least half a year of graduate-level analysis, a redefinition of the integral, the introduction of abstract topological spaces, Sturm-Liouville theory, and Green's functions, but all a physicist need know is this:
We know it works for many functions, since we've tried it many times and get plots that look exactly the same. Now let's think about the general case. Large wavelengths capture large-scale fluctuations; small wavelengths capture small-scale fluctuations. If we first fit the general shape of a function using large wavelengths, we should be able to correct the local incongruities using small wavelengths, regardless of the shape of the function (we know from examples that it works for functions that don't look even close to sine waves, so there's no reason to suspect it wouldn't). There might be an issue if the function changes its value too quickly--that is, has a large slope. A large slope means a large derivative, so we probably need a condition on the derivative. Since we can go as small as we want on the wavelengths, that condition is probably only that the derivative doesn't become infinite--in other words, that the derivative exists at every point. And if the derivative fails to exist somewhere, we're probably still fine unless we get very close to that point, since everywhere else the function behaves perfectly well. Since, again, we can go as small as we want on the wavelengths, it seems reasonable to suppose we can trust the Fourier transform everywhere except exactly at the point where the derivative becomes singular--that is, if we cut off our function before the singular derivative, we should be totally fine.
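If you want to see that last heuristic play out in numbers, here's a minimal sketch in Python/NumPy. The choice of sign(x) as the test function, the 0.3 cutoff, and the values of N are arbitrary illustration choices, nothing canonical.

```python
# Partial Fourier sums of sign(x): a function whose slope blows up at x = 0.
# (Its periodic extension also jumps at x = +/- pi, so we steer clear of the
# endpoints as well when measuring the "away from the jump" error.)
import numpy as np

x = np.linspace(-np.pi, np.pi, 20001)
f = np.sign(x)

def partial_sum(N):
    # Fourier series of sign(x) on [-pi, pi]: only odd sine terms survive,
    # each with amplitude 4 / (pi * n).
    s = np.zeros_like(x)
    for n in range(1, N + 1, 2):
        s += (4.0 / (np.pi * n)) * np.sin(n * x)
    return s

for N in (9, 99, 999):
    s = partial_sum(N)
    away = (np.abs(x) > 0.3) & (np.abs(x) < np.pi - 0.3)  # stay clear of the bad points
    err_away = np.max(np.abs(f - s)[away])
    overshoot = (np.max(s) - 1.0) / 2.0                    # overshoot as a fraction of the jump size
    print(f"N={N:4d}  error away from the jump: {err_away:.4f}  "
          f"overshoot right at the jump: {overshoot:.4f}")
```

Run it and the error away from the jump shrinks steadily as shorter wavelengths are added, exactly as the hand-waving suggests, while the overshoot right at the jump sits stubbornly near 9% of the jump size no matter how many terms you pile on (the Gibbs phenomenon).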
That hand-wavy argument is piss-poor, and any mathematician who reads it will be outraged (rightfully, if I were calling it a proof), but it gives motivation for, and intuition about, the methods of Fourier analysis and the main conditions under which the Fourier transform exists, which is all you really need in order to use it.
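And the "we've tried it many times and get plots that look exactly the same" part really is a five-minute experiment. Here's a minimal sketch, again in Python/NumPy; the bump function and the number of harmonics kept are arbitrary choices.

```python
# Reconstruct a function that looks nothing like a sine wave from a handful of
# sines and cosines, computing the Fourier coefficients by direct numerical
# integration (a plain Riemann sum is plenty accurate here).
import numpy as np

x = np.linspace(-np.pi, np.pi, 4000, endpoint=False)
dx = x[1] - x[0]
f = np.exp(-3.0 * (x - 0.7)**2)           # a lone bump -- not remotely sinusoidal

N = 20                                    # number of harmonics to keep
a0 = np.sum(f) * dx / np.pi
approx = np.full_like(x, a0 / 2)
for n in range(1, N + 1):
    an = np.sum(f * np.cos(n * x)) * dx / np.pi   # cosine amplitude at wavelength 2*pi/n
    bn = np.sum(f * np.sin(n * x)) * dx / np.pi   # sine amplitude at wavelength 2*pi/n
    approx += an * np.cos(n * x) + bn * np.sin(n * x)

print("max |f - partial sum| with", N, "harmonics:", np.max(np.abs(f - approx)))
```

Twenty harmonics already pin the bump down to a tiny error, and plotting `approx` against `f` is the kind of hands-on evidence that builds the intuition far faster than the proof does.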