Short answer to part 1 of your question:
http://web.mit.edu/6.02/www/s2012/handouts/14.pdf
See section 14.1.1
To directly answer your question about "voices directly through the antenna": no. I'm no expert on the second part of your question, but the basic sequence of events is this: the antenna receives the signal, the receiver demodulates it, the result is converted into a digital format that your computer can process, and software then turns that digital information into something humans can understand. One such software interface displays text for a hearing-impaired user; another adds subtitles to a video to provide a translation. This is a very basic explanation of the key concepts, and it barely scratches the surface (to put it mildly) of what you would need to understand the whole process. :-)
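If it helps to see the receive, demodulate, digitize chain in concrete terms, here is a toy sketch in Python. Everything in it is an assumption made for illustration: the sample rate, the carrier and tone frequencies, the function names, and the choice of plain amplitude modulation with a noise-free channel. Real radios use much higher carrier frequencies and often digitize before demodulating, so treat this as a picture of the idea rather than how any particular receiver works.

```python
import math

SAMPLE_RATE = 8000   # samples per second (assumed for this sketch)
CARRIER_HZ = 1000    # carrier frequency (assumed; real carriers are far higher)
TONE_HZ = 50         # a single tone standing in for a voice signal
PERIOD = SAMPLE_RATE // CARRIER_HZ  # samples per carrier cycle


def message(n):
    """The information to send: a tone with values in [-1, 1]."""
    return math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE)


def carrier(n):
    """The high-frequency wave that carries the message."""
    return math.cos(2 * math.pi * CARRIER_HZ * n / SAMPLE_RATE)


def transmit(num_samples):
    """Amplitude modulation: the message scales the carrier's amplitude."""
    return [(1 + 0.5 * message(n)) * carrier(n) for n in range(num_samples)]


def demodulate(received):
    """Multiply by the carrier again, then average over one carrier cycle
    (a crude low-pass filter) to recover roughly 1 + 0.5 * message."""
    mixed = [2 * x * carrier(n) for n, x in enumerate(received)]
    return [sum(mixed[i:i + PERIOD]) / PERIOD
            for i in range(len(mixed) - PERIOD + 1)]


def digitize(samples, levels=256):
    """Quantize values in [0, 2] to integers 0..levels-1, as an ADC would."""
    return [round(min(max(s / 2, 0.0), 1.0) * (levels - 1)) for s in samples]


received = transmit(800)          # what the antenna picks up (noise-free here)
recovered = demodulate(received)  # the message, shifted and scaled
digital = digitize(recovered)     # numbers a computer can store and process
print(digital[:10])
```

The list in `digital` is the kind of data that software further down the line (an audio player, a speech-to-text program, a subtitle renderer) would turn into something a person can hear or read.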