We recently started getting lots of errors like the following on our TFS build servers:
Exception Message: The build failed because the build server that hosts build agent TFS-BuildController001 - Agent4 lost communication with Team Foundation Server. (type FaultException`1)
This error message would appear randomly; some builds would pass, others would fail, and when they did fail with this error message it was often at different parts in the build process.
After a bit of digging I found this post and this one, which discussed similar errors where a build fails because the build controller loses its connection to the TFS server. They described fixes for DNS and load-balancing issues, so we had our network team update our DNS records and flush the DNS cache, but we were still getting the same errors.
We have several build controllers, and I noticed that the problem was only happening on two of the three, so our network team updated the hosts file on the two with the problem to match the entries in the one that was working fine, and boom, everything started working properly again 🙂
So the problem was that the hosts file on those two build controller machines somehow got changed.
The hosts file can typically be found at "C:\Windows\System32\Drivers\etc\hosts", and here is an example of what we now have in our hosts file for entries (just the two entries):
12.345.67.89 TFS-Server.OurDomain.local
12.345.67.89 TFS-Server
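If you want to check for this kind of drift yourself, here is a rough sketch in Python of how you might compare the hosts file from a working build controller against one from a misbehaving machine. This is not what our network team actually ran (they edited the files by hand); the function names `parse_hosts` and `diff_hosts` and the file names in the usage section are made up for illustration, and it assumes you have already copied both hosts files somewhere you can read them.

```python
import sys


def parse_hosts(text):
    """Return {hostname (lowercased): ip} for every entry in hosts-file text."""
    entries = {}
    for line in text.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        # Hosts-file format: an address followed by one or more names.
        ip, *names = line.split()
        for name in names:
            entries[name.lower()] = ip
    return entries


def diff_hosts(good_text, suspect_text):
    """List (hostname, expected_ip, actual_ip) for entries that are
    missing or different on the suspect machine. actual_ip is None
    when the entry is missing entirely."""
    good = parse_hosts(good_text)
    suspect = parse_hosts(suspect_text)
    problems = []
    for name, ip in sorted(good.items()):
        if suspect.get(name) != ip:
            problems.append((name, ip, suspect.get(name)))
    return problems


if __name__ == "__main__":
    # Hypothetical usage:
    #   python compare_hosts.py good_hosts.txt suspect_hosts.txt
    with open(sys.argv[1]) as f:
        good_text = f.read()
    with open(sys.argv[2]) as f:
        suspect_text = f.read()
    for name, expected, actual in diff_hosts(good_text, suspect_text):
        print(f"{name}: expected {expected}, found {actual}")
```

It only reports entries that the working machine has and the other one lacks or maps differently, which is the situation we ran into. Extra entries on the suspect machine are ignored.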
If you too are running into this TFS Build Server error I hope this helps.