Corrupt downloads and bad streaming affect East Lothian
Monday 14 September 2015 17:45:42 by Andrew Ferguson

We are aware of at least three providers affected by problems in the East Lothian area, specifically the Tranent, Prestonpans and Musselburgh exchange areas.

Customers of BT, TalkTalk and Sky, all on fibre-based services from those providers, are reporting corrupt downloads of large files and problems with streaming video. The corrupt downloads problem is potentially serious: if you are updating the firmware of a device that does not perform checksums on downloaded files, you might end up with a dead device.

We have tracked down what looks to be a regular speed test user on the Tranent exchange, and the speed test results are showing what would normally be classed as a congestion problem, but congestion should not normally exhibit as corrupt data packets. A couple of weeks ago you can see that the same user had a very nicely behaved connection with a stable 63 Mbps download speed (the user has many more speed tests but we have featured the two that appear to demonstrate what is happening for users).

We have asked for more information on what the actual problem is, but given that there are already three threads running (BT, Sky and TalkTalk) and that the service status checkers are just referring to problems in the area, we are not hopeful of getting a short, concise answer.

If you are not sure if you are affected, our download test files and their MD5SUMS are available for those wanting to verify large file downloads. In addition to our usual speed test options, you can run our speed test over SSL, though for those on connections over 400-500 Mbps we have found that browser and PC performance can be a limiting factor when handling SSL traffic.
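
For anyone who prefers to script the check, the following is a minimal sketch of comparing a downloaded file against a published MD5 sum. The function names and the chunk size are our own choices for illustration, and the file path and expected digest are whatever you supply; nothing here is specific to our test files.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of the file at `path`, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large downloads do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_md5(path, published_hex):
    """Compare the file's digest with a published value (hypothetical helper)."""
    return md5_of_file(path) == published_hex.strip().lower()
```

A mismatch means the file on disk differs from the one the publisher hashed, whatever the transport layer reported during the download.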

Update 10am Tuesday: the Zen Internet status page is monitoring the work that is still ongoing to fix the problem. Several things appear to have been tried but the fault is persisting.

Comments

Posted by TheEulerID about 1 year ago
Corrupt downloads are no doubt a pain in the neck, but surely software and firmware have checksums and no updates are applied unless these match.

In any event, it's a bit difficult to see how a download can be corrupted "in-flight" without being detected as there are several layers of checksum. These checks are end-to-end and it ought not be possible (at least with TCP) for corruption to go undetected.

The nearest I can imagine is files being corrupted in a cache (but, again I'd expect checksums to detect problems).
Posted by TheEulerID about 1 year ago
Reading the posts, it doesn't look like corrupted data is arriving undetected. It's just that the downloads are failing due to corruption. That is clearly not right, but it's not a threat. It's clearly a common fault.
Downloading corrupt files which went undetected would be a whole different world of pain. I've only looked at a dozen pages or so, but it reads like either congestion or a bit of common kit mangling packets.
Posted by TheEulerID about 1 year ago
Ok - reading even further down the list it looks like some consoles may have been bricked, which can surely only happen through a design fault in the console's firmware update routines. It implies it's not CRC checked before there's an install attempt. If so, that's sloppy beyond belief in this day and age. I suspect what happened was a download failed part way through and the firmware update routines ran anyway.
Posted by I75Inside about 1 year ago
TheEulerID. The issue is that the checksum mechanism is actually fine and full downloads arrive corrupted.

A device in the network is corrupting the data as it passes through between checksums. Every so often two bytes 16 bytes apart have bit 2 corrupted. Whatever is corrupting the byte stream is doing so before the checksum is recalculated.

So you can only tell it is broken if you do an MD5 or other check on the whole file once it has downloaded. Most downloaded files like zip files do this for you. Higher level download tools detect issues and they are running slow or failing.
Posted by TheEulerID about 1 year ago
@I75Inside

You are talking about the transport layer checksums. I was talking about the checksums that are in the files. There is no way that a network transport mechanism can possibly be aware of whatever checksum mechanism is used in an arbitrary file.
Posted by TheEulerID about 1 year ago
@I75Inside

I should also have said that for a network device to mangle data and for it to be undetected at the transport layer, it would mean that the TCP checksum would have to be recalculated and reset by the network layer device. Network layer devices should ideally not be manipulating transport layer checksums at all, but it is at least possible as it's standardised. For a network layer device to recalculate an application layer CRC (as I'd expect to see in a firmware update file) it would have to be "application aware".
Posted by Nightglow about 1 year ago
Zen has a detailed fault report here:

https://status.zen.co.uk/broadband/fault-outage-details.aspx?reference=43147
Posted by Nightglow about 1 year ago
Too early for me, correct link.

https://status.zen.co.uk/broadband/
Posted by I75Inside about 1 year ago
@TheEulerID Yes I was referring to the transport layer as I read your first post as saying download corruption was being detected, which is not the case at the download stage itself for TCP at least. Only the tools which are doing the downloads are noticing if they contain checks or are using layers on top of TCP like SSL.

How the applications deal with downloads is, as you say, in the realm of the application. Not all downloaded content has such checks though, e.g. I bricked an iPhone last week when I flashed it from a corrupted iOS image using iTunes.
Posted by TheEulerID about 1 year ago
@I75Inside

It's still unusual; TCP checksums are e2e. I'm trying to think why a network device should regenerate a CRC in a TCP header. I recall there are odd issues with VPNs going via routers with IP aliasing, but that's it.

I'd be inclined to think that the devices are updating firmware with truncated files because of partial downloads. I've certainly seen truncated downloads in the past. A good reason to use a ZIP file even if the data is incompressible.

TCP CRCs aren't totally infallible, but I wouldn't expect a whole stream of them to go undetected.
Posted by I75Inside about 1 year ago
@TheEulerID Well seems fixed and has been tracked down to a faulty layer 2 switch card in Tranent which has been replaced. All now seems to be working and no corruption.
Posted by I75Inside about 1 year ago
I suspect the corruption was such it was not picked up by the TCP CRC. The two corrupted bytes were always +4 and then -4 relative to what they should be. These probably cancelled each other out in the CRC which does a loop first of the bytes and sums the values. The result for both good and bad payload would be the same.
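
The cancellation described above can be sketched in a few lines. The TCP checksum is a 16-bit one's-complement sum rather than a true CRC, so an increase of 4 in one byte and a decrease of 4 in another byte 16 positions away (and therefore in the same half of a 16-bit word) leaves the total unchanged. The payload, offsets and function names below are invented for illustration, and the sketch covers only a payload, not the pseudo-header and TCP header that the real checksum also includes.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum in the style of RFC 1071."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]  # big-endian 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def flip_bit2_pair(data: bytes, offset: int) -> bytes:
    """Flip bit 2 of the bytes at `offset` and `offset + 16`."""
    damaged = bytearray(data)
    damaged[offset] ^= 0x04
    damaged[offset + 16] ^= 0x04
    return bytes(damaged)


# Illustrative payload: bit 2 is clear at offset 3 and set at offset 19,
# so the flips amount to +4 and -4 respectively.
good = bytearray(64)
good[3] = 0x10
good[19] = 0x14
good = bytes(good)
bad = flip_bit2_pair(good, 3)

print(good != bad)                                        # True: data differs
print(internet_checksum(good) == internet_checksum(bad))  # True: sum is blind to it
```

If both bytes happened to have bit 2 in the same state, the two changes would add rather than cancel and the checksum would differ, which fits the reports that many downloads simply failed or slowed down while some corrupt files got through.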
Posted by TheEulerID about 1 year ago
@I75Inside

I'm glad to hear it's fixed. It may be that this cancelling effect would explain it (it implies a weakness in the algorithm - possibly it is compromised for speed over security). In any event, a bit of a freak, but it can never be ruled out.

In any event, it doesn't excuse the lack of checksums in firmware/software update routines. That's a fundamental design shortcoming.
Posted by TheEulerID about 1 year ago
Reading around this, the TCP checksum system is very weak and can't be relied upon for data integrity. For that you need an application layer system using a much more robust algorithm.

http://ask.metafilter.com/12120/Why-do-you-get-corrupt-FTP-transfers-if-its-over-TCPIP
Posted by c_j_ about 1 year ago
"the CRC which does a loop first of the bytes and sums the values"

Summing the values is a checksum, it's not a useful CRC, and in sufficiently large quantities of data, doing it only once is not even much of a checksum.

A slightly better checksum for non-trivial quantities of data splits the data into rows and columns, with a checksum for each column.

But for serious quantities of data you'll want an application-level integrity check of some kind, frequently a CRC, in case there's an undetected problem at TCP level, as there was here.
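
As a rough illustration of that last point, the sketch below applies the kind of damage reported in this thread (bit 2 flipped in two bytes 16 bytes apart, one flip adding 4 and the other subtracting 4) to a made-up payload and compares a plain byte sum with a CRC-32 from Python's standard `zlib` module. The payload and offsets are assumptions chosen so that the two changes cancel in the sum.

```python
import zlib

# Made-up payload: bit 2 is clear at offset 3 and set at offset 19.
good = bytearray(64)
good[3] = 0x10
good[19] = 0x14

# Flip bit 2 in both bytes: +4 at offset 3, -4 at offset 19.
bad = bytearray(good)
bad[3] ^= 0x04
bad[19] ^= 0x04

# A plain sum of byte values cannot see the change, because +4 and -4 cancel.
simple_sum_matches = sum(good) == sum(bad)

# CRC-32 depends on where the flipped bits sit, not just on their total.
crc32_matches = zlib.crc32(bytes(good)) == zlib.crc32(bytes(bad))

print(simple_sum_matches)  # True
print(crc32_matches)       # False
```

A firmware updater that ran a check of this kind over the whole image before flashing would have rejected the damaged file.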