SoftwareSerial Buffer problem when not power of 2

I am building a GPS/SD card logger, which all works fine.
The problem is that I want to log position fixes at 10 Hz, which raises the data rate, so I have SoftwareSerial running at 38400 baud. The next thing, obviously, is buffer overflow, especially while I'm writing to the SD card, so I changed the buffer size from 64 to 256 and everything works fine, with no data loss.
I'm also commanding the GPS module to send only the NMEA sentences I want.

The big thing I can't understand is that if I set the buffer size to anything other than a power of 2, I get corruption.
My initial thought was that the modulo by the buffer size wasn't working correctly, but the code looks fine. I then thought about alignment, but that all looks OK.

Before I start posting code, is there something obvious going wrong?

.. Simon

What version of the IDE are you using?

What value, specifically, causes corruption? 15?

I have spent all evening on this, with the following findings and ideas.
At slower baud rates (4800) it works fine; when the rate is increased to 19.2k, the problems start.
I notice that the first 2 bytes are OK, then it goes out of sync, back in sync, and out again.
So I'm fairly sure it's a timing issue. Looking at the SoftwareSerial code, there are calculations using % (modulo) by the buffer size while the head chases the tail in the circular buffer. My guess is that when the size is 512, the calculation is either optimised or just executes faster than when the buffer size is 511.
I will investigate further.

Oh, and I just checked: this is in the ISR.

This is with IDE 1.0.1 on a 16 MHz ATmega328 Nano.

So what size are you trying?

  static volatile uint8_t _receive_buffer_tail;
  static volatile uint8_t _receive_buffer_head;

They will hold 255 max.

Hi Nick, thanks for the comment. I hadn't noticed the 8 in uint8_t and assumed they were 16-bit; I'm still not quite used to working back in the micro world.

Anyhow, the problem still persists: if I set _SS_MAX_RX_BUFF to 63, 127, or anything other than 2^n, I still get timing issues.
I'm currently using 127 for testing, but I will revamp the code to use 16-bit indices in the end, as I may need more than 256.

With 127 and the original code, I get the following corruption at 19.2k:
$G¨Ô? ??ÅÉ`37,???KLK&%?©Å
$G¨Ô? ??ÅÉd36,???KLK&%?©Å

As a quick test, I modified the code as follows to see if it's the root of the problem.
The original code does the modulo twice, which is a waste of cycles, and we all know ISRs should be as fast as possible.

//    if ((_receive_buffer_tail + 1) % _SS_MAX_RX_BUFF != _receive_buffer_head) 
    uint8_t calc_once = (_receive_buffer_tail + 1) % _SS_MAX_RX_BUFF;
    if (calc_once != _receive_buffer_head)
    {
      // save new data in buffer: tail points to where byte goes
      _receive_buffer[_receive_buffer_tail] = d; // save new byte
//      _receive_buffer_tail = (_receive_buffer_tail + 1) % _SS_MAX_RX_BUFF;
      _receive_buffer_tail = calc_once;
    }
    else
    .....

Now the corruption is quite a bit less
$GPGGA,000825.8³6,,,,,0,0,,¬­??¦,
$GPGGA,000826.0³7,,,,,0,0,,¬­??¦,

So I am on the right track.
I will remove the modulo and add some if/else code to wrap the buffer index, which may fix the problem.
As a side note, I will do some testing to see how much faster the modulo is when the size is a power of 2; in that specific case the compiler could optimise it to a simple logical AND, which would be very fast.

I will post anything interesting.

Any modulo by a power of two (on unsigned or provably non-negative operands) can be optimized by the compiler to a simple AND, which is much faster than a division. The Atmel 8-bit AVR instruction set has no divide instruction, so a modulo by anything else is computed by a software routine, a loop of shifts and subtractions. In this case, probably even more important, that routine is not timing-constant: it doesn't take the same time to execute for different input values.

Here is some simple test code that shows %128 executing much faster than %127.
It's a bit messy and long-winded: I hadn't read that the resolution of micros() is only 4 µs, so it got chopped up, duplicated and moved around quite a bit until it produced sensible results.
Swap and move the modulo values around to see the results change.

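void setup()
{
  Serial.begin(9600);  // assumed baud rate; not shown in the original post

  unsigned long t11;
  unsigned long t21;
  unsigned long t12;
  unsigned long t22;
  uint8_t val0 = 0xA5;
  uint8_t res0 = 0;
  uint8_t val1 = 0xA5;
  uint8_t res1 = 0;

  // 500 iterations of % 127: needs a software division
  t11 = micros();
  for (int i = 0; i < 500; i++)
  {
    res0 = (val0+i) % 127;
  }
  t21 = micros();

  // 500 iterations of % 128: the compiler can use an AND instead
  t12 = micros();
  for (int i = 0; i < 500; i++)
  {
    res1 = (val1+i) % 128;
  }
  t22 = micros();

  // Durations are in microseconds (micros() resolution is 4 us at 16 MHz)
  Serial.print("Res 0 "); Serial.println(res0);
  Serial.print("Dur 0 "); Serial.println(t21 - t11);
  Serial.print("Res 1 "); Serial.println(res1);
  Serial.print("Dur 1 "); Serial.println(t22 - t12);
}

void loop()
{
}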

Result (durations in µs):

Res 0 29
Dur 0 7648
Res 1 24
Dur 1 440

... Simon

simoncastle:
Anyhow, the problem still persists: if I set _SS_MAX_RX_BUFF to 63, 127, or anything other than 2^n, I still get timing issues.
I'm currently using 127 for testing, but I will revamp the code to use 16-bit indices in the end, as I may need more than 256.

I agree with the others that, at a high baud rate (when cycles are scarce), doing a lengthy division twice isn't going to help.

I further optimised the code as follows, which allows the system to receive error-free at 19.2k, but it fails badly at the next step up in baud rate, 38.4k; I don't know about the rates in between.
I can't see how this code is much slower than the logical AND that % presumably compiles to, but it's enough to make it fail, which shows how critical this section is during the inter-byte gap.

I'm wondering if something can be done about the way the stop bit is eaten up, to buy some extra time.

... Simon

//    if ((_receive_buffer_tail + 1) % _SS_MAX_RX_BUFF != _receive_buffer_head) 
    uint8_t calc_once = _receive_buffer_tail + 1;
    if (calc_once >=_SS_MAX_RX_BUFF) calc_once = 0;
    if (calc_once != _receive_buffer_head)
    {
      // save new data in buffer: tail points to where byte goes
      _receive_buffer[_receive_buffer_tail] = d; // save new byte
//      _receive_buffer_tail = (_receive_buffer_tail + 1) % _SS_MAX_RX_BUFF;
      _receive_buffer_tail = calc_once;
    }
    else ...

I think the issue may be that % gives different results from an AND for negative inputs. Certainly, at a quick glance, the code in SoftwareSerial::available() worries me: intermediate results are not cast to unsigned, so could they be negative?

Anyway, there's no need for % at all: each use can be replaced by a conditional subtraction of _SS_MAX_RX_BUFF from the variable that should be wrapping around.

MarkT:
Anyway, there's no need for % at all: each use can be replaced by a conditional subtraction of _SS_MAX_RX_BUFF from the variable that should be wrapping around.

That's what I did, but it runs slower than the original code does with a 2^n buffer size.
The original would work at 38.4k with no problem, but then you need a buffer size of 128 etc.
My original problem was that everything fell over with a buffer size of 100, which started the investigation and head-scratching.

... Simon

Just for interest, I looked at the assembler output for the speed test above; you can clearly see it performs an andi (AND with immediate) with 127 to do the mod 128.

The strange thing is that it looks like it has promoted the uint8_t to 16 bits. I first thought that was because of the 0xA5 assignment, but it's the addition of int i that causes it: in C, the uint8_t operand is promoted to int (16 bits on AVR) before the arithmetic.

 1be:   0e 94 8f 01     call    0x31e   ; 0x31e <micros>
 1c2:   1b 01           movw    r2, r22
 1c4:   2c 01           movw    r4, r24
 1c6:   85 ea           ldi     r24, 0xA5       ; 165
 1c8:   90 e0           ldi     r25, 0x00       ; 0
 1ca:   fc 01           movw    r30, r24
 1cc:   ef 77           andi    r30, 0x7F       ; 127
 1ce:   f0 70           andi    r31, 0x00       ; 0
 1d0:   fb 83           std     Y+3, r31        ; 0x03
 1d2:   ea 83           std     Y+2, r30        ; 0x02
 1d4:   01 96           adiw    r24, 0x01       ; 1
 1d6:   f2 e0           ldi     r31, 0x02       ; 2
 1d8:   89 39           cpi     r24, 0x99       ; 153
 1da:   9f 07           cpc     r25, r31
 1dc:   b1 f7           brne    .-20            ; 0x1ca <setup+0x7e>
 1de:   0e 94 8f 01     call    0x31e   ; 0x31e <micros>

OK, it's working.
The fix in post #8 is working fine at 57.6k with any buffer size up to 256, e.g. 160.
It appeared not to work the other night when I upped the rate because the GPS command to change the baud rate was wrong.
Around 160 bytes is a good buffer size, as it lets you have 2 lines of NMEA data in hand while you talk to an SD card or do other things.
128 looks a bit small; 256 may be too big for some.
256 does actually work, because 8-bit arithmetic naturally wraps modulo 256.

Another possible optimisation I noticed: the delay for eating up the stop bit could be just over half a bit, because we are already sampling at the so-called middle of the last data bit, so it should only take another half bit or so before the line is back in the MARK state.
I guess we only need to get past the edge at the start of the STOP bit, which only happens if the line is changing from the state of the final data bit.

Does anyone know how to get this forwarded for consideration in the core?
There should at least be a big WARNING or explanation that the current code is highly optimised for buffer sizes of 64, 128 and 256, and it certainly doesn't work with other values at 19.2k or more.

... Simon

I just want to point out that even a small buffer (e.g. 32 bytes) should be plenty if you handle the rest in your code. Examples here:

With a suitable state machine you can throw away bytes as they arrive. The only real point of the serial buffer is to allow a margin for error if you can't empty it as fast as the data arrives.

Hi Nick
Thanks

With a suitable state machine you can throw away bytes as they arrive. The only real point of the serial buffer is to allow a margin for error if you can't empty it as fast as the data arrives.

Exactly
As I said, I'm receiving updates at 10 Hz, so I need 38.4k if I want to capture all the data. I'm not doing any waiting or delays etc., but when I write to the SD card it sometimes takes quite a time before it returns, possibly while it's updating the FAT or something; it seems periodic, perhaps after each 512-byte block. There is no needless reopening and closing of files. It's during this time that the buffer overruns, and I DO NOT want to throw away data. Upgrading to a class 4 card alleviated the problem a bit, but I don't have anything faster at the moment to test with.

... Simon

I'm not doing any waiting or delays etc but when I write to the SD card it sometimes takes quite a time before it returns

Writing to the SD card IS a delay. Just not in the usual sense of "do nothing else until" or "do something else until". Writing to the SD card takes time.

but don't have anything faster at the moment to test with.

fat16lib has posted some faster libraries, for the Mega particularly, where double buffering is possible, making writes a lot faster. I'd check out some of his work.