Why is there a delay at the end of void Loop?

The goto statement has been part of C from the beginning, added to satisfy those people that barely got beyond BASIC programming, in my opinion. In 25 years of C/C++ coding, I've used goto exactly once, and that could have been avoided if I'd been thinking.

I see there is a performance hit for using functions though, them not being compiled inline and all that.. :~
I wonder if goto can give a speed advantage there?

them not being compiled inline and all that..

Have you examined the output from the compiler?
What evidence do you have to support your assertion?

I have had some interesting results posted on the POV math thread but there may have been some other factors involved. I'll quickly rewrite the code for both scenarios and see what the scope says ..

Ah! Time to defend the lowly goto...

I was recently working on EtherCard::packetLoop, see tcpip.cpp, line 516. Note the vast number of return statements. Now I needed to add something in just before the method returned. Doh! So there are a few options at this point:

  1. Put the code in before every return.
  2. Refactor the whole thing to be a giant tangled mess of if/elses, even more than it already is.
  3. Refactor even further to separate out into smaller methods.
  4. Use a 'goto' in place of the returns, jumping to the new code just before the single return.

Generally, a goto is a good solution when there are a lot of error cases that need to halt further execution and you don't have exceptions available.
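A minimal sketch of option 4, written as plain C++ so it can be tried off-target. The names `Packet`, `handlePacket`, and `cleanupCount` are hypothetical and are not taken from EtherCard; the only point is that every early exit funnels through one label, so the newly added code runs exactly once per call.

```cpp
#include <cstdint>

// Hypothetical packet type, for illustration only.
struct Packet {
    const std::uint8_t* data;
    std::uint16_t length;
};

// Hypothetical counter standing in for "the code added just before return".
static int cleanupCount = 0;

// Returns the number of bytes accepted, or 0 on any error.
// Each error case jumps to `done` instead of returning directly.
std::uint16_t handlePacket(const Packet& pkt) {
    std::uint16_t result = 0;               // declared before any goto

    if (pkt.data == nullptr) goto done;     // error: no buffer
    if (pkt.length == 0)     goto done;     // error: empty packet
    if (pkt.data[0] != 0x45) goto done;     // error: unexpected first byte

    result = pkt.length;                    // success path

done:
    ++cleanupCount;                         // runs on every path
    return result;
}
```

Note that in C++ a goto may not jump past the initialization of a local variable, which is why `result` is declared before the first jump.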

The results are in! :slight_smile:

With the goto statement: 68 kHz

Using a function: 63 kHz

Code snippet for goto:

void loop() {

  while (1)
  {
    goto ShiftNow;   // jump out to the transfer code below
  starthere:         // re-entry point after the transfers
    datab = 4;
  }

ShiftNow:
  LATCH_OFF();
  SPI.transfer (scopedata);
  LATCH_ON();

  LATCH_OFF();
  SPI.transfer (dataa);
  LATCH_ON();
  
  LATCH_OFF();
  SPI.transfer (datab);
  LATCH_ON();
  
  LATCH_OFF();
  SPI.transfer (datac);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datad);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datae);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (dataf);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datag);
  LATCH_ON();
  
  goto starthere;
}

Code snippet for function:

void ShiftNow()
{
  LATCH_OFF();
  SPI.transfer (scopedata);
  LATCH_ON();

  LATCH_OFF();
  SPI.transfer (dataa);
  LATCH_ON();
  
  LATCH_OFF();
  SPI.transfer (datab);
  LATCH_ON();
  
  LATCH_OFF();
  SPI.transfer (datac);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datad);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datae);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (dataf);
  LATCH_ON();
    
  LATCH_OFF();
  SPI.transfer (datag);
  LATCH_ON();
}

void loop() {
  
  while (1)
  {
   ShiftNow();
  } 

}

For more info on the testing parameters see POV math - #6 by system - Project Guidance - Arduino Forum

PS: I personally wouldn't use goto unless I really needed to ..


All of the typical infinite loop constructs ("while (1)", "for (;;)", goto, etc) end up producing a single branch instruction.
The delay "at the end of loop" in the original posting is the function return and call overhead.

See also: http://arduino.cc/forum/index.php/topic,4324.0.html for lots of discussion on generating the fastest possible square wave...

Thanks westfw, I have bookmarked that thread; it looks like excellent reading! :slight_smile:

I am, however, effectively looking for the fastest way to latch 595 registers. Any ideas there? The current implementation is in the code posted in the thread mentioned above. What you are seeing in the scope pictures is 8 bytes being sent via SPI, plus the associated latching.

I got to asking this question because these factors influence my readings and calculations.

westfw:
All of the typical infinite loop constructs ("while (1)", "for (;;)", goto, etc) end up producing a single branch instruction.
The delay "at the end of loop" in the original posting is the function return and call overhead.

See also: http://arduino.cc/forum/index.php/topic,4324.0.html for lots of discussion on generating the fastest possible square wave...

To corroborate this, I did a simple test: I printed out main.cpp and the blink sketch as they look after the Arduino preprocessing step, together with the code the compiler generated for them:

There's nothing at the end of loop() or in main() apart from the return and the call, so the gap must be call overhead. I expect several registers need to be changed (stack and instruction pointers, etc.).

void loop() {
  LATCH_ON();
  a8:   28 9a           sbi     0x05, 0 ; 5
  LATCH_OFF();
  aa:   28 98           cbi     0x05, 0 ; 5
  LATCH_ON();
  ac:   28 9a           sbi     0x05, 0 ; 5
  LATCH_OFF();
  ae:   28 98           cbi     0x05, 0 ; 5
} // without
  b0:   08 95           ret

000000b2 <main>:
#include <WProgram.h>

int main(void)
{
        init();
  b2:   0e 94 a8 00     call    0x150   ; 0x150 <init>
        setup();
  b6:   0e 94 53 00     call    0xa6    ; 0xa6 <setup>
         for (;;)
                loop();
  ba:   0e 94 54 00     call    0xa8    ; 0xa8 <loop>
  be:   fd cf           rjmp    .-6             ; 0xba <main+0x8>

The LATCH_ON and LATCH_OFF all end up as single (2-cycle) instructions. The end/resumption of loop() is three instructions (ret, rjmp, call); ret and call each take 4 cycles and rjmp takes 2. So I'd expect the gap between the last bitset in the loop and the first one after the loop resumes to be about 5 times longer than the gap between consecutive bitsets inside the loop, which is just about what the scope trace shows.

I wouldn't call 10 cpu cycles a "delay"; when you optimize your code down to single instructions, you have to start being aware that EVERYTHING takes at least a little bit of time!
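The arithmetic behind that "10 cycles" can be written out as a small host-side sketch. The cycle counts are the ones quoted above (with 2 cycles for the rjmp, which is what the 10-cycle total implies); the constant and function names are made up for illustration, and the 16 MHz clock used in the note below is an assumption about the board.

```cpp
// Cycle counts as quoted in the thread for the AVR.
constexpr unsigned kLatchCycles = 2;  // one sbi/cbi, i.e. one LATCH_ON or LATCH_OFF
constexpr unsigned kRetCycles   = 4;  // ret at the end of loop()
constexpr unsigned kRjmpCycles  = 2;  // rjmp back to the call in main()
constexpr unsigned kCallCycles  = 4;  // call loop()

// Extra cycles spent between the end of one pass of loop() and the next.
constexpr unsigned loopOverheadCycles() {
    return kRetCycles + kRjmpCycles + kCallCycles;
}

// The same overhead in nanoseconds for a CPU clock given in hertz.
constexpr double overheadNanoseconds(double cpuHz) {
    return loopOverheadCycles() * 1e9 / cpuHz;
}
```

With these numbers `loopOverheadCycles()` is 10, five times the cost of a single latch instruction, and `overheadNanoseconds(16e6)` comes to 625 ns.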

AWOL:
To answer the original question, a simple "goto" will nearly always be faster than a "return from function" + "call to (same) function", but the latter won't get you despised by half the users who think that people who write "goto" in a C program should be drowned at birth. :stuck_out_tongue:

If you are trying to generate an exact square wave at an exact frequency, I suggest the 555 chip (or is it the 666 chip? I can never remember).

As for "despise", it's simply a case of using the right tool for the job. The goto statement has its uses, in possibly 0.01% of cases. In the example given:

 start:
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
 goto start;

... there is still going to be a slight discrepancy between the end of the first OFF and the start of the second ON, and the next one. The goto just makes it smaller (the extra instruction, whatever it does). The timer interrupts firing will also delay the code slightly. It will never be a perfect square wave.

Let hardware do it for you.

Agreed. Goto is unduly demonized when, in fact, it's the author who should be on the receiving end of the ire. There is nothing wrong with goto in general. Having said that, it's very frequently abused and misused. It's the classic case of the poor carpenter blaming his tools.

I completely agree with your comment. But I do want to offer that the error can be further marginalized by unrolling the loop by hand. This is a little-used optimization technique. With the above, the error is 1 out of every 2 pulses. Not so good.

 start:
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
 goto start;

So on and so on... unroll it until your error becomes acceptable, if possible. With the above, the error is now 1 out of 64 pulses. Still not great, but considerably better, being 32x more precise. It's the classic size vs speed trade-off.
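The same unrolling can be left to the preprocessor instead of being typed out by hand. This is only a sketch: the `REPEAT_n` macros, `pulse64`, and `stretchedFraction` are hypothetical names, and `LATCH_ON`/`LATCH_OFF` are replaced here by host-side counters so the expansion can be checked without a board. On the Arduino they would be the real port writes.

```cpp
// Host-side stand-ins for the port writes, so the expansion can be counted.
static unsigned long onCount = 0;
static unsigned long offCount = 0;
#define LATCH_ON()  (++onCount)
#define LATCH_OFF() (++offCount)

// Each level pastes two copies of the level below it.
#define REPEAT_2(x)  x x
#define REPEAT_4(x)  REPEAT_2(x)  REPEAT_2(x)
#define REPEAT_8(x)  REPEAT_4(x)  REPEAT_4(x)
#define REPEAT_16(x) REPEAT_8(x)  REPEAT_8(x)
#define REPEAT_32(x) REPEAT_16(x) REPEAT_16(x)
#define REPEAT_64(x) REPEAT_32(x) REPEAT_32(x)

// One pass of the unrolled body: 64 on/off pulses with no branch between them.
void pulse64() {
    REPEAT_64(LATCH_ON(); LATCH_OFF();)
}

// Fraction of pulses that border the loop-back gap for a given unroll factor.
constexpr double stretchedFraction(unsigned pulsesPerPass) {
    return 1.0 / pulsesPerPass;
}
```

Going from 2 pulses per pass to 64 takes the stretched fraction from 1/2 to 1/64, which is the 32x figure above, at the cost of 32 times the code for the loop body.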

KE7GKP:

being 32x more precise.

Not really. Attempting to generate a "precise" (however you are defining it) frequency with Arduino is a classic example of using the wrong tool. It just isn't made for it. Arduino is good for a great many things, but being a precise frequency generator ain't one of them.

I think you misconstrued my intent. I was illustrating a way to minimize the error. I was not arguing the fitness of the Arduino as a precise square wave generator. I thought I was pretty clear on that, especially since I agreed with the comment stating that a timer, RTC, or whatever should be used rather than the Arduino itself. And to be clear, "yes, really." The error rate was diminished accordingly. Saying "not really" doesn't change that. I assume your "not really" was more a conflation of fitness and approach than a commentary on the results.

Err .. yes ... as the 'starter' of the thread I need to point out that I am just trying to determine what the 'somewhat theoretical' maximum speed is for an Arduino to update 8 shift registers with eight bytes.

My tests are based on sending 8 bytes over SPI together with the latching required. For those interested, I have achieved 68 kHz so far, or 4.3 MHz at the bit level .. faster if you count the latches also.
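For anyone checking the figures, the bit-level rate follows directly from the frame rate. A small sketch (the function name is made up for illustration):

```cpp
// Bits clocked out per second when `bytesPerFrame` bytes are sent
// `framesPerSecond` times a second.
constexpr unsigned long bitsPerSecond(unsigned long framesPerSecond,
                                      unsigned long bytesPerFrame) {
    return framesPerSecond * bytesPerFrame * 8UL;
}
```

`bitsPerSecond(68000, 8)` gives 4,352,000, i.e. the 4.3 MHz quoted for the goto version; the 63 kHz function version works out to 4,032,000.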

I was not aware that adding code to functions and void loop{} added overhead and when I noticed these discrepancies I asked why, in this thread. I think the topic has been covered.

As stated, I need to remove as much incidental overhead as possible from the measurement so as to get closer to the theoretical maximum. That's all. No square wave generators and no goto vs no goto arguments intended.
I believe the results speak for themselves in this regard.

Thank you all for your valued input, I find this all fascinating and learned a great deal in a very quick time. Special thanks to gerg who first got me started on this track with his inline compiling suggestions. :slight_smile:

PS: IMHO goto is just another code statement, it's all rock and roll to me :stuck_out_tongue:

JBMetal:
My tests are based on sending 8 bytes over SPI together with the latching required. For those interested, I have achieved 68 kHz so far, or 4.3 MHz at the bit level .. faster if you count the latches also.

In that case I would use the SPI hardware built into the chip. My measurements here:

... showed that I could achieve around 4 MHz at the bit level (0.125 µs per pulse).

Using the hardware means you can let interrupts do the work for you (on the receiving end at least). I think for the base chip (the ATmega), which is clocked at 16 MHz, getting pulses out at 4 MHz (where a pulse is an off and on sequence) is about the best you can do.

Okay, with all the "goto is evil... from my books' point of view" nay-sayers in this thread, sort of "third-party bashing" goto with ye olde C teachings... I've got to come to its rescue.

There is not a DANG thing wrong with using goto, in my opinion. Personally, I've always avoided it because of all the negativity surrounding it, but I think that may be subject to change. Even with the most efficient C-style code blocks and structures, a goto would make the code much easier to follow and understand, and save thousands of compiled bytes at times. Sometimes you just need to "loop this if that plus that, always run this part of the loop, only do this part of the loop sometimes, and skip to the end of the loop if that", and a "break" just won't do. Often, the more complicated the loop's control, the more efficient the program can run - too many independent function calls and control statements, and you end up wasting lots of code and clock cycles. The "lowly goto" is an amazingly elegant way to "just GO TO" a part of the code, discarding all the conditions in the block... well, provided you do so within the construct of the language you use, like, don't "goto" outside the current function or something. There's LOTS of ways to abuse goto, which is probably what started the whole "evil goto" thing. I think the use of goto should be considered on the implementation, not the fact that it's present at all.

Because in the end, goto is indeed the single most absolutely efficient branching method possible: function calls push/pop registers to set up their environment; control statements perform validation. It's one instruction: jump to "here". :slight_smile:

Because in the end, goto is indeed the single most absolutely efficient branching method possible:

I wonder if that's why compilers use it so much in "while", "do..while" and "for" loops?

Seriously, more often than not, if you have to use a goto, you haven't understood the structure of your problem correctly.

FalconFour:
Even with the most efficient C-style code blocks and structures, a goto would make the code much easier to follow and understand, and save thousands of compiled bytes at times.

I'm sorry to burst your bubble, but that simply isn't true. Consider this sketch:

void setup () {}
void loop ()
{
 while (true) 
  {
  digitalWrite (5, HIGH);
  } 
}

That uses a "while" loop.

Sketch size:

Binary sketch size: 724 bytes (of a 30720 byte maximum)

Generated code:

00000102 <loop>:
void loop ()
{
 while (true) 
  {
  digitalWrite (5, HIGH);
 102:	85 e0       	ldi	r24, 0x05	; 5
 104:	61 e0       	ldi	r22, 0x01	; 1
 106:	0e 94 86 00 	call	0x10c	; 0x10c <digitalWrite>
 10a:	fb cf       	rjmp	.-10     	; 0x102 <loop>

0000010c <digitalWrite>:
	}
}

Now consider this:

void setup () {}

void loop ()
{
  foo:
  digitalWrite (5, HIGH);
  goto foo;
}

Sketch size:

Binary sketch size: 724 bytes (of a 30720 byte maximum)

Generated code:

00000102 <loop>:
void loop ()
{
  foo:
  digitalWrite (5, HIGH);
 102:	85 e0       	ldi	r24, 0x05	; 5
 104:	61 e0       	ldi	r22, 0x01	; 1
 106:	0e 94 86 00 	call	0x10c	; 0x10c <digitalWrite>
 10a:	fb cf       	rjmp	.-10     	; 0x102 <loop>

0000010c <digitalWrite>:
	}
}

The code sizes are identical! The generated code is identical!

There is no saving of "thousands of bytes" using goto. None. Not a byte.

There is no speed improvement. The compiler has generated, in both cases, "rjmp .-10 " - the same machine code instruction.

All the goto does is make the code harder to read. It possibly introduces subtle errors of logic, if you "goto" over stuff you shouldn't.

It is not the panacea for saving memory, saving time. It does none of that.

In the OP's code (and I am sure he realizes this) he could have changed:

void loop() {
  start:
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
 goto start;
} // with goto

to:

void loop() {
  while (true)
  {
  LATCH_ON();
  LATCH_OFF();
  LATCH_ON();
  LATCH_OFF();
  }
} // without goto

The goto didn't save time. It was changing the code to omit the repeated function calls that saved time. But you can do that without using goto.

Because in the end, goto is indeed the single most absolutely efficient branching method possible: ...

No, that simply is not true. You can achieve the same thing with "do" and "while". Keep the code elegant and maintainable.

...and, as stated earlier in the thread... you can gain much efficiency by using Hardware SPI as built into the AVR chip.

Ed was able to get a "factor-of-15 speedup" with 595 latches.

As Nick points out, there is absolutely nothing magical about goto. I get the impression from some followup comments that people now believe goto should be commonly used. If I gave that impression, I apologize. Goto should absolutely not be commonly used. It should be used very sparingly. It's easy to create rat's nests of code which are extremely difficult to read and understand. This is a tale any old BASIC coder will be more than willing to share. It should not be viewed as a general purpose flow control mechanism. It should be viewed as one of many in a developer's optimization bag of tricks. And as is widely known, premature optimization is the root of all evil. So that should tell you: if you're readily reaching for goto, you're using it wrong.

In my many, many years of coding, I've used goto in C/C++ code fewer than half a dozen times. Once or twice more I would have used it, except it was forbidden by the coding standards. In all cases where I've used it, it was in fairly complex code where goto was the only possible means to obtain the optimizations required while preserving some measure of readability. Generally speaking, if you find you're using goto more frequently than once every couple of years, while coding on a daily basis, chances are very high you're using it wrong.

Now then, as I originally stated, goto has been demonized and is frequently forbidden. Such a response is almost as inappropriate as daily use. But just because goto is a legitimate flow control technique doesn't mean it should be used without considerable thought. Generally speaking, it should only be used as an optimization technique of last resort, and then generally only by experienced coders.

Again, as Nick pointed out, the simple use of goto, in and of itself, doesn't magically imbue optimizations. And it can easily be used for evil, which is exactly why it has such a bad reputation.