PS freezes when making use of PL

Unsolved
Cone83
Junior(0)

I am encountering very odd problems with my new Mini-ITX board. Whenever I make use of a significant portion of the PL resources, the PS just freezes.

I have done quite a lot of debugging on this, and the only thing that seems to affect this problem is the amount of resources used. I have a test design that does a lot of useless computations. If I allow Vivado to optimize away all the useless stuff and hence greatly reduce the resource usage, then the CPU does not freeze. If I prevent the optimization and use 56% LUTs, then the CPU freezes immediately.

There are no timing violations or other errors reported by Vivado. I have now come to a point where I suspect that my Zynq might actually be damaged.

I have created a set of boot files for reproducing this error on a Mini-ITX with Zynq 7100. The boot files can be downloaded here:
https://mega.nz/#F!9hF0kQrZ!RfGgoTnCO0QOtp6MSUFZxg

When booting this configuration, U-boot only manages to print a few lines on UART before the CPU stalls. Usually the output looks like this:

--------------
U-Boot 2015.07-dirty (Apr 09 2016 - 11:32:21 +0200)

Model: Zynq Mini-ITX Board
DRAM: ECC disabled 1 GiB
MMC: zynq_sdhci: 0
SF: Detected S25FL128S_64K with page size
--------------

Sometimes it even freezes before being able to print the first line. I would be really thankful if somebody could try out those boot files and report if this error also occurs on other Mini-ITX boards with Zynq 7100 SoCs.
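
When comparing captures from different boards, it can help to state the stall point mechanically. The following is a minimal sketch, not part of the original boot files: it assumes the UART output has been saved as text, and it only knows the stage prefixes visible in the log quoted above. The names `EXPECTED_STAGES` and `last_stage_reached` are hypothetical.

```python
# Stage prefixes taken from the U-Boot output quoted in the post. Anything
# U-Boot prints after "SF:" is not shown there, so it is not listed here.
EXPECTED_STAGES = ["U-Boot", "Model:", "DRAM:", "MMC:", "SF:"]


def last_stage_reached(log_text):
    """Return the last expected stage prefix seen in order, or None."""
    reached = None
    remaining = iter(EXPECTED_STAGES)
    stage = next(remaining, None)
    for line in log_text.splitlines():
        # Only advance when the next expected stage shows up.
        if stage is not None and line.strip().startswith(stage):
            reached = stage
            stage = next(remaining, None)
    return reached


if __name__ == "__main__":
    capture = """U-Boot 2015.07-dirty (Apr 09 2016 - 11:32:21 +0200)

Model: Zynq Mini-ITX Board
DRAM: ECC disabled 1 GiB
"""
    print("Last stage reached:", last_stage_reached(capture))
```

A result of `None` corresponds to the case where nothing is printed before the freeze.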

Many thanks

TroutChaser
Moderator(18)

While I don't have a 7z100 version of the Mini-ITX here to work with, I was able to get someone to try your project on their board. It got further than your post indicates but did hang at the 'Starting Kernel' step, so it looks like the problem is your application and not your Mini-ITX hardware.
 
You have a lot of things going on here: the fsbl, u-boot, your OS, devicetree, bit stream etc. Is your zynq_fsbl current for your platform or did you copy it from somewhere else?
 
You might want to check and see if your system will boot correctly with a simple bare-metal program with your PL loaded. Maybe one of the canned memory tests, peripheral tests or an echo test. Or you could write a simple bare-metal 'hello world' with an indication to the terminal that it is running and then load your bit stream to see if it does indeed crash the PS. This could indicate if the problem really is a bit stream that crashes the PS, a problem with your software, or some more complex interaction between the two.
 
-Gary

Cone83
Junior(0)

Hi Gary,

Thank you so much for trying this out. Interesting to see that it got past the 3 second boot delay on another system. The whole thing actually boots nicely if I use a smaller bit stream. It also boots nicely if I use the large bit stream but keep all the logic idle. The system then freezes immediately when the PL is put to work. On the external interfaces, there is no difference between the small and the large bit stream.

I'll try chopping this design down even further. Maybe something is wrong with the constraints or the PS configuration. But I have used the original board definition files for these.

Cone83
Junior(0)

I have now narrowed this down as far as I possibly can. I am left with the following set-up that reproduces the problem:

PL:
* The design is connected to just one HP_AXI port
* It attempts to read approx. 1 MB of data from the AXI port at a speed of less than 60 MB/s
* No writing is performed
* An AXI Interconnect with protocol checking enabled is wired in between, and no errors are reported
* No other connections exist from the PL to the outside world
* The processing system configuration is copied from one of the reference designs (I actually copied the XML code from the .bd file)

PS:
* Only the u-boot boot console is started
* No operating system is loaded

As soon as reading from the AXI port starts, the PS reproducibly stalls. If I allow Vivado to optimize the design and throw away most of the internal logic, everything works fine. But Vivado's optimization should not make any difference on the PL/PS interaction. Nothing happening inside the PL should be allowed to affect the PS in any way.
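
For scale, the read load described above is light. The following back-of-the-envelope sketch assumes a 64-bit HP port clocked at 125 MHz; the port width is an assumption, and 125 MHz is the design clock mentioned later in the thread. It also assumes decimal megabytes, which the post does not specify.

```python
MB = 1_000_000  # decimal megabyte (assumption; the post does not say MB vs MiB)


def transfer_time_s(num_bytes, rate_bytes_per_s):
    """Time needed to move num_bytes at a sustained rate."""
    return num_bytes / rate_bytes_per_s


def peak_bandwidth_bytes_per_s(bus_width_bits, clock_hz):
    """Theoretical peak of a bus moving one full-width beat per clock."""
    return bus_width_bits // 8 * clock_hz


if __name__ == "__main__":
    read_size = 1 * MB
    read_rate = 60 * MB  # upper bound given in the post
    peak = peak_bandwidth_bytes_per_s(64, 125_000_000)  # assumed port setup

    duration_ms = transfer_time_s(read_size, read_rate) * 1e3
    print(f"Minimum read duration: {duration_ms:.1f} ms")
    print(f"Share of assumed peak bandwidth: {read_rate / peak:.1%}")
```

Under these assumptions the transfer uses only a small fraction of what the port could carry, which is consistent with the observation that the AXI traffic itself looks harmless.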

So, the only theories that I have left are:

* There's something wrong with the Mini-ITX board definition files (maybe DDR delay or something)
* There's a bug in my Vivado version (I'm using 2015.4.1 on Linux)
* The power consumption pattern of the PL affects the PS
* Maybe some other hardware bug

These are all things that I can't do anything about. So, currently the Mini-ITX is not usable for my purposes. Is there anyone from Avnet I can get in touch with?

Thanks

TroutChaser
Moderator(18)

Hello,
 
You mention in the post above that you are using Vivado 2015.4.1 on Linux. I have seen an issue in the past where Vivado on Linux created incorrect frequencies for PL fabric clocks if the Linux regional settings were set to something other than 'US'. Here is a post on the Xilinx Community forums and one on this forum that relate to that issue:
 
https://forums.xilinx.com/t5/Embedded-Development-Tools/Cannot-Set-speci...
 
http://zedboard.org/content/ethernet-does-not-work-linux-314
 
I don't know if this is an issue in the current version of Vivado, but it might be worth changing the Linux regional setting and regenerating your application to rule it out.
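
A quick way to see which regional settings a build shell would hand to Vivado is sketched below. It assumes the issue is tied to the decimal separator of the active locale, which is an assumption and not something the linked posts are quoted as confirming here. The helper names are hypothetical; only the standard `locale` module is used.

```python
import locale


def decimal_point_is_dot(conventions):
    """True if the locale conventions use '.' as the decimal separator."""
    return conventions.get("decimal_point") == "."


def current_locale_conventions():
    """Adopt the locale from the environment and return its conventions."""
    try:
        locale.setlocale(locale.LC_ALL, "")
    except locale.Error:
        # Unsupported or misconfigured locale: keep whatever is active.
        pass
    return locale.localeconv()


if __name__ == "__main__":
    conv = current_locale_conventions()
    if decimal_point_is_dot(conv):
        print("Decimal separator is '.'")
    else:
        print("Decimal separator is %r; consider exporting "
              "LC_ALL=en_US.UTF-8 before launching the tools"
              % conv["decimal_point"])
```

Run it from the same shell that starts Vivado, since the locale is inherited from the environment.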
 
-Gary

Cone83
Junior(0)

Hi Gary,

I really appreciate your help on this. I'm aware of Vivado's problems with coping with different regional settings. That's why I already switched all settings to English on my system. I also just tried downgrading to 2015.2 and it produced the same result.

I think that the same error might also be responsible for my problems with getting PCIe running (other thread), as the PCIe data is also passed through a HP_AXI port. I have ordered the same NIC that is used in the reference design, but that is another topic.

I'll see if I can get a Windows machine to do a test build, but building on Windows wouldn't be a long-term solution.

Cone83
Junior(0)

I just tried it with 2016.1 on Windows. This time it ran fine for several minutes before freezing. But that's probably due to other changes I made. I was also getting longer run times on my Linux builds recently. It's all just very random.

Cone83
Junior(0)

I still haven't made progress on this and it's becoming a major issue. I think that the most likely explanation would be a power issue. I have monitored the supply voltages in ChipScope and haven't found any values outside the allowed voltage ranges, but I suppose ChipScope wouldn't pick up a short spike in the voltage levels. When I clocked the design down from 125 MHz to 50 MHz it did work, which would support the hypothesis that it is power related. I really can't think of any other way that the PL could be affecting the PS. Could there be a problem with the Mini-ITX power circuitry?
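
To put a number on the clock change, here is a first-order sketch. It assumes the usual CMOS approximation that dynamic power scales with switching activity, capacitance, the square of the supply voltage, and frequency, and it ignores static power entirely. The function name is hypothetical.

```python
def dynamic_power_ratio(f_new_hz, f_old_hz, v_new=1.0, v_old=1.0):
    """Ratio of dynamic power after a clock/voltage change (P ~ a*C*V^2*f).

    Activity factor and capacitance are assumed unchanged, so they cancel.
    """
    return (f_new_hz / f_old_hz) * (v_new / v_old) ** 2


if __name__ == "__main__":
    ratio = dynamic_power_ratio(50e6, 125e6)
    print(f"Dynamic power at 50 MHz relative to 125 MHz: {ratio:.0%}")
```

Under this model the 50 MHz build draws well under half the dynamic power of the 125 MHz build. A slower clock also relaxes timing margins, so the result supports the power hypothesis without proving it.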

Cone83
Junior(0)

I'm also not convinced yet that this isn't an issue with my board. Your test got stuck when loading the kernel, which could also be caused by, for example, a faulty device tree. To be sure, I have created a bare-metal application for reproducing the fault. The application cycles between periods of low and high CPU load, as I have found that some bit files work well for as long as the CPU is idle. On a functioning system, this should output something like this:

Test with low CPU load
0
1
2
3
4
5
6
7
8
9
10
Test with high CPU load
0
1
2
3
4
5
6
7
8
9
10
Test with low CPU load
0
1
...

For me it usually gets stuck after printing the first two lines. I have made the boot.bin file available at:
https://mega.nz/#!QgtnRaAR!8QKhi_PFyIKuAeFEGoOdsmup8QKnQBcL6DoDhz6nPzI

I would be really thankful if you or somebody else could give this another try.
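
For anyone who tries the boot.bin, a captured UART log can be checked against the expected pattern with a small script. This is a sketch that runs on the host, not the bare-metal application itself; it assumes each phase prints its header followed by the counter values 0 to 10 on separate lines, as in the expected output above. The names `HEADERS`, `LAST_COUNT` and `completed_phases` are hypothetical.

```python
HEADERS = ("Test with low CPU load", "Test with high CPU load")
LAST_COUNT = 10  # each phase counts 0..10 in the expected output


def completed_phases(log_text):
    """Count phases whose header was followed by the full 0..10 sequence."""
    completed = 0
    expected = None  # next counter value expected, or None outside a phase
    for raw in log_text.splitlines():
        line = raw.strip()
        if line in HEADERS:
            expected = 0
        elif expected is not None and line == str(expected):
            if expected == LAST_COUNT:
                completed += 1
                expected = None
            else:
                expected += 1
        elif line:
            expected = None  # unexpected text: the current phase is broken
    return completed


if __name__ == "__main__":
    # A capture that stops right after the first two lines.
    stuck_capture = "Test with low CPU load\n0\n"
    print("Completed phases:", completed_phases(stuck_capture))
```

A healthy board should show the count growing for as long as the log is captured, while a count that stays at zero matches the freeze described above.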

zedman2000
Moderator(2)

Hi there,

I wanted to post and let you know that I was able to run your software on a M-ITX Z100 for the last hour without issues. I also rebooted it at least 5 times to make sure I did not see any of the stuck-at-boot issues you mentioned.

I would suggest you check that your heat sink and fan are performing their function and that the CPU is not getting hot. I would also validate your power rails and make sure your PC power supply is hooked up properly, to ensure that your rails are not causing your issue.

--Dan

Cone83
Junior(0)

Hi Dan,

thank you so much for testing this. I have actually been monitoring the voltages and the temperature using the XADC and everything seems nominal. I have also tested the board with a different power supply and experienced the same behavior, so I can't really pinpoint the cause.

Anyway, many thanks for your help. I'll contact Avnet for a replacement.

padudle
Junior(0)

Hello,
Does Avnet have a canned reference design you can run on the board? When returning a board it is always best if it also fails the manufacturer's reference design. :-)

zedman2000
Moderator(2)

Hi there,

There are many reference designs and the Out Of Box Design that you can download from the Mini-ITX section of this support site.
Please find the designs located here:
http://picozed.org/support/design/2056/17

--Dan