Best practice for reading multiple GDS files

Dear all,

I'm currently using layout.read() to sequentially load multiple GDS files into a main layout and insert all cells into a top_cell. However, this process has proven to be quite time-consuming, and the bottleneck appears to be the sequential reading of each GDS file.

I've also considered using multiple layouts to read the files concurrently, but then the bottleneck becomes the copy_tree() step that copies all cells into the top cell.

I've also reviewed the reader source code in hopes of identifying potential optimizations for the read() function, but I haven't yet found a way to improve its performance.

May I kindly ask if anyone has experience with a more efficient approach to speed up this process? Additionally, do you believe there is still room to optimize the read() function itself?

Any insights or suggestions would be greatly appreciated. Thank you in advance!

Comments

  • edited July 10

    Could you be a little bit more specific?

    What do you consider "time-consuming"?
    How big are your files, are they gzipped or not?
    How long does it take to "collect" them?
    Which "uniquification" (blend-mode) strategy do you want?
    Where are these files coming from? In other words, which tools wrote them?

    If you want smaller files and faster read/write times, consider using oasis instead of gds.
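    For instance, a one-off conversion with the KLayout Python module could look roughly like this (file names are placeholders; as far as I know the writer format is picked from the output file's extension):

    import klayout.db as db   # "pya" when running inside the KLayout application

    layout = db.Layout()
    layout.read("chip.gds")     # placeholder input file
    layout.write("chip.oas")    # the ".oas" extension selects the OASIS writer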

    Also, have a look at the buddy tools strm2gds/oas.
    They're not faster, but instead of writing KLayout Python scripts you can use them from the command line.

  • edited July 11

    Hi, thank you for your reply.

    I compiled the KLayout source code into a Python package and use Python to run it. The GDS files are stored on disk.

    It takes roughly 70 seconds to perform layout.read() on a non-zipped 3GB GDS file. Since I need to read multiple GDS files into a single layout, this process becomes quite time-consuming.

    For clarity, here is my pseudo code:

    import klayout.db as pya   # or simply "pya" inside the KLayout application

    main_layout = pya.Layout()
    top_cell = main_layout.create_cell("TOP")

    for chip in chips:
        main_layout.read(chip.path)
        for cell in main_layout.top_cells():  # the newly read cells show up as additional top cells
            if cell.name != "TOP":
                # place the chip's top cell under TOP with a unit transformation
                top_cell.insert(pya.CellInstArray(cell.cell_index(), pya.Trans()))

    main_layout.write("output.oas")
    

    The bottlenecks in this code are the read and write calls.
    I wonder if there's a faster way to do this.

    Best regards,
    Ryan

  • edited July 11

    Multiple non-gzipped 3 GB GDS files?
    P&R or mask data?
    70 sec for 3 GB is as fast as a single-threaded reader can read GDS.

    a) In my experience, if the files are on a file server (not a local disk), gds.gz files read/write faster since they're about 1/5th the size of plain GDS; network speed is the bottleneck (see the sketch below).
    b) Use OASIS.
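    As a rough illustration of a), and assuming (as far as I know) that KLayout derives gzip handling from the ".gz" suffix (file names are placeholders):

    import klayout.db as db

    layout = db.Layout()
    layout.read("chip.gds.gz")     # gzipped input is decompressed transparently
    layout.write("copy.gds.gz")    # ".gz" suffix requests gzip compression on write (assumption: compression follows the extension)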

  • edited July 13

    Thanks to OpenROAD there are publicly available test cases.
    Unfortunately Efabless went out of business, so the MPW files are harder to find now.

    A 2.5 GB file, gzipped or not, can be read/written by strm2gds/oas in 12/14/11 seconds (see the runs below).
    This is on a 2021 3.5 GHz M1 MacBook.

    So your read time of 70 seconds for a 3 GB file seems long.
    Please provide more information about your file and the machine you run on.

      compressed uncompressed  ratio uncompressed_name
       481005292   2448607018  80.3% caravel_mpw5-slot001/caravel_0005e9ec.gds
    
    struct top uref    sref + aref + path + polygon + text  = element  dbu/um
    ---------------------------------------------------------------------------------------------
      1444   1    0  278480      8    805  37779492  74594   38133379  1000.0  caravel_0005e9ec
    
    strm2gds -d=11 caravel_mpw5-slot001/caravel_0005e9ec.gds.gz a.gds
    Total: 11.36 (user) 0.77 (sys) 12.205187279 (wall) 841.61M (mem)
    
    strm2oas -d=11 caravel_mpw5-slot001/caravel_0005e9ec.gds.gz a.oas
    Total: 13.88 (user) 0.25 (sys) 14.128706912 (wall) 935.42M (mem)
    
    strm2gds -d=11 a.gds b.gds
    Total: 9.69 (user) 1.08 (sys) 11.120623849 (wall) 844.83M (mem)
    
     481005292 caravel_mpw5-slot001/caravel_0005e9ec.gds.gz
    2447499426 a.gds
      22417589 a.oas
    
  • Thank you for your reply and for providing concrete experiment results.

    70 seconds for my case does seem long.
    I am using MEBES-format GDS; for the MEBES reader I downloaded and compiled the mebes plugin.
    I am not familiar with strm2gds. Does strm2gds use the same underlying code for reading GDS and writing OASIS as the compiled KLayout source, or is it optimized separately?
    Also, you mentioned a single-threaded reader; is there any way to use a multithreaded reader?
    Thank you very much for your time and assistance.

    Best regards,
    Ryan

  • strm2gds/oas use the same database as klayout.

    I'm not aware of any open-source tool using multithreading; if anybody has heard of one, let me know.
    No tool even uses double buffering to decouple gunzip from data parsing/database building.
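    Just to illustrate the idea in plain Python (nothing KLayout-specific; parse_records is a hypothetical stand-in for record parsing and database building, and as far as I understand the overlap in CPython only pays off to the extent the zlib calls release the GIL):

    import gzip
    import queue
    import threading

    CHUNK = 1 << 20  # 1 MiB per buffer

    def _decompress(path, q):
        # Producer: gunzip the stream and hand raw chunks to the consumer
        with gzip.open(path, "rb") as f:
            while chunk := f.read(CHUNK):
                q.put(chunk)
        q.put(None)  # end-of-stream marker

    def read_decoupled(path, parse_records):
        # "Double buffering": at most two chunks in flight, so decompressing the
        # next chunk overlaps with parsing the current one
        q = queue.Queue(maxsize=2)
        t = threading.Thread(target=_decompress, args=(path, q))
        t.start()
        while (chunk := q.get()) is not None:
            parse_records(chunk)
        t.join()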

    In 2020 I tested a viewer of one of the big three $$$$$ tool vendors that was multithreaded, but have not used $$$$$ tools since.

    Over the last couple of years I've been writing a multithreaded layout format reader/writer, but that will not be free.
    Initial testing showed that on a local machine with an SSD it scales nicely to 8 threads, but in a datacenter environment with NAS storage and other processes competing for IO, the gain dropped to a 3x read speedup. Not as impressive as running locally.

    Thank you for your reply. Regarding multithreading:
    I'd like to ask for your advice on the feasibility of using multiple threads to speed up GDS file reading.

    In my current sequential implementation, the performance bottleneck lies in reading all chip GDS files into a main_layout, as shown below:

    main_layout = pya.Layout()
    for chip in chips:
        main_layout.read(chip["path"])
    

    I’m considering a multithreaded approach to improve efficiency. Ignoring Python’s GIL for now, here’s the idea:

    # Phase 1 (parallel): one thread per chip (or fewer) reads that chip's layout
    chip_layouts = {}
    for chip in chips:
        layout = pya.Layout(False)          # non-editable layout
        layout.read(chip["path"])
        chip_layouts[chip["chip_name"]] = layout
    
    
    # Phase 2 (sequential): create the target cells, since modifying main_layout
    # needs locking; this should be far faster than the read/copy phases.
    chip_cells = {}
    for chip in chips:
        layout = chip_layouts[chip["chip_name"]]
        for cell in layout.top_cells():      # each chip has exactly one top cell
            chip_cells[chip["chip_name"]] = main_layout.create_cell(cell.name)
    
    # Phase 3 (parallel): one thread per chip (or fewer) copies its tree into main_layout
    for chip in chips:
        layout = chip_layouts[chip["chip_name"]]
        for cell in layout.top_cells():
            chip_cells[chip["chip_name"]].copy_tree(cell)
    

    Would this design be feasible, and would it produce results consistent with the sequential version?
    Ideally, I'm hoping the total read and copy time can be parallelized.
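    For concreteness, here is roughly how I imagine phase 1 with a thread pool (just a sketch; whether the reads really overlap depends on the GIL, which I'm ignoring here, and on the reader being safe to run on independent Layout objects):

    from concurrent.futures import ThreadPoolExecutor
    import klayout.db as pya   # standalone module; "pya" inside the application

    def read_chip(chip):
        # each worker builds its own, independent Layout object
        layout = pya.Layout(False)
        layout.read(chip["path"])
        return chip["chip_name"], layout

    with ThreadPoolExecutor(max_workers=len(chips)) as pool:
        chip_layouts = dict(pool.map(read_chip, chips))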

    I’d really appreciate your insights—thank you!

  • Hi @ryanke,

    you cannot ignore Python's GIL. It needs to be there, at least if you run the application.

    Free-threaded Python may be possible with the KLayout Python module, but I have never tried myself. You're entering unknown territory here. Python without GIL is a recent development, most likely needs special support in terms of integration and I have no idea about the stability.

    As a general statement I should say that I would not offer a solution as Open Source if it was really competitive with commercial ones. So you should not try too hard - there is a limit. Open Source scales better in terms of infinite license availability, not in terms of high-performance architectures. That is to be left to EDA companies that spend way more resources in their code development than me.

    Matthias

  • edited July 17

    Hi @Matthias,

    Thanks for the reply.
    To get around the Python GIL, I am thinking of implementing the parallel code inside the KLayout sources themselves, for example by exposing the algorithm above as a new function that performs the parallel read and copy_tree() with C++ threads. Is adding such a function possible? And is parallelizing read and copy achievable?
    Thank you.

    Best regards,
    Ryan

  • edited July 18

    Hi @ryanke,

    This will not work the way you think. "copy_tree" is available as C++ code, but it is not MT-safe. In general, reading is MT-safe, writing isn't. So there is no multithreaded "copy_tree".

    Please think of KLayout as a swiss army knife. It's not a chainsaw. If you want to kill trees, you can't do so with a knife. Translates to: KLayout may not meet your requirements. In that case, you should use a different tool.

    Matthias

  • Thank you for taking the time to explain things to me.
