Merge gds/oasis files using copy_tree is very slow

ahmedo · February 2021

Hello,

I have created a script to merge two or more GDS/OASIS files, with support for a few options (rename all cells, rename only cells with naming conflicts, etc.). The script is based on the copy_tree function of the KLayout API.
It works fine for small GDS files, but it takes a lot of time for bigger GDS/OASIS files. For example, merging two OASIS files takes 24 minutes, while it takes 2 to 3 minutes with a commercial tool.

Is there any alternative to this function?

Thanks.

Regards,

Matthias · February 2021

@ahmedo Sorry, but I can't provide support if you don't provide details.

Here are some details you should provide:

Where the time is spent (reading, merging, writing?)
Memory footprint and whether maybe swapping is involved (physical memory exceeded)
How the script is made (layout objects or in-application). Ideally, please provide the script's code.
What reader/writer options you're using
The size of your layouts (how big they are)
The KLayout version and OS you're using
What's the commercial tool you're comparing against and which merge concept you're using
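
For the first item on this list, a rough per-phase breakdown is usually enough. A minimal sketch using only the Python standard library; the `phase` helper and the phase names are hypothetical, and the `sum(range(...))` lines merely stand in for the real read, merge, and write calls:

```python
import time
from contextlib import contextmanager

timings = {}  # phase name -> elapsed wall-clock seconds


@contextmanager
def phase(name):
    """Record how long the enclosed block takes under the given name."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[name] = time.perf_counter() - start


# Stand-in workloads; replace each body with the real step.
with phase("reading"):
    sum(range(100000))
with phase("merging"):
    sum(range(100000))
with phase("writing"):
    sum(range(100000))

for name, seconds in timings.items():
    print("%-8s %.3f s" % (name, seconds))
```

Posting the three numbers printed by such a wrapper already narrows down which step deserves attention.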

I think that just complaining this way isn't a fair benchmark.

I have a benchmark of similar functionality against a commercial tool which does not show a significant performance difference. And "copy_tree" isn't slow in particular.

Matthias

ahmedo · February 2021

Hello Matthias,

Thank you for your answer.

The script takes some time in reading the oasis file (~1-2 minutes). The merge takes a lot of time (~20min) and the writing of the oasis takes almost 1 minute.

I didn't monitor the memory usage, but I am sure that I have enough memory on my server, which runs Linux RHEL7. I'm using KLayout 0.26.5.

Here is the code of the script mergeLayoutFiles.py :

`        
#!/usr/bin/env python

import klayout.tl as tl
import klayout.db as db
import klayout.QtCore
import klayout.QtGui

import klayout.lay

from klayout import db

import logging
import argparse
import collections
from itertools import chain
import os
logger = logging.getLogger(__name__)

def Usage():
    print "$0 -i gds1 -i gds2 -o gdsout [-r|--rename] [-fr|--forcerename] [-f|--force]"
    print "# where:"
    print "# gds1                       : the first input gds or oasis file."
    print "# gds2                       : the second input gds or oasis file."
    print "# gdsout                     : the path to the output gds/oasis file."
    print "# -t or --outtopcell         : the name of the top cell in the output gds."
    print "# -tr or --transform         : the transformation (shift) to make on the gds while merging.\n \
                Syntax is: -tr \"shift_x_gds1,shift_y_gds1;shift_x_gds2,shift_y_gds2;...;shift_x_gds<n>,shift_y_gds<n>\"\n"
    print "# -m or --mode               : option to rename only the cells from the input gds which have the same name or to rename all the cells using a specific suffix for each gds file. \n \
                'rename' will Rename only cells with name conflict while 'forcerename' will rename all the cells using a suffix."
    print "# -f or --force              : option used to overwrite the output gds/oasis file if it already exists."
    print "# -u or --usage              : Display this usage and exit. Equivalent option is -h or --help."
    exit(1)

def main():
    parser = argparse.ArgumentParser(description='Merge GDS files.')

    parser.add_argument('-v', '--verbose', action='store_true', help='show more detailed log output')

    parser.add_argument('-o', '--output', required=False, metavar='OUT', type=str, help='Output GDS file.')

    parser.add_argument('-i', '--inputs', required=False, action='append', nargs='+', metavar='GDS', type=str,
                        help='Input GDS files.')
    parser.add_argument('-t', '--outtopcell', required=False, metavar='TOPCELL', type=str, help='Output top cell.')

    parser.add_argument('-f','--force', action='store_true', help='Allow overwriting of output file.')
    parser.add_argument('-m', '--mode', choices=['rename','forcerename'], help='Use rename or forcerename mode. "rename" will Rename only cells with name conflict while "forcerename" will rename all the cells using a suffix.')
    parser.add_argument('-u', '--usage', action='store_true', required=False, help='Display script usage and exit.')
    parser.add_argument('-tr', '--transform', required=False, type=str, help='Transformation applied to each input gds while writing it to the merged output gds. Syntax is: -tr "shift_x_gds1,shift_y_gds1;shift_x_gds2,shift_y_gds2;...;shift_x_gds<n>,shift_y_gds<n>"')




`

ahmedo · February 2021

        args = parser.parse_args()

        # Display the script usage and exit if the user demands this.
        if args.usage:
            Usage()
            exit(1)

        # Setup logging
        log_level = logging.INFO
        if args.verbose:
            log_level = logging.DEBUG
        logging.basicConfig(format='%(module)16s %(levelname)8s: %(message)s', level=log_level)

        # Check if output would overwrite something.
        if args.output:
            if not args.force and os.path.exists(args.output):
                logger.error("Output file exists. Use --force to overwrite it.")
                exit(1)
        else:
            Usage()
            exit(1)

        if len(args.inputs)<2:
            Usage()
            exit(1)
        else:
            for file1 in list(chain(*args.inputs)):
                if not os.path.exists(file1):
                    print "File " + file1+ ": does not exist!"
                    exit(1)
        # rename mode
        if not args.mode:       
            mode = 'rename'
        else:
            mode = args.mode
        # Get the tranformation
        if args.transform:
            tran =  args.transform
        else:
            tran = ""
        shifts = tran.split(";")

        # Load GDS2
        gds_out_path = args.output
        gds_in_paths = list(chain(*args.inputs))

        # topcell
        if args.outtopcell:
            topcell = args.outtopcell
        else:
            topcell = "TOP_MERGE"
        layout = db.Layout()
        topc = layout.create_cell(topcell)
        if args.verbose:
            print("top cell:%s",topc)
        top_cell    = topc.cell_index()
        if args.verbose:
            print("top cell index:%s",top_cell)

        j = 0
        layouts = {}
        cells = {}
        all_cells = []
        dup_cells = []
        for i in gds_in_paths:
            print("gds=%s",i)
            layouts["layout"+format(j)] = db.Layout()
            layouts["layout"+format(j)].dbu = 0.001
            layouts["layout"+format(j)].read(i)
        #layout.read(i)

            if args.verbose:
                print("layouts[\"layout\"+format(j)]:%s",layouts["layout"+format(j)])

            cells_gds=[]
            for x in range(0, layouts["layout"+format(j)].cells()):
                cells_gds.append(layouts["layout"+format(j)].cell(x))

            cells["layout"+format(j)] = cells_gds

            if args.verbose:
                print("cells=",cells["layout"+ format(j)])
            t=0
            while t < len(cells["layout"+format(j)]):
                all_cells.append(cells["layout"+format(j)][t].name)
                #all_cells = all_cells + layouts["layout"+ format(j)].cell(t).name()
                t = t+1
            j = j+1


        if args.verbose:
            print("all_cells:%s", all_cells)

        # Looking for cells with the same name in different gds files.
        dup_cells = ([item for item, count in collections.Counter(all_cells).items() if count > 1])

        if args.verbose:
            print("dup_cells=%s", dup_cells)

        #Making the merge
        if args.verbose: 
            print("len(cells):%s",len(cells))
            print("len(cells):%s",layouts)
            print("cells|2=",cells)

        for index in range(0, len(cells)):
            topcell = layouts["layout"+format(index)].top_cell().name

            if args.verbose:
                print ("top cell:" , topcell)

            for cell in cells["layout"+format(index)]:
                if mode == 'rename':
                    print ("cell index:",cell.cell_index())
                    if cell.name not in dup_cells:
                        if cell.name == topcell:

                            # We must create a new cell in the merged gds and copy the content of the cell into it
                            new_cell = layout.create_cell(cell.name)
                            new_cell.copy_tree(cell)

                            # If the user has specified a shift on the command line
                            if len(shifts) >= index+1 and shifts[index] != '':
                                if shifts[index].split(",")[0] == '':
                                    s_x = 0
                                else:
                                    s_x = int(shifts[index].split(",")[0])

                                if shifts[index].split(",")[1] == '':
                                    s_y = 0
                                else:
                                    s_y = int(shifts[index].split(",")[1])

                                # The cell is shifted in the merged db
                                tr = db.Trans.new(s_x*1000,s_y*1000)
                                topc.insert(db.CellInstArray(new_cell.cell_index(), tr))

                            else:
                                # No cell shift
                                topc.insert(db.CellInstArray(new_cell.cell_index(), db.Trans()))

                            # Delete cell as it is replaced by new_cell
                            cell.delete()

ahmedo · February 2021

                    else:
                        # The cell has a conflicting name (cells with the same name exist in more than one input db)
                        if cell.name == topcell:
                            new_cell_name = cell.name + "_DB" + format(index)
                            #cell.name = new_cell_name
                            new_cell = layout.create_cell(new_cell_name)
                            new_cell.copy_tree(cell)

                            if args.verbose:
                                print("cell_name=",cell.name,new_cell_name)
                                print("topc:",topc.name)

                            # If the user has specified a shift on the command line
                            if len(shifts) >= index+1 and shifts[index] != '':
                                if shifts[index].split(",")[0] == '':
                                    s_x = 0
                                else:
                                    s_x = int(shifts[index].split(",")[0])
                                if shifts[index].split(",")[1] == '':
                                    s_y = 0
                                else:
                                    s_y = int(shifts[index].split(",")[1])

                                # The shift is applied to the cell when inserting it into the merged db
                                tr = db.Trans.new(s_x*1000,s_y*1000)
                                topc.insert(db.CellInstArray(new_cell.cell_index(), tr))

                            else:
                                # No shift is applied in this case
                                topc.insert(db.CellInstArray(new_cell.cell_index(), db.Trans()))
                        else:
                            # Here we just rename the cell (it is not a top cell of one of the input dbs)
                            new_cell_name = cell.name + "_DB" + format(index)

                            cell.name = new_cell_name

                else:


# Call the main function
main()

ahmedo · February 2021

Sorry, I have omitted a part of the program because I exceeded the limit on the number of characters allowed in a single post.

Example of usage:
time python ./mergeLayoutFiles.py -i input.gds -i input2.gds -m rename -t MERGE_MASK --force -o output.gds -tr "0,0;100,100" -v
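
The -tr value in this example carries one "x,y" pair per input file, separated by semicolons. A compact sketch of that parsing, assuming the same syntax as in the script; the `parse_shifts` helper is hypothetical and, unlike the script, it also treats a missing comma as a zero shift. The conversion to database units (the factor of 1000) is left to the caller:

```python
def parse_shifts(spec, count):
    """Return one (x, y) integer shift per input file.

    Missing or empty entries default to 0.
    """
    entries = spec.split(";") if spec else []
    shifts = []
    for index in range(count):
        entry = entries[index] if index < len(entries) else ""
        # partition() yields empty strings when a part is missing
        x_text, _, y_text = entry.partition(",")
        shifts.append((int(x_text or 0), int(y_text or 0)))
    return shifts


print(parse_shifts("0,0;100,100", 2))
```

Parsing once up front keeps the per-cell loop free of repeated `split` calls.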

As I said before, it works fine on small GDS files but takes a lot of time on a relatively big OASIS file (19 MB). My tests show that the copy_tree call in the script is the cause of the run time issue.

The commercial tool that has been used for comparison is K2_Viewer for Cadence.

Thanks.

Regards

Matthias · February 2021

@ahmedo Thanks for providing this information

I think there is a lot of optimization potential in your script. 19MB isn't really a big file - I deal with GB-size OASIS files on a daily basis. But OASIS can have a hidden complexity: small files may hide a huge number of shapes. But KLayout can deal with this.

First of all, please switch to the latest version. There is no particular reason for doing so here, but it saves me the effort of looking for differences.

There are too many loops over cells for my taste. For example this:

            cells_gds=[]
            for x in range(0, layouts["layout"+format(j)].cells()):
                cells_gds.append(layouts["layout"+format(j)].cell(x))

            cells["layout"+format(j)] = cells_gds

            t=0
            while t < len(cells["layout"+format(j)]):
                all_cells.append(cells["layout"+format(j)][t].name)
                #all_cells = all_cells + layouts["layout"+ format(j)].cell(t).name()
                t = t+1

can be written much shorter and more efficiently like this:

# "+=" keeps the names already collected from earlier files
all_cells += [ cell.name for cell in layouts["layout"+format(j)].each_cell() ]
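
The same idea carries through to the conflict check. A minimal sketch, assuming the cell names of each input have already been collected into plain lists; the `names_per_layout` data is made up for illustration, and in the script it would come from `each_cell()`:

```python
import collections

# Hypothetical cell names, one list per input layout.
names_per_layout = [
    ["TOP", "NAND2", "INV"],
    ["TOP", "INV", "PAD"],
]

# Flatten all names into one list, then keep those seen more than once.
all_cells = [name for names in names_per_layout for name in names]
counts = collections.Counter(all_cells)
dup_cells = sorted(name for name, count in counts.items() if count > 1)

print(dup_cells)
```

Because cell names are unique within one layout, a count above one can only come from a clash between different input files.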

Second, you do not need to rename the cells. "copy_tree" takes care of creating unique cell names.

"copy_tree" will also take care of DBU translation which is another plus.

Third, OASIS works much better with "non-editable" layouts. Such layouts keep OASIS shape arrays without exploding them. This disables certain (few) operations associated with editing. Hence the name. But this does not mean the layout is immutable. Non-editable layouts can be created passing "False" to the constructor:

ly = pya.Layout(False)

In some KLayout versions, the default is taken from the user settings, so explicitly asking for non-editable makes sense.

I have no performance experience with the Python packages. I cannot say if their performance is equivalent to the application. I usually use the "klayout" binary in batch mode:

klayout -b -r script.py

It offers fewer options to access the command line, so I'll usually put a wrapper script around that and use "-rd var_name=value" to pass arguments to the script.
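
A sketch of how a script might pick up such values. It assumes that each variable passed with "-rd" shows up as a global variable of the Python script, which is worth verifying for your KLayout version; the `get_param` helper and the variable names `input1`, `output`, and `topcell` are hypothetical. A plain dictionary simulates the injected variables so the snippet runs outside KLayout:

```python
def get_param(name, default=None, namespace=None):
    """Look up a script variable, falling back to a default."""
    if namespace is None:
        namespace = globals()
    return namespace.get(name, default)


# Simulate what "-rd input1=a.oas -rd output=merged.oas" would define.
simulated = {"input1": "a.oas", "output": "merged.oas"}

input1 = get_param("input1", namespace=simulated)
output = get_param("output", "out.oas", namespace=simulated)
topcell = get_param("topcell", "TOP_MERGE", namespace=simulated)
print(input1, output, topcell)
```

Inside a real batch run, the `namespace` argument would simply be left out so that the lookup goes to the script's own globals.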

I have prepared a benchmark script which is the core essence of a merge script for two files:

import pya

t = pya.Timer()

print("Reading file1.oas ...")
t.start()
ly1 = pya.Layout(False)
ly1.read("file1.oas")
t.stop()
print("Time: " + str(t))
print("Number of cells in first layout: " + str(ly1.cells ()))

print("Reading file2.oas ...")
t.start()
ly2 = pya.Layout(False)
ly2.read("file2.oas")
t.stop()
print("Time: " + str(t))
print("Number of cells in second layout: " + str(ly2.cells ()))

ly = pya.Layout(False)
top = ly.create_cell("TOP")

file1 = ly.create_cell("FILE1")
top.insert(pya.CellInstArray(file1.cell_index(), pya.Trans(0, 0)))

file2 = ly.create_cell("FILE2")
top.insert(pya.CellInstArray(file2.cell_index(), pya.Trans(0, 3000000)))

print("copy_tree1 ...")
t.start()
file1.copy_tree(ly1.top_cell())
ly1._destroy() # cleanup -> we don't need ly1 anymore
t.stop()
print("Time: " + str(t))

print("copy_tree2 ...")
t.start()
file2.copy_tree(ly2.top_cell())
ly2._destroy() # cleanup -> we don't need ly2 anymore
t.stop()
print("Time: " + str(t))

print("Writing ...")
t.start()
ly.write("merged.oas")
t.stop()
print("Time: " + str(t))

For two files of 12 MB each, my numbers are as follows when I run this script in batch mode ("klayout -b -r script.py"):

Reading file1.oas ...
Time: 0.07s (sys), 2.11s (user), 2.223s (wall)
Number of cells in first layout: 3926
Reading file2.oas ...
Time: 0.05s (sys), 2.12s (user), 2.167s (wall)
Number of cells in second layout: 3926
copy_tree1 ...
Time: 0.03s (sys), 2.18s (user), 2.203s (wall)
copy_tree2 ...
Time: 0.01s (sys), 3.68s (user), 3.692s (wall)
Writing ...
Time: 0.03s (sys), 2.66s (user), 2.747s (wall)

These are for 0.26.9 on an i7 at 3200 MHz, reading the files from a local SSD. Not really bad numbers, I'd say. Overall, the wall times add up to about 13 seconds.

You can improve these numbers with a trick: layouts can be put into a "frozen" mode. This does not mean they are actually frozen; rather, some internal properties are no longer updated automatically, which avoids wasting resources on information you don't need. These properties include bounding boxes and hierarchy hints (such as child cells). As copy_tree does not need them, you can call "start_changes" before the first copy and "end_changes" after the last one to suspend these updates. This gains you a few seconds:

import pya

t = pya.Timer()

print("Reading file1.oas ...")
t.start()
ly1 = pya.Layout(False)
ly1.read("file1.oas")
t.stop()
print("Time: " + str(t))
print("Number of cells in first layout: " + str(ly1.cells ()))

print("Reading file2.oas ...")
t.start()
ly2 = pya.Layout(False)
ly2.read("file2.oas")
t.stop()
print("Time: " + str(t))
print("Number of cells in second layout: " + str(ly2.cells ()))

ly = pya.Layout(False)
top = ly.create_cell("TOP")

# put the layout into "frozen" mode in which modifications are
# faster, but certain features such as bounding boxes, region lookups
# and hierarchy traversals will not work properly
ly.start_changes()

file1 = ly.create_cell("FILE1")
top.insert(pya.CellInstArray(file1.cell_index(), pya.Trans(0, 0)))

file2 = ly.create_cell("FILE2")
top.insert(pya.CellInstArray(file2.cell_index(), pya.Trans(0, 3000000)))

print("copy_tree1 ...")
t.start()
file1.copy_tree(ly1.top_cell())
ly1._destroy() # cleanup -> we don't need ly1 anymore
t.stop()
print("Time: " + str(t))

print("copy_tree2 ...")
t.start()
file2.copy_tree(ly2.top_cell())
ly2._destroy() # cleanup -> we don't need ly2 anymore
t.stop()
print("Time: " + str(t))

print("Unfreeze ...")
t.start()
# leave "frozen" mode -> update bounding boxes, hierarchy hints and
# region lookup trees. Not absolutely necessary for the writer, but safer.
ly.end_changes()
t.stop()
print("Time: " + str(t))

print("Writing ...")
t.start()
ly.write("merged.oas")
t.stop()
print("Time: " + str(t))

With this change, the times are as follows (again, two OASIS files of 12 MB each):

Reading file1.oas ...
Time: 0.07s (sys), 2.15s (user), 2.218s (wall)
Number of cells in first layout: 3926
Reading file2.oas ...
Time: 0.06s (sys), 2.15s (user), 2.213s (wall)
Number of cells in second layout: 3926
copy_tree1 ...
Time: 0.02s (sys), 0.9s (user), 0.913s (wall)
copy_tree2 ...
Time: 0.01s (sys), 0.97s (user), 0.985s (wall)
Unfreeze ...
Time: 0s (sys), 2.56s (user), 2.563s (wall)
Writing ...
Time: 0.03s (sys), 1.73s (user), 1.803s (wall)

which sums to roughly 11 seconds.
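
That total can be checked by adding up the wall-clock figures from the log. A small sketch that parses the output shown above; the regular expression assumes the "Time: ... (wall)" line format exactly as printed here:

```python
import re

log = """\
Reading file1.oas ...
Time: 0.07s (sys), 2.15s (user), 2.218s (wall)
Reading file2.oas ...
Time: 0.06s (sys), 2.15s (user), 2.213s (wall)
copy_tree1 ...
Time: 0.02s (sys), 0.9s (user), 0.913s (wall)
copy_tree2 ...
Time: 0.01s (sys), 0.97s (user), 0.985s (wall)
Unfreeze ...
Time: 0s (sys), 2.56s (user), 2.563s (wall)
Writing ...
Time: 0.03s (sys), 1.73s (user), 1.803s (wall)
"""

# Pick the number directly in front of "s (wall)" on every Time line.
wall_times = [float(t) for t in re.findall(r"([\d.]+)s \(wall\)", log)]
total = sum(wall_times)
print("%.3f s over %d steps" % (total, len(wall_times)))
```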

I tried another test case, a 100 MB OASIS file merged twice into one file, which gives these numbers:

Reading file1.oas ...
Time: 0.16s (sys), 6.49s (user), 6.664s (wall)
Number of cells in first layout: 20648
Reading file2.oas ...
Time: 0.12s (sys), 6.52s (user), 6.642s (wall)
Number of cells in second layout: 20648
copy_tree1 ...
Time: 0.19s (sys), 5.06s (user), 5.253s (wall)
copy_tree2 ...
Time: 0.16s (sys), 5.77s (user), 5.924s (wall)
Unfreeze ...
Time: 0.02s (sys), 5.27s (user), 5.303s (wall)
Writing ...
Time: 0.19s (sys), 7.92s (user), 8.442s (wall)

That is roughly 40 seconds overall and shows that the times scale pretty well with OASIS size.

Matthias

ahmedo · February 2021

Hello Matthias,

Thank you so much for your response and for the code examples that you have shared in this discussion. I know that it takes a good deal of your time to help us resolve our layout-related issues.

I have run tests with the latest release of KLayout (0.26.10) using the script I shared with you, and I got the same run time as with release 0.26.5.

But once I made the changes you suggested, the run time became quite impressive: it dropped from 23 min to 2 s, and that is due to the command:

ly = pya.Layout(False)

With the 'False' option, copy_tree is very fast. I have also added the freeze commands, but they didn't change the run time much, I think because my OASIS file is actually small (19 MB), contrary to what I said about it in my initial post.

    ly.start_changes()
    ...
    ....
    ly.end_changes()
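
One way to make sure the two calls always stay paired is a small context manager. A minimal sketch; the `frozen` helper is hypothetical, and `FakeLayout` is only a stand-in that records calls so the snippet runs without KLayout. With a real layout object, only the `start_changes` and `end_changes` methods named in this thread are used:

```python
from contextlib import contextmanager


@contextmanager
def frozen(layout):
    """Bracket a block with start_changes()/end_changes()."""
    layout.start_changes()
    try:
        yield layout
    finally:
        # Always leave "frozen" mode, even if the block raises.
        layout.end_changes()


class FakeLayout:
    """Stand-in that only records calls; not a KLayout class."""

    def __init__(self):
        self.calls = []

    def start_changes(self):
        self.calls.append("start_changes")

    def end_changes(self):
        self.calls.append("end_changes")


ly = FakeLayout()
with frozen(ly):
    ly.calls.append("copy_tree ...")  # the copies would go here
print(ly.calls)
```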

In addition, I've changed the script to use your simplified code for adding cells to the list all_cells. So I think my script is now quite good for daily usage.

Concerning your question about why I use so much code for the merge when it could be done simply with copy_tree as in your examples: I developed this script to give the user the ability to rename either all the cells or just the conflicting cells in the merged database. The user may sometimes want to add a specific suffix to the names of conflicting cells, while copy_tree generates names automatically (without any control over them).
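
The two renaming modes can be summarized in a few lines that are independent of KLayout. A sketch under the assumption that the new name is the old name plus "_DB" and the index of the input file, as in the script above; the `plan_renames` helper and the sample cell names are hypothetical:

```python
def plan_renames(names_per_layout, mode="rename", suffix="_DB"):
    """Map (layout index, old name) -> new name for cells to rename."""
    # Count how often each name occurs across all input layouts.
    seen = {}
    for names in names_per_layout:
        for name in names:
            seen[name] = seen.get(name, 0) + 1

    plan = {}
    for index, names in enumerate(names_per_layout):
        for name in names:
            # "forcerename" renames everything, "rename" only conflicts.
            if mode == "forcerename" or seen[name] > 1:
                plan[(index, name)] = name + suffix + str(index)
    return plan


layouts = [["TOP", "INV"], ["TOP", "PAD"]]
print(plan_renames(layouts))
print(plan_renames(layouts, mode="forcerename"))
```

Computing the plan first and applying it afterwards keeps the naming policy in one place, separate from the copy step.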

Thank you again for your help.

Ahmed

Matthias · February 2021

Very good!

Non-editable isn't actually a feature of copy_tree; it means that shape arrays are left compressed. This not only simplifies the work for copy_tree but also leads to a predictable and reduced memory footprint, and it makes the OASIS writer's job easier: the writer does not need to establish shape arrays again and will instead reuse the ones it got on reading.

Matthias
