Can duplication be removed ?

edited July 2015 in Ruby Scripting
Hi Matthias,
I have a group of designers that work on various portions of a layout. After a few months, one person (me) assembles everything into one file. This happens in several iterations over several weeks. At that point, there are a few problems:

(1) I usually import each designer's cells as new cells, just in case they differ from already existing cells. Sometimes designers give cells the same name but design them differently. At the end of the assembly, we have a mishmash of names and duplicate cells across all levels of hierarchy stemming from the import, i.e. "$1" and "$1$2$1$2" etc. at the end of the name. Neither the designers nor I can determine after all this time whether the cells can be replaced with "stem" cells (names without the $), because we just don't remember and we don't have the time to look at every cell. So my question is: is there by any chance a script out there that would recurse from the currently selected cell downwards into EVERY cell and every shape in each cell, determine whether some cells are copies of others, and then replace them and rename them with a sensible name?

(2) The designers are not too careful and accidentally place shapes and cells on top of each other (through copy/paste). I had one instance recently where a simple cell with 8 shapes in 16 cell instances had grown to more than half a million shapes. Working with a file like that is very slow. And no one has time to inspect 5000 cells and 1 million total non-instanced shapes. Is there by any chance a Ruby script out there that would recurse into EVERY shape and cell to remove duplicate shapes and cell instances placed on top of each other?

Vielen Dank !

Thomas

Comments

  • edited July 2015

    Hi Thomas,

    For detection of identical cells there probably is a scriptable solution, although I think there are several issues with that approach. The task you describe becomes very complex in the end if you consider the variety of possible hierarchical configurations.

    For example, removing a shape in a cell which is a duplicate of another shape in a cell outside the given one can only be done when the shape is covered for all instances of the cell. Otherwise, the flat appearance of the layout will change.

    I see some other ways of dealing with your problems: I'd add a preparation step before assembling the cells:

    • Add a check to your module DRC (I assume you have one) to check for an unreasonable number of overlapping shapes (e.g. >5). Run the DRC before assembly.
    • Rename the cells with a unique prefix. For example module "A" will have cells called "A_..", module B will have cells called "B_.." etc. This will keep the cells separate.
    • Consider using libraries for common elements in the layouts avoiding duplicates (I'm not sure whether library references are retained in the merge, but this we can verify or a smart merge could do so).

    Best regards,

    Matthias

  • edited November -1
    Hi Matthias,
    I was thinking that I would want to remove duplicate shapes only within one cell (and then recurse), not shapes that overlap another cell's shape. Similarly, I would remove duplicate cells and instances only within one parent cell (and then recurse). You are right, it becomes complicated otherwise. I think this would clean up our designs already. Yes, we have a DRC set up, but we don't use it much because our files are too large. Nothing ever finishes. However, if the DRC has overlap detection, then it may be worth my while to look at that code and adapt it to what I am trying to do. All this is probably not too hard to script. I have already set up a recursion algorithm that finds all unique cells, unique shapes, number of total shapes, number of total cells, etc.

    The harder part would be (1): finding cells that are, by their content, the same as already used ones. One problem is computation time, as I would have to compare each cell encountered during the recursion with all cells that I have already cataloged. The second problem is sheer memory. Due to computation time, we can only run it on a portion of our design, but we estimate that we end up with a few tens of thousands of seemingly unique cells in the design, a few thousand of which are truly unique. We may have on the order of a million unique shapes, and nearly a billion total shapes. Storing all their shapes and origin coordinates in memory quickly runs up to the limit.

    In summary, brute force will probably not work. However, maybe I am thinking about this the wrong way. If I had to do it by hand, I would not do brute force. I would rather select a cell with all the suffix garbage and try to rename it to something I like. The system usually complains if there is a cell of that target name already in the layout. Maybe I need to recurse through the cell names, try to rename them, and whenever another cell of that name already exists, compare the two. If they are the same, I can replace. If not, I need to keep the garbage-suffix name. That would take care of obvious duplicates.

    Renaming the cells with a unique pre- or suffix would work yes, but that is the opposite of what I am trying to do. The import function already does this through the garbage suffixes. I need to find duplicate cells, even if they are named slightly differently by the import. Luckily, the designers name cells of certain functions always the same, so that in 99% of the instances, the name only differs by the import suffix.

    We tried the common library approach a while back. The problem is that the common cells' contents change with every new design that we start, and that is typically every 3 months. In the end, the common cells had to be changed with every design on every designer's computer. Besides that, the ones that don't change are only a handful. It was not working out.

    Where would I find the overlap detection code in the DRC deck ?

    Thanks for all the help !!!

    Thomas.
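
The rename-and-compare idea described above can be sketched in plain Ruby, without the KLayout API. The helper names `stem_name` and `merge_candidates` are hypothetical, and the suffix pattern assumes that the import only ever appends groups of the form `$<number>`. The sketch only narrows down which cells need a content comparison; it does not compare anything itself.

```ruby
# Strip trailing import suffixes such as "$1" or "$1$2$1$2" (assumed pattern).
def stem_name(cell_name)
  cell_name.sub(/(\$\d+)+\z/, '')
end

# Group cell names by their stem. Only groups with more than one
# member are candidates for a content comparison.
def merge_candidates(cell_names)
  cell_names.group_by { |name| stem_name(name) }
            .select { |_stem, group| group.size > 1 }
end

names = ['INV', 'INV$1', 'INV$1$2', 'NAND2', 'PAD$3']
merge_candidates(names).each do |stem, group|
  puts "#{stem}: #{group.join(', ')}"
end
```

Because only cells that share a stem are compared with each other, the number of comparisons stays far below the all-against-all case, provided the designers' naming is as consistent as described.
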
  • edited August 2015

    I actually observed weird problems with duplication of shapes on top of each other in earlier versions of KLayout, I think it was in 0.23.4 or something. I was never able to pin down the bug but sometimes after some scripts had run and closing and re-opening files I had many coincident shapes on top of each other when I had only originally drawn one.

    Anyway I think it has gone in 0.24-python-eval and so far not observed in 0.24, so first just try upgrading to the latest version.

  • edited November -1

    Hi all,

    I am not aware of a bug that causes duplication of shapes. Maybe that was a side effect of another bug (writer or reader?).

    Anyway, the functionality you ask for isn't straightforward to implement and there is some serious business in the area of pattern recognition, so I smell some IP violation risk here.

    However, the overlap DRC function is pretty straightforward:

    # "inp" stands for the input layer, e.g. inp = input(1, 0)
    out = inp.merged(2)
    

    will give all areas where 2 or more polygons overlap in layer inp. The argument can be bigger: if you specify 10, you will see the areas where 10 or more polygons overlap.

    Matthias
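
To illustrate what a minimum overlap count means, here is a one-dimensional analog in plain Ruby. Intervals stand in for polygons, and the hypothetical function `covered_at_least` returns the stretches covered by at least `min_count` intervals. This is only a sketch of the concept and says nothing about how KLayout implements it.

```ruby
# Returns [start, stop] stretches covered by at least min_count intervals.
def covered_at_least(intervals, min_count)
  # +1 where an interval starts, -1 where it ends; at equal
  # coordinates the starts are processed first.
  events = intervals.flat_map { |a, b| [[a, 1], [b, -1]] }
                    .sort_by { |x, delta| [x, -delta] }
  result = []
  depth = 0
  open_at = nil
  events.each do |x, delta|
    depth += delta
    if depth >= min_count && open_at.nil?
      open_at = x
    elsif depth < min_count && open_at
      result << [open_at, x] if x > open_at  # skip zero-length touches
      open_at = nil
    end
  end
  result
end

p covered_at_least([[0, 10], [5, 15], [8, 20]], 2)
p covered_at_least([[0, 4], [0, 4]], 2)  # two coincident "shapes"
```

Exactly coincident intervals show up as a stretch with count 2, which is the accidental copy/paste case from the original question.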

  • edited November -1

    Thomas,

    Did you manage to find a solution to your original question #1?

    I have a similar problem to solve and am looking for an existing scripted solution before attempting my own. Or even if you only have a partial solution right now, that would be better than nothing.

    Thanks,

    David

  • edited November -1
    Hi David,
    yes, I am almost done with my script. I iterated flat through all cells in the layout and wrote a function that tests whether they are identical to other ones in the list. This includes shapes and child cells. I have found that roughly 50% of all cells are copies of each other, just named slightly differently. The script only takes 60 sec or so for 3000 cells in a design. Working on a cell replacement function now. Then I should end up with a unique list of cells. Will do a shape and instance overlap detection next. That doesn't seem too hard to do now :-). I will post it when I am done.

    Matthias,
    I am almost done with that monster routine, but I am having trouble with a "simple" cell replacement function. It almost looks like there is none. There isn't by any chance a function like lv.cell(index2)=lv.cell(index1) that also updates instances, is there? From the example useful module (replace cell from another layout), it looks like I would have to replace all instances of cell2 with instances of cell1, and then delete cell2 (or leave it "unhooked")?

    Thanks.

    Thomas.
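
Thomas's script is not shown in the thread. As a rough sketch of one way to test cells for identical content including child cells, the plain-Ruby example below computes a digest per cell from its shapes and from the digests of its children. The data layout (hashes with `:shapes` and `:insts`) and the function names are assumptions made for illustration. Storing one digest per cell instead of all coordinates also addresses the memory concern raised earlier.

```ruby
require 'digest'

# cells: { name => { shapes: [[layer, coords], ...],
#                    insts:  [[child_name, placement], ...] } }
# Returns a digest that depends only on content, not on cell names.
def cell_signature(cells, name, memo = {})
  memo[name] ||= begin
    cell = cells.fetch(name)
    shape_part = cell[:shapes].map(&:inspect).sort
    inst_part = cell[:insts].map do |child, placement|
      "#{cell_signature(cells, child, memo)}@#{placement.inspect}"
    end.sort
    Digest::SHA256.hexdigest((shape_part + ['|'] + inst_part).join(';'))
  end
end

# Groups of cell names whose content is identical.
def duplicate_groups(cells)
  memo = {}
  cells.keys.group_by { |name| cell_signature(cells, name, memo) }
       .values.select { |group| group.size > 1 }
end

cells = {
  'VIA'   => { shapes: [[1, [0, 0, 2, 2]]], insts: [] },
  'VIA$1' => { shapes: [[1, [0, 0, 2, 2]]], insts: [] },
  'ROW'   => { shapes: [], insts: [['VIA', [0, 0]], ['VIA', [10, 0]]] },
  'ROW$1' => { shapes: [], insts: [['VIA$1', [10, 0]], ['VIA$1', [0, 0]]] },
  'ROW$2' => { shapes: [], insts: [['VIA', [0, 0]]] }
}
p duplicate_groups(cells)
```

Sorting the shape and instance lists makes the result independent of the order in which the content was drawn. Exact coordinate equality is assumed, so cells that are shifted or rotated copies of each other are not detected.
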
  • edited November -1

    Hi Thomas,

    I'm afraid you can't just replace one cell by another. But you can walk through all parents of the cell you want to replace using Cell#each_parent_inst and identify the instances with ParentInstArray#child_inst. You can then use Instance#cell_inst to change the cell.

    You should just perform this operation in two steps: first collect the instances in an array and then change their cell index. Changing the hierarchy tree while iterating the parent instances may confuse the system.

    I'll paste some code once I find time. Maybe you can share your code (Suggestion: through a gist on GitHub). I might be able to give some more comments.

    Matthias

  • edited November -1
    Hi Matthias,
    what you are suggesting, is that what the code behind the menu item MENU>CELL>REPLACE_CELL does ?
    If so, is that accessible somewhere ? I feel like I am re-inventing the wheel ...
    Thanks.

    Thomas.
  • edited September 2015

    Hi Thomas,

    sure it's accessible somewhere: you'll find the C++ code in layLayoutView.cc, method LayoutView::cm_cell_replace. You'll just need to translate it to Ruby ...

    Or use this sample:

    # this sample will replace all instances of cell "A" by cell "B"
    # "ly" is the layout object
    
    cell_to_replace = ly.cell("A") 
    new_cell = ly.cell("B")
    
    insts_to_replace = []
    
    # collect all instances whose target cell is to be replaced
    cell_to_replace.each_parent_inst { |pi| insts_to_replace << pi.child_inst }
    
    # actually replace the targets by the new cell
    insts_to_replace.each { |ci| ci.cell_index = new_cell.cell_index }
    

    Matthias

  • edited November -1
    Oh boy, that created a mess ! I had written up a very similar routine.
    I ran it and it replaced the cells in all instances correctly, but I think it did just take
    over the parameters from one instance. Still troubleshooting, but I think before replacing
    the cell index, one needs to save the array and trans parameters, and then re-imprint them
    onto the replaced instance.


    in my function, myCell2 is to be replaced by myCell1:

    myInstsInParents = []
    myCell2.each_parent_cell do |myParentInd|
      myParent = myCell2.layout.cell(myParentInd)
      if myParent.child_cells > 0 and myParent.child_instances > 0
        myParent.each_inst do |myInstInParent|
          if myInstInParent.cell_index == myCell2.cell_index
            # found an instance that needs to be replaced
            myInstsInParents.push(myInstInParent)
          end
        end
      end
    end
    # puts "Found a total of #{myInstsInParents.count} instances of #{myCell2.display_title} to be replaced with #{myCell1.display_title}"
    if myInstsInParents.count > 0
      for i in 0 .. myInstsInParents.count-1
        # puts "#{i+1}/#{myInstsInParents.count}:#{myInstsInParents[i].cell.display_title} in #{myInstsInParents[i].parent_cell.display_title}"
        # here is the final replacement
        myInstsInParents[i].cell_index = myCell1.cell_index
        # MESS !!!! NEED TO COPY OVER ALL THE PARAMETERS ALSO
      end
    end
  • edited November -1

    Hi Thomas,

    Copying the parameters should not be required. Your code is basically identical to the one I have given above (apart from the way you iterate arrays and the way you find the parent instances).

    Just replacing the cell_index should replace the cell index and leave the transformations the same.

    But I understand that is exactly your problem: you are saying you took one instance as a template and wanted to replace another instance with this one. Maybe that template instance already carried a transformation? In that case, just replacing the cell index is truly not sufficient. But the question is what should happen to other instances of the template cell.

    I assume you have one template instance I1 of cell C1 and another instance I2 of a cell C2 which you identified to be duplicates. I1 carries a transformation T1 and I2 carries a transformation T2. Other instances of C2 may carry other transformations. As a general rule, apply

    i2.cplx_trans = i2.cplx_trans * i2_initial.cplx_trans.inverted * i1.cplx_trans
    i2.cell_index = i1.cell_index
    

    (here i2 is the I2 instance to treat, i2_initial is the C2 instance that you identified to be identical to I1). That is a general recipe and it makes sure that for i2 == i2_initial, the result is that i2_initial becomes identical to i1 (the non-inverse of i2's transformation and the inverse cancel and i1.cplx_trans remains).

    Or maybe I misunderstood your remark.

    Matthias
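
The cancellation argument in the recipe above can be checked with a small plain-Ruby model. The sketch assumes simplified transformations consisting only of a rotation in multiples of 90 degrees plus a displacement, written as `[r, dx, dy]`, and it assumes the convention that `compose(a, b)` applies `b` first and `a` second. The function names are hypothetical and the model leaves out mirroring and magnification.

```ruby
# Rotate vector v by r * 90 degrees counterclockwise.
def rot(r, v)
  x, y = v
  case r % 4
  when 0 then [x, y]
  when 1 then [-y, x]
  when 2 then [-x, -y]
  else        [y, -x]
  end
end

# Product a * b of two transformations [r, dx, dy] (b is applied first).
def compose(a, b)
  ra, adx, ady = a
  rb, bdx, bdy = b
  rx, ry = rot(ra, [bdx, bdy])
  [(ra + rb) % 4, rx + adx, ry + ady]
end

# Inverse transformation: compose(t, invert(t)) is the identity [0, 0, 0].
def invert(t)
  r, dx, dy = t
  ri = (-r) % 4
  ix, iy = rot(ri, [-dx, -dy])
  [ri, ix, iy]
end

# The recipe: i2 * i2_initial^-1 * i1
def retarget(t_i2, t_i2_initial, t_i1)
  compose(compose(t_i2, invert(t_i2_initial)), t_i1)
end

t_i1         = [1, 100, 0]
t_i2_initial = [0, 50, 50]
t_other      = [2, 500, 0]

p retarget(t_i2_initial, t_i2_initial, t_i1)  # same as t_i1
p retarget(t_other, t_i2_initial, t_i1)
```

For the instance that served as the reference, the result equals the template's transformation, as stated in the post. Every other instance of C2 gets a transformation that is adjusted by the same relative amount.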

  • edited November -1
    Hi Matthias,
    it was a bit more complicated. You were right, the basic and complex parameters are not required to be replaced. They remain with the original object. The problem with my code was the recursive nature of the index replacement. Took a while to iron out, but now it works. The script now runs through all cells and a) finds duplicately placed shapes and instances and deletes them and b) iterates through all cells and replaces the ones that are different only by name. This is really helping us especially after imports.
    The only thing left for me to do is cleaning up the cell names; I am working on it now. I want to rid the names of the "$" and "_WB" suffixes they pick up somehow. I see the methods "basic_name", "display_title", and "name" for Cell, and I noticed that these contain the same string for regular cells. I have two questions regarding this:

    (1) I am a little fuzzy on the treatment of PCells (which we use heavily now). Basic_name is the library, and title is the given name (we name it during PCell generation with certain dynamic values)? I can only set "name"? That may not be an issue, as import replaces PCells with equivalent regular cells anyway. I have to look into how my code responds to PCells more anyway. It almost looked as if the replacement did not behave the same way.

    (2) I have not tried yet, but what happens if I force a name on a cell that is already used on another cell ? Would I have to check all other cells' names first ? I guess "behind the scenes" the system still distinguishes the cells via cell_index, but I don't want to create strange lists in the front panel with duplicate names.

    Thanks.

    Thomas.
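
The per-cell cleanup described in a) can be sketched in plain Ruby as follows. The cell is modeled as a hash with `:shapes` and `:insts` arrays, which is an assumption for illustration and not the KLayout data model, and `remove_exact_duplicates` is a hypothetical name. Only exactly coincident entries are removed; partially overlapping shapes are left alone so the flat appearance of the layout does not change.

```ruby
# cell: { shapes: [[layer, coords], ...], insts: [[child_name, placement], ...] }
# Removes exact duplicates in place and returns the number of removed entries.
def remove_exact_duplicates(cell)
  before = cell[:shapes].size + cell[:insts].size
  cell[:shapes] = cell[:shapes].uniq
  cell[:insts]  = cell[:insts].uniq
  before - (cell[:shapes].size + cell[:insts].size)
end

cell = {
  shapes: [[1, [0, 0, 2, 2]], [1, [0, 0, 2, 2]], [2, [0, 0, 2, 2]]],
  insts:  [['VIA', [0, 0]], ['VIA', [0, 0]], ['VIA', [5, 0]]]
}
puts "removed #{remove_exact_duplicates(cell)} duplicates"
```

The same shape on a different layer and the same child cell at a different placement count as distinct entries and are kept.
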
  • edited November -1

    Hi Thomas,

    I missed the part about PCells - this will complicate things since maintaining their instances is more tricky. Are your PCell duplicates actually instance duplicates?

    Regarding the $ suffixes: they are created when KLayout encounters identical cell names. If you import another layout having cells with existing names, they will be made unique by adding a proper suffix ($1, $2 ...). Also, if you call Layout#add_cell or Layout#create_cell the name you specify may not be the actual name used. If a cell with that name already exists, it will be made unique too.

    But beware (and this is the answer to (2)): by renaming a cell you can force a cell's name to a specific one without the suffix. This way you can create cell name duplicates. This does not create issues for KLayout (it will use the unique cell index to identify a cell), but when you save such a file to GDS for example, the GDS file may not be a valid one or the cells with identical names may become clubbed together and form a new single cell with magic effects.

    PCell naming needs an explanation: A PCell has actually two names. The "basic name" is the PCell's identifier in the PCell library (i.e. "TEXT" for Basic.TEXT). A PCell always has a representative cell (a "proxy" cell) which contains the generated layout. This proxy cell is like a usual cell and is subject to the same constraints. The difference to a normal cell is that it is linked to the PCell and carries parameters which allow it to regenerate itself when the parameters change. Since a PCell may have different instances with different parameters, there may be multiple proxy cells. Their names are derived from the basic name plus an optional suffix to make it unique.

    Finally, a PCell has a display text (this is the text shown in the cell tree). The display text is not a cell name but a description which tells the user the content of the cell. The display text is created by the PCell code and usually lists the primary parameters. The display text is not used in the layout database.

    Matthias
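
Since a forced rename can create duplicate names, a script can plan the renames against the set of names already in use before touching anything. The plain-Ruby sketch below works on a list of names only. `plan_renames` is a hypothetical helper, and the suffix pattern assumes the `$<number>` form described above.

```ruby
require 'set'

# names: all current cell names.
# Returns { old_name => new_name } containing only collision-free renames.
def plan_renames(names)
  taken = Set.new(names)
  plan = {}
  names.each do |name|
    stem = name.sub(/(\$\d+)+\z/, '')
    next if stem == name || stem.empty?  # nothing to strip
    next if taken.include?(stem)         # stem is in use: keep the suffix
    taken.delete(name)
    taken.add(stem)
    plan[name] = stem
  end
  plan
end

p plan_renames(['INV', 'INV$1', 'BUF$1', 'BUF$2'])
```

Here `INV$1` keeps its suffix because `INV` exists, and only the first of the two `BUF` variants gets the clean name. This assumes the content comparison has already run, so cells that still share a stem really are different.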

  • edited November -1
    Hi Matthias,
    I ended up writing code for suggesting a cell name and checking against all cells.
    After all duplicate shapes and cells are removed, and all cells are checked against
    each other by content, the remaining cells that have names that are close to each other
    must retain a name that is different. So maybe I will just keep the $1's for those.

    Your comment about pcell instancing worries me. We use these a lot now. Can you explain
    what you mean by maintaining their instances ? I ran my code on a few PCELL-heavy
    test layouts and I did not notice any obvious wrongdoing. I rely on child_cells and child_instances
    to iterate through anything contained in all cells, and it looks like PCELLS are treated
    correctly - meaning as if they were just regular cells. The only difference that I see
    is that they have a different display_title.

    Thanks for all the help!

    Thomas.
  • edited September 2015

    Hi Thomas,

    I did not intend to scare you with my comment about PCells ... :-)

    What I wanted to say is that if you have a PCell and you want to detect duplicates that might mean you can test their PCell parameters to check whether they are identical.

    I think of the following case:

    You have two separate layouts done independently but using the same PCell with identical parameters, hence identical layout. If you merge both layouts, each layout will contribute one variant and I think (without having confirmed that) KLayout will produce two identical cells with differently suffixed names.

    Maybe that is the issue you are fighting against?

    Possible solutions for such a case are:

    1. KLayout joins these cells by itself
    2. A script detects the identity by comparing the PCell parameters

    Option 1.) is nice for sure, but right now that's not there. It's surely feasible, but bears some risk of modifying the layout unintentionally. Consider a case where layout A was done with one version of the PCell code and layout B with another version (hence different geometry), but both use the same parameters. The layout files will store the geometry and not consult the PCell code for an update unless you request KLayout to do so. That will maintain the layout even if the PCell code has changed. Merging the PCell variants into the same cell will then modify one of the layouts because it replaces the PCell's layout. That's probably an unlikely and borderline case, but in my personal opinion safekeeping a layout's geometry has precedence over optimizing a layout's hierarchy. There are few things more annoying than 'magic' layout changes on your photomask.

    Option 2.) is close to what you are doing right now, but the compare procedure is a different one and involves extraction of the PCell parameters. But the term "complicated" was not chosen wisely - comparing layouts geometrically is probably more complicated in terms of coding. The final procedure of replacing the cells is actually exactly the same for PCell proxy cells as for normal cells. As I mentioned, a PCell proxy is a perfectly normal cell with an added link to the PCell object.

    Or in other words:

    DON'T PANIC

    Matthias
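
Option 2 can be sketched in plain Ruby as a grouping by library, PCell name and parameters. The records below are hypothetical stand-ins for whatever a script would extract from the PCell proxy cells, and `pcell_duplicate_groups` is a made-up name.

```ruby
# variants: [{ name:, library:, pcell:, params: { 'key' => value, ... } }, ...]
# Returns groups of proxy cell names that share library, PCell and parameters.
def pcell_duplicate_groups(variants)
  variants.group_by { |v| [v[:library], v[:pcell], v[:params].sort] }
          .values
          .select { |group| group.size > 1 }
          .map { |group| group.map { |v| v[:name] } }
end

variants = [
  { name: 'TEXT',   library: 'Basic', pcell: 'TEXT', params: { 'text' => 'A', 'mag' => 1.0 } },
  { name: 'TEXT$1', library: 'Basic', pcell: 'TEXT', params: { 'mag' => 1.0, 'text' => 'A' } },
  { name: 'TEXT$2', library: 'Basic', pcell: 'TEXT', params: { 'text' => 'B', 'mag' => 1.0 } }
]
p pcell_duplicate_groups(variants)
```

Identical parameters do not guarantee identical geometry if the layouts were produced with different versions of the PCell code, as noted above. A cautious script would treat a parameter match as a candidate and still confirm it by comparing the stored geometry.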
