Caching
go/types
string provides a struct in the following format: struct{ID int; Name string; Password string; Email string}
. The issue here is that a user can define two types (in different packages) with the exact same name and fields (i.e Account
) and the string will resolve to the same go/types
string. In addition, each go/types
type is a different object, regardless of its definition.
Problem
In most practical cases, this isn't an issue. A user will likely copy two types with the same name, but different fields. However, when this is not the case, checking for cyclic activity doesn't work. This means a user can run into stackoverflow while parsing a struct (interface and/or func) that is defined with similar fields (in a single setup file) regardless of its name or package.
Fix
There are two issues we must solve whenever we cache fields.
-
A unique key per type definition (using information from go/types
).
-
Keeping the cache valid: For example, setting the cache with a field, then filtering that field for depth will result in the field being modified and no longer valid.
1
Adding the go/types
package to the cache's key will prevent issues where similar structs or interfaces result in an invalid cache hit. However, this does not prevent two types in the same package (and import) from incorrect caching (with the same fields). In order to correctly cache a field, we need the actual name, package, and import of the struct, interface, or name, provided by *types.Named
.
In contrast, basic and simple composite types do not inherently have names; only when they are struct fields. As a result, basic and simple composite type go/types
String()
s can be used to provide cached fields - that should be cached without a name, parent, or package - and have these values (including variable name) explicitly set in structs. This is done by moving the responsibility for the assignment of variable name and field name to the fieldParser
struct.
Cache needs to be checked in case Named
and when go/types
string isn't a struct or interface. This is done by implementing the caching method explained in the previous paragraph.
if type.String() != struct {
// check cache of type string
// set name, variable name using fieldParser
}
// in the Named case
if struct or interface {
// check cache of type string + name + importpkg
// or in case of cache fail
// return underlying with set name, variablename of fieldParser
}
A named func
is not considered a type, and thus invalid.
The final issue lies with cyclic fields, where a field must be parsed before it can be deepcopied to the cache, but waiting to do this means inevitably waiting on the other cyclic field, which continues until a stack overflow. An intuitive flow of the program (for cyclic support) works like this:
// check for cached field and return
// otherwise parse field
// store field prior to parse
// parse subfield
// calls main parse which will
// a. Find cyclic in cache and return pointer
// b. Parse a new field
// pointer field is eventually completed; which also updates the cyclic pointer
If we want to use field caching for performance, we require:
// check for cached field and return
// otherwise parse field
// parse subfield
// calls main parse which will
// a. Find cyclic in cache and return a pointer (that is eventually deepcopied).
// b. Parse a new field
// store completed field as a deepcopy
In the intuitive flow, assigning a cyclic type is deceiving: Simply point the cyclic type to the cached field, right? Wrong. Using multiple of the same cyclic type means you will assign multiple unassociated fields to a single cyclic type, which complicates the matcher when AllFields
is inevitably called OR when a user selects the parent of a cyclic type expecting a unique field, but receiving the universal cyclic type.
As a result... We are forced to use field caching assignment. The complex part of this method is the following: When you store a deepcopy (of an incomplete field), you will receive a deepcopy of an incomplete field back. As a result, we must set cache (and cyclic) hits to a pointer that is eventually deepcopied. This entails checking the cache up front, then defer
ing the assignment of a pointer (to a deepcopied field).
// check for cached field
// defer assignment of new pointer after cached field is stored.
// otherwise parse field
// parse subfield
// calls main parse which will
// a. Find cyclic in cache and return a pointer (that is eventually deepcopied).
// b. Parse a new field
// store completed field as a deepcopy
// deferred statement is called and pointer is assigned with a deepcopy
You can't really defer an assignment so... But before we even figure that out, there is one other issue. Deepcopying a cyclic field's fields implies that it's field's eventually point to a cyclic field that... This issue is already solved and implemented in other methods.
How do we defer an assignment to a pointer?
You can't. However, we can manage the "immutable" cache (discussed in the next section). Instead of checking the cache for a valid field, we can always build a field from the cache (using an additional field to identify cyclic types). Then, when we get a deepcopy of the field, it will always be complete. Essentially, we always return a deepcopy of the top-level typefield (using cached subfields to speed up the process).
// if field is cached, create deepcopy in cache, point it, return it, then name the copy
// else; create incomplete entry
// parse field pointer (from cache)
// parse subfield
// calls main parse which will
// b. Parse a new field
// a. Find type in cache
// point subfield to main field
// name accordingly
// substitute if cyclic with incomplete entry
// return deepcopy of cached field
// which is named accordingly
FlowExample(string) account
1: [cached string], deepcopied string
2:
account [incomplete cached]
deepcopied string [to cache]
[cached int], deepcopied int [to cache]
cyclic ->
3:
account [incomplete cached]
deepcopied string [to cache]
deepcopied int [to cache]
cyclic [incomplete cached]
deepcopied string [to cache]
deepcopied int [to cache]
account deepcopied with Fields[0], renamed, fields redirected to account.Fields, an parent to cyclic
4:
account [incomplete cached]
deepcopied string
deepcopied int
cyclic [complete cached], deepcopied
string
int
account
5:
Account is cached, then deepcopy (of entire tree) is returned as a typefield.
The cache has a size of 13 fields: 1 string, 1 int, 1 cyclic (with 2 fields), 1 real account (with a cyclic deepcopy +4, a fake account, and 2 fields). A total of 6 fields return from the parse when the account is deepcopied or encountered again.
The size of a field is 192 bytes so 1 GB of RAM (1073741824 bytes or ~1 billion bytes) holds up to 5.6 million total fields or ~932067 account parses after the first one.
2
Instead of caching the field directly, we must maintain a deepcopy of the field for re-assignment of other fields. There are many ways to do this but it's often complex. One way to simplify this is by setting options at the function level (when all field's have been parsed) instead of the field level.
Using an "immutable" cache allows us to always store
s or get
a deepcopy of a field (and it's fields). In this way, we can return a deepcopy of a field (and set it's parent explicitly) on each cache hit and modify that field without creating other implications in the program.
Tests
Add a duplicate cyclic type in cyclic to test the "intuitive logic". Programmatic run of tests using a global cache that is not reset ensures everything works correctly.