Need function to update all indexes for a table #3

Open
colestew opened this issue May 12, 2015 · 5 comments

@colestew
Contributor

Would be useful for indexed fields that change at fixed times, such as "date" objects. Consider the following code:

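    @HgValue(value = "age", index = HgIndexStyle.ORDERED)
    public int getAge() {
        // The result depends on the current date, so the indexed value
        // goes stale whenever the calendar year changes.
        Calendar cal = Calendar.getInstance();
        cal.setTime(new Date());
        int yearNow = cal.get(Calendar.YEAR);
        cal.setTime(birthday);
        int yearThen = cal.get(Calendar.YEAR);
        // Year difference only; month and day are ignored.
        return yearNow - yearThen;
    }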

It would be useful to have a quick way of automatically updating the index when a new year begins. Granted, this is a silly example, but if I think of a better one I'll comment here. You could think of a similar example where the result depends on the day and not the year.

@colestew colestew self-assigned this May 12, 2015
@colestew colestew added this to the 0.7.0 milestone May 12, 2015
@dilijev
Contributor

dilijev commented May 13, 2015

I'm not sure I agree that the feature described (updating at fixed times) is a good idea. First of all, we don't fully understand use cases which span multiple runs of the program, as that would involve reading back in the data. If the program is running for a long period of time on a single day or multiple days, this is also potentially something to consider, but I'm not sure our library should handle these updates automatically. As we have previously discussed, hidden or poorly-understood automatic behavior is bad for us as a library. Our users should have a clear idea of what is happening, and should explicitly enable any fancy behavior.

The core issue here is that whenever an @HgValue is calculated based on dynamic data like the current date and time, or random numbers, there is no way for us to keep the database consistent. We can provide a method to update the database and require the user to call it when they want to bring their results to the present, but the results would quickly become out of date unless the dynamic data is stored somewhere it can be referenced. (So without storing the current value and depending on that, this seems like an unworkable solution.)

I think this becomes a design problem for the client, which we should mention in the documentation: instead of relying on dynamic data sources, the client should opt to store those values as data inside of their objects, or as the value of a static field (although I'm less sure about the static field, see below).
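As a rough illustration of the "store it as data" suggestion, the sketch below keeps the age in a field that only changes through an explicit call. The class and method names (`StoredAgePerson`, `refreshAge`) are hypothetical and not part of MercuryDB; the annotation is shown only in a comment because the library is not on the classpath here.

```java
import java.time.LocalDate;
import java.time.Period;

// Hypothetical client class, not part of MercuryDB.
class StoredAgePerson {
    private final LocalDate birthday;
    private int age; // stored value; changes only through refreshAge

    StoredAgePerson(LocalDate birthday, LocalDate today) {
        this.birthday = birthday;
        refreshAge(today);
    }

    // In a real schema this getter would presumably carry
    // @HgValue(value = "age", index = HgIndexStyle.ORDERED).
    int getAge() {
        return age;
    }

    // The client decides when the value is brought up to date, so any
    // index update happens at a point the client can see and reason about.
    void refreshAge(LocalDate today) {
        this.age = Period.between(birthday, today).getYears();
    }
}
```

Because the getter now returns plain stored state, the indexed value cannot drift between explicit updates.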

As of now, we have not defined the semantics for updates to static fields. It is a bad idea to update the indexes of all values which depend on static fields for EVERY object that currently exists, because then the time it takes to do the update is linear in the number of objects of a particular type which currently exist, which the user will not expect. This significantly changes the expected time complexity of client code, which is something we have been trying to avoid. Every operation the user performs should have a time complexity no worse than the time complexity that same operation would have if they were not using the MercuryDB hooks, unless that increased time complexity is worth it for the speed up when performing queries.

(That is, for updating any single @HgValue, if an unordered index is used, it should take an additional O(1) operation to update that value in the index, and for an ordered index, it should take an additional O(log(n)) operation. This means that for simple data references, updates for ordered indexes will take slightly longer, but for any other kind of update operation which involves an O(log(n)) update or slower, the runtime is guaranteed to be unaffected.)
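To make the cost claim concrete, here is a minimal sketch of a per-value index, assuming a hash map backs the unordered style and a balanced tree backs the ordered style. `ValueIndex` is a made-up name for illustration and says nothing about how MercuryDB actually stores its indexes.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical index: maps a field value to the objects holding it.
class ValueIndex<K, T> {
    private final Map<K, Set<T>> buckets;

    ValueIndex(boolean ordered) {
        if (ordered) {
            buckets = new TreeMap<>(); // O(log n) per key operation; keys must be Comparable
        } else {
            buckets = new HashMap<>(); // expected O(1) per key operation
        }
    }

    void insert(K key, T obj) {
        buckets.computeIfAbsent(key, k -> new HashSet<>()).add(obj);
    }

    // Moving one object between buckets costs a constant number of map
    // operations, so the overhead matches the backing map's lookup cost.
    void update(K oldKey, K newKey, T obj) {
        Set<T> old = buckets.get(oldKey);
        if (old != null) {
            old.remove(obj);
            if (old.isEmpty()) {
                buckets.remove(oldKey);
            }
        }
        insert(newKey, obj);
    }

    Set<T> lookup(K key) {
        return buckets.getOrDefault(key, Collections.emptySet());
    }
}
```

A single-field update therefore adds one bucket removal and one bucket insertion, independent of how many other objects are in the table.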

@dilijev
Contributor

dilijev commented May 13, 2015

For the moment I think we should discourage creating @HgValues which depend on static fields, until we have a clear way of dealing with the issue.

I think we should come up with an answer to this problem which clearly indicates to the client that the operation of updating a static field will be linear in the number of objects. For instance, from a database point of view, there is no such thing as data which is common to every row of a table. Thus, having values which are backed by static fields is nonsense from a database perspective. However, if someone wants to have a static field, it will be accessible from any object returned by queries in HgDB, so it is in essence duplicated to every object. Thus, perhaps we can provide a method to updateAllXyz where xyz is a static value, thus advertising to the developer that this operation will take some time.

So I think we should create new annotation types for values based on static fields (@HgStaticValue, @HgStaticUpdate), mark any regular values which depend on these fields with @HgDependsOnStaticValue, and implement all of the associated logic.

@dilijev
Contributor

dilijev commented May 15, 2015

We should consider the case that you have an @HgValue which depends on both static and non-static fields. Like "canSmoke" could depend on the age of the Person and the current legal age for smoking (a static value).
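The "canSmoke" case might look roughly like the sketch below. Everything here is hypothetical (`SmokerTable`, `updateAllLegalSmokingAge`, and the use of an instance field to stand in for the static one); it only illustrates why changing the shared value forces a pass over every row.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical table; not MercuryDB's generated code.
class SmokerTable {
    static final class Person {
        final String name;
        final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }
    }

    private int legalSmokingAge; // stands in for the static field
    private final List<Person> rows = new ArrayList<>();
    private final Map<Boolean, Set<Person>> canSmokeIndex = new HashMap<>();

    SmokerTable(int legalSmokingAge) {
        this.legalSmokingAge = legalSmokingAge;
    }

    // Depends on per-object state (age) and on the shared value.
    boolean canSmoke(Person p) {
        return p.age >= legalSmokingAge;
    }

    void insert(Person p) {
        rows.add(p);
        canSmokeIndex.computeIfAbsent(canSmoke(p), k -> new HashSet<>()).add(p);
    }

    // The "updateAll" name advertises the cost: every row is re-evaluated,
    // so this is linear in the number of objects in the table.
    void updateAllLegalSmokingAge(int newAge) {
        legalSmokingAge = newAge;
        canSmokeIndex.clear();
        for (Person p : rows) {
            canSmokeIndex.computeIfAbsent(canSmoke(p), k -> new HashSet<>()).add(p);
        }
    }

    int countCanSmoke() {
        Set<Person> s = canSmokeIndex.get(true);
        return s == null ? 0 : s.size();
    }
}
```

Changing a single person's age would still be a constant-time index move; only the shared-value update pays the full scan.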

@dilijev
Contributor

dilijev commented May 15, 2015

This also brings up the question of what we should do when a user declares a schema where some of their @HgValues depend on state inside of an object they reference. In the case that the referenced object is updated, we need a way to update all of the objects that referenced that object. The static update potentially seems like a generalization of the necessary logic. This suggests that the static update MAY or MAY NOT need to be considered as a special case. If we can find an optimization that can only work for static fields then we should consider that a special case, otherwise both will be handled the same way.

@dilijev
Contributor

dilijev commented May 15, 2015

If the answer is that the user should do a query for all the objects that match a certain predicate when they perform the given update, then I believe we can find a way to automate that update based on declared data dependencies. We have to discuss how to annotate the schema to make this possible.
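One possible shape for that automation, sketched under the assumption that dependencies are declared up front: keep a reverse map from each referenced object to the objects whose indexed values depend on it, and re-index only those when the source changes. `DependencyTracker` and its methods are invented for illustration; how the schema would actually be annotated is still the open question.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical helper: S is the referenced (source) type, D the dependent type.
class DependencyTracker<S, D> {
    private final Map<S, Set<D>> dependents = new HashMap<>();

    // Record that an indexed value on `dependent` reads state from `source`.
    void declare(S source, D dependent) {
        dependents.computeIfAbsent(source, k -> new LinkedHashSet<>()).add(dependent);
    }

    // Re-index only the objects that declared a dependency on `source`.
    // Cost is linear in the number of dependents, not in the table size.
    // Returns how many objects were touched.
    int sourceChanged(S source, Consumer<D> reindex) {
        Set<D> affected = dependents.getOrDefault(source, Collections.emptySet());
        for (D d : affected) {
            reindex.accept(d);
        }
        return affected.size();
    }
}
```

Under this model a static field is just a source that every object of the type depends on, which is one way to read the "generalization" point above.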
