Caching Block Guidance?

Topics: Caching Application Block
Jun 16, 2008 at 3:19 PM
Hi,

I'm still using EntLib 3.1. I checked out 4.0, but the caching block seems pretty much the same?

The problem I have is that our web service has to respond quickly to actions that take about 20 minutes to complete, so we will have to cache these actions every night. What I made is a class that wraps around the CacheManager. It makes sure something doesn't get cached a second time while the first calculation is still running, and it also runs the calculation on a different thread. Could someone please point out things that might be dangerous, and perhaps a better way to do it? I'm asking this since the CAB is something of a mystery to me and apparently very fragile. Thx in advance!

Here is what I have for now:
The Cache class has one CacheManager. It keeps track of what's being cached and launches the calculations on a new thread. It throws an exception if the item is already being cached or if caching has just been started, or returns Nothing if you choose not to throw exceptions.

Imports Microsoft.Practices.EnterpriseLibrary.Caching
Imports System.Threading
Imports System.Runtime.Serialization

Public Class Cache

#Region "Singleton"
    Private Shared _instance As Cache = Nothing

    Private Sub New()

    End Sub

    Public Shared Function GetObject() As Cache
        If _instance Is Nothing Then
            _instance = New Cache()
        End If
        Return _instance
    End Function
#End Region

#Region "Exceptions"
    Public Class InProcessException
        Inherits Exception

        Public Sub New(ByVal str As String)
            MyBase.New(str)
        End Sub

        Public Sub New(ByVal str As String, ByVal inner As Exception)
            MyBase.New(str, inner)
        End Sub

        Public Sub New(ByVal info As SerializationInfo, ByVal context As StreamingContext)
            MyBase.New(info, context)
        End Sub
    End Class

    Public Class StartedProcessingException
        Inherits Exception

        Public Sub New(ByVal str As String)
            MyBase.New(str)
        End Sub

        Public Sub New(ByVal str As String, ByVal inner As Exception)
            MyBase.New(str, inner)
        End Sub

        Public Sub New(ByVal info As SerializationInfo, ByVal context As StreamingContext)
            MyBase.New(info, context)
        End Sub
    End Class
#End Region

    Private _cache_manager As CacheManager = Nothing

    Private _in_progress As New List(Of String)

    Private ReadOnly Property Store() As CacheManager
        Get
            If IsNothing(_cache_manager) Then
                _cache_manager = CacheFactory.GetCacheManager("SampleCacheManager")
            End If

            Return _cache_manager
        End Get
    End Property

    ''' <summary>
    ''' Runs the calculation on a worker thread and adds the result to the cache.
    ''' </summary>
    ''' <param name="parameters">A Hashtable holding the delegate, key, expirations, refresh action and priority.</param>
    ''' <remarks></remarks>
    Private Sub GenerateThreaded(ByVal parameters As Object)
        Dim args As Hashtable = DirectCast(parameters, Hashtable)
        Dim p As ToCache = DirectCast(args("process"), ToCache)
        Dim key As String = CStr(args("key"))
        Dim expirations As List(Of ICacheItemExpiration) = DirectCast(args("expirations"), List(Of ICacheItemExpiration))
        Dim refresh_action As ICacheItemRefreshAction = DirectCast(args("refresh_action"), ICacheItemRefreshAction)
        Dim priority As CacheItemPriority = CacheItemPriority.Normal
        If args.Contains("priority") Then priority = DirectCast(args("priority"), CacheItemPriority)
        Dim o As Object = p.Invoke(key, DirectCast(args("original_parameters"), Hashtable))

        ' Guard against a missing expiration list before adding the result to the cache.
        If expirations Is Nothing Then expirations = New List(Of ICacheItemExpiration)
        Store.Add(key, o, priority, refresh_action, expirations.ToArray())

        _in_progress.Remove(key)
    End Sub

    ''' <summary>
    ''' Put all your parameters in a hashtable ... This is the Delegate for an action that can be cached.
    ''' </summary>
    ''' <param name="parameters"></param>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Delegate Function ToCache(ByVal key As String, ByVal parameters As Hashtable) As Object

    Public Function RunByCache(ByVal key As String, ByVal process As ToCache, ByVal original_parameters As Hashtable, ByVal return_instantly As Boolean) As Object
        Return RunByCache(key, process, original_parameters, return_instantly, CacheItemPriority.Normal, Nothing, Nothing)
    End Function


    Public Function RunByCache(ByVal key As String, ByVal process As ToCache, ByVal original_parameters As Hashtable, ByVal return_instantly As Boolean, ByVal priority As CacheItemPriority, ByVal refresh_action As ICacheItemRefreshAction, Optional ByVal expirations As List(Of ICacheItemExpiration) = Nothing, Optional ByVal throw_exceptions As Boolean = True) As Object
        ' If it's still in the cache, fetch and return it.
        If Store.Contains(key) Then Return Store.Item(key)

        ' If it's still being calculated, notify the caller (or return Nothing if exceptions are disabled).
        If _in_progress.Contains(key) Then
            If throw_exceptions Then Throw New InProcessException(String.Format("The {0} action is still in process.", key))
            Return Nothing
        End If

        ' If it is not yet cached, check whether the result should be calculated and returned immediately.
        If return_instantly Then
            ' Calculate synchronously, add it to the cache and return it.
            Dim o As Object = process.Invoke(key, original_parameters)
            Store.Add(key, o)
            Return o
        Else
            ' Otherwise hand the calculation off to a thread-pool thread.
            _in_progress.Add(key)
            Dim parameters As New Hashtable
            parameters.Add("original_parameters", original_parameters)
            parameters.Add("process", process)
            parameters.Add("key", key)
            parameters.Add("expirations", expirations)
            parameters.Add("refresh_action", refresh_action)
            parameters.Add("priority", priority)
            ThreadPool.QueueUserWorkItem(New WaitCallback(AddressOf GenerateThreaded), parameters)
            If throw_exceptions Then Throw New StartedProcessingException(String.Format("Started processing {0}.", key))
            Return Nothing
        End If
    End Function

    Public Function Contains(ByVal key As String) As Boolean
        Return Store.Contains(key)
    End Function
End Class


And then a class that uses this. It has a DoLogic function that can be called through the delegate in the Cache class. Every parameter is given to this function in a Hashtable to keep things simple; usually, though, the parameter list will be Nothing. It also has a function (GetStuff) that should be called every time, and a Refresh function that is used to automatically refresh the cache.

Imports Microsoft.Practices.EnterpriseLibrary.Caching
Imports System.Threading

<Serializable()> _
Public Class Action
    Implements ICacheItemRefreshAction

    Public Shared Function GetStuff(ByVal customer As String) As List(Of String)
        Dim parameters As New Hashtable
        parameters.Add("customer", customer)
        Dim result As List(Of String) = Nothing

        Dim exps As New List(Of ICacheItemExpiration)
        exps.Add(New Expirations.SlidingTime(New TimeSpan(0, 5, 0)))
        result = TryCast(Cache.GetObject.RunByCache(customer, AddressOf DoLogic, parameters, False, CacheItemPriority.High, New Action, exps), List(Of String))

        Return result
    End Function

    ''' <summary>
    ''' The long-running calculation whose result gets cached.
    ''' </summary>
    ''' <param name="key"></param>
    ''' <param name="parameters"></param>
    ''' <returns></returns>
    ''' <remarks>This function should always be self-contained based on the key. But just in case it can take a parameter hashtable.</remarks>
    Public Shared Function DoLogic(ByVal key As String, ByVal parameters As Hashtable) As Object
        Dim l As New List(Of String)

        Thread.Sleep(1000)
        l.Add("1." & key & " " & Date.Now.ToShortTimeString)
        Thread.Sleep(1000)
        l.Add("2." & key & " " & Date.Now.ToShortDateString)
        Thread.Sleep(1000)
        l.Add("3." & key & " " & Date.Now.Ticks)
        Thread.Sleep(1000)
        l.Add("4." & key & " " & Date.Now.ToString)
        Thread.Sleep(3000)
        l.Add("5." & key & " :)")

        Return l
    End Function

    Public Sub Refresh(ByVal removedKey As String, ByVal expiredValue As Object, ByVal removalReason As CacheItemRemovedReason) Implements ICacheItemRefreshAction.Refresh
        Dim exps As New List(Of ICacheItemExpiration)
        exps.Add(New Expirations.SlidingTime(New TimeSpan(0, 5, 0)))

        Cache.GetObject.RunByCache(removedKey, AddressOf DoLogic, Nothing, False, CacheItemPriority.High, New Action, exps, False)
    End Sub
End Class

Jun 16, 2008 at 4:05 PM
Hi,

You have a lot of code in there :)

I can see your singleton implementation is not thread-safe, and neither is the way you manage the contents of _in_progress. You can read about thread-safe implementations of the singleton pattern at http://www.yoda.arachsys.com/csharp/singleton.html; it targets C#, but the same approaches apply to VB.NET.
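
For example, a minimal thread-safe version of your GetObject could look like this (just a sketch that keeps your class name and drops everything except the singleton plumbing):

Public Class Cache
    ' The Shared field is created by the type initializer, which the runtime
    ' guarantees to run only once, so no explicit locking is needed here.
    Private Shared ReadOnly _instance As New Cache()

    Private Sub New()
    End Sub

    Public Shared Function GetObject() As Cache
        Return _instance
    End Function
End Class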

It's hard to tell from your code how this cache is to be used. What should happen with the clients of the cached information while it's being calculated, and how would the custom exceptions be handled?

Fernando

Jun 17, 2008 at 8:09 AM
Fernando,

Thx for the quick reply. I'll most certainly look into making it thread safe. The idea is that this will be implemented in a webservice and a special error will be thrown when the calculation is still in progress. The clients will know how to handle it. If something needs to get cached but can be calculated quickly the return_instantly boolean is set to True and it will get cached and returned. If we know it'll run for a long time it will be calculated in a different thread and either nothing or an error will be returned. That's the logic of it :)

But I wonder ... if it runs in IIS and I create multiple instances (with a pool) for performance reasons ... would the instances know what the others are calculating? I assume they wouldn't. Is there a way to work around this? Perhaps put the caching logic in a separate service? Because the service that the clients will see has to do a lot of other things as well.
Jun 17, 2008 at 1:17 PM
Hi,

EntLib's cache is not distributed across app domains, but there is integration with NCache's distributed cache that might help you. You will likely have to distribute your internal control structures too, so it might make sense to keep the calculation in a separate service, but that service will likely become a bottleneck.

Fernando


Jun 23, 2008 at 1:31 PM
Fernando,

I think I fixed the multithreading problems by locking every time _in_progress is accessed or updated, and also by initializing the Cache instance eagerly.
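
Roughly, the relevant bits now look like this (a trimmed-down sketch; the TryMarkInProgress/MarkDone helpers are just how I split it up, not anything from EntLib):

' Eagerly created instance and a dedicated lock object for _in_progress.
Private Shared ReadOnly _instance As New Cache()
Private ReadOnly _in_progress_lock As New Object()
Private ReadOnly _in_progress As New List(Of String)

' Returns False if the key is already being calculated, otherwise marks it as in progress.
Private Function TryMarkInProgress(ByVal key As String) As Boolean
    SyncLock _in_progress_lock
        If _in_progress.Contains(key) Then Return False
        _in_progress.Add(key)
        Return True
    End SyncLock
End Function

' Called from the worker thread once the result has been added to the cache.
Private Sub MarkDone(ByVal key As String)
    SyncLock _in_progress_lock
        _in_progress.Remove(key)
    End SyncLock
End Sub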

So, now the big problem would be to make sure every instance of the webservice is aware of what's been cached and what is being cached. This might sound silly and stupid .... but ... it's impossible to share an object between applications out of the box with the .NET framework, right?

Thanks for pointing me to NCache. But I'm trying to do it without third party code first (except EntLib).
Jun 23, 2008 at 3:23 PM
Hi,

It seems to me you're trying to use a cache as a distributed data structure. If that's the case, you might get in trouble because getting that right is hard.

Regarding the object sharing question, I'm not sure what you're really looking for. Distributed caches (or even caches that use some persistence mechanism) typically rely on serialization to keep copies of shared objects in the different app domains, but if you want a single object instance to be accessed from different apps (which you probably don't) you would use a remoting mechanism.

You may want to look at Microsoft's Velocity project. It's still on the bleeding edge, though.

Fernando



Jun 23, 2008 at 4:23 PM
Fernando,

Thx! You're right, I think I'm more after a distributed data structure. I guess I'm better off simplifying it a little then, and looking at Velocity once it's usable in production. I'm probably just going to skip using the CAB as well, since it's extremely difficult to keep things organised once you have multiple instances. To me it seems like it isn't intended for storing the results of really long calculations.

Thx for all your help though :)
Jun 23, 2008 at 4:41 PM
Hi,

I wouldn't say the CAB is not for storing the results of long calculations! It's just that it works as a local cache with simple Set and Get primitives, so it doesn't deal with the kind of additional features you're looking for like distributed locking.

IMO a cache is a convenience. If the data is there you get it quickly, but if it's not then you take the longer route and optionally add the result to the cache. If two threads find that the data isn't available they will both take the long route, and one will eventually set the result for the benefit of other threads down the road. Now, if the identity of the cached value is important then you would need to add some gatekeeping before hitting the cache, but that would hurt performance. When you involve the cache in your algorithm's logic (other than checking for the data in the cache and adding it once calculated, of course), you have a different beast with different requirements.
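
In code, that plain cache-aside usage is just something like this (a rough sketch; Calculate stands in for whatever long-running work you have, and the manager name comes from your earlier snippet):

Private ReadOnly _manager As CacheManager = CacheFactory.GetCacheManager("SampleCacheManager")

Public Function GetOrCalculate(ByVal key As String) As Object
    ' Fast path: the value is already in the cache.
    If _manager.Contains(key) Then Return _manager.GetData(key)

    ' Slow path: take the long route, then cache the result for later callers.
    Dim result As Object = Calculate(key)
    _manager.Add(key, result)
    Return result
End Function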

In any case, use the tool that better suits your needs.

Regards,
Fernando



Jun 24, 2008 at 11:00 AM
Fernando,

Again, I agree :) Although I probably didn't phrase it correctly. What I meant was that a cache just isn't a good place to store data that's required by multiple applications and instances. I only just realised this lol. What I'll do is implement the longer paths, have them all save the data somewhere else, and if I want it to be more performant I'll implement the CAB so that it refreshes its data once the general store is updated. But for now it seems overly complex for storing my data.

Thx for all the help and insights! I know what I have to do now :)