msrccs / prajna

167.0 167.0 28.0 6.65 MB

Prajna: A Distributed Functional Programming Platform for Interactive Big Data Analytics and Cloud Service Building

Home Page: http://MSRCCS.github.io/Prajna

License: Apache License 2.0

Shell 0.02% Batchfile 0.01% F# 97.64% C# 0.69% PowerShell 0.55% C++ 1.09% Objective-C 0.01% C 0.01%

prajna's Issues

Set up a cluster in Azure

Looking at the examples, it appears one can pass in a PrajnaClusterFile.

I cannot find a sample Prajna cluster file.

Also, it is not clear how to set up a cluster with a cloud provider like Azure. Any guidance would be appreciated.

Using DSet<bool> causes an exception at Prajna.Core.MetaFunction`1.EncodeFunc

For code like

    let numPartitions = 4
    let guid = Guid.NewGuid().ToString("D")
    let d = DSet<_> ( Name = guid, Cluster = cluster)
           |> DSet.sourceI numPartitions (fun i -> seq { for i = 0 to 9 do 
                                                            if i % 2 = 0 then yield true else yield false})

    let r = d.ToSeq() |> Array.ofSeq

When this runs on a local cluster, the serializer throws an exception.

The exception is

  System.ArgumentException: Offset and length were out of bounds for the array or count is greater than the number of elements from index to the end of the source collection

The stack is

   at System.Buffer.BlockCopy(Array src, Int32 srcOffset, Array dst, Int32 dstOffset, Int32 count)
   at Prajna.Tools.BufferListStream`1.SrcDstBlkCopy[T1,T2,T1Elem,T2Elem](T1 src, Int32& srcOffset, Int32& srcLen, T2 dst, Int32& dstOffset, Int32& dstLen) in C:\GitHub\Prajna\src\tools\tools\bufferliststream.fs:line 994
   at Prajna.Tools.BufferListStream`1.WriteArrT(Array buf, Int32 offset, Int32 count) in C:\GitHub\Prajna\src\tools\tools\bufferliststream.fs:line 1353
   at Prajna.Tools.Serializer.writeMemoryBlittableArray(Type elType, Array arrObj, MemoryStream memStream) in C:\GitHub\Prajna\src\tools\tools\serialize.fs:line 348
   at <StartupCode$Prajna-Tools>[email protected](Tuple`4 tupledArg) in C:\GitHub\Prajna\src\tools\tools\serialize.fs:line 379
   at Prajna.Tools.Serializer.WriteArray(Type arrayType, Array arrObj) in C:\GitHub\Prajna\src\tools\tools\serialize.fs:line 410
   at Prajna.Tools.BinarySerializer.System-Runtime-Serialization-IFormatter-Serialize(Stream stream, Object graph) in C:\GitHub\Prajna\src\tools\tools\serialize.fs:line 745
   at Prajna.Core.MetaFunction`1.EncodeFunc(BlobMetadata meta, U[] elemArray) in C:\GitHub\Prajna\src\CoreLib\function.fs:line 275
   at Prajna.Core.MetaFunction`1.EncodeFuncFromObj(BlobMetadata meta, Object o) in C:\GitHub\Prajna\src\CoreLib\function.fs:line 253
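
One possible explanation for the `ArgumentException`, offered as a guess and not confirmed against the Prajna source: a `bool` occupies 1 byte in a managed array, while `Marshal.SizeOf(typeof(bool))` reports 4 bytes. A byte count derived from the marshalled size would overrun the source array in `Buffer.BlockCopy`. The sketch below reproduces that mismatch with plain .NET calls only; `BoolCopyDemo` and its method names are hypothetical and are not part of Prajna.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical helper, not part of Prajna: shows how a bool[] can be mis-sized.
public static class BoolCopyDemo
{
    // Bytes the array really occupies in managed memory (1 byte per bool).
    public static int ManagedBytes(bool[] values)
    {
        return Buffer.ByteLength(values);
    }

    // Bytes computed from the marshalled size of bool (4 bytes per element).
    public static int MarshalledBytes(bool[] values)
    {
        return values.Length * Marshal.SizeOf(typeof(bool));
    }

    // Returns true when Buffer.BlockCopy rejects the requested byte count.
    public static bool CopyThrows(bool[] values, int byteCount)
    {
        var destination = new byte[byteCount];
        try
        {
            Buffer.BlockCopy(values, 0, destination, 0, byteCount);
            return false;
        }
        catch (ArgumentException)
        {
            return true;
        }
    }
}
```

If this is indeed the cause, mapping each `bool` to a `byte` before building the DSet might sidestep the problem; that workaround is untested here.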

Support concurrent read from remote DSet

The current implementation doesn’t support the scenario when a DSet is concurrently read from two different threads. Effectively it means one cannot have two enumerators over the same DSet at the same time.
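
Until concurrent enumeration is supported, one workaround is to enumerate the DSet once and let every thread read the resulting in-memory copy. This is a minimal sketch, assuming the data fits in the memory of the calling process. `SharedSnapshot<T>` is a hypothetical helper, not a Prajna type, and the `readOnce` delegate stands in for a call such as `dset.ToIEnumerable()`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical helper: reads a source exactly once and shares the copy.
public sealed class SharedSnapshot<T>
{
    private readonly Lazy<T[]> snapshot;

    public SharedSnapshot(Func<IEnumerable<T>> readOnce)
    {
        // ExecutionAndPublication lets only one thread run readOnce;
        // other threads block until the array is ready.
        snapshot = new Lazy<T[]>(
            () => readOnce().ToArray(),
            LazyThreadSafetyMode.ExecutionAndPublication);
    }

    // Safe to enumerate from many threads because the array is never mutated.
    public IReadOnlyList<T> Items
    {
        get { return snapshot.Value; }
    }
}
```

The trade-off is memory: the whole result is held in the calling process, which defeats the purpose for data sets that are too large to collect.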

Is this project dead?

There has been no activity for the past few months.

Executable not generated

Hi, I built Prajna from the master branch with Visual Studio and started the client without options. I then created an app, but it does not produce any output. The log says the process failed to start, so I located the relevant source and added a line to print the executable name:

    let StartChild(startInfo : ProcessStartInfo) =
        if (CommonJob <> IntPtr.Zero) then
            printfn "---> %A" startInfo.FileName
            let proc = Process.Start(startInfo)
            let res = AssignProcessToJobObject(CommonJob, proc.Handle)
            if (not res) then

And now it prints out the path:

C:\Users\solom\Documents\Projects\Prajna\src\Client\Client\bin\Releasex64>PrajnaClient.exe
---> "C:\Prajna\Job1082\PrajnaTest.CS\A17E5B37B00DA042\PrajnaClientExt_PrajnaTest.CS.exe.exe"

I then looked in that path, and the file really doesn't exist. There is only one config file and many referenced DLLs. Any idea?

How to send native DLLs and custom files to remote nodes?

Your paper says that non-assembly files can be uploaded, but I cannot find any example of how to do that. Any help? Sorry, I am new to Prajna, but it is cool :)

app.config is lost on the remote node

I tried to use app.config, which contains a configSection, to configure parts of my application. This doesn't work. When I check the file in C:\Prajna\Job1082\MyApp\XXXX, the app.config is a completely new one that only adds some FSharp.Core binding. Is this a bug, and should it be fixed?

No PCL support

I hope you will soon provide a PCL version to support UWP.

In the Prajna nuget package, tools\Client should not contain the .srcsrv files

In the Prajna NuGet package, tools\Client should not contain the .srcsrv files. These files are generated as a by-product of SourceLink and should not be included in the package.

Factor out the network interface so that services and contracts don't need to consume it directly.

Build under Linux

Build the Prajna project under the Mono environment.

Add Documentation and FAQ

Better error message for authentication failure

When the clients are deployed with a password and the app sends a job request to them without the password, or with a wrong one, the request fails. However, the exception message is something like "Job fails as some source is not available". A better error message that accurately describes the reason is needed.

Roughly when will a stable version be released?

Just wondering roughly when Prajna will be released, and what the roadmap for this project is.

Default Cache Memory Limit

This issue was brought to my attention by Bruno.

One of the workloads he wrote doesn't cache data as desired. Investigation showed that the default memory size of the container is set to 1024 MB, so when the data set grows large it is not cached, which degrades performance.

I would like to open this issue to document the behavior. The questions are:

Should we raise the default memory size?
Should we give a warning when caching is not working? If so, how: throw an exception, print a warning, or something else?

Environment.SystemDirectory returns an empty string under Linux/Mono

Daemon doesn't work on a remote machine

Hi,

I did the following:

I built Prajna with build.cmd R from the master branch source code.
I copied the client folder to two machines.
I deleted the folder C:\Prajna on both machines.
I turned off the Windows Firewall completely on both machines.
I started the client without any options (so it uses the default port 1082).

Then I simply want to call this remotely:

        private static void SayHello(Cluster cluster)
        {
            var dset = new DSet<int> { Name = Guid.NewGuid().ToString("D"), Cluster = cluster };
            var descriptions =
                dset
                .Distribute(Enumerable.Range(0, cluster.NumNodes))
                .Select(i =>
                {
                    var gpuId = Int32.Parse(ConfigurationManager.AppSettings["GpuId"]);
                    var machineName = System.Environment.MachineName;
                    var process = System.Diagnostics.Process.GetCurrentProcess();
                    var gpu = Gpu.Get(gpuId);
                    return $"Hello from {machineName} {gpu} taskId={i} processId={process.Id} threadId={Thread.CurrentThread.ManagedThreadId}";
                })
                .ToIEnumerable()
                .ToArray();
            foreach (var description in descriptions)
            {
                Console.WriteLine(description);
            }
        }

The test result is like this:

If I use the following cluster.lst, then it WORKS:

XiangCluster,1082
localhost,1082

Also, if I use real IP, it also works (I launch the application from the same machine):

XiangCluster,1082
192.168.1.110,1082

Then if I want to add a remote machine, like:

XiangCluster,1082
192.168.1.110,1082
192.168.1.108,1082

Then it DOESN'T WORK anymore.

I checked the daemon log on 192.168.1.110 and found the following:

============== New Log File ======================= 
160222_020627.133310,1,Info,PrajnaMachineId is 290efbd143477d11
160222_020627.173490,1,Info,Initialize network stack with initial buffers: 128 max buffers: 33554 buffer size: 128000 network threads: 2
160222_020627.215722,1,Info,Start PrajnaClient at port 1082 (1100-1150)...................... Mode x64, 1 MB
160222_020627.218012,1,Info,Minimum threads: 16, Minimum I/O completion threads: 4
160222_020627.218622,1,Info,Maximum threads: 32767, Maximum I/O completion threads: 1000
160222_020627.219319,1,Info,Available threads: 32767, Available I/O completion threads: 1000
160222_020627.220786,1,Info,Start Parameters [||]
160222_020627.228628,1,Info,All command parsed ==== true
160222_020627.261606,1,Info,Authentication parameters: pwd=empty keyfile= keyfilepwd=empty
160222_020709.983452,18,Info,GetDriveSpace, fail to retrieve remote storage information for machine 192.168.1.108, with exception System.Management.ManagementException: Access denied 
   at System.Management.ManagementException.ThrowWithExtendedInfo(ManagementStatus errorCode)
   at System.Management.ManagementScope.InitializeGuts(Object o)
   at System.Management.ManagementScope.Initialize()
   at System.Management.ManagementObjectSearcher.Initialize()
   at System.Management.ManagementObjectSearcher.Get()
   at Prajna.Core.RemoteConfig.GetDriveSpace(String machineName)
160222_020744.693316,16,Error,Prajna.Core.Task.ErrorInSeparateApp : (Close,Job) Failed to find Job Action object for Job a6dfc439-1db5-41f5-9843-569a50737867, error has happened before?

BTW, when I use the Prajna from the NuGet package, it works.
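
One thing to rule out in the `SayHello` snippet above: `ConfigurationManager.AppSettings["GpuId"]` returns `null` when the key is absent, and `Int32.Parse(null)` throws `ArgumentNullException`. Another issue on this page reports that the remote copy of app.config is replaced by a new one, so the key may be missing on remote nodes. Whether that contributes to the failure here is not established. A defensive sketch, where `SettingParser.ParseOrDefault` is a hypothetical helper that is not part of Prajna:

```csharp
using System;

// Hypothetical helper: parse an integer setting without throwing.
public static class SettingParser
{
    // Returns the parsed value, or fallback when raw is null, empty, or not an integer.
    public static int ParseOrDefault(string raw, int fallback)
    {
        int parsed;
        return int.TryParse(raw, out parsed) ? parsed : fallback;
    }
}
```

Inside the `Select` lambda this would read `SettingParser.ParseOrDefault(ConfigurationManager.AppSettings["GpuId"], 0)`, where the fallback of `0` is an assumption chosen for illustration.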

Application hangs in cleanup

Hi, I just ran a test and found that the application hangs during Prajna cleanup.

I put the test project in here: https://github.com/soloman817/PrajnaTest

You can find the steps to reproduce in the README.md file.

Enormous memory usage in the word-count example

Hi!
I just tried to run your example from https://github.com/MSRCCS/Prajna/wiki/C%23-Examples#the-first-prajna-c-example-walk-through

My text file is about 10 MB. I did not expect about 2 GB of memory usage to count the words in this file. There may be some overhead, but can you tell me how to tune memory usage?
I can see from one of the issues that you use a 1 GB memory cache per node, but I cannot see how to configure the cache size, or how to use my own cache (for example, one outside application memory).
One more thing: memory usage does not track node count (2 GB for 2 or 3 nodes, 3 GB for 4 nodes). It seems a node allocates memory only when it uses it, but this is not clear to me. Could you please explain?

Question about the order in DSet sequence

Hi, I am testing Prajna today, and it is very cool. But I see strange behavior, and I don't know whether it is by design and I misunderstand it, or whether it is a bug.

First, I have two machines, both running PrajnaClient, so my cluster is composed of two nodes.

I run the test from a machine called "KINGKONG", which I expect to be faster because it is local. So in the transform I do some heavy work (such as writing a lot of trace output) when the node is "KINGKONG", to slow it down. After that, when I call DSet.ToIEnumerable, I get the results in reversed order.

The test code is like:

        static void SequenceOrder(Cluster cluster)
        {
            var dset = new DSet<int> {Name = Guid.NewGuid().ToString("D"), Cluster = cluster};
            var inputs = Enumerable.Range(0, cluster.NumNodes).ToArray();
            var outputs =
                dset.Distribute(inputs)
                    .Select(x =>
                    {
                        var machineName = System.Environment.MachineName;
                        if (machineName == "KINGKONG")
                        {
                            for (var i = 0; i < 10000; ++i)
                            {
                                Trace.TraceInformation("{0}...", i);
                            }
                        }
                        return x;
                    }).ToIEnumerable().ToArray();

            Console.WriteLine("Partitions: {0}", dset.NumPartitions);

            for (var i = 0; i < inputs.Length; ++i)
            {
                Console.WriteLine("#.{0}: {1} {2}", i, inputs[i], outputs[i]);
            }
        }

And the output looks like:

Partitions: 2
#.0: 0 1
#.1: 1 0
Press any key to continue . . .
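
Whether Prajna guarantees that collected results arrive in input order is not stated here. If the caller needs input order regardless, one option is to carry each element's index through the transform and sort after collecting. This is a minimal sketch in plain LINQ. `OrderRestore`, `Tag`, and `Restore` are hypothetical names, and it is an assumption that `KeyValuePair<int, T>` can be used as a DSet element type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: tag elements with their position, then restore order.
public static class OrderRestore
{
    // Pair each input with its index so the original order can be recovered.
    public static IEnumerable<KeyValuePair<int, T>> Tag<T>(IEnumerable<T> inputs)
    {
        return inputs.Select((value, index) => new KeyValuePair<int, T>(index, value));
    }

    // Sort the collected pairs by index and drop the tags.
    public static T[] Restore<T>(IEnumerable<KeyValuePair<int, T>> collected)
    {
        return collected.OrderBy(pair => pair.Key)
                        .Select(pair => pair.Value)
                        .ToArray();
    }
}
```

For example, `OrderRestore.Restore(OrderRestore.Tag(inputs).Reverse())` returns the elements of `inputs` in their original order, with the reversed sequence standing in for out-of-order results from `ToIEnumerable()`.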

Unit tests under Linux

Make the unit tests pass under Linux.

DSet.fold zeroState is separate across nodes but shared within a node.

"Job fails as some source is not available" when invoking on cluster

I've already seen #60, but in this case no passwords are used. I'm trying to run one of the examples (Pi) on a cluster consisting of my local machine only. With new Cluster("local[2]") it all works fine, but as soon as I use a cluster.lst I get the "Job fails as some source is not available" error.
My cluster.lst has:
Test,7077
localhost
And the client is invoked with:
-port 7077 -jobports 8888-8890

Any idea what might be causing this?
