Comments (10)

mabicca commented on August 24, 2024

After a few minutes, you start seeing files like:

[root@centos72 run]# pwd
/var/opt/omi/run
[root@centos72 run]# ls -la
total 5700
drwxr-xr-x. 2 root root 95 Oct 6 05:41 .
drwxr-xr-x. 8 root root 75 Oct 6 05:39 ..
-rw-------. 1 root root 2416640 Oct 6 05:40 core.1415
-rw-------. 1 root root 1880064 Oct 6 05:40 core.1418
-rw-------. 1 root root 2945024 Oct 6 05:41 core.1495

You can see them accumulating quickly:

[root@centos72 run]# ls -la
total 12172
drwxr-xr-x. 2 root root 4096 Oct 6 05:41 .
drwxr-xr-x. 8 root root 75 Oct 6 05:39 ..
-rw-------. 1 root root 2416640 Oct 6 05:40 core.1415
-rw-------. 1 root root 1880064 Oct 6 05:40 core.1418
-rw-------. 1 root root 2945024 Oct 6 05:41 core.1495
-rw-------. 1 root root 3321856 Oct 6 05:41 core.1507
-rw-------. 1 root root 2805760 Oct 6 05:41 core.1514
-rw-------. 1 root root 2805760 Oct 6 05:41 core.1523

from oms-agent-for-linux.

jeffaco commented on August 24, 2024

I looked at something similar to this three or four months ago. While your hosts file is different from the one I looked at, the fundamental problem is the same: actual Linux system calls were failing even to look up the host name. We need the host name (and domain) to generate a certificate, and without it we can't proceed.

We can certainly look at making the OMS installation script abort when this occurs. But in the end, we MUST generate a certificate, since it is mandatory for secure communications, and we can't generate one because, with certain bad /etc/hosts files, the Linux system calls fail.

So, given this, what would you suggest for us to do?

KrisBash commented on August 24, 2024

@mabicca Can you run hostname and hostname --fqdn without error? I am not sure that the core dumps are necessarily related to the name lookup error. Let's work through them separately.

mabicca commented on August 24, 2024

Hi @jeffaco, yes! :) I worked with you on that. I think it would be good if we could abort the installation, or at least show a warning about the /etc/hosts file. Is that something we can do?

@KrisBash I'll be working tomorrow, so I can definitely run that and add the output here.

mabicca commented on August 24, 2024

Here is what I get from hostname and hostname --fqdn:

[root@centos72 run]# hostname --fqdn
centos72.nd4dyczin4gujlu0aywslhe4ig.dx.internal.cloudapp.net
[root@centos72 run]# hostname
centos72

I did notice that with the bad hosts file, hostname -A fails and hostname -a returns nothing. It also seems that the recommended way to get the hostname is getent. I have not tested that yet, and I'm not sure whether it would be an option.
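As a rough illustration of the "warn instead of crashing" idea discussed above, here is a minimal Python 3 sketch. It is not part of the OMS installer, it assumes Python 3 is available on the host, and the names `resolve_fqdn` and `split_fqdn` are hypothetical. It resolves the short hostname through `getaddrinfo()` with `AI_CANONNAME`, which consults the NSS hosts sources (the same databases getent queries, normally /etc/hosts first), and returns `None` on failure so a caller could print a warning about /etc/hosts. This only approximates what `hostname --fqdn` does.

```python
import socket
from typing import Optional, Tuple


def resolve_fqdn(short_name: str) -> Optional[str]:
    """Return the canonical name for short_name, or None if lookup fails.

    getaddrinfo() with AI_CANONNAME goes through the NSS "hosts" sources
    (normally /etc/hosts, then DNS), so a bad /etc/hosts surfaces here as
    a resolver error instead of an abort later on.
    """
    try:
        results = socket.getaddrinfo(short_name, None, flags=socket.AI_CANONNAME)
    except (socket.gaierror, UnicodeError):
        return None
    # Only the first result is guaranteed to carry the canonical name.
    canon = results[0][3] if results else ""
    return canon or None


def split_fqdn(fqdn: str) -> Tuple[str, str]:
    """Split 'host.domain.tld' at the first dot; domain may be ''."""
    host, _, domain = fqdn.partition(".")
    return host, domain


if __name__ == "__main__":
    name = socket.gethostname()
    fqdn = resolve_fqdn(name)
    if fqdn is None:
        print(f"WARNING: cannot resolve '{name}'; check /etc/hosts before installing")
    else:
        host, domain = split_fqdn(fqdn)
        print(f"host={host} domain={domain or '(none)'}")
```

Splitting at the first dot matches how the FQDN shown above divides into `centos72` and the Azure internal domain; whether the certificate tooling divides names the same way is an assumption.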

KrisBash commented on August 24, 2024

@mabicca
What is the output of: sudo /opt/microsoft/scx/bin/tools/scxsslconfig -f -v ? This will attempt to regenerate the certificate (the step that fails during installation) with verbose output.

Thanks

KrisBash commented on August 24, 2024

@mabicca: @jeffaco and I discussed this today. First, we have had another report of core dumps being generated by the omiagent process, and we have a potential fix for that in testing now. Unless we find otherwise, I suspect the core dumps you are observing are related.

As for the exception in scxsslconfig: we are considering adding better error messages for this case, but even with better error handling, we won't end up with an appropriate certificate (one with a good hostname). That said, if you are using OMS only (and not System Center Operations Manager), the agent's SSL certificate is never actually used, because by default the OMI web service does not listen on any ports for OMS. In that case, as long as OMI is up and running, you should still be able to collect performance metrics from it into OMS. You can confirm this by running: /opt/omi/bin/omicli ei root/scx SCX_ProcessorStatisticalInformation

I'll reply when the fix for the core dumps is available.

mabicca commented on August 24, 2024

Hi Kris, thank you for the update.

Yes, running scxsslconfig throws an exception:

/opt/microsoft/scx/bin/tools/scxsslconfig -f -v
Setting debugMode=true
terminate called after throwing an instance of 'SCXCoreLib::SCXInternalErrorException'
Aborted

If I exclude the offending lines from /etc/hosts, it runs fine. Is scxsslconfig failing because it sees more than one hostname associated with the host's IP address?

I am doing some tests right now, and it seems that if I split the hostname entries onto separate lines, it stops crashing too. That might be a possible workaround, at least for now.
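The split-lines workaround can be scripted. Below is a minimal Python sketch, assuming the problem lines have the form `<ip> <name1> <name2> ...`; the function `split_multi_name_entries`, its `only_ip` parameter, and the sample addresses are hypothetical and not part of any OMS tooling. It rewrites each such line as one `<ip> <name>` line per name and keeps comments, blank lines, and single-name entries verbatim. In /etc/hosts the first name on a line is the canonical name and the rest are aliases, so splitting changes that relationship; review the output before replacing the real file.

```python
from typing import Optional


def split_multi_name_entries(hosts_text: str, only_ip: Optional[str] = None) -> str:
    """Rewrite 'IP name1 name2 ...' lines as one 'IP name' line per name.

    Comment-only lines, blank lines, and single-name entries are kept
    verbatim. Trailing '# ...' comments on split lines are dropped in this
    sketch. If only_ip is given, only lines for that IP are split.
    """
    out = []
    for line in hosts_text.splitlines():
        # Ignore any trailing comment, then split on whitespace (tabs or spaces).
        fields = line.split("#", 1)[0].split()
        needs_split = len(fields) > 2 and (only_ip is None or fields[0] == only_ip)
        if not needs_split:
            out.append(line)
            continue
        ip, names = fields[0], fields[1:]
        out.extend(f"{ip} {name}" for name in names)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical sample; 10.0.0.4 stands in for the VM's own address.
    sample = (
        "127.0.0.1 localhost localhost.localdomain\n"
        "10.0.0.4 centos72 centos72.example.internal\n"
    )
    # Split only the host's own entry, leaving the loopback line alone.
    print(split_multi_name_entries(sample, only_ip="10.0.0.4"), end="")
```

Restricting the rewrite with `only_ip` keeps standard loopback lines untouched, since the thread only suggests that entries for the host's own IP trigger the crash.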

jeffaco commented on August 24, 2024

@mabicca I'd like to get better error handling for this than an internal exception and a core dump! 😞 Can you please give me super-explicit steps on how to reproduce this? That is:

  • Give me the hosts file that is on the system,
  • Looks like you were installing omsagent-1.2.0-75, yes?
  • You have access to our build share, yes? (Follow up internally if you don't know what I'm talking about.) If so, could you test a fix in your environment to verify the improved behavior, if I can provide one?

@KrisBash said he tried to reproduce your installation error and did not succeed. I have not tried to reproduce it myself, but I'm reaching out to you in the hope of having better success.

Feel free to submit data to me internally via E-Mail and, if we have a fix, I can come back here and close the issue.

My understanding here is that there are two independent problems:

  1. The core dump at installation (and the scxsslconfig crash; they're the same), and
  2. Continuous core dumps later on

If my understanding is correct, I'm looking to solve the first issue. I believe @KrisBash already understands the second error and will provide a fix.

mabicca commented on August 24, 2024

I'm following up internally with you guys.

I just confirmed I was indeed using 1.2.0-75.

Thanks!
