nhtmlunit's Introduction

HtmlUnit

Version 4.2.0 / June 05, 2024

❤️ Sponsor

Homepage

htmlunit.org

HtmlUnit@mastodon | HtmlUnit@Twitter

HtmlUnit Kanban Board

Check out HtmlUnit satellite projects, such as:

Note as well that you can use HtmlUnit with Selenium via their htmlunit-driver!

Sponsoring

Constantly updating and maintaining the HtmlUnit code base already takes a lot of time.

I would like to make 2 major extensions in the next few months

To do this I need your sponsorship.

Get it!

Maven

Add to your pom.xml:

<dependency>
    <groupId>org.htmlunit</groupId>
    <artifactId>htmlunit</artifactId>
    <version>4.2.0</version>
</dependency>

Gradle

Add to your build.gradle:

implementation group: 'org.htmlunit', name: 'htmlunit', version: '4.2.0'

Vulnerabilities

List of Vulnerabilities

Security Policy

Overview

HtmlUnit is a "GUI-less browser for Java programs". It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc... just like you do in your "normal" browser.

It has fairly good JavaScript support (which is constantly improving) and is able to work even with quite complex AJAX libraries, simulating Chrome, Firefox or Internet Explorer depending on the configuration used.

HtmlUnit is typically used for testing purposes or to retrieve information from web sites.

Features

  • Support for the HTTP and HTTPS protocols
  • Support for cookies
  • Ability to specify whether failing responses from the server should throw exceptions or should be returned as pages of the appropriate type (based on content type)
  • Support for submit methods POST and GET (as well as HEAD, DELETE, ...)
  • Ability to customize the request headers being sent to the server
  • Support for HTML responses
    • Wrapper for HTML pages that provides easy access to all information contained inside them
    • Support for submitting forms
    • Support for clicking links
    • Support for walking the DOM model of the HTML document
  • Proxy server support
  • Support for basic and NTLM authentication
  • Excellent JavaScript support

Getting Started

You can start here:

Contributing

Pull requests and all other community contributions are essential for open source software. Every contribution - from bug reports to feature requests, typos to full new features - is greatly appreciated.

Last CI build

The latest builds are available from our Jenkins CI build server

Read on if you want to try the latest bleeding-edge snapshot.

Maven

Add the snapshot repository and dependency to your pom.xml:

    <!-- ... -->
    <repository>
      <id>OSS Sonatype snapshots</id>
      <url>https://s01.oss.sonatype.org/content/repositories/snapshots/</url>
      <snapshots>
        <enabled>true</enabled>
        <updatePolicy>always</updatePolicy>
      </snapshots>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>

    <!-- ... -->
    <dependencies>
      <dependency>
          <groupId>org.htmlunit</groupId>
          <artifactId>htmlunit</artifactId>
          <version>4.3.0-SNAPSHOT</version>
      </dependency>
      <!-- ... -->
    </dependencies>

    <!-- ... -->

Gradle

Add the snapshot repository and dependency to your build.gradle:

repositories {
  maven { url "https://s01.oss.sonatype.org/content/repositories/snapshots" }
  // ...
}
// ...
dependencies {
    implementation group: 'org.htmlunit', name: 'htmlunit', version: '4.3.0-SNAPSHOT'
  // ...
}

License

This project is licensed under the Apache 2.0 License

Development

useful mvn command lines

set up or refresh the Eclipse project

mvn eclipse:eclipse -DdownloadSources=true

run the whole core test suite (no huge tests, no library tests)

mvn test -U -P without-library-and-huge-tests -Dgpg.skip -Djava.awt.headless=true

check dependencies for known security problems

mvn dependency-check:check

Contributing

I welcome contributions, especially in the form of pull requests. Please try to keep your pull requests small (don't bundle unrelated changes) and try to include test cases.

Some insights

HtmlUnit at openhub

nhtmlunit's People

Contributors

alexanderstromer, asbjornu, beewarloc, dependabot[bot], jonanders, timorzadir

nhtmlunit's Issues

Azure: Address family not supported by protocol family: connect

Hi,

I'm trying to get NHtmlUnit working to do some basic web scraping on a website. It works fine on the local end, but when deployed onto a server (Windows Azure) I get the following error

'Address family not supported by protocol family: connect'

when it tries to execute .getPage on a WebClient.

Any idea why?

Greatly appreciate any help, thanks.

[NHtmlUnit] - Too slow when enable javascript

With the same browser settings, HtmlUnit took 8 seconds to navigate, but NHtmlUnit took 220 seconds.
Any ideas? Am I doing something wrong?

Here is my setting:

webClient = new WebClient(BrowserVersion.CHROME);
webClient.Options.JavaScriptEnabled = true;
webClient.Options.CssEnabled = false;
webClient.Options.AppletEnabled = false;
webClient.Options.Timeout = 30000;
webClient.Options.RedirectEnabled = true;
webClient.Options.ThrowExceptionOnFailingStatusCode = false;
webClient.Options.ThrowExceptionOnScriptError = false;

Navigate

page = WebBrowserHelper.Instance.navigate("http://vnraovat.net/register");

Thanks,

AlertHandler in NhtmlUnit

I am trying to use the AlertHandler as follows:

        java.util.List collectedAlerts = new ArrayList();
        webClient.WObj.setAlertHandler(new CollectingAlertHandler(collectedAlerts));

The problem I am having is that I am getting this error in my WebClient object:

webClient.AlertHandler' threw an exception of type 'System.InvalidOperationException
{"Operation is not valid due to the current state of the object."}

What am I missing?

Sam

[NHtmlUnit] - HtmlTextArea.SetText() does not exist

In Java, HtmlTextArea.setText() sets the content of a textarea. But in .NET, there is no SetText() method on HtmlTextArea.
I am using NHtmlUnit 2.15. How can I set the content of a textarea?
Am I missing something?

Thanks,

Issue during OAuth 2.0 flow: DomNode has not been set for this SimpleScriptable

I've tried a simple piece of code that interacts with Uber's new OAuth 2.0 flow:

var url = "https://login.uber.com/oauth/authorize?response_type=code&client_id=MYCLIENTID&redirect_uri=http%3a%2f%2flocalhost%3a2497%2fLogin%2fCallback";
var webClient = new WebClient(BrowserVersion.CHROME);

webClient.Options.JavaScriptEnabled = true;
webClient.Options.ActiveXNative = true;
webClient.Options.CssEnabled = true;

var page = webClient.GetHtmlPage(url);

However, when it runs, I get the following exception:

An exception of type 'com.gargoylesoftware.htmlunit.ScriptException' occurred in HtmlUnit.dll but was not handled in user code

Additional information: DomNode has not been set for this SimpleScriptable: com.gargoylesoftware.htmlunit.javascript.host.html.HTMLFormElement

Based on a cursory look, it seems that this may be a problem with the underlying HtmlUnit assembly? I'd appreciate it if anyone could provide some input. Thanks!

Set up build on Travis

To have a publicly available build on Linux, we should set up a build on Travis. I would do this myself if I could, but only administrators of @HtmlUnit are allowed to create builds, so we need help from @asashour or someone else with the rights to do it.

No creator found for com.gargoylesoftware.htmlunit.UnexpectedPage (JSON response)

I'm trying to write a functional test for an MVC5 + Web API project where I log in (via web forms), and then make a POST call directly to the API route. When I make the call to client.GetPage, I get an InvalidOperationException with this message: No creator found for com.gargoylesoftware.htmlunit.UnexpectedPage

This was reported against HtmlUnit itself on SO about five years ago.

var client = new WebClient(BrowserVersion.CHROME);
var method = new HttpMethod(com.gargoylesoftware.htmlunit.HttpMethod.POST);
var json = JsonConvert.SerializeObject(inputObject);

var webRequest = new WebRequest(new java.net.URL("http://localhost/site/api/game"), new HttpMethod(method))
{
    RequestBody = "=" + json // WebAPI expects this. Don't ask me why.
};
var page = client.GetPage(webRequest);

I can see the call hit the server side (including the response object which comes back) -- it's not throwing an exception there, only on the client. The response is pure JSON:

{
    "Data": [
        {
            "Id": "00000000-0000-0000-0000-000000000000",
            "Name": "FT-FunctionalTestGame",
            "OwnerId": 3
        }
    ],
    "Errors": null
}

Difference with HtmlUnit while mathjax script processing

I am trying to process web pages containing MathML using MathJax via NHtmlUnit.

In Java (using HtmlUnit) everything works fine.
I use the following snippet:

WebClient webClient = new WebClient(BrowserVersion.CHROME);
HtmlPage page = webClient.getPage(new File(mathml).toURI().toURL());
webClient.waitForBackgroundJavaScript(300000 * 1000);
String pageAsXml = page.asXml();

and the following HTML: https://pastebin.com/hvJZmtnH
The result is quite good: https://pastebin.com/kHzrM3rJ

However, it is not so good in C# (NHtmlUnit).

The code I use:

WebClient webClient = new WebClient(BrowserVersion.CHROME);
webClient.Options.JavaScriptEnabled = true;
webClient.Options.Timeout = 30000;
webClient.Options.RedirectEnabled = true;
HtmlPage page = webClient.GetHtmlPage(mathml);
webClient.WaitForBackgroundJavaScript(50000000);
string pageAsXml = page.AsXml();

The result: https://pastebin.com/E5uXfUZ1

As you can see, the JavaScript was not processed.

Please either confirm that it is impossible to process such scripts via NHtmlUnit, or point out how to fix the issue.

Right-clicking on a Telerik RadTreeNode

[Screenshot: radtreenode]

If I try to right-click on a Telerik RadTreeNode using NHtmlUnit, nothing happens. Please help. I can send you an example project if you send me the email to send it to.

I'm right-clicking on 'Amazing Weekend Package' in the attached screenshot. It normally brings up a context menu.

How to implement wrapped interfaces?

For example, I want to implement an AttachmentHandler. I thought I would only need to implement IAttachmentHandler, but it seems I also need to derive from ObjectWrapper in order to implement WrappedObject. But ObjectWrapper has a one-argument constructor.

So how should I implement such interfaces in my code?

    class AttachmentHandler : ObjectWrapper, NHtmlUnit.Attachment.IAttachmentHandler
    {
        public void HandleAttachment(IPage p)
        {
            Console.WriteLine("HAHAHAHAHA *****************************************************");
        }
    }

Set up build on AppVeyor

To have a publicly available build on Windows, we should set up a build on AppVeyor. I would do this myself if I could, but only administrators of @HtmlUnit are allowed to create builds, so we need help from @asashour or someone else with the rights to do it.

Method not found: 'Void java.util.HashMap.__<clinit>()'.

System.MissingMethodException was unhandled by user code
HResult=-2146233069
Message=Method not found: 'Void java.util.HashMap.__<clinit>()'.
Source=HtmlUnit
StackTrace:
at com.gargoylesoftware.htmlunit.Cache..ctor()
at com.gargoylesoftware.htmlunit.WebClient..ctor(BrowserVersion browserVersion)
at NHtmlUnit.WebClient..ctor(BrowserVersion browserVersion)
at [our_project].FWWebClient.ClientWrapper..ctor(BrowserVersion version) in [our_project]\FWWebClient\ClientWrapper.cs:line 14
at [our_project].FWWebClient.WebClientPool..ctor() in [our_project]\FWWebClient\WebClientPool.cs:line 16
at [our_project]...
InnerException:

This error occurs after updating IKVM.NET from 7.4.5196 to at least 8.0.5449 (the next version).
Using the newer IKVM.NET version is critical for us because otherwise NHtmlUnit doesn't work with HTTPS in some cases, due to a 'Could not generate DH keypair' error at com.gargoylesoftware.htmlunit.WebClient.download(WebWindow requestingWindow, String target, WebRequest request, Boolean isHashJump, String description). As far as we can see, this is caused by an obsolete Java version, that is, IKVM.NET 7.4.5196 is too old. (Indeed, even IKVM.NET 8.0.5449 is more than a year old.)
Could you please add support for IKVM 8.0.5449?

Thanks in advance,
Regards, L.B.

Exception debug information is almost useless

I hit an exception while getting a page using WebClient, but the exception information is nearly useless. I believe the .NET wrapper wraps the exception but discards the useful diagnostic information.

This severely limits diagnosability. Please fix.

System.InvalidOperationException was unhandled by user code
  HResult=-2146233079
  Message=Operation is not valid due to the current state of the object.
  Source=NHtmlUnit
  StackTrace:
       at NHtmlUnit.ObjectWrapper.CreateWrapper[T](Object wrappedObject)
       at NHtmlUnit.WebClient.GetPage(String url)
       at TestProject.TestProject.TryDownloadOne(DownloadSpec spec, FlexUri uri, String& content) in c:\Users\myuser\Documents\Visual Studio 2013\Projects\TestProject\TestProject\TestProject.cs:line 101
       at TestProject.TestProject.DownloadAll() in c:\Users\myuser\Documents\Visual Studio 2013\Projects\TestProject\TestProject\TestProject.cs:line 52
       at TestProjectTestCases.UnitTest1.TestMethod1() in c:\Users\myuser\Documents\Visual Studio 2013\Projects\TestProject\TestProjectTestCases\UnitTest1.cs:line 19
  InnerException: 
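
The report above suspects that the wrapper replaces the original exception without attaching it as an inner exception. As a rough illustration of the general technique only (plain Java, not NHtmlUnit's actual code), a wrapper can pass the original exception along as the cause so that its message and stack trace stay reachable. The class name WrapperDemo and the helper wrapCall are hypothetical and exist only for this sketch.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class WrapperDemo {
    // Hypothetical helper: runs an action and rethrows any failure
    // with the original exception attached as the cause.
    static <T> T wrapCall(Callable<T> action) {
        try {
            return action.call();
        } catch (Exception e) {
            // Passing 'e' as the second argument keeps its message and
            // stack trace available through getCause().
            throw new IllegalStateException(
                "Operation is not valid due to the current state of the object.", e);
        }
    }

    public static void main(String[] args) {
        try {
            wrapCall(() -> { throw new IOException("underlying failure"); });
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
            System.out.println("caused by: " + e.getCause());
        }
    }
}
```

With the cause attached, a debugger or log shows both the wrapper's message and the underlying failure, which is the information missing from the trace above (its InnerException field is empty).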

Fix NuGet publishing

I have no idea why, but neither Travis nor AppVeyor is running our builds, so even though #37 is merged, tagged and released, there won't be an automatic push of the .nupkg to NuGet.org. We should just delete appveyor.yml and .travis.yml and set up auto-publishing to NuGet.org with GitHub Actions instead.

unable to connect website with TLS1.2 only

Hello,

I am getting an error while connecting to a website that only supports the TLS 1.2 protocol.

The error is as follows:

Received fatal alert: handshake_failure

   at org.apache.http.impl.execchain.RetryExec.execute(HttpRoute route, HttpRequestWrapper request, HttpClientContext context, HttpExecutionAware execAware)
   at org.apache.http.impl.execchain.RedirectExec.execute(HttpRoute route, HttpRequestWrapper request, HttpClientContext context, HttpExecutionAware execAware)
   at org.apache.http.impl.client.InternalHttpClient.doExecute(HttpHost , HttpRequest , HttpContext )
   at org.apache.http.impl.client.CloseableHttpClient.execute(HttpHost target, HttpRequest request, HttpContext context)
   at com.gargoylesoftware.htmlunit.HttpWebConnection.getResponse(WebRequest request)
   at com.gargoylesoftware.htmlunit.WebClient.loadWebResponseFromWebConnection(WebRequest , Int32 )
   at com.gargoylesoftware.htmlunit.WebClient.loadWebResponse(WebRequest webRequest)
   at com.gargoylesoftware.htmlunit.WebClient.getPage(WebWindow webWindow, WebRequest webRequest)
   at com.gargoylesoftware.htmlunit.WebClient.getPage(URL url)
   at com.gargoylesoftware.htmlunit.WebClient.getPage(String url)
   at NHtmlUnit.WebClient.GetPage(String url)
   at NHtmlUnit.WebClient.GetHtmlPage(String url)
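
A handshake_failure can have several causes. One that can be checked quickly is whether the runtime supports and enables TLS 1.2 at all. The following is a minimal diagnostic sketch in plain Java, using only the standard javax.net.ssl API; the class name TlsProtocolCheck is hypothetical. Under IKVM the output would reflect the IKVM class library rather than a desktop JVM, so treat it as a starting point rather than a confirmed diagnosis of this report.

```java
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;

public class TlsProtocolCheck {
    // Protocol names the default SSLContext is able to support.
    static List<String> supportedProtocols() {
        try {
            return Arrays.asList(
                SSLContext.getDefault().getSupportedSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    // Protocol names that are enabled by default for new connections.
    static List<String> enabledProtocols() {
        try {
            return Arrays.asList(
                SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("No default SSLContext available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("supported: " + supportedProtocols());
        System.out.println("enabled:   " + enabledProtocols());
        System.out.println("TLSv1.2 enabled: " + enabledProtocols().contains("TLSv1.2"));
    }
}
```

If TLSv1.2 appears under "supported" but not under "enabled", the runtime can speak the protocol but does not offer it by default. If it is missing from both lists, the runtime itself is too old for a TLS 1.2-only server.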

Project size and IKVM dependency...

Dear All

I like HtmlUnit; it's fast once started, and accurate. However, my team uses it for smoke testing and likes to deploy the test tools along with the package, and IKVM makes this very heavy. Has any thought ever been given to porting this project entirely to .NET code, without relying on the IKVM runtime and its supporting DLLs? Currently the built project is 53 MB, which you would agree is weighty.

Thanks
