In my embedded Selenium/PhantomJSDriver driver it seems resources are not being cleaned up. Running the client synchronously causes millions of open files and eventually throws a "Too many files open" type exception.
Here is some output I gathered from lsof while the program is running for ~1 minute
$ lsof | awk '{ print $2; }' | uniq -c | sort -rn | head
1221966 12180
34790 29773
31260 12138
20955 8414
17940 10343
16665 32332
9512 27713
7275 19226
5496 7153
5040 14065
$ lsof -p 12180 | awk '{ print $2; }' | uniq -c | sort -rn | head
2859 12180
1 PID
$ lsof -p 12180 -Fn | sort -rn | uniq -c | sort -rn | head
1124 npipe
536 nanon_inode
4 nsocket
3 n/opt/jdk/jdk1.8.0_60/jre/lib/jce.jar
3 n/opt/jdk/jdk1.8.0_60/jre/lib/charsets.jar
3 n/dev/urandom
3 n/dev/random
3 n/dev/pts/20
2 n/usr/share/sbt-launcher-packaging/bin/sbt-launch.jar
2 n/usr/share/java/jayatana.jar
I don't understand why using the -p flag on lsof has a smaller result set. But it appears most of the entries are pipe and anon_inode.
The client is very simple at ~100 lines, and at the end of usage calls driver.close() and driver.quit(). I experimented with caching and reusing clients but it did not alleviate the open files
case class HeadlessClient(
country: String,
userAgent: String,
inheritSessionId: Option[Int] = None
) {
protected var numberOfRequests: Int = 0
protected val proxySessionId: Int = inheritSessionId.getOrElse(new Random().nextInt(Integer.MAX_VALUE))
protected val address = InetAddress.getByName("proxy.domain.com")
protected val host = address.getHostAddress
protected val login: String = HeadlessClient.username + proxySessionId
protected val windowSize = new org.openqa.selenium.Dimension(375, 667)
protected val (mobProxy, seleniumProxy) = {
val proxy = new BrowserMobProxyServer()
proxy.setTrustAllServers(true)
proxy.setChainedProxy(new InetSocketAddress(host, HeadlessClient.port))
proxy.chainedProxyAuthorization(login, HeadlessClient.password, AuthType.BASIC)
proxy.addLastHttpFilterFactory(new HttpFiltersSourceAdapter() {
override def filterRequest(originalRequest: HttpRequest): HttpFilters = {
new HttpFiltersAdapter(originalRequest) {
override def proxyToServerRequest(httpObject: HttpObject): io.netty.handler.codec.http.HttpResponse = {
httpObject match {
case req: HttpRequest => req.headers().remove(HttpHeaders.Names.VIA)
case _ =>
}
null
}
}
}
})
proxy.enableHarCaptureTypes(CaptureType.REQUEST_CONTENT, CaptureType.RESPONSE_CONTENT)
proxy.start(0)
val seleniumProxy = ClientUtil.createSeleniumProxy(proxy)
(proxy, seleniumProxy)
}
protected val driver: PhantomJSDriver = {
val capabilities: DesiredCapabilities = DesiredCapabilities.chrome()
val cliArgsCap = new util.ArrayList[String]
cliArgsCap.add("--webdriver-loglevel=NONE")
cliArgsCap.add("--ignore-ssl-errors=yes")
cliArgsCap.add("--load-images=no")
capabilities.setCapability(CapabilityType.PROXY, seleniumProxy)
capabilities.setCapability("phantomjs.page.customHeaders.Referer", "")
capabilities.setCapability("phantomjs.page.settings.userAgent", userAgent)
capabilities.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap)
new PhantomJSDriver(capabilities)
}
driver.executePhantomJS(
"""
|var navigation = [];
|
|this.onNavigationRequested = function(url, type, willNavigate, main) {
| navigation.push(url)
| console.log('Trying to navigate to: ' + url);
|}
|
|this.onResourceRequested = function(request, net) {
| console.log("Requesting " + request.url);
| if (! (navigation.indexOf(request.url) > -1)) {
| console.log("Aborting " + request.url)
| net.abort();
| }
|};
""".stripMargin
)
driver.manage().window().setSize(windowSize)
def follow(url: String)(implicit ec: ExecutionContext): List[HarEntry] = {
try{
Await.result(Future{
mobProxy.newHar(url)
driver.get(url)
val entries = mobProxy.getHar.getLog.getEntries.asScala.toList
shutdown()
entries
}, 45.seconds)
} catch {
case e: Exception =>
try {
shutdown()
} catch {
case shutdown: Exception =>
throw new Exception(s"Error ${shutdown.getMessage} cleaning up after Exception: ${e.getMessage}")
}
throw e
}
}
def shutdown() = {
driver.close()
driver.quit()
}
}
I tried several versions of Selenium in case there was a bugfix. The build.sbt:
libraryDependencies += "org.seleniumhq.selenium" % "selenium-java" % "3.0.1"
libraryDependencies += "net.lightbody.bmp" % "browsermob-core" % "2.1.2"
Also, I tried PhantomJS 2.0.1, and 2.1.1:
$ phantomjs --version
2.0.1-development
$ phantomjs --version
2.1.1
Is this a PhantomJS or Selenium problem? Is my client using the API improperly?
The resource usage is caused by BrowserMob. To close the proxy and clean-up its resources, one must call stop().
For this client that means modifying the shutdown method
def shutdown() = {
mobProxy.stop()
driver.close()
driver.quit()
}
Another method, abort, offers immediate termination of the proxy server and does not wait for traffic to cease.
If you love us? You can donate to us via Paypal or buy me a coffee so we can maintain and grow! Thank you!
Donate Us With