environment:
  JAVA_OPTS: >-
    -XX:+UseG1GC
    -XX:MaxGCPauseMillis=200
    -XX:+ParallelRefProcEnabled
    -XX:+UseStringDeduplication
    -XX:InitiatingHeapOccupancyPercent=45
    -XX:G1ReservePercent=15
    -Xms1024m
    -Xmx3072m
    -XX:MaxMetaspaceSize=256m
    -XX:MaxDirectMemorySize=256m
    -XX:+ExitOnOutOfMemoryError
    -XX:G1HeapWastePercent=10
    -XX:G1MixedGCCountTarget=4
deploy:
  resources:
    limits:
      cpus: "4.2"
      memory: 5.2G
    reservations:
      cpus: "2"
      memory: 2.5G
healthcheck:
  test: |
    /bin/bash -c '
    if ! timeout 55s wget --spider --no-verbose http://127.0.0.1:8090/yacysearch.html?query=exiguus; then
      exit 1
    fi
    if ! timeout 55s yacy_search_server/bin/checkalive.sh; then
      exit 1
    fi
    exit 0
    '
  interval: 120s
  timeout: 60s
  retries: 3
  start_period: 240s
That's the smallest setup I got running mostly stable and self-healing with an index size of over 100 GB. I also avoid crawling with the built-in tasks and instead use the API and cron jobs for weekly feed importing, because I found that this kind of crawling eats up fewer resources than the usual approach. Overall, too many running crawlers make retrieving search results slow.
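For anyone adapting the memory numbers in the config above, here is a rough sanity check of how the capped JVM regions compare with the container limit. It is only a sketch: it assumes `5.2G` is read as 5.2 GiB, the function names are made up for illustration, and it only adds up the regions that have explicit caps (`-Xmx`, `-XX:MaxMetaspaceSize`, `-XX:MaxDirectMemorySize`).

```shell
#!/usr/bin/env bash
# Rough memory budget for the JVM flags and container limit shown above.
# Assumption: "5.2G" means 5.2 GiB; all values below are in MiB.
# The function names are hypothetical helpers, not part of YaCy or Docker.

# Sum the explicitly capped JVM regions (heap + metaspace + direct memory).
jvm_capped_mib() {
  local heap=$1 metaspace=$2 direct=$3
  echo $(( heap + metaspace + direct ))
}

# Convert a fractional GiB value to whole MiB (rounded down).
gib_to_mib() {
  awk -v g="$1" 'BEGIN { printf "%d\n", int(g * 1024) }'
}

capped=$(jvm_capped_mib 3072 256 256)   # -Xmx, MaxMetaspaceSize, MaxDirectMemorySize
limit=$(gib_to_mib 5.2)                 # deploy.resources.limits.memory
headroom=$(( limit - capped ))

echo "capped JVM regions: ${capped} MiB"
echo "container limit:    ${limit} MiB"
echo "headroom:           ${headroom} MiB"
```

With these values the capped regions come to 3584 MiB against a limit of 5324 MiB, leaving about 1740 MiB. That remainder has to cover everything the three caps do not include, such as thread stacks, the code cache, GC bookkeeping and any other process in the container.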
For production use, I suggest at least doubling the resources. If you do this, it becomes very stable. Thanks for pointing out Kiwix. I'll give it a try.
> Thanks for pointing out Kiwix. I'll give it a try.
I see YaCy works with ZIM files [0] packaged by Kiwix, so this is great.
In theory, if you run YaCy, Kiwix is not necessary, but they already package valuable sites like Wikipedia, iFixit, ArchWiki, etc. [0], so you do not have to worry about your crawler being blocked, and you have a local copy anyway [1]. So a lot of bandwidth and headaches are saved.
- [0] https://github.com/yacy/yacy_search_server/tree/master/sourc...