Geo (Geo Replication)
Notes:
- Geo is part of GitLab Premium.
- Introduced in GitLab Enterprise Edition 8.9. We recommend you use it with at least GitLab Enterprise Edition 10.0 for basic Geo features, or the latest version for a better experience.
- You should make sure that all nodes run the same GitLab version.
- Geo requires PostgreSQL 9.6 and Git 2.9 in addition to GitLab's usual minimum requirements.
- Using Geo in combination with High Availability is considered GA in GitLab Enterprise Edition 10.4.
Note: Geo changes significantly from release to release. Upgrades are supported and documented, but you should ensure that you're following the right version of the documentation for your installation! The best way to do this is to follow the documentation from the /help endpoint on your primary node, but you can also navigate to this page on GitLab.com and choose the appropriate release from the tags dropdown, e.g., v10.0.0-ee.
Geo allows you to replicate your GitLab instance to other geographical locations as a read-only, fully operational version.
Overview
If you have two or more teams geographically spread out, but your GitLab instance is in a single location, fetching large repositories can take a long time.
Your Geo instance can be used for cloning and fetching projects, in addition to reading any data. This will make working with large repositories over large distances much faster.
When Geo is enabled, we refer to your original instance as a primary node and the replicated read-only ones as secondaries.
Keep in mind that:
- Secondaries talk to the primary to get user data for logins (API) and to replicate repositories, LFS Objects and Attachments (HTTPS + JWT).
- Since GitLab Premium 10.0, the primary no longer talks to secondaries to notify them of changes (API).
Use-cases
- Can be used for cloning and fetching projects, in addition to reading any data available in the GitLab web interface (see current limitations)
- Overcomes slow connection between distant offices, saving time by improving speed for distributed teams
- Helps reduce the loading time for automated tasks, custom integrations and internal workflows
- Quickly fail over to a Geo secondary in a Disaster Recovery scenario
- Allows planned failover to a Geo secondary
Architecture
The following diagram illustrates the underlying architecture of Geo (source diagram).
In this diagram, there is one Geo primary node and one secondary. The secondary clones repositories via Git over HTTPS. Attachments, LFS objects, and other files are downloaded via HTTPS using the GitLab API to authenticate, with a special endpoint protected by JWT.
Writes to the database and Git repositories can only be performed on the Geo primary node. The secondary node receives database updates via PostgreSQL streaming replication.
Note that the secondary needs two different PostgreSQL databases: a read-only instance that streams data from the main GitLab database and another used internally by the secondary node to record what data has been replicated.
In the secondary nodes there is an additional daemon: Geo Log Cursor.
Geo Recommendations
We highly recommend that you install Geo on an operating system that supports OpenSSH 6.9 or higher. The following operating systems are known to ship with a current version of OpenSSH:
* CentOS 7.4
* Ubuntu 16.04
Note that CentOS 6 and 7.0 ship with an old version of OpenSSH that does not support a feature that Geo requires. See the documentation on Geo SSH access for more details.
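The OpenSSH recommendation above can be checked with a short script. The following is a minimal sketch, not part of GitLab: the function names are hypothetical, and it assumes you pass in the version banner that `ssh -V` prints (for example `OpenSSH_7.4p1, OpenSSL 1.0.2k-fips`). The sample banners in the usage lines are illustrative only.

```python
import re
from typing import Tuple

# Minimum OpenSSH version recommended for Geo in this document.
MIN_OPENSSH = (6, 9)


def parse_openssh_version(banner: str) -> Tuple[int, int]:
    """Extract (major, minor) from a banner such as 'OpenSSH_7.4p1, OpenSSL ...'."""
    match = re.search(r"OpenSSH_(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"not an OpenSSH version banner: {banner!r}")
    return int(match.group(1)), int(match.group(2))


def meets_geo_recommendation(banner: str) -> bool:
    """Return True when the banner reports OpenSSH 6.9 or higher."""
    return parse_openssh_version(banner) >= MIN_OPENSSH


# Illustrative banners (hypothetical inputs, not captured from real hosts).
print(meets_geo_recommendation("OpenSSH_7.4p1, OpenSSL 1.0.2k-fips"))
print(meets_geo_recommendation("OpenSSH_5.3p1, OpenSSL 1.0.1e-fips"))
```

Comparing `(major, minor)` tuples avoids the pitfalls of comparing version strings lexically, where "6.10" would sort before "6.9".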
LDAP
If you use LDAP on your primary, we recommend that you also set up a secondary LDAP server for the secondary Geo node. Otherwise, users will not be able to perform Git operations over HTTP(S) on the secondary Geo node using HTTP Basic Authentication. However, Git via SSH and personal access tokens will still work.
Check with your LDAP provider for instructions on how to set up replication. For example, OpenLDAP provides these instructions.
Geo Tracking Database
We use the tracking database as metadata to control what needs to be updated on the disk of the local instance (for example, download new assets, fetch new LFS Objects or fetch changes from a repository that has recently been updated).
Because the replicated database instance is read-only, each secondary location needs this additional database instance.
Geo Log Cursor
This daemon reads a log of events replicated by the primary node to the secondary database and updates the Geo Tracking Database with changes that need to be executed.
When something is marked to be updated in the tracking database, asynchronous jobs running on the secondary node will execute the required operations and update the state.
This architecture makes Geo resilient to connectivity issues between the nodes. Whether an outage lasts a few minutes or several days, the secondary instance can replay all the events in the correct order and get back in sync.
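The cursor idea described above can be illustrated with a toy model. This is a sketch under stated assumptions, not GitLab's implementation: the `Event` and `TrackingDatabase` classes, their fields, and the event kind names are all hypothetical. It only shows why remembering the last processed event ID lets a secondary catch up in order after any length of disconnection.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Event:
    id: int          # position in the replicated event log (assumed to increase over time)
    project_id: int
    kind: str        # hypothetical event name, e.g. "repository_updated"


@dataclass
class TrackingDatabase:
    last_event_id: int = 0                                 # cursor position
    pending: Dict[int, str] = field(default_factory=dict)  # project_id -> latest event kind to act on


def run_cursor(log: List[Event], tracking: TrackingDatabase) -> int:
    """Record every event newer than the cursor, in ID order; return how many were processed."""
    new_events = sorted(
        (event for event in log if event.id > tracking.last_event_id),
        key=lambda event: event.id,
    )
    for event in new_events:
        # Mark the project as needing work; asynchronous jobs would pick this up later.
        tracking.pending[event.project_id] = event.kind
        tracking.last_event_id = event.id
    return len(new_events)


# Events may arrive after a long outage; the cursor replays them in order.
log = [
    Event(2, 7, "repository_updated"),
    Event(1, 7, "repository_created"),
    Event(3, 9, "repository_updated"),
]
tracking = TrackingDatabase()
print(run_cursor(log, tracking))  # first run processes the whole backlog
print(run_cursor(log, tracking))  # second run finds nothing new
```

Because the cursor position is stored alongside the pending work, running the function again is harmless: events at or below `last_event_id` are skipped.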
Setup instructions
These instructions assume you have a working instance of GitLab. They will guide you through making your existing instance the primary Geo node and adding secondary Geo nodes.
The steps below should be followed in the order they appear. Make sure the GitLab version is the same on all nodes.
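Before starting, it can help to confirm the version requirement mechanically. The sketch below is illustrative only: the function name, node names, and version strings are hypothetical, and it assumes you have already collected each node's version string by whatever means you prefer.

```python
from typing import Dict, List


def nodes_out_of_sync(versions: Dict[str, str], primary: str) -> List[str]:
    """Return the names of nodes whose version differs from the primary's."""
    expected = versions[primary]
    return sorted(name for name, version in versions.items() if version != expected)


# Hypothetical inventory of node name -> installed GitLab version.
versions = {
    "primary": "10.4.2-ee",
    "secondary-eu": "10.4.2-ee",
    "secondary-ap": "10.3.6-ee",
}
print(nodes_out_of_sync(versions, "primary"))  # prints ['secondary-ap']
```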
Using Omnibus GitLab
If you installed GitLab using the Omnibus packages (highly recommended):
- Install GitLab Enterprise Edition on the server that will serve as the secondary Geo node. Do not create an account or log in to the new secondary node.
- Upload the GitLab License on the primary Geo node to unlock Geo.
- Set up the database replication (primary (read-write) <-> secondary (read-only) topology).
- Configure fast lookup of authorized SSH keys in the database. This step is required and needs to be done on both the primary AND secondary nodes.
- Configure GitLab to set the primary and secondary nodes.
- Optional: Configure a secondary LDAP server for the secondary. See notes on LDAP.
- Follow the "Using a Geo Server" guide.
Using GitLab installed from source
If you installed GitLab from source:
- Install GitLab Enterprise Edition on the server that will serve as the secondary Geo node. Do not create an account or log in to the new secondary node.
- Upload the GitLab License on the primary Geo node to unlock Geo.
- Set up the database replication (primary (read-write) <-> secondary (read-only) topology).
- Configure fast lookup of authorized SSH keys in the database. Do this step for both the primary AND secondary nodes.
- Configure GitLab to set the primary and secondary nodes.
- Follow the "Using a Geo Server" guide.
Configuring Geo
Read through the Geo configuration documentation.
Updating the Geo nodes
Read how to update your Geo nodes to the latest GitLab version.
Configuring Geo HA
Read through the Geo High Availability documentation.
Configuring Geo with Object storage
When you have object storage enabled, please consult the Geo with Object Storage documentation.
Disaster Recovery
Read through the Disaster Recovery documentation to learn how to use Geo to mitigate data loss and restore services in a disaster scenario.
Replicating the Container Registry
Read how to replicate the Container Registry.
Current limitations
IMPORTANT: This list of limitations tracks only the latest version. If you are on an older version, extra limitations may be in place.
- You cannot push code to secondary nodes, see gitlab-org/gitlab-ee#3912 for details.
- The primary node has to be online for OAuth login to happen (existing sessions and Git are not affected)
- It works for repos, wikis, issues, merge requests, file attachments, artifacts and job logs, but it does not work for GitLab Pages or, by default, Docker images of the Container Registry (you can configure that separately; see Replicating the Container Registry for details).
- The installation takes multiple manual steps that together can take about an hour depending on circumstances; we are working on improving this experience, see gitlab-org/omnibus-gitlab#2978 for details.
- Real-time updates of issues/merge requests (e.g. via long polling) don't work on the secondary
- Broadcast messages set on the primary won't be seen on the secondary without a cache flush (e.g. gitlab-rake cache:clear)
Frequently Asked Questions
Read more in the Geo FAQ.
Log files
Since GitLab 9.5, Geo stores structured log messages in a geo.log file. For Omnibus installations, this file can be found in /var/log/gitlab/gitlab-rails/geo.log. This file contains information about when Geo attempts to sync repositories and files. Each line in the file contains a separate JSON entry that can be ingested into Elasticsearch, Splunk, etc. For example:
{"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update","project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}
This message shows that Geo detected that a repository update was needed for project 1.
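Because each line is a standalone JSON object, the file can also be inspected with a few lines of script. The following is a minimal sketch, assuming only the line format and field names shown in the example entry above; the function names are hypothetical, and the sample is the same entry embedded as a string so that the snippet runs without access to a real geo.log.

```python
import json
from typing import Dict, Iterable, Iterator, List


def parse_geo_log(lines: Iterable[str]) -> Iterator[Dict]:
    """Yield one dict per well-formed JSON object line, skipping blank or malformed lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(entry, dict):
            yield entry


def projects_needing_resync(entries: Iterable[Dict]) -> List[int]:
    """Return sorted project IDs from entries flagged with resync_repository."""
    return sorted({e["project_id"] for e in entries if e.get("resync_repository")})


# The example entry from this document, used as sample input.
SAMPLE = (
    '{"severity":"INFO","time":"2017-08-06T05:40:16.104Z","message":"Repository update",'
    '"project_id":1,"source":"repository","resync_repository":true,"resync_wiki":true,'
    '"class":"Gitlab::Geo::LogCursor::Daemon","cursor_delay_s":0.038}'
)

entries = list(parse_geo_log([SAMPLE, "", "not json"]))
print(projects_needing_resync(entries))  # prints [1]
```

To run it against a real installation, pass an open file handle for the geo.log path in place of the sample list.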
Security of Geo
Read the security review page.
Tuning Geo
Read the Geo tuning documentation.
Troubleshooting
Read the troubleshooting document.